M is for MACRO: ASCII to Unicode strings
Posted: 2006-4-7 12:47

Heh, I think this is good stuff, so why has nobody bumped it?
M is for MACRO: ASCII to Unicode strings
Ernest Murphy ernie@surfree.com

Get the L.inc file

Abstract:
------------------------------------------------------------------------------  
Documentation on several compile-time macros is lacking or unclear from the
usual sources. This is a report on some macro functions I have been using
lately. It demonstrates a macro that converts an ASCII string to Unicode at
compile time.

Introduction:
------------------------------------------------------------------------------  

Lately I've been doing lots of work with COM methods from assembly. Scattered
inside these methods is a mixture of our well-known friend the ASCII text
string, and our new arch-nemesis, the Unicode wide string. As MASM has no
built-in functions to define Unicode strings, I was constrained to define ASCII
strings, then use a buffer and a MultiByteToWideChar API call to get Unicode
strings. Inefficient, annoying, dreadful, but a quick and dirty way to keep
going where I wanted to go.

I spent a recent weekend just playing around in MASM with wide strings, and
made a happy discovery. One can define a Unicode string like so:

wszSomeString WORD "H","E","L","L","O"," ","W","O","R","L","D",0
The way Unicode is defined, an ASCII character maps to the same Unicode
character, with just the size of the data being different. You can try this out
yourself in a simple test app by using MessageBoxW instead of just MessageBox.
MessageBox equates to MessageBoxA, the ASCII version of this API. MessageBoxW
expects Unicode strings. Just add a proto def for it in your code (MASM32
includes the correct library, but only defines ASCII protos).
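
A simple test app along those lines might look like the sketch below. It assumes a default MASM32 install under \masm32, and the label names are made up for illustration. The PROTO line can be dropped if your user32.inc already declares MessageBoxW.

```masm
; Sketch only: assumes the standard MASM32 include and library paths
.386
.model flat, stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
include \masm32\include\user32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

; wide-character MessageBox; omit if your include files already declare it
MessageBoxW PROTO :DWORD, :DWORD, :DWORD, :DWORD

.data
; each character is stored as a 16-bit WORD, ending with a 16-bit zero
wszCaption WORD "T","e","s","t",0
wszMessage WORD "H","E","L","L","O"," ","W","O","R","L","D",0

.code
start:
    invoke MessageBoxW, NULL, ADDR wszMessage, ADDR wszCaption, MB_OK
    invoke ExitProcess, 0
end start
```
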
Now that's useful! But it's damn annoying too... very prone to errors, hard  
to read, even harder to type. It just cries out for a macro to do the conversion.
How shall we proceed?

Well, when doing something new, I like to do it the same way as something old
or already known. C++ marks a wide string literal with an L prefix, as in
L"MyString", so ideally we would write:

wszSomeString wchar L"MyString"
It would be great if we could make our macro function look the same. Well,
we can't. First, MASM wants a macro function's parameters inside parentheses,
and it also wants text enclosed in angle brackets. So the best we can do is:
wszSomeString wchar L(<MyString>)
Still, not half bad. By now you should be used to doing all your TEXTEQU's  
like this anyway, with those surrounding angle brackets. You're not doing this?  
This is a great way to define constants in your own .inc files because you only  
make .data entries for the constants you use. It works like this: Inside your  
.inc file, you define a text constant like this:
sRadiusOfEarthInMiles TEXTEQU <3959>
Then, inside your source code .data area you use it like:
RadiusOfEarthInMiles DWORD sRadiusOfEarthInMiles
which the assembler expands into:
RadiusOfEarthInMiles DWORD 3959
Hey! That's what we wanted, and we didn't have to do any EXTERNs or anything.
Simple, bulletproof. I use this a lot to define GUIDs (a huge structure of
differently typed numbers).
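
For the GUID case, the same trick might look like the sketch below. MYGUID is a hypothetical structure that mirrors the usual GUID layout (it is given its own name to avoid clashing with any GUID your include files already define), and the value is the well-known IID_IUnknown. Treat it as a fragment to drop into a source file after the usual .386/.model lines.

```masm
; hypothetical structure mirroring the usual GUID layout
MYGUID STRUCT
    Data1 DWORD ?
    Data2 WORD  ?
    Data3 WORD  ?
    Data4 BYTE  8 DUP (?)
MYGUID ENDS

; in the .inc file: this is only text, nothing is assembled yet
sIID_IUnknown TEXTEQU <{0, 0, 0, {0C0h, 0, 0, 0, 0, 0, 0, 46h}}>

; in the source file: only the constants you name get .data entries
.data
IID_IUnknown MYGUID sIID_IUnknown
```
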
So... back to ASCII to Unicode. Let's make a simple text macro to surround
ASCII characters with quotes so we can equate them to Unicode Strings. "FORC"
is a macro command that loops through once for each letter in a text equate.  
Whoops... we send it text, will that work? Almost, if we surround our sText
variable with angle brackets. AND... add an "&" in front of the variable; this  
directs MASM to look up the value, not use the value's name.
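
Before the macro itself, here is FORC on its own, as a minimal sketch. The label wszDigits and the loop variable digit are made-up names. Each pass emits one 16-bit character, so the block lays down the wide string "0123456789" without any macro function at all.

```masm
; fragment: place in a source file after the usual .386/.model lines
.data
wszDigits LABEL WORD          ; hypothetical name marking the start of the string
FORC digit, <0123456789>
    WORD '&digit'             ;; one 16-bit character per loop pass
ENDM
    WORD 0                    ; terminating zero
```

The "&" is needed here because the loop variable sits inside quotes.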

wchar TYPEDEF WORD ; a wide character is 16 bits
L MACRO sText:REQ
LOCAL str, chr
str TEXTEQU <> ; start with an empty string
FORC chr, <&sText>
   str CATSTR str, <">, <&chr>, <",> ; surround each char with
   ; quotes, and add trailing
   ; comma for the next character
ENDM
str CATSTR str, <0> ; almost done, just add
; the terminating zero
EXITM str
ENDM

Simple, direct, and has a BIG PROBLEM. If you compile:
wszSomeString wchar L(<Hello World>)
That works fine. But what if we try:
wszSomeString wchar L(<Hello World!>)
Whoops... an error. Easy, you say, knowing that an exclamation point has a
special meaning in text macros. It means take the next character as a literal,
just in case you wanted to include angle brackets inside your text equate (and
you will at times). Well... just doing this also fails:
wszSomeString wchar L(<Hello World!!>)
Why would that fail? It all works fine for a while, the correct string gets
passed to the macro, and eventually the "!" character is parsed. Then trouble
happens when we try to do the CATSTR, because <&chr> will expand to <!>, where
the "!" turns the closing ">" into a literal. That leaves an unbalanced equate.
We need an odd number of !'s to get the CATSTR to work, but need to send an
even number in the macro function invoke line...
No matter what you do, it ain't gonna work. No problem if you never use an
exclamation point, but dang, I sure want to. So...
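
The literal-character operator described above can be seen in isolation in the small sketch below. The equate names are made up, and %ECHO simply prints the expanded text while assembling.

```masm
; "!" makes the next character literal inside angle brackets
sCompare TEXTEQU <a !> b>     ; the stored text is: a > b
sShout   TEXTEQU <Stop!!>     ; the doubled "!" stores one, so the text is: Stop!

%ECHO sCompare
%ECHO sShout
```

A lone "!" in front of the closing bracket, as in <!>, swallows that bracket, which is exactly the trap the macro falls into.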

So we are left with doing some sort of alias for "!". C++ uses a backslash to
do this, so we will too. Let's define "\|" as an exclamation point. All we have  
to do is compare the chr value in the loop to "\" and we can... but wait a sec.

Compare chr? To what? The implementation of IF in MASM is pretty lame. There
is no way it can do that comparison. It just wants numbers. But wait...

In the "good old days," MASM was a real product, sold on shelves, and it
had... REAL BOOKS. (MS still sells MASM direct to MSVC and Studio owners for 20
bucks; I do not know if it still ships with books. Well worth the call.) Inside
the Programmer's Guide to MASM are a few more conditional directives you will
find useful. These are:

The Directive               Grants Assembly If
===================================================================
IF {expression}             {expression} is true
IFE {expression}            {expression} is false
IFDEF {name}                name has been previously defined
IFNDEF {name}               name has not been previously defined
IFB {argument}              {argument} is blank
IFNB {argument}             {argument} is not blank
IFIDN[I] {arg1}, {arg2}     {arg1} equals {arg2}
IFDIF[I] {arg1}, {arg2}     {arg1} does not equal {arg2}

The optional [I] in IFIDN and IFDIF makes the comparison
insensitive to differences in case.
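
A few of these directives in use, as a minimal sketch. The equate sBuild and the macro ShowArg are hypothetical names, and ECHO just prints a message at assembly time.

```masm
sBuild TEXTEQU <debug>          ; hypothetical text equate

IFIDNI sBuild, <DEBUG>          ; text comparison, ignoring case
    ECHO debug build selected
ELSE
    ECHO release build selected
ENDIF

IFDEF UNICODE                   ; true only if the name UNICODE is defined
    ECHO UNICODE is defined
ENDIF

ShowArg MACRO param             ; hypothetical macro with an optional argument
    IFB <param>
        ECHO nothing was passed
    ELSE
        ECHO something was passed
    ENDIF
ENDM

ShowArg                         ; takes the IFB branch
ShowArg <hello>                 ; takes the ELSE branch
```
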

Wow. Some of these look very useful. IF looks good, except after trying it I
can tell you it wants numeric-only args. The expressions in IF are of the form
"IF num1 EQ num2" where num1 and num2 are numeric constants. IFIDN (the best
acronym I can come up with for this command is "IF IS DIFFERENT NOT," I hate
senseless command names) actually does what we want: compare two text values.
Each value must be a text string or a text equate variable.

Since we're looping through character by character, we need some sort of state
information from loop to loop to remember we're processing multi-character
information. Here again, we can use text variables to do this for us. Let's try
a revised macro:

L MACRO sText:REQ
LOCAL str, chr, flag
str TEXTEQU <> ; start with an empty string
flag TEXTEQU < >
FORC chr, <&sText>
   IFDIF flag, <\> ; flag is clear: we're processing a normal char
      IFIDN <&chr>, <\> ; see if char is a backslash
         flag CATSTR <\> ; and remember it in flag
      ELSE
         str CATSTR str, <">, <&chr>, <",>
         ; just add the character normally
      ENDIF
   ELSE ; flag is set: we're processing a command
      str CATSTR str, <"!!",> ; add the exclamation point
      flag CATSTR < > ; clear the flag
   ENDIF
ENDM
str CATSTR str, <0>
EXITM str
ENDM
Well, this works a little better. We get exclamation points back, but we lost
the backslash at the same time. We need to check the 2nd character! Let's fix that.  
L MACRO sText:REQ
LOCAL str, chr, flag
str TEXTEQU <> ; start with an empty string
flag TEXTEQU < >
FORC chr, <&sText>
   IFDIF flag, <\>  
      IFIDN <&chr>, <\>  
         flag CATSTR <\>  
      ELSE
         str CATSTR str, <">, <&chr>, <",>
      ENDIF
   ELSE  
      IFIDN <&chr>, <|> ; check the 2nd command char
         str CATSTR str, <"!!",> ; add the exclamation point
      ELSE
         str CATSTR str, <">, <&chr>, <",>
      ENDIF
      flag CATSTR < >  
   ENDIF
ENDM
str CATSTR str, <0>  
EXITM str
ENDM
Now we're getting somewhere... but not too far. MASM has a single line text  
limit of 256 characters. This means we can have a string of 57 characters  
maximum before this technique bombs out on us. One quick fix would be to take
out the automatic trailing zero, then we can define lots of strings in a row,
and they all become one string until that final terminating zero is met. Let's
add that:
L MACRO sText:REQ
LOCAL str, chr, flag
str CATSTR < > ; define the initial str
flag TEXTEQU < >
FORC chr, <&sText>  
   IFDIF flag, <\> ; no comma between a backslash and its command char
      IFDIF str, < >
         str CATSTR str, <,> ; add a trailing comma ONLY to
         ; non-null strings
      ENDIF
   ENDIF
   IFDIF flag, <\>  
      IFIDN <&chr>, <\>  
         flag CATSTR <\>  
      ELSE
         str CATSTR str, <">, <&chr>, <"> ; no trailing comma
      ENDIF
   ELSE  
      IFIDN <&chr>, <|>
         str CATSTR str, <"!!"> ; no trailing comma
      ELSE
         str CATSTR str, <">, <&chr>, <"> ; no trailing comma
      ENDIF
      flag CATSTR < >  
   ENDIF
ENDM
; no trailing zero here either
EXITM str
ENDM
Pretty neat. Just use the macro function with a trailing zero like this:
L(<Hello World>),0
and we get a single string, or put several together for longer strings. But... as
long as we went to all this bother just so we can insert an exclamation point,
why not keep going and make this function really work for us?
Let's add two more commands: a newline (\n) command, and a terminating zero
(\0) command. We'll keep things simple by not checking that the terminating zero
is at the end. And as we think of more functions, these get easy to add.
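
Putting several unterminated pieces together might look like the sketch below. It assumes the wchar typedef and the version of L without the automatic zero have been saved in L.inc. The label name is made up, and the space between the two pieces is written out by hand.

```masm
; fragment: assumes L.inc holds wchar and the unterminated L macro
include L.inc

.data
; two pieces assembled back to back as one wide string
wszLong wchar L(<A long string can be split>)," "
        wchar L(<into several pieces>),0      ; only the last piece gets the zero
```

Each piece stays well under the per-line limit, while the string as a whole can be as long as you like.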

Before we launch into this, one thing has to be worked out, since there is no
matching ELSEIF to IFDIF. We need an "ELSE" clause to make non-command
characters print, lest we lose our backslash character. To do this, let's make
the flag variable do something else: in the command code arm, flag has already
done its job of remembering the previous character. So we can re-define it.

Here is the final macro:

L MACRO sText:REQ
LOCAL str, chr, flag
;; generates a wide character string  
;; usage: sztext wchar L(<Hello World\|\|\0>)
;; generates: sztext WORD "H","e","l","l","o"," ",
;;                        "W","o","r","l","d","!","!",0
;; max string length is 57 chars (MASM line length limit)
;; use multiple non-zero term strings in sequence for longer strings
;; (zero term the last of course)
str TEXTEQU < >
flag TEXTEQU <.>
FORC chr, <&sText>  
   IFDIF flag, <\>
      IFDIF str, < >
         str CATSTR str, <,>
      ENDIF
   ENDIF
   IFDIF flag, <\>
      IFIDN <&chr>, <\>
         flag CATSTR <\>
      ELSE
         str CATSTR str, <">, <&chr>, <">
      ENDIF
   ELSE
      flag CATSTR <X>  
      ;; check for a pipe (exclamation point)
      IFIDN <&chr>, <|>
         str CATSTR str, <"!!">
         flag CATSTR < >
      ENDIF
      ;; check for an "n" (new line)
      IFIDN <&chr>, <n>
         str CATSTR str, <13,10>
         flag CATSTR < >
      ENDIF
      ;; check for an "0" (terminating zero)
      IFIDN <&chr>, <0>
         str CATSTR str, <0>
         flag CATSTR < >
      ENDIF
      ;; now check if no special chars were issued
      IFIDN flag, <X> ;; compare the flag's value, not its name
         str CATSTR str, <">, <&chr>, <">
      ENDIF
      flag CATSTR < >
   ENDIF
ENDM
EXITM str
ENDM
Well, here it is. It works very well; the only drawback is the HUGE amount of
code (over 1,150 lines just to translate "Hello World!") added to your listing
file. We'll just have to live with that; one can't have everything. The only way
around that would be to compile a pre-processor, which is a messy affair anyway.
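
As a closing usage sketch, here are the three commands of the final macro in action. It assumes the wchar typedef and the final L macro live in L.inc; the label names are made up.

```masm
; fragment: assumes L.inc holds wchar and the final L macro
include L.inc

.data
; \| stands for "!", and \0 adds the terminating zero
wszGreeting wchar L(<Hello World\|\0>)

; \n adds 13,10 (carriage return, line feed) between the two lines
wszTwoLines wchar L(<First line\nSecond line\0>)
```

Pass either label to a wide API such as MessageBoxW to see the result.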

Latest replies (2)
It doesn't support Chinese.
2006-4-8 15:03
It should be supported. I just checked, and this thread uses exactly this:
http://bbs.pediy.com/showthread.php?threadid=21988
2006-4-8 15:14