Heh, I think this is good stuff. Why has nobody bumped it?
M is for MACRO: ASCII to Unicode strings
Ernest Murphy ernie@surfree.com
Get the L.inc file
Abstract:
------------------------------------------------------------------------------
Documentation on several compile-time macros is lacking or unclear from the
usual sources. This is a report on some macro functions I have been using
as of late. It demonstrates a macro that converts an ASCII string to Unicode at
compile time.
Lately I've been doing lots of work with COM methods from assembly. Scattered
among these methods is a mixture of our well-known friend the ASCII text
string and our new arch-nemesis, the Unicode wide string. As MASM has no
built-in way to define Unicode strings, I was constrained to define ASCII
strings, then use a buffer and a MultiByteToWideChar API call to get Unicode
strings. Inefficient, annoying, dreadful, but a quick and dirty way to keep
going where I wanted to go.
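For comparison, the run-time approach described above can be sketched roughly as follows. This is only an illustration: it assumes a standard MASM32 install (the include paths are an assumption), and the names szHello and wszHello are made up for the example.

```asm
; Sketch of the run-time conversion, assuming a standard MASM32 install.
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib

.data
szHello   BYTE "Hello World", 0     ; the ASCII source string (hypothetical name)
.data?
wszHello  WORD 32 DUP(?)            ; buffer that receives the wide string

.code
start:
    ; -1 tells the API the source string is zero terminated
    invoke MultiByteToWideChar, CP_ACP, 0, ADDR szHello, -1, ADDR wszHello, LENGTHOF wszHello
    invoke ExitProcess, 0
end start
```

Every string costs a buffer plus an API call at run time, which is the overhead the rest of this article tries to get rid of.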
I spent a recent weekend just playing around in MASM with wide strings, and
made a happy discovery. One can define a Unicode string like so:
wszSomeString WORD "H","E","L","L","O"," ","W","O","R","L","D",0
The way Unicode is defined, an ASCII character maps to the same Unicode
character, with just the size of the data being different. You can try this out
yourself in a simple test app by using MessageBoxW instead of just MessageBox.
MessageBox equates to MessageBoxA, the ASCII version of this API. MessageBoxW
expects Unicode strings. Just add a PROTO definition for it in your code (MASM32
includes the correct library, but only defines ASCII protos).
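A minimal test app along those lines might look like the sketch below. It assumes a standard MASM32 install for the include paths, and wszCaption and wszText are hypothetical names. If your copy of user32.inc already declares MessageBoxW, include that file instead of writing the PROTO yourself.

```asm
; Sketch: calling MessageBoxW with hand-built wide strings.
.386
.model flat, stdcall
option casemap:none
include \masm32\include\windows.inc
include \masm32\include\kernel32.inc
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\user32.lib

; our own prototype for the wide version of the API
MessageBoxW PROTO :DWORD, :DWORD, :DWORD, :DWORD

.data
wszCaption WORD "T","e","s","t",0
wszText    WORD "H","E","L","L","O"," ","W","O","R","L","D",0

.code
start:
    invoke MessageBoxW, NULL, ADDR wszText, ADDR wszCaption, MB_OK
    invoke ExitProcess, 0
end start
```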
Now that's useful! But it's damn annoying too... very prone to errors, hard
to read, even harder to type. It just cries out for a macro to do the conversion.
How shall we proceed?
Well, when doing something new, I like to do it the same way as something old
or already known. When writing in C++, one defines a string like this:
wchar_t wszSomeString[] = L"MyString";
It would be great if we could make our macro function look the same. Well,
we can't. MASM, first, wants a macro function's parameters inside parentheses,
and it also wants text enclosed in angle brackets. So the best we can do is:
wszSomeString wchar L(<MyString>)
Still, not half bad. By now you should be used to doing all your TEXTEQU's
like this anyway, with those surrounding angle brackets. You're not doing this?
This is a great way to define constants in your own .inc files because you only
make .data entries for the constants you use. It works like this: Inside your
.inc file, you define a text constant like this:
sRadiusOfEarthInMiles TEXTEQU <3959>
Then, inside your source code .data area you use it like:
RadiusOfEarthInMiles DWORD sRadiusOfEarthInMiles
Which the assembler will rearrange into:
RadiusOfEarthInMiles DWORD 3959
Hey! That's what we wanted, and we didn't have to do any EXTERNs or anything.
Simple, bulletproof. I use this a lot to define GUIDs (a huge structure of
differently typed numbers).
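As an illustration of the GUID case, here is a sketch using the well-known IID_IUnknown value. The GUID structure is declared here only to keep the fragment self-contained (windows.inc normally supplies it, so drop the STRUCT if you include that file), and the "s" prefix on the text equate is just the naming habit used above.

```asm
; Sketch: a GUID kept as a text equate and materialised only where it is used.
GUID STRUCT
    Data1 DWORD ?
    Data2 WORD  ?
    Data3 WORD  ?
    Data4 BYTE  8 DUP(?)
GUID ENDS

; this line would live in the .inc file and costs no data space by itself
sIID_IUnknown TEXTEQU <{000000000H, 00000H, 00000H, {0C0H, 000H, 000H, 000H, 000H, 000H, 000H, 046H}}>

.data
; this line lives in the source file that actually needs the GUID
IID_IUnknown GUID sIID_IUnknown
```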
So... back to ASCII to Unicode. Let's make a simple text macro to surround
ASCII characters with quotes so we can equate them to Unicode Strings. "FORC"
is a macro command that loops through once for each letter in a text equate.
Whoops... we send it text, will that work? Almost, if we surround our sText
variable with angle brackets. AND... add an "&" in front of the variable; this
directs MASM to look up the value, not use the value's name.
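To see FORC on its own before it gets buried inside a macro function, here is a small sketch modeled on the usual textbook use of the directive. The label abcTable is hypothetical.

```asm
; Sketch: FORC used by itself, outside any macro function.
.data
abcTable LABEL BYTE
FORC chr, <ABC>
    BYTE "&chr"         ; emitted once per character: "A", then "B", then "C"
ENDM
```

The body is repeated three times, with chr standing for one character on each pass. With that in hand, here is the macro: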
wchar TYPEDEF WORD
L MACRO sText:REQ
    LOCAL str, chr
    str TEXTEQU <>                          ; start with an empty string
    FORC chr, <&sText>
        str CATSTR str, <">, <&chr>, <",>   ; surround each char with
                                            ; quotes, and add trailing
                                            ; comma for the next character
    ENDM
    str CATSTR str, <0>                     ; almost done, just add
                                            ; the terminating zero
    EXITM str
ENDM
Simple, direct, and has a BIG PROBLEM. If you compile:
wszSomeString wchar L(<Hello World>)
That works fine. But what if we try:
wszSomeString wchar L(<Hello World!>)
Whoops... an error. Easy you say, knowing that an exclamation point has a
special meaning in text macros. It means take the next character as a literal,
just in case you wanted to include angle brackets inside your text equate (and
you will at times). Well... just doing this also fails:
wszSomeString wchar L(<Hello World!!>)
Why would that fail? It all works fine for a while, the correct string gets
passed to the macro, and eventually the "!" character is parsed. Then trouble
happens when we try to do the CATSTR, because <&chr> will expand to <!>, and
that's an unbalanced equate: the "!" swallows the closing bracket. We need an
odd number of !'s to get the CATSTR to work, but need to send an even number in
the macro function invoke line...
No matter what you do, it ain't gonna work. No problem if you never use an
exclamation point, but dang, I sure want to. So...
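For reference, this is how the literal-character operator behaves in an ordinary text equate, where it causes no trouble. The names sGreater and sBang are hypothetical, and the %ECHO lines just print the resulting text while assembling.

```asm
; Sketch: the literal-character operator (!) inside text equates.
sGreater TEXTEQU <a !> b>       ; "!>" is a literal ">", so the text is: a > b
sBang    TEXTEQU <Hello!!>      ; "!!" is a literal "!", so the text is: Hello!

%ECHO sGreater
%ECHO sBang
```

The trouble described above only starts when a lone "!" ends up as the entire content of a pair of angle brackets.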
So we are left with doing some sort of alias for "!". C++ uses a backslash to
do this, so we will too. Let's define "\|" as an exclamation point. All we have
to do is compare the chr value in the loop to "\" and we can... but wait a sec.
Compare chr? To what? The implementation of IF in MASM is pretty lame. There
is no way it can do that comparison. It just wants numbers. But wait...
In the "good old days," MASM was a real product, sold on shelves, and it
had... REAL BOOKS. (MS still sells MASM direct to MSVC and Studio owners for 20
bucks; I do not know if it still ships with books. Well worth the call.) Inside
the Programmer's Guide to MASM are a few more conditional-assembly directives
you will find useful. These are:
The Directive               Grants Assembly If
===================================================================
IF {expression}             {expression} is true
IFE {expression}            {expression} is false
IFDEF {name}                {name} has been previously defined
IFNDEF {name}               {name} has not been previously defined
IFB {argument}              {argument} is blank
IFNB {argument}             {argument} is not blank
IFIDN[I] {arg1}, {arg2}     {arg1} equals {arg2}
IFDIF[I] {arg1}, {arg2}     {arg1} does not equal {arg2}

The optional [I] in IFIDN and IFDIF makes comparisons
insensitive to differences in case.
Wow. Some of these look very useful. IF looks good, except that after trying it
I can tell you it wants numeric-only args. The expressions in IF are of the form
"IF num1 EQ num2" where num1 and num2 are numeric constants. IFIDN (the best
acronym I can come up with for this command is "IF IS DIFFERENT NOT," I hate
senseless command names) actually does what we want: it compares two text
values. Each value must be a text string or a text equate variable.
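A quick way to get a feel for these text comparisons is to let the assembler report what it decided. This is just a sketch with literal strings; ECHO prints its message during assembly.

```asm
; Sketch: text comparison directives with literal strings.
IFIDN <abc>, <abc>
    ECHO abc and abc are identical
ENDIF

IFDIF <abc>, <ABC>
    ECHO abc and ABC differ when case matters
ENDIF

IFIDNI <abc>, <ABC>
    ECHO abc and ABC are identical when case is ignored
ENDIF
```

All three messages should show up in the build output.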
Since we're looping through character by character, we need some sort of state
information from loop to loop to remember we're processing multi-character
information. Here again, we can use text variables to do this for us. Let's try
a revised macro:
L MACRO sText:REQ
    LOCAL str, chr, flag
    str  TEXTEQU <>                     ; start with an empty string
    flag TEXTEQU < >
    FORC chr, <&sText>
        IFDIF flag, <\>                 ; flag is not a backslash: normal char
            IFIDN <&chr>, <\>           ; see if char is a backslash
                flag CATSTR <\>         ; and remember it in flag
            ELSE
                str CATSTR str, <">, <&chr>, <",>
                                        ; just add the character normally
            ENDIF
        ELSE                            ; flag is a backslash: processing a command
            str CATSTR str, <"!!",>     ; add the exclamation point
            flag CATSTR < >             ; clear the flag
        ENDIF
    ENDM
    str CATSTR str, <0>
    EXITM str
ENDM
Well, this works a little better. We get exclamation points back, but we lost
the backslash at the same time. We need to check the 2nd character! Let's fix that.
L MACRO sText:REQ
    LOCAL str, chr, flag
    str  TEXTEQU <>                     ; start with an empty string
    flag TEXTEQU < >
    FORC chr, <&sText>
        IFDIF flag, <\>
            IFIDN <&chr>, <\>
                flag CATSTR <\>
            ELSE
                str CATSTR str, <">, <&chr>, <",>
            ENDIF
        ELSE
            IFIDN <&chr>, <|>           ; check the 2nd command char
                str CATSTR str, <"!!",> ; add the exclamation point
            ELSE
                str CATSTR str, <">, <&chr>, <",>
            ENDIF
            flag CATSTR < >
        ENDIF
    ENDM
    str CATSTR str, <0>
    EXITM str
ENDM
Now we're getting somewhere... but not too far. MASM has a single line text
limit of 256 characters. This means we can have a string of 57 characters
maximum before this technique bombs out on us. One quick fix would be to take
out the automatic trailing zero, then we can define lots of strings in a row,
and they all become one string until that final terminating zero is met. Let's
add that:
L MACRO sText:REQ
    LOCAL str, chr, flag
    str  CATSTR < >                     ; define the initial str
    flag TEXTEQU < >
    FORC chr, <&sText>
        IFDIF flag, <\>                 ; no second comma in mid-command
            IFDIF str, < >
                str CATSTR str, <,>     ; add a separating comma ONLY to
                                        ; non-empty strings
            ENDIF
        ENDIF
        IFDIF flag, <\>
            IFIDN <&chr>, <\>
                flag CATSTR <\>
            ELSE
                str CATSTR str, <">, <&chr>, <">    ; no trailing comma
            ENDIF
        ELSE
            IFIDN <&chr>, <|>
                str CATSTR str, <"!!">              ; no trailing comma
            ELSE
                str CATSTR str, <">, <&chr>, <">    ; no trailing comma
            ENDIF
            flag CATSTR < >
        ENDIF
    ENDM
    ; no trailing zero here either
    EXITM str
ENDM
Pretty neat, just use the macro function with a trailing zero like this:
L(<Hello World>),0
and we get a single string, or put them together for longer strings. But... as long
as we made all this bother just so we can insert an exclamation point, why not
keep going and make this function really work for us?
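Putting pieces together for a longer string might look like the sketch below. It assumes the wchar typedef and the L macro from above are already in scope (for example through the L.inc file), and wszAlphaNum is a hypothetical label.

```asm
; Sketch: one long wide string built from several short pieces.
.data
wszAlphaNum wchar L(<ABCDEFGHIJKLMNOPQRSTUVWXYZ>)   ; first piece, no terminator
            wchar L(<0123456789>), 0                ; last piece gets the zero
```

Because the second line has no label, its data simply follows the first line in memory, so wszAlphaNum names one continuous zero-terminated string.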
Let's add two more commands: a newline (\n) command, and a terminating zero
(\0) command. We'll keep things simple by not checking that the terminating
zero is at the end. And as we think of more functions, these get easy to add.
Before we launch into this, one thing has to be worked out, since I did not
find a matching ELSEIF for IFDIF (MASM 6 does list ELSEIFIDN and ELSEIFDIF; the
version here gets by without them). We need an "ELSE" clause to make non-command
characters print, lest we lose our backslash character. To do this, let's make
the flag variable do something else: in the command arm of the code, flag has
already done its job of remembering the previous character. So we can redefine
it. Here is the final macro:
L MACRO sText:REQ
    LOCAL str, chr, flag
    ;; generates a wide character string
    ;; usage: sztext wchar L(<Hello World\|\|\0>)
    ;; generates: sztext WORD "H","e","l","l","o"," ",
    ;;            "W","o","r","l","d","!","!",0
    ;; max string length is 57 chars (MASM line length limit)
    ;; use multiple non-zero term strings in sequence for longer strings
    ;; (zero term the last of course)
    str  TEXTEQU < >
    flag TEXTEQU <.>
    FORC chr, <&sText>
        IFDIF flag, <\>
            IFDIF str, < >
                str CATSTR str, <,>
            ENDIF
        ENDIF
        IFDIF flag, <\>
            IFIDN <&chr>, <\>
                flag CATSTR <\>
            ELSE
                str CATSTR str, <">, <&chr>, <">
            ENDIF
        ELSE
            flag CATSTR <X>
            ;; check for a pipe (exclamation point)
            IFIDN <&chr>, <|>
                str CATSTR str, <"!!">
                flag CATSTR < >
            ENDIF
            ;; check for an "n" (new line)
            IFIDN <&chr>, <n>
                str CATSTR str, <13,10>
                flag CATSTR < >
            ENDIF
            ;; check for an "0" (terminating zero)
            IFIDN <&chr>, <0>
                str CATSTR str, <0>
                flag CATSTR < >
            ENDIF
            ;; now check if no special chars were issued
            IFIDN <&flag>, <X>
                str CATSTR str, <">, <&chr>, <">
            ENDIF
            flag CATSTR < >
        ENDIF
    ENDM
    EXITM str
ENDM
Well, here it is. It works very well; the only drawback is the HUGE amount of
code (over 1,150 just to translate "Hello World!") added to your listing file.
We'll just have to live with that; one can't have everything. The only way
around that would be to compile a pre-processor, which is a messy affair anyway.
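To close, here is a sketch of the final macro in use with all three commands. It assumes the wchar typedef and the final L macro above are in scope, wszMsg is a hypothetical label, and the expansion in the comment is what I get from tracing the macro by hand, so check your own listing file.

```asm
; Sketch: newline, exclamation point and terminating zero in one string.
.data
wszMsg wchar L(<Line one\nLine two\|\0>)

; traced by hand, this should assemble to the same data as:
; wszMsg WORD "L","i","n","e"," ","o","n","e",13,10
;        WORD "L","i","n","e"," ","t","w","o","!",0
```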