Some time ago i started to write series of blog posts about assembly programming for x86_64. You can find it by asm tag. Unfortunately i was busy last time and there were not new post, so today I continue to write posts about assembly, and will try to do it every week.
今天我们要看字符串和一些字符串操作。我们仍然使用NASM汇编和Linux X86_64。
Today we will look at strings and some strings operations. We still use nasm assembler, and linux x86_64.
Of course when we talk about assembly programming language we can’t talk about string data type, actually we’re dealing with array of bytes. Let’s try to write simple example, we will define string data and try to reverse and write result to stdout. This tasks seems pretty simple and popular when we start to learn new programming language. Let’s look on implementation.
First of all, I define initialized data. It will be placed in data section (You can read about sections in part):
这里我们可以看到四个常数:
你可以在这里找到系统调用列表。也有定义:
接下来,我们为缓冲区定义bss部分,在这里我们将放置反向字符串:
Here we can see four constants:
syscall list you can find - here. Also there defined:
Next we define bss section for our buffer, where we will put reversed string:
好的,我们有一些数据和缓冲区用来存放结果,现在我们可以定义代码的文本段。让我们从主启动程序开始:
Ok we have some data and buffer where to put result, now we can define text section for code. Let’s start from main _start routine:
Here are some new things. Let’s see how it works: First of all we put INPUT address to si register at line 2, as we did for writing to stdout and write zeros to rcx register, it will be counter for calculating length of our string. At line 4 we can see cld operator. It resets df flag to zero. We need in it because when we will calculate length of string, we will go through symbols of this string, and if df flag will be 0, we will handle symbols of string from left to right. Next we call calculateStrLength function. I missed line 5 with mov rdi, $ + 15 instruction, i will tell about it little later. And now let’s look at calculateStrLength implementation:
As you can understand by it’s name, it just calculates length of INPUT string and store result in rcx register. First of all we check that rsi register doesn’t point to zero, if so this is the end of string and we can exit from function. Next is lodsb instruction. It’s simple, it just put 1 byte to al register (low part of 16 bit ax) and changes rsi pointer. As we executed cld instruction, lodsb everytime will move rsi to one byte from left to right, so we will move by string symbols. After it we push rax value to stack, now it contains symbol from our string (lodsb puts byte from si to al, al is low 8 bit of rax). Why we did push symbol to stack? You must remember how stack works, it works by principle LIFO (last input, first output). It is very good for us. We will take first symbol from si, push it to stack, than second and so on. So there will be last symbol of string at the stack top. Than we just pop symbol by symbol from stack and write to OUTPUT buffer. After it we increment our counter (rcx) and loop again to the start of routine.
Ok, we pushed all symbols from string to stack, now we can jump to exitFromRoutine return to _start there. How to do it? We have ret instruction for this. But if code will be like this:
It will not work. Why? It is tricky. Remember we called calculateStrLength at _start. What occurs when we call a function? First of all function’s parameters pushes to stack from right to left. After it return address pushes to stack. So function will know where to return after end of execution. But look at calculateStrLength, we pushed symbols from our string to stack and now there is no return address of stack top and function doesn’t know where to return. How to be with it. Now we must take a look to the weird instruction before call:
So we have position of mov rdi, $ + 15, but why we add 15 here? Look, we need to know position of next line after calculateStrLength. Let’s open our file with objdump util:
We can see here that line 12 (our mov rdi, $ + 15) takes 10 bytes and function call at line 16 - 5 bytes, so it takes 15 bytes. That’s why our return address will be mov rdi, $ + 15. Now we can push return address from rdi to stack and return from function:
Now we return to start. After call of the calculateStrLength we write zeros to rax and rdi and jump to reverseStr label. It’s implementation is following:
Here we check our counter which is length of string and if it is zero we wrote all symbols to buffer and can print it. After checking counter we pop from stack to rax register first symbol and write it to OUTPUT buffer. We add rdi because in other way we’ll write symbol to first byte of buffer. After this we increase rdi for moving next by OUTPUT buffer, decrease length counter and jump to the start of label.
After execution of reverseStr we have reversed string in OUTPUT buffer and can write result to stdout with new line:
从我们的程序退出:
and exit from the our program:
就这些,现在我们可以编译我们的程序:
That’s all, now we can compile our program with:
并运行它:
and run it:
当然,还有许多其他的字符串/字节操作说明:
Of course there are many other instructions for string/bytes manipulations: