[Translation] Say hello to x86_64 Assembly [part 4]
Posted: 2020-1-13 18:38

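I have recently been learning x64 assembly and found some introductory study material on GitHub. Since I wanted to study it carefully, I translated the author's articles along the way. I am not sure whether my translation is always appropriate, so please point out any problems you find. This is my first translation, so please forgive its shortcomings.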

Author's original article

Some time ago I started writing a series of blog posts about assembly programming for x86_64. You can find them under the asm tag. Unfortunately I have been busy lately and there have been no new posts, so today I am continuing the series and will try to post every week.

Today we will look at strings and some string operations. We still use the NASM assembler and Linux x86_64.

Reverse string

Of course, when we talk about assembly language we cannot really talk about a string data type; we are actually dealing with arrays of bytes. Let's write a simple example: we will define some string data, reverse it, and write the result to stdout. Tasks like this are simple and popular when we start learning a new programming language. Let's look at the implementation.

First of all, I define the initialized data. It is placed in the data section (you can read about sections in an earlier part of this series):
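The listing for this step is not present in the post. A minimal sketch of what such a data section could look like is given here; the constant names and the "Hello world!" text are assumptions chosen for illustration, and the trailing 0 byte is added so that the length loop described later has a terminator to find:

```nasm
section .data
    ; All names below are illustrative assumptions.
    SYS_WRITE equ 1      ; Linux x86_64 syscall number for write
    STD_OUT   equ 1      ; file descriptor of stdout
    SYS_EXIT  equ 60     ; Linux x86_64 syscall number for exit
    EXIT_CODE equ 0      ; exit status returned to the shell

    NEW_LINE  db 0xa                 ; the newline character
    INPUT     db "Hello world!", 0   ; string to reverse, zero-terminated
```

The equ lines only give names to numbers and occupy no memory; the db lines place actual bytes in the data section.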

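Here we can see four constants: the write syscall number, the stdout file descriptor, the exit syscall number, and the exit code.

You can find the list of syscalls here. Two data items are also defined there: the newline character and INPUT, the string that we will reverse.

Next we define the bss section for our buffer, where we will put the reversed string: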
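A sketch of such a bss section, assuming a 12-character input (the size is an assumption that matches the "Hello world!" example text, not a value taken from the post):

```nasm
section .bss
    OUTPUT resb 12    ; reserve 12 uninitialized bytes, one per input character
```

resb only reserves space; the bytes get their values at run time when the reversed characters are stored.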

OK, we have some data and a buffer for the result, so now we can define the text section for the code. Let's start with the main _start routine:
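The routine itself is missing from the post. The following is a sketch that follows the description in the next paragraph; it is a fragment of one program and relies on the INPUT label and on the calculateStrLength and reverseStr routines being defined elsewhere in the same file:

```nasm
section .text
    global _start

_start:
    mov rsi, INPUT              ; rsi = address of the input string
    xor rcx, rcx                ; rcx = 0, will count the characters
    cld                         ; DF = 0: string instructions move rsi forward
    mov rdi, $ + 15             ; rdi = address of the instruction after the call
    call calculateStrLength
    xor rax, rax                ; execution resumes here after the routine
    xor rdi, rdi                ; rdi = 0, later used as the index into OUTPUT
    jmp reverseStr
```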

There are some new things here. Let's see how it works. First we put the address of INPUT into the rsi register, as we did when writing to stdout, and write zero to the rcx register, which will be the counter for the length of our string. Next comes the cld instruction, which resets the direction flag (DF) to zero. We need this because, to calculate the length of the string, we walk through its characters, and when DF is 0 the string instructions process them from left to right. Then we call the calculateStrLength function. I skipped the mov rdi, $ + 15 instruction that precedes the call; I will explain it a little later. Now let's look at the implementation of calculateStrLength:
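A sketch of the routine, written to match the explanation that follows; it assumes that rsi points at a zero-terminated string, that rcx starts at 0, and that exitFromRoutine is defined elsewhere in the same file:

```nasm
calculateStrLength:
    cmp byte [rsi], 0       ; is the current byte the terminating zero?
    je exitFromRoutine      ; yes: every character has been pushed
    lodsb                   ; al = [rsi], then rsi = rsi + 1 (because DF = 0)
    push rax                ; save the character (low byte of rax) on the stack
    inc rcx                 ; one more character counted
    jmp calculateStrLength  ; continue with the next byte
```

Each push stores a full 8-byte rax, so every character occupies one 8-byte stack slot; only the low byte of each slot matters.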

As its name suggests, it just calculates the length of the INPUT string and stores the result in the rcx register. First we check whether the byte that rsi points to is zero (this assumes the string is followed by a zero byte); if so, we have reached the end of the string and can exit from the function. Next comes the lodsb instruction. It is simple: it loads one byte from the address in rsi into the al register (the low 8 bits of rax) and advances rsi. Because we executed cld, every lodsb moves rsi one byte forward, from left to right, so we walk through the characters of the string. After that we push rax onto the stack; its low byte now holds a character from our string. Why push the character onto the stack? Remember how the stack works: it follows the LIFO (last in, first out) principle, which suits us well. We take the first character from rsi, push it onto the stack, then the second, and so on, so the last character of the string ends up at the top of the stack. Later we simply pop character by character from the stack and write each one to the OUTPUT buffer. After the push we increment our counter (rcx) and jump back to the start of the routine.

OK, we have pushed all the characters of the string onto the stack, and now we can jump to exitFromRoutine and return to _start from there. How do we do that? We have the ret instruction for this. But suppose the code looks like this:
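The naive version referred to here is not shown in the post; presumably it is just a bare ret, sketched below as a fragment:

```nasm
exitFromRoutine:
    ret     ; pops the top of the stack into rip, but the top of the stack
            ; now holds a pushed character, not the return address
```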

It will not work. Why? It is tricky. Remember that we called calculateStrLength from _start. What happens when we call a function? The call instruction pushes the return address onto the stack (after any arguments that are passed on the stack), so the function knows where to return when it finishes. But look at calculateStrLength: we pushed the characters of our string onto the stack, so the return address is no longer at the top of the stack and ret would not know where to return. What can we do about it? Now we must take a look at the weird instruction before the call:
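For reference, this is the pair of instructions in question, shown as a fragment of the _start routine described earlier:

```nasm
    mov rdi, $ + 15            ; remember where execution should continue
    call calculateStrLength    ; the routine later uses rdi to find its way back
```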

First of all, in NASM the $ token evaluates to the address of the beginning of the line that contains it.

So $ gives us the position of the mov rdi, $ + 15 instruction itself, but why do we add 15 here? We need to know the address of the instruction that follows the call to calculateStrLength. Let's open our file with the objdump utility:
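The objdump listing is not reproduced in the post. As a substitute, the instruction sizes that matter can be annotated directly on the code; this sketch assumes NASM emits the 10-byte encoding with a 64-bit immediate for the mov, which is what the byte counts quoted in the text imply:

```nasm
    mov rdi, $ + 15          ; starts at $,      10 bytes: REX.W, opcode, 8-byte immediate
    call calculateStrLength  ; starts at $ + 10,  5 bytes: opcode E8, 4-byte relative offset
    xor rax, rax             ; starts at $ + 15: this is the address stored in rdi
```

The constant 15 is tied to these exact encodings, so inserting or changing an instruction between the mov and the end of the call would require recomputing it.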

We can see that the mov rdi, $ + 15 instruction takes 10 bytes and the function call takes 5 bytes, so together they take 15 bytes. That is why the return address is $ + 15, the address of the instruction that follows the call. Now we can push the return address from rdi onto the stack and return from the function:
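A sketch of the corrected exit path, assuming rdi still holds the address computed before the call:

```nasm
exitFromRoutine:
    push rdi    ; put the saved return address on top of the pushed characters
    ret         ; pop it into rip and continue right after the call
```

The address that call pushed originally stays on the stack underneath the characters and is simply never used.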

Now we return to _start. After the call to calculateStrLength we write zeros to rax and rdi and jump to the reverseStr label. Its implementation is as follows:
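A sketch of the loop, matching the description in the next paragraph. It assumes rcx holds the character count, rdi starts at 0, and OUTPUT and printResult are defined elsewhere in the same file. Storing only al (one byte) rather than the whole of rax is a choice made in this sketch so that the last stores do not write past the end of the buffer:

```nasm
reverseStr:
    cmp rcx, 0              ; any characters left on the stack?
    je printResult          ; no: OUTPUT is complete
    pop rax                 ; al = most recently pushed character
    mov [OUTPUT + rdi], al  ; store one byte at the current position
    dec rcx                 ; one character fewer to process
    inc rdi                 ; advance to the next position in OUTPUT
    jmp reverseStr
```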

Here we check our counter, which holds the length of the string. If it is zero, we have written all the characters to the buffer and can print it. Otherwise we pop the next character from the stack into the rax register and write it to the OUTPUT buffer. We add rdi to the buffer address because otherwise every character would be written to the first byte of the buffer. After that we increment rdi to move to the next position in the OUTPUT buffer, decrement the length counter, and jump back to the start of the label.

After reverseStr has executed, the reversed string is in the OUTPUT buffer and we can write the result to stdout, followed by a new line:
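A sketch of the two write calls. The label names and the SYS_WRITE, STD_OUT, and NEW_LINE symbols are assumptions (both constants equal 1 on Linux x86_64), and rdi is assumed to hold the number of bytes stored in OUTPUT:

```nasm
printResult:
    mov rdx, rdi            ; length = number of bytes written to OUTPUT
    mov rax, SYS_WRITE      ; syscall number for write
    mov rdi, STD_OUT        ; first argument: file descriptor
    mov rsi, OUTPUT         ; second argument: buffer address
    syscall
    jmp printNewLine

printNewLine:
    mov rax, SYS_WRITE
    mov rdi, STD_OUT
    mov rsi, NEW_LINE       ; address of the single newline byte
    mov rdx, 1              ; third argument: length of 1
    syscall
    jmp exit
```

Note that rdx is loaded from rdi before rdi is overwritten with the file descriptor; swapping those two lines would lose the length.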

and exit from our program:
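A sketch of the exit call; SYS_EXIT (60 on Linux x86_64) and EXIT_CODE (0) are assumed names for constants defined in the data section:

```nasm
exit:
    mov rax, SYS_EXIT       ; syscall number for exit
    mov rdi, EXIT_CODE      ; first argument: exit status
    syscall                 ; does not return
```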

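That's all. Now we can assemble and link our program, for example with nasm -f elf64 -o reverse.o reverse.asm followed by ld -o reverse reverse.o (the file name reverse.asm is an assumption here), and then run the resulting binary.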

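Of course, there are many other instructions for string and byte manipulation, for example movsb (copy), cmpsb (compare), scasb (scan), and stosb (store), which can be combined with the rep prefix.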

Latest replies (1)
I have run out of forum coins and still have four articles that I cannot upload. Could an administrator help?
2020-1-13 18:41