[Translation] Say hello to x86_64 Assembly [part 3]
2020-1-13 18:37 6411


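I have recently been studying x64 assembly and found some introductory learning material on GitHub. Because I wanted to study it closely, I translated the author's text along the way. I am not sure my translation is always appropriate, so please point out anything that looks wrong. This is my first translation, so please forgive its shortcomings.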

 

Original article by the author


 

The stack is a special region of memory that works on the LIFO principle (last in, first out).

 

We have 16 general-purpose registers for temporary data storage: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP and R8-R15. That is too few for serious applications, so we can also store data on the stack. The stack has another use as well: when we call a function, the return address is saved on the stack. When the function finishes, that address is copied into the instruction pointer (RIP) and the application continues executing from the instruction that follows the call.

 

Example:

global _start

section .text

_start:
        mov rax, 1
        call incRax
        cmp rax, 2
        jne exit
        ;;
        ;; Do something
        ;;

incRax:
        inc rax
        ret

Here rax is equal to 1 once the program starts running. Then we call the function incRax, which increases rax by 1, so rax must now be 2. After the function returns, execution continues at the instruction following the call, cmp rax, 2, where we compare rax with 2. Also, as the System V AMD64 ABI specifies, the first six integer or pointer function arguments are passed in registers. They are:

  • rdi - first argument
  • rsi - second argument
  • rdx - third argument
  • rcx - fourth argument
  • r8 - fifth argument
  • r9 - sixth argument

Any further arguments are passed on the stack. So if we have a function like this:

int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7)
{
    return (a1 + a2 - a3 - a4 + a5 - a6) * a7;
}

then the first six arguments are passed in registers, and the seventh argument is passed on the stack.
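As a rough illustration, the call below is annotated with where each argument would travel under the System V AMD64 ABI. The caller name call_foo is hypothetical, and the register comments describe the ABI rule rather than anything the C source itself can observe; an optimizing compiler may also inline the call and never materialize the arguments at all.

```c
/* foo is the function from the text above. */
int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7)
{
    return (a1 + a2 - a3 - a4 + a5 - a6) * a7;
}

/* call_foo is a hypothetical caller added only for illustration. */
int call_foo(void)
{
    return foo(1,   /* a1 -> edi (low half of rdi) */
               2,   /* a2 -> esi (low half of rsi) */
               3,   /* a3 -> edx (low half of rdx) */
               4,   /* a4 -> ecx (low half of rcx) */
               5,   /* a5 -> r8d (low half of r8)  */
               6,   /* a6 -> r9d (low half of r9)  */
               7);  /* a7 -> placed on the stack by the caller */
}
```

Compiling a file like this without optimization and reading the generated assembly (for example with gcc -O0 -S) is one way to check the register assignment on your own toolchain.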

Stack pointer

 

As I wrote above, we have 16 general-purpose registers, and two of them are particularly interesting here: RSP and RBP. RBP is the base pointer register; it points to the base of the current stack frame. RSP is the stack pointer; it points to the top of the current stack frame.

 

Commands

 

We have two instructions for working with the stack:

  • push argument - decrements the stack pointer (RSP) by 8 and stores the argument at the location the stack pointer now points to
  • pop argument - copies the data at the location pointed to by the stack pointer into the argument, then increments the stack pointer (RSP) by 8

Let's look at one simple example:

global _start

section .text

_start:
        mov rax, 1
        mov rdx, 2
        push rax
        push rdx

        mov rax, [rsp + 8]

        ;;
        ;; Do something
        ;;

Here we put 1 into the rax register and 2 into the rdx register, then push the values of these registers onto the stack. The stack works as LIFO (last in, first out), so after these pushes the stack of our application has the following structure:

 

    [rsp + 8] - 1 (pushed first, from rax)
    [rsp] - 2 (pushed last, from rdx; top of stack)

 

Then we copy the value stored at address rsp + 8. That is, we take the address of the top of the stack, add 8 to it, and copy the data at that address into rax. Afterwards rax is 1.
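The behaviour of push, pop and the [rsp + 8] read can be modelled in a few lines of C. This is only a sketch under stated assumptions: the Stack type and the stack_* function names are invented for illustration, the stack is an array of 8-byte slots that grows toward lower indices, and there is no bounds checking.

```c
#include <stdint.h>

/* Hypothetical model of a downward-growing stack; all names are illustrative. */
enum { STACK_SLOTS = 16 };

typedef struct {
    uint64_t mem[STACK_SLOTS]; /* each slot is 8 bytes wide, like a 64-bit push */
    int rsp;                   /* index of the top slot; STACK_SLOTS means empty */
} Stack;

void stack_init(Stack *s)
{
    s->rsp = STACK_SLOTS;
}

/* push: move rsp down one slot (8 bytes), then store the value there. */
void stack_push(Stack *s, uint64_t value)
{
    s->rsp -= 1;
    s->mem[s->rsp] = value;
}

/* pop: load the value at rsp, then move rsp back up one slot. */
uint64_t stack_pop(Stack *s)
{
    uint64_t value = s->mem[s->rsp];
    s->rsp += 1;
    return value;
}

/* Read [rsp + 8 * slots_above_top] without changing rsp. */
uint64_t stack_peek(const Stack *s, int slots_above_top)
{
    return s->mem[s->rsp + slots_above_top];
}
```

Pushing 1 and then 2 and calling stack_peek(&s, 1) corresponds to mov rax, [rsp + 8] in the listing: it reads the value pushed first while leaving the top of the stack untouched.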

Example

 

Let's look at a complete example. We will write a simple program that takes two command-line arguments, computes their sum, and prints the result.

section .data
        SYS_WRITE equ 1
        STD_IN    equ 1    ; file descriptor 1 is standard output, despite the name
        SYS_EXIT  equ 60
        EXIT_CODE equ 0

        NEW_LINE   db 0xa
        WRONG_ARGC db "Must be two command line argument", 0xa

First of all we define the .data section with some values. It holds four constants: the syscall numbers for sys_write and sys_exit, a file descriptor, and the exit code. We also have two strings: the first is just a newline character and the second is an error message.

Let's look at the .text section, which contains the program code:

section .text
        global _start

_start:
        pop rcx
        cmp rcx, 3
        jne argcError

        add rsp, 8
        pop rsi
        call str_to_int

        mov r10, rax
        pop rsi
        call str_to_int
        mov r11, rax

        add r10, r11

Let's try to understand what is happening here. The first instruction after the _start label takes the first value from the stack and puts it into the rcx register. When we run the application with command-line arguments, all of them are on the stack at startup, in the following order:

    [rsp] - top of stack will contain arguments count.
    [rsp + 8] - will contain argv[0]
    [rsp + 16] - will contain argv[1]
    and so on...

So we take the command-line argument count and put it into rcx. Then we compare rcx with 3, and if they are not equal we jump to the argcError label, which just prints an error message:

argcError:
    ;; sys_write syscall
    mov     rax, 1
    ;; file descriptor, standard output
    mov     rdi, 1
    ;; message address
    mov     rsi, WRONG_ARGC
    ;; length of message
    mov     rdx, 34
    ;; call write syscall
    syscall
    ;; exit from program
    jmp exit
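For readers more used to C, the same startup data arrives as argc and argv, and the check at the top of _start can be sketched as below. The helper name pick_two_args and its return convention are assumptions made for this illustration, not part of the original program.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the start of _start.
 * Returns 0 and fills first/second when exactly two arguments follow
 * the program name, or -1 otherwise (the argcError path). */
int pick_two_args(int argc, char **argv, const char **first, const char **second)
{
    if (argc != 3)        /* cmp rcx, 3 / jne argcError */
        return -1;
    /* argv[0] is the program name and is skipped (add rsp, 8). */
    *first  = argv[1];    /* first  pop rsi */
    *second = argv[2];    /* second pop rsi */
    return 0;
}
```

The count is 3 rather than 2 because the program name is counted as well.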

Why do we compare with 3 when we have two arguments? It's simple: the first argument is the program name, and everything after it is the command-line arguments we passed to the program. If we passed two command-line arguments, we continue to the add rsp, 8 instruction. It moves rsp up by 8 and thereby skips the first argument, the name of the program. Now rsp points to the first command-line argument we passed. We fetch it with pop, put it into the rsi register, and call the function that converts it to an integer; we will read the str_to_int implementation below. When the function returns, the integer value is in the rax register and we save it in r10. Then we do the same for the second argument, saving it in r11. In the end we have two integer values in r10 and r11, and we can compute their sum with the add instruction. Now we must convert the result to a string and print it. Let's see how to do it:

mov rax, r10
;; number counter
xor r12, r12
;; convert to string
jmp int_to_str

Here we put the sum of the command-line arguments into the rax register, set r12 to zero, and jump to int_to_str. Now we have the skeleton of our program: we already know how to print a string, and we have something to print. Let's look at the implementations of str_to_int and int_to_str.

str_to_int:
        xor rax, rax
        mov rcx, 10
next:
        cmp [rsi], byte 0
        je return_str
        mov bl, [rsi]
        sub bl, 48
        mul rcx
        add rax, rbx
        inc rsi
        jmp next

return_str:
        ret

At the start of str_to_int we set rax to 0 and rcx to 10, then fall through to the next label. As you can see in the listing above (the line right before the first call to str_to_int), we pop argv[1] from the stack into rsi. Now we compare the byte that rsi points to with 0, because every string ends with a NUL byte, and if it is 0 we return. If it is not 0, we copy it into the one-byte bl register and subtract 48 from it. Why 48? The digits 0 to 9 have the codes 48 to 57 in the ASCII table, so subtracting 48 from a digit character (for example from 57, the code of '9') gives the digit's numeric value. Then we multiply rax by rcx (which holds 10) and add rbx to it. After this we increment rsi to get the next byte and loop again. The algorithm is simple. For example, if rsi points to the sequence '5' '7' '6' '\000', the steps are:

    rax = 0
    get the first byte, '5', and put 5 into rbx
    rax * 10 --> rax = 0 * 10 = 0
    rax = rax + rbx = 0 + 5 = 5
    get the second byte, '7', and put 7 into rbx
    rax * 10 --> rax = 5 * 10 = 50
    rax = rax + rbx = 50 + 7 = 57
    get the third byte, '6', and put 6 into rbx
    rax = 57 * 10 + 6 = 576
    the next byte is \000, so the loop ends
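The steps above can be sketched in C as follows. Like the assembly, this version assumes the input contains only the characters '0' to '9' and performs no overflow check; the comments map each line to the instruction it mirrors.

```c
#include <stdint.h>

/* Sketch of the same loop as the assembly str_to_int. */
uint64_t str_to_int(const char *s)
{
    uint64_t result = 0;                        /* xor rax, rax */
    while (*s != '\0') {                        /* cmp [rsi], byte 0 / je return_str */
        uint64_t digit = (uint64_t)(*s - '0');  /* mov bl, [rsi] / sub bl, 48 */
        result = result * 10 + digit;           /* mul rcx / add rax, rbx */
        s++;                                    /* inc rsi */
    }
    return result;
}
```

For the string "576" the intermediate values of result are 5, 57 and 576, matching the trace above.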

After str_to_int returns, the number is in rax. Now let's look at int_to_str:

int_to_str:
        mov rdx, 0
        mov rbx, 10
        div rbx
        add rdx, 48
        add rdx, 0x0
        push rdx
        inc r12
        cmp rax, 0x0
        jne int_to_str
        jmp print

Here we put 0 into rdx and 10 into rbx, then execute div rbx. If we look back at the code just before the jump to int_to_str, we see that rax contains an integer, the sum of the two command-line arguments. The div instruction divides the value in rax by the value in rbx, leaving the remainder in rdx and the quotient in rax. Next we add 48 and 0x0 to rdx. Adding 48 gives the ASCII character for the digit; adding 0x0 changes nothing, and no terminating zero byte is needed here because sys_write takes an explicit length. After this we push the character onto the stack, increment r12 (it is 0 on the first iteration, since we set it to 0 in _start), and compare rax with 0. If rax is 0, the conversion from integer to string is finished. Step by step, for the number 23, the algorithm works as follows:

    23 / 10. rax = 2; rdx = 3
    rdx + 48 = "3"
    push "3" to stack
    compare rax with 0; it is not 0, so go again
    2 / 10. rax = 0; rdx = 2
    rdx + 48 = "2"
    push "2" to stack
    compare rax with 0; it is 0, so the function finishes and we have "2" "3" ... on the stack
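The same divide-by-ten loop can be sketched in C. This is an illustration rather than a line-for-line port: a local array stands in for the stack, count plays the role of r12, and, unlike the assembly, the result is written as an ordinary NUL-terminated string. The caller is assumed to provide a buffer of at least 21 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: writes the decimal digits of value into buf, most significant
 * digit first, appends '\0', and returns the number of digits.
 * Assumes buf has room for at least 21 bytes (20 digits plus '\0'). */
size_t int_to_str(uint64_t value, char *buf)
{
    char reversed[20];                 /* stands in for the stack */
    size_t count = 0;                  /* stands in for r12 */

    do {
        reversed[count++] = (char)('0' + value % 10); /* remainder + 48 */
        value /= 10;                                   /* quotient stays in rax */
    } while (value != 0);              /* cmp rax, 0x0 / jne int_to_str */

    /* The digit stored last is the most significant one, so read backwards. */
    for (size_t i = 0; i < count; i++)
        buf[i] = reversed[count - 1 - i];
    buf[count] = '\0';
    return count;
}
```

For 23 the loop stores '3' and then '2', and reading them back in reverse yields "23", just as the last value pushed ends up on top of the stack.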

We have implemented two useful functions, int_to_str and str_to_int, for converting an integer to a string and vice versa. Now we have the sum of two integers, converted to a string and saved on the stack. We can print the result:

print:
    ;;;; calculate number length
    mov rax, 1
    mul r12
    mov r12, 8
    mul r12
    mov rdx, rax

    ;;;; print sum
    mov rax, SYS_WRITE
    mov rdi, STD_IN
    mov rsi, rsp
    ;; call sys_write
    syscall

jmp exit

We already know how to print a string with the sys_write syscall, but there is one interesting part here: we must calculate the length of the string. If you look at int_to_str, you will see that we increment the r12 register on every iteration, so it contains the number of digits in our number. We must multiply it by 8, because every push occupies 8 bytes on the stack, and the result is the length of the data to print. After this we, as always, put 1 into rax (the sys_write number), 1 into rdi (file descriptor 1, standard output, even though the constant is named STD_IN), the length into rdx, and a pointer to the top of the stack into rsi (the start of the string). Then we finish our program:

exit:
    mov rax, SYS_EXIT
    ;; exit code
    mov rdi, EXIT_CODE
    syscall
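Returning to the length calculation in print: because each digit was pushed as a full 64-bit value, the bytes handed to sys_write are not a compact string. The sketch below models that memory layout on a little-endian machine; the function name pushed_digits and the caller-supplied buffer are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of what sys_write sees at rsp after int_to_str:
 * every pushed digit fills an 8-byte slot (the character followed by
 * seven zero bytes), with the most significant digit at the lowest address.
 * Assumes out has room for 8 bytes per digit (at most 160 for uint64_t).
 * Returns the byte count that the print code computes as r12 * 8. */
size_t pushed_digits(uint64_t value, unsigned char *out)
{
    unsigned char digits[20];
    size_t count = 0;                        /* r12 */

    do {
        digits[count++] = (unsigned char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    memset(out, 0, count * 8);
    for (size_t i = 0; i < count; i++)
        out[i * 8] = digits[count - 1 - i];  /* character is the first byte of its slot */
    return count * 8;
}
```

For 23 this produces 16 bytes: '2', seven zero bytes, '3', seven zero bytes. A terminal typically shows nothing for the zero bytes, which is why the output looks like a plain number.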

That's all.


Latest replies (1)
sunzhanwei 2020-1-13 18:42
I have run out of snow coins, and there are still four articles I cannot upload. Could an administrator help?