[Translation] Say hello to x86_64 Assembly [part 2]
Posted: 2020-1-13 18:35 9216


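I have recently been learning x64 assembly and found some beginner-level material on GitHub. I wanted to study it in detail, so I translated the author's posts along the way. I'm not sure my translation is always right, so please point out and correct anything that is wrong. This is my first translation, so please excuse any shortcomings.

Original article by the author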

Some days ago I wrote the first blog post, an introduction to x64 assembly called Say hello to x64 Assembly [part 1]. To my surprise, it attracted a lot of interest:

This motivates me even more to describe my way of learning. Over these days I got a lot of feedback from different people. There were many words of gratitude, but more important to me was the advice and fair criticism. In particular, I want to thank the following people for their great feedback:

And everyone who took part in the discussions on Reddit and Hacker News. Many people said the first part was not very clear for an absolute beginner, which is why I decided to write more informative posts. So, let's start the second part of Say hello to x86_64 assembly.

Terminology and Concepts

As I wrote above, many people told me that some parts of the first post were not clear. So let's start by describing some of the terminology we will see in this and the next parts.

Register - a register is a small amount of storage inside the processor. The main job of a processor is to process data. The processor can fetch data from memory, but that is a slow operation. That's why the processor has its own small internal set of data storage, called registers.
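
Here is a minimal sketch that shows the difference between memory and registers. It assumes 64-bit Linux with NASM and the ld linker, for example `nasm -f elf64 example.asm && ld example.o -o example`. The label value is only an illustrative name. Square brackets mean "the memory at this address". The program loads the quadword into rax, changes it inside the register and writes it back. The result becomes the program's exit status, which you can check with echo $? after running it.

```nasm
; Sketch: memory -> register -> memory (assumes 64-bit Linux, NASM, ld)
section .data
    value:  dq 40               ; illustrative quadword stored in memory

section .text
    global _start

_start:
    mov rax, [value]            ; load from memory into a register (slow part)
    add rax, 2                  ; the arithmetic happens inside the register
    mov [value], rax            ; store the result back to memory

    mov rdi, [value]            ; exit status = 42
    mov rax, 60                 ; exit syscall number on x86_64 Linux
    syscall
```

The exit status should be 42.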

 

Little-endian - we can imagine memory as one large array of bytes. Each address stores one element of this memory "array", and each element is one byte. For example, suppose we have the 4-byte value AA 56 AB FF (that is, 0xAA56ABFF, with AA as the most significant byte). In little-endian order the least significant byte has the smallest address:

    0 FF
    1 AB
    2 56
    3 AA

where 0, 1, 2 and 3 are memory addresses.

Big-endian - big-endian stores bytes in the opposite order to little-endian. So for the same bytes AA 56 AB FF the layout will be:

    0 AA
    1 56
    2 AB
    3 FF
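
The sketch below illustrates the little-endian layout above. It assumes 64-bit Linux, NASM and ld, and value is an illustrative label. The program stores the doubleword 0xAA56ABFF and reads back the single byte at its lowest address. On x86_64, which is little-endian, that byte is FF, so the exit status should be 255.

```nasm
; Sketch: which byte sits at the lowest address? (assumes 64-bit Linux, NASM, ld)
section .data
    value:  dd 0xAA56ABFF       ; 32-bit value; most significant byte is AA

section .text
    global _start

_start:
    ; zero-extend the byte at the lowest address of value into rdi
    movzx rdi, byte [value]     ; little-endian: this is FF (255)
    ; byte [value + 3] would be AA, the most significant byte
    mov rax, 60                 ; exit syscall, status = rdi
    syscall
```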

Syscall - a system call is the way a user-level program asks the operating system to do something for it. You can find the syscall table here.

Stack - the processor has only a small number of registers, so programs also use the stack. The stack is a contiguous area of memory addressed through special registers, mainly RSP (the stack pointer) together with the SS segment register. We will take a closer look at the stack in the next parts.
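
The following sketch touches on both ideas. It assumes 64-bit Linux, NASM and ld. push and pop move values through the stack, and each one adjusts RSP by 8. The result is then handed to the operating system with the exit system call. On x86_64 Linux the syscall number goes in rax and the first argument in rdi, and exit is number 60, the same as in the example later in this post.

```nasm
; Sketch: push/pop on the stack, then the exit syscall (assumes 64-bit Linux, NASM, ld)
section .text
    global _start

_start:
    push qword 10               ; rsp -= 8, [rsp] = 10
    push qword 32               ; rsp -= 8, [rsp] = 32
    pop rax                     ; last in, first out: rax = 32, rsp += 8
    pop rbx                     ; rbx = 10, rsp += 8
    add rax, rbx                ; rax = 42

    ; exit syscall: rax = syscall number, rdi = first argument
    mov rdi, rax                ; exit status 42
    mov rax, 60                 ; 60 = exit on x86_64 Linux
    syscall
```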

 

Section - every assembly program consists of sections. There are the following sections:

  • data - used for declaring initialized data or constants
  • bss - used for declaring uninitialized variables
  • text - used for code
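
A minimal program with all three sections might look like the sketch below. It assumes 64-bit Linux, NASM and ld, and greeting and buffer are illustrative names. The string in .data is stored in the executable together with its contents, while .bss only reserves space with resb. The code in .text fills that space and then prints it with the write system call (number 1).

```nasm
; Sketch: .data, .bss and .text together (assumes 64-bit Linux, NASM, ld)
section .data
    greeting:   db "Hi", 10     ; initialized data: 'H', 'i', newline

section .bss
    buffer:     resb 3          ; 3 bytes reserved, not stored in the file

section .text
    global _start

_start:
    ; copy the 3 bytes from .data into the .bss buffer
    mov ax, [greeting]          ; first 2 bytes ("Hi")
    mov [buffer], ax
    mov al, [greeting + 2]      ; third byte (newline)
    mov [buffer + 2], al

    ; write(1, buffer, 3)
    mov rax, 1                  ; write syscall
    mov rdi, 1                  ; standard output
    mov rsi, buffer
    mov rdx, 3
    syscall

    ; exit(0)
    mov rax, 60
    mov rdi, 0
    syscall
```

Running it should print Hi.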

General-purpose registers - there are 16 general-purpose registers: rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15.

Of course, this is not a full list of the terms and concepts related to assembly programming. If we meet other strange and unfamiliar words in the next blog posts, they will be explained there.

Data Types

The fundamental data types are bytes, words, doublewords, quadwords, and double quadwords. A byte is 8 bits, a word is 2 bytes (16 bits), a doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits) and a double quadword is 16 bytes (128 bits).
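
The sketch below is a quick check of these sizes. It assumes 64-bit Linux, NASM and ld, and the labels are illustrative. It declares one byte, word, doubleword and quadword with db, dw, dd and dq (described later in this post). The assembler computes the difference between two labels, so the exit status should be 1 + 2 + 4 + 8 = 15.

```nasm
; Sketch: sizes of byte, word, doubleword, quadword (assumes 64-bit Linux, NASM, ld)
section .data
    a_byte:     db 0x11                  ; 1 byte
    a_word:     dw 0x2233                ; 2 bytes
    a_dword:    dd 0x44556677            ; 4 bytes
    a_qword:    dq 0x1122334455667788    ; 8 bytes
    data_end:
    ; a difference of two labels is a constant computed at assembly time
    total_size: equ data_end - a_byte    ; 1 + 2 + 4 + 8 = 15

section .text
    global _start

_start:
    mov rdi, total_size         ; exit status 15
    mov rax, 60                 ; exit syscall
    syscall
```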

 

For now we will work only with integers, so let's look at them. There are two types of integers: unsigned and signed. Unsigned integers are unsigned binary numbers stored in a byte, word, doubleword, or quadword. Their values range from 0 to 255 for an unsigned byte integer, from 0 to 65,535 for an unsigned word integer, from 0 to 2^32 - 1 for an unsigned doubleword integer, and from 0 to 2^64 - 1 for an unsigned quadword integer. Signed integers are stored in the same sizes using two's complement representation. The most significant bit is the sign bit: it is set for negative integers and cleared for positive integers and zero. Signed values range from -128 to +127 for a byte integer, from -32,768 to +32,767 for a word integer, from -2^31 to +2^31 - 1 for a doubleword integer, and from -2^63 to +2^63 - 1 for a quadword integer.
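
The ranges above can be seen at the byte level with a short sketch. It assumes 64-bit Linux, NASM and ld. Adding 1 to 127, the largest signed byte, gives the bit pattern 0x80. Read as an unsigned byte that is 128, but read as a signed byte it is -128. The program zero-extends the byte into rdi, so the exit status should be 128.

```nasm
; Sketch: one byte pattern, two interpretations (assumes 64-bit Linux, NASM, ld)
section .text
    global _start

_start:
    mov al, 127                 ; largest signed byte (0x7F)
    add al, 1                   ; al = 0x80: 128 unsigned, -128 signed
                                ; OF is set: the signed result does not fit in a byte
    movzx rdi, al               ; zero-extend, i.e. read the byte as unsigned: 128
    mov rax, 60                 ; exit syscall, status = rdi
    syscall
```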

Sections

As I wrote above, every assembly program consists of sections: a data section, a text section and a bss section. Let's look at the data section. Its main purpose is to declare initialized data and constants. For example:

section .data
    num1:   equ 100
    num2:   equ 50
    msg:    db "Sum is correct", 10

OK, almost everything is clear here: three names, num1, num2 and msg, with the values 100, 50 and "Sum is correct", 10 (the text followed by byte 10, the newline character). But what are db and equ? They are pseudo-instructions, and NASM supports a number of them:

  • DB, DW, DD, DQ, DT, DO, DY and DZ - are used for declaring initialized data. For example:

;; Initialize 4 bytes 1h, 2h, 3h, 4h
db 0x01,0x02,0x03,0x04

;; Initialize a word to 0x1234 (stored little-endian as bytes 0x34 0x12)
dw 0x1234
  • RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ - are used for declaring uninitialized variables
  • INCBIN - includes external binary files
  • EQU - defines a constant: a name for a value, with no memory reserved. For example:
;; now one is 1
one equ 1

  • TIMES - repeats instructions or data (it will be described in the next posts)
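
Here is a sketch that puts db and equ together. It assumes 64-bit Linux, NASM and ld, and msg_len is an illustrative name. Instead of hard-coding 15, it lets the assembler compute the length of the message. In NASM, $ is the assembly position at the start of the current line, so $ - msg is the number of bytes emitted since msg. Because equ reserves no memory, the msg_len line adds nothing to the data section.

```nasm
; Sketch: let the assembler compute a string length (assumes 64-bit Linux, NASM, ld)
section .data
    msg:        db "Sum is correct", 10  ; 14 characters + newline = 15 bytes
    ; $ = position at the start of this line, so $ - msg = length of msg
    msg_len:    equ $ - msg              ; equ emits no bytes

section .text
    global _start

_start:
    mov rax, 1                  ; write syscall
    mov rdi, 1                  ; file descriptor: standard output
    mov rsi, msg                ; message address
    mov rdx, msg_len            ; length computed by the assembler (15)
    syscall

    mov rax, 60                 ; exit syscall
    mov rdi, 0                  ; exit code
    syscall
```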

Arithmetic operations

Here is a short list of arithmetic instructions:

  • ADD - integer add
  • SUB - subtract
  • MUL - unsigned multiply
  • IMUL - signed multiply
  • DIV - unsigned divide
  • IDIV - signed divide
  • INC - increment
  • DEC - decrement
  • NEG - negate (two's complement)

We will see some of them in practice in this post. The others will be covered in the next posts.
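
Here is a sketch that exercises several of these instructions. It assumes 64-bit Linux, NASM and ld. DIV and MUL use implicit registers. div rcx divides the 128-bit value rdx:rax by rcx and leaves the quotient in rax and the remainder in rdx, so rdx has to be cleared first. mul rcx stores the product of rax and rcx in rdx:rax. The final exit status should be 45.

```nasm
; Sketch: a chain of arithmetic instructions (assumes 64-bit Linux, NASM, ld)
section .text
    global _start

_start:
    mov rax, 100
    sub rax, 25                 ; rax = 75
    imul rax, rax, 2            ; signed multiply: rax = 150
    inc rax                     ; rax = 151
    dec rax                     ; rax = 150
    neg rax                     ; rax = -150 (two's complement)
    neg rax                     ; rax = 150 again

    ; unsigned divide: rdx:rax / rcx -> quotient in rax, remainder in rdx
    xor rdx, rdx                ; clear the high half of the dividend
    mov rcx, 10
    div rcx                     ; rax = 15, rdx = 0

    ; unsigned multiply: rdx:rax = rax * rcx
    mov rcx, 3
    mul rcx                     ; rax = 45, rdx = 0

    mov rdi, rax                ; exit status 45
    mov rax, 60                 ; exit syscall
    syscall
```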

Control flow

Programming languages usually have ways to change the order of evaluation (if statements, case statements, goto, etc.), and assembly has them too. Here we will see some of them. The cmp instruction compares two values, and it is used together with conditional jump instructions for decision making. For example:

;; compare rax with 50
cmp rax, 50

The cmp instruction just compares two values. It doesn't modify them, and it doesn't act on the result by itself: it subtracts the second operand from the first, discards the result and only sets the flags. To act on the result of a comparison, there are conditional jump instructions, for example:

  • JE - if equal
  • JZ - if zero
  • JNE - if not equal
  • JNZ - if not zero
  • JG - if the first operand is greater than the second (signed comparison)
  • JGE - if the first operand is greater than or equal to the second (signed comparison)
  • JA - the same as JG, but performs an unsigned comparison
  • JAE - the same as JGE, but performs an unsigned comparison
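
The difference between the signed jumps (JG, JGE) and the unsigned ones (JA, JAE) is easiest to see with a value whose top bit is set. In this sketch, which assumes 64-bit Linux, NASM and ld, the labels are illustrative. rax holds all ones, which is -1 as a signed number but 2^64 - 1 as an unsigned one, so jg is not taken while ja is. Each outcome sets one bit of the exit status, which should be 3.

```nasm
; Sketch: signed (jg) vs unsigned (ja) comparison (assumes 64-bit Linux, NASM, ld)
section .text
    global _start

_start:
    mov rdi, 0                  ; collect the results as bits of the exit status
    mov rax, -1                 ; all bits set: -1 signed, 2^64 - 1 unsigned

    cmp rax, 1
    jg .signed_done             ; signed: is -1 > 1? No, so not taken
    or rdi, 1                   ; bit 0: jg was not taken
.signed_done:
    cmp rax, 1                  ; compare again, because "or" changed the flags
    ja .unsigned_above          ; unsigned: is 2^64 - 1 > 1? Yes, so taken
    jmp .exit
.unsigned_above:
    or rdi, 2                   ; bit 1: ja was taken
.exit:
    mov rax, 60                 ; exit syscall, status = rdi (3)
    syscall
```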

For example, suppose we want to write something like this if/else statement in C:

if (rax != 50) {
    exit();
} else {
    right();
}

In assembly it looks like this:

;; compare rax with 50
cmp rax, 50
;; perform .exit if rax is not equal 50
jne .exit
jmp .right

There is also the unconditional jump, with the syntax:

JMP label

Example:

_start:
    ;; ....
    ;; do something and jump to .exit label
    ;; ....
    jmp .exit

.exit:
    mov    rax, 60
    mov    rdi, 0
    syscall

Here we can have some code after the _start label. All of this code is executed, then jmp transfers control to the .exit label, and the code after .exit: starts to execute.

Unconditional jumps are often used in loops. For example, we have a label and some code after it. The code does its work, then we check a condition, and while the condition is not yet satisfied we jump back to the start of the code. Loops will be covered in the next parts.
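
Loops are covered in the next parts, but the pattern described above can already be sketched. This assumes 64-bit Linux, NASM and ld, and .loop and .done are illustrative labels. A conditional jump leaves the loop, and an unconditional jmp at the end of the body goes back to its start. This loop sums the numbers 1 to 10, so the exit status should be 55.

```nasm
; Sketch: a loop built from cmp, a conditional jump and jmp (assumes 64-bit Linux, NASM, ld)
section .text
    global _start

_start:
    mov rax, 0                  ; running sum
    mov rcx, 1                  ; loop counter
.loop:
    cmp rcx, 10
    jg .done                    ; leave the loop once rcx > 10
    add rax, rcx                ; loop body: add the counter to the sum
    inc rcx
    jmp .loop                   ; unconditional jump back to the start
.done:
    mov rdi, rax                ; exit status 55 = 1 + 2 + ... + 10
    mov rax, 60                 ; exit syscall
    syscall
```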

Example

Let's look at a simple example. It takes two integers, computes their sum and compares it with a predefined number. If the sum equals the predefined number, it prints something on the screen. If not, it just exits. Here is the source code of our example:

section .data
    ; Define constants
    num1:   equ 100
    num2:   equ 50
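    ; initialize message: the text followed by byte 10 (newline), 15 bytes;
    ; NASM does not process \n escapes inside double quotes
    msg:    db "Sum is correct", 10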

section .text

    global _start

;; entry point
_start:
    ; set num1's value to rax
    mov rax, num1
    ; set num2's value to rbx
    mov rbx, num2
    ; get the sum of rax and rbx, and store the result in rax
    add rax, rbx
    ; compare rax and 150
    cmp rax, 150
    ; go to .exit label if rax and 150 are not equal
    jne .exit
    ; go to .rightSum label if rax and 150 are equal
    jmp .rightSum

; Print message that sum is correct
.rightSum:
    ;; write syscall
    mov     rax, 1
    ;; file descriptor, standard output
    mov     rdi, 1
    ;; message address
    mov     rsi, msg
    ;; length of message: 14 characters + newline = 15 bytes
    mov     rdx, 15
    ;; call write syscall
    syscall
    ; exit from program
    jmp .exit

; exit procedure
.exit:
    ; exit syscall
    mov    rax, 60
    ; exit code
    mov    rdi, 0
    ; call exit syscall
    syscall

Let's go through the source code. First there is the data section with two constants, num1 and num2, and the variable msg, which holds "Sum is correct" followed by a newline. Now look at the _start label, which is the program's entry point. We move the num1 and num2 values into the general-purpose registers rax and rbx and sum them with the add instruction. add computes the sum of the values in rax and rbx and stores it in rax. Now the sum of num1 and num2 is in the rax register.

OK, num1 is 100 and num2 is 50, so our sum must be 150. Let's check it with the cmp instruction. After comparing rax and 150, we check the result of the comparison. If rax and 150 are not equal (checked with jne), we go to the .exit label. If they are equal, we go to the .rightSum label.

Now we have two labels: .exit and .rightSum. The first just sets rax to 60, the exit system call number, and rdi to 0, the exit code. The second, .rightSum, is pretty easy: it prints Sum is correct and then jumps to .exit.



Last edited by sunzhanwei on 2020-1-13 18:43
Latest replies (2)

I've run out of snow coins (雪币), so I can't upload the remaining four articles. Could an admin help?
2020-1-13 18:42

Help!
2023-1-28 16:12