A few days ago I wrote my first blog post, an introduction to x64 assembly called Say hello to x64 Assembly [part 1], which to my surprise attracted a lot of interest.
That motivates me even more to describe how I am learning. Over these days I received a lot of feedback from different people. There were many kind words, but more importantly for me, there was a lot of advice and fair criticism. I especially want to thank those who sent detailed feedback, and everyone who took part in the discussions on Reddit and Hacker News. Many people felt that the first part was not very clear for an absolute beginner, which is why I decided to write more informative posts. So, let's start with the second part of Say hello to x86_64 assembly.
As I wrote above, many readers told me that some parts of the first post were unclear, so let's start with a description of some terminology that we will see in this and the following parts.
Register - a register is a small amount of storage inside the processor. The main job of a processor is data processing. The processor can fetch data from memory, but that is a slow operation. That is why the processor has its own small internal set of storage locations, which are called registers.
Little-endian - we can imagine memory as one large array of bytes. Each address stores one element of the memory "array", and each element is one byte. For example, suppose we have 4 bytes: AA 56 AB FF. In little-endian order the least significant byte has the smallest address:
0 FF
1 AB
2 56
3 AA
where 0, 1, 2 and 3 are memory addresses.
Big-endian - big-endian stores bytes in the opposite order to little-endian. So if we have the byte sequence AA 56 AB FF, it will be:
0 AA
1 56
2 AB
3 FF
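To make the byte order concrete, here is a minimal sketch for Linux x86_64, which is little-endian. It stores the doubleword 0xAA56ABFF and reads back only the byte at the lowest address. The label name value is made up for this illustration.

```nasm
;; Sketch: observe little-endian byte order (assumes Linux x86_64, NASM).
;; `value` is a hypothetical label used only in this example.
section .data
    value: dd 0xAA56ABFF          ; laid out in memory as FF AB 56 AA

section .text
    global _start
_start:
    ;; read the single byte stored at the lowest address of `value`
    movzx rdi, byte [value]       ; rdi = 0xFF, the least significant byte
    ;; exit(rdi): the exit status carries the byte we just read
    mov rax, 60
    syscall
```

Assuming the file is assembled with nasm -f elf64 and linked with ld, running the program and then echo $? in the shell should show 255, which is 0xFF.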
Syscall - a system call is the way a user-level program asks the operating system to do something for it. The Linux x86_64 syscall table lists each call together with its number.
Stack - the processor has a very limited number of registers, so the stack is a contiguous area of memory that is addressed through special registers such as RSP and SS. We will take a closer look at the stack in the next parts.
Section - every assembly program consists of sections. There are the following sections:
data - section is used for declaring initialized data or constants
bss - section is used for declaring uninitialized variables
text - section is used for code
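As a rough illustration of how the data, bss and text sections sit together in one source file, here is a minimal skeleton. The names greeting and buffer are invented for this sketch, and the program does nothing except exit.

```nasm
;; Sketch of a program skeleton with all three sections (NASM, Linux x86_64).
;; `greeting` and `buffer` are hypothetical names.
section .data
    greeting: db "hi", 10         ; initialized bytes: 'h', 'i', newline

section .bss
    buffer: resb 64               ; 64 reserved bytes with no initial value

section .text
    global _start
_start:
    mov rax, 60                   ; exit syscall number
    mov rdi, 0                    ; exit code 0
    syscall
```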
General-purpose registers - there are 16 general-purpose registers: rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15.
Of course, this is not a full list of the terms and concepts related to assembly programming. If we meet other strange or unfamiliar words in the next blog posts, they will be explained there.
The fundamental data types are bytes, words, doublewords, quadwords, and double quadwords. A byte is eight bits, a word is 2 bytes, a doubleword is 4 bytes, a quadword is 8 bytes and a double quadword is 16 bytes (128 bits).
For now we will work only with integer numbers, so let's look at them. There are two types of integer: unsigned and signed. Unsigned integers are unsigned binary numbers contained in a byte, word, doubleword, or quadword. Their values range from 0 to 255 for an unsigned byte integer, from 0 to 65,535 for an unsigned word integer, from 0 to 2^32 – 1 for an unsigned doubleword integer, and from 0 to 2^64 – 1 for an unsigned quadword integer. Signed integers are signed binary numbers held in a byte, word, doubleword, or quadword. The sign bit is set for negative integers and cleared for positive integers and zero. Integer values range from –128 to +127 for a byte integer, from –32,768 to +32,767 for a word integer, from –2^31 to +2^31 – 1 for a doubleword integer, and from –2^63 to +2^63 – 1 for a quadword integer.
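The same bit pattern can be read as either a signed or an unsigned value. The sketch below is an illustration added here, not part of the original example. It takes the byte 0xFF and widens it in two ways: zero-extension gives the unsigned reading and sign-extension gives the signed reading.

```nasm
;; Sketch: one byte, two interpretations (NASM, Linux x86_64).
section .text
    global _start
_start:
    mov al, 0xFF                  ; bit pattern 1111 1111
    movzx rbx, al                 ; zero-extend: rbx = 255 (unsigned reading)
    movsx rcx, al                 ; sign-extend: rcx = -1  (signed reading)
    ;; exit with status 0
    mov rax, 60
    mov rdi, 0
    syscall
```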
As I wrote above, every assembly program consists of sections: a data section, a text section and a bss section. Let's look at the data section. Its main purpose is to declare initialized data and constants. For example:
section .data
num1: equ 100
num2: equ 50
msg: db "Sum is correct", 10
Almost everything is clear here: there are 3 constants named num1, num2 and msg, with the values 100, 50 and "Sum is correct", 10. But what are db and equ? NASM supports a number of pseudo-instructions:
DB, DW, DD, DQ, DT, DO, DY and DZ - are used for declaring initialized data. For example:
;; Initialize 4 bytes 1h, 2h, 3h, 4h
db 0x01,0x02,0x03,0x04
;; Initialize a word with 0x1234 (stored in memory as 0x34 0x12)
dw 0x1234
RESB, RESW, RESD, RESQ, REST, RESO, RESY and RESZ - are used for declaring uninitialized variables
INCBIN - includes an external binary file
EQU - defines a constant. For example:
;; now one is 1
one equ 1
TIMES - repeats instructions or data (it will be described in later posts)
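To show a few of these pseudo-instructions side by side, here is a small fragment containing data declarations only, with no code. The names msg_len, zeros and scratch are made up for this illustration. The expression $ - msg relies on NASM's $ token, which stands for the current assembly position, so the assembler computes the length of the message.

```nasm
;; Sketch: DB, EQU, TIMES and RESQ together (a declarations-only fragment).
;; `msg_len`, `zeros` and `scratch` are hypothetical names.
section .data
    msg:     db "Sum is correct", 10   ; 14 characters plus a newline byte
    msg_len: equ $ - msg               ; the assembler computes 15 here
    zeros:   times 8 db 0              ; eight zero bytes

section .bss
    scratch: resq 4                    ; room for four quadwords (32 bytes)
```

Letting the assembler compute the length avoids having to update a hard-coded number whenever the message text changes.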
Arithmetic operations
Here is a short list of arithmetic instructions:
ADD - integer add
SUB - subtract
MUL - unsigned multiply
IMUL - signed multiply
DIV - unsigned divide
IDIV - signed divide
INC - increment
DEC - decrement
NEG - negate
We will see some of them in practice in this post. The others will be covered in later posts.
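As a quick illustration of several of these instructions, the sketch below runs a short chain of operations on rax and uses the result as the exit status. The starting values are arbitrary.

```nasm
;; Sketch: a chain of arithmetic instructions (NASM, Linux x86_64).
section .text
    global _start
_start:
    mov rax, 7
    add rax, 5                    ; rax = 12
    sub rax, 2                    ; rax = 10
    inc rax                       ; rax = 11
    dec rax                       ; rax = 10
    imul rax, rax, 3              ; rax = 30 (signed multiply, three-operand form)
    neg rax                       ; rax = -30
    neg rax                       ; rax = 30 again
    mov rdi, rax                  ; use the result as the exit status
    mov rax, 60                   ; exit syscall number
    syscall
```

After running the program, echo $? in the shell should print 30.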
Programming languages usually have ways to change the order of execution (the if statement, the case statement, goto and so on), and assembly has them too. Here we will see some of them. The cmp instruction compares two values. It is used together with a conditional jump instruction for decision making.
The cmp instruction only compares two values: it does not change them and does not execute anything based on the result of the comparison. To act on the result there are conditional jump instructions. A conditional jump can be one of:
JE - if equal
JZ - if zero
JNE - if not equal
JNZ - if not zero
JG - if the first operand is greater than the second (signed comparison)
JGE - if the first operand is greater than or equal to the second (signed comparison)
JA - the same as JG, but performs an unsigned comparison
JAE - the same as JGE, but performs an unsigned comparison
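The difference between the signed and unsigned jumps is easiest to see with a value whose two readings disagree. In this sketch, which is an illustration with arbitrary label names, rax holds -1. That is less than 1 as a signed number, but as an unsigned number it is 2^64 - 1.

```nasm
;; Sketch: signed (jg) versus unsigned (ja) jumps after the same cmp.
;; `.signed_done` and `.exit` are hypothetical label names.
section .text
    global _start
_start:
    mov rax, -1                   ; all bits set: -1 signed, 2^64 - 1 unsigned
    mov rdi, 0                    ; the exit status starts at 0
    cmp rax, 1
    jg  .signed_done              ; not taken: -1 > 1 is false for signed values
    add rdi, 1                    ; reached, so rdi = 1
.signed_done:
    cmp rax, 1
    ja  .exit                     ; taken: 2^64 - 1 > 1 for unsigned values
    add rdi, 2                    ; skipped
.exit:
    mov rax, 60                   ; exit syscall number
    syscall                       ; the exit status is 1
```

The cmp instruction is identical in both cases. Only the jump that follows decides whether the flags are read as a signed or an unsigned result.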
For example, if we want to write something like this if/else statement in C:
if (rax != 50) {
exit();
} else {
right();
}
in assembly it will be:
;; compare rax with 50
cmp rax, 50
;; perform .exit if rax is not equal 50
jne .exit
jmp .right
There is also an unconditional jump, with the syntax:
JMP label
Example:
_start:
;; ....
;; do something and jump to .exit label
;; ....
jmp .exit
.exit:
mov rax, 60
mov rdi, 0
syscall
Here we can have some code after the _start label, and all of that code will be executed. Then the jmp instruction transfers control to the .exit label, and the code after .exit: starts to execute.
Unconditional jumps are often used in loops. For example, we have a label with some code after it. This code does some work, then checks a condition and jumps back to the start of the code if the condition is not yet met. Loops will be covered in the next parts.
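As a small preview, here is a minimal sketch of that loop shape: a conditional jump leaves the loop and an unconditional jmp goes back to its start. The label names and the choice to sum the numbers 5 down to 1 are only for illustration.

```nasm
;; Sketch: a countdown loop built from cmp, je and jmp (NASM, Linux x86_64).
;; `.loop` and `.done` are hypothetical label names.
section .text
    global _start
_start:
    mov rcx, 5                    ; loop counter
    mov rdi, 0                    ; running total
.loop:
    add rdi, rcx                  ; total += counter
    dec rcx                       ; counter -= 1
    cmp rcx, 0
    je  .done                     ; leave the loop when the counter reaches zero
    jmp .loop                     ; otherwise jump back unconditionally
.done:
    mov rax, 60                   ; exit(total): 5 + 4 + 3 + 2 + 1 = 15
    syscall
```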
Let's look at a simple example. It takes two integer numbers, computes their sum and compares it with a predefined number. If the predefined number is equal to the sum, it prints something on the screen; if not, it just exits. Here is the source code of our example:
section .data
; Define constants
num1: equ 100
num2: equ 50
; initialize message
msg: db "Sum is correct", 10 ; 10 is the newline byte; NASM does not expand \n inside double quotes
section .text
global _start
;; entry point
_start:
; set num1's value to rax
mov rax, num1
; set num2's value to rbx
mov rbx, num2
; get sum of rax and rbx, and store its value in rax
add rax, rbx
; compare rax and 150
cmp rax, 150
; go to .exit label if rax and 150 are not equal
jne .exit
; go to .rightSum label if rax and 150 are equal
jmp .rightSum
; Print message that sum is correct
.rightSum:
;; write syscall
mov rax, 1
;; file descriptor, standard output
mov rdi, 1
;; message address
mov rsi, msg
;; length of message
mov rdx, 15
;; call write syscall
syscall
; exit from program
jmp .exit
; exit procedure
.exit:
; exit syscall
mov rax, 60
; exit code
mov rdi, 0
; call exit syscall
syscall
Let's go through the source code. First of all there is the data section with two constants, num1 and num2, and the variable msg, which holds the text "Sum is correct" followed by a newline. Now look at the _start label, which is the program's entry point. We move the values of num1 and num2 into the general-purpose registers rax and rbx and sum them with the add instruction. The add instruction calculates the sum of the values in rax and rbx and stores the result in rax. Now we have the sum of num1 and num2 in the rax register.
So num1 is 100 and num2 is 50, which means our sum must be 150. Let's check it with the cmp instruction. After comparing rax with 150 we check the result of the comparison: if rax and 150 are not equal (we check this with jne) we go to the .exit label, and if they are equal we go to the .rightSum label.
Now we have two labels: .exit and .rightSum. The first, .exit, just puts 60 into rax, which is the exit system call number, and 0 into rdi, which is the exit code. The second, .rightSum, is pretty simple: it just prints Sum is correct.