[Translation] Introduction to ARM Assembly (1): ARM Data Types and Registers
2018-6-7 14:58 10376

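I came across a tutorial on Kanxue (看雪) shared by ljcnaix; the thread is here: https://bbs.pediy.com/thread-220461.htm
I think this tutorial is excellent and would like to continue ljcnaix's work by carrying the translation forward. ljcnaix has already translated the first chapter, so I am starting from the second. I keep the original English text next to my translation so that readers can compare and check the two.
I am a beginner myself, so there will be many inaccuracies. Please report any mistakes to me and I will correct them promptly.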

DATA TYPES


This is part two of the ARM Assembly Basics tutorial series, covering data types and registers.

Similar to high-level languages, ARM supports operations on different data types. The data types we can load (or store) are signed and unsigned words (32 bits), halfwords (16 bits), and bytes (8 bits). The instruction suffixes for these data types are -h or -sh for halfwords, -b or -sb for bytes, and no suffix for words. The difference between signed and unsigned data types is:

Signed data types can hold both positive and negative values, so their largest positive value is smaller: half of the bit patterns are spent on negative numbers.

Unsigned data types can hold larger positive values (including zero) but cannot hold negative values, so their positive range is wider.
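To make the trade-off concrete, the ranges for each width can be computed directly. This is a small Python sketch rather than ARM code; the helper names unsigned_range and signed_range are made up for illustration, and it assumes the two's-complement representation that ARM uses.

```python
def unsigned_range(bits):
    """Smallest and largest value an unsigned type of this width can hold."""
    return 0, (1 << bits) - 1

def signed_range(bits):
    """Smallest and largest value a two's-complement type of this width can hold."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

for name, bits in (("byte", 8), ("halfword", 16), ("word", 32)):
    print(name, "unsigned:", unsigned_range(bits), "signed:", signed_range(bits))
```

For a byte this gives 0 to 255 unsigned and -128 to 127 signed: the same 256 bit patterns, split differently.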

Here are some examples of how these data types are used with the load and store instructions:

ldr = Load Word
ldrh = Load unsigned Half Word
ldrsh = Load signed Half Word
ldrb = Load unsigned Byte
ldrsb = Load signed Byte

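str = Store Word
strh = Store Half Word
strb = Store Byte

Stores have no signed variants: ARM defines no strsh or strsb instruction. A store writes the low 16 or 8 bits of the register to memory unchanged, so signedness only matters when a narrow value is loaded and has to be extended to 32 bits.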
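What the unsigned and signed loads do to a narrow value can be modelled on any machine. The following is a minimal Python sketch, not ARM code: the helper names zero_extend, sign_extend and as_signed32 are invented for illustration, and the byte 0xF0 is an arbitrary example value.

```python
def zero_extend(value, bits):
    """Model ldrb/ldrh: keep the low `bits` bits, upper register bits become 0."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Model ldrsb/ldrsh: copy the top bit of the narrow value into the upper bits."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & 0xFFFFFFFF

def as_signed32(word):
    """Interpret a 32-bit register value as a signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word

byte_in_memory = 0xF0                  # arbitrary example byte
r0 = zero_extend(byte_in_memory, 8)    # what ldrb would leave in the register
r1 = sign_extend(byte_in_memory, 8)    # what ldrsb would leave in the register
print(hex(r0), as_signed32(r0))        # 0xf0 240
print(hex(r1), as_signed32(r1))        # 0xfffffff0 -16
```

The same byte in memory reads back as 240 or as -16 depending only on which load instruction is chosen.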

ENDIANNESS


There are two basic ways of ordering the bytes of an object in memory: little-endian (LE) and big-endian (BE). The difference is the order in which the bytes of a multi-byte object are stored. On little-endian machines such as Intel x86, the least significant byte is stored at the lowest address (the address closest to zero). On big-endian machines, the most significant byte is stored at the lowest address. The ARM architecture was little-endian before version 3; since then it has been bi-endian, which means it has a setting that switches the endianness. On ARMv6, for example, instructions are always little-endian, while data accesses can be either little-endian or big-endian, as controlled by bit 9 (the E bit) of the Current Program Status Register (CPSR).
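The two byte orders are easy to see by laying out one 32-bit value both ways. This sketch uses Python's standard struct module on the host machine, so it illustrates the layouts only and says nothing about how a particular ARM core is configured; 0x11223344 is an arbitrary example value.

```python
import struct

value = 0x11223344

little = struct.pack("<I", value)  # little-endian: least significant byte first
big = struct.pack(">I", value)     # big-endian: most significant byte first

print(little.hex())  # 44332211
print(big.hex())     # 11223344

# Reading little-endian bytes with the wrong byte order scrambles the value.
misread = struct.unpack(">I", little)[0]
print(hex(misread))  # 0x44332211
```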


ARM REGISTERS


The number of registers depends on the ARM version. According to the ARM Reference Manual, there are 30 general-purpose 32-bit registers, with the exception of ARMv6-M and ARMv7-M based processors. The first 16 registers are accessible in user-level mode; the additional registers are available in privileged software execution (again with the exception of ARMv6-M and ARMv7-M). In this tutorial series we will work with the registers that are accessible in any privilege mode: r0-r15. These 16 registers can be split into two groups: general-purpose and special-purpose registers.



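(r0 through r11 are general-purpose registers)


(Special-purpose registers: R12 is IP, the intra-procedure-call register; R13 is SP, the stack pointer; R14 is LR, the link register; R15 is PC, the program counter; CPSR is the Current Program Status Register)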


The following table gives a quick glimpse of how the ARM registers relate to those in Intel processors.




R0-R12: can be used during common operations to store temporary values, pointers (memory locations), etc. R0, for example, can be referred to as the accumulator during arithmetic operations, and it also holds the result of a previously called function. R7 becomes useful when working with syscalls, as it stores the syscall number, and R11 serves as the frame pointer, helping us keep track of boundaries on the stack (this will be covered later). Moreover, the function calling convention on ARM specifies that the first four arguments of a function are passed in the registers r0-r3.


R13: SP (Stack Pointer). The stack pointer points to the top of the stack. The stack is an area of memory used for function-specific storage, which is reclaimed when the function returns. The stack pointer is therefore used to allocate space on the stack by subtracting the number of bytes we want to allocate from it. In other words, to allocate room for a 32-bit value, we subtract 4 from the stack pointer.
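The allocate-by-subtracting idea can be sketched as a toy model. This is Python, not ARM code: the dictionary stands in for memory, the names push_word and pop_word are invented for illustration, and the starting stack pointer value is hypothetical.

```python
memory = {}  # address -> 32-bit word; a stand-in for real memory

def push_word(sp, value):
    """Allocate 4 bytes by lowering sp, then store the word there."""
    sp -= 4                            # like: sub sp, sp, #4
    memory[sp] = value & 0xFFFFFFFF    # like: str rX, [sp]
    return sp

def pop_word(sp):
    """Read the word at sp, then release the 4 bytes by raising sp."""
    value = memory.pop(sp)             # like: ldr rX, [sp]
    return sp + 4, value               # like: add sp, sp, #4

sp = 0x7EFFF000                        # hypothetical initial stack pointer
sp = push_word(sp, 0x11223344)
print(hex(sp))                         # 0x7effeffc
sp, value = pop_word(sp)
print(hex(sp), hex(value))             # 0x7efff000 0x11223344
```

The stack grows toward lower addresses here, which matches the description above: allocating means subtracting, releasing means adding the same amount back.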


R14: LR (Link Register). When a function call is made, the link register is updated with the address of the instruction that follows the call. This allows the program to return to the “parent” function that made the call once the “child” function has finished.
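As a rough illustration of that bookkeeping, here is a tiny Python model of a call and return in ARM state. The function names and the addresses are made up for this sketch; it assumes 4-byte ARM instructions and ignores Thumb.

```python
ARM_INSTRUCTION_SIZE = 4  # bytes per instruction in ARM state

def branch_with_link(call_address, target_address):
    """Model a call such as `bl target`: returns (new pc, new lr)."""
    lr = call_address + ARM_INSTRUCTION_SIZE  # instruction after the call
    return target_address, lr

def return_to_caller(lr):
    """Model a return such as `bx lr`: execution resumes at the saved address."""
    return lr

pc, lr = branch_with_link(0x8060, 0x8100)  # hypothetical call site and callee
print(hex(pc), hex(lr))                    # 0x8100 0x8064
pc = return_to_caller(lr)
print(hex(pc))                             # 0x8064
```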


R15: PC (Program Counter). The program counter is automatically incremented by the size of the instruction executed. This size is always 4 bytes in ARM state and 2 bytes in Thumb (v1) state. When a branch instruction is executed, the PC is loaded with the destination address. During execution, an instruction that reads the PC sees the address of the current instruction plus 8 (two ARM instructions) in ARM state, and the address of the current instruction plus 4 (two Thumb instructions) in Thumb (v1) state. This is different from x86, where the PC always points to the next instruction to be executed.


Let’s look at how PC behaves in a debugger. We use the following program to copy the value of pc into r0, followed by two arbitrary instructions. Let’s see what happens.


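.section .text
.global _start

_start:
 mov r0, pc        @ copy the value read from pc into r0
 mov r1, #2        @ arbitrary instruction: r1 = 2
 add r2, r1, r1    @ arbitrary instruction: r2 = r1 + r1
 bkpt              @ breakpoint instruction: hand control to the debugger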

In GDB we set a breakpoint at _start and run the program:


gef> br _start
Breakpoint 1 at 0x8054
gef> run

Here is a screenshot of the output we see first:



We can see that PC holds the address (0x8054) of the next instruction to be executed (mov r0, pc). Now let’s execute that instruction, after which R0 should hold the value of PC (0x8054), right?


…right? Wrong. Look at the value in R0. While we expected R0 to contain the PC value we had just seen (0x8054), it instead holds a value two instructions ahead of it (0x805c). The example shows that the debugger displays PC as the address of the next instruction to be executed, but an instruction that reads PC directly gets its own address plus two instructions (0x8054 + 8 = 0x805C). This is because older ARM processors always fetched two instructions ahead of the instruction currently being executed. ARM retains this definition to ensure compatibility with earlier processors.
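The arithmetic behind the value in R0 can be written down as a one-line rule. The function name pc_as_read is invented for this Python sketch; the offsets are the ones given in the text (8 bytes in ARM state, 4 bytes in Thumb state).

```python
def pc_as_read(instruction_address, thumb=False):
    """Value an instruction gets when it reads pc: its own address plus two instructions."""
    return instruction_address + (4 if thumb else 8)

# `mov r0, pc` sits at 0x8054 in the example and runs in ARM state.
print(hex(pc_as_read(0x8054)))              # 0x805c
# The same instruction address in Thumb state would read back four bytes ahead.
print(hex(pc_as_read(0x8054, thumb=True)))  # 0x8058
```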



CURRENT PROGRAM STATUS REGISTER


When you debug an ARM binary with gdb, you see something called Flags:


The register $cpsr shows the value of the Current Program Status Register (CPSR), and under it you can see the flags thumb, fast, interrupt, overflow, carry, zero, and negative. These flags represent certain bits of the CPSR; they are set according to its value and are shown in bold when active. The N, Z, C, and V bits correspond to the SF, ZF, CF, and OF bits of the EFLAGS register on x86, with one difference: after a subtraction or comparison, ARM sets C when no borrow occurred, whereas x86 sets CF when a borrow occurred. These bits support conditional execution in conditionals and loops at the assembly level. We will cover the condition codes in Part 6: Conditional Execution and Branching.



The picture above shows the layout of the 32-bit CPSR, with the most significant bits on the left (<-) and the least significant bits on the right (->). Every cell, except for the GE and M fields and the blank ones, is one bit wide. These one-bit fields define various properties of the program’s current state.
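Given a raw $cpsr value, the individual fields can be pulled out with shifts and masks. The bit positions below follow the standard ARMv7-A CPSR layout (N=31, Z=30, C=29, V=28, Q=27, J=24, GE=19:16, E=9, A=8, I=7, F=6, T=5, M=4:0); the function name decode_cpsr and the sample value are made up for this Python sketch.

```python
# Bit position of each one-bit CPSR field.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "Q": 27, "J": 24,
             "E": 9, "A": 8, "I": 7, "F": 6, "T": 5}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its named fields."""
    fields = {name: (cpsr >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["GE"] = (cpsr >> 16) & 0xF   # four-bit field, bits 19:16
    fields["M"] = cpsr & 0x1F           # five-bit mode field, bits 4:0
    return fields

sample = 0x20000010  # made-up value: only C set, mode bits 0b10000 (user mode)
fields = decode_cpsr(sample)
print({name: fields[name] for name in ("N", "Z", "C", "V", "T")})
print(bin(fields["M"]))
```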




Let’s assume we use the CMP instruction to compare the numbers 1 and 2. The outcome is ‘negative’ because 1 - 2 = -1, so the N flag is set. When we compare two equal numbers, like 2 against 2, the Z (zero) flag is set because 2 - 2 = 0. Keep in mind that the registers used with the CMP instruction are not modified; only the CPSR is updated, based on the result of comparing the registers against each other.


This is what it looks like in GDB (with GEF installed). In this example we compare the registers r1 and r0, where r1 = 4 and r0 = 2. This is what the flags look like after executing cmp r1, r0:


The Carry flag (C) is set because cmp r1, r0 compares 4 against 2 (4 - 2), and this subtraction needs no borrow. In contrast, the Negative flag (N) is set if we use cmp r0, r1 to compare a smaller number (2) against a bigger number (4).
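The flag results for these comparisons can be reproduced with a small model of CMP. This is a Python sketch under the usual 32-bit two's-complement rules, not output captured from a processor; the function name cmp_flags is invented for illustration.

```python
MASK = 0xFFFFFFFF

def cmp_flags(rn, rm):
    """Flags as `cmp rn, rm` would set them: compute rn - rm and discard the result."""
    rn &= MASK
    rm &= MASK
    result = (rn - rm) & MASK
    n = result >> 31                        # sign bit of the result
    z = int(result == 0)                    # result is zero
    c = int(rn >= rm)                       # set when the subtraction needs no borrow
    v = ((rn ^ rm) & (rn ^ result)) >> 31   # signed overflow
    return {"N": n, "Z": z, "C": c, "V": v}

print(cmp_flags(4, 2))  # {'N': 0, 'Z': 0, 'C': 1, 'V': 0}
print(cmp_flags(2, 4))  # {'N': 1, 'Z': 0, 'C': 0, 'V': 0}
print(cmp_flags(2, 2))  # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
```

Note that comparing equal values sets both Z and C, since a subtraction that yields zero also needs no borrow.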


Here’s an excerpt from the ARM infocenter:


The APSR contains the following ALU status flags:


 N – Set when the result of the operation was Negative.

 Z – Set when the result of the operation was Zero.

 C – Set when the operation resulted in a Carry.

 V – Set when the operation caused oVerflow.


A carry occurs:

1. if the result of an addition is greater than or equal to 2^32

2. if the result of a subtraction is positive or zero

3. as the result of an inline barrel shifter operation in a move or logical instruction.


Overflow occurs if the result of an add, subtract, or compare is greater than or equal to 2^31, or less than -2^31.
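For addition, the carry and overflow conditions quoted above translate almost word for word into code. The following Python sketch uses invented names (adds_flags, to_signed) and models a flag-setting 32-bit add; it does not cover the barrel shifter case.

```python
MASK = 0xFFFFFFFF

def to_signed(word):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def adds_flags(a, b):
    """Flags after a 32-bit flag-setting addition of a and b."""
    a &= MASK
    b &= MASK
    unsigned_sum = a + b
    signed_sum = to_signed(a) + to_signed(b)
    result = unsigned_sum & MASK
    return {
        "N": result >> 31,
        "Z": int(result == 0),
        "C": int(unsigned_sum >= 2**32),                          # carry out of bit 31
        "V": int(signed_sum >= 2**31 or signed_sum < -(2**31)),   # signed overflow
    }

print(adds_flags(0xFFFFFFFF, 1))  # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
print(adds_flags(0x7FFFFFFF, 1))  # {'N': 1, 'Z': 0, 'C': 0, 'V': 1}
```

The two examples show that carry and overflow are independent: the first wraps around as an unsigned value without overflowing as a signed one, and the second does the opposite.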


Original article: https://azeria-labs.com/arm-data-types-and-registers-part-2/
I hope you will all show your support; it will motivate me to keep updating~~




Last edited by r0Cat on 2018-8-7 17:59
Latest replies (19)
junkboy 2018-6-7 16:06
Support!
MaYil 2018-6-7 16:25
Nice, I really like the side-by-side Chinese and English style.
r0Cat 2018-6-7 16:41
Quoting junkboy: Support!
Thanks a lot!
r0Cat 2018-6-7 16:41
Quoting MaYil: Nice, I really like the side-by-side Chinese and English style.
I'll keep it up~
tinxi 2018-6-9 13:44
Thumbs up.
roysue 2018-6-10 18:33
Nice.
chpeagle 2018-6-10 19:03
I have read this series of articles and it is very good. Keep it up.
小熊ppt 2018-6-11 08:53
666, you have my support.
zplusplus 2018-6-11 15:27
OP, were the register diagrams drawn with some tool, or did they come with the original article?
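r0Cat 2018-6-11 16:03
Quoting zplusplus: OP, were the register diagrams drawn with some tool, or did they come with the original article?
Hi, some came with the original article and some are screenshots; I handled them case by case to get a better visual result. The link to the original is at the end of the post, so you can have a look there.
Last edited by r0Cat on 2018-6-11 16:04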
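zplusplus 2018-6-12 11:41
Quoting amzilun: Quoting zplusplus: OP, were the register diagrams drawn with some tool, or did they come with the original article? Hi, some came with the original article and some are screenshots; I handled them case by case to get a better visual result. At the end of the post ...
OK, thanks.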
tsukamaete 2018-6-13 16:31
Pretty good, but the ARM instruction set is so large that it would take many lessons to cover.
luoyesiqiu 2018-8-6 17:28
Nice, I learned something.
小XZ 2018-8-7 10:06
Thumbs up.
SevenSir 2018-8-7 10:51
There is a mistake in the post; please correct it promptly.
"Unsigned data types can hold positive values (including 0) but cannot hold positive values, so their range is wider"
should read "but cannot hold negative values".
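r0Cat 2018-8-7 17:59
Quoting SevenSir: There is a mistake in the post; please correct it promptly. "Unsigned data types can hold positive values (including 0) but cannot hold positive values, so their range is wider" should read "but cannot hold negative values".
Got it, fixed. Thanks for the reminder.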
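tinxi 2018-12-3 16:25
A lot of places are wrong..
"When a system call occurs, R11 comes into effect; it stores the system call number"
-- this should be R7
"Overflow occurs when the result of an add, subtract, or compare is greater than or equal to 231, or less than or equal to -231"
-- this should be 2^31
-- likewise, the one above it should be 2^32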
xxRea 2019-4-16 10:41
Wow, side-by-side Chinese and English!
壹久玖 2019-11-14 09:59
Learned a lot!