4 Registers
I talked about registers in the preceding chapters. The idea of registers goes back to the earliest digital computer designs, although they might have been called accumulators: basically storage for intermediate results of mathematical operations. As previously mentioned, registers act like memory in that values can be stored into them and retrieved from them, but this happens very fast in modern computers since the registers are part of the CPU. Registers are essential to understanding how CPUs work and they are a key part of assembly language programming, but they’re almost entirely hidden from programmers using higher-level languages.
The ARM-based M2 chip in my Mac has about 41 registers in what’s called the AArch64 state 1:
In AArch64 state, the following registers are available:
- Thirty-one 64-bit general-purpose registers X0-X30, the bottom halves of which are accessible as W0-W30.
- Four stack pointer registers SP_EL0, SP_EL1, SP_EL2, SP_EL3.
- Three exception link registers ELR_EL1, ELR_EL2, ELR_EL3.
- Three saved program status registers SPSR_EL1, SPSR_EL2, SPSR_EL3.
- One program counter.
The general purpose registers are what we’ve seen the most of so far. The program counter appears in every program, even if you don’t realize it. This register points at the current instruction in memory, so it gets automatically changed on jumps/branching. I’ll discuss the other types of registers in future episodes.
The general purpose registers are unusual in this architecture, because they have both W versions and X versions. The W version is the lower 32 bits of the X register, but it’s important to know that the W registers act like 32-bit registers, not just like the lower-order bits of the X register. For example, consider this short program:
; wregs.s
.global _start
.align 4
_start:
MOVZ W0, #0x8405
MOVK W0, #0x0808, LSL 16
MOV W3, #111
MUL W4, W3, W0
BRK #2
This multiplies the number 134775813 (hex 0x08088405) times 111, which gives 14960115243 (the reason for using these apparently weird numbers will eventually become clear). The two operands are put in the W0 and W3 registers and the result ends up in register W4.
So let’s assemble and run this program.
$ as -o wregs.o wregs.s
$ ld -o wregs wregs.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` -e _start -arch arm64
$ lldb ./wregs
(lldb) target create "./wregs"
Current executable set to '/Documents/Dev/hdmcw/wregs' (arm64).
(lldb) run
Process 40327 launched: '/Documents/Dev/hdmcw/wregs' (arm64)
Process 40327 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BREAKPOINT (code=1, subcode=0x100003fb0)
frame #0: 0x0000000100003fb0 wregs`start + 16
wregs`start:
-> 0x100003fb0 <+16>: brk #0x2
0x100003fb4: udf #0x1
0x100003fb8: udf #0x1c
0x100003fbc: udf #0x0
Target 0: (wregs) stopped.
(lldb) re read w4
w4 = 0x7bb13e2b
(lldb) re r x4
x4 = 0x000000007bb13e2b
So, as you can see the value in W4 is 0x7bb13e2b or 2075213355, just like we said.
Psych! It’s not even close to what we said it would be, which is just up above there (14960115243).
Given the setup, you’ve probably already figured out the problem. The binary rendering of 14960115243 is 1101111011101100010011111000101011, which you can see is 34 bits (lol, I had to copy and paste it into wc). If you lop off the top two bits you’re left with 2075213355. Interestingly, you’ll get the same number if you do the following computation in Python:
(0x08088405 * 111) % 2**32
Do you see why? This is important. It will be on the test.
Also notice that the value in the X form of the register is the same, just with 4 bytes of zeros on the left. That means that when the multiplication puts its result into W4, the two leftmost bits of the full product don’t spill over into the upper part of the X register; a write to a W register leaves the upper 32 bits of the X register zeroed.
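Here is a minimal Python sketch of what seems to be going on, using the numbers from the run above. The helper `write_w` is hypothetical, made up for this illustration. It models a write to a W register as keeping the low 32 bits of a result and leaving the upper 32 bits of the matching X register at zero.

```python
MASK32 = 0xFFFFFFFF  # selects the low 32 bits

def write_w(value):
    """Hypothetical model of a write to a W register.

    Returns what the matching X register would hold afterwards:
    the low 32 bits of value, with the upper 32 bits left at zero.
    """
    return value & MASK32

full_product = 0x08088405 * 111      # Python integers never overflow
x4 = write_w(full_product)           # stand-in for MUL W4, W3, W0

print(full_product)                  # 14960115243
print(full_product.bit_length())     # 34
print(hex(x4))                       # 0x7bb13e2b
print(x4 == full_product % 2**32)    # True
print(full_product >> 32)            # 3, the two dropped bits (0b11)
```

Masking with `0xFFFFFFFF` and taking the remainder modulo `2**32` are the same operation for non-negative integers, which is why the Python one-liner lands on the same number that lldb shows.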
There’s one oddity here that I have to clear up. You’ll notice that the W0 register is loaded with two instructions (MOVZ and MOVK, which you can think of as “move and zero” and “move and keep”) in order to get one number into the register (0x08088405). The first instruction loads the lower two bytes and zeros out the rest of the register. The second instruction loads the leftmost two bytes by left-shifting the value (0x0808), but leaves the other bits alone. This is because an “immediate”, a literal number used in the instruction, can only be so many bits long. Why?
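To make the “zero” and “keep” behavior concrete, here is a rough Python model of the two instructions. The function names `movz` and `movk` are just illustrative stand-ins for the instructions, and the model assumes a 32-bit W register.

```python
MASK32 = 0xFFFFFFFF  # a W register holds 32 bits

def movz(imm16, shift=0):
    """Place imm16 at the given shift and zero every other bit."""
    return (imm16 << shift) & MASK32

def movk(reg, imm16, shift=0):
    """Replace one 16-bit field of reg with imm16, keeping the other bits."""
    cleared = reg & ~(0xFFFF << shift)
    return (cleared | (imm16 << shift)) & MASK32

w0 = movz(0x8405)            # MOVZ W0, #0x8405
w0 = movk(w0, 0x0808, 16)    # MOVK W0, #0x0808, LSL 16
print(hex(w0))               # 0x8088405
```

Python drops the leading zero when printing, so `0x8088405` is the same value as 0x08088405.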
To answer, let’s first disassemble our program in lldb:
(lldb) di -b
wregs`start:
0x100003fa0 <+0>: 0x529080a0 mov w0, #0x8405 ; =33797
0x100003fa4 <+4>: 0x72a10100 movk w0, #0x808, lsl #16
0x100003fa8 <+8>: 0x52800de3 mov w3, #0x6f ; =111
0x100003fac <+12>: 0x1b007c64 mul w4, w3, w0
-> 0x100003fb0 <+16>: 0xd4200040 brk #0x2
The hex values to the left of the instruction are the actual machine language instructions2. You can see that all of the op codes are 4 bytes (32 bits). What this implies is that the literal data value has to fit in the opcode. For example the binary values of the opcode and the literal 0x8405 are shown below, with the value aligned:
01010010100100001000000010100000 (0x529080a0)
           1000010000000101      (0x8405)
So, the size of the literal has to be limited so that it can fit into the opcode. The other bits in there describe the instruction, the register and various other things that I’ll describe later.
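As a rough check, the literal can be pulled back out of the instruction words with a few shifts and masks. This sketch assumes the move-wide field layout as I understand it from the Arm documentation (from the top bit down: sf, opc, a fixed 6-bit pattern, hw, imm16, Rd). The function name `decode_move_wide` is made up for this example.

```python
def decode_move_wide(word):
    """Split a 32-bit MOVZ/MOVK instruction word into its fields.

    The field positions are an assumption based on the Arm documentation.
    """
    return {
        "sf":    (word >> 31) & 0x1,      # 0 = W register, 1 = X register
        "opc":   (word >> 29) & 0x3,      # 0b10 = MOVZ, 0b11 = MOVK
        "hw":    (word >> 21) & 0x3,      # left shift, in units of 16 bits
        "imm16": (word >> 5) & 0xFFFF,    # the literal itself
        "rd":    word & 0x1F,             # destination register number
    }

# The three move instructions from the disassembly above.
for word in (0x529080A0, 0x72A10100, 0x52800DE3):
    f = decode_move_wide(word)
    print(hex(word), f["opc"], f["hw"] * 16, hex(f["imm16"]), f["rd"])
# 0x529080a0 2 0 0x8405 0
# 0x72a10100 3 16 0x808 0
# 0x52800de3 2 0 0x6f 3
```

With 16 bits set aside for the literal, two bits for the shift and five for the register number, there is simply no room left in a 32-bit instruction for a full 32-bit value.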
The program counter register (usually called PC) points, as noted above, to the memory address of the instruction currently being executed. The assembler won’t let you MOV a value directly into the PC register (in AArch64 state), which is a bummer because you could wreak all sorts of havoc. However, you can cause a change in the program counter by using one of the branching instructions (e.g., B.NE in our first episode).
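A toy Python loop can show how the PC steps through our little program. This is only a sketch of the idea, not how the CPU is actually built. The addresses and mnemonics are copied from the lldb disassembly, and the `run` function is hypothetical.

```python
# Addresses and mnemonics copied from the disassembly above.
program = {
    0x100003FA0: "mov w0, #0x8405",
    0x100003FA4: "movk w0, #0x808, lsl #16",
    0x100003FA8: "mov w3, #0x6f",
    0x100003FAC: "mul w4, w3, w0",
    0x100003FB0: "brk #0x2",
}

def run(program, pc):
    """Toy fetch loop: step pc through the program until a brk is fetched."""
    trace = []
    while True:
        instruction = program[pc]
        trace.append(pc)
        if instruction.startswith("brk"):
            return pc, trace      # stop with pc still pointing at the brk
        pc += 4                   # every instruction is 4 bytes long

stopped_at, trace = run(program, 0x100003FA0)
print(hex(stopped_at))   # 0x100003fb0
print(len(trace))        # 5
```

The stop address matches the one lldb reported at the breakpoint. In this toy model, a branch instruction would assign a new address to `pc` instead of adding 4.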
The other registers above I will cover at a later time, when we start to look at making calls to subroutines.
Note that MOV with an immediate like this is an alias for MOVZ, so when disassembling, MOVZ might be shown as just MOV.