2 Assembling
In the first episode I showed a program listing in ARM assembly language, but I didn’t describe how to turn it into a runnable program. Unfortunately, there’s no equivalent to debug.com, so you have to type the program into a text file and then run a program called an assembler. According to computer lore, the term assembler was coined in the 1950s, when it meant something slightly different, but then people and time intervened, and now an assembler is a thing that assembles assembly language into machine instructions.
To run the assembler on macOS you’ll probably need to install the Xcode command line tools, part of Apple’s development platform (or you can install the whole shebang from the App Store). You only need the assembler (called as) and the other compiler tools for now. You can probably get what you need with the command:
xcode-select --install
If this works, you should have the necessary programs on your computer, though you might have to set the PATH variable:
$ export PATH=/Library/Developer/CommandLineTools/usr/bin:$PATH
$ which as
/Library/Developer/CommandLineTools/usr/bin/as
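If you’d rather check for both the assembler and the linker in one go, something like the following sketch should do it. The have_tool helper is a name I made up for illustration; command -v is the standard shell way of asking where a command lives.

```shell
# have_tool is a hypothetical helper, not part of any toolchain.
# command -v prints the path of a command and fails if it can't find one.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on each tool we need for assembling and linking.
for tool in as ld; do
    if have_tool "$tool"; then
        echo "$tool: found at $(command -v "$tool")"
    else
        echo "$tool: not found, check your PATH"
    fi
done
```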
Now you should be able to run the following commands to make an executable program (assuming the source file is called beep.s):
as -o beep.o beep.s
ld -o beep beep.o -lSystem -syslibroot `xcrun -sdk macosx --show-sdk-path` \
-e _start -arch arm64
The first command runs as to produce an object file. We’ll talk more about that soon. The as program on macOS is really just a thin wrapper that runs the clang compiler, part of the LLVM toolset.
The second command, ld, creates the runnable program called beep. The ld program is called the linker[1]. We’ll talk more about this in the next episode, but the basic idea is that to make an executable program we need to link our program’s code to system libraries (and sometimes other 3rd-party libraries). Let’s look at the arguments to this command:
-o beep
This specifies the executable program name.
-lSystem
This tells the linker to link against the System library (libSystem). In general, -lfoo asks the linker to search its library directories for a library named libfoo.
-syslibroot `xcrun -sdk macosx --show-sdk-path`
Self-evident, so I’ll skip this one. Kidding! The -syslibroot option names the root directory under which the linker looks for the system libraries. The next part is trickier though. First you have to notice the backticks (`) around the xcrun command, which mean “run this and then substitute the string you get back”. If we run the command by itself, it prints something like this:
$ xcrun -sdk macosx --show-sdk-path
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.2.sdk
We could for instance list the contents of the directory with the command:
$ ls `xcrun -sdk macosx --show-sdk-path`
Entitlements.plist SDKSettings.plist usr
SDKSettings.json System
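Backticks are the old-school spelling of command substitution. Most shells also accept $(...), which does the same thing and is easier to nest. Here is a small sketch that uses basename as a stand-in for xcrun, so it doesn’t depend on Xcode being installed; the path is a made-up example.

```shell
# Both forms run the inner command and substitute its output into the line.
# The path below is a made-up example; basename does not need it to exist.
sdk_path=/tmp/example/MacOSX.sdk

old_style=`basename "$sdk_path"`     # backtick form, as used in the ld command
new_style=$(basename "$sdk_path")    # equivalent $(...) form

echo "backticks:     $old_style"
echo "dollar-parens: $new_style"
```

Both lines should report the same name, since the two spellings are interchangeable here.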
Back to the linker options:
-e _start
This tells the linker that the program’s entry point is the _start symbol in our program. Remember the bit at the top of our first program that looked like this:
.global _start // Provide program starting address to linker
.align 4
_start:
-arch arm64
Finally, this tells the linker to produce a binary in the ARM 64 architecture format. This is the default.
Note: quite often you can get away with a much simpler version of the link command. For example, the following works for me for this program:
ld -o beep beep.o -e _start
If the above commands work, you should now have a file called beep that you can run (on macOS, for me anyway, the file already has the execute permission set). Note, though, that if you do run it, the only way to terminate the program is with ^C.
As an alternative to running the program directly from the command line, you can also run it with the lldb debugger, which comes with the Xcode command line tools if you didn’t already have it. This way you can stop the program before it goes into the infinite loop:
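If you get tired of retyping the assemble and link commands, they can be wrapped in a small shell function. This is only a sketch: build is a name I made up, and it assumes that as, ld and xcrun are on your PATH and that the source file sits in the current directory.

```shell
# build is a hypothetical helper name. Given NAME, it assumes NAME.s exists
# in the current directory and that as, ld and xcrun are on your PATH.
build() {
    name=$1
    # Step 1: assemble NAME.s into the object file NAME.o; stop on failure.
    as -o "$name.o" "$name.s" || return 1
    # Step 2: link NAME.o against libSystem, with _start as the entry point.
    ld -o "$name" "$name.o" -lSystem \
        -syslibroot "$(xcrun -sdk macosx --show-sdk-path)" \
        -e _start -arch arm64
}
```

With the function defined (pasted into your shell, say), running build beep should leave you with the same beep executable as the two commands typed by hand.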
$ lldb ./beep
(lldb) target create "./beep"
Current executable set to '/Dev/hdmcw/beep_4vr/beep' (arm64).
(lldb) b beep4eva
Breakpoint 1: where = beep`beep4eva, address = 0x0000000100003f9c
(lldb) run
Process 10410 launched: '//Dev/hdmcw/beep_4vr/beep' (arm64)
Process 10410 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003f9c beep`beep4eva
beep`beep4eva:
-> 0x100003f9c <+0>: mov x0, #0x1 ; =1
0x100003fa0 <+4>: adr x1, 0x100003fb4 ; bel
0x100003fa4 <+8>: mov x2, #0x1 ; =1
0x100003fa8 <+12>: mov x16, #0x4 ; =4
Target 0: (beep) stopped.
(lldb)
One last detail. You’ll notice that most of the assembly language listings here have this directive near the top:
.align 4
This tells the assembler to align whatever comes next on a boundary. On macOS the argument is a power of two, so .align 4 means a 16-byte (2^4) boundary, which comfortably covers the 4-byte alignment that ARM64 instructions need. The reason for doing this will be more apparent after we cover the topic of memory, but for now know that you’ll probably need this directive to avoid errors when the program runs (you’ll probably get Bus Error 100, which has nothing to do with public transportation). This can be used in both the code and data sections of our programs, but we’ll go into that in more detail in later chapters.
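In the meantime, if “aligning on a boundary” sounds mysterious, it just means rounding an address up to the next multiple of the boundary. A rough sketch in shell arithmetic follows; align_up is a made-up helper and it assumes the boundary is a power of two.

```shell
# align_up is a hypothetical helper: round ADDRESS up to the next multiple
# of BOUNDARY. It assumes BOUNDARY is a power of two (4, 8, 16, ...).
align_up() {
    echo $(( ($1 + $2 - 1) & ~($2 - 1) ))
}

align_up 13 4    # prints 16: 13 is not a multiple of 4, so it moves up
align_up 16 4    # prints 16: already on a 4-byte boundary
align_up 17 16   # prints 32: next multiple of 16 after 17
```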
[1] I assume it’s not called ln because that was already taken. It’s kind of like that Demetri Martin joke about oranges and carrots.