November 4, 2020 in Programming · 23 minutes
Even though I’ve developed software for a number of years now, there’s one question that has always been in the back of my mind and I haven’t had the time or patience to really answer, until now: What is a binary executable anyways?
For this example, I wrote a brutally simple Rust program that includes a function `sum` to add two integers together, and am invoking it from `main()`:
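The listing itself is a reconstruction based on the description above - the parameter names and the `println!` call are my guesses, though the literals 5 and 8 do appear later in the disassembly:

```rust
// A function that adds two 32-bit integers...
fn sum(x: i32, y: i32) -> i32 {
    x + y
}

// ...invoked from main with two integer literals.
fn main() {
    println!("sum: {}", sum(5, 8)); // prints "sum: 13"
}
```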
My Rust code is always structured the “cargo way”, so I can compile my program by running `cargo build`, and this will produce a binary for me within the `target/debug/` directory. I have named my crate `rbin`, so this is the name of the binary that is created at this location:
These days, it’s really easy to take such questions for granted, but if you’re curious, you may be asking:
“But what is that file?”
I mean we all generally know that it’s an “executable”, in that we run it and our program happens. But what does that mean? What is contained in that file that means our computer automatically just knows how to run it? And how is it possible that a program with 7 lines of code can take up over 3 megabytes of disk space?!?
It turns out that in order to create an executable for this ridiculously simple program, the Rust compiler must include quite a bit of additional software to make it possible.
Well, it turns out there is a widely accepted format for these things, called the “Executable and Linkable Format”, or ELF!
Note that I won’t be comprehensively covering ELF here (there are plenty of other resources, many of which I’ll link to) - rather this is an exploration of what goes into a Rust binary with the simplest, default settings, and some observations about what seems interesting to me.
ELF is a well-known, popular format, especially in the world of Linux, but there are plenty of others. Operating systems like Windows and macOS each have their own format, which is a big reason why, when you’re compiling (or simply downloading) software, you have to specify the operating system you want to run it on. This is true despite the fact that the underlying machine code that executes your program may be the same on all of them (e.g. `x86_64`).
An exceptional visual breakdown of the ELF format can be found at the ELF Wikipedia page linked above; I have found myself constantly referring back to it while writing this post:
Commonly, executable formats like this specify a magic number right at the beginning of the file, so that the format can be easily identified. This occupies the first four bytes in the file header. This is a very important field, because unless we can first identify an ELF file appropriately, we can’t reasonably expect to do anything “ELF-y” with it. We know where certain bits of information should be in an ELF file, but we first must identify using these bytes that this is what we can expect.
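The check described above can be sketched in a few lines of Rust. The magic bytes (`0x7f` followed by “ELF” in ASCII) are the real ELF values; the sample headers below are fabricated, containing just enough bytes to exercise the check:

```rust
// The real ELF magic number: 0x7f followed by "ELF" in ASCII.
const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];

// Returns true only if the first four bytes match the ELF magic.
fn is_elf(bytes: &[u8]) -> bool {
    bytes.len() >= 4 && bytes[..4] == ELF_MAGIC
}

fn main() {
    // Fabricated sample headers for illustration.
    let elf_like = [0x7f, b'E', b'L', b'F', 0x02, 0x01, 0x01, 0x00];
    let pe_like = [b'M', b'Z', 0x90, 0x00]; // Windows executables start with "MZ" instead
    println!("{} {}", is_elf(&elf_like), is_elf(&pe_like)); // prints "true false"
}
```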
The `readelf` utility is extremely useful for printing all kinds of metadata and related tables of information contained within an ELF file. Naturally, it expects that the file being read actually is an ELF file. When pointed at anything else, it prints a helpful hint that the expected “magic bytes” aren’t set appropriately, and doesn’t attempt to read the rest:
Once identified, the entire rest of the file can be navigated using byte offsets (that is, the number of bytes from the start of the file).
For those that are accustomed to looking at network packet captures, this should all sound very familiar, as this is exactly how we know where certain fields are located in a packet header. Ethernet frames have a predictable preamble and start-of-frame delimiter. Ethernet also has a field called the “Ethertype”, which provides a clue as to what protocol is contained within the Ethernet frame (which allows computers to then parse those fields as well). Just like Ethernet has a standard set of byte offsets that indicate where the various fields should be represented, the ELF format specifies its own offsets for all of the fields providing useful identifying and execution information in the file header, which then point to other important locations within the file.
There’s all kinds of useful information in this header, but in particular, the `e_entry` field in the file header points to the offset location from where execution should start. This is the “entry point” for the program. We’ll definitely be following this down the rabbit hole in a little bit.
We can again use `readelf`, this time on a proper ELF file (our Rust program), and also using the `-h` flag to show the file header:
So, the “magic number” lets us parse at least the rest of the file header, which contains not only information about the file, but byte-offset locations for other important portions of the file. One of these is the “Start of program headers”, which starts after 64 bytes.
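A sketch of how those byte offsets get used in practice: per the ELF64 layout, `e_entry` lives at byte offset 24 and `e_phoff` (the start of the program headers) at offset 32, both stored little-endian on `x86_64`. The header below is synthetic, zeroed except for the fields we care about, with values matching what `readelf -h` reported:

```rust
use std::convert::TryInto;

// Read an 8-byte little-endian value at the given byte offset.
fn read_u64_le(bytes: &[u8], offset: usize) -> u64 {
    u64::from_le_bytes(bytes[offset..offset + 8].try_into().unwrap())
}

fn main() {
    // A synthetic 64-byte ELF64 file header.
    let mut header = [0u8; 64];
    header[..4].copy_from_slice(&[0x7f, b'E', b'L', b'F']);
    header[24..32].copy_from_slice(&0x5070u64.to_le_bytes()); // e_entry, as seen in readelf -h
    header[32..40].copy_from_slice(&64u64.to_le_bytes());     // e_phoff: program headers at byte 64

    println!("entry point: {:#x}", read_u64_le(&header, 24)); // prints "entry point: 0x5070"
    println!("program headers start at byte {}", read_u64_le(&header, 32)); // prints "...at byte 64"
}
```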
The program header table contains information that allows the operating system to allocate memory and load the program into what is referred to as a process image. You can think of it as a list of “instructions” that tell the system to do various things with chunks of memory in order to prepare to execute this program.
The `readelf` utility also allows us to read the program headers, using the `-l` flag:
Each program header type does something different to a chunk of memory (segment). Next to each header can be found two 64-bit (this is a 64-bit ELF, after all) hexadecimal values. As indicated at the top of the header output, the top value is the memory offset of the segment that the header refers to (where it is located). The value below that is the size of that particular segment in the file.
Each segment is further subdivided into sections, which we’ll get to later. For now, notice the “section to segment mapping” table below the program headers. See how they’re numbered? These numbers correspond to the position of the program headers above. So, the first header (which happens to be of type `PHDR`) refers to segment 00, the second (which happens to be `INTERP`) to segment 01, and so on.
A full summary of program header types can be found here, but a brief explanation of each segment of our actual program, and what its header type indicates should be done with that segment, can be found below:
| Segment Number | Header Type | Explanation |
|---|---|---|
| 00 | `PHDR` | Indicates the location and size of the program header table itself |
| 01 | `INTERP` | Provides the location of an interpreter on the system, used for dynamic linking. This allows us to simply use the libraries already on the system, rather than having to compile all of those libraries into the binary (which is called static linking) |
| 02-05 | `LOAD` | These segments are to be loaded into memory. Note that segment 03 has the E flag set, which indicates this is where our executable code lives |
| 06 | `DYNAMIC` | Provides dynamic linking information, such as which libraries on the system the interpreter needs to provide access to at runtime |
| 07 | `NOTE` | Commonly used to store things like the ABI (and version) this program uses to communicate with the underlying operating system |
| 08 | `TLS` | Thread-local storage |
| 09 | `GNU_EH_FRAME` | Frame unwind information, used for exception handling |
| 10 | `GNU_STACK` | Used for explicitly requesting that the stack be executable (note in the output above, this flag is not set) |
| 11 | `GNU_RELRO` | Specifies the region of memory that should be made read-only once loading (relocation) has taken place |
We now have a better sense for what the Rust compiler feels should be included in the program header table - specifically how Rust recommends our computer prepares itself to run the program that we’ve compiled - again, in the simplest, default case. Some takeaways from this:

- In the `readelf` output, when we saw the `INTERP` header type for segment 01, we saw a sneak preview of the interpreter that is being requested: `/lib64/ld-linux-x86-64.so.2`. You can actually run this yourself and it will tell you a little bit about itself. Pretty cool!
- The presence of the `INTERP` and `DYNAMIC` header types implies that dynamic linking is the default when compiling Rust programs, which isn’t weird - lots of compiled languages force you to specify if you want a statically linked binary.

The section header table is usually located near the end of an ELF file, and its main job is to provide information for linking purposes, but I also found it useful for understanding the contents of each section - in particular, the size of each:
If you convert the size of each section from hex to decimal and add them up, you get 3190428 (bytes), which is really close to the total size of the file, 3198056, as reported by `ls`. So with this section table, we can start to get a sense for where all that data is coming from, and an idea of where to poke around next.
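A quick sanity check of that arithmetic. The leftover bytes are presumably the file header, the program and section header tables, and alignment padding, none of which belong to any section:

```rust
fn main() {
    let section_total: u64 = 3_190_428; // sum of all section sizes from the section header table
    let file_size: u64 = 3_198_056;     // total file size as reported by ls
    println!("bytes outside all sections: {}", file_size - section_total); // prints "...: 7628"
}
```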
I’ll leave it here for now, but this lesson gives a really good overview of this table.
So now what? What actually executes?
While `readelf` does have some flags for inspecting the contents of these sections, we’ll instead use a tool called `objdump`, which does a much better job of showing the breakdown of each section’s contents, both as raw hexadecimal opcodes and arguments, and as the interpreted assembly:
Some pointers on the flags I’m passing (the full invocation is `objdump -d -C -M intel --insn-width=8 target/debug/rbin | less`):

- The `-d` flag instructs `objdump` to disassemble all executable sections. This will produce assembly instructions next to the corresponding machine code in the output.
- `-C` demangles the Rust symbol names in the output; `target/debug/rbin` is the path of the binary to disassemble.
- `-M intel` specifies that the Intel syntax should be used when displaying the machine code as assembly.
- `--insn-width=8` is an aesthetic preference for me - I like the machine code to be displayed on one line, and sometimes it can be 8 bytes long (the default is 7).
- `| less` pipes the output to `less`, which lets me move through the output with the arrow keys and search with ease.
- The `-S` flag interleaves the Rust source code with the assembly, so we can see exactly which lines of Rust resulted in which lines of machine code. I left this out because I’ll be explaining the relevant lines myself, and it keeps the examples simpler-looking, but definitely use this flag on your own, as it was very helpful for me.
In the file header we saw there was a reference to an entry point in memory. As a reminder, this was `0x5070`. Once the file header and program headers are parsed, and the segments loaded into memory, the computer will start running instructions beginning at this position. So let’s scroll to that position in the output of `objdump` and use that as our starting point. Note that this can be found within the `.text` section we looked at just previously:
The middle column (starting with `endbr64`) shows the `x86_64` instruction being performed on that line. The rightmost column contains the operands for each of these instructions.
Near the end, we see some helpful hints on where execution moves next. At address `0x5091` we see an instruction with an interesting comment to the right: `# 53d0 <main>`. This is `objdump` giving us a clue that execution moves to the memory offset `0x53d0`. Scrolling down, we can see exactly where this picks up:
Again, we have another hint: `# 52f0 <rbin::main>`. This time we scroll up to find that address, and in turn, the actual machine code representing our `main()` function:
As you can imagine, there’s a lot to cover here - too much to cover in this post. So we’ll instead look at the instructions that are most relevant to the example code at the beginning of this post. Feel free to take a look at the instructions I don’t cover explicitly - I did, and found it to be instructive.
If you’re new to reading assembler like I am, the good news is there are a lot of guides out there that can help you understand what’s happening here. I found this guide and this summary to be helpful, but there are plenty of others that work well too - just be aware of the assembly syntax you’re using (remember I’m using Intel for this post).
First, notice that at the top of this function’s code, the first operation claims 120 (`0x78`) bytes of stack space. This is common to see at the top of each function’s block of machine code:
Recall that our program specifies two integer literals (5 and 8) as parameters to our `sum` function, so the Rust compiler moves both of these values into the `edi` and `esi` registers:
We’re almost ready to call our `sum` function, but before we can do that, we have to do something with the `rax` register:
By convention, %rax is used to store a function’s return value, if it exists
The return value from `sum` will eventually be stored in the `rax` register, but if you look carefully, we’ve already moved a value into `rax` further up in our `main` function. So, before we call `sum`, we should move this value somewhere safe.
Our compiler knows that our `sum` function will use 24 (`0x18`) bytes of stack space (as we’ll see when we look at the code for the `sum` function), so we can safely move this value into a memory location that is `0x18` bytes offset from the current stack pointer:
Finally, we can call our `sum` function:
Let’s take a look at that memory location (`5390`) - I’ll again provide the whole assembly for this function, and then explore the relevant instructions in detail:
As was done with the `main()` function, and as alluded to just previously, this function allocates 24 (`0x18`) bytes of stack space:
Next, we move the values in the `edi` and `esi` registers, which - if you recall from the `main` function - store the parameters for our `sum` function, into stack memory. A few things to note here - the memory offsets `rsp+0x10` and `rsp+0x14` are both within the stack space we’ve allocated. In fact, there’s enough room for each location to store a 4-byte value. This makes sense, because both should be exactly that - 32-bit integers!
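That spacing lines up with Rust’s default integer type, which you can verify yourself:

```rust
use std::mem::size_of;

fn main() {
    // Untyped integer literals like 5 and 8 default to i32 in Rust:
    // four bytes each, which is why rsp+0x10 and rsp+0x14 are 4 bytes apart.
    println!("i32 is {} bytes", size_of::<i32>()); // prints "i32 is 4 bytes"
}
```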
Next, we add these two values. This operation works by adding the value of the second operand into the first, so after this operation completes, `edi` will no longer hold one of our parameters, but rather the result of our addition operation:
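The destructive, two-operand behavior of `add` can be mirrored in Rust with a compound assignment - `edi` and `esi` here are just variables named after the registers they stand in for:

```rust
fn main() {
    let mut edi: i32 = 5; // first parameter, as loaded into the edi register
    let esi: i32 = 8;     // second parameter, as loaded into the esi register

    // Like `add edi, esi`: the sum overwrites the first operand.
    edi += esi;
    println!("edi now holds {}", edi); // prints "edi now holds 13"
}
```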
Next, we move the result of this operation, now stored in `edi`, into stack space, and then again from there into the `eax` register:
Finally, we can hand back our stack space and return to the calling location within the `main` function:
Back out in our `main` function, the operations immediately following our call to `sum` move the result stored in `eax` into stack space, and then from there into the `rax` register.
This was a very limited-scope look at the numerous operations generated by the Rust compiler for what on the surface looks like a very simple operation. Imagine what a complex application must look like! There were plenty of other cases that we didn’t cover, such as those that result in a program crash, so please take the time to use the commands I’ve shown above to look at your own code and learn.
This was a bit of a journey, but hopefully you stuck with it, and learned as much as I did! In summary:
The actual executable format itself is ELF. There are many different formats, but this is a very common one in Linux-based operating systems. There is a tremendous amount of metadata contained within a binary executable that is easily inspected using readily available tools.
The actual machine code is x86 machine code. This instruction set is implemented by many hardware vendors and platforms, which is why it is sometimes thought of as “hardware independent”: it isn’t usable only on one specific vendor’s hardware, but is broadly accepted and supported.
The interpretation of the binary instructions (as hexadecimal opcodes) is Intel syntax. This is less meaningful for the computer and more meaningful for me, as there is no “Intel vs. AT&T” at the machine code level. This is just a notation that makes machine code a bit more readable and writable.
The actual executable code that gets placed into the binary is all up to the compiler. As we’ve seen, even a simple example results in quite a bit of machine code. We did a pretty simple operation, and only worked with stack memory allocations, which are handled pretty well by the compiler. Heap allocations are a bit more complex and require coordination with the operating system. You can imagine that the compiler has to do a lot of work to just get a viable program - never mind all of the additional checks that something like the Rust compiler does to keep us safe from all kinds of memory handling issues. Try disassembling a more complex program and try to follow the various branches of logic through the underlying assembly!
I linked to some helpful resources throughout this post, but here are a few others that I thought were useful that didn’t make it into the contents above: