HW5 - asm to C Puzzles
Due: Friday Oct 17th, at 10:00pm
- Starter code: a5.tar
- Upload solutions via Gradescope as:
puzzle1.asm
throughpuzzle7.asm
, twopuzzleX.c
files, and areadme.txt
file (see below)
Goals
This assignment is designed to help you practice the following:
- identifying patterns representing common structures from higher-level languages
- tracing possible execution flows through assembly code
Collaboration policy
For this assignment, you may work alone or with a partner, but you must type up all of the answers yourself. (It is therefore unexpected for two submissions to be completely identical.)
You may also discuss the assignment at a high level with other students.
You should list any student with whom you discussed the assignment, and the manner of discussion (high level, partner, etc.) in your readme.txt
file.
If you work alone, you should say so instead.
Assessment
The core requirements for your submission are:
- identify the structures used in each puzzle program
- identify control flow after jump/call instructions
- identify possible control-flow paths through the puzzle
The advanced requirements for your submission are:
- satisfy the core requirements
- identify parameter counts and sizes
- reverse-engineer the C code to generate the assembly for two of the puzzles
- include your name and collaboration statement in your
readme.txt
file
Background
What does a compiler do?
That question has a long, complicated answer. In brief, though, our compiler (gcc
) takes C source files as input and produces an executable program as output. The executable program contains, among other things, machine language instructions whose behavior implements the computations articulated in the original C code.
Machine language is just bits, like anything else in the computer, but this means it’s hard to read. So, if we want to understand the correspondence between C code and its equivalent machine language, we’re better off asking gcc
to output assembly language code instead. Assembly isn’t particularly easy to read either, but it’s a lot easier than machine language.
As a general rule, each assembly language instruction corresponds to exactly one machine language instruction, and vice versa. There are some exceptions (e.g., sometimes one assembly language instruction is an alias for a sequence of two or three machine language instructions), but as a rough guide, you can think of assembly and machine code as being in one-to-one correspondence. As a result, by understanding the assembly generated by gcc
, we will be very close to understanding the machine code as well.
Assignment overview
For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by solving a sequence of puzzles.
To test your knowledge, you are encouraged (although mostly not required to) try to “reverse engineer” the corresponding C code Although we could use gcc
on mantis
, we will start by using an extremely handy tool called the Compiler Explorer. You’ll put some C code into the input panel, and the output panel will show you the assembly generated by the selected compiler. As you adjust your C code, you’ll be able to watch the changes in the assembly, and then compare your assembly code to the puzzle’s code.
Your assignment
This assignment comprises seven puzzles, given in assembly. For each puzzle, you will read some given assembly and answer some questions about the corresponding C program.
For “advanced”, you will also need to reverse engineer two of the puzzle programs.
Core
For the core requirements, you must answer some questions as comments in the assembly code for each puzzle. To answer these questions, you’ll need to complete the following:
- Fill in the comment at the top of the assembly file, indicating which structure(s) are present in the puzzle:
- conditional branching (
if, if/else
) - switch statement
- loop (
for, while, do-while
) - nested loop
- function call
- recursive function call
- conditional branching (
- For each
# Next: TODO
marked in the inline comments, provide the label(s) of the instruction(s) that could be executed immediately after the instruction on the TODO line. - In comments at the top of the assembly file, list three possible orders in which labels could be executed during a single invocation of the function
function1
.
Advanced
For the advanced requirements, you will additionally need to understand the inputs in each assembly program and do some reverse engineering.
- Fill in the comment at the top of the assembly file, listing the registered used for function parameters, both in name and size.
- For example, if the only parameter is in
%rdi
and it is always referenced as%edi
, you’d write something like%edi (4 bytes)
. If two registers are used, with the first being its 8-byte version and the second being its 2-byte version, you’d write%rdi (8 bytes)
and%si (2 bytes)
.
- For example, if the only parameter is in
- The final piece of this assignment is to reverse engineer two programs. For two puzzles of your choice in the given puzzles (puzzle1–puzzle7), create a C program that produces the assembly output provided. (Hint: You will likely find the Compiler Explorer helpful for this part.)
Note that some of the labels were added by us (all of CA
, CB
, etc.), so those won’t show up in the assembly you see on Compiler Explorer. The LX.
labels don’t have to exactly match in numbers, but they should be in the right places, and all assembly instructions should match exactly.
Note: Don’t forget to check the list at the top of this page to make sure you have everything you need (e.g., collaboration statement).
Getting started
Compiler Explorer is a really handy tool for making sense of the assembly corresponding to C code. Here are some quick instructions on getting it set up the way we will use it:
To start, open the Compiler Explorer, click the Settings button near the top of the right half, and deselect “Intel asm syntax”. Your screen should look like this:
As you work on writing C code to see its assembly, play around with compiler flags – what do you get if you specify the flags -Og
or -O0
? (Note that -O0
is a capital letter O followed by a zero.)
Before:
After:
We will solve puzzle0.asm
together in class. Ask any questions you have about this process!
In working through these puzzles, I encourage you to try out tons of simple C programs, for example ones with if
statements, with for/while
loops, and any other constructs you can think of, to see what they look like in assembly. Don’t be afraid to experiment!
Handing it in
To submit this assignment, you should upload your seven commented puzzleX.asm
files to Gradescope.
If you are aiming for the advanced requirements, make sure to include two puzzleX.c
files with the appropriate names to match their puzzles (for example, submit puzzle2.c
if you reverse-engineered puzzle2.asm
).
Finally, you should put your name and collaboration statement in your readme.txt
file and submit that, too.
Advice
-
Don’t try to understand every line of assembly! There are a lot of instructions, and this assignment is really just about the structure. Look at the jumps (
jmp
,jl
,je
, and any other that start withj
) function calls (call
), returns (ret
), and labels (.L2:
,start_here:
). -
It is surprisingly tricky to fully make sense of registers used as inputs. Focus on structure first!
-
Ask questions, and go in order!