Due: Friday Oct 17th, at 10:00pm

  • Starter code: a5.tar
  • Upload solutions via Gradescope as: puzzle1.asm through puzzle7.asm, two puzzleX.c files, and a readme.txt file (see below)

Goals

This assignment is designed to help you practice the following:

  • identifying patterns representing common structures from higher-level languages
  • tracing possible execution flows through assembly code

Collaboration policy

For this assignment, you may work alone or with a partner, but you must type up all of the answers yourself. (It is therefore unexpected for two submissions to be completely identical.)

You may also discuss the assignment at a high level with other students.

You should list any student with whom you discussed the assignment, and the manner of discussion (high level, partner, etc.) in your readme.txt file.

If you work alone, you should say so instead.

Assessment

The core requirements for your submission are:

  • identify the structures used in each puzzle program
  • identify control flow after jump/call instructions
  • identify possible control-flow paths through the puzzle

The advanced requirements for your submission are:

  • satisfy the core requirements
  • identify parameter counts and sizes
  • reverse-engineer the C code to generate the assembly for two of the puzzles
  • include your name and collaboration statement in your readme.txt file

Background

What does a compiler do?

That question has a long, complicated answer. In brief, though, our compiler (gcc) takes C source files as input and produces an executable program as output. The executable program contains, among other things, machine language instructions whose behavior implements the computations articulated in the original C code.

Machine language is just bits, like anything else in the computer, but this means it’s hard to read. So, if we want to understand the correspondence between C code and its equivalent machine language, we’re better off asking gcc to output assembly language code instead. Assembly isn’t particularly easy to read either, but it’s a lot easier than machine language.

As a general rule, each assembly language instruction corresponds to exactly one machine language instruction, and vice versa. There are some exceptions (e.g., sometimes one assembly language instruction is an alias for a sequence of two or three machine language instructions), but as a rough guide, you can think of assembly and machine code as being in one-to-one correspondence. As a result, by understanding the assembly generated by gcc, we will be very close to understanding the machine code as well.

Assignment overview

For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by solving a sequence of puzzles.

To test your knowledge, you are encouraged (although mostly not required to) try to “reverse engineer” the corresponding C code Although we could use gcc on mantis, we will start by using an extremely handy tool called the Compiler Explorer. You’ll put some C code into the input panel, and the output panel will show you the assembly generated by the selected compiler. As you adjust your C code, you’ll be able to watch the changes in the assembly, and then compare your assembly code to the puzzle’s code.

Your assignment

This assignment comprises seven puzzles, given in assembly. For each puzzle, you will read some given assembly and answer some questions about the corresponding C program.

For “advanced”, you will also need to reverse engineer two of the puzzle programs.

Core

For the core requirements, you must answer some questions as comments in the assembly code for each puzzle. To answer these questions, you’ll need to complete the following:

  1. Fill in the comment at the top of the assembly file, indicating which structure(s) are present in the puzzle:
    • conditional branching (if, if/else)
    • switch statement
    • loop (for, while, do-while)
    • nested loop
    • function call
    • recursive function call
  2. For each # Next: TODO marked in the inline comments, provide the label(s) of the instruction(s) that could be executed immediately after the instruction on the TODO line.
  3. In comments at the top of the assembly file, list three possible orders in which labels could be executed during a single invocation of the function function1.

Advanced

For the advanced requirements, you will additionally need to understand the inputs in each assembly program and do some reverse engineering.

  1. Fill in the comment at the top of the assembly file, listing the registered used for function parameters, both in name and size.
    • For example, if the only parameter is in %rdi and it is always referenced as %edi, you’d write something like %edi (4 bytes). If two registers are used, with the first being its 8-byte version and the second being its 2-byte version, you’d write %rdi (8 bytes) and %si (2 bytes).
  2. The final piece of this assignment is to reverse engineer two programs. For two puzzles of your choice in the given puzzles (puzzle1–puzzle7), create a C program that produces the assembly output provided. (Hint: You will likely find the Compiler Explorer helpful for this part.)

Note that some of the labels were added by us (all of CA, CB, etc.), so those won’t show up in the assembly you see on Compiler Explorer. The LX. labels don’t have to exactly match in numbers, but they should be in the right places, and all assembly instructions should match exactly.

Note: Don’t forget to check the list at the top of this page to make sure you have everything you need (e.g., collaboration statement).

Getting started

Compiler Explorer is a really handy tool for making sense of the assembly corresponding to C code. Here are some quick instructions on getting it set up the way we will use it:

To start, open the Compiler Explorer, click the Settings button near the top of the right half, and deselect “Intel asm syntax”. Your screen should look like this:

Setting up Compiler Explorer

As you work on writing C code to see its assembly, play around with compiler flags – what do you get if you specify the flags -Og or -O0? (Note that -O0 is a capital letter O followed by a zero.)

Before: Before setting compiler flags

After: After setting compiler flags

We will solve puzzle0.asm together in class. Ask any questions you have about this process!

In working through these puzzles, I encourage you to try out tons of simple C programs, for example ones with if statements, with for/while loops, and any other constructs you can think of, to see what they look like in assembly. Don’t be afraid to experiment!

Handing it in

To submit this assignment, you should upload your seven commented puzzleX.asm files to Gradescope.

If you are aiming for the advanced requirements, make sure to include two puzzleX.c files with the appropriate names to match their puzzles (for example, submit puzzle2.c if you reverse-engineered puzzle2.asm).

Finally, you should put your name and collaboration statement in your readme.txt file and submit that, too.

Advice

  • Don’t try to understand every line of assembly! There are a lot of instructions, and this assignment is really just about the structure. Look at the jumps (jmp, jl, je, and any other that start with j) function calls (call), returns (ret), and labels (.L2:, start_here:).

  • It is surprisingly tricky to fully make sense of registers used as inputs. Focus on structure first!

  • Ask questions, and go in order!