GOALS

  • Get familiar with x86_64 assembly language basics
  • Practice learning how C code constructs get translated into equivalent assembly language

Logistics

This is a paired assignment, so you should work with your assigned partner, if you have one. If you chose not to be assigned a partner, you must work individually (you can’t pick your own partner!). As usual, you can get help as detailed in the collaboration document. You should submit only one assignment to Gradescope for your pair and add your partner to the submission. If for some reason you and your partner can’t make things work, you can split, but you have to let me know asap.

This assignment (i.e. part 1) is due Wednesday April 24th at 10pm. As with all assignments, you will have the opportunity to individually revise this submission based on feedback.

Assessment

To demonstrate proficiency, your submission needs to:

  • Pass the autograder tests (which are checking that your C produces the correct output)
  • Have accurate descriptions for each puzzle

To demonstrate mastery, your submission needs to:

  • Nearly perfectly recreate the provided assembly
  • Have clear and concise descriptions for each puzzle

BACKGROUND

What does a compiler do?

That question has a long, complicated answer. But in brief, our compiler (gcc) takes C sources as input, and produces an executable program as output. The executable program contains, among other things, machine language instructions whose behavior implements the computations articulated in the original C code.

Machine language is just bits, and is thus hard to read. So if we want to understand the correspondence between C code and its corresponding machine language, we’re better off asking gcc to output assembly language code instead. Assembly isn’t particularly easy to read either, but it’s a lot easier than machine language. And as a general rule, each assembly language instruction corresponds to exactly one machine language instruction, and vice versa. There are some exceptions (e.g., sometimes one assembly language instruction is an alias for a sequence of two or three machine language instructions), but as a rough guide, you can think of assembly and machine language instructions as being in one-to-one correspondence. As a result, by understanding the assembly language generated by gcc, we will be very close to understanding the machine language as well.

For this assignment, you are going to practice understanding the correspondence between simple C code and its equivalent assembly language by solving a sequence of puzzles. For each puzzle, you will read some given assembly language and try to come up with the original C code that generated it. This is a simple form of reverse engineering, and it’s pretty fun.

Though we could use gcc on mantis, for this assignment we’re instead going to use an extremely handy tool called the Compiler Explorer. You’ll put some C code into the input panel, and the output panel will show you the assembly language generated by the selected compiler. As you adjust your C code, you’ll be able to watch the changes in the assembly language, and then compare your assembly code to the puzzle’s code.

WHAT YOU SHOULD DO?

In the asm-to-c1-package.tar package, you will find several files named puzzle0.asm, puzzle1.asm, etc. For each puzzle, your job will go like this:

  • Study the puzzleN.asm code to understand what it does. You should try to understand it holistically rather than just line-by-line, so you’ll be able to describe the code’s purpose in a single short sentence.
  • Write an equivalent C function (or sometimes two functions). Then use the Compiler Explorer to compile it to assembly, and see how closely your assembly matches the contents of puzzleN.asm. Refine your code until you feel the match-up is close enough. Exact matches are great, of course, but close matches might also be correct. Very slight changes in source code can make changes in the assembly code, even if the code’s end result is unchanged.
  • Write a one-sentence description of the purpose of the code for the current puzzle (e.g., you can imagine a description like “this function takes one positive integer parameter n and returns the nth prime number”).
  • Put your code in a file named puzzleN.c, and put your name(s) and one-sentence description in the file header at the top of puzzleN.c.

Hand it all into to Gradescope by uploading each of your .c files. Gradescope will run the files to see if they output the same values at the assembly and the human grader will check for how close your C code matches the original source and the quality of your descriptions.

COMPILER EXPLORER SETTINGS

Set it up like this:

Screnshot of Compiler Explorer

HAVE FUN!