Essentials

P8B Due: Tuesday, May 3, 2016 at 11:59PM, no late period

Acknowledgement

This assignment is patterned after this assignment by Milo Martin when he was at the University of Pennsylvania. Used with permission.

Goals of Assignment

In this assignment, you will complete an assembler for the LC3 assembly language by completing the file assembler.c. You will reuse code that you wrote in previous assignments, add new code and integrate with code provided to you. Some of that code is C source code. Other parts are just a library. You are given header files so that you know what functionality is provided and can call functions in the library even though you do not have the source code. This assignment serves several purposes:

Learn how to translate an assembly language to binary code
Learn how to do simple file I/O in C
Learn to decompose functionality into smaller pieces
Practice working with structures and pointers
Practice working on a larger project
Practice integrating your work into existing code

Overview of an assembler

An assembler is a program that translates assembly language statements to machine code. For example, it must translate ADD R1,R2,R3 to the hex code 1283. It is like a compiler, except that the language it deals with is much simpler than a high-level language like Java or C. Several things make assembly language easier to deal with:

Every statement is on a single line of source code
There is at most one statement per source line
The syntax is very regular and is quite simple.

For the LC3 assembly language the syntax is:


    [optional label] opcode operand(s) [; optional end of line comment]
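Because the syntax is this regular, a line can be broken into pieces with very little code. The following is a minimal sketch, not the parsing code provided with the assignment: the name demo_tokenize and the constant DEMO_MAX_TOKENS are made up for illustration. It cuts off the comment and splits what remains on whitespace and commas. It does not handle a .STRINGZ string literal that contains spaces or a semicolon.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_TOKENS 8   /* hypothetical limit, enough for one LC3 line */

/* Removes the end-of-line comment from `line` (which is modified in place)
 * and splits the rest on spaces, tabs, commas and line endings.
 * Stores up to max_tokens pointers in `tokens` and returns how many
 * were stored. A return value of 0 means the line was empty or held
 * only a comment. */
int demo_tokenize(char *line, char *tokens[], int max_tokens) {
    assert(max_tokens > 0);

    char *comment = strchr(line, ';');
    if (comment != NULL)
        *comment = '\0';            /* everything after ';' is ignored */

    int n = 0;
    for (char *tok = strtok(line, " \t,\r\n");
         tok != NULL && n < max_tokens;
         tok = strtok(NULL, " \t,\r\n"))
        tokens[n++] = tok;
    return n;
}
```

For the line `LOOP ADD R1, R2, #-1 ; decrement`, this yields the five tokens LOOP, ADD, R1, R2 and #-1. Deciding whether the first token is a label or an opcode is a separate step that needs the list of LC3 opcodes.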

The assembler reads the source code a line at a time, analyzes each line and produces the output file(s) required to run the program:

an object file containing the code (.obj, .hex)
a symbol table file (.sym)

Because the assembly code may contain references to labels that have not yet been encountered (e.g. a branch to a location later in the code), the assembler normally makes two passes over the "code".

The first pass of the assembler must, at a minimum, do two things:

Verify that each line is syntactically correct. This involves determining the LC3 operator (if any) and checking that the operands are correct in number and type for that operator. If a line is empty or contains only a comment, simply continue with the following line. For an actual code line, you must determine how much space the instruction will take. Most instructions take only one word, and in those cases the address is simply incremented. Pseudo-ops such as .ORIG, .BLKW and .STRINGZ may update the address differently.
Whenever a line contains a label, insert it and its address into the symbol table. This is required so that the PCoffset for the LD/ST/LDI/STI/BR/JSR/LEA opcodes can be computed in the second pass.
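The bookkeeping described in these two steps can be sketched as follows. This is only an illustration under stated assumptions: the types and functions (demo_sym_t, demo_symtab_t, demo_sym_find, demo_sym_add, demo_pass_one_line) are hypothetical stand-ins, not the symbol table interface supplied with the assignment, and labels are compared case-sensitively here, which may not match the course code.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_SYMS 64    /* arbitrary capacity for this sketch */
#define DEMO_MAX_NAME 32

typedef struct {
    char name[DEMO_MAX_NAME];
    int  addr;
} demo_sym_t;

typedef struct {
    demo_sym_t syms[DEMO_MAX_SYMS];
    int        count;
} demo_symtab_t;

/* Returns the address recorded for `name`, or -1 if it is not present. */
int demo_sym_find(const demo_symtab_t *t, const char *name) {
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return t->syms[i].addr;
    return -1;
}

/* Adds a label. Returns 0 on success, -1 if the label already exists
 * or the table is full. */
int demo_sym_add(demo_symtab_t *t, const char *name, int addr) {
    assert(strlen(name) < DEMO_MAX_NAME);
    if (t->count == DEMO_MAX_SYMS || demo_sym_find(t, name) >= 0)
        return -1;
    strcpy(t->syms[t->count].name, name);
    t->syms[t->count].addr = addr;
    t->count++;
    return 0;
}

/* First-pass work for one line: record the label (if any) at the
 * current address, then advance by the number of words the line
 * occupies. Returns the new address, or -1 on a duplicate label. */
int demo_pass_one_line(demo_symtab_t *t, const char *label,
                       int addr, int words) {
    if (label != NULL && demo_sym_add(t, label, addr) != 0)
        return -1;
    return addr + words;
}
```

Starting at address x3000, a labeled one-word instruction leaves the label at x3000 and moves the address to x3001. A labeled `.BLKW 4` at x3001 would record the label at x3001 and move the address to x3005.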

Additionally, the first pass may choose to store results from the syntactic analysis and pass this on to the second pass. This will make the second pass easier. It requires building a list of information about each instruction.

Alternatively, you can skip storing this information and only create the symbol table. Then, in the second pass, the source file is re-read and the syntactic analysis is performed again. At this point, there are no syntactic errors, because they would have been found in the first pass. This approach requires reading the source file twice.

The second pass of the assembler is responsible for generating the object code for the .asm file. The actual work depends on how the first pass was structured. It may:

Scan a data structure created with the first pass, or
re-read the source file and reprocess each line.

In either case, it must generate the LC3 word(s) that are required for each instruction. This involves creating the correct 16-bit pattern(s) that define the instruction. When an LD/ST/LDI/STI/BR/JSR/LEA instruction is encountered, the code needs to compute the PCoffset, determine whether it is in range and insert it into the bit pattern. Offsets out of range are reported and are the only errors generated during the second pass.
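The PCoffset computation can be sketched as below. In the LC3, the PC has already been incremented when the offset is applied, so the offset is the target address minus the address of the instruction plus one. The function name demo_pc_offset and its parameters are hypothetical; the field width is 9 bits for LD/ST/LDI/STI/BR/LEA and 11 bits for JSR.

```c
#include <assert.h>

/* Computes target_addr - (instr_addr + 1) and checks that it fits in a
 * two's complement field of `bits` bits. On success, stores the offset
 * masked to `bits` bits in *field and returns 1. Returns 0 when the
 * offset is out of range (the ERR_BAD_PCOFFSET case). */
int demo_pc_offset(int instr_addr, int target_addr, int bits, int *field) {
    assert(bits > 0 && bits < 16);

    int offset = target_addr - (instr_addr + 1);
    int min = -(1 << (bits - 1));       /* e.g. -256 for 9 bits */
    int max = (1 << (bits - 1)) - 1;    /* e.g.  255 for 9 bits */
    if (offset < min || offset > max)
        return 0;

    /* Converting to unsigned before masking keeps the low bits of a
     * negative offset in two's complement form. */
    *field = (int)((unsigned)offset & ((1u << bits) - 1u));
    return 1;
}
```

For example, a branch at x3005 to a label at x3000 has offset -6, which becomes the 9-bit field x1FA. The masked field can then be ORed into the low bits of the instruction word.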

Implementing `asm_pass_two()`

You should first write the basic loop to traverse the data structure returned by asm_pass_one(). Simply print the source line. Once you have verified that your basic loop is correct, extend it to generate the machine code for each instruction. If you have completed the first pass correctly, then each line_info_t contains the information you need to generate the machine code.

The first step is to copy the prototype from the format field of the information on this line into a variable that will contain the final machine code. The next step is to insert the operand(s) into the correct locations of the machine code. For example, if the line was an ADD with an immediate value, then the fields DR, SR1 and the immediate are inserted into the appropriate bit locations of the machine code. Different instructions have different numbers of operands. When the machine code is complete, write it to the object file using lc3_write_LC3_word(). When you encounter an instruction that uses a PCoffset, you will need to look up the address of the reference in the symbol table and compute the offset.
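As a sketch of the "copy the prototype, then insert the operands" step for ADD with an immediate: the helper names demo_set_field and demo_encode_add_imm are hypothetical, and the prototype value 0x1020 (opcode 0001 with bit 5 set) is written inline here rather than read from the format field, whose exact layout is not shown in this handout. Writing the word with lc3_write_LC3_word() is left out because its signature is not given here.

```c
#include <assert.h>

/* Returns `word` with the `width`-bit field starting at bit `low_bit`
 * replaced by the low `width` bits of `value`. */
unsigned demo_set_field(unsigned word, int value, int low_bit, int width) {
    assert(width > 0 && low_bit >= 0 && low_bit + width <= 16);
    unsigned mask = ((1u << width) - 1u) << low_bit;
    return (word & ~mask) | (((unsigned)value << low_bit) & mask);
}

/* Builds the 16-bit word for ADD DR, SR1, #imm5. */
unsigned demo_encode_add_imm(int dr, int sr1, int imm5) {
    assert(dr >= 0 && dr <= 7 && sr1 >= 0 && sr1 <= 7);
    assert(imm5 >= -16 && imm5 <= 15);    /* 5-bit two's complement */

    unsigned word = 0x1020u;              /* prototype: ADD, immediate form */
    word = demo_set_field(word, dr,   9, 3);   /* DR   in bits 11..9 */
    word = demo_set_field(word, sr1,  6, 3);   /* SR1  in bits 8..6  */
    word = demo_set_field(word, imm5, 0, 5);   /* imm5 in bits 4..0  */
    return word;
}
```

With these helpers, ADD R1,R2,#-1 encodes to x12BF. The register form works the same way with prototype 0x1000 and SR2 in bits 2..0, which gives x1283 for ADD R1,R2,R3.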

Note that the .BLKW and .STRINGZ pseudo-ops may generate multiple words in the object file. If you are ever unsure what should be put in the output file, run ~cs270/lc3tools/lc3as -hex file.asm and look at the hex code that it generates.
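The multi-word expansion can be sketched as follows, assuming that .STRINGZ stores one character per word followed by a terminating zero word and that .BLKW reserves zero-filled words. The function names are hypothetical, escape sequences such as \n inside the string are not handled, and the output of lc3as remains the reference for what your assembler should produce.

```c
#include <assert.h>
#include <stddef.h>

/* Expands the text of a .STRINGZ operand (without the quotes) into one
 * word per character plus a terminating zero word. Returns the number
 * of words written to `out`, which must have room for `cap` words. */
size_t demo_stringz_words(const char *s, unsigned short *out, size_t cap) {
    size_t n = 0;
    for (; s[n] != '\0'; n++) {
        assert(n < cap);
        out[n] = (unsigned char)s[n];
    }
    assert(n < cap);
    out[n++] = 0;                   /* the Z in .STRINGZ */
    return n;
}

/* Expands .BLKW count into `count` zero words. Returns the number of
 * words written. */
size_t demo_blkw_words(size_t count, unsigned short *out, size_t cap) {
    assert(count <= cap);
    for (size_t i = 0; i < count; i++)
        out[i] = 0;
    return count;
}
```

For `.STRINGZ "Hi"` this produces the three words x0048, x0069 and x0000, each of which would then be written to the object file in order.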

Grading Criteria

Part B is worth 150 points. Grading is based on the .hex file.

Test the following commands (100 points):
4 points - not.asm
6 points - addr.asm, addi.asm, andr.asm, andi.asm
6 points - jmp.asm, jsr.asm, jsrr.asm
6 points - ld.asm, ldi.asm, ldr.asm, lea.asm
6 points - st.asm, sti.asm, str.asm
6 points - ret.asm, trap.asm
Test the branch commands (10 points):
5 points - breasy.asm
5 points - brhard.asm
Test assembler directives (10 points):
2 points - orig.asm
4 points - fill.asm
4 points - blkw.asm
Test an assembly program (10 points):
10 points - code.asm
Test error handling (20 points):
2 points - #define ERR_OPEN_READ "could not open '%s' for reading."
2 points - #define ERR_OPEN_WRITE "could not open '%s' for writing."
3 points - #define ERR_EXPECTED_REG "expected register (R0-R7), got '%s'"
4 points - #define ERR_DUPLICATE_LABEL "label '%s' previosly defined"
4 points - #define ERR_MISSING_LABEL "label '%s' never defined"
5 points - #define ERR_BAD_PCOFFSET "PCoffset to '%s' out of range"
Memory management (15 points, extra credit):
15 points - code.asm - check memory management using valgrind

The code.asm file used in preliminary and final testing is here.

Checking in Your Code

You will submit the single file P8B.tar using the Checkin tab of the course web page. The file P8B.tar is created by performing make submit using the new Makefile for this assignment, or by modifying your current Makefile to rename the tar file from P8A.tar to P8B.tar.