Essentials
P8B Due: Tuesday, May 3, 2016 at 11:59PM, no late period
Acknowledgement
This assignment is patterned after an assignment by Milo Martin, written
when he was at the University of Pennsylvania. Used with permission.
Goals of Assignment
In this assignment, you will complete an assembler for the LC3 assembly
language by completing the file assembler.c. You will reuse code
that you wrote in previous assignments, add new code, and integrate it with
code provided to you. Some of the provided code is C source code; other parts
are supplied only as a library. You are given header files so that you know
what functionality is provided and can call functions in the library even
though you do not have the source code. This assignment serves several
purposes:
- Learn how to translate an assembly language to binary code
- Learn how to do simple file I/O in C
- Learn to decompose functionality into smaller pieces
- Practice working with structures and pointers
- Practice working on a larger project
- Practice integrating your work into existing code
Overview of an assembler
An assembler is a program that translates assembly language statements into
machine code. For example, it must translate ADD R1,R2,R3 to the hex code 1283.
It is like a compiler, except that the language it deals with is much simpler
than a high level language like Java or C. Several things make assembly language
easier to deal with:
- Every statement is on a single line of source code
- There is at most one statement per source line
- The syntax is very regular and is quite simple.
For the LC3 assembly language the syntax is:
[optional label] opcode operand(s) [; optional end of line comment]
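Because the syntax is this regular, a line can be broken into its pieces with very little code. The following is a minimal sketch, not the provided tokenizer: the function name tokenize_line and its parameters are hypothetical. It cuts the line at the first semicolon and splits the rest on blanks and commas. It does not handle quoted .STRINGZ text that contains spaces or semicolons.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: cuts the line at the first ';' (the comment), then
 * splits what remains on blanks and commas. The line buffer is modified in
 * place. Returns the number of tokens stored in tokens[]. */
int tokenize_line(char *line, char *tokens[], int max_tokens) {
    assert(line != NULL && tokens != NULL);

    char *comment = strchr(line, ';');
    if (comment != NULL)
        *comment = '\0';              /* drop the end-of-line comment */

    int count = 0;
    for (char *tok = strtok(line, " \t\r\n,");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " \t\r\n,")) {
        tokens[count++] = tok;        /* token points into the line buffer */
    }
    return count;
}
```

For the line "LOOP ADD R1,R2,R3 ; add registers" this yields the five tokens LOOP, ADD, R1, R2 and R3. Deciding whether the first token is a label or an opcode still requires a lookup against the opcode names.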
The assembler reads the source code a line at a time, analyzes each line, and
produces the output file(s) required to run the program:
- an object file containing the code (.obj, .hex)
- a symbol table file (.sym)
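Reading a file a line at a time is simple file I/O in C. The sketch below uses fgets() and counts the lines that contain code. The names count_code_lines and is_blank_or_comment, and the 256-character line limit, are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if the line is empty or holds only a comment. */
static int is_blank_or_comment(const char *line) {
    while (*line == ' ' || *line == '\t' || *line == '\r' || *line == '\n')
        line++;
    return *line == '\0' || *line == ';';
}

/* Reads the stream one line at a time and counts the lines that contain
 * code. Lines longer than the buffer are not handled in this sketch. */
int count_code_lines(FILE *in) {
    assert(in != NULL);
    char buf[256];                    /* assumed maximum line length */
    int count = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        if (!is_blank_or_comment(buf))
            count++;
    }
    return count;
}
```

A real first pass would analyze each code line inside the loop instead of only counting it.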
Because the assembly code may contain references to labels that have not yet
been encountered (e.g. a branch to a location later in the code), the assembler
normally makes two passes over the "code".
The first pass of the assembler must, at a minimum, do two things:
- Verify that each line is syntactically correct. This involves determining the
LC3 operator (if any) and checking that the operands are correct in number and
type for that operator. If a line is empty or contains only a comment, simply
continue with the following line. For an actual code line, you must
determine how much space the instruction will take. Most instructions
take only one word, and in those cases the address is simply incremented.
Some pseudo-ops may update the address differently.
- Whenever a line contains a label, insert it and its address into the symbol
table. This is required so that the PCoffset for the LD/ST/LDI/STI/BR/JSR/LEA
opcodes can be computed in the second pass.
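The label bookkeeping described above can be sketched with a small fixed-size table. This is only an illustration: the type symtab_t and the functions symtab_add and symtab_find are hypothetical, and in the assignment you would use the symbol table code you already have. The sketch compares names case-sensitively, which may differ from the real assembler.

```c
#include <string.h>

#define MAX_SYMS 64                   /* assumed capacity for this sketch */
#define MAX_NAME 32                   /* assumed maximum label length + 1 */

typedef struct {
    char names[MAX_SYMS][MAX_NAME];
    int  addrs[MAX_SYMS];
    int  count;
} symtab_t;

/* Returns the address recorded for the label, or -1 if it is not present. */
int symtab_find(const symtab_t *t, const char *name) {
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return t->addrs[i];
    }
    return -1;
}

/* Records label -> address. Returns 1 on success, or 0 if the label was
 * already defined or the table is full. */
int symtab_add(symtab_t *t, const char *name, int addr) {
    if (symtab_find(t, name) != -1 || t->count >= MAX_SYMS)
        return 0;
    strncpy(t->names[t->count], name, MAX_NAME - 1);
    t->names[t->count][MAX_NAME - 1] = '\0';
    t->addrs[t->count] = addr;
    t->count++;
    return 1;
}
```

A return value of 0 from symtab_add on a second definition is the situation in which the first pass would report a duplicate label.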
Additionally, the first pass may choose to store results from the
syntactic analysis and pass them on to the second pass. This makes
the second pass easier, but it requires building a list of information about
each instruction.
Alternatively, you can skip storing this information and only create the
symbol table. Then, in the second pass, the source file is re-read
and syntactic analysis is performed again. At this point, there are no
syntactic errors, because they would have been found in the first pass.
This approach requires reading the source file twice.
The second pass of the assembler is responsible for generating the object code
for the .asm file. The actual work depends on how the first pass was
structured. It may:
- scan a data structure created by the first pass, or
- re-read the source file and reprocess each line.
In either case, it must generate the LC3 word(s) that are required for each
instruction. This involves creating the correct 16-bit bit pattern(s) that
define the instruction. When an LD/ST/LDI/STI/BR/JSR/LEA
instruction is encountered, the code needs to compute the PCoffset, determine
whether it is in range, and insert it into the bit pattern. Offsets out of range
are reported and are the only errors generated during the second pass.
Implementing asm_pass_two()
You should first write the basic loop to traverse the data structure
returned by asm_pass_one(). For now, simply print each source line.
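A basic traversal loop might look like the sketch below. It assumes the data structure is a singly linked list. The type sketch_line_t and its fields source and next are stand-ins invented for this example; the real line_info_t is declared in the provided header and may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the real line_info_t; the field names are assumptions. */
typedef struct sketch_line {
    const char *source;               /* text of the source line */
    struct sketch_line *next;         /* next line, or NULL at the end */
} sketch_line_t;

/* Walks the list, prints each source line, and returns the line count. */
int print_lines(const sketch_line_t *head, FILE *out) {
    assert(out != NULL);
    int n = 0;
    for (const sketch_line_t *cur = head; cur != NULL; cur = cur->next) {
        fprintf(out, "%s\n", cur->source);
        n++;
    }
    return n;
}
```

Once this loop prints every line in order, the body of the loop is the place to add code generation.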
Once you have verified that your basic loop is correct, extend it to
generate the machine code for each instruction. If you have completed the
first pass correctly, then each line_info_t
contains the information you need to generate the machine code.
The first step is to copy the prototype from the format
field of the information on this line into a variable that
will contain the final machine code. The next step is to insert the operand(s)
into the correct locations of the machine code. For example, if the line
was an ADD with an immediate value, then the fields DR, SR1, immediate
are inserted into the appropriate bit locations
of the machine code. Different instructions have different numbers of
operands. When the machine code is complete, write it to the object file using
lc3_write_LC3_word(). When you encounter an instruction that uses
a PCoffset, you will need to look up the address of the reference
in the symbol table and compute the offset.
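As an illustration of inserting operands, the sketch below builds an ADD with an immediate value. It assumes the prototype already holds the opcode and the immediate-mode bit, which is 0x1020 for this form of ADD. DR occupies bits 11-9, SR1 bits 8-6, and the 5-bit immediate bits 4-0. The function name encode_add_imm is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Starts from the prototype and ORs each operand into its bit field. */
uint16_t encode_add_imm(uint16_t prototype, int dr, int sr1, int imm5) {
    assert(dr >= 0 && dr <= 7 && sr1 >= 0 && sr1 <= 7);
    assert(imm5 >= -16 && imm5 <= 15);          /* 5-bit signed range */

    unsigned word = prototype;
    word |= (unsigned)dr  << 9;                 /* DR   -> bits 11-9 */
    word |= (unsigned)sr1 << 6;                 /* SR1  -> bits 8-6  */
    word |= (unsigned)imm5 & 0x1Fu;             /* imm5 -> bits 4-0  */
    return (uint16_t)word;
}
```

With these assumptions, ADD R1,R2,#5 encodes as 0x12A5 and ADD R1,R1,#-1 as 0x127F. Masking the immediate to 5 bits is what keeps a negative value from overwriting the other fields.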
Note that the .BLKW and .STRINGZ opcodes may generate multiple
words in the object file. If you are ever unsure what should be put in the
output file, run ~cs270/lc3tools/lc3as -hex file.asm and look at
the hex code that it generates.
Grading Criteria
Part B is worth 150 points. Grading is based on the .hex file.
- Test the following commands (100 points):
- 4 points - not.asm
- 6 points each - addr.asm, addi.asm, andr.asm, andi.asm
- 6 points each - jmp.asm, jsr.asm, jsrr.asm
- 6 points each - ld.asm, ldi.asm, ldr.asm, lea.asm
- 6 points each - st.asm, sti.asm, str.asm
- 6 points each - ret.asm, trap.asm
- Test the branch commands (10 points):
- 5 points - breasy.asm
- 5 points - brhard.asm
- Test assembler directives (10 points):
- 2 points - orig.asm
- 4 points - fill.asm
- 4 points - blkw.asm
- Test an assembly program (10 points):
- 10 points - code.asm
- Test error handling (20 points):
- 2 points - #define ERR_OPEN_READ "could not open '%s' for reading."
- 2 points - #define ERR_OPEN_WRITE "could not open '%s' for writing."
- 3 points - #define ERR_EXPECTED_REG "expected register (R0-R7), got '%s'"
- 4 points - #define ERR_DUPLICATE_LABEL "label '%s' previosly defined"
- 4 points - #define ERR_MISSING_LABEL "label '%s' never defined"
- 5 points - #define ERR_BAD_PCOFFSET "PCoffset to '%s' out of range"
- Memory management (15 points, extra credit):
- 15 points - code.asm - check memory management using valgrind
The code.asm file used in preliminary and final testing is here.
Checking in Your Code
You will submit the single file P8B.tar using the Checkin tab
of the course web page. The file P8B.tar is created by running
make submit using the new Makefile for this assignment, or by
modifying your current Makefile to rename the tar file from P8A.tar to P8B.tar.