Programming Assignment P10: CS 270 Computer Organization

Programming Assignment P10 - LC-3 Assembler in C

P10A - Due Monday, Dec. 7 at 11:59pm, no late penalty.

P10B - Due Saturday, Dec. 12 at 11:59pm, no late penalty.

Acknowledgement

This assignment is patterned after this assignment by Milo Martin when he was at the University of Pennsylvania. Used by permission.

Pair Programming

You may (optionally) work with one partner on this assignment, and turn in your joint work. In this case the assembler.c file must contain both names and a description of how the work was shared.

Goals of Assignment

In this assignment, you will complete an assembler for the LC3 assembly language by completing the file assembler.c. You will reuse code that you wrote in previous assignments, add new code and integrate with code provided to you. Some of that code is C source code. Other parts are just a library. You are given header files so that you know what functionality is provided and can call functions in the library even though you do not have the source code. This assignment serves several purposes:

Learn how to translate an assembly language to binary code
Learn how to do simple file I/O in C
Learn to decompose functionality into smaller pieces
Practice working with structures and pointers
Practice working on a larger project
Practice integrating your work into existing code

Overview of an assembler

An assembler is a program which translates assembly language statement to code. It must translate ADD R1,R2,R3 to the hex code 1283. It is like a compiler except that the language is deals with is much simpler than a high level language like Java or C. Several things make assembly language easier to deal with:

Every statement is on a single line of source code
There is at most one statement per source line
The syntax is very regular and is quite simple.

For the LC3 assembly language the syntax is:


    [label] opcode operand(s) [; optional end of line comment]

The assembler reads the source code a line at a time, analyzes each line and produces the output file(s) required to run the program.

an object file containing the code (.obj, .hex)
a symbol table file (.sym)

Because the assembly code may contain references to labels that have not yet been encountered (e.g. a branch to a location later in the code), the assembler normally makes two passes over the "code".

The first pass of the assembler must, at a minimum, do two things:

Verify that each line is syntactically correct. This involves determining the LC3 operator (if any) and that the operands are correct in number and type for that operator. If a line is empty or contains only a comment, simply ignore it and continue with the following line. For an actual code lines, you must determine how much space this instruction will take. Most instructions only take one word and in those cases, the address is simply incremented. Other pseudo-ops may update the address differently.
Whenever a line contains a label, insert it and its address into the symbol table. This is required so that the PCoffset for the LD/ST/LDI/STI/BR/JSR/LEA opcodes can be computed in the second pass, based on information from the symbol table.

Additionally, the first pass must store results from the syntactic analysis to pass this on to the second pass. It does this by building a linked list of instructions with information about each instruction.

The second pass of the assembler is responsible for generating the object code from the .asm file. This is done entirely by traversing the linked list created in the first pass. The file does not need to be read again. The second pass generates the LC3 hex value(s) that are required for each instruction. This involves creating the correct 16-bit bit pattern(s) that defines the instruction. When an LD/ST/LDI/STI/BR/JSR/LEA instruction is encountered, the code needs to compute the PCoffset, determine if it is in range and insert it into the bit pattern. Offsets out of range are reported and are the only errors generated by during the second pass.

Getting Started

This assignment depends on the file symbol.c you wrote in previous assignments. You may need to fix any remaining bugs in those files to complete this assignment.

Create a directory for this assignment and cd there
Download the file appropriate for your OS
- lc3asm.linux.tar
- lc3asm.osx.tar
Unpack the lc3asm.XXX.tar file. WARNING: this tar ball will spew files into the current directory. It does not unpack into a subdirectory.
```
    tar -xvpf lc3asm.XXX.tar
```
Replace the file symbol.c with your version.
Review the Makefile and make sure the variable GCC is appropriate for your C compiler.
Build the excutable by typing make. There should not be any errors or warnings.

You are now ready to begin implementation. However, before writing ANY code, you should make yourself a map of all the components in this project, what functionality each provides, and how they relate to one another. In doing this you are creating a model of the project. Modeling is something you will formally study when you take your Software Engineering course cs314.

Implementing `asm_init(), asm_term()`

Figure out what global variables (if any) and modules need to be initialized. Add code to do so. As you work on your assembler, you may need to add more code to this function. Similarly, figure out what must be cleaned up, and add code to asm_term().

Stubbing in `asm_pass_two()`

This may seem like getting the cart before the horse, but a little work here, will greatly aid in writing/debugging asm_pass_one()

The second pass of the assembler will loop over the data structure created in the first pass and generate code. So, write a loop that traverses the elements of the linked list passed as a parameter and calls asm_print_line_info() on each element.

Implementing `asm_pass_one()`, phase 1

Study the documentation for this routine. You may translate this outline directly into code. When you encounter a label, add it to the symbol table with an address of 0. For initial testing, you might skip steps 5.2 and 5.4. Simply save the source line in the field reference using the function strdup(). To test your code, make the assembler and run it with a small assembly file(s). The name of your assembler is mylc3as. What you should get is two things:

a symbol table file (.sym) with a header, and symbols. The addresses will all be 0.
a list of the lines of the source file (with line numbers). The only lines that you should see are those that contain a valid opcode.

To understand how to convert a line into tokens, make testTokens, and run this program. Study the source of testTokens.c to see how to do it. Understand what happens with blank lines and lines that contain only comments.

You can see from step 5.3 that this assembler is building a data structure that will be re-used in the second pass. This will let you practice your C dynamic memory management skills.

Implementing `asm_pass_one()`, phase 2

You are now ready to add a bit more. Specifically, you will be verifying the syntax of each line. Make the executable seeLC3 and run it. Try different LC3 opcodes and see what the program prints. Then study the code in seeLC3.c to understand how it works.

You now have a model of how to determine the type(s) of operands expected by any LC3 instruction. Note that many LC3 instructions have common operands. For example, many instructions require one of more registers. Therefore, it may be appropriate to write a helper method that converts a token to a register, and reports an error if the token is not a register. Similarly, if may be useful to have a helper method that gets immediate values. Immediates are used in a variety of instructions. Those instructions differ in the number of bits used to store the value, and some values are signed, while others are unsigned. A helper method can take care of all these cases.

Add code to collect and store each operand into the fields of line_info_t. Create very short assembly language files to test your code. These files will often have .ORIG, .END and a single additional LC3 instruction. Although you may be tempted to start with ADD, AND instructions, they are actually a little tricky. This is because you do not know from the name whether or not the third operand will be an immediate or a register. Only when you encounter the third operand will you know which form the writer used. Compare this to JMP, RET which use two different "names" for the two forms.

Implementing `asm_pass_one()`, phase 3

Next you will add code to keep track of the address of each instruction. Most instructions take only a single word of memory. The instructions .BLKW, .STRINGZ are the exceptions. And, .ORIG is handled a little differently. Ay any rate, your code should now put the address of each label in the symbol table. The symbol table file that you create can be compared to that produced by the regular LC3 assembler.

Implementing `asm_pass_one()`, phase 4

Now add code for error checking. Depending on how you write your helper routines, you may already have completed a substantial portion of the error checking. Create multiple simple error case files and test them with your assembler. The assembler.h file has all of the possible errors that you need to handle, and assembler.c has a method that prints errors.

Implementing `asm_pass_two()`

You should already have written the basic loop to traverse that data structure returned by asm_pass_one(). Instead of printing the source line, your code will now need to generate the machine code for each instruction. If you have completed the first pass correctly, then each line_info_t contains the information you need to generate the machine code.

The first step is to copy the prototype from the format field of the information on this line into a variable that will contain the final machine code. The next step is to insert the operand(s) into the correct locations of the machine code. For example, if the line was an ADD with an immediate value, then the fields DR, SR1, immediate are inserted into the appropriate bit locations of the macine code. Different instructions will have different number of operands. When the machine code is complete, write it to the object file using lc3_write_LC3_word(). When you encounter an instruction that uses a PCoffset, you will need to look up the address of the reference in the symbol table and compute the offset.

Note that the .BLKW and .STRINGZ opcodes may generate multiple words in the object file. If you are ever unsure what should be put in the output file, run ~cs270/lc3tools/lc3as -hex file.asm and look at the hex code that it generates.

Extending the assembler (optional)

An assembler is a big step up from machine code. When using an assembler, the programmer can write ADD R1,R2,R3 instead of 1283. As you know, LC-3 assembly language is very limited. In order to make LC-3 programming slightly simpler, we will introduce several pseudo instructions. Pseudo instructions are instructions that are recognized by the assembler but don't actually exist in the machine. For example, LC-3 does not actually support a HALT instruction, but we've been putting them in our assembly code. The assembler recognizes that HALT is not a real instruction and generates a TRAP x25. Note that pseudo instructions do not give programmers any additional power, because anything that can be done with a pseudo instruction can be done with 1 or more regular instructions. They just make programming easier.

Two existing psuedo instructions that generate multiple LC3 instructions are .BLKW and .STRINGZ. You will add a couple of others that will make programming a little simpler.

There are several things that you must do to extend the assembler.

Add a new entry to the opcode_t defined in lc3.h. This adds an "opcode" for the pseudo instructrion.
Add a new entry to lc3_instructions[] defined in lc3.c. This defines the number and types of operands for the pseudo instruction.
Add a new entry in lc3_instruction_map[] defined in util.c. This associates the "name" of the pseudo instruction with its "opcode".
Add code to asm_pass_one() analyze the operands and determine how many LC3 instructions this pseudo instruction will produce. This may not even require any additional code.
Add code to asm_pass_two() to generate the LC3 instructions correponding to the pseudo instruction and add them to the object file.

The `.SETR` pseudo instruction

The .SETR pseudo instruction allows the programmer to initialize a register with an immediate value. Thus, one can write .SETR R1,#100. The end result is that register R1 contains the value 100 and the condition code is set according to the value. The following table shows the two cases.

small value (.SETR DR,#5) large value (.SETR DR,#1000)


AND DR,DR,#0  ; clear the register
ADD DR,DR,val ; put val in register


          ; * represents current address
BR *+2    ; skip over value (code 0x0E01)
xxxx      ; encode large val here
LD DR,*-1 ; load from memory (prototype 0X21FE)
          ; encode DR in bits 11..9 of prototype

When the value is "small", you can take advantage of the immediate field in a ADD instruction and use only two LC3 instructions. If the immediate exceeds to size allowed in ADD, three instructions will be required.

The `.SUB` pseudo instruction

.SUB is just like ADD except that it performs subtraction. Like ADD, the third operand can be a register or an immediate. For reasons that will become clear soon, if the third operand is an immediate, it can only have a value from decimal -15 to 16 (versus -16 to 15 for ADD). If the third operand is an immediate, just use an ADD with the value negated.

When the third operand is a register, one can generate multiple instructions. Consider this example.

using `.SUB`	without using `.SUB`
`.SUB DR,SR1,SR2`	`NOT SR2,SR2 ; invert bits (part of two's comp) ADD DR,SR1,SR2 ; DR = SR1 + ~SR2 NOT SR2,SR2 ; restore original SR2 val ADD DR,DR,#1 ; two's comp - DR = SR1 + ~SR2 + 1`

Pretty neat how we performed the subtraction without an additional register, eh? But... There's a problem: if the first and third registers are the same register (.SUB R1,R2,R1), because the second NOT instruction will corrupt the result of the subtraction. In this case, don't emit the second NOT. And there's another problem: if the second and third registers are the same (.SUB R1,R2,R2); when you negate the third register you will corrupt the value of the second register (which is the same as the third). In this case, rather than the 4-instruction sequence above, we'll simply emit an instruction that puts 0 in the DR register (AND DR,DR,#0).

Checking in Your Code

Your submission includes the assembler.c and symbol.c files only. Do NOT change any other files! You will submit the archive file P10.tar using the checkin program and the P10> key, or the Checkin tab on the course website. The provided Makefile has a target to create the archive file containing both source files, as follows:

  $ make package

Grading Criteria

Detailed grading information for P10A is shown below, P10B grading information will follow shortly. Individual commands are surrounded by .ORIG and .END directives, so these must work. The contents of all test files are shown in the grading output on the Checkin tab.

Part A is worth 100 points, only asm_pass_one will be tested.

Test the following commands (50 points):
2 points - not.asm
3 points - addr.asm, addi.asm, andr.asm, andi.asm
3 points - jmp.asm, jsr.asm, jsrr.asm
3 points - ld.asm, ldi.asm, ldr.asm, lea.asm
3 points - st.asm, sti.asm, str.asm
3 points - ret.asm, trap.asm
Test the branch commands (10 points):
5 points - breasy.asm
5 points - brhard.asm
Test various TRAP commands (10 points):
2 points - IN.asm, OUT.asm, GETC.asm, GETS.asm, PUTS.asm
Test assembler directives (10 points):
2 points - orig.asm
4 points - fill.asm
4 points - blkw.asm
Test an assembly program (20 points):
10 points - code.asm (looks at code.asm output)
10 points - code.asm (looks at code.sym file)

Part B is worth 150 points, grading is based on .hex file.

Test the following commands (100 points):
4 points - not.asm
6 points - addr.asm, addi.asm, andr.asm, andi.asm
6 points - jmp.asm, jsr.asm, jsrr.asm
6 points - ld.asm, ldi.asm, ldr.asm, lea.asm
6 points - st.asm, sti.asm, str.asm
6 points - ret.asm, trap.asm
Test the branch commands (10 points):
5 points - breasy.asm
5 points - brhard.asm
Test assembler directives (10 points):
2 points - orig.asm
4 points - fill.asm
4 points - blkw.asm
Test an assembly program (10 points):
10 points - code.asm
Test error handling (20 points):
2 points - #define ERR_OPEN_READ "could not open '%s' for reading."
2 points - #define ERR_OPEN_WRITE "could not open '%s' for writing."
3 points - #define ERR_EXPECTED_REG "expected register (R0-R7), got '%s'"
4 points - #define ERR_DUPLICATE_LABEL "label '%s' previosly defined"
4 points - #define ERR_MISSING_LABEL "label '%s' never defined"
5 points - #define ERR_BAD_PCOFFSET "PCoffset to '%s' out of range"
Memory management (15 points, extra credit):
15 points - code.asm - check memory management using valgrind

The assembly files used for testing can be downloaded as an archive file here.

Programming Assignment P10 - LC-3 Assembler in C

P10A - Due Monday, Dec. 7 at 11:59pm, no late penalty. P10B - Due Saturday, Dec. 12 at 11:59pm, no late penalty.

Acknowledgement

Pair Programming

Goals of Assignment

Overview of an assembler

Getting Started

Implementing asm_init(), asm_term()

Stubbing in asm_pass_two()

Implementing asm_pass_one(), phase 1

Implementing asm_pass_one(), phase 2

Implementing asm_pass_one(), phase 3

Implementing asm_pass_one(), phase 4

Implementing asm_pass_two()