CS 270
cs270 Programming Assignment - LC3 Assembler

Essentials

Due: see Progress page for due date
Key: Use the key P8 for checkin

Acknowledgment

This assignment is patterned after this assignment by Milo Martin when he was at the University of Pennsylvania. Used with permission.


Goals of Assignment

In this assignment, you will complete an assembler for the LC3 assembly language by completing the file assembler.c. You will reuse code that you wrote in previous assignments, add new code and integrate with code provided to you. Some of that code is C source code. Other parts are just a library. You are given header files so that you know what functionality is provided and can call functions in the library even though you do not have the source code. This assignment serves several purposes:
  • Learn how to translate an assembly language to binary code
  • Learn how to do simple file I/O in C
  • Learn to decompose functionality into smaller pieces
  • Practice working with structures and pointers
  • Practice working on a larger project
  • Practice integrating your work into existing code

Overview of an assembler

An assembler is a program that translates assembly language statements into machine code. For example, it must translate ADD R1,R2,R3 to the hex code 1283. It is like a compiler except that the language it deals with is much simpler than a high-level language like Java or C. Several things make assembly language easier to deal with:
  • Every statement is on a single line of source code
  • There is at most one statement per source line
  • The syntax is very regular and is quite simple.
For the LC3 assembly language the syntax is:

    [optional label] opcode operand(s) [; optional end of line comment]
The assembler reads the source code a line at a time, analyzes each line and produces the output file(s) required to run the program:
  • an object file containing the code (.obj, .hex)
  • a symbol table file (.sym)
Because the assembly code may contain references to labels that have not yet been encountered (e.g. a branch to a location later in the code), the assembler normally makes two passes over the "code".

The first pass of the assembler must, at a minimum, do two things:

  1. Verify that each line has the correct syntax. This involves determining the LC3 opcode (if any) and checking that the operands are correct in number and type for that opcode. If a line is empty or contains only a comment, simply continue with the following line. For an actual code line, you must determine how much space the instruction will take. Most instructions take only one word, and in those cases the address is simply incremented. Some pseudo-ops update the address differently.
  2. Whenever a line contains a label, insert it and its address into the symbol table. This is required so that the PCoffset for the LD/ST/LDI/STI/BR/JSR/LEA opcodes can be computed in the second pass.
Additionally, the first pass may choose to store results from the syntax analysis for use by the second pass. This will make the second pass easier. It requires building a list of information about each instruction.

Alternatively, you can skip storing this information and only create the symbol table. Then, in the second pass, the source file is re-read and syntactic analysis performed again. At this point, there are no syntactic errors, because they would have been found in the first pass. This approach requires reading the source file twice.

The second pass of the assembler is responsible for generating the object code for the .asm file. The actual work depends on how the first pass was structured. It may:

  • scan a data structure created with the first pass, or
  • re-read the source file and reprocess each line.
In either case, it must generate the LC3 word(s) that are required for each instruction. This involves creating the correct 16-bit pattern(s) that define the instruction. When an LD/ST/LDI/STI/BR/JSR/LEA instruction is encountered, the code needs to compute the PCoffset, determine if it is in range and insert it into the bit pattern. Offsets out of range and references to undefined labels are reported; they are the only errors generated during the second pass.
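
The offset computation described above can be sketched as follows. This is only an illustration: the name pc_offset_field and its parameters are hypothetical and are not part of the provided headers. It assumes the offset is measured from the already incremented PC (instruction address + 1) and that the field is 9 bits wide for LD/ST/LDI/STI/BR/LEA and 11 bits wide for JSR, as in the LC3 instruction formats.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not in the provided library).
 * instr_addr  - address of the instruction that contains the reference
 * target_addr - address of the label, as looked up in the symbol table
 * bits        - width of the offset field (assumed 9, or 11 for JSR)
 * Returns true and stores the masked field in *field when the offset fits. */
bool pc_offset_field(int instr_addr, int target_addr, int bits, uint16_t *field)
{
    /* The PC has already been incremented when the offset is applied. */
    int offset = target_addr - (instr_addr + 1);
    int min = -(1 << (bits - 1));
    int max = (1 << (bits - 1)) - 1;

    if (offset < min || offset > max)
        return false;   /* the caller would report "offset out of range" */

    /* Keep only the low 'bits' bits so negative offsets fit the field. */
    *field = (uint16_t)(offset & ((1 << bits) - 1));
    return true;
}
```

For example, a reference at address x3005 to a label at x3000 gives an offset of -6, which is stored in a 9-bit field as x1FA.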


Getting Started

Part of this assignment involves using the symbol table module you wrote in an earlier assignment. You will be provided with a library containing the code. However, you will need to remember how to use that module in conjunction with this project.
  1. Create a directory for this assignment and cd there
  2. Download the file
  3. Unpack the lc3asm.XXX.tar file. WARNING: this tar ball will spew files into the current directory. It does not unpack into a subdirectory.
    
        tar -xvpf lc3asm.XXX.tar
  4. Review the Makefile and make sure the variable GCC is appropriate for your C compiler.
  5. Build the executable by typing make. There should not be any errors or warnings. If you get an error that looks like this:

        /usr/bin/ld: lc3as.a(util.o): relocation R_X86_64_32 against symbol `lc3_instruction_map' can not be used when making a PIE object; recompile with -fPIC

    first change the permissions on the Makefile so you can edit it:

        chmod u+w Makefile

    Then add the flag -no-pie to the LD_FLAGS in the Makefile.

You are now ready to begin implementation. However, before writing ANY code, read through the util.h, tokens.h, symbol.h, lc3.h and assembler.h documentation to get a feel for the functionality that is provided for you. Then, on paper, make yourself a map of all the components in this project, what functionality each provides, and how they relate to one another. This will make writing the remaining code much easier.

For this project you only need to implement 4 functions and potentially modify one additional function in assembler.c.

Once you have a feel for the code and the project, a good place to start is implementing the asm_pass_one() function as described below and in assembler.h.


Implementing asm_init(), asm_term()

Figure out what global variables (if any) and modules need to be initialized. Add code to do so. As you work on your assembler, you may need to add more code to this function. Similarly, figure out what must be cleaned up, and add code to asm_term(). This is probably not the function to start with as you probably won't know exactly what to initialize without working on asm_pass_one.


Implementing asm_pass_one(), phase 1

Study the documentation for this routine, which can be found in assembler.h. You may translate the outline given there directly into code. When you encounter a label, add it to the symbol table with an address of 0. For initial testing, you might skip steps 5.2 and 5.4. Simply save the source line in the field reference using the function strdup(). Each time you read a line that contains code, create a line_info_t, link it into the list and print it using asm_print_line_info() if the variable printPass1 is non-zero. It is set to a non-zero value by running the assembler with the -pass1 flag. To test your code, make the assembler and run it with one or more small assembly files. The name of your assembler is mylc3as. What you should get is two things:
  • a symbol table file (.sym) with a header, and symbols. The addresses will all be 0.
  • a list of the lines of the source file (with line numbers). The only lines that you should see are those that contain a valid opcode. This list will be printed if you run mylc3as -pass1
To understand how to convert a line into tokens, make testTokens, and run this program. Study the source of testTokens.c to see how to do it. Understand what happens with blank lines and lines that contain only comments.

You can see from step 5.3 that this assembler is building a data structure that will be re-used in the second pass. This will let you practice your C dynamic memory management skills.
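
As a rough illustration of the list building and cleanup involved, here is a minimal sketch. The type demo_line and the functions copy_line, demo_append and demo_free_all are hypothetical stand-ins; the real line_info_t and its fields are defined in assembler.h and will differ. The helper copy_line does what strdup() does and is written out only so that the sketch depends on nothing beyond standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for line_info_t; the real fields are in assembler.h. */
typedef struct demo_line {
    int               line_number;
    char             *source;   /* heap copy of the source line */
    struct demo_line *next;
} demo_line;

/* Same effect as strdup(), using standard C calls only. */
char *copy_line(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}

/* Append a node at the tail so the list keeps source order.
 * Returns the new node, or NULL if an allocation failed. */
demo_line *demo_append(demo_line **head, demo_line **tail,
                       int line_number, const char *text)
{
    demo_line *node = calloc(1, sizeof *node);
    if (node == NULL)
        return NULL;
    node->source = copy_line(text);
    if (node->source == NULL) {
        free(node);
        return NULL;
    }
    node->line_number = line_number;
    if (*tail != NULL)
        (*tail)->next = node;
    else
        *head = node;
    *tail = node;
    return node;
}

/* Release every node and the string it owns. */
void demo_free_all(demo_line *head)
{
    while (head != NULL) {
        demo_line *next = head->next;
        free(head->source);
        free(head);
        head = next;
    }
}
```

Keeping a tail pointer makes each append constant time and preserves source order, which is the order the second pass needs. Every string copied in the first pass must eventually be freed, which is the kind of cleanup that belongs in asm_term().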


Implementing asm_pass_one(), phase 2

You are now ready to add a bit more. Specifically, you will be verifying the syntax of each line. Make the executable seeLC3 and run it. Try different LC3 opcodes and see what the program prints. Then study the code in seeLC3.c to understand how it works. Use of the code and ideas presented in this file is optional. You may find it easier to write your own code for verifying the syntax.

You now have a model of how to determine the type(s) of operands expected by any LC3 instruction. Note that many LC3 instructions have common operands. For example, many instructions require one or more registers. Therefore, it may be appropriate to write a helper method that converts a token to a register, and reports an error if the token is not a register. Similarly, it may be useful to have a helper method that gets immediate values. Immediates are used in a variety of instructions. Those instructions differ in the number of bits used to store the value, and some values are signed, while others are unsigned. A helper method can take care of all these cases.
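
Two such helpers might look like the sketch below. The names demo_parse_register and demo_fits are hypothetical, and the sketch assumes registers are written R0 through R7 in either case. It does not use the provided tokens or util modules, which may already offer similar functionality.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return 0..7 for "R0".."R7" (upper or lower case),
 * or -1 if the token is not a register. */
int demo_parse_register(const char *token)
{
    if (token == NULL)
        return -1;
    if ((token[0] != 'R' && token[0] != 'r') ||
        token[1] < '0' || token[1] > '7' || token[2] != '\0')
        return -1;
    return token[1] - '0';
}

/* Hypothetical helper: does value fit in a field of 'bits' bits?
 * Signed fields hold -2^(bits-1) .. 2^(bits-1)-1,
 * unsigned fields hold 0 .. 2^bits-1. */
bool demo_fits(int value, int bits, bool is_signed)
{
    int min = is_signed ? -(1 << (bits - 1)) : 0;
    int max = is_signed ? (1 << (bits - 1)) - 1 : (1 << bits) - 1;
    return value >= min && value <= max;
}
```

With these, the 5-bit signed immediate of ADD is checked with demo_fits(value, 5, true), and an 8-bit unsigned trap vector with demo_fits(value, 8, false). One function covers every field width and both signedness cases.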

Add code to collect and store each operand into the fields of line_info_t. Create very short assembly language files to test your code. These files will often have .ORIG, .END and a single additional LC3 instruction. Although you may be tempted to start with the ADD/AND instructions, they are actually a little tricky, because you do not know from the name whether the third operand will be an immediate or a register. Only when you encounter the third operand will you know which form the writer used. Compare this to JMP and RET, which use two different "names" for the two forms.


Implementing asm_pass_one(), phase 3

Next you will add code to keep track of the address of each instruction. Most instructions take only a single word of memory. The instructions .BLKW/.STRINGZ are the exceptions. Also, .ORIG is handled a little differently. At any rate, your code should now put the address of each label in the symbol table. The symbol table file that you create can be compared to that produced by the regular LC3 assembler.
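
The address bookkeeping can be sketched as below. The enum demo_kind and the function demo_next_address are hypothetical; the real assembler identifies operations with opcode_t from lc3.h. The sketch assumes that .ORIG sets the address to its operand, that .BLKW reserves as many words as its operand says, and that .STRINGZ stores one word per character plus a terminating zero word, with escape sequences already processed.

```c
#include <string.h>

/* Hypothetical operation kinds, for illustration only. */
typedef enum {
    DEMO_ORIG,      /* .ORIG address                     */
    DEMO_BLKW,      /* .BLKW count                       */
    DEMO_STRINGZ,   /* .STRINGZ "text"                   */
    DEMO_ONE_WORD   /* any line that occupies one word   */
} demo_kind;

/* Return the address of the line that follows the current one.
 * value - numeric operand of .ORIG or .BLKW (ignored otherwise)
 * text  - string operand of .STRINGZ without quotes (ignored otherwise) */
int demo_next_address(int addr, demo_kind kind, int value, const char *text)
{
    switch (kind) {
    case DEMO_ORIG:
        return value;                         /* start of the program */
    case DEMO_BLKW:
        return addr + value;                  /* reserve 'value' words */
    case DEMO_STRINGZ:
        return addr + (int)strlen(text) + 1;  /* characters plus the 0 word */
    default:
        return addr + 1;
    }
}
```

A label on a line receives the address in effect before that line is counted, so the symbol is recorded first and the address is advanced afterwards.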


Implementing asm_pass_one(), phase 4

Now add code for error checking. Depending on how you write your helper routines, you may already have completed a substantial portion of the error checking. Create multiple simple error case files and test them with your assembler.


Implementing asm_pass_two()

You should first write the basic loop to traverse the data structure returned by asm_pass_one(). Simply print the source line. Once you have verified that your basic loop is correct, extend your code to generate the machine code for each instruction. If you have completed the first pass correctly, then each line_info_t contains the information you need to generate the machine code.

The first step is to copy the prototype from the format field of the information on this line into a variable that will contain the final machine code. The next step is to insert the operand(s) into the correct locations of the machine code. For example, if the line was an ADD with an immediate value, then the fields DR, SR1 and the immediate are inserted into the appropriate bit locations of the machine code. Different instructions have different numbers of operands. When the machine code is complete, write it to the object file using lc3_write_LC3_word(). When you encounter an instruction that uses a PCoffset, you will need to look up the address of the reference in the symbol table and compute the offset.
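
To make the bit insertion concrete, here is a small sketch for the two forms of ADD. The function names are hypothetical, and the prototypes x1000 and x1020 are written as literals here, whereas the real code would take them from the format field. The bit positions follow the LC3 ADD format: DR in bits 11-9, SR1 in bits 8-6, and either SR2 in bits 2-0 or a 5-bit immediate in bits 4-0 with bit 5 set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for ADD DR,SR1,SR2 (0001 DR SR1 0 00 SR2). */
uint16_t demo_encode_add_reg(int dr, int sr1, int sr2)
{
    assert(dr >= 0 && dr < 8 && sr1 >= 0 && sr1 < 8 && sr2 >= 0 && sr2 < 8);
    unsigned word = 0x1000u;              /* prototype: opcode only */
    word |= (unsigned)dr  << 9;           /* DR  into bits 11-9 */
    word |= (unsigned)sr1 << 6;           /* SR1 into bits 8-6  */
    word |= (unsigned)sr2;                /* SR2 into bits 2-0  */
    return (uint16_t)word;
}

/* Hypothetical encoder for ADD DR,SR1,imm5 (0001 DR SR1 1 imm5). */
uint16_t demo_encode_add_imm(int dr, int sr1, int imm5)
{
    assert(dr >= 0 && dr < 8 && sr1 >= 0 && sr1 < 8);
    assert(imm5 >= -16 && imm5 <= 15);
    unsigned word = 0x1020u;              /* prototype: opcode plus bit 5 */
    word |= (unsigned)dr  << 9;
    word |= (unsigned)sr1 << 6;
    word |= (unsigned)imm5 & 0x1Fu;       /* keep the low 5 bits of the immediate */
    return (uint16_t)word;
}
```

demo_encode_add_reg(1, 2, 3) yields x1283, matching the ADD R1,R2,R3 example from the overview. Masking the immediate matters for negative values: without it, the sign bits of -1 would overwrite the rest of the word.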

Note that the .BLKW, .STRINGZ opcodes may generate multiple words in the object file. If you are ever unsure what should be put in the output file, run ~cs270/lc3tools/lc3as -hex file.asm and look at the hex code that it generates (file file.hex).


Extending the assembler (optional)

An assembler is a big step up from machine code. When using an assembler, the programmer can write ADD R1,R2,R3 instead of 1283. As you know, LC-3 assembly language is very limited. In order to make LC-3 programming slightly simpler, we will introduce several pseudo instructions. Pseudo instructions are instructions that are recognized by the assembler but don't actually exist in the machine. For example, LC-3 does not actually support a HALT instruction, but we've been putting them in our assembly code. The assembler recognizes that HALT is not a real instruction and generates a TRAP x25. Note that pseudo instructions do not give programmers any additional power, because anything that can be done with a pseudo instruction can be done with 1 or more regular instructions. They just make programming easier.

Two existing pseudo instructions that can generate multiple LC3 words are .BLKW and .STRINGZ. You will add a couple of others that will make programming a little simpler.

There are several things that you must do to extend the assembler.

  1. Add a new entry to the opcode_t defined in lc3.h. This adds an "opcode" for the pseudo instruction.
  2. Add a new entry to lc3_instructions[] defined in lc3.c. This defines the number and types of operands for the pseudo instruction.
  3. Add a new entry in lc3_instruction_map[] defined in util.c. This associates the "name" of the pseudo instruction with its "opcode".
  4. Add code to asm_pass_one() to analyze the operands and determine how many LC3 instructions this pseudo instruction will produce. This may not even require any additional code.
  5. Add code to asm_pass_two() to generate the LC3 instructions corresponding to the pseudo instruction and add them to the object file.

NOTE: For the Fall 19 project, steps 1, 2, and 3 have been done for you.

The .SUB pseudo instruction [Optional]

.SUB is just like ADD except that it performs subtraction. Like ADD, the third operand can be a register or an immediate. For reasons that will become clear soon, if the third operand is an immediate, it can only have a value from decimal -15 to 16 (versus -16 to 15 for ADD). If the third operand is an immediate, just use an ADD with the value negated.
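
The immediate form can be sketched as follows. The function demo_sub_imm is hypothetical. It assumes the ADD immediate format described in the LC3 instruction set (opcode 0001, bit 5 set, 5-bit signed immediate in bits 4-0).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: encode .SUB DR,SR1,#imm as ADD DR,SR1,#(-imm).
 * Returns false if imm is outside -15..16, because the negated value
 * would then not fit the 5-bit signed field of ADD (-16..15). */
bool demo_sub_imm(int dr, int sr1, int imm, uint16_t *word)
{
    if (imm < -15 || imm > 16)
        return false;

    int negated = -imm;                   /* now within -16..15 */
    unsigned w = 0x1020u;                 /* ADD, immediate form */
    w |= ((unsigned)dr  & 0x7u) << 9;
    w |= ((unsigned)sr1 & 0x7u) << 6;
    w |= (unsigned)negated & 0x1Fu;
    *word = (uint16_t)w;
    return true;
}
```

This shows why the range shifts by one: .SUB R1,R2,#16 becomes ADD R1,R2,#-16, which fits, while .SUB R1,R2,#-16 would need ADD R1,R2,#16, which does not.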

When the third operand is a register, one can generate multiple instructions. Consider this example.

Using .SUB:

    .SUB DR,SR1,SR2

Without using .SUB:

    NOT SR2,SR2    ; invert bits (part of two's comp)
    ADD DR,SR1,SR2 ; DR = SR1 + ~SR2
    NOT SR2,SR2    ; restore original SR2 val
    ADD DR,DR,#1   ; two's comp - DR = SR1 + ~SR2 + 1

Pretty neat how we performed the subtraction without an additional register, eh? But there's a problem when the first and third registers are the same register (.SUB R1,R2,R1): the second NOT instruction will corrupt the result of the subtraction. In this case, don't emit the second NOT. And there's another problem when the second and third registers are the same (.SUB R1,R2,R2): when you invert the third register you will corrupt the value of the second register (which is the same as the third). In this case, rather than the 4-instruction sequence above, we'll simply emit an instruction that puts 0 in the DR register (AND DR,DR,#0).
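
The three register cases can be sketched as one expansion routine. All names here are hypothetical, and the encoders are written out by hand from the LC3 formats for NOT, ADD and AND rather than taken from the provided library. The return value is the number of words produced, which is also what the first pass needs in order to advance the address.

```c
#include <stdint.h>

/* Hand-written encoders for the three instructions used (illustration only). */
static uint16_t demo_not(int dr, int sr)
{
    return (uint16_t)(0x903Fu | ((unsigned)dr << 9) | ((unsigned)sr << 6));
}

static uint16_t demo_add_reg(int dr, int sr1, int sr2)
{
    return (uint16_t)(0x1000u | ((unsigned)dr << 9) | ((unsigned)sr1 << 6) | (unsigned)sr2);
}

static uint16_t demo_imm_form(unsigned opcode_bits, int dr, int sr1, int imm5)
{
    return (uint16_t)(opcode_bits | 0x0020u | ((unsigned)dr << 9) |
                      ((unsigned)sr1 << 6) | ((unsigned)imm5 & 0x1Fu));
}

/* Expand .SUB DR,SR1,SR2 into out[]; returns the number of words (1, 3 or 4). */
int demo_expand_sub(int dr, int sr1, int sr2, uint16_t out[4])
{
    int n = 0;

    if (sr1 == sr2) {                       /* x - x is always 0 */
        out[n++] = demo_imm_form(0x5000u, dr, dr, 0);   /* AND DR,DR,#0 */
        return n;
    }

    out[n++] = demo_not(sr2, sr2);          /* SR2 = ~SR2 */
    out[n++] = demo_add_reg(dr, sr1, sr2);  /* DR = SR1 + ~SR2 */
    if (dr != sr2)
        out[n++] = demo_not(sr2, sr2);      /* restore SR2 unless it now holds the result */
    out[n++] = demo_imm_form(0x1000u, dr, dr, 1);       /* ADD DR,DR,#1 */
    return n;
}
```

Checking the SR1 == SR2 case first also covers .SUB R1,R1,R1, where all three registers coincide.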

Milo Martin


Checking in Your Code

You will submit the single file assembler.c using the checkin program. Use the key P8. Or use the checkin tab of the course web page.