assembler.c. You will reuse code
that you wrote in previous assignments, add
new code, and integrate it with code provided to you. Some of that code is C
source code; other parts are just a library. You are given header files
so that you know what functionality is provided and can call functions in
the library even though you do not have the source code. This assignment
serves several purposes:
ADD R1,R2,R3 to the hex code 1283.
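To check the example above, the sketch below packs the fields of the register form of ADD by hand. The function name encode_add_reg is hypothetical and not part of the provided code. It only shows where each field lands in the 16-bit word.

```c
#include <stdint.h>

/* Hypothetical helper: encode the register form of ADD DR,SR1,SR2.
 * Layout: opcode 0001 in bits 15..12, DR in 11..9, SR1 in 8..6,
 * bit 5 = 0 (register form), bits 4..3 = 00, SR2 in bits 2..0. */
uint16_t encode_add_reg(unsigned dr, unsigned sr1, unsigned sr2) {
    return (uint16_t)((0x1u << 12) |          /* opcode ADD */
                      ((dr & 0x7u) << 9) |    /* destination register */
                      ((sr1 & 0x7u) << 6) |   /* first source register */
                      (sr2 & 0x7u));          /* second source register */
}
```

For ADD R1,R2,R3 the pieces are 0x1000, 0x0200, 0x0080 and 0x0003, which sum to 0x1283.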
It is like a compiler except that the language it deals with is much simpler
than a high level language like Java or C. Several things make assembly language
easier to deal with:
[optional label] opcode operand(s) [; optional end of line comment]
The assembler reads the source code a line at a time, analyzes each line and
produces the output file(s) required to run the program.
.obj, .hex
.sym
The first pass of the assembler must, at a minimum, do two things:
LD/ST/LDI/STI/BR/JSR/LEA opcodes can be computed in the
second pass.
Alternatively, you can skip storing this information and only create the symbol table. Then, in the second pass, the source file is re-read and syntactic analysis is performed again. At this point, there are no syntax errors, because they would have been found in the first pass. This approach requires reading the source file twice.
The second pass of the assembler is responsible for generating the object code
for the .asm file. The actual work depends on how
the first pass was structured. It may:
LD/ST/LDI/STI/BR/JSR/LEA instruction is encountered, the code needs to compute the PCoffset, determine
if it is in range, and insert it into the bit pattern. Offsets out of range
and references to undefined labels are reported; they are the only errors
generated during the second pass.
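Here is a minimal sketch of the PCoffset computation and range check, assuming the label's address has already been looked up in the symbol table. The name pc_offset is hypothetical. It relies on the LC3 rule that the offset is added to the incremented PC, so the base is the instruction's address plus one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the PCoffset from an instruction at
 * 'inst_addr' to a label at 'label_addr'. The PC has already been
 * incremented when the offset is applied, so the base is inst_addr + 1.
 * 'width' is the field size: 9 for LD/ST/LDI/STI/BR/LEA, 11 for JSR.
 * Returns false if the offset does not fit in a signed 'width'-bit field
 * (the caller reports the error); otherwise stores the masked bits. */
bool pc_offset(int inst_addr, int label_addr, int width, uint16_t *field) {
    int offset = label_addr - (inst_addr + 1);
    int min = -(1 << (width - 1));
    int max = (1 << (width - 1)) - 1;
    if (offset < min || offset > max)
        return false;
    /* Keep only the low 'width' bits (two's complement for negatives). */
    *field = (uint16_t)((unsigned)offset & ((1u << width) - 1u));
    return true;
}
```

For example, an LD at x3000 that references a label at x3005 gets a field value of 4. A label 511 words ahead is out of range for a 9-bit field but fits JSR's 11-bit field.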
symbol.c, util.c you wrote
in previous assignments. You may need to fix any remaining bugs in those files
to complete this assignment.
cd
there lc3asm.XXX.tar file. WARNING: this tarball will spew files into the
current directory. It does not unpack into a subdirectory.
tar -xvpf lc3asm.XXX.tar
symbol.c, util.c by your versions.
Makefile and make sure the variable GCC is appropriate for your C compiler.
make. There should not be any errors or warnings.
cs314.
asm_init(), asm_term()
asm_term().
asm_pass_one(), phase 1
reference using the function strdup().
Each time you read a line that contains code, create a line_info_t,
link it into the list, and print it using asm_print_line_info() if
the variable printPass1 is non-zero. It is set to a non-zero
value by running the assembler with the -pass1 flag.
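One way to keep a private copy of each source line while building the list is sketched below. It uses a simplified, hypothetical node type src_line_t rather than the real line_info_t, whose fields are defined in the provided header. The copy made with strdup() matters because the buffer used to read the file is overwritten by the next line.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for line_info_t: only the fields
 * needed to show the list handling. The real struct has more fields. */
typedef struct src_line {
    struct src_line *next;
    int line_num;
    char *text;            /* private copy made with strdup() */
} src_line_t;

/* Append a copy of 'text' to the list described by *head / *tail.
 * Returns the new node, or NULL if memory runs out. */
src_line_t *append_line(src_line_t **head, src_line_t **tail,
                        int line_num, const char *text) {
    src_line_t *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->text = strdup(text);        /* survives reuse of the read buffer */
    if (node->text == NULL) {
        free(node);
        return NULL;
    }
    node->line_num = line_num;
    node->next = NULL;
    if (*tail == NULL)
        *head = node;                 /* first node in the list */
    else
        (*tail)->next = node;         /* link after the current last node */
    *tail = node;
    return node;
}

/* Free every node and its copied text once the assembler is done. */
void free_lines(src_line_t *head) {
    while (head != NULL) {
        src_line_t *next = head->next;
        free(head->text);
        free(head);
        head = next;
    }
}
```

Keeping a tail pointer makes each append O(1) and preserves source order, which the second pass relies on when it walks the list.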
To test your code, make the assembler and run it with one or more small
assembly files. The name of your assembler is mylc3as. What you
should get is two things:
.sym with a header, and symbols. The addresses will all be 0.
mylc3as -pass1
make testTokens, and run this program. Study the source of testTokens.c
to see how to do it. Understand what happens with blank lines and lines
that contain only comments.
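The provided testTokens.c shows how the real tokenizer is used. The sketch below is an independent, simplified tokenizer (the name split_tokens is hypothetical) that makes the blank-line and comment-only cases concrete. Both yield zero tokens, so such lines can be skipped without creating a line_info_t.

```c
#include <ctype.h>
#include <string.h>

/* A simplified tokenizer sketch, not the provided one. It modifies 'line'
 * in place: everything from ';' onward is discarded, then tokens are
 * split on whitespace and commas. Returns the number of tokens stored in
 * tokens[] (at most max). Blank and comment-only lines yield 0 tokens.
 * Limitation: quoted .STRINGZ operands containing spaces, commas or ';'
 * are not handled. */
int split_tokens(char *line, char *tokens[], int max) {
    char *semi = strchr(line, ';');
    if (semi != NULL)
        *semi = '\0';                      /* drop end-of-line comment */
    int n = 0;
    char *p = line;
    while (*p != '\0' && n < max) {
        /* Skip separators before the next token. */
        while (*p != '\0' && (isspace((unsigned char)*p) || *p == ','))
            p++;
        if (*p == '\0')
            break;
        tokens[n++] = p;
        /* Advance to the end of the token and terminate it. */
        while (*p != '\0' && !isspace((unsigned char)*p) && *p != ',')
            p++;
        if (*p != '\0')
            *p++ = '\0';
    }
    return n;
}
```

For LOOP ADD R1,R2,#-1 ; decrement the tokens are LOOP, ADD, R1, R2 and #-1.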
You can see from step 5.3 that this assembler is building a data structure that will be re-used in the second pass. This will let you practice your C dynamic memory management skills.
asm_pass_one(), phase 2
seeLC3 and run it. Try different LC3 opcodes and see what the program
prints. Then study the code in seeLC3.c to understand how it works. Use of
the code and ideas presented in this file is optional. You may find it
easier to write your own code for verifying the syntax.
You now have a model of how to determine the type(s) of operands expected by any LC3 instruction. Note that many LC3 instructions have common operands. For example, many instructions require one or more registers. Therefore, it may be appropriate to write a helper function that converts a token to a register and reports an error if the token is not a register. Similarly, it may be useful to have a helper function that gets immediate values. Immediates are used in a variety of instructions. Those instructions differ in the number of bits used to store the value, and some values are signed while others are unsigned. A single helper can handle all these cases if it takes the field width and signedness as parameters.
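Here is a sketch of the two helpers described above. It assumes registers are written R0 to R7 and immediates as #decimal or xhex. The names get_register and get_immediate are hypothetical. The field width and a signed/unsigned flag are passed in, so one function covers imm5, offset6, trapvect8 and similar fields.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: convert "R0".."R7" (either case) to 0..7.
 * Returns -1 if the token is not a register; the caller reports it. */
int get_register(const char *tok) {
    if ((tok[0] == 'R' || tok[0] == 'r') &&
        tok[1] >= '0' && tok[1] <= '7' && tok[2] == '\0')
        return tok[1] - '0';
    return -1;
}

/* Hypothetical helper: parse an immediate written as #decimal or xhex and
 * check that it fits in a 'width'-bit field, signed or unsigned.
 * Returns false on a malformed token or an out-of-range value. */
bool get_immediate(const char *tok, int width, bool is_signed, long *value) {
    int base;
    if (tok[0] == '#')
        base = 10;
    else if (tok[0] == 'x' || tok[0] == 'X')
        base = 16;
    else
        return false;
    if (tok[1] == '\0')
        return false;                 /* "#" or "x" with no digits */
    char *end;
    long v = strtol(tok + 1, &end, base);
    if (*end != '\0')
        return false;                 /* trailing garbage, e.g. "#12abc" */
    long min = is_signed ? -(1L << (width - 1)) : 0;
    long max = is_signed ? (1L << (width - 1)) - 1 : (1L << width) - 1;
    if (v < min || v > max)
        return false;
    *value = v;
    return true;
}
```

For instance, imm5 in ADD/AND would be checked with get_immediate(tok, 5, true, &v), and a trap vector with get_immediate(tok, 8, false, &v).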
Add code to collect and store each operand into the fields of
line_info_t. Create very short assembly language files to test
your code. These files will often have .ORIG, .END and a single
additional LC3 instruction. Although you may be tempted to start with
ADD/AND instructions, they are actually a little tricky. This
is because you do not know from the name whether the third operand will
be an immediate or a register. Only when you encounter the third operand will
you know which form the writer used. Compare this to JMP and RET,
which use two different "names" for the two forms.
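Below is a minimal sketch of resolving the ADD/AND ambiguity, in which the third operand alone decides the form. The function name add_and_low_bits is hypothetical. To keep the sketch short, it recognizes only #decimal immediates.

```c
#include <stdlib.h>

/* Hypothetical sketch: ADD and AND use one name for two encodings, so the
 * form is known only after looking at the third operand.
 * Returns bits 5..0 of the instruction, or -1 if the operand is invalid:
 *   register form:  bit 5 = 0, SR2 in bits 2..0
 *   immediate form: bit 5 = 1, imm5 (two's complement) in bits 4..0 */
int add_and_low_bits(const char *op3) {
    if ((op3[0] == 'R' || op3[0] == 'r') &&
        op3[1] >= '0' && op3[1] <= '7' && op3[2] == '\0')
        return op3[1] - '0';                 /* register: bit 5 stays 0 */
    if (op3[0] != '#' || op3[1] == '\0')
        return -1;                           /* neither register nor #imm */
    char *end;
    long v = strtol(op3 + 1, &end, 10);
    if (*end != '\0' || v < -16 || v > 15)
        return -1;                           /* malformed or out of imm5 range */
    return 0x20 | (int)(v & 0x1F);           /* set bit 5, keep low 5 bits */
}
```

ORing the result with the opcode, DR and SR1 bits gives the complete instruction. For example, 0x1280 | add_and_low_bits("R3") is 0x1283, which is ADD R1,R2,R3.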
asm_pass_one(), phase 3
.BLKW/.STRINGZ are the exceptions. Also, .ORIG is
handled a little differently. At any rate, your code should now put the
address of each label in the symbol table. The symbol table file that you
create can be compared to that produced by the regular LC3 assembler.
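The address bookkeeping of phase 3 can be sketched as a location counter that each line advances by the number of words it occupies. The function words_for_line is hypothetical. It assumes the opcode text has already been converted to upper case and ignores .STRINGZ escape sequences.

```c
#include <string.h>

/* Hypothetical sketch: how many words a line occupies in memory.
 * Every LC3 instruction and .FILL occupies one word; .BLKW n reserves
 * n words; .STRINGZ "text" uses one word per character plus a zero word.
 * .ORIG sets the counter instead of advancing it, and .END occupies
 * nothing. 'count' is the .BLKW operand; 'str' is the .STRINGZ text with
 * the quotes removed (only examined for .STRINGZ). */
int words_for_line(const char *opcode, int count, const char *str) {
    if (strcmp(opcode, ".BLKW") == 0)
        return count;
    if (strcmp(opcode, ".STRINGZ") == 0)
        return (int)strlen(str) + 1;         /* characters + terminator */
    if (strcmp(opcode, ".ORIG") == 0 || strcmp(opcode, ".END") == 0)
        return 0;
    return 1;                                /* instruction or .FILL */
}
```

A label's address is the value of the counter before the line's words are added: record that value in the symbol table, then add words_for_line(...) to the counter. A .ORIG line sets the counter to its operand instead.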
asm_pass_one(), phase 4
asm_pass_two()
asm_pass_one(). Simply print the source line.
Once you have verified that your basic loop is correct,
your code will need to generate the machine code for each instruction. If
you have completed the first pass correctly, then each line_info_t
contains the information you need to generate the machine code.
The first step is to copy the prototype from the format field of the
information on this line into a variable that will contain the final machine
code. The next step is to insert the operand(s) into the correct locations of
the machine code. For example, if the line was an ADD with an immediate
value, then the fields DR, SR1 and immediate are inserted into the
appropriate bit locations of the machine code. Different instructions have
different numbers of operands. When the machine code is complete, write it to
the object file using lc3_write_LC3_word(). When you encounter an instruction
that uses a PCoffset, you will need to look up the address of the referenced
label in the symbol table and compute the offset.
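The copy-then-insert approach can be sketched with a small field-insertion helper. Both names are hypothetical. The value 0x1020 stands in for whatever the format field holds for the immediate form of ADD (opcode 0001 with bit 5 set). The masking keeps a negative immediate from overwriting neighbouring fields.

```c
#include <stdint.h>

/* Hypothetical helper: place 'value' in the 'width'-bit field whose lowest
 * bit is 'shift'. Masking keeps negative immediates and offsets from
 * spilling into neighbouring fields. */
uint16_t set_field(uint16_t word, int value, int shift, int width) {
    uint16_t mask = (uint16_t)(((1u << width) - 1u) << shift);
    return (uint16_t)((word & ~mask) | (((unsigned)value << shift) & mask));
}

/* Example: ADD DR,SR1,#imm5. The prototype 0x1020 already holds the
 * opcode (0001) and the immediate-form flag (bit 5); the operands are
 * then inserted one field at a time. */
uint16_t encode_add_imm(int dr, int sr1, int imm5) {
    uint16_t word = 0x1020;               /* copied prototype (assumed) */
    word = set_field(word, dr, 9, 3);     /* DR   in bits 11..9 */
    word = set_field(word, sr1, 6, 3);    /* SR1  in bits 8..6  */
    word = set_field(word, imm5, 0, 5);   /* imm5 in bits 4..0  */
    return word;
}
```

ADD R1,R2,#-3 encodes as 0x12BD. The same set_field call with shift 0 and width 9 or 11 inserts a PCoffset.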
Note that the .BLKW and .STRINGZ opcodes may generate multiple
words in the object file. If you are ever unsure what should be put in the
output file, run ~cs270/lc3tools/lc3as -hex file.asm and look at
the hex code that it generates (file file.hex).
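.BLKW and .STRINGZ can be handled by loops that emit one word at a time. In this sketch the output routine is passed in as a function pointer (emit_fn, a hypothetical type) instead of calling lc3_write_LC3_word() directly, because the exact signature of that function comes from the provided header. The zero fill used for .BLKW is an assumption worth confirming against the lc3as -hex output.

```c
#include <stdint.h>

/* Hypothetical output callback; in the assignment the words would go to
 * the object file through lc3_write_LC3_word(). */
typedef void (*emit_fn)(uint16_t word, void *ctx);

/* .BLKW n reserves n words; this sketch fills them with zero. */
void emit_blkw(int n, emit_fn emit, void *ctx) {
    for (int i = 0; i < n; i++)
        emit(0, ctx);
}

/* .STRINGZ writes one word per character, then a zero terminator.
 * 'text' is the string with quotes removed and escapes already decoded. */
void emit_stringz(const char *text, emit_fn emit, void *ctx) {
    for (const unsigned char *p = (const unsigned char *)text; *p; p++)
        emit((uint16_t)*p, ctx);
    emit(0, ctx);
}

/* Example sink that records words in an array, handy for testing. */
typedef struct { uint16_t words[64]; int n; } word_buf_t;

void collect_word(uint16_t word, void *ctx) {
    word_buf_t *b = ctx;
    if (b->n < 64)
        b->words[b->n++] = word;
}
```

Passing the sink as a parameter lets the same loops be tested in isolation and then wired to the real object-file writer.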
ADD R1,R2,R3 instead of 1283.
As you know, LC-3 assembly language is very limited. In order to make
LC-3 programming slightly simpler, we will introduce several pseudo instructions.
Pseudo instructions are instructions that are recognized by the assembler but
don't actually exist in the machine. For example, LC-3 does not actually support
a HALT instruction, but we've been putting them in our assembly
code. The assembler recognizes that HALT is not a real instruction
and generates a TRAP x25. Note that pseudo instructions do not
give programmers any additional power, because anything that can be done with a
pseudo instruction can be done with 1 or more regular instructions. They just
make programming easier.
Two existing pseudo instructions that generate multiple LC3 words are
.BLKW and .STRINGZ. You will add a couple of others
that will make programming a little simpler.
There are several things that you must do to extend the assembler.
opcode_t defined in lc3.h. This adds an "opcode" for the pseudo instruction.
lc3_instructions[] defined in lc3.c. This defines the number and types of
operands for the pseudo instruction.
lc3_instruction_map[] defined in util.c. This associates the "name" of the
pseudo instruction with its "opcode".
asm_pass_one() to analyze the operands and determine how many LC3
instructions this pseudo instruction will produce. This may not even require
any additional code.
asm_pass_two() to generate the LC3 instructions corresponding to the pseudo
instruction and add them to the object file.
.SETR pseudo instruction
The .SETR pseudo instruction allows the programmer to initialize a
register with an immediate value. Thus, one can write .SETR R1,#100.
The end result is that register R1 contains the value 100 and the
condition code is set according to the value. The following table shows the
two cases.
small value (.SETR DR,#5) | large value (.SETR DR,#1000) |
---|---|
When the value is "small", you can take advantage of the immediate field in
an ADD instruction and use only two LC3 instructions. If
the immediate exceeds the size allowed in ADD, three instructions
will be required.
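Here is a sketch of one possible .SETR expansion. The hypothetical function expand_setr returns the number of words, so pass one can reserve space and pass two can write them. The small case follows directly from the text. For the large case, the code assumes a sequence that loads an inline constant: LD, then an unconditional BR over a .FILL word. That three-word sequence is an assumption, not necessarily the one the assignment prescribes.

```c
#include <stdint.h>

/* Hypothetical expansion of .SETR DR,#value into LC3 words. Returns the
 * number of words written to out[] (out must hold at least 3).
 * Small values (-16..15) fit in ADD's imm5 field:
 *     AND DR,DR,#0 ; ADD DR,DR,#value
 * Larger values (assumed sequence) load an inline constant and branch
 * over it:
 *     LD DR,#1 ; BRnzp #1 ; .FILL value
 * In both cases the last instruction executed (ADD or LD) sets the
 * condition codes from the value. */
int expand_setr(int dr, int value, uint16_t out[3]) {
    if (value >= -16 && value <= 15) {
        out[0] = (uint16_t)(0x5020 | (dr << 9) | (dr << 6));   /* AND DR,DR,#0 */
        out[1] = (uint16_t)(0x1020 | (dr << 9) | (dr << 6) |
                            (value & 0x1F));                   /* ADD DR,DR,#v */
        return 2;
    }
    out[0] = (uint16_t)(0x2000 | (dr << 9) | 0x001);  /* LD DR,#1: loads out[2] */
    out[1] = 0x0E01;                                  /* BRnzp #1: skips out[2] */
    out[2] = (uint16_t)value;                         /* .FILL value */
    return 3;
}
```

For .SETR R1,#5 this gives AND R1,R1,#0 (0x5260) and ADD R1,R1,#5 (0x1265).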
.SUB pseudo instruction
.SUB is just like ADD except that it performs
subtraction. Like ADD, the third operand can be a register or an
immediate. For reasons that will become clear soon, if the third operand is an
immediate, it can only have a value from decimal -15 to 16 (versus -16 to 15
for ADD). If the third operand is an immediate, just use an
ADD with the value negated.
When the third operand is a register, one can generate multiple instructions. Consider this example.
using .SUB | without using .SUB |
---|---|
Pretty neat how we performed the subtraction without an additional register, eh? But there's a problem if the first and third registers are the same register (.SUB R1,R2,R1), because the second NOT instruction will corrupt the result of the subtraction. In this case, don't emit the second NOT. And there's another problem if the second and third registers are the same (.SUB R1,R2,R2): when you negate the third register, you will corrupt the value of the second register (which is the same as the third). In this case, rather than the 4-instruction sequence above, we'll simply emit an instruction that puts 0 in the DR register (AND DR,DR,#0).
Milo Martin
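The register and immediate forms of .SUB can be sketched as follows. The helper names are hypothetical. The base sequence NOT SR2 / ADD DR,SR1,SR2 / ADD DR,DR,#1 / NOT SR2 is an assumption consistent with the four-instruction description and the identity a - b = a + NOT(b) + 1. The two special cases described above are checked before that sequence is emitted.

```c
#include <stdint.h>

/* Hypothetical encoders for the LC3 instructions the expansion needs. */
static uint16_t enc_not(int dr, int sr) {                    /* NOT DR,SR */
    return (uint16_t)(0x903F | (dr << 9) | (sr << 6));
}
static uint16_t enc_add_reg(int dr, int sr1, int sr2) {      /* ADD DR,SR1,SR2 */
    return (uint16_t)(0x1000 | (dr << 9) | (sr1 << 6) | sr2);
}
static uint16_t enc_add_imm(int dr, int sr1, int imm5) {     /* ADD DR,SR1,#imm5 */
    return (uint16_t)(0x1020 | (dr << 9) | (sr1 << 6) | (imm5 & 0x1F));
}
static uint16_t enc_and_zero(int dr) {                       /* AND DR,DR,#0 */
    return (uint16_t)(0x5020 | (dr << 9) | (dr << 6));
}

/* Expand .SUB DR,SR1,SR2 (register form). Returns the word count. */
int expand_sub_reg(int dr, int sr1, int sr2, uint16_t out[4]) {
    if (sr1 == sr2) {                  /* x - x is always 0 */
        out[0] = enc_and_zero(dr);
        return 1;
    }
    int n = 0;
    out[n++] = enc_not(sr2, sr2);          /* SR2 = NOT SR2 */
    out[n++] = enc_add_reg(dr, sr1, sr2);  /* DR = SR1 + NOT SR2 */
    out[n++] = enc_add_imm(dr, dr, 1);     /* DR = SR1 - SR2 */
    if (dr != sr2)                         /* restoring SR2 would clobber DR */
        out[n++] = enc_not(sr2, sr2);
    return n;
}

/* Expand .SUB DR,SR1,#imm with imm in -15..16: ADD DR,SR1,#-imm.
 * Returns -1 if -imm would not fit in imm5. */
int expand_sub_imm(int dr, int sr1, int imm, uint16_t out[1]) {
    if (imm < -15 || imm > 16)
        return -1;
    out[0] = enc_add_imm(dr, sr1, -imm);
    return 1;
}
```

This also shows why the immediate range is -15 to 16: negating it yields exactly ADD's range of -16 to 15.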
XXX.tar using the checkin program. Use the key ASM, or use the checkin
tab of the course web page.