My Project
Interface to functions that tokenize a line of LC3 source code.
Macros
#define MAX_LINE_LENGTH 8180
#define MAX_TOKENS 10
Functions
void tokens_init (void)
char * tokenize_line (char *line)
char * next_token (void)
void print_tokens (void)
int token_count (void)
char * get_token (int index)
void tokens_term (void)
One of the first steps in converting a "high" level language to some other form is to tokenize the input stream. This process is also known as lexical analysis. It involves breaking the input into a list of "words" that are significant in terms of the syntax of the language.
For a language such as C, this means identifying keywords, numbers, user-defined names, and all the punctuation that is used (e.g. (){}+*-/,; ...). In languages like C, there can be multiple "statements" on a single line, and "statements" can span multiple lines.
The LC3 assembly language is much simpler. Every statement is contained on a single line of the file. The only punctuation used is the comma used to separate multiple operands. The most complex statement is of the form:
label opcode operand1, operand2, operand3
This code is provided to reduce the work in completing your assembler project. For more details, see this description from Wikipedia. If you take a compiler class like cs453, you will learn a lot more about lexical analysis, and how to use tools that will generate the code for you from a language description in a text file.
#define MAX_LINE_LENGTH 8180
Maximum length of a source line.
#define MAX_TOKENS 10
Maximum number of tokens in an LC3 line, plus a few more to handle bad syntax.
char* get_token (int index)
Get a specified token from the line.
index - which token to return
char* next_token (void)
Return the next token from the list generated by tokenize_line().
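The cursor-style access that next_token() describes can be pictured with a small model. This is only a sketch: the names model_tokens, model_cursor, model_next_token and model_count_operands are hypothetical, and it assumes that the end of the list is signalled by a NULL return, which this page does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the token list built by tokenize_line(). */
const char *model_tokens[] = { "ADD", "R1", ",", "R2", ",", "R3" };
const int model_count = 6;
int model_cursor = 0;

/* Return the token under the cursor and advance.
 * Assumption: NULL is returned once the list is exhausted. */
const char *model_next_token(void)
{
    assert(model_cursor >= 0);
    if (model_cursor >= model_count)
        return NULL;
    return model_tokens[model_cursor++];
}

/* Walk the list from the start and count the operands:
 * every token after the opcode that is not a comma. */
int model_count_operands(void)
{
    int operands = 0;
    model_cursor = 0;
    const char *tok = model_next_token();      /* skip the opcode */
    while ((tok = model_next_token()) != NULL) {
        if (strcmp(tok, ",") != 0)
            operands++;
    }
    return operands;
}
```

Because commas are kept in the list, a caller walking the tokens has to skip them (or check for them, if it wants to report a missing separator).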
void print_tokens (void)
Print the tokens of the line. This is for debugging purposes.
int token_count (void)
Return the number of tokens in the current line.
char* tokenize_line (char *line)
Convert a single line of LC3 source code into a list of tokens and return the first one. Subsequent tokens are retrieved using next_token(). The function recognizes the semicolon as the start of an LC3 end-of-line comment and discards the entire comment. Tokens are separated by whitespace or commas. The commas are returned as part of the list.
line - the source code line
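The rules above (stop at a semicolon, split on whitespace and commas, keep each comma as its own token) can be sketched as follows. This is not the module's actual implementation: sketch_tokenize, sketch_tokens, sketch_count and SKETCH_MAX_TOKENS are hypothetical names, and the sketch assumes the line may be modified in place and that tokens beyond the limit are simply dropped.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TOKENS 10          /* mirrors MAX_TOKENS */

/* Hypothetical stand-ins for the module's private state. */
char *sketch_tokens[SKETCH_MAX_TOKENS];
int   sketch_count;
char  sketch_comma[] = ",";           /* shared text for every comma token */

/* Split `line` in place and return the number of tokens stored.
 * Delimiters are overwritten with '\0' so that each stored pointer
 * refers to a properly terminated word inside `line`. */
int sketch_tokenize(char *line)
{
    assert(line != NULL);
    sketch_count = 0;

    char *p = line;
    while (*p != '\0' && *p != ';' && sketch_count < SKETCH_MAX_TOKENS) {
        if (isspace((unsigned char)*p)) {
            *p++ = '\0';              /* ends the preceding word, if any */
        } else if (*p == ',') {
            sketch_tokens[sketch_count++] = sketch_comma;
            *p++ = '\0';              /* ends a word that abuts the comma */
        } else {
            sketch_tokens[sketch_count++] = p;   /* start of a word */
            while (*p != '\0' && *p != ';' && *p != ',' &&
                   !isspace((unsigned char)*p))
                p++;
        }
    }
    *p = '\0';                        /* drop the comment or any overflow */
    return sketch_count;
}
```

For the line `LOOP ADD R1, R2, #5 ; increment`, this sketch stores seven tokens: LOOP, ADD, R1, a comma, R2, a comma, and #5. The comment is discarded.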
void tokens_init (void)
Initialize the module.
void tokens_term (void)
Terminate the module.