My Project
tokens.h File Reference

Defines the interface to functions that tokenize a line of LC3 source code (do not modify).


Macros

#define MAX_LINE_LENGTH   8180
 
#define MAX_TOKENS   10
 

Functions

void tokens_init (void)
 
const char * tokenize_lc3_line (const char *line)
 
const char * next_token (void)
 
void print_tokens (void)
 
void tokens_term (void)
 

Detailed Description

One of the first steps in converting a "high" level language to some other form is to tokenize the input stream. This process is also known as lexical analysis. It involves breaking the input into a list of "words" that are significant in terms of the syntax of the language.

For a language such as C, this means identifying keywords, numbers, user-defined names, and all the punctuation that is used (e.g. (){}+*-/,; ...). In languages like C, there can be multiple "statements" on a single line, and "statements" can span multiple lines.

The LC3 assembly language is much simpler. Every statement is contained on a single line of the file. The only punctuation used is the comma used to separate multiple operands. The most complex statement is of the form:


  label opcode operand1, operand2, operand3

This code is provided to reduce the work in completing your assembler project. For more details, see the Wikipedia article on lexical analysis. If you take a compiler class like cs453, you will learn a lot more about lexical analysis and how to use tools that generate the code for you from a language description in a text file.
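
The splitting rule for a statement of the form shown above can be sketched in a few lines of C. This is a minimal illustration, not the code in this module: sketch_split, SKETCH_MAX_TOKENS, and SKETCH_MAX_TEXT are hypothetical names. The sketch assumes that whitespace and commas separate tokens, that commas are kept as tokens, and that a semicolon starts a comment. It does not handle the quoted strings of .STRINGZ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TOKENS 10   /* mirrors MAX_TOKENS in tokens.h */
#define SKETCH_MAX_TEXT   64   /* hypothetical per-token limit for this sketch */

/* Hypothetical helper, not part of tokens.h: splits `line` into tokens.
 * Whitespace and commas separate tokens, each comma is kept as its own
 * token, and everything from ';' onward is dropped as a comment.
 * Tokens longer than SKETCH_MAX_TEXT - 1 characters are truncated.
 * Returns the number of tokens stored in `out`. */
int sketch_split(const char *line,
                 char out[SKETCH_MAX_TOKENS][SKETCH_MAX_TEXT])
{
    int count = 0;
    assert(line != NULL);

    const char *p = line;
    while (*p != '\0' && *p != ';' && count < SKETCH_MAX_TOKENS) {
        if (isspace((unsigned char)*p)) {
            p++;                          /* skip whitespace separators */
        } else if (*p == ',') {
            strcpy(out[count++], ",");    /* a comma is its own token */
            p++;
        } else {
            size_t len = 0;
            /* copy characters up to the next separator or comment */
            while (*p != '\0' && *p != ';' && *p != ',' &&
                   !isspace((unsigned char)*p)) {
                if (len < SKETCH_MAX_TEXT - 1)
                    out[count][len++] = *p;
                p++;
            }
            out[count][len] = '\0';
            count++;
        }
    }
    return count;
}
```

For a line such as "loop ADD R1, R2, R3 ; sum", this sketch stores seven tokens: loop, ADD, R1, a comma, R2, a comma, and R3.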

Author
Fritz Sieker

Macro Definition Documentation

#define MAX_LINE_LENGTH   8180

Maximum length of source line

#define MAX_TOKENS   10

Maximum number of tokens in an LC3 line, plus a few more to handle bad syntax

Function Documentation

const char* next_token ( void  )

Return the next token from the list generated by tokenize_lc3_line()

Returns
the next token or NULL if there are no more tokens
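
The retrieval pattern is a loop that starts with the first token and asks for the next one until NULL comes back. The sketch below shows the shape of that loop with hypothetical stand-ins (demo_first, demo_next, and a fixed demo_list) so that it compiles without the module. In real use the calls would be tokenize_lc3_line() and next_token().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in data: the tokens of "loop ADD R1, R2, R3". */
static const char *demo_list[] = { "loop", "ADD", "R1", ",", "R2", ",", "R3" };
static const size_t demo_len = sizeof demo_list / sizeof demo_list[0];
static size_t demo_pos = 0;

/* Stand-in for next_token(): the next token, or NULL when none remain. */
const char *demo_next(void)
{
    assert(demo_pos <= demo_len);
    return (demo_pos < demo_len) ? demo_list[demo_pos++] : NULL;
}

/* Stand-in for tokenize_lc3_line(): restarts the list and returns the
 * first token. */
const char *demo_first(void)
{
    demo_pos = 0;
    return demo_next();
}

/* Walks the whole list and returns how many comma tokens it saw. */
int count_commas(void)
{
    int commas = 0;
    for (const char *tok = demo_first(); tok != NULL; tok = demo_next()) {
        if (tok[0] == ',' && tok[1] == '\0')
            commas++;
    }
    return commas;
}
```

Putting the first call in the loop initializer and the next call in the loop step keeps the NULL check in one place.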
void print_tokens ( void  )

Print the tokens of the line. This is for debugging purposes.

const char* tokenize_lc3_line ( const char *  line)

Convert a single line of LC3 source code into a list of tokens and return the first one. Subsequent tokens are retrieved using next_token(). The function recognizes the semicolon as the start of an LC3 end-of-line comment and discards the entire comment. Tokens are separated by whitespace or commas. The commas are returned as part of the list.

Parameters
line - the source code line
Returns
the first token of the line, or NULL if the line contains no tokens. The returned value points to static storage whose contents are modified on each call, so the caller must copy any value that needs to be preserved from call to call. For quoted strings used by the .STRINGZ directive, the returned token preserves the opening and closing quote marks but converts all internal escape sequences into their actual character values.
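
Because the returned pointer refers to storage that later calls overwrite, a token that must outlive the next call has to be copied first. The following is a minimal sketch of that idea. demo_token is a hypothetical stand-in that mimics a function returning static storage, and copy_token is a hypothetical helper. Neither is part of tokens.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tokenizer call: the result points at
 * static storage that the next call overwrites. */
const char *demo_token(const char *text)
{
    static char buffer[32];
    assert(text != NULL);
    strncpy(buffer, text, sizeof buffer - 1);
    buffer[sizeof buffer - 1] = '\0';
    return buffer;
}

/* Hypothetical helper: returns a heap copy of `token` that survives
 * later tokenizer calls, or NULL if `token` is NULL or allocation
 * fails. The caller is responsible for calling free() on the result. */
char *copy_token(const char *token)
{
    if (token == NULL)
        return NULL;
    size_t size = strlen(token) + 1;   /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, token, size);
    return copy;
}
```

With these stand-ins, a label saved through copy_token() keeps its text after the next demo_token() call, while the original pointer now shows the newer token.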
void tokens_init ( void  )

Initialize the module

void tokens_term ( void  )

Terminate the module