Building an Assembler - Nand2Tetris Unit 06
A big shift in gears for this unit - from hardware to software. Writing the assembler which will translate the .asm files to .hack machine files.
It wasn’t as challenging as the CPU for example. I guess my years of writing software really did help me out here.
I decided to challenge myself and use Python as my language of choice. It was good to freshen up and also try and use some Python classes and learn some good Python CLI Project structure.
What I Built this Unit
- A Python CLI Assembler that translates .asm assembly code to .hack machine code.
Actually not a lot more.
What Did I learn
First of all, quick refresher on Python. That was great.
I learned that in the lowest form, it’s basically just string manipulation. When that hit me, it wasn’t really that hard to finish. It was just a matter of piecing it together and make the program as readable and functional as possible.
I learnt how to structure and assembler and how it actually works. The symbol table that holds all the data/variables.
I learnt that we must build the assembler using a two-pass algorithm.
1st Pass:
- Scan the file line by line
- Find and save all labels (LABEL)
- Save the label in the symbol table based on the instruction address
- Trim all comments and whitespace
2nd Pass:
- Go through each line again
- Handle the A & C instructions
- Replace symbols from the symbol table with their instruction address.
- Save the binary 16-bit binary instruction
This is the symbol table before the first pass. It’s just a hash map or Python dictionary.
def __init__(self):
"""
Initialize dictionary with predefined Hack platform symbols
"""
self.symbols = {
"SP": 0,
"LCL": 1,
"ARG": 2,
"THIS": 3,
"THAT": 4,
"R0": 0,
"R1": 1,
"R2": 2,
"R3": 3,
"R4": 4,
"R5": 5,
"R6": 6,
"R7": 7,
"R8": 8,
"R9": 9,
"R10": 10,
"R11": 11,
"R12": 12,
"R13": 13,
"R14": 14,
"R15": 15,
"SCREEN": 16384,
"KBD": 24576,
} The A instruction was rather simple.
It takes either the number or symbol and translates it to it’s corresponding address.
- @100 -> 0000000001100100
- @i -> Looks in the symbol table and gets the address. First variable declarations start from RAM[16]
- @i // First variable -> RAM[16] -> 0000000000010000
- @sum // Second variable -> RAM[17] -> 0000000000010001
- @temp // Third variable -> RAM[18] -> 0000000000010010
The C instruction was a little bit different. This was more about parsing the instruction.
As we know from the course material it’s all based on the ‘111 a cccccc ddd jjj’ syntax.
- comp -> 7 bits (a + cccccc)
- dest -> 3 bits (d1 d2 d3)
- jump -> 3 bits (j1 j2 j3)
Since we have all the instructions from the course material, it was rather simple to add all the data to the project and just fetch the correct instruction.
This is the jump table.
JUMP_TABLE = {
"": "000",
"JGT": "001",
"JEQ": "010",
"JGE": "011",
"JLT": "100",
"JNE": "101",
"JLE": "110",
"JMP": "111",
} It made everything a lot simpler, and it was just a matter of splicing and splitting the instructions and getting the correct binary.
For example. D=D+1 -> comp=D+1, dest=D, jump=null -> 111 0 011111 010 000
Program Structure
The assembler is structured in different modules
Parser Module
- Cleans comments & whitespace
- Identifies the instruction type
- Extracts the correct mnemonics based on the instruction
Symbol Table Module
- Dictionary with pre-populated data
- Adds labels and instructions in the first pass
- Adds new variables in the second pass starting from RAM[16]
Code Module
- Returns the binary from each dest, comp, jump instruction
Assembler Module (Main Loop)
- First pass
- Second pass
- Write output to .hack
Here’s an example of an assembled program Max.hack.
0000000000000000 // @R0
1111110000010000 // D=M
0000000000000001 // @R1
1111010011010000 // D=D-M
0000000000001010 // @OUTPUT_FIRST (address 10)
1110001100000001 // D;JGT
0000000000000001 // @R1
1111110000010000 // D=M
0000000000001100 // @OUTPUT_D (address 12)
1110101010000111 // 0;JMP
0000000000000000 // @R0
1111110000010000 // D=M
0000000000000010 // @R2
1110001100001000 // M=D
0000000000001110 // @INFINITE_LOOP (address 14)
1110101010000111 // 0;JMP What Surprised Me?
That it was actually rather simple. Like many things in software(and in the world), it becomes a lot simpler and comprehensible when you break it down into smaller tasks.
Conclusion
I’ve now built a computer that can take instructions in form of hack machine language. This is rather cool I think.
This was the end of Part 1 of Nand2Tetris. What a great course! I know that Part 2 focuses way more on the software part of everything and how we will build upon all we’ve done in Part 1. I’m looking forward to it, but I’m going to take a break and focus on some other topics before attacking Part 2.
This has been great so far and will recommend it for everyone who has an interest in understanding computer architecture.
All the best and never stop learning,
See you around!