The past couple of weeks have been very hectic for me. I just started my new summer internship, so I needed some time to settle into my new workplace and living space. Although I’ve been lazy about writing my blog, I did not stop working on my compiler. I had some success putting basic pieces together, such as simple functions, arithmetic, and variable declarations, but making these work was not a smooth journey. I was too excited to just start coding and did not have a solid plan for how I would approach the project. I now have a pretty clear idea of the whole workflow of my compiler, but that only came after learning the hard way to always have a plan first. So I encourage everyone to plan thoroughly before working on a big project like a compiler.
Anyways, enough rambling, I would like to talk about what I have so far. My compiler breaks down into four big parts: a lexer, a syntax tree generator, a grammar classifier, and a code generator. Today I will talk about the lexer. As we talked about before, the lexer takes a string input and generates corresponding tokens based on the keywords in the input. This is the type I created for a token, where each token holds the type and the actual value of the keyword:
type Token struct {
	Type  string // kind of keyword, e.g. "Operator", "Declaration", "Assignment"
	Value []byte // the raw text of the keyword itself
}
To convert a line of code into a list of tokens, I wrote a very simple function that iterates through the keywords and appends a corresponding token to the result slice. As you can see below, the way I catch keywords in a line of code is by splitting the line on spaces. This means code like “foo(a,b)” cannot be processed correctly; instead, you have to write it with spaces between the keywords, like “foo ( a , b )”. This is something I will improve in the future, but for now it is good enough to test the rest of my compiler implementation!
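If you want to see exactly what that limitation looks like, here is a tiny standalone snippet (separate from the compiler itself) showing how strings.Fields, which I use for the splitting, handles the two forms:

package main

import (
	"fmt"
	"strings"
)

func main() {
	// With spaces between keywords, every keyword becomes its own field.
	fmt.Println(strings.Fields("foo ( a , b )")) // [foo ( a , b )]

	// Without spaces, the whole call collapses into a single field,
	// so the lexer never sees the individual keywords.
	fmt.Println(strings.Fields("foo(a,b)")) // [foo(a,b)]
}

With that out of the way, here is the tokenizer function itself: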
// tokenizer splits a line on whitespace and turns each word into a typed Token.
func tokenizer(line string) []Token {
	words := strings.Fields(line)
	tokens := []Token{}
	for _, w := range words {
		if isOperator(w) {
			token := Token{
				Type:  "Operator",
				Value: []byte(w),
			}
			tokens = append(tokens, token)
		} else if isDeclaration(w) {
			token := Token{
				Type:  "Declaration",
				Value: []byte(w),
			}
			tokens = append(tokens, token)
		} else if isAssignment(w) {
			token := Token{
				Type:  "Assignment",
				Value: []byte(w),
			}
			tokens = append(tokens, token)
		}
		// ... more branches here for the remaining keyword types
	}
	return tokens
}
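The isOperator, isDeclaration, and isAssignment calls above are the simple helper functions that classify keyword types. They are not very interesting, but to give a rough idea, here is a minimal sketch of the shape they take; the exact operator and keyword sets below are just placeholders for the example (for instance, I am using “var” as the declaration keyword here), not a final definition of the language:

// isOperator reports whether the word is one of the arithmetic operators.
// The operator set here is only an example and will grow with the language.
func isOperator(w string) bool {
	switch w {
	case "+", "-", "*", "/":
		return true
	}
	return false
}

// isDeclaration reports whether the word starts a variable declaration.
// "var" is a placeholder declaration keyword for this sketch.
func isDeclaration(w string) bool {
	return w == "var"
}

// isAssignment reports whether the word is the assignment symbol.
func isAssignment(w string) bool {
	return w == "="
}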
Together with helpers like those to classify keyword types, this is pretty much all I have for the lexer for now. I will keep you all posted on the progress :)