Sunday, November 6, 2016

Lexical Analysis

Whoa, what a weird phrase! Let's define that.

A lexicon is a dictionary, so "lexical" pertains to the words and vocabulary of a language.
Well, what are the "words" of a programming language?
They're the smallest sequences of characters that have meaning.
[Image: example code snippet]
Looking at the example above, "public" could be one of our words. You see, we can't split it up any more without it becoming nonsense.

Lexical analysis concerns itself with recognizing basic patterns, or tokens, in the code and dividing the code into a list of those tokens.

For example, the code above might be split as "public", "class", "Dog", "extends", "Animal", "{", "public", etc.

For humans, this splitting of the code is intuitive, but computers need a little more guidance.
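
To make that "guidance" concrete, here is a minimal hand-written sketch in Python of the splitting described above. It is only an illustration: the function name tokenize and the PUNCTUATION set are made up for this post, and it handles just words and a few single-character symbols, not a whole language.

```python
# Illustrative assumption: the symbols we treat as one-character tokens.
PUNCTUATION = set("{}();,.=+-*/<>")


def tokenize(code):
    """Split code into words and single punctuation marks."""
    tokens = []
    i = 0
    while i < len(code):
        ch = code[i]
        if ch.isspace():
            # Whitespace only separates tokens; it is not a token itself.
            i += 1
        elif ch.isalnum() or ch == "_":
            # Read the longest run of letters, digits and underscores.
            start = i
            while i < len(code) and (code[i].isalnum() or code[i] == "_"):
                i += 1
            tokens.append(code[start:i])
        elif ch in PUNCTUATION:
            tokens.append(ch)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("public class Dog extends Animal { public"))
# prints ['public', 'class', 'Dog', 'extends', 'Animal', '{', 'public']
```

Notice how much bookkeeping even this tiny version needs. That is exactly the work we would like a tool to do for us.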

Lexical analysis and its rules can be implemented by a tool called lex, which is a program that makes programs. Confused? Yeah, I was too. Hopefully this helps.

In this diagram, the boxes represent programs and "patterns" is a text file. 
"patterns" contains rules on how to divide the tokens. These rules are passed into the lex program, which generates code for a program that will process input according to the rules in "patterns".
So lex does not analyze your code itself; it creates a program that does.

This is really handy, because you don't need to write code to go through the input - all you need to do is specify some rules and lex will create a lexical analyzer that will split up the code into tokens for you. I love how programmers are so lazy!
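
The "program that makes programs" idea can be sketched in Python too. This is a loose analogy, not how lex works inside: real lex reads a text file of rules and writes out source code for the analyzer, while this sketch just returns a function at runtime. The names make_lexer, run_of and one_of are hypothetical helpers invented for this example, and the rules are plain Python functions rather than lex's own rule syntax.

```python
def run_of(predicate):
    """Rule helper: match the longest run of characters satisfying predicate."""
    def match(text, i):
        j = i
        while j < len(text) and predicate(text[j]):
            j += 1
        return j  # returning i means "no match here"
    return match


def one_of(chars):
    """Rule helper: match exactly one character from chars."""
    def match(text, i):
        return i + 1 if text[i] in chars else i
    return match


def make_lexer(rules):
    """Build and return a tokenizer function from (name, rule) pairs."""
    def lexer(text):
        tokens = []
        i = 0
        while i < len(text):
            if text[i].isspace():
                i += 1
                continue
            # Try each rule in order; the first one that matches wins.
            for name, match in rules:
                end = match(text, i)
                if end > i:
                    tokens.append((name, text[i:end]))
                    i = end
                    break
            else:
                raise ValueError(f"no rule matches {text[i]!r} at position {i}")
        return tokens
    return lexer


# The "patterns": all we write are the rules.
patterns = [
    ("WORD", run_of(str.isalpha)),
    ("NUMBER", run_of(str.isdigit)),
    ("BRACE", one_of("{}")),
]

analyzer = make_lexer(patterns)  # the generated "program"
print(analyzer("class Dog { legs 4 }"))
# prints [('WORD', 'class'), ('WORD', 'Dog'), ('BRACE', '{'), ('WORD', 'legs'), ('NUMBER', '4'), ('BRACE', '}')]
```

The scanning loop lives in make_lexer once and for all. To handle a different language, only the patterns list changes.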

One thing is still unclear: how do we write the rules? We will explore regular expressions in the next post.

Sources:
http://dinosaur.compilertools.net/flex/index.html
http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf
