Sunday, November 6, 2016

Lexical Analysis

Whoa, what a weird phrase! Let's define it.

A lexicon is a dictionary, so "lexical" pertains to the words and vocabulary of a language.
Well, what are the "words" of a programming language?
They're the smallest sequences of characters that have meaning.
[Image: a basic code snippet beginning with "public class Dog extends Animal {"]
Looking at the example above, "public" could be one of our words. You see, we can't split it up any more without it becoming nonsense.

Lexical analysis concerns itself with recognizing these basic patterns, or tokens, in the code and dividing the code into a list of tokens.

For example, the code above might be split into "public", "class", "Dog", "extends", "Animal", "{", "public", etc.
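To make the splitting concrete, here is a minimal hand-written tokenizer in Python. It is only a sketch under simplifying assumptions: it knows three kinds of tokens (words, whole numbers, and single-character symbols), and the function name tokenize is made up for this example. A real lexer would also handle multi-character operators like "==", string literals, and comments.

```python
def tokenize(code: str) -> list[str]:
    """Split source code into tokens: words, whole numbers, and single-character symbols.

    Simplified sketch: operators like "==" come out as two separate "=" tokens.
    """
    tokens = []
    i = 0
    while i < len(code):
        ch = code[i]
        if ch.isspace():
            # Whitespace only separates tokens; it is never a token itself.
            i += 1
        elif ch.isalpha() or ch == "_":
            # A word (keyword or name) runs until the first character that can't belong to it.
            start = i
            while i < len(code) and (code[i].isalnum() or code[i] == "_"):
                i += 1
            tokens.append(code[start:i])
        elif ch.isdigit():
            # A number runs until the first non-digit.
            start = i
            while i < len(code) and code[i].isdigit():
                i += 1
            tokens.append(code[start:i])
        else:
            # Any other character (braces, parentheses, semicolons...) is a one-character token.
            tokens.append(ch)
            i += 1
    return tokens


print(tokenize("public class Dog extends Animal {"))
# ['public', 'class', 'Dog', 'extends', 'Animal', '{']
```

Notice that the spaces never show up in the output: they only tell the tokenizer where one word ends and the next begins.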

For humans, this splitting of the code is intuitive, but computers need a little more guidance.

Lexical analysis and its rules can be implemented by a tool called lex, which is a program that makes programs. Confused? Yeah, I was too. Hopefully this helps.

In this diagram, the boxes represent programs and "patterns" is a text file.
"patterns" contains the rules for dividing the input into tokens. These rules are passed into the lex program, which generates C source code for a program that will process input according to the rules in "patterns".
So, lex does not deal with analyzing your code, but creates a program that does.

This is really handy, because you don't need to write code to go through the input - all you need to do is specify some rules and lex will create a lexical analyzer that will split up the code into tokens for you. I love how programmers are so lazy!
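Real lex reads a patterns file and writes out C source code for the analyzer. As a rough Python analogy (not how lex itself works internally), the sketch below builds a lexer function at runtime from a list of (token name, pattern) rules, which captures the "program that makes programs" idea. The patterns are written as regular expressions, and the names make_lexer and lex_java, as well as the token names, are hypothetical. It does follow two of lex's matching conventions: the longest match wins, and when two rules match the same amount of text, the rule listed first wins.

```python
import re


def make_lexer(rules):
    """Build a lexer from (token_name, regex) rules, loosely imitating what lex does.

    At each position the longest match wins; on a tie, the rule listed first wins.
    A token_name of None means "match it, then throw it away" (e.g. whitespace).
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]

    def lexer(text):
        tokens = []
        pos = 0
        while pos < len(text):
            best_name, best_end = None, pos
            for name, regex in compiled:
                m = regex.match(text, pos)
                # Only a strictly longer match replaces the current best,
                # so on a tie the earlier rule keeps it.
                if m and m.end() > best_end:
                    best_name, best_end = name, m.end()
            if best_end == pos:
                raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
            if best_name is not None:
                tokens.append((best_name, text[pos:best_end]))
            pos = best_end
        return tokens

    return lexer


# Hypothetical rules for a tiny slice of Java-like code.
rules = [
    ("KEYWORD", r"public|class|extends"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    (None, r"\s+"),  # skip whitespace
]
lex_java = make_lexer(rules)

print(lex_java("public class Dog extends Animal {"))
# [('KEYWORD', 'public'), ('KEYWORD', 'class'), ('IDENTIFIER', 'Dog'), ('KEYWORD', 'extends'), ('IDENTIFIER', 'Animal'), ('LBRACE', '{')]
```

With these rules, lex_java("classy") returns a single IDENTIFIER token rather than the keyword "class" followed by "y", because the identifier rule matches more text.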

One thing is still unclear: how do we write the rules? In lex, they're written as regular expressions, which we will explore in the next post.

Sources:
http://dinosaur.compilertools.net/flex/index.html
http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf

Overview of the Compilation Process and Programming Languages

First, some definitions are in order.

Programming language: a set of rules that governs communication with a computer.
-Just as different human languages have different ways to express a greeting ("Hello", "Bonjour", etc.), different programming languages have different syntaxes to express certain commands (print a word to the screen, store a value in a variable, etc.).

Compiler: a program that translates human-readable source code into code the machine can understand.

With that out of the way, here's a big scary diagram:

Don't worry, I'll go over each part.

The pre-processor doesn't do too much. It mainly takes away the parts of the program that exist only for human readability, such as comments, which are useless to the computer. (In C, it also handles directives like #include and #define.)
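Here is a toy version of the comment-stripping step, assuming C-style "//" and "/* */" comments. It deliberately ignores string literals (so a "//" inside quotes would be wrongly removed) and does not handle #include or #define; the function name strip_comments is made up for this example.

```python
def strip_comments(source: str) -> str:
    """Remove // line comments and /* block */ comments from C-like source.

    Toy version: it does not know about string literals, so a "//" inside
    quotes would be treated as a comment too.
    """
    out = []
    i = 0
    while i < len(source):
        if source.startswith("//", i):
            # Skip to the end of the line, but keep the newline itself.
            end = source.find("\n", i)
            i = len(source) if end == -1 else end
        elif source.startswith("/*", i):
            # Skip past the closing */ (or to the end if it is missing).
            end = source.find("*/", i + 2)
            i = len(source) if end == -1 else end + 2
            # C treats a comment like a single space, so leave one behind to keep tokens apart.
            out.append(" ")
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(repr(strip_comments("int x = 1; // the answer\n/* note */int y = 2;\n")))
# 'int x = 1; \n int y = 2;\n'
```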

The compiler is where the heavy lifting takes place. Here, the syntax rules of the language are applied as the compiler gradually breaks the code down into smaller pieces to analyze its purpose and effect. Once that happens, the compiler finally generates low-level assembly code that does what the source code describes.
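As a heavily simplified illustration of "break the code down, then emit low-level code", the sketch below compiles arithmetic expressions such as 2 + 3 * 4 into instructions for an imaginary stack machine. The instruction names PUSH, ADD, and MUL are hypothetical, not real assembly for any processor, and the grammar only covers whole numbers, "+" and "*". It uses recursive descent, one common parsing technique, so that "*" binds more tightly than "+".

```python
import re


def compile_expression(source: str) -> list[str]:
    """Compile an expression like "2 + 3 * 4" into toy stack-machine assembly.

    Grammar ('*' binds tighter than '+'):
        expr := term ('+' term)*
        term := NUMBER ('*' NUMBER)*
    """
    # Lexical analysis: numbers, or any other single non-space character.
    tokens = re.findall(r"\d+|\S", source)
    pos = 0
    code = []

    def number():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input, expected a number")
        tok = tokens[pos]
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        code.append(f"PUSH {tok}")
        pos += 1

    def term():
        nonlocal pos
        number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            number()
            code.append("MUL")  # multiply the top two values on the stack

    def expr():
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            term()
            code.append("ADD")  # add the top two values on the stack

    expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return code


print(compile_expression("2 + 3 * 4"))
# ['PUSH 2', 'PUSH 3', 'PUSH 4', 'MUL', 'ADD']
```

The output pushes 2, 3, and 4, multiplies the top two values, and only then adds, so 3 * 4 is computed before the addition, just as the precedence rules require.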

The assembler takes the assembly code and converts it to machine code. The difference between the two is that assembly code uses words for commands, while machine code uses numbers. Since each assembly instruction corresponds closely to a machine instruction, the conversion is mostly a matter of replacement.
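To show why assembling is mostly replacement, here is a toy assembler for an imaginary machine. The opcode numbers in OPCODES are invented for this example; a real assembler uses the encodings defined by the processor's instruction set and also deals with labels, addressing modes, and more.

```python
# Hypothetical opcode table for a made-up machine; real instruction sets differ.
OPCODES = {"HALT": 0, "PUSH": 1, "ADD": 2, "MUL": 3}


def assemble(assembly: str) -> list[int]:
    """Translate toy assembly text into a list of numbers (toy machine code)."""
    machine_code = []
    for line_no, line in enumerate(assembly.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        mnemonic, operands = parts[0].upper(), parts[1:]  # accept lowercase mnemonics too
        if mnemonic not in OPCODES:
            raise ValueError(f"line {line_no}: unknown instruction {parts[0]!r}")
        # The word becomes its number; numeric operands are copied through.
        machine_code.append(OPCODES[mnemonic])
        machine_code.extend(int(op) for op in operands)
    return machine_code


program = """
PUSH 2
PUSH 3
ADD
HALT
"""
print(assemble(program))  # [1, 2, 1, 3, 2, 0]
```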

The linker takes the machine code and combines it with other machine code the program needs (for example, library functions), creating an executable file that you can double-click and run.
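One core job of a linker is resolving symbols: code in one file may call a function defined in another, and the real address can only be filled in once everything has been laid out. The sketch below models that with made-up "object files" represented as Python dicts. The dict layout, the instruction names, and the function name link are all hypothetical, and real object file formats (such as ELF) are far more involved.

```python
def link(objects: list[dict]) -> list:
    """Combine toy object files into one program, filling in cross-file addresses.

    Each object is a dict with:
      "code":    list of instructions, where some slots hold a symbol name
      "symbols": {name: offset within this object's code} for what it defines
      "needs":   indexes into "code" whose symbol name must be resolved
    """
    # Pass 1: lay the objects out one after another and record global addresses.
    addresses = {}
    base = 0
    for obj in objects:
        for name, offset in obj["symbols"].items():
            if name in addresses:
                raise ValueError(f"symbol {name!r} defined twice")
            addresses[name] = base + offset
        base += len(obj["code"])

    # Pass 2: copy the code and replace each symbol reference with its address.
    program = []
    for obj in objects:
        code = list(obj["code"])
        for index in obj["needs"]:
            name = code[index]
            if name not in addresses:
                raise ValueError(f"undefined symbol {name!r}")
            code[index] = addresses[name]
        program.extend(code)
    return program


main_obj = {
    "code": ["CALL", "print_hello", "HALT"],  # slot 1 holds a symbol to resolve
    "symbols": {"main": 0},
    "needs": [1],
}
lib_obj = {
    "code": ["LOAD_MSG", "WRITE", "RET"],
    "symbols": {"print_hello": 0},
    "needs": [],
}
print(link([main_obj, lib_obj]))
# ['CALL', 3, 'HALT', 'LOAD_MSG', 'WRITE', 'RET']
```

If a needed symbol is never defined, link raises an error, which is the toy counterpart of the "undefined reference" errors real linkers report.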

The loader and memory are mainly involved in running the program, which I'm not concerned with here.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This year, I hope to create my own programming language and write a compiler for it.

Designing a language leaves a lot of room for creativity. For example, here is code in a language called Brainfuck that displays the words "Hello World!"
[Image: Brainfuck code that prints "Hello World!"]
Neat, huh?

While I won't make something this weird, I'll definitely try to put my own twist or quirk in my language.

Sources:
Arora, Himanshu. "Journey of a C Program to Linux Executable in 4 Stages." The Geek Stuff. N.p., 05 Oct. 2011. Web. 26 Sept. 2016.
"Compiler Design - Overview." www.tutorialspoint.com. Tutorials Point, n.d. Web. 25 Sept. 2016.
Various Articles. Esolang, the Esoteric Programming Languages Wiki. N.p., n.d. Web. 26 Sept. 2016.