Tuesday, December 13, 2016

Alternatives to Compiling Code

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TIME CAPSULE:
Hi, me in the future! You're probably super embarrassed by me right now but I'm super bored in calc. I hope you are going to graduate.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

My topic for the year is "Compilers".
I've said before that a compiler is a program that turns the code that humans type (public static void main....) into the code that computers can understand (0's and 1's). However, I am generalizing somewhat.
Any code that will be run will need to be translated into 0's and 1's eventually. A compiler is simply one of the methods of translation. Other methods include interpretation and a blend of compilation/interpretation. What differentiates these methods is the time at which translation is completed.


  1. Compilation: The entire program is translated before you want to run the code. The translation is done by a compiler, a program that takes your code as an input and produces the "0's and 1's" as an output.
  2. Interpretation: The program is translated while it runs, piece by piece, every time you run it. The translation is done by an interpreter, a program that takes your code as an input, analyzes it, and mimics its function with its own pre-written code of "0's and 1's".
  3. Hybrid: The program is partially translated before run-time and is fully translated during run-time. A compiler will convert the code into an intermediate form, and the interpreter will interpret the intermediate form.
Compiling code and interpreting code have their own advantages and disadvantages, and the third method seeks to find "the best of both worlds" (-1 pt, cliche). 
[Image: compiler process]
Advantages of Compiling Code
  1. Running the code is faster. The 0's and 1's produced by the compiler are in the processor's native language. An interpreter acts as an intermediary between the code and the processor, and translating while the program runs adds overhead.
  2. Compilers error-check your program. When translating into machine language, the compiler checks the whole program's syntax (and, in languages like Java, its types) before anything runs. If there is an error, the compiler will tell you and no code will be generated. A simple interpreter, by contrast, finds many errors only when it reaches the offending line, leading to situations where code will be almost fully run before the blatant error at the end is encountered (a great inefficiency).
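Here is a rough sketch of that difference in Python, which happens to expose its own translation step as the built-in `compile()`. The two snippets and the `ran` list are made up for this illustration. The first snippet has a syntax error, so it is rejected before any of it runs. The second is syntactically fine, so it runs until it hits a call to a function that doesn't exist.

```python
# A snippet with a syntax error on its second line (missing closing parenthesis).
broken_source = "print('step 1')\nprint('step 2'\n"

try:
    compile(broken_source, "<broken>", "exec")
    caught_before_running = False
except SyntaxError:
    # The translation step rejected the whole snippet; nothing was executed.
    caught_before_running = True

# A snippet that is syntactically fine but calls a function that does not exist.
ran = []
late_error_source = (
    "ran.append('step 1')\n"
    "ran.append('step 2')\n"
    "undefined_function()\n"
)

code = compile(late_error_source, "<late>", "exec")  # passes the syntax check
try:
    exec(code, {"ran": ran})
    failed_at_runtime = False
except NameError:
    # The first two lines already ran before the error was discovered.
    failed_at_runtime = True

print(caught_before_running, failed_at_runtime, ran)
```

A compiler for a language like Java would also catch the second kind of mistake before running, because it checks that every function being called actually exists.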
Advantages of Interpreting Code
  1. Interpreted code is platform-independent. You can use the same code on different computers and have everything work fine. The interpreter (remember, simply a program) will compensate for differences in processors, etc. With compilers, however, the generated machine code is specific to the kind of processor and operating system it was compiled for. This means that if you take the output (0's and 1's) from one computer to a different kind of computer, the new computer will quite likely not understand the program.
  2. Interpreters are simple to use. With a compiler, you must run a program (the compiler) to convert your code, then run the executable produced. With an interpreter, you simply give the interpreter your code, and the program will run it for you.
[Image: programming interpreter diagram]

Before we move on to the hybrid method, let's look at a diagram (above) that may help show the difference between compilers and interpreters. Our inputs to both are the same. However, the compiler outputs further code (the code above is one step away from 0's and 1's) while the interpreter outputs the result of running the code (it basically runs the code). If we were to run the compiler's output, we would also get 12.
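The same idea can be sketched as a toy in Python. Everything here is an assumption made for illustration: the expression `(2 + 4) * 2` is just one I picked because it equals 12, and `interpret`, `compile_expr`, and `run` are hypothetical names for a tiny interpreter, a tiny compiler, and a stand-in for the processor.

```python
import operator

# Each operator has an instruction name and the Python function that performs it.
OPS = {"+": ("ADD", operator.add), "*": ("MUL", operator.mul)}

def interpret(expr):
    """Interpreter: walk the expression and produce its result right away."""
    if isinstance(expr, int):
        return expr
    op, left, right = expr
    return OPS[op][1](interpret(left), interpret(right))

def compile_expr(expr):
    """Compiler: walk the same expression but emit instructions instead."""
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    return compile_expr(left) + compile_expr(right) + [(OPS[op][0], None)]

def run(instructions):
    """Stand-in for the processor: execute the emitted instructions on a stack."""
    by_name = {name: fn for name, fn in OPS.values()}
    stack = []
    for name, arg in instructions:
        if name == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(by_name[name](left, right))
    return stack.pop()

expr = ("*", ("+", 2, 4), 2)      # (2 + 4) * 2, written as a nested tuple
result = interpret(expr)          # the interpreter's output is the answer
program = compile_expr(expr)      # the compiler's output is more code

print(result)
print(program)
print(run(program))
```

The interpreter hands back the answer directly, while the compiler hands back a list of instructions that still has to be run to get the same answer.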

~


[Image: interpreter diagram]
Now, on to the hybrid method. This is employed by one of the most common programming languages in the world: Java.
We can see on the left that both a compiler and an interpreter are used to run the program. The compiler turns the code into an intermediate form, byte-code, that is neither the human form nor the computer form. The interpreter (specifically the Java Virtual Machine) then takes the byte-code and runs it.
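The two steps can be seen in miniature using Python instead of Java, since the standard Python implementation (CPython) also compiles source code to byte-code and then interprets it. This is only a sketch: the one-line program and the name `total` are made up for the example.

```python
import dis

source = "total = 5 + 7\n"

# Step 1 (before run-time): compile the source text into a byte-code object.
code = compile(source, "<example>", "exec")

# The byte-code itself is raw bytes: not readable text, not native machine code.
raw_bytecode = code.co_code

# Step 2 (at run-time): the interpreter (Python's virtual machine) executes it.
namespace = {}
exec(code, namespace)

dis.dis(code)                # prints a human-readable listing of the byte-code
print(namespace["total"])
```

The exact listing printed by `dis.dis` varies between Python versions, which is a reminder that byte-code is a private format for the interpreter rather than something the processor understands.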



Advantages of the Hybrid Method

  1. Relatively fast: slower than running compiled machine code but faster than interpreting the original source code
  2. Machine-portable: The byte-code can be transferred between computers and still work, since the interpreter handles hardware differences between systems
This research has shown me that "Compilers", my topic, has much more depth than I previously thought. I might expand my yearlong project to encompass this Hybrid Method, which has shown itself to be very efficient.



Sources:
http://stackoverflow.com/questions/3265357/compiled-vs-interpreted-languages
http://www.programmerinterview.com/index.php/general-miscellaneous/whats-the-difference-between-a-compiled-and-an-interpreted-language/
https://thesocietea.org/2015/07/programming-concepts-compiled-and-interpreted-languages/
http://stackoverflow.com/questions/95635/what-does-a-just-in-time-jit-compiler-do
https://www.upwork.com/hiring/development/the-basics-of-compiled-languages-interpreted-languages-and-just-in-time-compilers/

Sunday, November 6, 2016

Lexical Analysis

Woah, what a weird phrase! Let's define that.

A lexicon is a dictionary, so lexical pertains to words and vocabulary of a language.
Well, what are the "words" of a programming language? 
They're the smallest sequences of characters that have meaning.
[Image: programming basic code snippet]
Looking at the example above, "public" could be one of our words. You see, we can't split it up any more without it becoming nonsense.

Lexical analysis concerns itself with recognizing basic patterns, or tokens, in the code and dividing the code into a list of tokens.

For example, the code above might be split as "public", "class", "Dog", "extends", "Animal", "{", "public", etc.
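As a rough sketch, that split can be done in a few lines of Python. The `source` string is an assumption that only imitates the beginning of the snippet, and the single pattern is far cruder than the rules of a real lexical analyzer.

```python
import re

# Made-up input that starts the same way as the example in the text.
source = "public class Dog extends Animal { public void bark() { } }"

# One pattern with two alternatives: a word (letters, digits, underscore)
# or any single non-space character such as a brace or parenthesis.
TOKEN_PATTERN = re.compile(r"\w+|\S")

tokens = TOKEN_PATTERN.findall(source)
print(tokens[:7])
```

The first seven tokens come out in the same order as the list above.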

For humans, this splitting of the code is intuitive, but computers need a little more guidance.

Lexical analysis and its rules can be implemented by a tool called lex, which is a program that makes programs. Confused? Yeah, I was too. Hopefully this helps.

In this diagram, the boxes represent programs and "patterns" is a text file.
"patterns" contains rules on how to divide the input into tokens. These rules are passed into the lex program, which generates code for a program that will process input according to the rules in "patterns".
So, lex does not deal with analyzing your code, but creates a program that does.

This is really handy, because you don't need to write code to go through the input - all you need to do is specify some rules and lex will create a lexical analyzer that will split up the code into tokens for you. I love how programmers are so lazy!
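The "program that makes programs" idea can be imitated in Python with a function that returns a function. This is not how lex itself is written or used. `make_lexer`, the token names, and the rule list are all hypothetical, and the rules are written as regular expressions.

```python
import re

def make_lexer(patterns):
    """Build and return a tokenizer function from (token_name, regex) rules.

    Plays the role of lex: it does not analyze code itself, it produces
    the function that does.
    """
    combined = re.compile(
        "|".join(f"(?P<{name}>{regex})" for name, regex in patterns)
    )

    def lexer(text):
        tokens = []
        # Characters that match no rule are silently skipped in this sketch.
        for match in combined.finditer(text):
            kind = match.lastgroup
            if kind != "SKIP":          # whitespace is matched but dropped
                tokens.append((kind, match.group()))
        return tokens

    return lexer

# Hypothetical rules, standing in for the contents of "patterns".
patterns = [
    ("KEYWORD", r"\b(?:public|class|extends)\b"),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("SYMBOL",  r"[{}()]"),
    ("SKIP",    r"\s+"),
]

lexer = make_lexer(patterns)
tokens = lexer("public class Dog extends Animal {")
print(tokens)
```

Changing the language only means changing `patterns`. The code that walks through the input never has to be rewritten, which is the handy part.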

One thing is still unclear: how do we write the rules? We will explore regular expressions in the next post.

Sources:
http://dinosaur.compilertools.net/flex/index.html
http://epaperpress.com/lexandyacc/download/LexAndYaccTutorial.pdf

Overview of the Compilation Process and Programming Languages

First, some definitions are in order.

Programming language: set of rules to govern communication with a computer.
-Just like many different languages have different ways to express a greeting ("Hello", "Bonjour", etc), the different programming languages have different syntaxes to express certain commands (print a word to the screen, save a variable, etc.)

Compiler: a program that translates the human-readable code into code the machine can understand.

With that out of the way, here's a big scary diagram:

Don't worry, I'll go over each part.

The pre-processor doesn't do too much. It mainly takes away parts of the program that are exclusively for human-readability, a.k.a. useless to the computer, such as comments. In C, it also expands macros and pastes in the contents of included files.

The compiler is where the heavy lifting takes place. Here, the syntax of the language is applied, as the compiler gradually breaks down the code into smaller snippets to analyze its purpose and effect. Once that happens, the compiler finally creates low-level assembly code that carries out what the source code describes.

The assembler takes the assembly code and converts it to machine code. The difference between the two is that assembly code uses words for commands while machine code uses numbers to represent commands. The conversion is mostly direct replacement in this case.
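Here is a toy version of that replacement in Python. The instruction names and their numbers are invented for the example and do not belong to any real processor, and a real assembler also has to handle things like labels and memory addresses.

```python
# Hypothetical instruction set: each command word maps to one number.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}

def assemble(assembly_text):
    """Replace each command word with its number; operands pass through as ints."""
    machine_code = []
    for line in assembly_text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue                      # ignore blank lines
        machine_code.append(OPCODES[parts[0]])
        machine_code.extend(int(operand) for operand in parts[1:])
    return machine_code

assembly = """
LOAD 5
ADD 7
STORE 0
HALT
"""

machine_code = assemble(assembly)
print(machine_code)
```

The words are gone and only numbers remain, which is the form the processor works with.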

The linker takes the machine code and combines it with other machine code essential to run a program, creating an executable file that you can double click and run.

The loader and memory are mainly associated with running the program, which I am not concerned with.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This year, I hope to create my own programming language and write a compiler for it.

When designing a language, I can be very creative. For example, this is a program in an esoteric programming language called Brainfuck, and the code shown will display the words "Hello World!"
[Image: brainfuck.JPG]
Neat, huh?

While I won't make something this weird, I'll definitely try to put my own twist or quirk in my language.

Sources:
Arora, Himanshu. "Journey of a C Program to Linux Executable in 4 Stages." The Geek Stuff. N.p., 05 Oct. 2011. Web. 26 Sept. 2016.
"Compiler Design - Overview." www.tutorialspoint.com. Tutorials Point, n.d. Web. 25 Sept. 2016.
Various Articles. Esolang, the Esoteric Programming Languages Wiki. N.p., n.d. Web. 26 Sept. 2016.