A PHP Compiler, aka The FFI Rabbit Hole
It’s no secret that I’m into building toy compilers and programming languages. Today I’m introducing something that’s not a toy (I hope). Today, I’m introducing php-compiler (among many other projects). My hope is that these projects will grow from experimental status into fully production ready systems.
JIT? AOT? VM? What The Heck?
Since I’m going to be talking a lot about compilers and components in this post, I figure it’s good to start with a primer on how they work, and how the different types behave.
Types of Compilers
Let’s start by talking about the 3 main categories of how programs are executed. (There are definitely some blurred lines here, and you’ll hear people using these labels to refer to multiple different things, but for the purposes of this post):
Interpreted: The vast majority of dynamic languages use a Virtual Machine of some sort. PHP, Python (CPython), Ruby, and many others may be interpreted using a Virtual Machine.
A VM is - at its most abstract level - is a giant switch statement inside of a loop. The language parses and compiles the source code into a form of Intermediary Representation often called Opcodes or ByteCode.
The prime advantage of a VM is that it’s simpler to build for dynamic languages, and removes the “waiting for code to compile” step.
Compiled: The vast majority of what we think of as static languages are “Ahead Of Time” (AOT) Compiled directly to native machine code. C, Go, Rust, and many many others use an AOT compiler.
AOT basically means that the full compilation process happens as a whole, ahead of when you want to run the code. So you compile it, and then some time later you can execute it.
The prime advantage of AOT compilation is that it can generate very efficient code. The (prime) downside is that it can take a long time to compile code.
Just In Time (JIT): JIT is a relatively recently popularized method to get the best of both worlds (VM and AOT). Lua, Java, JavaScript, Python (via PyPy), HHVM, PHP 8, and many others use a JIT compiler.
A JIT is basically just a combination of a VM and an AOT compiler. Instead of compiling the full program at once, it instead runs the code on a Virtual Machine for a while. It does this for two reasons: to figure out which parts of the code are “hot” (and hence most useful to be in machine code), and to collect some runtime information about the code (what types are commonly used, etc). Then, it pauses execution for a moment to compile just that small bit of code to machine code before resuming execution. A JIT runtime will bounce back and forth between interpreted code and native compiled code.
The prime advantage of JIT compilation is that it balances the fast deployment cycle of a VM with the potential for AOT-like performance for some use-cases. But it is also insanely complicated since you’re building 2 full compilers, and an interface between them.
Another way of saying this, is that an Interpreter runs code, where as an AOT compiler generates machine code which then the Computer runs. And a JIT compiler runs the code but every once in a while translates some of the running code into machine code, and then executes it.
Some more definitions
I just used the word “Compiler” a lot (along with a ton of other words), but each of these words have many different meanings, so it’s worth talking a bit about that:
Compiler: The meaning of “Compiler” changes depending on what you’re talking about:
When you’re talking about building language runtimes (aka: compilers), a Compiler is a program that translates code from one language into another with different semantics (there’s a conversion step, it isn’t just a representation). It could be from PHP to Opcode, it could be from C to an Intermediary Representation. It could be from Assembly to Machine Code, it could be from a regular expression to machine code. Yes, PHP 7.0 includes a compiler to compile from PHP source code to Opcodes.
When you’re talking about using language runtimes (aka: compilers), a Compiler is usually implied to be a specific set of programs that convert the original source code into machine code. It’s worth noting that a “Compiler” (like gcc for example) is normally made up of several smaller compilers t
Truncated by Planet PHP, read more at the original (another 77748 bytes)