About
C JIT x86-64 metaprogramming
An experimental C source-to-source compiler that enables compile-time metaprogramming to be performed via compile-time evaluation through a compile-time JIT compiler. I developed this for my bachelor's thesis at my university, and was awarded the Outstanding Undergraduate Research Prize for this project.
CIR Playground
Try out CIR @ CIR Playground without needing to install it on your computer!
A brief tour of CIR
Getting Started
As of time of writing, CIR only supports Linux running on x86-64. It must be compiled with either GCC or Clang. MSVC is not supported as the source code currently uses some GCC extensions.
First, clone the repository:
git clone https://github.com/pyokagan/cir.git
and compile it:
cd cir
make
The CIR compiler executable ./cir
will be compiled and linked into the current directory.
Compiling programs down to CIR
Here is a program which prints the 20th number in the Fibonacci sequence:
To use it with CIR, we first need to pass it through the C preprocessor to process all preprocessor directives
such as #include
:
gcc -E -o fib.cpp.c fib.c
Note
CIR requires input source files to first be processed with the C preprocessor. For the rest of the section we will assume that this has already been done.
We then pass the preprocessed file (fib.cpp.c
) into CIR:
./cir fib.cpp.c >fib.output.c
The output from CIR is:
As we can see, CIR has compiled fib.cpp.c
into a three-address-code-like representation
that is a subset of the C programming language.
While it may seem at first that the CIR representation is horribly inefficient,
modern optimizing compilers such as GCC will actually compile
both fib.c
and fib.output.c
to the same assembly code:
.LC0:
.string "The answer is: %d\n"
main:
sub rsp, 8
mov eax, 20
mov esi, 1
mov edx, 0
jmp .L2
.L3:
mov esi, ecx
.L2:
lea ecx, [rdx+rsi]
mov edx, esi
sub eax, 1
jne .L3
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
add rsp, 8
ret
However, as can be seen from the assembly listing (the presence of a jne
instruction, indicating a loop),
we are actually still computing the value of fib(20)
at runtime.
Can we do better?
The @
compile-time evaluation operator
CIR extends the C programming language with the compile-time evaluation operator, @
.
Function calls that are prefixed with @
will be evaluated at compile-time.
Here is the modified fib.c
source file with the @
operator added:
#include <stdio.h>
static int fib(int n) {
int a = 0, b = 1, i = 0;
while (i < n) {
int c = b;
b = a + b;
a = c;
i = i + 1;
}
return a;
}
int main(void) {
printf("The answer is: %d\n", @fib(20)); // @ operator added
return 0;
}
CIR now outputs:
extern int printf(char *__restrict __format, ...);
int main(void)
{
int vid66;
vid66 = printf("The answer is: %d\n", 6765); /* sid15 */
return 0; /* sid16 */
}
As we can see, the @fib(20)
call got replaced with 6765
.
What happened?
CIR JIT-compiled
the fib()
function info X86-64 machine code,
executed it,
and then inlined the result (6765
) into the call site.
Arbitrary compile-time evaluation
While full support for the C11 standard is still a work in progress, CIR's JIT aims to be a full-featured C compiler. This means that you can use any C language construct you want, such as conditionals, loops, calling other functions etc.
Furthermore, JIT-compiled code can call external functions and libraries.
This includes C standard library APIs such as
malloc()
, free()
, fopen()
, fwrite()
, printf()
etc.
For example,
here is a function that returns the number of words in a string using the C standard-library API strtok()
:
#include <stdio.h>
#include <string.h>
static int numWords(char *s) {
int numWords = 0;
for (char *p = strtok(s, " "); p; p = strtok(NULL, " "))
numWords = numWords + 1;
return numWords;
}
int main(void) {
printf("Hello World has %d words", @numWords("Hello World"));
return 0;
}
CIR is able to call strtok()
at compile-time, and outputs:
int printf(char *__restrict , ...);
int main(void)
{
int vid145;
vid145 = printf("Hello World has %d words", (int)2); /* sid13 */
return (int)0; /* sid14 */
}
However, this also means that incorrectly-written compile-time functions may cause the CIR compiler to not halt, or even crash. With great power comes great responsibility -- developers must exercise care when writing compile-time functions.
Compile-time metaprogramming
Code evaluated at compile time can call back into the compiler API.
CIR will examine the type of compile-time-evaluated functions:
- When a compile-time function declares its argument(s) to take a code object (
CirCodeId
), CIR will pass the raw code (in CIR form) into the function. - When a compile-time function declares its return type to be a code object (
CirCodeId
), CIR will inline the code object as-is into the call site.
For example, here is a function that receives CirCodeId
as an argument,
and examines the IR contained within:
#include <stdbool.h>
#include "../cir.h" // include compiler API
// Returns true if the code calls a function, otherwise returns false.
static bool callsAFunction(CirCodeId codeId) {
for (CirStmtId stmtId = CirCode_getFirstStmt(codeId); stmtId; stmtId = CirStmt_getNext(stmtId)) {
if (CirStmt_isCall(stmtId))
return true;
}
return false;
}
And can be used as follows:
@callsAFunction(puts("Hi")); // evaluates to true
@callsAFunction(puts(42)); // evaluates to false
Note
Notice how a simple while
loop is sufficient in discovering whether there is a call in a piece of code.
CIR is explicitly designed to be flat so as to make such traversals and manipulation of the IR easy,
without having to use recursive function calls or the visitor pattern which is more difficult to write in C.
Here is another function that returns a code object containing a string constant:
#include <stdio.h>
#include <cir.h> // include compiler API
static CirCodeId generateCode() {
return CirCode_ofExpr(CirValue_ofCString("Inlined String"));
}
int main(void) {
puts(@generateCode());
return 0;
}
And the result is:
int puts(char *);
int main(void)
{
int vid386;
vid386 = puts("Inlined String"); /* sid4 */
return (int)0; /* sid5 */
}
Statement Expressions
Sometimes we may wish to pass blocks of code into compile-time functions.
CIR implements GCC's statement expression extension for this purpose.
Compound statements that are enclosed in parentheses are treated as an expressions.
For example, we can pass a block of code into @callsAFunction
, which we described earlier, as follows:
@callsAFunction( ({
int x = 3, y = 2;
if (y > x)
puts("y > x");
}) ); // still evaluates to true
Loading external libraries
Compile-time-evaluated functions can use loaded external libraries. Consider the following program which compresses a string at compile-time with zlib:
We assume that the zlib header files and library are already installed on the system.
We first pre-process defhello.c
as usual:
gcc -E o defhello.cpp.c defhello.c
and then pass the preprocessed file (defhello.cpp.c
) into CIR.
This time we add the flag -llibz.so
to instruct CIR to load the shared library libz.so
,
making it available to compile-time functions:
./cir -llibz.so defhello.cpp.c >defhello.output.c
The output from CIR is:
As we can see, the CIR output contains the compressed Hello World!
string.
If we compile the CIR output, run the program (capturing its output),
and then decompress it using python, we can get back the original Hello World!
string::
$ gcc -o a.out defhello.output.c
$ ./a.out >output
$ python3
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib
>>> zlib.decompress(open('output', 'rb).read())
b'Hello World!'
The CIR Compiler API
CIR comes with an extensive low level compiler API for interacting with the compiler during compile-time evaluation.
The compiler API is defined in cir.h
.