About

C JIT x86-64 metaprogramming

An experimental C source-to-source compiler that enables compile-time metaprogramming to be performed via compile-time evaluation through a compile-time JIT compiler. I developed this for my bachelor's thesis at my university, and was awarded the Outstanding Undergraduate Research Prize for this project.

View source code @ GitHub

CIR Playground

Screenshot of CIR Playground

Try out CIR @ CIR Playground without needing to install it on your computer!

Try out CIR @ CIR Playground

A brief tour of CIR

Getting Started

As of time of writing, CIR only supports Linux running on x86-64. It must be compiled with either GCC or Clang. MSVC is not supported as the source code currently uses some GCC extensions.

First, clone the repository:

git clone https://github.com/pyokagan/cir.git

and compile it:

cd cir
make

The CIR compiler executable ./cir will be compiled and linked into the current directory.

Compiling programs down to CIR

Here is a program which prints the 20th number in the Fibonacci sequence:

fib.c
#include <stdio.h>

static int fib(int n) {
  int a = 0, b = 1, i = 0;
  while (i < n) {
    int c = b;
    b = a + b;
    a = c;
    i = i + 1;
  }
  return a;
}

int main(void) {
  print("The answer is: %d\n", fib(20));
  return 0;
}

To use it with CIR, we first need to pass it through the C preprocessor to process all preprocessor directives such as #include:

gcc -E -o fib.cpp.c fib.c

Note

CIR requires input source files to first be processed with the C preprocessor. For the rest of the section we will assume that this has already been done.

We then pass the preprocessed file (fib.cpp.c) into CIR:

./cir fib.cpp.c >fib.output.c

The output from CIR is:

fib.output.c
static int vid57_fib(int vid58_n)
{
  int vid59_a;
  int vid60_b;
  int vid61_i;
  int vid62_c;
  int vid63;
  int vid64;

  vid59_a = 0; /* sid1 */
  vid60_b = 1; /* sid2 */
  vid61_i = 0; /* sid3 */
sid4:
  if (vid61_i < vid58_n) goto sid6; /* sid4 */
  goto sid13; /* sid5 */
sid6:
  vid62_c = vid60_b; /* sid6 */
  vid63 = vid59_a + vid60_b; /* sid7 */
  vid60_b = vid63; /* sid8 */
  vid59_a = vid62_c; /* sid9 */
  vid64 = vid61_i + 1; /* sid10 */
  vid61_i = vid64; /* sid11 */
  goto sid4; /* sid12 */
sid13:
  /* nop */ ; /* sid13 */
  return vid59_a; /* sid14 */
}
extern int printf(char *__restrict __format, ...);
int main(void)
{
  int vid66;
  int vid67;

  vid66 = vid57_fib(20); /* sid15 */
  vid67 = printf("The answer is: %d\n", vid66); /* sid16 */
}

As we can see, CIR has compiled fib.cpp.c into a three-address-code-like representation that is a subset of the C programming language.

While it may seem at first that the CIR representation is horribly inefficient, modern optimizing compilers such as GCC will actually compile both fib.c and fib.output.c to the same assembly code:

.LC0:
        .string "The answer is: %d\n"
main:
        sub     rsp, 8
        mov     eax, 20
        mov     esi, 1
        mov     edx, 0
        jmp     .L2
.L3:
        mov     esi, ecx
.L2:
        lea     ecx, [rdx+rsi]
        mov     edx, esi
        sub     eax, 1
        jne     .L3
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        add     rsp, 8
        ret

However, as can be seen from the assembly listing (the presence of a jne instruction, indicating a loop), we are actually still computing the value of fib(20) at runtime. Can we do better?

The @ compile-time evaluation operator

CIR extends the C programming language with the compile-time evaluation operator, @. Function calls that are prefixed with @ will be evaluated at compile-time.

Here is the modified fib.c source file with the @ operator added:

#include <stdio.h>

static int fib(int n) {
  int a = 0, b = 1, i = 0;
  while (i < n) {
    int c = b;
    b = a + b;
    a = c;
    i = i + 1;
  }
  return a;
}

int main(void) {
  printf("The answer is: %d\n", @fib(20)); // @ operator added
  return 0;
}

CIR now outputs:

extern int printf(char *__restrict __format, ...);
int main(void)
{
  int vid66;

  vid66 = printf("The answer is: %d\n", 6765); /* sid15 */
  return 0; /* sid16 */
}

As we can see, the @fib(20) call got replaced with 6765. What happened? CIR JIT-compiled the fib() function info X86-64 machine code, executed it, and then inlined the result (6765) into the call site.

Arbitrary compile-time evaluation

While full support for the C11 standard is still a work in progress, CIR's JIT aims to be a full-featured C compiler. This means that you can use any C language construct you want, such as conditionals, loops, calling other functions etc.

Furthermore, JIT-compiled code can call external functions and libraries. This includes C standard library APIs such as malloc(), free(), fopen(), fwrite(), printf() etc.

For example, here is a function that returns the number of words in a string using the C standard-library API strtok():

#include <stdio.h>
#include <string.h>

static int numWords(char *s) {
    int numWords = 0;
    for (char *p = strtok(s, " "); p; p = strtok(NULL, " "))
        numWords = numWords + 1;
    return numWords;
}

int main(void) {
    printf("Hello World has %d words", @numWords("Hello World"));
    return 0;
}

CIR is able to call strtok() at compile-time, and outputs:

int printf(char *__restrict , ...);
int main(void)
{
    int vid145;

    vid145 = printf("Hello World has %d words", (int)2); /* sid13 */
    return (int)0; /* sid14 */
}

However, this also means that incorrectly-written compile-time functions may cause the CIR compiler to not halt, or even crash. With great power comes great responsibility -- developers must exercise care when writing compile-time functions.

Compile-time metaprogramming

Code evaluated at compile time can call back into the compiler API.

CIR will examine the type of compile-time-evaluated functions:

For example, here is a function that receives CirCodeId as an argument, and examines the IR contained within:

#include <stdbool.h>
#include "../cir.h" // include compiler API

// Returns true if the code calls a function, otherwise returns false.
static bool callsAFunction(CirCodeId codeId) {
  for (CirStmtId stmtId = CirCode_getFirstStmt(codeId); stmtId; stmtId = CirStmt_getNext(stmtId)) {
    if (CirStmt_isCall(stmtId))
      return true;
  }
  return false;
}

And can be used as follows:

@callsAFunction(puts("Hi")); // evaluates to true
@callsAFunction(puts(42)); // evaluates to false

Note

Notice how a simple while loop is sufficient in discovering whether there is a call in a piece of code. CIR is explicitly designed to be flat so as to make such traversals and manipulation of the IR easy, without having to use recursive function calls or the visitor pattern which is more difficult to write in C.

Here is another function that returns a code object containing a string constant:

#include <stdio.h>
#include <cir.h> // include compiler API

static CirCodeId generateCode() {
    return CirCode_ofExpr(CirValue_ofCString("Inlined String"));
}

int main(void) {
    puts(@generateCode());
    return 0;
}

And the result is:

int puts(char *);
int main(void)
{
    int vid386;

    vid386 = puts("Inlined String"); /* sid4 */
    return (int)0; /* sid5 */
}

Statement Expressions

Sometimes we may wish to pass blocks of code into compile-time functions. CIR implements GCC's statement expression extension for this purpose. Compound statements that are enclosed in parentheses are treated as an expressions. For example, we can pass a block of code into @callsAFunction, which we described earlier, as follows:

@callsAFunction( ({
  int x = 3, y = 2;
  if (y > x)
    puts("y > x");
}) ); // still evaluates to true

Loading external libraries

Compile-time-evaluated functions can use loaded external libraries. Consider the following program which compresses a string at compile-time with zlib:

defhello.c
#include <stdio.h>
#include <stdlib.h>
#include <cir.h>
#include <zlib.h>

static CirCodeId def(const char *data) {
    char *buf = malloc(4096);
    z_stream strm;
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    int ret = deflateInit(&strm, 9);
    if (ret != Z_OK)
        cir_fatal("deflateInit failed");
    strm.avail_in = strlen(data);
    strm.next_in = data;
    strm.avail_out = 4095;
    strm.next_out = buf;
    deflate(&strm, Z_FINISH);
    size_t len = 4095 - strm.avail_out;
    deflateEnd(&strm);
    buf[len] = 0;
    return CirCode_ofExpr(CirValue_ofString(buf, len + 1));
}

int main(void) {
    fwrite(@def("Hello World!"), sizeof(@def("Hello World!")), 1, stdout);
    return 0;
}

We assume that the zlib header files and library are already installed on the system. We first pre-process defhello.c as usual:

gcc -E o defhello.cpp.c defhello.c

and then pass the preprocessed file (defhello.cpp.c) into CIR. This time we add the flag -llibz.so to instruct CIR to load the shared library libz.so, making it available to compile-time functions:

./cir -llibz.so defhello.cpp.c >defhello.output.c

The output from CIR is:

defhello.output.c
typedef unsigned long tid3_size_t;
struct cid1__IO_FILE;
typedef struct cid1__IO_FILE tid6_FILE;
tid3_size_t fwrite(void *__restrict , tid3_size_t, tid3_size_t, tid6_FILE *__restrict );
extern tid6_FILE *const stdout;
int main(void)
{
    tid3_size_t vid697;

    vid697 = fwrite("x\xda""\xf3""H\xcd""\xc9""\xc9""W\010\xcf""/\xca""IQ\004\0\034I\004>", (unsigned long)21, (int)1, stdout); /* sid51 */
    return (int)0; /* sid52 */
}

As we can see, the CIR output contains the compressed Hello World! string. If we compile the CIR output, run the program (capturing its output), and then decompress it using python, we can get back the original Hello World! string::

$ gcc -o a.out defhello.output.c
$ ./a.out >output
$ python3
Python 3.7.3 (default, Apr  3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zlib
>>> zlib.decompress(open('output', 'rb).read())
b'Hello World!'

The CIR Compiler API

CIR comes with an extensive low level compiler API for interacting with the compiler during compile-time evaluation. The compiler API is defined in cir.h.