Compiling OCaml in Depth

We will begin with an overview of how OCaml is compiled. We will see that a boot OCaml system is used to compile the OCaml system you have on your machine.

The second part is how C and assembly language are compiled, and where the C and assembly language compilers get invoked.

Then we will discuss how cross-compilation typically works in the C language.

Finally we’ll use the understanding we’ve gained for how one OCaml system compiles another system to understand how cross-compilation works in OCaml.

How OCaml is compiled

In this section we will walk through the source code for OCaml at https://github.com/ocaml/ocaml/tree/4.13.

Note

Parts of the explanation below come from https://github.com/ocaml/ocaml/blob/4.13/BOOTSTRAP.adoc

Let’s start with the boot/ directory:

boot
├── menhir
│   ├── menhirLib.ml
│   ├── menhirLib.mli
│   ├── parser.ml
│   └── parser.mli
├── ocamlc
└── ocamllex

Both boot/ocamlc and boot/ocamllex are OCaml bytecode files. OCaml bytecode is the machine code of a very simple virtual machine called the Caml Virtual Machine (or Zinc Machine).

ocamllex is the lexer generator. If we could run ocamllex then we could turn the compiler's lexer specification (parsing/lexer.mll) into a lexer that takes a .ml OCaml source file and emits OCaml tokens (ex. =) and keywords (ex. module). Those tokens are consumed by the OCaml parser parser.ml and its runtime library menhirLib.ml. The end result is an abstract syntax tree called a Parsetree from a single .ml source file.

ocamlc is the bytecode compiler. If we could run ocamlc then we could take any .ml source code and generate more bytecode.

All we are missing so far is a way to run arbitrary bytecode files like ocamlc and ocamllex. That missing program is ocamlrun: ocamlrun is the bytecode interpreter for the Caml Virtual Machine.
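To make "bytecode file" a little more concrete: a bytecode executable ends with a short trailer whose last 12 bytes are a magic string, Caml1999X followed by a three-digit format version. The sketch below is a rough heuristic in Python that works on in-memory bytes; the function name and the sample contents are made up for illustration, and it is not how ocamlrun itself validates a file.

```python
import re

# Bytecode executables end with a 12-byte magic string: "Caml1999X" plus
# a three-digit format version that changes between OCaml releases.
_TRAILER_MAGIC = re.compile(rb"Caml1999X\d{3}")


def looks_like_bytecode(data: bytes) -> bool:
    """Heuristic: does this file content end with a bytecode trailer magic?"""
    return _TRAILER_MAGIC.fullmatch(data[-12:]) is not None


# Fabricated contents for illustration; real files carry code and data
# sections before the trailer.
fake_bytecode = b"#!/usr/local/bin/ocamlrun\n...sections..." + b"Caml1999X030"
fake_elf = b"\x7fELF" + b"\x00" * 64

print(looks_like_bytecode(fake_bytecode))
print(looks_like_bytecode(fake_elf))
```

Reading the real boot/ocamlc with open(path, "rb").read() and passing the bytes to the same function should work the same way.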

To build executables like the ocamlrun bytecode interpreter, OCaml uses an “autoconf” build system where:

  • ./configure inspects your system to find C compilers (that is how we know which C compiler to use to generate the ocamlrun program) and assembly language compilers. It captures the C and assembly compilers and their flags into ./Makefile.config using a template file Makefile.config.in. There are also other files created from *.in templates

  • make uses the configuration captured in ./Makefile.config to generate the utils/config.ml module from the utils/config.mlp template. There are also other modules created from *.mlp templates. After make has generated all the source files, it can build C programs like ocamlrun using the C compiler and OCaml programs like ocaml and ocamlc using ocamlrun boot/ocamlc.
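The template step can be pictured as plain text substitution. The Python sketch below mimics the autoconf convention of @NAME@ placeholders in a *.in file; the template fragment and the detected values are hypothetical and are not copied from the real Makefile.config.in.

```python
import re


def fill_template(template: str, settings: dict) -> str:
    """Replace each @NAME@ placeholder with the detected value."""
    def lookup(match):
        # Leave unknown placeholders untouched rather than guessing a value.
        return settings.get(match.group(1), match.group(0))
    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@", lookup, template)


# Hypothetical template fragment and detection results, for illustration only.
template = "CC=@CC@\nAS=@AS@\nRANLIB=@RANLIB@\n"
detected = {"CC": "gcc", "AS": "gcc -c"}
makefile_config = fill_template(template, detected)
print(makefile_config)
```

The point of the design is that detection happens once, and every later build step reads the captured values instead of probing the system again.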

ocamlrun is compiled from C code (.c); here is an abbreviated listing of the ocamlrun source code:

runtime
├── alloc.c
├── array.c
├── backtrace.c
├── callback.c
├── caml
│   ├── alloc.h
│   ├── backtrace.h
│   ├── callback.h
│   ├── config.h
│   ├── domain.h
│   ├── domain_state.h
│   ├── domain_state.tbl
│   ├── gc.h
│   ├── gc_ctrl.h
│   ├── io.h
│   ├── m.h
│   ├── m.h.in
│   ├── major_gc.h
│   ├── minor_gc.h
│   ├── prims.h
│   ├── s.h
│   ├── s.h.in
│   ├── signals.h
│   ├── startup.h
│   ├── startup_aux.h
│   └── sys.h
├── domain.c
├── hash.c
├── io.c
├── lexing.c
├── main.c
├── major_gc.c
├── minor_gc.c
├── parsing.c
├── prims.c
├── riscv.S
├── signals.c
├── startup_aux.c
├── startup_nat.c
├── str.c
├── sys.c
├── unix.c
└── win32.c

After make compiles the runtime/ directory with the C compiler we will have the runtime/ocamlrun executable that can run any bytecode file and a runtime library for bytecode called libcamlrun. We can now:

  • compile OCaml files with runtime/ocamlrun boot/ocamlc and run the generated bytecode with runtime/ocamlrun

  • interact with Unix/Windows system library C functions from within bytecode, since the interpreter can call C functions registered as primitives (see prims.c) and C functions can call back into OCaml (see callback.c)

That sounds like we are finished, but we now have three problems.

Problem 1: Creating a modern OCaml compiler

The first problem is that we have been using the boot/ocamlc OCaml compiler. That boot OCaml compiler may be an old OCaml compiler that can’t compile the latest OCaml source code. So we compile a new OCaml compiler ./ocamlc bytecode file from the following abbreviated OCaml compiler source code:

.
├── bytecomp
│   ├── bytegen.ml
│   ├── bytelibrarian.ml
│   ├── bytelink.ml
│   ├── bytepackager.ml
│   ├── bytesections.ml
│   ├── dll.ml
│   ├── emitcode.ml
│   ├── instruct.ml
│   ├── meta.ml
│   ├── opcodes.ml
│   ├── printinstr.ml
│   └── symtable.ml
├── driver
│   ├── compenv.ml
│   ├── compile.ml
│   ├── compile_common.ml
│   ├── compmisc.ml
│   ├── errors.ml
│   ├── main.ml
│   ├── main_args.ml
│   ├── maindriver.ml
│   ├── makedepend.ml
│   ├── optcompile.ml
│   ├── opterrors.ml
│   ├── optmain.ml
│   ├── optmaindriver.ml
│   └── pparse.ml
├── lambda
│   ├── debuginfo.ml
│   ├── lambda.ml
│   ├── matching.ml
│   ├── printlambda.ml
│   ├── runtimedef.ml
│   ├── simplif.ml
│   ├── switch.ml
│   ├── translattribute.ml
│   ├── translclass.ml
│   ├── translcore.ml
│   ├── translmod.ml
│   ├── translobj.ml
│   └── translprim.ml
├── typing
│   ├── btype.ml
│   ├── ...
│   ├── ctype.ml
│   ├── ...
│   ├── primitive.ml
│   ├── ...
│   ├── type_immediacy.ml
│   ├── typeclass.ml
│   ├── typecore.ml
│   ├── typedecl.ml
│   ├── typedecl_immediacy.ml
│   ├── typedecl_properties.ml
│   ├── typedecl_separability.ml
│   ├── typedecl_unboxed.ml
│   ├── typedecl_variance.ml
│   ├── typedtree.ml
│   ├── typemod.ml
│   ├── typeopt.ml
│   ├── types.ml
│   ├── typetexp.ml
│   └── untypeast.ml
└── utils
    ├── ...
    ├── clflags.ml
    ├── config.ml
    ├── config.mlp
    ├── ...

Once we have a modern ./ocamlc we can see the configuration constants embedded in utils/config.ml by running runtime/ocamlrun ./ocamlc -config:

version: 4.12.1
...
ccomp_type: cc
c_compiler: gcc
ocamlc_cflags: -O2 -fno-strict-aliasing -fwrapv -fPIC
ocamlc_cppflags: -D_FILE_OFFSET_BITS=64 -D_REENTRANT
ocamlopt_cflags: -O2 -fno-strict-aliasing -fwrapv -fPIC
ocamlopt_cppflags: -D_FILE_OFFSET_BITS=64 -D_REENTRANT
bytecomp_c_compiler: gcc -O2 -fno-strict-aliasing -fwrapv -fPIC  -D_FILE_OFFSET_BITS=64 -D_REENTRANT
native_c_compiler: gcc -O2 -fno-strict-aliasing -fwrapv -fPIC  -D_FILE_OFFSET_BITS=64 -D_REENTRANT
bytecomp_c_libraries: -lm -ldl  -lpthread
native_c_libraries: -lm -ldl
native_pack_linker: ld -r -o
ranlib: ranlib
...
asm: gcc -c
...

The net effect is that the C and assembly compilers are hardcoded inside the ocamlc executable.
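Because the output above is a simple list of key: value lines, build scripts can read it back. The Python sketch below parses a shortened copy of that output into a dictionary; the function name is made up, and a real script would capture the text from the ocamlc -config command rather than from a string literal.

```python
def parse_config(text: str) -> dict:
    """Parse `key: value` lines as printed by `ocamlc -config`."""
    config = {}
    for line in text.splitlines():
        # Split on the first colon only; values may contain further colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


# Shortened copy of the output shown above.
sample = """\
version: 4.12.1
c_compiler: gcc
ocamlc_cflags: -O2 -fno-strict-aliasing -fwrapv -fPIC
native_pack_linker: ld -r -o
asm: gcc -c
"""
config = parse_config(sample)
print(config["c_compiler"], "|", config["asm"])
```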

Problem 2: Creating the standard library

The second problem is that we don’t have the OCaml standard library. We can compile the standard library bytecode (.cmo object files and .cma object libraries) from:

stdlib
├── arg.ml
├── array.ml
├── arrayLabels.ml
├── atomic.ml
├── bigarray.ml
├── bool.ml
├── buffer.ml
├── bytes.ml
├── bytesLabels.ml
├── callback.ml
├── char.ml
├── complex.ml
├── digest.ml
├── either.ml
├── ephemeron.ml
├── filename.ml
├── float.ml
├── format.ml
├── fun.ml
├── gc.ml
├── genlex.ml
├── hashbang
├── hashtbl.ml
├── header.c
├── headernt.c
├── int.ml
├── int32.ml
├── int64.ml
├── lazy.ml
├── lexing.ml
├── list.ml
├── listLabels.ml
├── map.ml
├── marshal.ml
├── moreLabels.ml
├── nativeint.ml
├── obj.ml
├── oo.ml
├── option.ml
├── parsing.ml
├── pervasives.ml
├── printexc.ml
├── printf.ml
├── queue.ml
├── random.ml
├── result.ml
├── scanf.ml
├── seq.ml
├── set.ml
├── stack.ml
├── stdLabels.ml
├── std_exit.ml
├── stdlib.a
├── stdlib.ml
├── stream.ml
├── string.ml
├── stringLabels.ml
├── sys.ml
├── sys.mlp
├── uchar.ml
├── unit.ml
└── weak.ml

Problem 3: Generating native code

The third problem is that we completely ignored how we will generate native code. That is the subject of discussion for the next section.

Where C and assembly compilers are used

We’ve already discussed how the OCaml compiler itself uses a C compiler to compile the runtime/ directory into the ocamlrun program, and how ocamlrun can run other bytecode files and (with ocamlc) generate more bytecode files.

OCaml’s native code compiler program ./ocamlopt is built the same way we built ./ocamlc, except bytecomp/ has been replaced by asmcomp/ and middle_end/:

.
├── asmcomp
│   ├── CSE.ml -> arm/CSE.ml
│   ├── CSEgen.ml
│   ├── amd64
│   │   ├── CSE.ml
│   │   ├── arch.ml
│   │   ├── emit.mlp
│   │   ├── proc.ml
│   │   ├── reload.ml
│   │   ├── scheduling.ml
│   │   └── selection.ml
│   ├── arch.ml -> arm/arch.ml
│   ├── arm
│   │   ├── CSE.ml
│   │   ├── arch.ml
│   │   ├── emit.mlp
│   │   ├── proc.ml
│   │   ├── reload.ml
│   │   ├── scheduling.ml
│   │   └── selection.ml
│   ├── arm64
│   │   └── *.ml
│   ├── asmgen.ml
│   ├── asmlibrarian.ml
│   ├── asmlink.ml
│   ├── asmpackager.ml
│   ├── branch_relaxation.ml
│   ├── branch_relaxation_intf.ml
│   ├── cmm.ml
│   ├── cmm_helpers.ml
│   ├── cmmgen.ml
│   ├── cmmgen_state.ml
│   ├── coloring.ml
│   ├── comballoc.ml
│   ├── deadcode.ml
│   ├── emit.ml
│   ├── emitaux.ml
│   ├── i386
│   │   └── *.ml
│   ├── interf.ml
│   ├── interval.ml
│   ├── linear.ml
│   ├── linearize.ml
│   ├── linscan.ml
│   ├── liveness.ml
│   ├── mach.ml
│   ├── power
│   │   └── *.ml
│   ├── printcmm.ml
│   ├── printlinear.ml
│   ├── printmach.ml
│   ├── proc.ml -> arm/proc.ml
│   ├── reg.ml
│   ├── reload.ml -> arm/reload.ml
│   ├── reloadgen.ml
│   ├── riscv
│   │   └── *.ml
│   ├── s390x
│   │   └── *.ml
│   ├── schedgen.ml
│   ├── scheduling.ml -> arm/scheduling.ml
│   ├── selectgen.ml
│   ├── selection.ml -> arm/selection.ml
│   ├── spill.ml
│   ├── split.ml
│   ├── strmatch.ml
│   ├── x86_dsl.ml
│   ├── x86_gas.ml
│   ├── x86_masm.ml
│   └── x86_proc.ml
├── driver
│   └── *.ml
├── lambda
│   └── *.ml
├── middle_end
│   ├── backend_var.ml
│   ├── clambda.ml
│   ├── clambda_primitives.ml
│   ├── closure
│   │   ├── closure.ml
│   │   └── closure_middle_end.ml
│   ├── compilation_unit.ml
│   ├── compilenv.ml
│   ├── convert_primitives.ml
│   ├── flambda
│   │   ├── alias_analysis.ml
│   │   ├── allocated_const.ml
│   │   ├── augment_specialised_args.ml
│   │   ├── base_types
│   │   ├── build_export_info.ml
│   │   ├── closure_conversion.ml
│   │   ├── closure_conversion_aux.ml
│   │   ├── closure_offsets.ml
│   │   ├── effect_analysis.ml
│   │   ├── export_info.ml
│   │   ├── export_info_for_pack.ml
│   │   ├── extract_projections.ml
│   │   ├── find_recursive_functions.ml
│   │   ├── flambda.ml
│   │   ├── flambda_invariants.ml
│   │   ├── flambda_iterators.ml
│   │   ├── flambda_middle_end.ml
│   │   ├── flambda_to_clambda.ml
│   │   ├── flambda_utils.ml
│   │   ├── freshening.ml
│   │   ├── import_approx.ml
│   │   ├── inconstant_idents.ml
│   │   ├── initialize_symbol_to_let_symbol.ml
│   │   ├── inline_and_simplify.ml
│   │   ├── inline_and_simplify_aux.ml
│   │   ├── inlining_cost.ml
│   │   ├── inlining_decision.ml
│   │   ├── inlining_stats.ml
│   │   ├── inlining_stats_types.ml
│   │   ├── inlining_transforms.ml
│   │   ├── invariant_params.ml
│   │   ├── lift_code.ml
│   │   ├── lift_constants.ml
│   │   ├── lift_let_to_initialize_symbol.ml
│   │   ├── parameter.ml
│   │   ├── pass_wrapper.ml
│   │   ├── projection.ml
│   │   ├── ref_to_variables.ml
│   │   ├── remove_free_vars_equal_to_args.ml
│   │   ├── remove_unused_arguments.ml
│   │   ├── remove_unused_closure_vars.ml
│   │   ├── remove_unused_program_constructs.ml
│   │   ├── share_constants.ml
│   │   ├── simple_value_approx.ml
│   │   ├── simplify_boxed_integer_ops.ml
│   │   ├── simplify_common.ml
│   │   ├── simplify_primitives.ml
│   │   ├── traverse_for_exported_symbols.ml
│   │   ├── un_anf.ml
│   │   ├── unbox_closures.ml
│   │   ├── unbox_free_vars_of_closures.ml
│   │   └── unbox_specialised_args.ml
│   ├── internal_variable_names.ml
│   ├── linkage_name.ml
│   ├── printclambda.ml
│   ├── printclambda_primitives.ml
│   ├── semantics_of_primitives.ml
│   ├── symbol.ml
│   └── variable.ml
└── typing
    └── *.ml

Where bytecode uses the bytecode runtime library libcamlrun, native code uses a different runtime library libasmrun that is built in the same runtime/ directory but that includes the assembly code (.asm and .S) in that directory:

runtime
├── amd64.S
├── amd64nt.asm
├── arm.S
├── arm64.S
├── i386.S
├── i386nt.asm
├── power.S
├── riscv.S
└── s390x.S

ocamlopt performs a variety of activities including:

  • it translates .ml files into architecture-specific assembly language source code (.s files) using the code in asmcomp/. ocamlopt then compiles the assembly language into native object files (Unix .o or Windows .obj files) using the assembly compiler named in the utils/config.ml module (the same module you saw with ocamlc -config)

  • it compiles .c files into native object files using the C compiler named in utils/config.ml

  • it links the native object files into a native executable using the native linker named in utils/config.ml
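Each step in the list above ends with ocamlopt running an external command whose first words come from the configuration. As a rough Python sketch, assuming a Unix-style assembler setting such as the asm: gcc -c value shown earlier, the assembling step could be pictured like this. The function name and file names are illustrative, and the real driver adds further flags that are not modelled here.

```python
import shlex


def assemble_command(asm: str, source: str, output: str) -> list:
    """Build an argv list that assembles `source` into `output`.

    `asm` is the assembler setting from the configuration, e.g. "gcc -c".
    """
    return shlex.split(asm) + ["-o", output, source]


# hello.s and hello.o are placeholder file names.
cmd = assemble_command("gcc -c", "hello.s", "hello.o")
print(" ".join(cmd))
```

Since the command prefix is read from the configuration, changing the configured assembler changes what every later compilation runs, without touching the compiler's own logic.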

Now we are basically done. With ocamlopt we can recompile everything into native code, including the compilers. For example ./ocamlc.opt and ./ocamlopt.opt are created using the same procedure as ./ocamlc and ./ocamlopt, except that instead of using ./ocamlc to compile them into bytecode executables, ./ocamlopt is used to compile them into native code executables.

How C code is cross-compiled

For this section we’ll use an example where we cross-compile the Java compiler javac. Just like OCaml, much of the low-level Java source code is C code. The concepts introduced in this section will carry forward to the next section where we describe the cross-compiling of the OCaml compilers ocamlc and ocamlopt.

We want to use an Ubuntu 64-bit machine to create a javac.exe that can run on Windows. To compile javac.exe there are three different machines we need to consider:

Build Machine

You would use the following “autoconf” pattern to build yourself a new javac binary:

git clone https://git.openjdk.java.net/jdk/
cd jdk
./configure ...
make ...
make install

The machine where you type “git clone” and “./configure” is the Build Machine.

The Build Machine is an Ubuntu (Linux x86_64) 64-bit machine.

Host Machine

Since we want to run javac.exe on Windows machines, the Host Machine is a Windows Intel/AMD 64-bit machine.

Target Machine

javac.exe will compile .java source files into .class bytecode files. These .class bytecode files can be run on any machine.

The Target Machine is any machine; we can take the Java compiled output Main.class that was produced on a Windows machine and then run the Main.class on a macOS machine.

To produce a Windows javac.exe from a Linux machine we need to supply a C compiler that is a Linux executable (it runs on Linux) but generates Windows executables (.exe) and Windows shared libraries (.dll).

  1. If you want a GCC compiler you would install and use the compiler /usr/bin/x86_64-w64-mingw32-gcc-win32 using your Linux package manager (ex. apt install gcc-mingw-w64-x86-64 on Ubuntu). If you instead wanted to generate 32-bit Windows executables you would install and use /usr/bin/i686-w64-mingw32-gcc-win32.

    In general, GCC prefixes its compilers and tools with the Host Machine “triple” (ex. i686-w64-mingw32 describes 32-bit Intel/AMD Windows targets).

  2. If you want a Clang compiler you would install and use the compiler /usr/bin/clang (ex. apt install clang on Ubuntu). You would use “target” compiler flags (typically the CFLAGS environment variable) to set the Host Machine. /usr/bin/clang --target=i686-pc-win32 would create executables for 32-bit Intel/AMD Windows while /usr/bin/clang --target=x86_64-pc-win32 would create executables for 64-bit Intel/AMD Windows.
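The two conventions differ in where the Host Machine is named: GCC puts it in the program name, Clang puts it in a flag. A small Python sketch of both follows; the helper names are made up, and distribution-specific suffixes such as the -win32 in the Ubuntu program names above are deliberately ignored.

```python
def gcc_tool(triple: str, tool: str = "gcc") -> str:
    """GCC-style cross tools are named <triple>-<tool>."""
    return f"{triple}-{tool}"


def clang_command(target: str) -> list:
    """Clang is one binary; the target is selected with a flag."""
    return ["clang", f"--target={target}"]


gcc_64 = gcc_tool("x86_64-w64-mingw32")
gcc_32_ar = gcc_tool("i686-w64-mingw32", "ar")
clang_64 = clang_command("x86_64-pc-win32")
print(gcc_64, gcc_32_ar, clang_64)
```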

In addition to the Build-Machine-running, Host-Machine-producing compiler we must also have C libraries to link against and C headers to compile against. Since the C compiler will produce code that runs on the Host Machine, the C libraries and C headers must be for the Host Machine. Those C libraries and headers are collected in a directory structure called a sysroot:

<sysroot>
  └── usr
      ├── include
      │   └── *.h
      └── lib
          ├── *.a or *.lib
          └── *.so or *.dll
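Both GCC and Clang accept a --sysroot option that tells the compiler to look for headers and libraries under that directory. The Python sketch below builds the flag from a sysroot path; the path is made up, and the explicit -I and -L entries are spelled out only to show how the layout above maps to search paths (with --sysroot they are usually implied).

```python
from pathlib import PurePosixPath


def sysroot_flags(sysroot: str) -> list:
    """Flags that point a cross compiler at Host Machine headers and libraries."""
    root = PurePosixPath(sysroot)
    return [
        f"--sysroot={root}",
        f"-I{root / 'usr' / 'include'}",
        f"-L{root / 'usr' / 'lib'}",
    ]


# /opt/win64-sysroot is a made-up location.
flags = sysroot_flags("/opt/win64-sysroot")
print(flags)
```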

Changing OCaml to do cross-compilation

Note

The technique presented here was first described by EduardoRFS, Antonio Nuno Monteiro and Romain Beauxis in discuss.ocaml.org: Cross-compiling implementations / how they work

The build procedure for compiling the OCaml system starts with the Build Machine generating the ocamlc compiler and the ocamlrun bytecode interpreter; in other words both ocamlc and ocamlrun are Host Machine executables. To continue building the rest of the OCaml system the ocamlc and ocamlrun executables must be run. So building an OCaml system requires a Build Machine that is capable of running Host Machine executables.

Without loss of generality we will only describe the Host Machine, and expect you to use a compatible Build Machine. So a Windows 32-bit Host Machine executable can be run on a Windows 32-bit or 64-bit Build Machine, but a Windows 64-bit Host Machine executable cannot be run on a 32-bit Build Machine.

Warning

When can’t a 64-bit Build Machine run 32-bit executables? Apple Silicon (M1, etc.) hardware has 64-bit ARM processors, but for cost and energy savings the 32-bit ARM circuitry has been removed. You cannot run 32-bit ARM executables on Apple Silicon.

Most 64-bit Linux distributions do not come natively with 32-bit system libraries; you have to install something called the “i386” architecture to run most 32-bit binaries. There are tricks to avoid needing these 32-bit system libraries, but usually the i386 architecture is easy to install in popular Linux distributions.
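The compatibility rule and its exception can be written down as a small predicate. This Python sketch is a deliberate simplification with hypothetical names: it treats the operating system and CPU family as needing an exact match, and models missing 32-bit support (hardware or system libraries) as a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    os: str        # "windows", "linux", "macos"
    cpu: str       # "x86" or "arm"
    bits: int      # 32 or 64
    runs_32bit: bool = True  # False for 64-bit systems without 32-bit support


def can_run(build: Machine, host: Machine) -> bool:
    """Rough test: can `build` execute binaries made for `host`?"""
    if (build.os, build.cpu) != (host.os, host.cpu):
        return False
    if host.bits > build.bits:
        return False
    if host.bits == 32 and build.bits == 64:
        return build.runs_32bit
    return True


win64 = Machine("windows", "x86", 64)
win32 = Machine("windows", "x86", 32)
apple_silicon = Machine("macos", "arm", 64, runs_32bit=False)
arm32_mac = Machine("macos", "arm", 32)
print(can_run(win64, win32), can_run(win32, win64), can_run(apple_silicon, arm32_mac))
```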

Here are the modified steps to get a cross-compiling OCaml system:

  • Use ./configure to establish constants for the Target Machine. utils/config.ml and runtime/sys.c will contain settings for the Target Machine.

  • Compile ocamlc and ocamlopt that run on both the Build Machine and Host Machine.

    Note

    ocamlc/ocamlopt now will have configuration (ex. ocamlc -config) for the Target Machine, which is precisely what we want

  • Compile all executable tools like ocamlyacc, ocamldebug, etc. using the Host Machine ocamlc / ocamlopt

  • At this point all of the executables run on the Host Machine

    Note

    ocamlc/ocamlopt now will produce bytecode + native code that runs on the Host Machine. We don’t want that. We want to fix ocamlc/ocamlopt to produce bytecode + native code that runs on the Target Machine.

  • Remove all OCaml compiled intermediate code. We could be selective and just remove the portions that contain instructions for producing Host Machine bytecode + native code, but it is easier and safer to remove everything intermediate. The executable tools are not deleted because they are not intermediate.

  • Recompile stdlib and its runtime dependencies (libcamlrun bytecode and libasmrun native code runtime libraries) for the Target Machine

  • Regenerate ocamlc / ocamlopt using Host Machine ocamlc (which will produce Host Machine executable ocamlc and Host Machine executable ocamlopt but that contain Target Machine standard + runtime libraries)

  • Recompile stdlib again but include other libraries (unix, str, bigarray) for the Target Machine

  • Recompile the bytecode and native code compiler libraries for the Target Machine, in case an Opam package or other OCaml package wants to bypass the compiler executables ocamlc / ocamlopt.

The end result will be:

Host Machine

The ocamlc and ocamlopt executables will be able to run on the Host Machine.

Target Machine

ocamlc will compile .ml source files into bytecode executables and .cmo / .cma bytecode object files. These bytecode files in theory can be run on any machine that has a bytecode interpreter ocamlrun. However because bytecode can make calls to external C libraries, the C libraries need to have the same APIs and the C libraries have to have the same C calling convention (also known as “ABI” or application binary interface). Practically speaking that means bytecode created on a 32-bit system may not work on a 64-bit system, and bytecode created on Windows may not work on Unix. Unlike Java, care has to be taken so that bytecode is portable to all Target Machines.

ocamlopt will compile .ml source files into native executables and .cmx / .cmxa native object files. The native files will run only on the Target Machine configured at ./configure time.

Limitations

Not all Host Machine / Target Machine combinations are possible.

The host ocamlrun is linked against the host’s runtime/ library which defines the following constants in runtime/sys.c:

Constant                Sample                    Description
----------------------  ------------------------  -----------
word_size               32 or 64                  Number of bits in a word, as detected by a C compiler sizeof() test
int_size                31 or 63; 32 or 64        Number of bits in an OCaml int. Always word_size - 1 for ocamlrun; the js_of_ocaml (JavaScript) runtime sets it the same as word_size
max_wosize              2^22 - 1 or 2^(54-P) - 1  Maximum size in words of a block of memory on the heap. No Array can be larger than this. P is the number of bits reserved for Spacetime profiling (if any)
ostype_unix             True or False             Whether the system is Unix
ostype_win32            True or False             Whether the system is Windows
ostype_cygwin           True or False             Whether the system is Cygwin
backend_type            0 or 1                    Native (0) or Bytecode (1) backend. Other values are possible for other compilers like js_of_ocaml (JavaScript)
naked_pointers_checked  True or False             Whether the naked pointers checker is enabled
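The two numeric constants follow directly from the word size. As a worked example in Python, using the formulas from the table (P defaults to 0, meaning no header bits are reserved for profiling; the function names are illustrative):

```python
def int_size(word_size: int) -> int:
    """Bits in an OCaml int under ocamlrun: one bit is reserved as a tag."""
    return word_size - 1


def max_wosize(word_size: int, profinfo_bits: int = 0) -> int:
    """Largest block size for a given word size.

    profinfo_bits is P, the header bits reserved for profiling (64-bit only).
    """
    if word_size == 32:
        return 2**22 - 1
    if word_size == 64:
        return 2 ** (54 - profinfo_bits) - 1
    raise ValueError(f"unsupported word size: {word_size}")


for bits in (32, 64):
    print(bits, int_size(bits), max_wosize(bits))
```

The gap between the 32-bit and 64-bit limits is the practical reason a 32-bit target cannot simply reuse constants taken from a 64-bit host runtime.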

Any commands like <host>/ocamlrun <target>/ocamlc -config will inherit the host runtime constants (<host>/runtime/sys.c) even if all of the target configuration (<target>/utils/config.ml) is correct. For example you can easily get conflicting configurations from ocamlc -config:

# ...
architecture: i386  # From <target>/utils/config.ml
model: default      # From <target>/utils/config.ml
int_size: 63        # From <host>/runtime/sys.c . No, i386 does not support 63-bit integers!
word_size: 64       # From <host>/runtime/sys.c . No, i386 does not support 64-bit words!
# ...
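A mix-up like this is easy to detect mechanically once the output has been read into a dictionary. The Python sketch below flags settings that disagree with the architecture; the table of expected word sizes is an assumption written for this example and is not something the compiler exposes.

```python
# Word size implied by a few architecture names. This mapping is an
# illustrative assumption, not something the compiler exposes.
EXPECTED_WORD_SIZE = {"i386": 32, "arm": 32, "amd64": 64, "arm64": 64}


def find_conflicts(config: dict) -> list:
    """Return messages for settings that disagree with the architecture."""
    problems = []
    expected = EXPECTED_WORD_SIZE.get(config.get("architecture"))
    if expected is None:
        return problems  # unknown architecture: nothing to compare against
    if int(config["word_size"]) != expected:
        problems.append(f"word_size is {config['word_size']}, expected {expected}")
    if int(config["int_size"]) != expected - 1:
        problems.append(f"int_size is {config['int_size']}, expected {expected - 1}")
    return problems


# The conflicting values from the output above.
mixed = {"architecture": "i386", "word_size": "64", "int_size": "63"}
for problem in find_conflicts(mixed):
    print(problem)
```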

Important

During a cross-compilation the Host Machine and the Target Machine runtime library constants should be the same. The most critical limitations are:

  • if your Target Machine is a 32-bit system, make sure you use a 32-bit Host Machine compiler. That equalizes word_size

  • if your Target Machine is a Unix system, make sure you run the Host Machine cross-compiler on a Unix system. That equalizes ostype_unix

  • if your Target Machine is a Windows system, make sure you run the Host Machine cross-compiler on a Windows system. That equalizes ostype_win32

  • no cross-compilation is supported in Cygwin because Cygwin only supports x86_64 Windows
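The first three rules in the list above amount to comparing a handful of runtime constants before starting a cross-compilation. A minimal Python sketch, with made-up dictionaries standing in for the Host Machine and Target Machine constants:

```python
# Runtime constants that the rules above say must agree.
MUST_MATCH = ("word_size", "ostype_unix", "ostype_win32")


def mismatched_constants(host: dict, target: dict) -> list:
    """Names of runtime constants that differ between host and target."""
    return [name for name in MUST_MATCH if host.get(name) != target.get(name)]


# Illustrative values only.
linux64 = {"word_size": 64, "ostype_unix": True, "ostype_win32": False}
linux32 = {"word_size": 32, "ostype_unix": True, "ostype_win32": False}
win32 = {"word_size": 32, "ostype_unix": False, "ostype_win32": True}

print(mismatched_constants(linux64, linux32))
print(mismatched_constants(linux32, win32))
```

An empty result means the pair passes these particular checks; it does not cover the linker limitations described next.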

Even if the Host Machine runtime constants were not inherited, there would still be a few more limitations:

  1. When ocamlopt is linking object files into an executable on Windows, it uses an executable called flexlink.exe that expects Windows .obj (COFF) object files for linking. However Linux uses ELF object files and macOS uses Mach-O object files, so a Windows Host Machine cannot support a non-Windows Target Machine.

  2. When ocamlopt is linking object files into an executable on Windows, the Host/Target Machine compiler (i.e. MSVC or MinGW) must match and the Host/Target Machine word size (i.e. 32 or 64) must match, because flexlink.exe bundles an object file named after the compiler and word size (flexdll_msvc.obj, flexdll_mingw64.obj, etc.) into the final executable.
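The second limitation can be illustrated by deriving the bundled object file name from the compiler and word size. The Python sketch below assumes the naming pattern flexdll_<compiler>[64].obj, inferred from the two file names quoted above; check the names against your FlexDLL installation before relying on them.

```python
def flexdll_object(compiler: str, word_size: int) -> str:
    """Guess the support object name for a compiler and word size.

    Assumes the pattern flexdll_<compiler>[64].obj suggested by the two
    names quoted in the text; check your FlexDLL installation.
    """
    if compiler not in ("msvc", "mingw"):
        raise ValueError(f"unknown compiler: {compiler}")
    if word_size not in (32, 64):
        raise ValueError(f"unknown word size: {word_size}")
    suffix = "64" if word_size == 64 else ""
    return f"flexdll_{compiler}{suffix}.obj"


def link_compatible(host: tuple, target: tuple) -> bool:
    """Host and target must select the same support object."""
    return flexdll_object(*host) == flexdll_object(*target)


print(flexdll_object("msvc", 32), flexdll_object("mingw", 64))
print(link_compatible(("msvc", 64), ("mingw", 64)))
```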