Distributing OCaml

What distributing OCaml on Windows gave me (and you)

Presentation updated:

2023-04-17

Outline

I’ll use the “narrative arc” for telling stories.

  • Act One: Set the scene, esp. seeds of conflict

    • Biography. Introduce “Why OCaml?”

  • Act Two: Reveal conflict, and grow and change.

    • OCaml and Windows. Its challenges and what it solved.

  • Act Three: Happily ever after!

    • Current perspective of OCaml + Windows. Applications incl. demo. What remains.

Act One

Set the scene: Me

  • Functional background? 20 years ago!

  • Deja vu. Worked (briefly) on Windows side of CHICKEN Scheme

  • Languages I used were a snapshot of the dominant languages …

Set the scene: Me

  • 2000s: Fraud and credit analytics.

  • Lots of data and security. Lots of machine learning. Telecom, credit cards, health insurance.

  • C and Java. Solaris, z/OS and Linux servers. Windows just for GUIs.

Set the scene: Me

  • 2010s: Amazon. Fraud, data warehousing and computer vision.

  • Lots of data and security. Lots of machine and deep learning.

  • Java, C, JS and Python. Linux servers and embedded hardware. Even less Windows b/c GUIs have become web apps.

Set the scene: Diskuv

  • 2020. Proud of what I did at Amazon. But left Amazon to start Diskuv shortly after George Floyd.

  • Goal was a suite of personal safety products; hardware and software. Fit with data, security and computer vision background.

  • Most of my product designs had common element: communication between people. So first product was “chat”.

Set the scene: Diskuv

  • Heavily forked version of Signal Server, with my own backend servers.

  • Problem 1: Signal had not released their “open source” server code for a year while they were integrating the cryptocurrency “MobileCoin”.

  • Lesson: Saying “open-source” is not enough. Have to be prepared to not only release the code but make it runnable by end-users.

Set the scene: Diskuv

  • Heavily forked version of Signal Server, with my own backend servers.

  • Problem 2: Forking is expensive/slow. Developed rule-based code transformation tools in Python.

  • Lesson: Avoid forks at all costs.

Set the scene: Diskuv

  • Heavily forked version of Signal Server, with my own backend servers.

  • Problem 3: Front-end code was lock heavy. And caused messages to be lost.

  • Deep dive …

Set the scene - Locking

My bug report:

The bug is fairly general: I've waited for up to an hour waiting for
messages to be received. I've seen other people report similar things.
If I switch apps (ex. switch to a game, and then switch back to Signal)
my messages start going through.

Set the scene - Locking

On your phone

[Image: holding a phone during Signal registration]
  1. Go through the “registration” flow

  2. Send a message “hi” to another Signal user (“other”)

Set the scene - Locking

On other phone

  1. Accept the message sender

  2. Reply to the message with “Hi”

[Image: another person holding a phone while accepting a message]

Set the scene - Locking

On your phone

[Images: another person holding a phone while accepting a message; holding a phone while waiting for a message to be sent]
  1. Wait for 1+ hrs

  2. App in background then in foreground. Sometimes get message.

  3. Close app and re-open, immediately get undelivered messages.

Set the scene - Locking

How would you know that you aren’t getting your message?

If you switch to a web page, and then back to your app, your message is there!

How do you support your product in the field?

Set the scene - Locking

My five-line hacky solution:

TextSecurePreferences.setWebsocketRegistered(context, true);
// ...
TextSecurePreferences.setPushRegistered(context, true);
// ...
IncomingMessageObserver incomingMessageObserver =
  ApplicationDependencies.getIncomingMessageObserver();
synchronized (incomingMessageObserver) {
  incomingMessageObserver.notifyAll();
}

Set the scene - Locking

My five-line hacky solution:

TextSecurePreferences.setWebsocketRegistered(context, true);
// ...
TextSecurePreferences.setPushRegistered(context, true);
// ...
/*
  CONTRASTS with how difficult it is

  to find locking problems!
*/

Set the scene - Locking

Root cause? The code did not guarantee that Object.notify was called when any of the 5 conditions changed:

private synchronized boolean isConnectionNecessary() {
  boolean registered          = isPushRegistered(context);
  boolean websocketRegistered = isWebsocketRegistered(context);
  boolean isGcmDisabled       = isFcmDisabled(context);
  boolean hasNetwork          = isMet(context);
  boolean hasProxy            = isProxyEnabled();

  return registered                    && websocketRegistered &&
         (appVisible || isGcmDisabled) && hasNetwork          &&
         !networkAccess.isCensored(context);
}

private synchronized void waitForConnectionNecessary() {
  while (!isConnectionNecessary()) wait();
}
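
One way to close that gap is sketched below. This is not Signal's code: the class, fields, and setters are hypothetical, and only three conditions are modeled. The idea is that every condition is changed through a synchronized setter that ends with notifyAll(), so a waiter can never miss a state change.

```java
/** Illustrative sketch only; all names here are hypothetical, not Signal's. */
class ConnectionGate {
    private boolean registered;
    private boolean hasNetwork;
    private boolean appVisible;

    // Every mutation goes through a synchronized setter that ends with
    // notifyAll(), so a thread blocked in awaitNecessary() always wakes up
    // and re-checks the predicate after any condition changes.
    synchronized void setRegistered(boolean v) { registered = v; notifyAll(); }
    synchronized void setHasNetwork(boolean v) { hasNetwork = v; notifyAll(); }
    synchronized void setAppVisible(boolean v) { appVisible = v; notifyAll(); }

    synchronized boolean isConnectionNecessary() {
        return registered && hasNetwork && appVisible;
    }

    /** Blocks until the predicate holds; the loop guards against spurious wakeups. */
    synchronized void awaitNecessary() throws InterruptedException {
        while (!isConnectionNecessary()) {
            wait();
        }
    }
}
```

The cost of this design is discipline: any new condition added to the predicate needs its own notifying setter, which is exactly the kind of invariant that is easy to break in a fork.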

Set the scene - Locking

  • I forked because I was introducing new behaviors to the Signal mobile application

  • But new behaviors still need to interact with old behaviors, which means acquiring the same set of locks …

  • … while still guarding against lock mis-ordering problems …

  • … and acquiring your own locks in an event-driven UI

  • Nuclear arms race to ensure safe lock usage!

Set the scene - Locking

Lesson:

  • Avoid locks at all costs. Prefer cooperative threading.
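
A rough Java approximation of that lesson, under stated assumptions: Java has no direct counterpart to OCaml's cooperative threads (for example the Lwt library), so this sketch uses a single-threaded event loop instead. The class and method names are hypothetical. Because every task runs to completion on one thread, the shared state needs no synchronized blocks, wait() or notify().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative sketch only; names are hypothetical. */
class EventLoopCounter {
    // All tasks run one at a time, in submission order, on this single thread.
    private final ExecutorService loop = Executors.newSingleThreadExecutor();

    // Confined to the loop thread: only tasks submitted to `loop` touch it.
    private int delivered;

    /** Any thread may call this; the mutation itself runs on the loop thread. */
    void messageDelivered() {
        loop.execute(() -> delivered++);
    }

    /** Reads the count on the loop thread, after all previously queued tasks. */
    int count() throws Exception {
        return loop.submit(() -> delivered).get();
    }

    /** Stops the loop thread once queued tasks have finished. */
    void close() {
        loop.shutdown();
    }
}
```

Adding a new behavior here means submitting another task, not acquiring another lock, so existing code does not become less safe as features are added.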

Set the scene - Alpha Launch

  • Solved the problems, but limped to the starting line

  • 6 months into company, demo of chat available! Ideal startup timeline.

  • A couple months of Alpha testing with an invite-only app from the Google Play Store; getting good feedback

  • But just before a public Beta launch, we enter Act Two

Act Two

Enter Google

Google Cloud

To rub salt in the wound, they don’t tell you that your entire Google Cloud account is locked as well.

Enter Google - Reason?

No Previous Emails

Of course they didn’t actually send any previous emails. This was just a form letter.

Summary: No explanation for the termination.

Enter Google - Appeal

How Long?

Exactly one month later!

Enter Google - Appeal

I was overcome with emotion for how kind they were.

🙃

What?

No warning. No explanation. Business is inaccessible for a month.

What?

What do I say to people who … after using one of my personal safety products … find themselves not able to use a safety product for a month?

NEVER AGAIN

Enter OCaml

Never again requirements:

  • When writing a mobile application, must be easy to switch to a web application if a vendor messes with you

Enter OCaml

Personal requirements:

  • Adding features like a new thread must not make existing code less safe or less understandable

  • Minimize time it takes to deploy to embedded devices, desktops and iOS

  • Make it as easy as possible to prove the software I develop is safe

Enter OCaml

Never again requirements:

  • “When writing a mobile application, must be easy to switch to a web application if a vendor messes with you”

    Not many languages fit this. Highest rank first:

    1. JavaScript

    2. JavaScript code from the OCaml js_of_ocaml (jsoo) backend

    3. WebAssembly code from Rust and C

Enter OCaml

Personal requirements:

  • “Adding features like a new thread must not make existing code less safe or less understandable”

    Languages that supported cooperative threading were a good fit. The general requirement is hard to measure, so I stopped at cooperative threads as a proxy.

    1. JavaScript and OCaml

    2. Rust (perceived difficulty of async, single-threaded tokio)

    3. C

Enter OCaml

Personal requirements:

  • Minimize time it takes to deploy to embedded devices, desktops and iOS

    Hard to evaluate! Short-term time sinks (least first):

    1. Rust and C (large ecosystems; run anywhere)

    2. OCaml (learn new language) and JS (does not run everywhere)

Enter OCaml

Personal requirements:

  • Minimize time it takes to deploy to embedded devices, desktops and iOS

Long-term time sinks (least time sink first):

  1. C

  2. JavaScript (interop mobile + embedded)

  3. Rust (perceived drag on productivity; followup)

  4. OCaml (hiring with low supply)

Enter OCaml

Rust Lang Roadmap for 2024 (written in 2022)

Companies building large teams of Rust users report that the typical onboarding time for a Rust engineer is around 3-6 months. Once folks learn Rust, they typically love it. Even so, many people report a sense of high “cognitive overhead” in using it, and “learning curve” remains the most common reason not to use Rust.

Enter OCaml

Personal requirements:

  • Make it as easy as possible to prove the software I develop is safe

    Easy to prove first:

    1. OCaml (best-in-class with Coq, memory safe)

    2. Rust (memory and resource safety)

    3. JavaScript (memory safe)

    4. C

Enter OCaml

Not easy. No clear winner!

Rankings

Requirement     JS   OCaml   Rust   C
Web Apps        1    2       3      3
Threads         1    1       3      4
Short Term      3    3       1      1
Long Term       2    4       3      1
Prove Safety    3    1       2      4

Ranking Summary

Web Apps as the Never Again requirement was weighted higher.

Long Term for OCaml was soft. If I took a risk and made OCaml easy for non-OCamlers to adopt, I would mitigate half of this risk.

Enter OCaml

OCaml is the chosen winner, but only with large risk

Rankings

Requirement     JS   OCaml   Rust   C
Web Apps        1    2       3      3
Threads         1    1       3      4
Short Term      3    3       1      1
Long Term       2    4 → 2   3      1
Prove Safety    3    1       2      4
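
To make the arithmetic behind the rankings concrete, here is a small sketch that sums each language's ranks (lower is better). It assumes equal weights, which is a simplification: the talk says Web Apps was weighted higher but gives no numbers, so this should not be read as the actual scoring. The class and method names are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: equal-weight rank sums, which is an assumption. */
class RankSums {
    // Column order: Web Apps, Threads, Short Term, Long Term, Prove Safety.
    static final int LONG_TERM = 3;

    static final Map<String, int[]> RANKS = new LinkedHashMap<>();
    static {
        RANKS.put("JS",    new int[] {1, 1, 3, 2, 3});
        RANKS.put("OCaml", new int[] {2, 1, 3, 4, 1});
        RANKS.put("Rust",  new int[] {3, 3, 1, 3, 2});
        RANKS.put("C",     new int[] {3, 4, 1, 1, 4});
    }

    /** Sum of ranks across all requirements; lower is better. */
    static int total(int[] ranks) {
        int sum = 0;
        for (int r : ranks) {
            sum += r;
        }
        return sum;
    }

    /** Returns a copy of the ranks with the Long Term rank replaced. */
    static int[] withLongTerm(int[] ranks, int newRank) {
        int[] copy = ranks.clone();
        copy[LONG_TERM] = newRank;
        return copy;
    }
}
```

With equal weights the totals are JS 10, OCaml 11, Rust 12 and C 13. Moving OCaml's Long Term rank from 4 to 2 brings its total to 9, which shows how much the decision depends on mitigating that one risk.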

OCaml Risks

  • Long Term: How to deal with low supply of OCaml-ers.

  • Answer: Make OCaml easy to adopt and be productive for newcomers.

  • Windows: Poor support today for Windows

  • Answer: From my only prior interaction with OCaml (compiling Unison), I knew that Windows support existed.

OCaml Risks

  • Long Term: How to deal with low supply of OCaml-ers.

  • Windows: Poor support today for Windows

  • Strategies?

    • Deal with top risks first.

    • Fund the cross-vendor/cross-platform development by releasing tools for peers who have similar vendor risks.

OCaml Risks

  • Low Supply + Windows risks ⇒ Building OCaml projects on Windows should be easy within my company and for any contractors I hire.

  • End goal: Open project, run a Windows script to install OCaml, edit OCaml source code, and press Build.

  • For employees and contractors only.

Windows Issue #1

  • The dominant compiler on Windows is Microsoft Visual Studio (MSVC).

  • OCaml on Windows used GCC.

  • GCC and MSVC are often compatible, but sometimes not.

  • They have different ABIs, especially with respect to DLLs.

  • Rust has a separate target for each: x86_64-pc-windows-msvc and x86_64-pc-windows-gnu

Windows Issue #1

libuv, the Node.js eventing library, is typical: GCC (MinGW) gets only Tier 3 support.

Windows Issue #1

libuv Support Matrix

System       Type     Versions
GNU/Linux    Tier 1   Linux >= 2.6.32 with glibc >= 2.12
macOS        Tier 1   >= 10.15
FreeBSD      Tier 1   >= 10
Windows      Tier 1   >= Windows 8
MinGW        Tier 3   MinGW32 and MinGW-w64

Tier 3: Community maintained. These systems may inadvertently break and the community and interested parties are expected to help with the maintenance.

Windows Issue #2

  • Unix compatibility: MSYS2

  • Pervasive UNIX commands in opam packages (opam is the conventional package manager for OCaml)

Windows Issue #2

ocaml-base-compiler.4.14.0 package installs the OCaml compiler, but assumes there is a /bin/sh to run ./configure:

build: [
  [
    "./configure"           # <--- assumes /bin/sh
    "--prefix=%{prefix}%"
    "--docdir=%{doc}%/ocaml"
    "-C"
    "CC=cc" {os = "openbsd" | os = "macos"}
    "ASPP=cc -c" {os = "openbsd" | os = "macos"}
  ]
  [make "-j%{jobs}%"]
]

Windows Issue #1 and #2

Created with-dkml, a proxy for executables that need UNIX. It runs the real executable with:

UNIX binaries in the PATH + MSVC compiler variables

Final                           Original
C:/Users/me/AppData/Local/Programs/DISKUV~1/bin
├── dune-real.exe       =       dune.exe
├── dune.exe            =       with-dkml.exe
├── opam-real.exe       =       opam.exe
├── opam.exe            =       with-dkml.exe
└── with-dkml.exe       =       with-dkml.exe
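
The renaming scheme above can be sketched as follows. This is not how with-dkml is implemented: it is a minimal illustration of the shim idea, with a hypothetical class, hypothetical method names and a caller-supplied UNIX bin directory. A copy of the shim takes the original name (dune.exe), finds its renamed sibling (dune-real.exe), and launches it with an adjusted environment.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the with-dkml implementation. */
class ShimSketch {
    /** Maps the shim's file name to the renamed original: dune.exe -> dune-real.exe. */
    static String realName(String shimName) {
        int dot = shimName.lastIndexOf('.');
        if (dot <= 0) {
            return shimName + "-real";
        }
        return shimName.substring(0, dot) + "-real" + shimName.substring(dot);
    }

    /**
     * Prepares (but does not start) the real executable next to the shim.
     * unixBinDir is a hypothetical parameter: the directory of UNIX tools to
     * put first in the PATH.
     */
    static ProcessBuilder prepare(Path shim, List<String> args, String unixBinDir) {
        Path real = shim.resolveSibling(realName(shim.getFileName().toString()));
        List<String> command = new ArrayList<>();
        command.add(real.toString());
        command.addAll(args);

        ProcessBuilder pb = new ProcessBuilder(command).inheritIO();
        Map<String, String> env = pb.environment();
        String oldPath = env.getOrDefault("PATH", "");
        env.put("PATH", unixBinDir + File.pathSeparator + oldPath);
        // A real shim would also export the MSVC compiler variables here.
        return pb;
    }
}
```

Calling prepare(...).start().waitFor() would then run the real tool and return its exit code, so scripts and IDEs that invoke dune or opam by name do not need to know the shim exists.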

Windows Issue #3

  • The sole maintainer of the only Windows distribution said they were discontinuing it

  • Good time to release my Windows scripts publicly

  • But … self-interest disclosure … just enough to make sure the Windows support did not die!

Enter OCSF

  • OCaml Software Foundation approached me to see if I would continue supporting Windows

  • My existing support was making sure OCaml on Windows would not die

  • Implicit ask was to make sure Windows would thrive: others could contribute, and it would be promoted

Enter OCSF

  • Decision was whether extra effort on Windows made sense

  • My perspective: change in priority so work I planned over the next three years could be brought in much sooner

  • But also need clear lines between open-source and commercial …

Enter OCSF

Clear lines between open-source “DkML” and commercial “DkSDK”

Lines

Requirement      DkML                              DkSDK
Languages        OCaml                             C + OCaml; Swift + Java
Purpose          Env to learn + develop one-off    Env and libraries to de-risk front+backend vendors
Build Tooling    OCaml Standards                   C Standards

Enter OCSF

  • Agreed + thanks to OCSF. 1st project was single-click Windows installer

  • Dogfooding caught bugs early; used DkML installed environment to develop my own chat software and DkSDK

Enter OCSF

  • Biggest drawback: Slow! 2-cpu VM takes 90 minutes.

  • Contrast with macOS M1 MacVM: 15 minutes

  • Hardware is unequal, and Mac does not need to install Unix

  • Observations from GitHub/GitLab CI: Windows compiles are 2X slower

Enter OCSF

  • Biggest advantage: 5-7 clicks and 11 characters (y, ENTER, Win+R, cmd, ENTER, utop, ENTER)

  • Contrast to macOS: 4 non-interactive commands, 1 interactive command. Lots of reading, and common beginner pain with “switches” (analogous to Python virtual environments)

Enter OCSF

[23 video snapshots (vlcsnap), dated 2023-04-19]

Enter OCSF

  • Biggest issues caught in the wild:

    • Flaky downloads

    • Unicode

    • Spaces in directories

Act Three: Happily ever after!

Status of DkML

  • Biggest missing features?

    • More functionality without using switch (venv)

    • Lite “bytecode” installer without Visual Studio

Status of OCaml

  • No OCaml 5 + MSVC. Could take a year for MSVC.

  • Today DkML is the recommended Windows distribution, but that will switch rapidly when opam 2.2 is released

  • Testing on Windows is not automatic, but that is changing

  • Deeply entrenched Unix-isms may be impossible to dislodge:

    • Makefiles not supporting spaces

    • Very many packages require pkg-config, and it is implicitly encouraged as the default C resolver in dune

Status of OCaml

But it is quite reasonable to develop OCaml applications on Windows, for Windows.

My routine:

  • One week macOS

  • One week Windows (occasionally with Linux Docker or WSL2)

DkSDK and chat are the OCaml products I can develop on Windows …

Status on DkSDK

DkSDK: Development environment and libraries so you can manage your technology / vendor risk

  • I’m building a chat app. I need it for my company’s product line, but chat is commodity; source code will be available when it is ready.

  • Actors … stateful, distributed objects that communicate with message passing … are a common way to build a chat backend.

Status on DkSDK

DkSDK: Development environment and libraries so you can manage your technology / vendor risk.

I don’t foresee needing the complexity of the big players: Erlang/OTP and Akka. But I might need it. Both are true:

  • I don’t want to take the risk of writing large amounts of code and not being able to adopt a full-feature actor framework

  • I don’t want to take the risk of committing to tech + vendor

Status on DkSDK

More important, I’ve built large, reliable compute systems on top of distributed memory (ex. Redis).

My hunch is others are fairly comfortable with Redis as well, and don’t need the reliability and deployment guarantees that come with the more complex frameworks.

Status on DkSDK

The project DkHelloWorldActor uses libraries that are “real” (from chat app) but top-level code is just a simple example from Akka.

[Image: hello-world2.png]

Status on DkSDK

Since Redis is written in C, most of my development has been in CLion.

[Image: CLion - run actors-cli]

Status on DkSDK

  • Uses lightweight threads in OCaml as the actor coordinator. OCaml-ers? Drop-downs and buttons to run OCaml code.

  • Output is similar to what Akka has for its example:

    [05.814] [hello-akka.actor.dd-4] [akka://hello/user/greeter] Hello World!
    [05.815] [hello-akka.actor.dd-4] [akka://hello/user/greeter] Hello Akka!
    [05.815] [hello-akka.actor.dd-2] [akka://hello/user/World] Greeting 1 for World
    [05.815] [hello-akka.actor.dd-4] [akka://hello/user/Akka] Greeting 1 for Akka
    [05.815] [hello-akka.actor.dd-5] [akka://hello/user/greeter] Hello World!
    [05.815] [hello-akka.actor.dd-5] [akka://hello/user/greeter] Hello Akka!
    [05.815] [hello-akka.actor.dd-4] [akka://hello/user/World] Greeting 2 for World
    [05.815] [hello-akka.actor.dd-5] [akka://hello/user/greeter] Hello World!
    [05.815] [hello-akka.actor.dd-4] [akka://hello/user/Akka] Greeting 2 for Akka
    [05.816] [hello-akka.actor.dd-5] [akka://hello/user/greeter] Hello Akka!
    [05.816] [hello-akka.actor.dd-4] [akka://hello/user/World] Greeting 3 for World
    [05.816] [hello-akka.actor.dd-6] [akka://hello/user/Akka] Greeting 3 for Akka

Status on DkSDK

Really important point. The OCaml / C code is not tied to CLion.

Load and run it in another IDE (Visual Studio Code)

Status on DkSDK

On Monday, April 24, 2023, I will release the CMake/IDE build system. Not yet the Actor framework, but you can do C + OCaml + CMake on several targets:

https://gitlab.com/diskuv/DkHelloWorldActor/-/pipelines/819738886

Status on DkSDK

Enablers:

  • You can do your normal development in Visual Studio, Xcode, Android Studio or Visual Studio Code.

Status on DkSDK

Enablers:

  • Your SwiftUI (iOS) or Jetpack (Android) contractor can build your mobile app … they copy a few lines of code into their projects … and press Build.

  • Need a new field from the backend populated on the frontend? They can copy and paste a line, press Build, and continue front-end work.

  • No concern for “do they know OCaml / Akka / Erlang / …”

Status on DkSDK

Who for?

  • People who are pinched for tech resources.

  • Startups, IT service orgs.

  • Can visit https://diskuv.com and click on DkSDK to see documentation for how you would interact with it.

  • Will be available Monday, April 24 with more complete docs

Status on Hiring

  • DkSDK is mix of C and OCaml.

  • Could choose (and did) a C programmer as the first FTE.

  • Next (not hiring right now) will be front-end or an OCaml expert.

  • “Risky” additions were high-school OCaml trainees.

Status on Hiring

Two purposes getting high school OCaml trainees:

  1. Want someone to work on OCaml stuff until we are at point where we need a genuine OCaml expert.

  2. Want to demonstrate to potential DkSDK customers that adopting OCaml is not going to break your team. If high school students with AP CS can train in ~150 hours and be productive, likely your team can too. Also, putting your money where your mouth is is good.

That second point justified the risk.

Status on Hiring

Expectations?

  • Differences from college internship: High schoolers have far fewer hours during school, and parents are in control of hours.

  • College interns have experience doing large projects. Cannot expect a high school intern to break down large tasks.

  • Default expectation is not to work during high school, even in summer.

Status on Hiring

Training Structure

  • Caution: Minors! No specifics (what, why)

  • Varied goals/situations; 2 interns and 2 contributors

  • Each trains for ~150 hours at their own rate, which depends on course load and extracurriculars, and then gets tested.

  • Goal for testing is to demonstrate they can repeatedly do 1 hour coding tasks correctly given 5 minutes of instruction; tasks same difficulty as a college intern. Expectation is during summer 1 hour grows to 1 day.

Status on Hiring

  • One of four has completed hours and assessment. Getting a summer offer.

  • Remaining three’s extracurriculars have eaten time (often down to 0 hours in a week). But very very confident most if not all of them will be passing the final test, and will work on a project or a return internship in the summer.

  • Bottom line: Very proud of all my student trainees. Huge success.

Final Thoughts

  • Supporting Windows? Use data from real projects (ex. libuv) to convince your language peers that first-class Windows support means MSVC, not GCC

  • Tools you can re-use? with-dkml proxy for your own UNIX-needing binaries. opam for doing language agnostic packaging.

  • Train and hire high schoolers! Excellent way to prove that your language or tools are simple to use.

  • Consider DkSDK if you have tech/vendor lock-in concerns. https://diskuv.com