I'm as shocked as you, but the Darklang backend rewrite is actually complete

The Tower of Babel by Pieter Bruegel the Elder (1563)

For the first few years of the life of Darklang, each time we didn't have a library available for our OCaml-based backend, or decided to build a feature in our DB instead of on a proper cloud server, we said "ugh, let's hack this and we can fix it when we rewrite it in Rust". Once we realized that Dark was going to be running with a small team for a long time, it was clear that we couldn't keep piling on that sort of tech debt.

And though the rewrite ended up being in F#, rather than Rust, in late 2020 I embarked on a project to rewrite our OCaml-based backend to put us on strong foundations for building Dark in the future.

Before we go through a timeline, let me set a bit of context about the makeup of Dark:

  • the editor: this runs in the browser, is written in ReScript, and talks to a backend that we call the ApiServer
  • customers' code runs on builtwithdark.com. We call the HTTP server that runs the code the BwdServer
  • Dark supports Cron Jobs and background Workers: the CronChecker checks whether to put a job in the queue, and QueueWorker fetches and runs those jobs and any jobs that are created with the emit function.
  • The core of Dark is an interpreter, along with a standard library, and some auxiliary functions to support all that. We call this LibExecution, and it's a key part of ApiServer, BwdServer and QueueWorkers.
  • LibExecution is also compiled to run in the browser using Wasm
  • All of the backend services are run in Google Cloud, using Postgres and Kubernetes. Monitoring/telemetry is done using Honeycomb+OpenTelemetry as well as Rollbar.

In summary, we needed to port LibExecution, all the standard library, the ApiServer, BwdServer, QueueWorker, CronChecker, make LibExecution work in the browser using Wasm, and ensure that everything was connected to our infrastructure and sufficiently instrumented to operate well.
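
As a rough illustration of how the CronChecker and QueueWorker relate, here is a toy Python sketch. It is not Dark's actual code: `JobQueue`, `cron_checker`, `queue_worker`, and the handler names are all made up for illustration, and an in-memory deque stands in for the real queue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    handler: str      # name of the handler that should run this job (hypothetical)
    payload: object   # data passed to the handler


class JobQueue:
    """In-memory stand-in for the real queue (assumption for illustration)."""

    def __init__(self):
        self._jobs = deque()

    def emit(self, handler, payload):
        # Both user code and the cron checker enqueue work this way.
        self._jobs.append(Job(handler, payload))

    def fetch(self):
        # Return the oldest job, or None when the queue is empty.
        return self._jobs.popleft() if self._jobs else None


def cron_checker(queue, crons, now):
    """Enqueue a job for every cron whose interval has elapsed."""
    for cron in crons:
        if now - cron["last_run"] >= cron["interval"]:
            queue.emit(cron["name"], None)
            cron["last_run"] = now


def queue_worker(queue, handlers):
    """Fetch and run jobs until the queue is empty; return the results."""
    results = []
    while (job := queue.fetch()) is not None:
        results.append(handlers[job.handler](job.payload))
    return results
```

The point of the split is that the checker only decides whether a job is due, while the worker runs jobs without caring whether they came from a cron or from an explicit emit.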

With that context provided, here's a rough timeline:

  • Mar 29, 2021: All Dark code is stored in the database, serialized at this point using OCaml-specific serializers, for which no compatible F# serializer exists. I had created a library with those serializers, communicating with F# using a thin C layer, and the OCaml and .NET FFIs. I finally gave up and just used an HTTP server for this layer instead.
  • May 5, 2021: Upgrade to .NET 6
  • Jun 2021: another investor update: "I expect to be done by September."  🙄
  • Jul 22, 2021: Posted and sent out an important announcement to users, so that they'd know there would be some differences in the newly written backend. Also said "My goal is to finish in September, though there's a reasonable chance it will slip to October."
  • Aug 23, 2021: I realized our performance was extremely poor, and ripped out our own homegrown version of Tasks to try and address it. But then I realized that we were accidentally tracing every function instead of just the pure ones. Whoops.
  • Sep 1, 2021: I was very sad to realize that our switch from ints to BigIntegers (infinite precision integers by default) was not working out. As a result, I rolled it back.
  • Sep 10, 2021: Added the first piece of F# (the ApiServer) to our Kubernetes cluster!
  • Nov 2, 2021: Not strictly necessary, but I rewrote our integration tests in Playwright (from TestCafe). This sped them up massively, though sadly it also made them flakier by exposing more race conditions :(
  • Nov 4, 2021: Moved to F# 6! (with some benchmarks)
  • Nov 26, 2021: Added an ExecHost, to run admin commands in production.
  • Dec 21, 2021: Ported the queues and crons to F#
  • Dec 2021: while interviewing Stachu, said "I expect to be finished the rewrite in February" 👍
  • Dec 28, 2021: Enabled the first piece of the new backend, the CronChecker, in production 🎅
  • Jan 4, 2022: Finished porting all the OCaml unit tests
  • Jan 11, 2022: Switched from the old, and arguably deprecated, JSON.NET to the newer, shinier, and much, much faster System.Text.Json for as much code as we could
  • Feb 3, 2022: Tried and failed to connect actual Dark code to an actual HTTP request
  • Feb 13, 2022: Figured out the devops to get the ApiServer working for some sample canvases
  • Feb 22, 2022: Completed porting all 6 versions of the HTTP clients, completing the final portions of the standard library
  • Feb 26, 2022: Made the first HTTP request to the BwdServer in production.
HTTP requests moving from the old OCaml backend (purple, top-left to bottom-right) to the new F# backend (orange, bottom-left to top-right). Lines are number of HTTP requests hitting each backend

At this point, we were extremely confident of being done in May. There were only two things left to do, and both had all the code basically written.

  • QueueWorker: We attempted to port the QueueWorker over directly, but alas there were significant problems with the old version, and they got much worse when we added in multi-threading. So the QueueWorkers had to be rewritten entirely, moving to use cloud services directly (Google PubSub). After being rewritten, this was the transition from the old queues to the new queues:
Purple (top-left to bottom-right) is the old queues running in OCaml. Orange (bottom-left to top-right) is the new queues written in F#. Lines are number of jobs run by each backend.
  • Blazor: Blazor is the .NET technology we use to compile LibExecution into Wasm. We had an initial version working, connected to JS, way back in early 2021. Alas, actually getting it fully working led to Stachu needing to fight with lots and lots of little bugs.

The final bug fix shipped June 5, completing the final step in the port to F#. 🎉

A word of thanks to the many contributors, consultants, and employees, who helped with the rewrite. Thanks to Araceli Sánchez, Lev Lazinskiy, María José Dávila, Matthew Jeffryes, jwalter, Daniela Campagna, Sean Manton, and especially Stachu Korick.

The full release notes are here, and the codebase is here.

Discuss on Twitter or HN.