I'm as shocked as you, but the Darklang backend rewrite is actually complete
For the first few years of the life of Darklang, each time we didn't have a library available for our OCaml-based backend, or decided to build a feature in our DB instead of on a proper cloud server, we said "ugh, let's hack this and we can fix it when we rewrite it in Rust". Once it became clear that Dark was going to be running with a small team for a long time, we knew we couldn't keep piling on that sort of tech debt.
And though the rewrite ended up being in F#, rather than Rust, in late 2020 I embarked on a project to rewrite our OCaml-based backend to put us on strong foundations for building Dark in the future.
Before we go through a timeline, let me set a bit of context about the makeup of Dark:
- the editor: this runs in the browser, is written in ReScript, and talks to a backend that we call the ApiServer
- customers' code runs on builtwithdark.com. We call the HTTP server that runs the code the BwdServer
- Dark supports Cron Jobs and background Workers: the CronChecker checks whether to put a job in the queue, and QueueWorker fetches and runs those jobs and any jobs that are created with the
- The core of Dark is an interpreter, along with a standard library, and some auxiliary functions to support all that. We call this LibExecution, and it's a key part of ApiServer, BwdServer and QueueWorkers.
- LibExecution is also compiled to run in the browser using Wasm
- All of the backend services are run in Google Cloud, using Postgres and Kubernetes. Monitoring/telemetry is done using Honeycomb+OpenTelemetry as well as Rollbar.
In summary, we needed to port LibExecution, all the standard library, the ApiServer, BwdServer, QueueWorker, CronChecker, make LibExecution work in the browser using Wasm, and ensure that everything was connected to our infrastructure and sufficiently instrumented to operate well.
With that context provided, here's a rough timeline:
- Aug 2020: Began to investigate doing a backend rewrite. Did some spikes, ran some experiments, played with performance.
- Sep 2020: Decide to use F#. See [1, 2, 3, 4] for how the decision was made.
- Oct 2020: remove some old features we didn't use, pay off some tech debt, and make things operationally fine.
- Oct 10, 2020: Added first F# service
- Oct 15, 2020: Verify that LibExecution could be compiled to Wasm
- Nov 17, 2020: First version of the BwdServer
- Dec 14, 2020: Added the first version of the ApiServer
- Dec 16, 2020: add the start of a Fuzzing framework, which helped ensure the old and new code worked the same.
- Jan, 2021: With the basics done, this started a period of lots and lots of standard library and testing work, as well as expanding out the existing services. Most of 2021 was spent on this in one way or another.
- Feb 13, 2021: port over all the database functions, including the SqlCompiler
- Feb 2021: wrote to investors to update them, said the rewrite would be done in 6 weeks. 😬
- Mar 14, 2021: It was important that Dark code was bug-for-bug compatible in the old and new version. In March I really clamped down on that, finding and removing many many tiny differences and adding tests for them.
- Mar 19, 2021: Released "How's the Dark rewrite going" blog post, claiming "I'm currently about 60% of the way through the rewrite. All of the major components have been partially transitioned, and it's a question of finishing everything out." 😂
- Mar 29, 2021: All Dark code is stored in the database, serialized at this point using OCaml-specific serializers, for which no compatible F# serializer exists. I had created a library with those serializers, communicating with F# using a thin C layer, and the OCaml and .NET FFIs. I finally gave up and just used an HTTP server for this layer instead.
- May 5, 2021: Upgrade to .NET 6
- Jun 2021: another investor update: "I expect to be done by September." 🙄
- Jul 22, 2021: Posted and sent out an important announcement to users, so that they'd know there would be some differences in the newly written backend. Also said "My goal is to finish in September, though there's a reasonable chance it will slip to October."
- Aug 23, 2021: I realized our performance was extremely poor, and ripped out our own homegrown version of Tasks to try and address it. But then I realized that we were accidentally tracing every function instead of just the pure ones. Whoops.
- Sep 1, 2021: I was very sad to realize that our switch from ints to BigIntegers (infinite precision integers by default) was not working out. As a result, I rolled it back.
- Sep 10, 2021: Added the first piece of F# (the ApiServer) to our Kubernetes cluster!
- Nov 2, 2021: Not strictly necessary, but I rewrote our integration tests in Playwright (from TestCafe). This sped them up massively, but sadly they also got flakier, as the speedup exposed more race conditions :(
- Nov 4, 2021: Moved to F# 6! (with some benchmarks)
- Nov 26, 2021: Added an ExecHost, to run admin commands in production.
- Dec 21, 2021: ported the queues and crons to F#
- Dec, 2021: while interviewing Stachu, said "I expect to be finished the rewrite in February" 👍
- Dec 28, 2021: Enable the first piece of the new backend, the CronChecker, in production 🎅
- Jan 4, 2022: Finish porting all the OCaml unit tests
- Jan 11, 2022: Switched from the old, and arguably deprecated, JSON.NET to the newer, shinier, and much much faster System.Text.Json, for as much code as we could
- Feb 3, 2022: tried and failed to connect actual Dark code to an actual HTTP request
- Feb 13, 2022: Figure out the devops to get the ApiServer working for some sample canvases
- Feb 22, 2022: Completed porting all 6 versions of the HTTP clients, completing the final portions of the standard library
- Feb 26, 2022: Made the first HTTP request to the BwdServer in production.
- Mar 2, 2022: Moved our first user to the BwdServer in production. But alas we reverted this.
- Mar 2, 2022: Wrote the "2021 year in review", discussing progress on the rewrite.
- Mar 10, 2022: Completed tests for every function
- Mar 16, 2022: Start using a very fast .NET binary serialization framework, as loading code through JSON (twice!) before using the OCaml-based binary serialization was too slow.
- Mar 18, 2022: First attempt to switch everyone over to the new ApiServer. Alas it failed.
- Mar 22, 2022: actually switch all users to the new ApiServer
- Apr 2, 2022: fully switched all users to the new F# ApiServer 🎉
- Apr 6, 2022: Asked for volunteers to switch their canvases over to the new F# BwdServer backend
- Apr 7, 2022: Moved the first volunteers over to the BwdServer.
- May 5, 2022: The HTTP backends for builtwithdark.com were fully switched over. 🎉
At this point, we were extremely confident of being done in May. There were only two things left to do, and both had all the code basically written.
- QueueWorker: We attempted to port the QueueWorker over directly, but alas there were significant problems with the old version, and they got much worse when we added in multi-threading. So the QueueWorkers had to be rewritten entirely, moving to use cloud services directly (Google PubSub). After being rewritten, this was the transition from the old queues to the new queues:
- Blazor: Blazor is the .NET technology we use to compile LibExecution into Wasm. We had an initial version working, connected to JS, way back in early 2021. Alas, actually getting it fully working led to Stachu needing to fight with lots and lots of little bugs.
The final bug fix shipped June 5, completing the final step in the port to F#. 🎉
A word of thanks to the many contributors, consultants, and employees, who helped with the rewrite. Thanks to Araceli Sánchez, Lev Lazinskiy, María José Dávila, Matthew Jeffryes, jwalter, Daniela Campagna, Sean Manton, and especially Stachu Korick.
The full release notes are here, and the codebase is here.