I'm currently about 60% of the way through the rewrite. All of the major components have been partially transitioned, and it's a question of finishing everything out.
I'll discuss below some of the interesting technical things that came up. I'll make a separate post later comparing F# and OCaml, which some people asked for, now that I've had a lot more practical experience with F#, but that's not the topic here.
If you're interested in helping out, please do! The easiest place to start is by porting standard library functions, there are more than 200 which need to be ported! It's really easy: simply uncomment the function and fix the small things that pop up. We have instructions on how to get started on our codebase, and a lot of issues on Github; also please don't be a stranger in the Dark Slack!
Migration or incremental rewrite?
I was asked whether the migration would be incremental or wholesale. I would describe it as mostly wholesale, with some small incremental aspects.
One of the important things about Dark is that everything that works in Dark today will continue to work. That's kinda the point of Dark! So this isn't like Python 3 where you have to do a port before your code can be deployed: there will be a day where your code used to run in the old version of Dark, and then at some point the same code will run unaltered in the new version of Dark (without you doing anything). As a result, the transition has to be smooth. Which means more or less everything has to be ported over before we can start to transition.
The main issue with this is risk: it's possible something might not work in the new stack, stopping us from transitioning despite having done all the work. As a result, the first thing I did was go through each subsystem, porting just enough to be certain that a full port would work.
Avoiding the Second-system Effect
The Second System Effect is when you build a new version of something and try to solve all issues it had, thus blowing up the scope and ensuring you never finish the second system.
So obviously I wanted to avoid that, but also I did actually want to fix a bunch of the stuff that made the Dark backend complex, as well as reduce the complexity of the codebase. I did this by basically doing some things "properly" (that is, falling a little bit into the second-system fallacy), and pulling back when things got complicated or boring.
After maybe fixing 5-6 big things, I realized that was enough and stopped. The rest of the code is being translated in pretty-much a direct 1-1 fashion.
OK, let's talk about the big differences between the old version and the new version. Note that this is not a list of where OCaml is bad and F# is good - rather just that with the experience of having written it once before, now we can do it better.
Async vs non-async
The old Dark runtime did not use OCaml's async libraries, such as Lwt or Async. The F# version is using .NET Tasks. As a result, the new runtime should be significantly more efficient and support a lot more users with less hardware and cost.
(I couldn't find a good thing to link to here that discussed Tasks in F#, as most of the async discussion uses
async, which is different. However, there was never anywhere I felt I didn't have full Task support in F#, which is not the experience others have had).
Using Tasks was interesting in the context of building an interpreter. The Dark interpreter is a recursive-descent interpreter in which you can't tell in advance whether the expression you're evaluating needs to return a Task or not. (Functions which do IO, like accessing the DB or making a HTTP call, will need a Task, other actions like adding two numbers don't. But if you call something like
List::map, you might pass a function which uses IO, so the basic assumption I used is that I need to support Tasks everywhere.)
If you're evaluating a string or integer, you don't want to create a Task only to throw it away a moment later. To avoid this, each expression is evaluated to a TaskOrValue, which is simply this:
type TaskOrValue =
| Task of Task<Dval>
| Value of Dval
This means that code which doesn't need tasks can be evaluated like this:
match expr with
| EString (_id, s) -> Value (DStr(s.Normalize()))
| EBool (_id, b) -> Value (DBool b)
| EInteger (_id, i) -> Value (DInt i)
| EFloat (_id, value) -> Value (DFloat value)
match e with
| EBlank id -> return! (incomplete id)
| EPartial (_, expr) -> return! eval state st expr
| ELet (_id, lhs, rhs, body) ->
// FSTODO: match with ast.ml
let! rhs = eval state st rhs
let st = st.Add(lhs, rhs)
return! (eval state st body)
| EString (_id, s) -> return (DStr(s.Normalize()))
| EBool (_id, b) -> return DBool b
| EInteger (_id, i) -> return DInt i
| EFloat (_id, value) -> return DFloat value
let! bindings: those evaluate the TaskOrValue and allow it's value to be used by later code.
return wraps a normal value in a
return! passes an existing
TaskOrValue right through.
In my initial benchmarks, this optimization doubled the performance (in terms of requests per second).
It had an interesting side-effect that needed to be handled. As I mentioned, Dark programs needs to have identical behaviour in the old (syncronous) backend as in the new (async) one. Which means I don't want Dark programs to be async, just the runtime - that is, if your program makes two HTTP calls, I want to keep the current behaviour where the first HTTP call finishes before the second starts.
As a result, I made a set of mapping primitives (map, filter, etc) that act on TaskOrValues and fully resolve the previous task before starting a new one.
Splitting apart the webserver
The webserver in Dark hasn't exactly matured gracefully. It's had more and more shit grafted onto it, to the point that the old Dark webserver was a complete mess.: feast your eyes on 2400 lines of horror.
There are two conceptual webservers used in Dark. The API server handles the editor for Dark users, logging people in, fetching traces, editing code, etc. The other server, which I'm calling the BWD server (for builtwithdark.com), serves HTTP requests for Dark programs to "grand-users" (our users' users).
In the new version I split the two servers into separate projects. We already ran these as separate services (just with the same code), but the separation into separate projects has so far this has led to much simpler, nicer code in each case.
In particular, the two modes have different needs around middleware and error handling: BwdServer should never fail and if it does, should never give out information about those failures to grand-users. Meanwhile, ApiServer is mostly used by logged-in users and should have useful error messages, so we can quite happily throw exceptions.
Refactoring into HTTP Middleware
BwdServer now uses HTTP middleware written in Dark. In the past, the HTTP framework was written in one big ugly blob, and we were never really sure how it worked or whether features were intentional or not (perhaps they were added to support the APIServer and just happened to also affect BwdServer?).
Now, the Dark middleware is written as a standard middleware stack. As it is now separate and composable, this can turn into a feature that allows you swap it out for different middleware. This is important because our current middleware has lots of bugs that are hard to fix. In the future we'll simply make a new version and allow you choose the middleware in your own code. I hope to allow users create their own middleware too. I expect the Dark HTTP framework to be able to mature nicely as a result.
The one downside is speed - the old HTTP framework was compiled and the new one is interpreted. It may be that this is too slow in the long run if this is run on every request - we'll have to see how it goes.
Testing is way better (and Dark now has parser)
Dark doesn't have a parser. All changes to programs are made in the editor, which manipulates the program's Abstract Syntax Tree (think of it as objects that represent your program, like in the DOM) based on your keystrokes.
This means that tests were ugly and annoying to write. Here's an example of an old test:
let t_lambda_with_foreach () =
(Dval.dstr_of_string_exn "SOME STRING")
[ fn "String::toList_v1" [str "some string"]
[fn "String::fromChar_v1" [var "var"]]) ]
; str "" ]))
Tired of writing tests like that, I took a new approach. Since Dark is a very similar language to F#, I figured I can use the F# parser to parse Dark code. I then wrote tests in Dark-but-really-F#, converting the parsed F# automatically to Dark's AST.
Effectively, Dark now has a parser, albeit with terrible error messages. This means tests now look like this:
(let y = (fun x -> x + 1) in
List.map_v0 [1;2;3;4] y) = [ 2; 3; 4; 5 ]
Often, 50 line tests can now be represented in 4 lines. This has helped dramatically increase the test coverage, and tests have become much more readable.
Ints are now infinite-precision
Ints in Dark have been a little confused. In the editor, Ints used the ReScript native int which uses 31 bits. In the actual backend runtime, we used OCaml's 63-bit ints (I dont remember the reason for this, but it was an unpleasant hack, even at the time). In F#, ints are 32-bit (though there are int64s if we want them). Given this was going to be a mess any way we sliced it, I decided to go right to the intended implementation, which is to use infinite-precision integers. The downside here is that this is breaking behaviour if you have overflow -- however, no-one has checked or documented what Dark does in the presence of overflow, so fingers crossed this doesn't break anything.
In OCaml we were using Postgres-ocaml with a homegrown abstraction around it. We've now switch to .NET's Npgsql.FSharp. This is a really great library, and the abstraction it uses is actually extremely similar to what we were using anyway, except with way less boiler-plate error checking code. Compare the before:
let get_all ~state (db : db) : (string * dval) list =
"SELECT key, data
WHERE table_tlid = $1
AND account_id = $2
AND canvas_id = $3
AND user_version = $4
AND dark_version = $5"
[ ID db.tlid
; Uuid state.account_id
; Uuid state.canvas_id
; Int db.version
; Int current_dark_version ]
|> List.map ~f:(fun return_val ->
match return_val with
| [key; data] ->
(key, to_obj db [data])
| _ ->
Exception.internal "bad format received in get_all")
to the after
(state : RT.ExecutionState)
(db : RT.DB.T)
: Task<List<string * RT.Dval>> =
"SELECT key, data
WHERE table_tlid = @tlid
AND account_id = @accountID
AND canvas_id = @canvasID
AND user_version = @userVersion
AND dark_version = @darkVersion"
|> Sql.parameters [ "tlid", Sql.tlid db.tlid
"accountID", Sql.uuid state.accountID
"canvasID", Sql.uuid state.canvasID
"userVersion", Sql.int db.version
"darkVersion", Sql.int currentDarkVersion ]
(fun read -> (read.string "key", read.string "data" |> toObj db))
In addition, one of the problems in Postgres-OCaml is that your entire query has to return either binary or text results. This meant when we were looking up stored programs in Dark, we had to make one request for the code (which is stored in a binary format), and another request for the metadata (handler name, author, etc). This led to wasted query time, but more importantly super super ugly code.
Npgsql allows looking up binary and textual columns in the same query, and this led to code being much much more tidy. It also allows us to significantly reduce the number of queries being made in the future – we previously had to do one textual query to find out what data we wanted, then a binary one to fetch the data and a textual one to get metadata; instead we can probably just do one query.
Start on a real type system
Dark is supposed to be a statically typed functional language. As such, most of the language is designed in the same way as a statically typed functional language, except we forgot the type-checker! (In practice, the editor's feedback often told you the type some value had, but not always).
Functions in Dark had type information, but often the type information was very limited, with functions returning a "List" or "Option" instead of "List of ints" or "String Option". The new version uses much better types which should lead to much better information in the editor, as well as something to build a type checker off.
A new lower IR
In compiler terms, an Internal Representation is a set of types or classes representing the program. A compiler typically has multiple different IRs representing different stages of compilation; it's common to have an AST to represent the program pretty exactly, then a high-level IR, medium-level, and low-level IR that each have fewer high-level constructs and more low-level ones.
In the old representation of Dark, we only had one: the AST. Now we have two: ProgramTypes and RuntimeTypes. The major difference is that RuntimeTypes only has information that's useful to run the code. In particular the different ways to call functions (pipes, bin-ops, lambdas, and regular function calls) are all reduced to one type (called Apply).
Dark code is stored in our Postgres database. For speed, it's stored using a fast binary serialization using an OCaml library built by Jane Street, that has no compatible library in F#. As such, the simplest and most reliable way forward in the short term was to directly interface to the OCaml deserialization code.
The process to do this is a little involved. The functions are exposed from OCaml to C, using the OCaml FFI. F# can call C functions using the dotnet FFI which is called P/Invoke. Of course, between them is an intermediate layer in C that reads the bytes from one runtime and passes it safely to the other.
So then we just need a way to communicate the code. The OCaml backend had a number of JSON serializers for all the code, so I made matching ones in F#.
The result was a set of functions which completely abstracted the OCaml runtime.
One of the promises of Dark is that everything is backwards compatible. Lots of data was encoded in specific formats, and users might be relying on results that are formatted exactly like that (we have versioning to allow us switch to new ones).
This implies that the new code should work exactly like the old code. While I can change some internal representations, anything user-visible has to be identical.
To help with this, I've been relying quite a lot on fuzzers. Running fuzzers is sometimes known as property-testing (typically the name used by functional langs, while fuzzers is more commonly used by compiler-folk).
Fuzzers create random inputs and test if certain properties hold up. I've been testing properties like:
- does calling this function return exactly the same result in both the OCaml and F# implementations?
- do serialization functions work for all cases?
- if I call the new F# version of pretty-printer, is the text identical? Do I get code that can be read by the old OCaml deserializer (even if the text is not identical).
- is the user data stored in the database understood to have the same meaning in both implementations?
- is the hash of a particular Dark value identical for both implementations (necessary to match traces to their callers)?
I did a refactor of the SQL compiler (the SQL compiler compiles Dark programs to SQL so they can be used to query the DB. This is how we avoid having an ORM). Where before the information on how to compile various Dark StdLib functions was in the compiler, I've moved that all into the standard library definition, making it easier to understand and to implement new functions. Here's an example.
A few years ago I released a standard library for OCaml and ReScript called Tablecloth. The idea is that it's a thin layer on top of the built-in standard libraries that have a common API. Although the idea was that we could re-use code in OCaml and ReScript, that proved challenging even with Tablecloth, so now the main use is to lower the cognitive load: we can just use one set of functions in different languages, never having to pause to think what it's called in this language, or if the signature is different.
To smooth the transition to F#, I implemented Tablecloth in F# too. Now, Tablecloth represents a library which supports 3 different MLs (OCaml, ReScript, and F#) and is also extremely similar to the Elm standard library (on which it is based).
I intend to push that upstream to the Tablecloth repo soon, as well a release a new version of Tablecloth any day now.
What's been done?
In the last post, I had outlines a list of steps involved, let's see how that's looking:
- finish the interpreter [90% done (everything except recording traces)]
- connect Dark's F# backend to Dark's DB, so it can read the same data [DONE]
- compile Dark's F# interpreter to JS [60% done, needs to be wired up]
- make an OCaml library to deserialize the binary opcodes in the DB and covert them into F# (via, I presume, a JSON intermediary) [DONE]
- add a fuzzer to ensure the semantics of the OCaml and F# versions of Dark are the same (especially the library functions) [DONE]
- do all the devops (getting it into k8s, etc) [not done]
- add the APIs we have available in OCaml [60% done, only a few APIs done but the major difficulties have been resolved]
- add a way for users to safely try their code in the new version [not done yet]
- port over 170 or so functions we have in the standard library [50% done: all the major difficulties resolved, 200 functions remain to be ported]
- start recording traces using the new design (this is actually the main thing driving the whole change, believe it or not!) [this will be done after the port has gone live]
- use Honeycomb, Rollbar, and Google Cloud using their .NET native interfaces [not done]
- port our various services to F# (lower priority) [not done]
- port our accounts to self-managed from Auth0 [will wait until after we have shipped]
- translate the contributor docs to F# [DONE ]
If you're interested in helping, the major work to be done is porting the Standard Library, I would love help with this!
This is my notes file for what's left. These are challenging for others to get involved in, but if you're interested in getting involved I'd be happy to sit down to help you get started. Either send me an email, hit me up in Slack, or book a pairing session directly.
- Save traces
- Wire up blazor
- Pass around the CSRF token
- Port all unit tests
- Complete the API
- Fix the server hang on client load
- Sync programTypes with runtimeTypes
- Implement Tracing in the interpreter
- Complete interpreter (go over each line to ensure same behavior)
- Finish HTTP middleware
- Port HTTP client
- Enable Fantomas check in CI
- Benchmark to ensure no intolerable drop in performance
- Add honeycomb
- Add Rollbar
- Go through failwiths and determine how best to handle them
- Add other services (cron checker, queue scheduler, queue worker)
- Security audit
- Add health checks for k8s
- Choose the right number of Postgres connections
- Check that all things loaded to the client are the same values
- Test that DB:: functions can save and read every value from the production DB
- Static assets - (leave up an OCaml service just for this)