FizzBoom benchmark (call for participation)
I've been working recently on a benchmark, to try and see how to get the most performance from the Dark backend. I've reimplemented the core of Dark in various languages and web frameworks, notably in OCaml, F# and Rust.
As a reminder, Dark is a language and platform for building backend web services and APIs. Implementation-wise, it's basically an interpreter hooked to a webserver and a DB. The language is statically typed, functional, immutable, and garbage collected.
Dark users can write arbitrary code that runs on our server, including making HTTP calls to slow and badly-behaving 3rd-party webservers. This means we need to efficiently support both computation and IO on the server. This benchmark is meant to measure that.
FizzBoom
FizzBuzz is a well-known interview question that may or may not have been appropriate in a bygone era, but that remains in existence today due to interviews that have not moved on. It asks you to list the numbers 1 to 100, printing "fizz" for those divisible by 3, "buzz" for those divisible by 5, and "fizzbuzz" for those divisible by both.
FizzBoom is the same problem, except instead of printing "fizzbuzz" when a number is divisible by both 3 and 5, you make an HTTP call to a local server which takes 1 second to respond. Here's what this looks like in Dark:
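For readers who don't know Dark, the same rules can be sketched in Python. This is only an illustration under stated assumptions: the URL is a hypothetical placeholder for the slow local server, the function names are made up for this example, and putting the response body into the output list is a guess at how the call's result is used.

```python
import json
import urllib.request

# Hypothetical address of the local server that takes ~1 second to respond.
DELAY_URL = "http://localhost:1025/delay/1"


def slow_call(url=DELAY_URL):
    """Make a blocking HTTP GET and return the response body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fizzboom(n=100, fetch=slow_call):
    """Return the FizzBoom sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            # Divisible by both 3 and 5: make the slow HTTP call instead of
            # emitting "fizzbuzz" (assumption: its body becomes the entry).
            out.append(fetch())
        elif i % 5 == 0:
            out.append("buzz")
        elif i % 3 == 0:
            out.append("fizz")
        else:
            out.append(str(i))
    return out


def fizzboom_json(n=100, fetch=slow_call):
    """Serialize the sequence as a JSON array."""
    return json.dumps(fizzboom(n, fetch))
```

Passing `fetch` as a parameter keeps the slow call swappable, so the logic can be exercised without the delay server running, for example `fizzboom(15, fetch=lambda: "boom")`.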
Call for participation
Typically, someone builds benchmarks and then releases them to the wild. Invariably, they've overlooked something, and hordes of language advocates decry how it's unfair. To avoid this unfairness, I'd like to create an opportunity for language advocates to improve the benchmarks before I release them. I'll discuss the goals of this below, but feel free to jump right to the issues if you prefer.
The benchmark measures two numbers, calculated using wrk:
- requests per second for HTTP calls to /fizzbuzz, which returns FizzBuzz as JSON
- requests per second for HTTP calls to /fizzboom, which returns FizzBoom as JSON.
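To make the shape of such an endpoint concrete, here is a minimal sketch of a server exposing /fizzbuzz using only the Python standard library. It is not one of the benchmarked implementations. The JSON shape (an array of strings), the port, and the names `route`, `Handler` and `main` are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def fizzbuzz(n=100):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            out.append("fizzbuzz")
        elif i % 5 == 0:
            out.append("buzz")
        elif i % 3 == 0:
            out.append("fizz")
        else:
            out.append(str(i))
    return out


def route(path):
    """Map a request path to a (status, JSON body) pair."""
    if path == "/fizzbuzz":
        return 200, json.dumps(fizzbuzz())
    return 404, json.dumps({"error": "not found"})


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # suppress per-request logging so it doesn't dominate under load


def main(port=8080):
    """Serve until interrupted; the port is a hypothetical choice."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `main()` starts the server, after which a load generator such as wrk can be pointed at the /fizzbuzz path to obtain a requests-per-second figure.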
Preliminary numbers (thanks Chris for reporting these) indicate F# is great (25k req/s) and Rust and OCaml are doing ok (16k and 14k req/s respectively), for FizzBuzz.
Benchmark name                 Req/s
---------------------------------------------------------
fsharp-giraffe:                24962.95
fsharp-giraffe-async:          19476.78
fsharp-suave-async:            1147.71
fsharp-suave-partial-async:    Skipping broken benchmark
ocaml-httpaf:                  14034.62
ocaml-httpaf-lwt:              14158.74
rust-hyper:                    15985.69
rust-hyper-async:              Skipping broken benchmark
However, it also shows that we're not doing async right on any platform -- FizzBoom languishes below 9 req/s everywhere, and at 1 req/s or less on most implementations. Obviously, this is because the code I wrote doesn't work, and is not actually a reflection on the languages and frameworks.
Benchmark name                 Req/s
--------------------------------------------------------
fsharp-giraffe:                1.00
fsharp-giraffe-async:          8.99
fsharp-suave-async:            1.00
fsharp-suave-partial-async:    Skipping broken benchmark
ocaml-httpaf:                  0.10
ocaml-httpaf-lwt:              0.10
rust-hyper:                    Invalid fizzboom output
rust-hyper-async:              Skipping broken benchmark
Improving your favorite language's performance
If you're worried about your language doing well in the benchmark, or are simply looking to help, there are a number of things you can do:
- optimize your language's web server: I may have used a poorly performing web server, have it in a poor configuration, or have hooked things up poorly
- fix your language's async benchmark: when making an HTTP call to a 3rd-party webserver, the server should free the CPU to handle other requests while the IO is running. I originally didn't have that working correctly in any language (unsure why this isn't trivial in all platforms, but there we are); I now have this working everywhere except for Rust.
- optimize your language's build configuration: fix the build settings so that it's being optimized to the best of its ability.
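The async point is the one that matters most for FizzBoom, so here is a small Python sketch of the difference. It is an illustration only: `asyncio.sleep` and `time.sleep` stand in for the real HTTP call to the slow server, and every function name is made up for this example.

```python
import asyncio
import time


async def slow_io(delay):
    """Stand-in for the HTTP call to the slow server (assumption: the real
    benchmark makes an actual network request here)."""
    await asyncio.sleep(delay)  # yields to the event loop while waiting
    return "boom"


async def handle_request(delay):
    """One hypothetical FizzBoom request: wait on IO, then return a body."""
    return await slow_io(delay)


async def serve_concurrently(n_requests, delay):
    """Run n_requests handlers at once; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(handle_request(delay) for _ in range(n_requests))
    )
    return results, time.perf_counter() - start


def serve_blocking(n_requests, delay):
    """Same work, but each request blocks the only thread until its IO ends."""
    start = time.perf_counter()
    results = []
    for _ in range(n_requests):
        time.sleep(delay)  # the thread can do nothing else while it waits
        results.append("boom")
    return results, time.perf_counter() - start


def demo(n_requests=5, delay=0.2):
    """Print the elapsed time of both strategies for the same workload."""
    _, t_async = asyncio.run(serve_concurrently(n_requests, delay))
    _, t_block = serve_blocking(n_requests, delay)
    print(f"async: {t_async:.2f}s  blocking: {t_block:.2f}s")


if __name__ == "__main__":
    demo()
```

In the blocking version the total time grows linearly with the number of requests, so a single worker waiting on a 1 second call cannot exceed 1 req/s, which is consistent with several of the FizzBoom numbers above. In the async version the waits overlap, so the total stays close to a single delay.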
I'm also interested, but very cautiously, in improving the implementation of the interpreters. The prime directive for the interpreters is that they're easy to modify and extend: Dark is a language undergoing much change, so rewriting the interpreters with a JIT, or assembly, or peephole optimizations, is not something I'm interested in. But small implementation changes that have big wins are valuable, whether they're applicable to all languages or show off specific features of your favorite language. All the same, I'm going to be quite conservative on this: I don't want to turn this into a game where we write non-idiomatic code that no-one would want to maintain, just to squeeze out a win.
If you'd like to help, I've filed the issues above, and written down some rules to keep the benchmark fair. I hope to release some benchmark results soon, once I've got the majority of them working.
You can sign up for Dark here, and check out our progress in our contributor Slack or by watching our GitHub repo. Follow Dark (or me) on Twitter, or follow the blog using RSS.