1BRC in C/C++
Try your hand at processing 12 GB of text using low-level C code! ⚡
Calculate the min, max, and average of 1 billion measurements
Don't see your favorite language listed above? Open an Issue to add it!
Choose one of the languages listed above to see the language-specific leaderboard and instructions for submitting your solution to that language's repository.
| # | Time | Solution | Language | Author |
|---|---|---|---|---|
| 1. | 6.159s | link | Java | royvanrijn |
| 2. | 6.532s | link | Java | Thomas Wuerthinger |
| 3. | 7.620s | link | Java | Quan Anh Mai |
| 4. | 9.062s | link | Java | obourgain |
| 5. | 9.338s | link | Java | Elliot Barlas |
| 6. | 10.589s | link | Java | Artsiom Korzun |
| 7. | 10.613s | link | Java | Sam Pullara |
| 8. | 11.038s | link | Java | Andrew Sun |
| 9. | 11.222s | link | Java | Jamie Stansfield |
| 10. | 13.277s | link | Java | Yavuz Tas |
|  | 4m 13.449s | link | Java | Reference implementation |
You can view language-specific leaderboards on each language's competition page.
Your mission, should you choose to accept it, is to write a program that retrieves temperature measurement values from a text file and calculates the min, mean, and max temperature per weather station. There's just one caveat: the file has 1,000,000,000 rows! That's more than 10 GB of data! 😱
The text file has a simple structure with one measurement value per row:
Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
Hamburg;34.2
St. John's;15.2
Cracow;12.6
... etc. ...
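Each row pairs a station name with a measurement, separated by `;`. A common trick is to parse the value into an integer number of tenths instead of calling `strtod`, which keeps the hot loop free of floating point. The sketch below is a minimal illustration, assuming every value has an optional leading `-`, at least one integer digit, a `.`, and exactly one fractional digit, as in the sample rows; the function name `parse_row` is hypothetical, and the exact value format should be checked against the language repository's rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits one row "<station>;<value>" (without the
 * trailing newline) into the station name and the value in tenths,
 * e.g. "Hamburg;12.0" -> name "Hamburg", 120.
 * Assumed value shape: optional '-', one or more digits, '.', one digit.
 * Returns 0 on success, -1 if the row does not match that shape. */
int parse_row(const char *row, size_t len,
              const char **name, size_t *name_len, int *tenths)
{
    const char *sep = memchr(row, ';', len);
    if (sep == NULL || sep == row)
        return -1;                          /* no separator or empty name */

    *name = row;
    *name_len = (size_t)(sep - row);

    const char *p = sep + 1;
    const char *end = row + len;
    int sign = 1;
    if (p < end && *p == '-') {
        sign = -1;
        p++;
    }

    int value = 0;
    int digits = 0;
    while (p < end && *p >= '0' && *p <= '9') {   /* integer part */
        value = value * 10 + (*p - '0');
        p++;
        digits++;
    }
    /* Exactly ".d" must remain; the length check runs before any read. */
    if (digits == 0 || end - p != 2 || p[0] != '.' ||
        p[1] < '0' || p[1] > '9')
        return -1;

    value = value * 10 + (p[1] - '0');      /* append the fractional digit */
    *tenths = sign * value;
    return 0;
}
```

Keeping values as integer tenths makes min, max, and sum exact; converting back to a decimal only when printing avoids accumulating floating-point error over a billion additions.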
The program should print the min, mean, and max values per station, ordered alphabetically by station name. The expected format varies slightly from language to language, but the following example shows the expected output for three of the stations above, in sorted order:
Bulawayo;8.9;22.1;35.2
Hamburg;12.0;23.1;34.2
Palembang;38.8;39.9;41.0
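Because the mean is the only derived value, each station needs just a running minimum, maximum, sum, and count. The sketch below keeps those in integer tenths and formats one line in the `name;min;mean;max` shape used in the example output. The struct and function names are hypothetical, and the tie-breaking rule for rounding the mean (half away from zero here) is an assumption; compare against the reference implementation's output before relying on it. The sum uses `long long` because a billion rows of values like those in the sample would overflow a 32-bit integer.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical per-station accumulator; all temperatures are in tenths. */
struct stats {
    int min;
    int max;
    long long sum;      /* 64-bit: a billion rows overflow a 32-bit sum */
    long long count;
};

void stats_init(struct stats *s)
{
    s->min = INT_MAX;
    s->max = INT_MIN;
    s->sum = 0;
    s->count = 0;
}

void stats_add(struct stats *s, int tenths)
{
    if (tenths < s->min) s->min = tenths;
    if (tenths > s->max) s->max = tenths;
    s->sum += tenths;
    s->count++;
}

/* Mean in tenths using integer arithmetic only; rounds half away from
 * zero (an assumed convention). Requires count > 0. */
long long stats_mean_tenths(const struct stats *s)
{
    long long sum = s->sum, n = s->count;
    if (sum >= 0)
        return (2 * sum + n) / (2 * n);
    return -((2 * -sum + n) / (2 * n));
}

/* Writes "name;min;mean;max" with one decimal each into buf. */
int format_station(char *buf, size_t size, const char *name,
                   const struct stats *s)
{
    return snprintf(buf, size, "%s;%.1f;%.1f;%.1f", name,
                    s->min / 10.0,
                    (double)stats_mean_tenths(s) / 10.0,
                    s->max / 10.0);
}
```

For the two `Hamburg` rows in the sample (120 and 342 tenths), the mean is 231 tenths, which formats as `Hamburg;12.0;23.1;34.2`.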
Oh, and this `input.txt` is different for each submission since it's generated on demand. So no hard-coding the results! 😉
Choose a language from the cards at the top of this page to get started! 🚀
No external library dependencies may be used. That means no lodash, no numpy, no Boost, no nothing. You're limited to the standard library of your language.
Implementations must be provided as a single source file. Try to keep it relatively short; don't copy-paste a library into your solution as a cheat.
The computation must happen at application runtime; you cannot process the measurements file at build time.
Input value ranges are as follows:
There is a maximum of 10,000 unique station names.
Implementations must not rely on specifics of a given data set. Any valid station name as per the constraints above and any data distribution (number of measurements per station) must be supported.
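The 10,000-station cap means a fixed-size open-addressing hash table never has to grow: with a power-of-two capacity such as 16,384 slots, it stays at most about 61% full. The sketch below uses FNV-1a hashing with linear probing; the table size, hash function, and names (`station_lookup`, `struct slot`) are assumptions for illustration, not requirements of the challenge. Because solutions must work for any valid station names, it always compares the full key rather than trusting the hash alone.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed sizing: a power of two above the 10,000-station cap, so the
 * table never needs to grow (load factor stays below ~0.61). */
#define TABLE_SIZE 16384

/* One station's running statistics; temperatures are in tenths. */
struct slot {
    char *name;          /* NUL-terminated copy of the key; NULL = empty */
    size_t len;
    int min, max;
    long long sum, count;
};

static struct slot table[TABLE_SIZE];

/* FNV-1a over the raw bytes, so any UTF-8 name hashes without decoding. */
static uint32_t fnv1a(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: returns the slot for a station, inserting an empty
 * record on first sight. Returns NULL if the table is full or malloc fails. */
struct slot *station_lookup(const char *name, size_t len)
{
    size_t i = fnv1a(name, len) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        struct slot *s = &table[i];
        if (s->name == NULL) {                    /* empty slot: claim it */
            s->name = malloc(len + 1);
            if (s->name == NULL)
                return NULL;
            memcpy(s->name, name, len);
            s->name[len] = '\0';
            s->len = len;
            s->min = INT_MAX;
            s->max = INT_MIN;
            s->sum = 0;
            s->count = 0;
            return s;
        }
        if (s->len == len && memcmp(s->name, name, len) == 0)
            return s;                             /* full key matches */
        i = (i + 1) & (TABLE_SIZE - 1);           /* linear probing */
    }
    return NULL;
}
```

After all rows are processed, the non-empty slots can be collected, sorted by name, and printed.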
Some languages have special instructions, but in general, here's what you can expect:
Create a fork of the 1BRC repository for your language on your own GitHub profile. This will let you submit your solution via a pull request.
Create a new implementation file in the repository; how you do this varies by language. For example, in JavaScript you might create a new `src/<username>.js` file, while in C++ you might create a new `src/<username>.cpp` file. It's recommended to copy the default reference solution to get started and then modify it from there.
Make that implementation fast. Really fast.
Test & benchmark your solution! There are usually language-specific instructions on how to do this, but in general you run `<some-command> bench <username>` to run your solution against the reference implementation. If you see any differences, fix them before submitting your implementation.
Create a pull request against the upstream repository! 🎉 The pull request template usually asks for additional information, such as how long your solution took on your computer and your computer's specs.
Someone (or some robot) will run your solution "officially" on the same hardware as everyone else's solution (so no hardware differences) and report the results. If you're the fastest, you win! 🏆 If not, you'll probably still make it onto the leaderboard. 🥉
If you'd like to discuss any potential ideas for implementing 1BRC with the community, you can use the GitHub Discussions of this @1brc GitHub organization or the language-specific repository discussions. Please keep it friendly and civil.
If you enter this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed on the scoreboard above. Rumor has it that the winner of the Java competition (the original challenge language) may receive a unique 1️⃣🐝🏎️ t-shirt, too!
Make sure you check your language-specific FAQ as well. 😉
The file is encoded as UTF-8.
Don't make assumptions about station names. While only a fixed set of station names is used by the data set generator, any solution should work with arbitrary UTF-8 station names. For the sake of simplicity, names are guaranteed not to contain the `;` character.
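Two properties of UTF-8 make arbitrary station names cheap to handle. `;` is a single ASCII byte (0x3B), and every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a plain byte search for `;` can never land inside a character. Likewise, `strcmp` compares bytes as `unsigned char`, and byte order of valid UTF-8 matches Unicode code point order. The sketch below sorts station names that way with `qsort`; the helper names are hypothetical, and whether code point order is exactly what "alphabetically ordered" means for the expected output is an assumption to verify against the reference implementation.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings: byte-wise strcmp, which
 * for valid UTF-8 yields Unicode code point order. */
static int compare_names(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

/* Hypothetical helper: sorts station names in place, no decoding needed. */
void sort_station_names(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
}
```

With this ordering, a name such as `Ürümqi` sorts after `Zürich`, because `Ü` is encoded with a lead byte above every ASCII letter; locale-aware collation would place it differently. Sorting at most 10,000 names once at the end is negligible next to parsing a billion rows.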
You may copy code from other submissions: the primary focus of the challenge is learning something new, rather than "winning". When you do so, please give credit to the relevant source submissions. Please don't re-submit other entries with no or only trivial improvements.
A fast time on your own machine probably doesn't make you the fastest 1BRC entrant. 😊 1BRC results are reported in wall-clock time, so results of different implementations are only comparable when obtained on the same machine. For instance, if an implementation is faster on a 32-core workstation than on the 8-core evaluation instance, that doesn't allow for any conclusions. When sharing 1BRC results, you should always share the result of running the baseline implementation on the same hardware as well.
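Because results are reported in wall-clock time, a multi-threaded solution shouldn't be timed with `clock()`: it reports processor time, which on common platforms such as Linux accumulates across all threads. For local experiments, the sketch below uses C11's `timespec_get` with `TIME_UTC`, which is part of the standard library. The helper names are hypothetical, and `TIME_UTC` follows the system's real-time clock rather than a monotonic one, so a clock adjustment during a run would skew the result; the official numbers always come from the evaluation hardware.

```c
#include <time.h>

/* Hypothetical helper: seconds elapsed between two timespec readings. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Runs work() once and returns its wall-clock duration in seconds, or -1.0
 * if the C11 UTC clock is unavailable. */
double time_wall_clock(void (*work)(void))
{
    struct timespec start, end;
    if (timespec_get(&start, TIME_UTC) != TIME_UTC)
        return -1.0;
    work();
    if (timespec_get(&end, TIME_UTC) != TIME_UTC)
        return -1.0;
    return elapsed_seconds(&start, &end);
}
```

Running the baseline implementation through the same measurement on the same machine gives the reference point that makes your own number meaningful.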
1BRC is the abbreviation of the project name: the One Billion Row Challenge.