The Stratoneers project is an ongoing design and development team that aims to study cosmic ray bit flips in the stratosphere. I came onto the team as nobody knew how to complete the task of processing the terabytes of data we were collecting. I was able to create a novel algorithm for processing the data within minutes. This project taught me all about how to optimize code.
The design steps
I will break down the design steps I took to create the algorithm. The first step is to understand what we are trying to accomplish. We wanted to read in the data from our flash storage and check for bit flips.
The steps
-
Choosing the right language: I chose Rust as it is a systems programming language that is known for its speed and safety. I was concerned about memory safety and I knew that Rust would be a great choice for this project. Rust is also one of the languages that I’ve been trying to learn and I thought this would be a great opportunity apply it.
-
Basic Proof of Concept: I started by writing a basic proof of concept in Rust. This code simply read the data from the Linux Block device and checked for bit flips. I was able to get this working in about 30 minutes. However this was super slow about 3MB/s. I knew that I would have to optimize the code to get it to work at the rate we needed.
-
First Optimization Step: The first way I always try to optimize code is to use multithreading. I was able to use Rust’s
rayon
library to parallelize the code. This was able to get the code to run at about 12MB/s. This was a great improvement 4x (we had a 4 core cpu) but I knew that I could do better. -
Creating a Memory Mapper: I knew that the bottleneck was counting the bits. Since my raw throughput without counting the bits was about 3GB/s. I knew that I had to find a way to count the bits faster. I was able to speed the bit counting up by instead of reading bits one by one, I read in a byte, I know the maximum value of a byte is 255 so I created a memory map of 256 this way I was able to take in a byte then use that as the index of the memory map and increment the value at that index. This was able to get the code to run at about 500MB/s. This was a great improvement but It still needed to be faster.
-
Novel Algorithm: I was able to create a novel algorithm that was able to process the data at a rate of 3GB/s. I thought the fastest way to process the data would be to not process the data at all. This algorithm simply read the data from the block device and then used a low level CPU instruction called SIMD (Single Instruction Multiple Data) to add the integers together. For this CPU I could add 256 bytes together in just one CPU instruction. Then when I had added all the bytes together I was able to use an if statement to check if the value was 0. Because if the value was 0 then there were no bit flips. However if there was a bit flip then I could loop through the 256 bytes and check which bit was flipped. After this optimization the code was able to process the data at a rate of 3GB/s. It was even using less than 100% of the CPU.
-
Testing: I tested the algorithm by adding my own bit flips to the block device using hex editing. I was able to see that the algorithm was able to detect the bit flips and tell me which bit was flipped and at what address.
This whole project was completed in just 3 days.
I know I will use Rust in the future
Rust has definitely grown on me even though it has an absolutely brutal learning curve. I would be interested in seeing if I could use it on microcontrollers. The fact that the compiler imposes a lot of strict rules really makes sure your code is correct and safe. The package manager is also great which means that I can easily bring in libraries to help solve problems without reinventing the wheel every time.
Software Utilized