SPO600 Final Project Update – Env on AArch64 and Better Test Data

Since my last update on my final SPO600 project, I have replicated my development environment on the school’s AArch64 server. I did indeed run into the problems I expected. The “json” Ruby gem, simple as it sounds, would not install on the server because the server was missing a header file that the gem’s C extension requires. I tried to find out which system packages I would need to install to get it working, but I didn’t want to install things at random on a server shared by every student in the class, so I ended up using the “json_pure” gem instead. This gem is implemented in pure Ruby and has no C extensions. If I were building a production web app with many users, that would be a problem: the “json_pure” gem parses and generates JSON data about 30x slower than the “json” gem. However, I only need it for benchmarking in this project. My benchmarks may run a bit slower, but that’s not a big deal. I should still be able to run them fast enough.
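To get a feel for how much a JSON backend costs on a given machine, a small timing loop is enough. The following is only a sketch: the helper name `time_json_round_trips` and the sample record are made up for illustration, and it measures whichever JSON implementation `require 'json'` happens to load rather than comparing the two gems directly.

```ruby
require 'json'

# Time `iterations` round trips of generate + parse and return the
# elapsed wall-clock time in seconds (hypothetical helper).
def time_json_round_trips(payload, iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times do
    JSON.parse(JSON.generate(payload))
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical sample record, not the data used in the project.
sample = { 'id' => 1, 'name' => 'example', 'tags' => %w[a b c], 'scores' => [1.5, 2.5] }

elapsed = time_json_round_trips(sample, 1_000)
puts "1000 round trips took #{elapsed.round(4)}s"
```

Running the same script once on a machine with the C-extension gem and once with only the pure-Ruby gem would give a rough ratio between the two.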

I also replaced my random, real-world test data of varying string lengths with data that I think is better suited to testing just the hash function. I’m taking data from /dev/urandom and saving it in a few files, each 100 MB in size. I’ll refer to these as “random files” from here on. My Ruby insert script reads the first file and inserts it into Redis one line at a time, with each line becoming one record. With this new approach, exactly the same data goes into Redis each time I run my benchmark. The data is also more random than the data contained in my serialized Ruby class from before.
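The approach described above could be sketched roughly as follows. This is not the actual insert script: the function names, the `record:N` key scheme, and the `FakeClient` stand-in are all assumptions made for illustration. A real run would pass in a client from the redis gem, whose `set(key, value)` method has the same shape, and would use 100 MB files instead of the 1 MB demo file here.

```ruby
require 'tmpdir'

# Copy `size` bytes from /dev/urandom into `path`, one chunk at a time.
def write_random_file(path, size, chunk = 64 * 1024)
  File.open('/dev/urandom', 'rb') do |src|
    File.open(path, 'wb') do |dst|
      remaining = size
      while remaining > 0
        data = src.read([chunk, remaining].min)
        dst.write(data)
        remaining -= data.bytesize
      end
    end
  end
end

# Insert each newline-delimited line of `path` as one record.
# `client` only needs to respond to set(key, value).
# Returns the number of records inserted.
def insert_lines(path, client)
  count = 0
  File.open(path, 'rb') do |file|
    file.each_line do |line|
      client.set("record:#{count}", line.chomp("\n")) # hypothetical key scheme
      count += 1
    end
  end
  count
end

# In-memory stand-in so the sketch runs without a Redis server.
class FakeClient
  attr_reader :store

  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'random-1.bin')
  write_random_file(path, 1024 * 1024) # 1 MB for the demo
  inserted = insert_lines(path, FakeClient.new)
  puts "inserted #{inserted} records from #{File.size(path)} bytes"
end
```

Because the file is written once and then only read, every benchmark run sees the same byte sequence and therefore the same records.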

Later on, I’ll be running additional tests that compare the values the hash function produces before and after optimization, to make sure it still generates the same hashes and that my optimization hasn’t broken the program. To do that, I’ll compare the hashes produced for multiple random files. For that test, I expect that I won’t be able to use my existing test scripts, since inserting data through a proper Redis client doesn’t expose the hashes produced by this hash function to the client; they aren’t part of the public API. I’ll have to add more code inside the Redis code base to write the hashes to files, which I can then compare to look for mismatches. Alternatively, I could extract just the hash function into a small, custom C program written to test only that.
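Once the hashes have been written to files, the comparison step itself is simple. The sketch below assumes a dump format of one hash per line, in the same order in both files; that format, the function name `hash_mismatches`, and the sample values are all hypothetical, since the dump code doesn’t exist yet.

```ruby
require 'tmpdir'

# Compare two hash dump files line by line.
# Returns an array of [line_number, before_value, after_value] for every
# line that differs, including lines present in only one of the files.
def hash_mismatches(before_path, after_path)
  before = File.readlines(before_path, chomp: true)
  after = File.readlines(after_path, chomp: true)
  length = [before.length, after.length].max
  (0...length).each_with_object([]) do |i, out|
    out << [i + 1, before[i], after[i]] unless before[i] == after[i]
  end
end

Dir.mktmpdir do |dir|
  before_path = File.join(dir, 'hashes-before.txt')
  after_path = File.join(dir, 'hashes-after.txt')
  # Made-up values standing in for real hash output.
  File.write(before_path, "1a2b\n3c4d\n5e6f\n")
  File.write(after_path, "1a2b\nffff\n5e6f\n")

  hash_mismatches(before_path, after_path).each do |line_no, old, new|
    puts "line #{line_no}: #{old} != #{new}"
  end
  # prints "line 2: 3c4d != ffff"
end
```

An empty result would mean the optimized function produced the same hash for every input in that random file.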

Next Steps

My next steps will be to begin optimizing the hash function now that I have my development environments set up on both my laptop (for making sure I’m making changes to the Redis code base that actually compile) and on the school’s AArch64 server (for the benchmarking itself).
