I am currently in the final year of my BSc in Computing Science at the University of Edinburgh. I previously studied abroad at the University of Virginia.

Namespace Database is a Python tool that builds a database of edits from Wikipedia dumps. Although each dump is a 40GB XML file, the resulting database needs 100x less storage. The tool is designed to be highly parallelised and to run on consumer hardware. A paper on this project is due to be published, and I hope to present it. The most challenging part of the project was designing a system that can run 24/7 with no downtime, which required careful error handling and logging. Plots from this project are available on my blog, and further details are available on request.