The internet is buzzing with the One Billion Row Challenge right now! The task is straightforward – write a Java program that reads temperature measurements from a text file and calculates the min, max, and mean temperature per weather station. Yes, that’s a whopping 1 billion rows of input! You can check out the challenge details here: One Billion Row Challenge.
I've taken up this challenge myself. But since I'm not a Java developer, I opted for SQL instead. Moreover, I approached it from a business angle, asking myself: how can I achieve peak performance with minimal effort? So I decided to tackle the challenge with Autonomous Database – Dedicated (ADB-D), Oracle’s state-of-the-art cloud database service, and harness its new In-Memory (IM) capabilities.
Step 1: Generating the Data
I followed the steps outlined on this GitHub page to generate the data file, which turned out to be around 13 GB. Once it was created, I uploaded the file to OCI Object Storage.
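For reference, each row of the generated file is a station name and a temperature reading with one decimal, separated by a semicolon:

```
Hamburg;12.0
Bulawayo;8.9
Palembang;38.8
```

That semicolon delimiter matters later, when we tell the loader how to parse the file.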
Step 2: Setting Up Autonomous Database with In-Memory
Creating an Autonomous Database with the In-Memory option was a breeze. A few clicks to enable the feature and select the desired In-Memory ratio, and I was all set.
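If you want to double-check what was provisioned, the inmemory_size parameter shows how much memory was carved out for the column store (a quick sanity check, assuming your user can query v$parameter; the exact value depends on the ECPU count and ratio you picked):

```sql
-- How much memory is reserved for the In-Memory column store?
SELECT name, value
FROM   v$parameter
WHERE  name = 'inmemory_size';
```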
Step 3: Running SQL with Ease
With OCI's native Database Tools, I didn't need to install anything on my laptop to run SQL queries.
Step 4: Preparing for Data Import
First things first, I created a credential in the SQL worksheet so the database could authenticate to Object Storage and load the data.
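The call looks roughly like this (a sketch: OBJ_STORE_CRED, the username, and the auth token are placeholders for your own values):

```sql
BEGIN
  -- Store an OCI auth token so DBMS_CLOUD can authenticate to Object Storage
  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OBJ_STORE_CRED',
    username        => 'oci_user@example.com', -- your OCI user name
    password        => '<auth-token>'          -- an auth token, not your console password
  );
END;
/
```

With the credential in place, DBMS_CLOUD.LIST_OBJECTS makes a handy sanity check that the uploaded file is visible before attempting the load:

```sql
-- Confirm the uploaded file is visible from the database
SELECT object_name, bytes
FROM   DBMS_CLOUD.LIST_OBJECTS(
         'OBJ_STORE_CRED',
         'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/');
```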
Step 5: Creating and Loading the Table
The table setup was simple – just city and temperature fields. I used the DBMS_CLOUD.COPY_DATA utility for loading. Alternatively, you could use Database Actions' Data Studio for a GUI-based experience, but I'm keeping everything in SQL here.
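In outline, it comes down to two statements (a sketch: the table definition, credential name, and file URI are placeholders to adapt to your own bucket):

```sql
-- A two-column table matching the station;temperature rows
CREATE TABLE measurements (
  city        VARCHAR2(100),
  temperature NUMBER
);

BEGIN
  -- Load the semicolon-delimited file straight from Object Storage
  DBMS_CLOUD.COPY_DATA(
    table_name      => 'MEASUREMENTS',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/measurements.txt',
    format          => JSON_OBJECT('delimiter' VALUE ';')
  );
END;
/
```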
Step 6: Loading Data into Oracle In-Memory
Once the table was ready (the load took about a minute), I executed a simple SQL statement to bring it into the In-Memory column store.
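The statement itself is a one-liner; optionally, you can trigger population explicitly and watch it finish rather than waiting for the first scan to kick it off (a sketch, assuming the measurements table from Step 5):

```sql
-- Mark the table for the In-Memory column store
ALTER TABLE measurements INMEMORY;

-- Optionally start population right away instead of waiting for the first scan
BEGIN
  DBMS_INMEMORY.POPULATE(USER, 'MEASUREMENTS');
END;
/

-- Population is done when bytes_not_populated reaches 0
SELECT segment_name, populate_status, bytes_not_populated
FROM   v$im_segments;
```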
Step 7: Running the Query
With everything in place, I ran the SQL query that generates the challenge output:
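A query along these lines produces the challenge's {station=min/mean/max, ...} output string (a sketch, assuming the column names from Step 5; LISTAGG stitches the per-station aggregates into one alphabetically sorted line):

```sql
SELECT '{' ||
       LISTAGG(city || '=' || min_t || '/' || mean_t || '/' || max_t, ', ')
         WITHIN GROUP (ORDER BY city) ||
       '}' AS result
FROM (
  SELECT city,
         MIN(temperature)           AS min_t,
         -- TO_CHAR(..., 'FM990.0') may be needed to match the challenge's
         -- exact one-decimal output format
         ROUND(AVG(temperature), 1) AS mean_t,
         MAX(temperature)           AS max_t
  FROM   measurements
  GROUP  BY city
);
```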
The Results? Astounding!
The query ran in just 2.84 seconds! According to the GitHub leaderboard, that puts us at the top with minimal effort. And if we skip the output formatting required by the challenge, it gets even faster – down to 1.3 seconds!
For this challenge, I used 16 ECPUs with a total of 21 GB of RAM for the In-Memory column store. In comparison, all the Java entries were evaluated on a 32-core AMD EPYC™ 7502P (Zen2) with 128 GB of RAM.
Why Overcomplicate When You Can Simplify?
So, there you have it – my journey through the One Billion Row Challenge. It really makes you wonder: why spend more resources when you can achieve faster results with fewer cores and less memory? This challenge showcases the sheer power and efficiency of Autonomous Database – Dedicated (ADB-D), especially with its In-Memory capabilities. It's a classic example of doing more with less, proving that sometimes the simplest approach leads to the most impressive results.
Try it yourself at no cost with Always Free Autonomous Database, and let me know about your experience!