How to consistently use a set of parameters throughout a larger codebase
I've done my fair share of data processing, and one thing that most big data software implementations I worked with didn't get right was parameter handling.
The process usually goes like this. You write a class and hard-code a value:
unsigned int num_of_users = 1401;
Before you know it, the same num_of_users value is hard-coded in ten places all over the code. Then you update num_of_users to 1402 but forget to update it in some of those places. The code runs but gives inconsistent results. After hours of searching for the needle in the haystack, you find the place you forgot to update. OK, problem solved. Two days later, you're pulling your hair out because another variable is causing trouble.
Parameter object
In software, I try to stick to the principle:
If you copy-paste your code/parameters, you are doing it wrong.
To avoid hard-coding and copy-pasting parameters in a large codebase, I create an object that holds the values of all my parameters. Store all parameters in one object, then access them wherever they are needed:
class Parameters {
public:
    unsigned int num_of_users;
    // rest of parameters
    Parameters() {
        this->num_of_users = 1501;  // default value, set in exactly one place
    }
};
Then, after creating an instance:
Parameters var;
we get access to all parameters via:
var.num_of_users;
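The same idea can also be written with default member initializers (C++11 and later), so each default sits right next to its declaration. This is a minimal sketch: output_prefix, threshold and users_per_batch are hypothetical names used only for illustration, not part of the original code.

```cpp
#include <string>

// Parameter object: every tunable value lives here, with its default
// stated exactly once.
class Parameters {
public:
    unsigned int num_of_users = 1402;   // default, defined exactly once
    std::string output_prefix = "run";  // hypothetical extra parameter
    double threshold = 0.5;             // hypothetical extra parameter
};

// Code that needs a value reads it from the parameter object instead of
// repeating the literal (users_per_batch is a hypothetical example).
unsigned int users_per_batch(const Parameters &params, unsigned int batches) {
    return params.num_of_users / batches;
}
```

Changing num_of_users now means editing a single line, and every function that reads params.num_of_users picks up the new value.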
Passing Parameters object around
Let's say we want to have one instance of our parameter object and then pass the pointer to that object around. Smart pointers come in handy here:
auto var = std::make_shared<Parameters>();
We can then pass the var smart pointer into any part of the code or module that needs access to those parameters. For example, to pass the parameters to two processing modules:
ProcessingModuleA processing_module_a(var);
ProcessingModuleB processing_module_b(var);
Where ProcessingModuleA is defined as:
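#include <memory>

class ProcessingModuleA {
public:
    // rest of the module code
    explicit ProcessingModuleA(const std::shared_ptr<Parameters> &var)
        : var_(var) {
        unsigned int users = var->num_of_users;  // accessible (var is a pointer, so use ->)
        // rest of the module code
    }
private:
    std::shared_ptr<Parameters> var_;  // keeps a shared reference to the parameters
};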
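Putting these pieces together, a small self-contained sketch can show that both modules really read the same Parameters instance rather than private copies. As an assumption beyond the original code, the modules here store std::shared_ptr<const Parameters>, so they can read but not modify the parameters; the users() accessor is a hypothetical helper added for illustration.

```cpp
#include <memory>
#include <utility>

class Parameters {
public:
    unsigned int num_of_users = 1501;
};

// Each module keeps its own shared_ptr to the single Parameters object.
// Holding a pointer-to-const means the modules can read but not change it.
class ProcessingModuleA {
public:
    explicit ProcessingModuleA(std::shared_ptr<const Parameters> params)
        : params_(std::move(params)) {}
    unsigned int users() const { return params_->num_of_users; }
private:
    std::shared_ptr<const Parameters> params_;
};

class ProcessingModuleB {
public:
    explicit ProcessingModuleB(std::shared_ptr<const Parameters> params)
        : params_(std::move(params)) {}
    unsigned int users() const { return params_->num_of_users; }
private:
    std::shared_ptr<const Parameters> params_;
};
```

Usage: create var with std::make_shared<Parameters>() and pass it to both constructors. If the owner then sets var->num_of_users = 1402, both a.users() and b.users() return 1402, and var.use_count() is 3 (the owner plus one copy per module).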
Loading parameters
Another handy feature is loading parameters from outside the program, for example from a JSON file. That way we compile our C++ code once and then control the processing output via the JSON file. Advantages:
- A considerable speed-up of the development process, since we don't have to recompile the code after each parameter change.
- We can store the parameters used to produce a specific data set and make that data set reproducible, as long as we know which code version was used ;)
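To make the reproducibility point concrete, here is a minimal sketch that serializes the parameters of a run to a JSON string, which could be saved next to the produced data set. It is hand-written for two fields so it needs no JSON library; the code_version field and the to_json_string function are hypothetical, and a real implementation would normally use a JSON library that also escapes special characters.

```cpp
#include <sstream>
#include <string>

class Parameters {
public:
    unsigned int num_of_users = 1402;
    std::string code_version = "unknown";  // hypothetical: e.g. a git commit hash
};

// Writes the parameters as a small JSON object. No escaping is done, so
// this only suits simple values without quotes or backslashes.
std::string to_json_string(const Parameters &params) {
    std::ostringstream out;
    out << "{\"num_of_users\": " << params.num_of_users
        << ", \"code_version\": \"" << params.code_version << "\"}";
    return out.str();
}
```

With code_version set to "abc123", this produces {"num_of_users": 1402, "code_version": "abc123"}, which records both the parameters and the code version behind a data set.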
When we want to use a parameters.JSON file as the input, the implementation becomes a bit more technical. Example class implementation below:
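#include <iostream>
#include <string>
#include "json.hpp"  // assumed to be the nlohmann/json single-header library

using json = nlohmann::json;

// Helper from the article "JSON in C++: Read & Write" (signature assumed from its use below).
json read_json_file(const std::string &location, const std::string &filename);

class Parameters {
public:
    unsigned int num_of_users;
    // rest of parameters
    explicit Parameters(const std::string &parameters_location) {
        try {
            json parameters = read_json_file(parameters_location, "parameters.JSON");
            // at() throws json::out_of_range if the key is missing;
            // get<>() throws json::type_error if the value has the wrong type.
            this->num_of_users = parameters.at("num_of_users").get<unsigned int>();
        }
        catch (const json::exception &e) {
            std::cerr << e.what() << std::endl;
            throw;
        }
        catch (...) {
            std::cerr << "Unknown exception occurred" << std::endl;
            throw;
        }
    }
};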
Use a try and catch block to catch any errors that occur while reading the JSON file, such as a missing key or a value of the wrong type. To see what the read_json_file function does, check the article JSON in C++: Read & Write.
Benefits for the development process
Once the code is wired up correctly, processing becomes a breeze. Sure, it takes a few minutes to set up. But once it is up and running, loading the program parameters from an external JSON file speeds up data analysis tremendously, because we can change parameters without recompiling. As a result, we get insights faster, and insights are what big data and analytics are all about anyway.