Introduction
In this article, we explore three different approaches to fetching data from SQL databases in Golang: the standard method using JSON marshaling, a dynamic approach using maps, and an optimized method that avoids unnecessary overhead.
In my practice, there were very frequent cases where I had to retrieve data from a database with an unknown structure beforehand. This often happens in the e-commerce industry.
We will consider 3 methods of retrieving data, along with their advantages and disadvantages.
You can get all the code of the examples from the repository
https://github.com/oleg578/dbf
Data preparing
We will use the database MariaDB, because I just like this database. It has never failed me in 10 years of practice with a load of up to 500 million transactions per day with a data volume of up to 4 terabytes in e-commerce. And it’s really faster than MySQL.
Code for create MariaDB docker instance and seed test data are in https://github.com/oleg578/dbf/tree/main/db
All examples are tested on Ubuntu 24.04.1 LTS (Noble Numbat) with 11th Gen Intel Core i5–1135G7 and 16GiB RAM.
Standard way
The standard way is trivial — we fetch rows from database into array and then Marshal it into JSON struct
rs, errRs := con.QueryContext(ctx, q, numbRows)
...
for rs.Next() {
var dmy = Dummy{}
if err := rs.Scan(
&dmy.ID,
&dmy.Product,
&dmy.Description,
&dmy.Price,
&dmy.Qty,
&dmy.Date); err != nil {
panic(err)
}
result = append(result, dmy)
}
if err := rs.Err(); err != nil {
panic(err)
}
msg, errRTJ := json.Marshal(result)
if errRTJ != nil {
panic(errRTJ)
}
...
_, errOut := os.Stdout.Write(msg)
...
What about speed?
Test result:
% ./db2json_struct 10000000 1>/dev/null
Elapsed time: 12631 ms, HeapAlloc = 5400.725 MB, Sys = 7099.447 MB
10000000 records read
Let’s just remember this as a starting point.
Using map way
Next we consider fetching unknown list of columns — like “SELECT * FROM …”.
The sequence of actions is simple.
Each record will be represent as map[string]interface{}, then
// create result slice
// numbRows is number of rows in result
outRows := make([]map[string]interface{}, 0, numbRows)
We will not serialize each record to save program execution time, and our actions are not complex.
See https://github.com/oleg578/dbf/tree/mapping/example
After fetch rows from database, we will request an slice of columns
columns, err := rs.Columns()
Create slice of values and slice of pointers for data
values := make([]interface{}, len(columns))
valuePointers := make([]interface{}, len(values))
for i := range values {
valuePointers[i] = &values[i]
}
Then for each row we get the map which represent model — https://github.com/oleg578/dbf/blob/mapping/sql2json.go
func Row2Map(columns []string, values []interface{}) (map[string]interface{}, error) {
rowMap := make(map[string]interface{})
if len(columns) != len(values) {
return nil, errors.New("columns and values length not equal")
}
for i, col := range columns {
rowMap[col] = assignCellValue(values[i]) // we will help to typify the value
}
return rowMap, nil
}
func assignCellValue(val interface{}) interface{} {
if b, ok := val.([]byte); ok {
if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
return floatValue
}
return string(b)
}
return val
}
Note:
You may want to pay attention to the function assignCellValue — its purpose is to pre-assign a type to column values. Simple trick — this function tells the JSON encoder which values to accept as non-numeric.
Benchmark:
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkRow2Map
BenchmarkRow2Map-8 7376358 159.0 ns/op 336 B/op 2 allocs/op
Finally, execution time of our example — https://github.com/oleg578/dbf/blob/mapping/example/main.go
% ./db2json_map 10000000 1>/dev/null
Elapsed time: 12152 ms, HeapAlloc = 8966.899 MB, Sys = 12248.287 MB
10000000 records read
What can we say about these results?
Benefits — We can obtain structures that are not predefined.
Disadvantages — there are several.
But we pay for this with memory consumption.
It’s easy to see that such an implementation consumes 1.5 times more memory.
The second disadvantage which can give us a headache are data types. We must define what we will access as numeric data, and what we will define as string. In current we use AssignValue auxiliary function:
func assignCellValue(val interface{}) interface{} {
if b, ok := val.([]byte); ok {
if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
return floatValue
}
return string(b)
}
return val
}
If value can be represented as float — then we define it as interface value (json library will define it as numeric or null), else as string.
The third disadvantage is map structure property — we can’t guarantee order of fields in json. But this disadvantage may be unimportant.
We cannot say that the result we got is satisfactory.He devours memory. On small datasets this may be acceptable, but this can negatively affect processing with large amounts of data.
https://github.com/oleg578/dbf/tree/mapping
What can we improve?
Let’s look at the weak points in our algorithm of actions when we use maps.
This a creating map[string]interface{} for each row — this very expensive operation in terms of resources and processor time;
And another too expensive operation — JSON marshaling the final slice which can be very big.
It’s time to think about improvement
Let’s play with data structures
When we get data from a table, the order of columns is always defined by the database. Then we can use 2 coordinated slices — columns and values. Then we can refuse a map type and use slice.
The next trick — we will request data as byte slices and list of columns will fetch as list of ColumnTypes — it will help us in future.
columns, err := rs.ColumnTypes()
if err != nil {
log.Fatalf("fault get column types: %v", err)
}
values := make([]sql.RawBytes, len(columns))
valuePointers := make([]interface{}, len(values))
for i := range values {
valuePointers[i] = &values[i]
}
So, we are ready to fetch data in a new way and it’s time to serialize this data.
The JSON library is heavy, but we can make serialization easier.
Columns are simple token-level data usually — we can turn them into JSON strings, then we will just escape special symbols by our escape function -
func escape(in []byte) []byte {
var out bytes.Buffer
for _, b := range in {
switch b {
case '\n', '\r', '\t', '\b', '\f', '\\', '"':
out.WriteByte('\\')
out.WriteByte(b)
case '/':
out.WriteByte('\\')
out.WriteByte(b)
default:
if b < 32 || b == 127 {
out.WriteString(`\u00`)
out.WriteString(strconv.FormatInt(int64(b), 16))
} else {
out.WriteByte(b)
}
}
}
return out.Bytes()
}
Now let’s think about data types. We can determine the type of column (using sql ColumnType structure sql.ColumnType)
func isDigit(c *sql.ColumnType) bool {
switch c.DatabaseTypeName() {
case "TINYINT":
return true
case "SMALLINT":
return true
case "MEDIUMINT":
return true
case "BIGINT":
return true
case "INT":
return true
case "INT1":
return true
case "INT2":
return true
case "INT3":
return true
case "INT4":
return true
case "INT8":
return true
case "BOOL":
return true
case "BOOLEAN":
return true
case "DECIMAL":
return true
case "DEC":
return true
case "NUMERIC":
return true
case "FIXED":
return true
case "NUMBER":
return true
case "FLOAT":
return true
case "DOUBLE":
return true
default:
return false
}
}
And finally, let’s apply primitive serialization:
func Row2Json(columns []*sql.ColumnType, values []sql.RawBytes) (string, error) {
if len(values) == 0 {
return "", errors.New("no data in values")
}
if len(columns) != len(values) {
return "", errors.New("columns and values length not equal")
}
var buff strings.Builder
buff.WriteByte('{')
for i, val := range values {
buff.WriteByte('"')
buff.WriteString(columns[i].Name())
buff.WriteByte('"')
buff.WriteByte(':')
if len(val) > 0 {
if !isDigit(columns[i]) {
buff.WriteByte('"')
}
buff.Write(escape(val))
if !isDigit(columns[i]) {
buff.WriteByte('"')
}
} else {
buff.WriteString("null")
}
if i != len(values)-1 {
buff.WriteByte(',')
}
}
buff.WriteByte('}')
return buff.String(), nil
}
Benchmark:
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkRow2Json
BenchmarkRow2Json-8 2881545 385.3 ns/op 440 B/op 9 allocs/op
Use UNIX way to output
We will use UNIX way of output of our program — i.e. we will not create an output slice — we will out data into standard output stream instead — then we can use the output as a standard pipe in UNIX:
// create new buffer
writer := bufio.NewWriter(os.Stdout)
defer writer.Flush()
writer.WriteByte('[') //start print array of data
...
msg, errMsg := dbf.Row2Json(columns, values)
...
if _, err := writer.WriteString(msg); err != nil {
log.Fatalf("fault write row: %v", err)
} // write serialized row
...
writer.WriteByte(']') // finish serialized slice
In success execution we will get something like:
% ./db2json_ds 3 2>/dev/null | jq
[
{
"id": 1,
"product": "product_1",
"description": null,
"price": 1.23,
"qty": 10,
"date": "2021-01-01 00:00:00"
},
{
"id": 2,
"product": "product_2",
"description": null,
"price": 2.23,
"qty": 20,
"date": "2021-01-01 00:00:00"
},
{
"id": 3,
"product": "product_3",
"description": null,
"price": 3.23,
"qty": 30,
"date": "2021-01-01 00:00:00"
}
]
It’s a time of the moment of truth — fetch 10 million records:
% ./db2json_ds 10000000 1>/dev/null
Elapsed time: 11894 ms, HeapAlloc = 2.436 MB, Sys = 11.710 MB
10000000 records read
Let’s compare with the starting point:
- Execution time — 11.647 seconds instead 12.631 seconds;
- Memory consumption — 11.710 MB instead 7099.447 MB. So, up to 10 percent faster and 600 times less memory consumption.
Conclusion
Let’s examine the broader scope of the tests.
Test comparison table (the length of result json file is 1.2Gb)
Benchmarks comparison table:
Real test memory consumption comparison table:
The fast method (fetch data using slices) is not the fastest in benchmarks, but it’s fastest and the least gluttonous in real tests.
Analyzing the results, we can make two important conclusions, in my opinion :
- simple solutions always work more effectively;
- real tests are always more important than synthetic ones.
Happy coding 🙂
Top comments (0)