Oleg

Posted on Oct 2

Three ways to fetch SQL data in Go

#go #tutorial #devops

Introduction

In this article, we explore three different approaches to fetching data from SQL databases in Golang: the standard method using JSON marshaling, a dynamic approach using maps, and an optimized method that avoids unnecessary overhead.

In my practice, I have very often had to retrieve data from a database whose structure was not known in advance. This happens a lot in the e-commerce industry.

We will consider 3 methods of retrieving data, along with their advantages and disadvantages.

You can get all the code of the examples from the repository
https://github.com/oleg578/dbf

Data preparation

We will use MariaDB, because I just like this database. It has never failed me in 10 years of practice, under loads of up to 500 million transactions per day and data volumes of up to 4 terabytes in e-commerce. In my experience it is also faster than MySQL.

The code that creates the MariaDB Docker instance and seeds the test data is in https://github.com/oleg578/dbf/tree/main/db

All examples are tested on Ubuntu 24.04.1 LTS (Noble Numbat) with 11th Gen Intel Core i5–1135G7 and 16GiB RAM.

Standard way

The standard way is trivial: we scan the rows from the database into a slice of structs and then marshal that slice into JSON.

rs, errRs := con.QueryContext(ctx, q, numbRows)

...

 for rs.Next() {
  var dmy = Dummy{}
  if err := rs.Scan(
   &dmy.ID,
   &dmy.Product,
   &dmy.Description,
   &dmy.Price,
   &dmy.Qty,
   &dmy.Date); err != nil {
   panic(err)
  }
  result = append(result, dmy)
 }
 if err := rs.Err(); err != nil {
  panic(err)
 }
 msg, errRTJ := json.Marshal(result)
 if errRTJ != nil {
  panic(errRTJ)
 }

...

_, errOut := os.Stdout.Write(msg)
...
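
To try the marshaling half of this approach without a database, here is a minimal sketch. The field types of Dummy, its JSON tags and the helper name marshalRows are assumptions made for illustration; the repository may define them differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dummy mirrors the six columns scanned above. The field types and JSON
// tags are assumptions for this sketch, not the repository's definition.
type Dummy struct {
	ID          int64   `json:"id"`
	Product     string  `json:"product"`
	Description *string `json:"description"` // a nil pointer is encoded as JSON null
	Price       float64 `json:"price"`
	Qty         int64   `json:"qty"`
	Date        string  `json:"date"`
}

// marshalRows encodes the whole slice in one call, so every row has to be
// held in memory before the first byte of JSON exists.
func marshalRows(rows []Dummy) (string, error) {
	msg, err := json.Marshal(rows)
	return string(msg), err
}

func main() {
	rows := []Dummy{
		{ID: 1, Product: "product_1", Price: 1.23, Qty: 10, Date: "2021-01-01 00:00:00"},
	}
	out, err := marshalRows(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The important property is visible in marshalRows: nothing can be written until the complete slice has been built, which is where the memory goes on large result sets.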

What about speed?
Test result:

% ./db2json_struct 10000000 1>/dev/null
Elapsed time: 12631 ms, HeapAlloc = 5400.725 MB, Sys = 7099.447 MB
10000000 records read

Let’s just remember this as a starting point.
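
For readers who want to produce a similar report line in their own experiments, the sketch below shows one possible way using runtime.ReadMemStats. The helper name formatStats, the stand-in workload and the conversion of bytes to MB by dividing by 1024*1024 are assumptions; the example programs in the repository may measure differently.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"time"
)

// formatStats renders one report line. Converting bytes to MB by dividing
// by 1024*1024 is an assumption of this sketch.
func formatStats(elapsedMs int64, heapAlloc, sys uint64) string {
	const mb = 1024 * 1024
	return fmt.Sprintf("Elapsed time: %d ms, HeapAlloc = %.3f MB, Sys = %.3f MB",
		elapsedMs, float64(heapAlloc)/mb, float64(sys)/mb)
}

func main() {
	start := time.Now()

	// Stand-in workload: allocate some memory so the numbers are not trivial.
	data := make([][]byte, 0, 1000)
	for i := 0; i < 1000; i++ {
		data = append(data, make([]byte, 1024))
	}

	var m runtime.MemStats
	runtime.ReadMemStats(&m) // HeapAlloc: bytes of allocated heap objects; Sys: bytes obtained from the OS
	runtime.KeepAlive(data)  // keep the workload reachable until the stats are read

	// Write to stderr so the report survives "1>/dev/null".
	fmt.Fprintln(os.Stderr, formatStats(time.Since(start).Milliseconds(), m.HeapAlloc, m.Sys))
}
```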

Using map way

Next we consider fetching an unknown list of columns, as with “SELECT * FROM …”.
The sequence of actions is simple.
Each record will be represented as a map[string]interface{}, so the result is a slice of such maps:

// create result slice
// numbRows is number of rows in result
outRows := make([]map[string]interface{}, 0, numbRows)

To save execution time we will not serialize each record separately; the whole slice is marshaled once at the end. The steps themselves are not complex.
See https://github.com/oleg578/dbf/tree/mapping/example

After fetching the rows from the database, we request the slice of column names:

columns, err := rs.Columns()

Then we create a slice of values and a slice of pointers to those values, which Scan will fill:

values := make([]interface{}, len(columns))
valuePointers := make([]interface{}, len(values))
for i := range values {
 valuePointers[i] = &values[i]
}
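
Why two slices? Scan only accepts pointers, so valuePointers holds the addresses of the cells in values, and both slices are reused for every row. The sketch below imitates this without a database. fakeScan is a hypothetical stand-in for (*sql.Rows).Scan and scanAll is an illustrative helper; neither comes from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeScan is a hypothetical stand-in for (*sql.Rows).Scan: it receives the
// destinations as pointers and stores one cell of the row in each of them.
func fakeScan(row []interface{}, dest ...interface{}) error {
	if len(row) != len(dest) {
		return errors.New("row and destination length not equal")
	}
	for i, d := range dest {
		p, ok := d.(*interface{})
		if !ok {
			return fmt.Errorf("destination %d is not *interface{}", i)
		}
		*p = row[i]
	}
	return nil
}

// scanAll allocates values and valuePointers once and reuses them for every
// row; each scan overwrites the previous row's cells in values.
func scanAll(columns []string, rows [][]interface{}) ([]string, error) {
	values := make([]interface{}, len(columns))
	valuePointers := make([]interface{}, len(values))
	for i := range values {
		valuePointers[i] = &values[i]
	}
	var out []string
	for _, row := range rows {
		if err := fakeScan(row, valuePointers...); err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprintf("%v", values))
	}
	return out, nil
}

func main() {
	columns := []string{"id", "product"}
	rows := [][]interface{}{{int64(1), "product_1"}, {int64(2), "product_2"}}
	out, err := scanAll(columns, rows)
	if err != nil {
		panic(err)
	}
	for _, line := range out {
		fmt.Println(line)
	}
}
```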

Then for each row we build the map that represents the record (https://github.com/oleg578/dbf/blob/mapping/sql2json.go):

func Row2Map(columns []string, values []interface{}) (map[string]interface{}, error) {
 rowMap := make(map[string]interface{})
 if len(columns) != len(values) {
  return nil, errors.New("columns and values length not equal")
 }
 for i, col := range columns {
  rowMap[col] = assignCellValue(values[i]) // we will help to typify the value
 }
 return rowMap, nil
}

func assignCellValue(val interface{}) interface{} {
 if b, ok := val.([]byte); ok {
  if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
   return floatValue
  }
  return string(b)
 }
 return val
}

Note:
Pay attention to the function assignCellValue: its purpose is to assign a type to each column value in advance. It is a simple trick: the function tells the JSON encoder which values to emit as numbers and which as strings.
Benchmark:
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkRow2Map
BenchmarkRow2Map-8    7376358    159.0 ns/op    336 B/op    2 allocs/op

Finally, the execution time of our example (https://github.com/oleg578/dbf/blob/mapping/example/main.go):

% ./db2json_map 10000000 1>/dev/null
Elapsed time: 12152 ms, HeapAlloc = 8966.899 MB, Sys = 12248.287 MB
10000000 records read

What can we say about these results?

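Benefits: we can obtain structures that are not predefined.

Disadvantages: there are several.
The first is memory consumption, which is the price we pay for that flexibility. The run above used roughly 1.7 times the memory of the struct version (HeapAlloc 8966.899 MB against 5400.725 MB, Sys 12248.287 MB against 7099.447 MB).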

The second disadvantage, which can give us a headache, is data types. We must decide which values to treat as numeric and which as strings. Currently we use the assignCellValue auxiliary function:

func assignCellValue(val interface{}) interface{} {
 if b, ok := val.([]byte); ok {
  if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
   return floatValue
  }
  return string(b)
 }
 return val
}

If the raw bytes parse as a float, we store a float64 and the json library emits a number; otherwise we store a string. Any value that does not arrive as []byte is passed through unchanged, so a nil becomes JSON null.
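
The headache is easy to reproduce. Because the decision is made from the bytes alone, text that merely looks numeric is converted too, and a text value such as "NaN" parses as a float that encoding/json then refuses to encode. The sketch below runs a few sample cells through the function; the helper name encode and the sample values are chosen for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// assignCellValue is the function from the article, repeated here so the
// sketch is self-contained.
func assignCellValue(val interface{}) interface{} {
	if b, ok := val.([]byte); ok {
		if floatValue, err := strconv.ParseFloat(string(b), 64); err == nil {
			return floatValue
		}
		return string(b)
	}
	return val
}

// encode passes one text cell through assignCellValue and json.Marshal.
func encode(cell string) (string, error) {
	msg, err := json.Marshal(assignCellValue([]byte(cell)))
	return string(msg), err
}

func main() {
	// "00501" could be a postal code stored as text; "NaN" could be a name.
	for _, cell := range []string{"product_1", "1.23", "00501", "1e5", "NaN"} {
		out, err := encode(cell)
		fmt.Printf("%q -> %s (err: %v)\n", cell, out, err)
	}
}
```

In this sketch "00501" loses its leading zeros, "1e5" is emitted as 100000, and "NaN" makes json.Marshal return an error. Deciding by column type rather than by content avoids this class of surprise.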

The third disadvantage is a property of the map structure: we cannot keep the column order of the query in the JSON, because encoding/json writes map keys in sorted order. But this disadvantage may be unimportant.
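
A small sketch makes the ordering effect visible. The column names and the helper name marshalRow are chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalRow encodes one row held as a map. encoding/json sorts the map
// keys, so the column order of the query does not survive.
func marshalRow(row map[string]interface{}) (string, error) {
	msg, err := json.Marshal(row)
	return string(msg), err
}

func main() {
	// Imagine the query returned the columns in the order id, product, price.
	row := map[string]interface{}{
		"id":      1,
		"product": "product_1",
		"price":   1.23,
	}
	out, err := marshalRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"id":1,"price":1.23,"product":"product_1"}
}
```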

We cannot say that the result we got is satisfactory. It devours memory. On small datasets this may be acceptable, but it can hurt processing of large amounts of data.
https://github.com/oleg578/dbf/tree/mapping

What can we improve?

Let’s look at the weak points of our algorithm when we use maps.

The first is creating a map[string]interface{} for each row, which is very expensive in terms of memory and processor time.
The second expensive operation is JSON marshaling of the final slice, which can be very big.

It’s time to think about improvement

Let’s play with data structures

When we get data from a table, the order of columns is always defined by the database. So we can use two coordinated slices, columns and values, and give up the map type in favor of slices.

The next trick: we will request the data as byte slices and fetch the list of columns as a list of ColumnTypes, which will help us later.

columns, err := rs.ColumnTypes()
 if err != nil {
  log.Fatalf("fault get column types: %v", err)
 }
 values := make([]sql.RawBytes, len(columns))
 valuePointers := make([]interface{}, len(values))
 for i := range values {
  valuePointers[i] = &values[i]
 }
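
One thing to keep in mind with sql.RawBytes: the slice refers to memory owned by the database driver and is only valid until the next call to Next, Scan or Close. That is fine for us because each row is serialized immediately, but a value that has to outlive the row must be copied. The sketch below imitates a driver that reuses its buffer; the buffer and the helper names keep and demoReuse are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
)

// keep returns an independent copy of a RawBytes value, so the data stays
// valid after the next row has been read.
func keep(raw sql.RawBytes) []byte {
	return append([]byte(nil), raw...)
}

// demoReuse imitates a driver that overwrites its read buffer for each row.
func demoReuse() (aliased, copied string) {
	buf := []byte("row-1")   // hypothetical driver-owned buffer
	raw := sql.RawBytes(buf) // shares memory with buf, no copy is made
	saved := keep(raw)       // independent copy taken while the row is current
	copy(buf, "row-2")       // the "next row" overwrites the buffer
	return string(raw), string(saved)
}

func main() {
	aliased, copied := demoReuse()
	fmt.Println("aliased value now reads:", aliased) // row-2
	fmt.Println("copied value still reads:", copied) // row-1
}
```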

So, we are ready to fetch data in a new way, and it’s time to serialize this data.
The JSON library is heavy, but we can make serialization easier.

Column values are usually simple token-level data, so we can write them out as JSON strings directly and only escape the special symbols with our own escape function:

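// hexDigits is used to write control bytes as \u00XX.
const hexDigits = "0123456789abcdef"

// escape makes a raw column value safe to place between JSON double quotes.
func escape(in []byte) []byte {
 var out bytes.Buffer
 for _, b := range in {
  switch b {
  case '\n':
   out.WriteString(`\n`)
  case '\r':
   out.WriteString(`\r`)
  case '\t':
   out.WriteString(`\t`)
  case '\b':
   out.WriteString(`\b`)
  case '\f':
   out.WriteString(`\f`)
  case '\\', '"', '/':
   // these bytes keep their own character after the backslash
   out.WriteByte('\\')
   out.WriteByte(b)
  default:
   if b < 32 || b == 127 {
    // other control bytes need exactly four hex digits: \u00XX
    out.WriteString(`\u00`)
    out.WriteByte(hexDigits[b>>4])
    out.WriteByte(hexDigits[b&0x0f])
   } else {
    out.WriteByte(b)
   }
  }
 }
 return out.Bytes()
}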
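
A hand-written escaper is easy to get subtly wrong, so it is worth checking against the standard library: wrap the escaped bytes in double quotes, decode them with encoding/json and compare with the input. The sketch below does that for a compact, hypothetical escaper named escapeSketch; it is an illustration of the check, not the repository's function.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

const hexDigits = "0123456789abcdef"

// escapeSketch is a hypothetical byte-level escaper used to illustrate the
// round-trip check. It escapes quotes, backslashes and control bytes.
func escapeSketch(in []byte) []byte {
	var out bytes.Buffer
	for _, b := range in {
		switch b {
		case '"', '\\':
			out.WriteByte('\\')
			out.WriteByte(b)
		case '\n':
			out.WriteString(`\n`)
		case '\r':
			out.WriteString(`\r`)
		case '\t':
			out.WriteString(`\t`)
		default:
			if b < 0x20 {
				// remaining control bytes become \u00XX
				out.WriteString(`\u00`)
				out.WriteByte(hexDigits[b>>4])
				out.WriteByte(hexDigits[b&0x0f])
			} else {
				out.WriteByte(b)
			}
		}
	}
	return out.Bytes()
}

// roundTrips reports whether the escaped form of s, placed between double
// quotes, is decoded by encoding/json back to exactly s.
func roundTrips(s string) bool {
	quoted := append([]byte{'"'}, escapeSketch([]byte(s))...)
	quoted = append(quoted, '"')
	var back string
	if err := json.Unmarshal(quoted, &back); err != nil {
		return false
	}
	return back == s
}

func main() {
	samples := []string{"plain", "say \"hi\"", "line1\nline2", "tab\there", "back\\slash", "bell\x07"}
	for _, s := range samples {
		fmt.Printf("%q round-trips: %v\n", s, roundTrips(s))
	}
}
```

The same check can be pointed at any escape function. It assumes the input is valid UTF-8, because the decoder replaces invalid byte sequences.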

Now let’s think about data types. We can determine the type of each column from its sql.ColumnType:

// isDigit reports whether the column has a numeric database type,
// so its value can be written to JSON without quotes.
func isDigit(c *sql.ColumnType) bool {
 switch c.DatabaseTypeName() {
 case "TINYINT", "SMALLINT", "MEDIUMINT", "BIGINT", "INT",
  "INT1", "INT2", "INT3", "INT4", "INT8",
  "BOOL", "BOOLEAN",
  "DECIMAL", "DEC", "NUMERIC", "FIXED", "NUMBER",
  "FLOAT", "DOUBLE":
  return true
 default:
  return false
 }
}

And finally, let’s apply primitive serialization:

func Row2Json(columns []*sql.ColumnType, values []sql.RawBytes) (string, error) {
 if len(values) == 0 {
  return "", errors.New("no data in values")
 }
 if len(columns) != len(values) {
  return "", errors.New("columns and values length not equal")
 }
 var buff strings.Builder
 buff.WriteByte('{')
 for i, val := range values {
  buff.WriteByte('"')
  buff.WriteString(columns[i].Name())
  buff.WriteByte('"')
  buff.WriteByte(':')
  if len(val) > 0 {
   if !isDigit(columns[i]) {
    buff.WriteByte('"')
   }
   buff.Write(escape(val))
   if !isDigit(columns[i]) {
    buff.WriteByte('"')
   }
  } else {
   buff.WriteString("null")
  }
  if i != len(values)-1 {
   buff.WriteByte(',')
  }
 }
 buff.WriteByte('}')

 return buff.String(), nil
}

Benchmark:
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkRow2Json
BenchmarkRow2Json-8    2881545    385.3 ns/op    440 B/op    9 allocs/op
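
An sql.ColumnType can only be obtained from a live query, so to experiment with this kind of serialization without a database we can use a variant that takes plain slices. The sketch below is such a variant, not the repository's Row2Json: the names rowToJSON and writeQuoted, the numeric flag slice, and the choice to treat a nil value as NULL and an empty non-nil value as an empty string are all assumptions of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// writeQuoted appends b as a JSON string, escaping quotes, backslashes and
// control bytes.
func writeQuoted(sb *strings.Builder, b []byte) {
	sb.WriteByte('"')
	for _, c := range b {
		switch {
		case c == '"' || c == '\\':
			sb.WriteByte('\\')
			sb.WriteByte(c)
		case c < 0x20:
			fmt.Fprintf(sb, `\u%04x`, c)
		default:
			sb.WriteByte(c)
		}
	}
	sb.WriteByte('"')
}

// rowToJSON serializes one row from three coordinated slices: column names,
// a flag per column saying whether it is numeric, and the raw values.
func rowToJSON(names []string, numeric []bool, values [][]byte) (string, error) {
	if len(names) != len(values) || len(numeric) != len(values) {
		return "", errors.New("names, numeric and values length not equal")
	}
	var sb strings.Builder
	sb.WriteByte('{')
	for i, val := range values {
		if i > 0 {
			sb.WriteByte(',')
		}
		writeQuoted(&sb, []byte(names[i]))
		sb.WriteByte(':')
		switch {
		case val == nil:
			sb.WriteString("null") // assumption: nil stands for SQL NULL
		case numeric[i] && len(val) > 0:
			sb.Write(val) // numeric text is written as-is, without quotes
		default:
			writeQuoted(&sb, val)
		}
	}
	sb.WriteByte('}')
	return sb.String(), nil
}

func main() {
	names := []string{"id", "product", "description", "price"}
	numeric := []bool{true, false, false, true}
	values := [][]byte{[]byte("1"), []byte("product_1"), nil, []byte("1.23")}
	out, err := rowToJSON(names, numeric, values)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Note that the column name is quoted with the same escaping as the values, which matters when a query aliases a column to something containing a quote or a backslash.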

Use UNIX way to output

We will use the UNIX way for our program's output: we will not build an output slice but write the data to the standard output stream instead, so the output can be fed into a standard UNIX pipe:

// create new buffer
 writer := bufio.NewWriter(os.Stdout)
 defer writer.Flush()

 writer.WriteByte('[') //start print array of data
 ...
 msg, errMsg := dbf.Row2Json(columns, values)
 ...
 if _, err := writer.WriteString(msg); err != nil {
   log.Fatalf("fault write row: %v", err)
  } // write serialized row
 ...
 writer.WriteByte(']') // finish serialized slice
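
The fragment above leaves out the loop and the separators between rows. A self-contained sketch of the same idea follows; the names writeArray and render are hypothetical, and the rows come from a slice here only so that the example runs without a database. In the real program each row would be written inside the rs.Next() loop.

```go
package main

import (
	"bufio"
	"io"
	"log"
	"os"
	"strings"
)

// writeArray streams already-serialized rows as one JSON array. Rows are
// separated by commas, and nothing is accumulated beyond the bufio buffer.
func writeArray(w io.Writer, rows []string) error {
	bw := bufio.NewWriter(w)
	if err := bw.WriteByte('['); err != nil {
		return err
	}
	for i, row := range rows {
		if i > 0 {
			if err := bw.WriteByte(','); err != nil {
				return err
			}
		}
		if _, err := bw.WriteString(row); err != nil {
			return err
		}
	}
	if err := bw.WriteByte(']'); err != nil {
		return err
	}
	// Flush explicitly so that a write error is not lost.
	return bw.Flush()
}

// render runs writeArray against an in-memory writer, which is handy for checks.
func render(rows []string) (string, error) {
	var sb strings.Builder
	err := writeArray(&sb, rows)
	return sb.String(), err
}

func main() {
	rows := []string{`{"id":1}`, `{"id":2}`}
	if err := writeArray(os.Stdout, rows); err != nil {
		log.Fatalf("fault write rows: %v", err)
	}
}
```

Returning the result of Flush instead of deferring it is a design choice of this sketch: a deferred Flush would silently drop an error such as a closed pipe.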

On successful execution we will get something like:

%  ./db2json_ds 3 2>/dev/null | jq
[
  {
    "id": 1,
    "product": "product_1",
    "description": null,
    "price": 1.23,
    "qty": 10,
    "date": "2021-01-01 00:00:00"
  },
  {
    "id": 2,
    "product": "product_2",
    "description": null,
    "price": 2.23,
    "qty": 20,
    "date": "2021-01-01 00:00:00"
  },
  {
    "id": 3,
    "product": "product_3",
    "description": null,
    "price": 3.23,
    "qty": 30,
    "date": "2021-01-01 00:00:00"
  }
]

The moment of truth: fetch 10 million records:

% ./db2json_ds 10000000 1>/dev/null
Elapsed time: 11894 ms, HeapAlloc = 2.436 MB, Sys = 11.710 MB
10000000 records read

Let’s compare with the starting point:

Execution time: 11.894 seconds instead of 12.631 seconds;
Memory consumption: 11.710 MB instead of 7099.447 MB.

So, in this run, about 6 percent faster and about 600 times less memory consumption.

Conclusion

Let’s examine the broader scope of the tests.
Test comparison table (the resulting JSON file is 1.2 GB long)

Benchmarks comparison table:

Real test memory consumption comparison table:

The fast method (fetching data using slices) is not the fastest in benchmarks, but it is the fastest and the least gluttonous in real tests.

Analyzing the results, we can draw two important conclusions, in my opinion:

simple solutions always work more effectively;
real tests are always more important than synthetic ones.

Happy coding 🙂
