What it's all about
I recently decided to import 6.8k cryptocurrency entries into a database, namely RethinkDB.
I have an old side hustle that never turned into a business, but I still have fun building it. I started this project four years ago and only recently decided to pick it back up. Back then I was just getting started with Go and wasn't a particularly seasoned Go dev.
In this project there was one function that bothered the heck out of me, because it took forever to run. Granted, I only had to run it once each time I re-created the database, but that was still too much waiting time for little moi.
Read along in the next chapter.
The Structs
All of the structs used in the code below are as follows:
type Coin struct {
	ID        string  `json:"id" gorethink:"id,omitempty"`
	Name      string  `json:"name" gorethink:"name" mapstructure:"CoinName"`
	Symbol    string  `json:"symbol" gorethink:"symbol"`
	Algorithm string  `json:"algorithm" gorethink:"algorithm"`
	Price     float64 `json:"price" gorethink:"price"`
}

type Data struct {
	Data map[string]Coin
}
The Usual Suspect
I decided to time this one function using the shell's time command. This would be done like so:
time go run . -fetch
The -fetch flag tells the program to populate my database. A stupid name, now that I think about it, but that's not what's in question right now.
The timing came back at roughly 2.5 minutes, which is a lot of waiting time for a busy little bee like me. The full output is below:
go run . -fetch 2.24s user 1.12s system 2% cpu 2:25.31 total
As you can see, it didn't use much CPU, but it took a hecka long time! So I decided to rewrite it, as you'll see in the next chapter.
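One thing to keep in mind: `time go run .` measures the whole command, which includes compiling the program. If you only want the function itself, a timer inside the program is an option. Here is a minimal sketch; `timeIt` and `slowWork` are made-up names for illustration and are not part of my project.

```go
package main

import (
	"fmt"
	"time"
)

// timeIt runs fn and returns the wall-clock time it took.
// Hypothetical helper, not part of the original project.
func timeIt(fn func()) time.Duration {
	start := time.Now()
	fn()
	return time.Since(start)
}

func main() {
	// slowWork stands in for the real call, e.g. FetchCrypto(d).
	slowWork := func() { time.Sleep(50 * time.Millisecond) }
	fmt.Println("took", timeIt(slowWork))
}
```

In the real program you would wrap the call to the fetch function in the closure instead of the sleep.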
Code
func FetchCrypto(d *db.DB) {
	// Errors are ignored throughout; see the final thoughts.
	dat, _ := ioutil.ReadFile("coins.json")
	var data Data
	json.Unmarshal(dat, &data)

	// One insert query per coin, executed one after another.
	for _, c := range data.Data {
		c.ID = ""
		if c.Algorithm == "N/A" {
			c.Algorithm = ""
		}
		r.Table("coin").
			Insert(c).
			Exec(d.S)
	}
}
The Batch Answer
Now, how would one go about rewriting such a beautifully old function? The answer might surprise you!
Batching! Why? Just because I wanted to see if I could. And success! I did it! Each goroutine takes a batch of one hundred currencies and inserts them one by one, and the function waits for all goroutines to complete.
The timing of this one is as follows:
go run . -fetch 1.89s user 0.81s system 47% cpu 5.683 total
Easy peasy, right? It was, even if it got a tad more complicated, but I love how it turned out.
Code
func FetchCryptoV2(d *db.DB) {
	dat, _ := ioutil.ReadFile("coins.json")
	var data Data
	json.Unmarshal(dat, &data)

	var wg sync.WaitGroup
	batch := 100

	// Copy the map values into a slice so it can be cut into batches.
	coins := []coin.Coin{}
	for _, c := range data.Data {
		coins = append(coins, c)
	}

	length := len(coins)
	for i := 0; i < length; i += batch {
		wg.Add(1)
		// One goroutine per batch; i is passed in so each gets its own offset.
		go func(i int) {
			defer wg.Done()
			b := coins[i:]
			if len(b) > batch {
				b = b[:batch]
			}
			// Within a batch the coins are still inserted one at a time.
			for _, c := range b {
				c.ID = ""
				if c.Algorithm == "N/A" {
					c.Algorithm = ""
				}
				r.Table("coin").
					Insert(c).
					Exec(d.S)
			}
		}(i)
	}
	// Block until every batch has finished.
	wg.Wait()
}
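If you want to try the batching pattern without a database, the slicing and the waiting can be separated from the insert itself. The following is only a sketch under my own assumptions: `chunk`, `processInBatches`, and the `insert` callback are hypothetical names, and plain strings stand in for the coins.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// chunk splits items into consecutive slices of at most size elements.
func chunk(items []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var out [][]string
	for i := 0; i < len(items); i += size {
		end := i + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[i:end])
	}
	return out
}

// processInBatches starts one goroutine per batch, calls insert for every
// item in that batch, and returns once all goroutines are done.
func processInBatches(items []string, size int, insert func(string)) {
	var wg sync.WaitGroup
	for _, b := range chunk(items, size) {
		wg.Add(1)
		go func(b []string) {
			defer wg.Done()
			for _, it := range b {
				insert(it)
			}
		}(b)
	}
	wg.Wait()
}

func main() {
	symbols := make([]string, 250) // stand-in data, not real coins
	var inserted int64
	// The callback only counts; a real one would run the insert query.
	processInBatches(symbols, 100, func(string) { atomic.AddInt64(&inserted, 1) })
	fmt.Println(len(chunk(symbols, 100)), "batches,", inserted, "inserts")
}
```

Because the callback runs from several goroutines at once, anything it touches has to be safe for concurrent use, which is why the counter here is updated atomically.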
Bonus Chapter
I also decided to make use of Rethink's slice insertion, where the whole slice goes into a single insert. This is super fast!
It's even faster than mine. I haven't tweaked mine to use anything other than batches of one hundred, but I suspect that lowering the batch size will speed it up a bit.
The timing of this is:
go run . -fetch 1.02s user 0.26s system 28% cpu 4.488 total
Code
func FetchCryptoV3(d *db.DB) {
	dat, _ := ioutil.ReadFile("coins.json")
	var data Data
	json.Unmarshal(dat, &data)

	// Clean up every coin first and collect them all in one slice.
	coins := []coin.Coin{}
	for _, c := range data.Data {
		c.ID = ""
		if c.Algorithm == "N/A" {
			c.Algorithm = ""
		}
		coins = append(coins, c)
	}

	// A single insert query for the whole slice.
	r.Table("coin").
		Insert(coins).
		Exec(d.S)
}
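All three versions share the same cleanup step: clear the ID and blank out the "N/A" algorithm. That step could be pulled into one small function, which also makes it easy to test on its own. This is just a sketch; `prepare` is a hypothetical helper I made up for illustration, and the Coin struct is repeated without its tags to keep the example self-contained.

```go
package main

import "fmt"

// Coin repeats the fields from the struct above, without the tags.
type Coin struct {
	ID        string
	Name      string
	Symbol    string
	Algorithm string
	Price     float64
}

// prepare copies the map values into a slice, clearing each ID and
// replacing the "N/A" placeholder algorithm with an empty string.
func prepare(in map[string]Coin) []Coin {
	// The final length is known, so the capacity is reserved up front.
	out := make([]Coin, 0, len(in))
	for _, c := range in {
		c.ID = ""
		if c.Algorithm == "N/A" {
			c.Algorithm = ""
		}
		out = append(out, c)
	}
	return out
}

func main() {
	// Made-up sample entries, not taken from the real coins.json.
	coins := prepare(map[string]Coin{
		"AAA": {ID: "1", Name: "Example A", Symbol: "AAA", Algorithm: "N/A"},
		"BBB": {ID: "2", Name: "Example B", Symbol: "BBB", Algorithm: "Scrypt"},
	})
	fmt.Println("prepared", len(coins), "coins")
}
```

The resulting slice could then be handed to the insert in one go, as FetchCryptoV3 does. Note that ranging over a map gives no guaranteed order, so the slice order differs between runs.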
Final thoughts
No error checking has been taken into account, but if you use any of these functions, you should probably add it, especially in production environments.
Again, you should play around with the batch size too, if you want to use anything along the lines of the second function.
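To give an idea of what the missing error checking could look like, here is a sketch of just the loading part. `loadCoins` is a hypothetical helper that is not in my project, the Coin struct is trimmed to its JSON tags, and it uses os.ReadFile (Go 1.16 and later) in place of ioutil.ReadFile.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Coin is a trimmed copy of the struct above; only the JSON tags are kept.
type Coin struct {
	ID        string  `json:"id"`
	Name      string  `json:"name"`
	Symbol    string  `json:"symbol"`
	Algorithm string  `json:"algorithm"`
	Price     float64 `json:"price"`
}

type Data struct {
	Data map[string]Coin
}

// loadCoins reads and decodes the file at path, returning an error
// instead of silently discarding it.
func loadCoins(path string) (Data, error) {
	var data Data
	raw, err := os.ReadFile(path)
	if err != nil {
		return data, fmt.Errorf("read %s: %w", path, err)
	}
	if err := json.Unmarshal(raw, &data); err != nil {
		return data, fmt.Errorf("decode %s: %w", path, err)
	}
	return data, nil
}

func main() {
	data, err := loadCoins("coins.json")
	if err != nil {
		fmt.Println("could not load coins:", err)
		return
	}
	fmt.Println("loaded", len(data.Data), "coins")
}
```

The insert calls deserve the same treatment: as far as I know, Exec returns an error value too, so check it rather than dropping it.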
Edit
I just ran a new test with a batch size of 10. This nearly halved the time it took to insert into the database, making it faster than Rethink's slice insertion now. It comes in at a mere 2.9s:
go run . -fetch 1.67s user 0.49s system 74% cpu 2.917 total
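My guess at why the smaller batch helps (I have not measured this): the loop in the second function starts one goroutine per batch, so a smaller batch means more inserts running at the same time. The arithmetic can be sketched like this, assuming the "6.8k" entries are exactly 6800, which is a rounded figure, and using a made-up helper name.

```go
package main

import "fmt"

// goroutines returns how many batches, and therefore goroutines, a loop
// stepping by batch starts for n items: n / batch, rounded up.
func goroutines(n, batch int) int {
	return (n + batch - 1) / batch
}

func main() {
	const n = 6800 // "6.8k" rounded; the exact count is an assumption
	for _, batch := range []int{100, 10} {
		fmt.Printf("batch=%d -> %d goroutines\n", batch, goroutines(n, batch))
	}
}
```

With these numbers a batch size of 100 gives 68 goroutines and a batch size of 10 gives 680. Whether going even smaller keeps helping probably depends on the database and the connection setup, so it is worth timing on your own machine.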
Best,
Mads Cordes