Intro
Every hour, my Covid-19 Dashboard pulls the open data published on GitHub by the Ministry of Health Malaysia and the COVID-19 Immunisation Task Force Malaysia and imports it into my database.
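The hourly run itself is just a scheduled job. Here is a minimal sketch of what that looks like in Laravel; the covid:import command name is made up purely for illustration and isn't the actual command from my project:

use Illuminate\Console\Scheduling\Schedule;

// app/Console/Kernel.php
protected function schedule(Schedule $schedule)
{
    // Run the import command every hour ("covid:import" is an illustrative name).
    $schedule->command('covid:import')->hourly();
}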
It used to take ~1 minute to:
- make an HTTP request to GitHub
- parse and transform the CSV into a Laravel collection
- check whether the total number of records matches the table count
- insert the records if the counts differ
Although it worked, that meant roughly one minute of downtime on my website each day, so I decided to do something about it.
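For context, here is a rough sketch of that original, fully synchronous flow. It only illustrates the steps above, not the exact code; the importCasesState method and the cases_state table name are assumptions:

use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Http;

// Illustrative sketch of the old synchronous flow (not the exact original code).
private function importCasesState(): void
{
    // 1. Blocking HTTP request to the raw CSV on GitHub.
    $csv = Http::get(self::url['CASES_STATE'])->body();

    // 2. Parse the CSV into a collection of rows keyed by column name,
    //    assuming the CSV headers match the table columns.
    $lines   = collect(explode(PHP_EOL, $csv));
    $header  = str_getcsv($lines->first());
    $records = $lines->splice(1, -1)
        ->map(fn ($line) => array_combine($header, str_getcsv($line)));

    // 3. Only insert when the row count differs from what is already stored.
    if ($records->count() !== DB::table('cases_state')->count()) {
        DB::table('cases_state')->insert($records->toArray());
    }
}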
First Step
Instead of checking against the database every time, it might be better to generate a hash of the CSV and keep it in the cache.
If the hash is the same as last time, we simply skip processing and return early.
use Illuminate\Support\Collection;
use Illuminate\Support\Facades\Cache;

private function getRecord(string $key): array
{
    $content = $this->recordHolder[$key];
    $hash = sha1($content);

    // Compare the CSV hash against the cached one to see whether the data changed.
    $exists = true;

    if (!(Cache::has($key) && Cache::get($key) == $hash)) {
        // New or changed data: remember its hash for the next run.
        Cache::put($key, $hash, now()->addDay());
        $exists = false;
    }

    return [$content, $exists];
}

private function getCasesState(): ?Collection
{
    [$content, $exists] = $this->getRecord('CASES_STATE');

    if ($exists) {
        // Same hash as last time, so skip processing entirely.
        return null;
    }

    //...
}
Sure, this helps cut some computing time, but it only saves me about 10% of the run time! So... why stop here?
Second Step
After taking a bath (my way of debugging), I realised the real reason it's slow might be the HTTP requests themselves.
So I converted them into async promises, resolved by Guzzle when the service class is initialised.
use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

public function __construct()
{
    $client = new Client();

    // Fire all requests concurrently instead of waiting for each one in turn.
    $promises = [];
    foreach (self::url as $key => $url) {
        $promises[$key] = $client->getAsync($url);
    }

    // Wait until every request has settled.
    $responses = Utils::settle($promises)->wait();

    foreach ($responses as $key => $response) {
        // Split the CSV body into lines, dropping the header row and the trailing empty line.
        $this->recordHolder[$key] = collect(explode(PHP_EOL, $response['value']->getBody()))
            ->splice(1, -1);
    }
}
Oh boy, only at this point did I realise how dumb I was for not doing this earlier.
From ~1 minute down to under 10 seconds.
Not too shabby, considering ~70k records get inserted in that short amount of time.
Of course, before inserting, the data is chunked into 500 items per operation for faster inserts.
use Illuminate\Support\Facades\DB;

//...
// Insert in batches of 500 rows so each query stays small and fast.
$chunks = $records->chunk(500);

foreach ($chunks as $chunk) {
    DB::table($tableName)->insert($chunk->toArray());
}
Here is the final result from a test run in my local environment.
Hey, it's me a few months later. It turns out you can do async requests with Laravel's built-in Http::pool() method!
The code will look something like this:
use Illuminate\Http\Client\Pool;
use Illuminate\Http\Client\Response;
use Illuminate\Support\Facades\Http;

collect(Http::pool(function (Pool $pool) {
    // Register one pooled request per URL, keyed the same way as before.
    return collect(self::url)
        ->map(fn ($url, $key) => $pool->as($key)->get($url))
        ->toArray();
}))->each(fn (Response $res, $key) => $this->recordHolder[$key] = collect(explode(PHP_EOL, $res->body()))->splice(1, -1));
Cover Photo by Marc-Olivier Jodoin on Unsplash