At work, we handle data on users that change frequently as they move through production lines, shifts or departments, and we need to have historical data of the employees at all points in time and be used seamlessly, as we review historical data. From this requirement we prepare a database schema and Laravel models to manage this information in a simple way.
We begin this journey by acknowledging that we need to pull the data from a legacy system, written in VisualFoxPro and its data is in a dBase database.
We tried a lot of ways to import data to MariaDB, but what we find was the fastest, is to use a tool to push all the records to a schema in the database that is not used in production. The data is checked during import by a trigger against the data we already have in the production schema, and the "import" table is truncated at the end, as the content of this table is not important nor used.
So, this is the trigger, edited for brevity but all the important mechanics are there.
BEGIN
set @current_id = (SELECT log_id FROM prod.people e WHERE e.id = trim(new.TR101) order by begin_at desc limit 1);
if @current_id is null then
insert into prod.people
(id, name, shift, ..., begin_at)
values
( new.id, new.name, new.shift, ..., CURRENT_TIMESTAMP());
else
select id, name, shift, ...
into @id, @name, @shift, ...
from prod.people p
WHERE p.id = new.id
order by begin_at desc
limit 1;
if
-- no need to check @id as is what let us here
new.name != @name or
new.shift != @shift or
-- ... check the rest of the fields for changes
then
-- seal the last record
update prod.people set end_at = CURRENT_TIMESTAMP() where log_id = @current_id;
insert into prod.people (id, name, shift, ...) values ( new.id, new.name, new.shift, ...);
end if;
end if;
END
- It begins checking if the record exists, if not inserts the record
- If exists, pull the content of the record in variables, and check those variables, against the new record information to find any changes.
- if a change is detected, it seal the "current" record by filling the
end_at
field with thecurrent_timestamp
and creates a new record with thebegin_at
date filled with thecurrent_timestamp
andend_at
as null.
All ~3000 records are inserted in less that a second 🥳 (in the production schema). As the tool generates a single insert into temporal_table (...) values (...), (...), ... (...)
query and all data is send in one shot.
This let us with data in this form:
log_id | id | name | shift | ... | begin_at | end_at |
---|---|---|---|---|---|---|
1 | 1 | user1 | 1 | ... | 2020-01-01 | 2020-02-25 |
2 | 2 | user2 | 1 | ... | 2020-01-01 | null |
3 | 1 | user1 | 2 | ... | 2020-01-01 | null |
Now lets understand what is this information telling us:
-
log_id
is theprimary key
of the record. -
id
is the id of the user this is not unique in this table as we need to have all the history. -
name
,shift
, and the rest are self explanatory, is the information we hold of employees. -
begin_at
is the timestamp at which this record begins to be valid. -
end_at
is the timestamp at which this record is no longer valid.
To be able to query user data for a particular date we need to constraint our search to that particular date, except if we are not carefully, all records with null on any date will be discarded, so our friend coalesce
comes to the rescue and we pass the current date as a fallback, so we end with:
select *
from people
where (
coalesce(end_at, '2021-02-25') >= '2021-02-25'
and "begin_at" <= '2021-02-25'
)
We are testing for a Point in time so all the dates should be the same, if we open the window more that one record may be returned. And undesired/unexpected things may happen.
And was until this point where we switch our SQL hats to Laravel's hats. As we need to be able to use this data from our Laravel applications. To improve in ergonomics we use basic Laravel options to provide us with a simple API to make data retrieval a simple and low psychologically taxing activity.
<?php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Factories\HasFactory;
class Personnel extends Model
{
use HasFactory;
protected $primaryKey = 'log_id';
public $timestamps = false;
protected $hidden = ['begin_at','end_at', 'log_id'];
public function scopeAsOfNow($q)
{
return $q->where(function ($query) {
return $query->whereRaw(
"coalesce(end_at, ?) >= ?", [
now()->format('Y-m-d H:i'),
now()->format('Y-m-d H:i')
]
)->where(
'begin_at',
'<=',
now()->toDateTimeString()
);
});
}
public function scopeAt($q, $date)
{
return $q->where(function ($query) use ($date) {
return $query->whereRaw(
"coalesce(end_at, ?) >= ?",
[$date, $date]
)->where('begin_at', '<=', $date);
});
}
}
We ended with this Model setup (well is a little longer, but the rest is not relevant for this case).
What we did is to create two scopes at
and asOfNow
method. So instead of:
Personnel::where(function ($query) {
return $query->whereRaw(
"coalesce(end_at, ?) >= ?", [
now()->format('Y-m-d H:i'),
now()->format('Y-m-d H:i')
]
)->where(
'begin_at',
'<=',
now()->toDateTimeString()
);
});
we end up using:
Personnel::asOfNow();
// or
Personnel::at($someDate);
And that's it. After some iterations, we end with this working and very ergonomic interface.
If you have any comments, or solve this in a different way, please let me know, as will be an eye opener for all of us.
Top comments (2)
This functionality is actually built directly into MariaDB. Check this out! mariadb.com/kb/en/system-versioned...
Hi, is interesting how after a lot of experimentation and improvement we get to what is a feature of the database. 🥳
The good part is that we now know why. Also this implementation can be used for SQLite or other systems where this is not a database feature, I believe, given the database has the appropriate parts available (triggers and the features in the triggers we are using).
We'll try to use this feature at some point in the feature.