Originally posted on bcd.dev
Outline
- JSON Column Performance Challenges
- Introducing Virtual Columns
- Benchmarking the Results
- How to Optimize JSON Queries with Virtual Columns
- Conclusion and Best Practices
JSON Column Performance Challenges
Working with large datasets stored in JSON columns presents significant performance issues, especially when filtering and sorting. In my experience, these challenges became evident while monitoring PHP processes and managing large volumes of records, leading to execution time limits being hit.
Monitoring and Execution Time Issues
As part of my regular monitoring duties, I encountered max execution times of 30 seconds while querying JSON columns in a 580k record dataset. JSON columns, though flexible, are prone to performance bottlenecks, particularly without proper indexing.
Sorting and Filtering with JSON Columns
The first major issue appeared when working on a Filament list record page, which had default sorting applied to a JSON attribute. The absence of indexing on this attribute resulted in a significant slowdown, especially when processing over 10,000 records. Without an index, querying and sorting through nested JSON attributes can cause execution delays and inefficiencies in retrieving results, pushing PHP processes beyond acceptable limits.
Introducing Virtual Columns
When faced with performance issues from sorting and filtering large JSON columns, I revisited an old solution: virtual columns from my friend Rob Fonseca. Virtual columns in MySQL allow me to create an indexed, computed column from JSON data, making queries more efficient without duplicating data.
Why Virtual Columns Work Better
Unlike standard JSON columns, virtual columns are calculated automatically from existing data but can be indexed, making them faster for querying. This improves sorting and filtering performance significantly, especially in large datasets where execution time is critical.
How to Implement Virtual Columns
I implemented virtual columns by adding a migration that created a new indexed column for filtering and sorting. This virtual column extracted and indexed specific JSON attributes, drastically improving query performance. Here's an example migration:
$table->string('approved_at')
->nullable()
->virtualAs("json_unquote(json_extract(data, '$.latest_approval_date'))");
$table->index('approved_at');
By indexing this virtual column, I was able to reduce query times and improve overall efficiency, especially when filtering and sorting large datasets.
Benchmarking the Results
Once I implemented the virtual columns, I needed to ensure the performance gains were real. Benchmarking provided concrete data, comparing the execution times of filtering, sorting, and paginating large datasets using both the original nested JSON column and the new virtual column with indexing.
Before: Nested JSON Columns
With over 580k records, queries on the nested JSON column were slow:
- Sorting a page of 100 records took over 5,000ms.
- Filtering + sorting + paginating took nearly 2,000ms.
Benchmark::dd([
'count' => fn () => Document::count(),
'paginate' => fn () => Document::paginate(100),
'filter + paginate' => fn () => Document::where('data->latest_approval_date', '>', '2024-09-05')->paginate(100),
'sort + paginate' => fn () => Document::orderBy('data->latest_approval_date')->paginate(100),
'filter + sort + paginate' => fn () => Document::where('data->latest_approval_date', '>', '2024-09-05')->orderBy('data->latest_approval_date')->paginate(100),
], iterations: 100);
After: Virtual Column + Index
After indexing the virtual column, the improvements were substantial:
- Sorting the same page of 100 records dropped to 750ms (7.5x faster).
- Filtering + sorting + paginating improved to just 53ms (36x faster).
These benchmarks confirmed the effectiveness of virtual columns in optimizing query performance.
Benchmark::dd([
'count' => fn () => Document::count(),
'paginate' => fn () => Document::paginate(100),
'filter + paginate' => fn () => Document::where('approved_at', '>', '2024-09-05')->paginate(100),
'sort + paginate' => fn () => Document::orderBy('approved_at')->paginate(100),
'filter + sort + paginate' => fn () => Document::where('approved_at', '>', '2024-09-05')->orderBy('approved_at')->paginate(100),
], iterations: 100);
Steps
1. Add a Virtual Column with Migration
To improve performance, we’ll start by adding a virtual column for the approved_at
field. This column extracts and indexes the JSON attribute for better query performance.
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;
return new class extends Migration {
public function up(): void {
Schema::table('documents', function (Blueprint $table) {
$table->string('approved_at')
->nullable()
->virtualAs("json_unquote(json_extract(data, '$.latest_approval_date'))");
$table->index('approved_at');
});
}
public function down(): void {
Schema::table('documents', function (Blueprint $table) {
$table->dropColumn('approved_at');
});
}
};
2. Create a Trait for Virtual Fields
We’ll create a HasVirtualFields
trait to ensure that virtual fields are not mistakenly saved.
namespace App\Models\Concerns;
trait HasVirtualFields {
public function save(array $options = []) {
if (isset($this->virtualFields)) {
$this->attributes = array_diff_key($this->attributes, array_flip($this->virtualFields));
}
return parent::save($options);
}
}
3. Add the Trait and Virtual Column Property to Your Model
In the model, include the trait and define the virtual fields. This ensures that any virtual columns are properly managed.
use App\Models\Concerns\HasVirtualFields;
class Document extends Model {
use HasVirtualFields;
protected array $virtualFields = [
'approved_at',
];
}
4. Testing Environment
To test the performance improvements, we’ll generate fake data and benchmark the queries before and after using virtual columns. Use the following provisioning script:
$count = 500 * 1000;
for ($i = 0; $i < 250; $i++) {
Document::factory()->count(1000)->create();
}
6. Wrapping Up with Unit Tests
Write tests to verify that the virtual column works as expected. Here’s an example test suite:
namespace Tests\Feature\Models;
use Tests\TestCase;
use App\Models\Document;
class DocumentTest extends TestCase {
public function testApprovedAt() {
$date = fake()->dateTimeBetween()->format(DATE_ATOM);
$document = Document::factory()->create([
'data' => [
'latest_approval_date' => $date,
],
]);
$document->refresh();
$this->assertEquals($date, $document->approved_at);
}
}
This complete solution ensures that your JSON columns can be optimized for performance, particularly for large datasets.
Conclusion and Best Practices
Using virtual columns with indexing can dramatically improve performance when working with large datasets and JSON columns. By transitioning from nested JSON queries to indexed virtual columns, I was able to reduce query times by up to 36x.
Best Practices:
- Use virtual columns to index frequently queried JSON attributes.
- Always benchmark before and after implementing changes to measure real performance improvements.
- Ensure your database structure evolves with your data as it scales, especially with JSON-heavy models.
Originally posted on bcd.dev
Top comments (0)