TL;DR: Practical tips to improve ingest rates in Postgres and get more row inserts per second, including in databases enhanced with TimescaleDB for time-series workloads.
The Challenge with High-Volume Data Ingestion
Time-series data workloads are relentless, and optimizing your ingest rate is a necessity. Since we enhance PostgreSQL to handle time-series data and other demanding workloads, almost every Timescale customer is interested in getting as many row inserts per second as possible into their TimescaleDB service.
How can you ensure you’re getting the best write performance? First, take a look at this blog post we published a while ago: 13 Tips to Improve PostgreSQL Insert Performance. Timescale is built on PostgreSQL, so the general best practices for maximizing your inserts into Postgres will also apply to Timescale.
However, we often find that what's holding our Timescale customers back isn't the database itself but overlooked cloud-native best practices. When dealing with a Timescale service, it's essential to optimize your setup not only at the database level but also at the cloud infrastructure level.
5 Practical Tips to Improve Your PostgreSQL Ingest Rate
1. Set realistic expectations
First, it's good to have a performance benchmark in mind. Based on experience, aim for ingesting 50-100k rows per second per ingest process. If your requirements exceed this, you'll need to implement multiple ingest processes.
2. Ensure sufficient client-side resources
Your client machine's resources significantly impact write performance. If you're setting up multiple ingest processes, make sure you have enough CPU power (see the sketch after the list below).
For example, if you're writing to PostgreSQL from an AWS EC2 instance:
For a single ingest process, a 1-CPU instance might be adequate
For multiple processes, match your CPU count to at least the number of processes (e.g., four processes = four CPUs)
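To make this concrete, here is a minimal sketch (not Timescale's official tooling) of running one ingest process per client CPU. It assumes psycopg2, a placeholder connection string, and a hypothetical table mytable with four columns matching the rows below.

```python
import os
from multiprocessing import Pool

import psycopg2
from psycopg2.extras import execute_values

DSN = "postgresql://user:password@host:5432/tsdb"  # placeholder connection string


def ingest_chunk(rows):
    """Each worker opens its own connection and writes its rows in one batched INSERT (see tip 5)."""
    conn = psycopg2.connect(DSN)
    try:
        with conn, conn.cursor() as cur:
            execute_values(cur, "INSERT INTO mytable VALUES %s", rows)
    finally:
        conn.close()


if __name__ == "__main__":
    n_workers = os.cpu_count() or 1  # one ingest process per client CPU
    # Split the workload into one chunk per worker (synthetic rows for illustration).
    chunks = [
        [("2022-06-01 12:00:00", worker, 1.11, 1.12) for _ in range(10_000)]
        for worker in range(n_workers)
    ]
    with Pool(processes=n_workers) as pool:
        pool.map(ingest_chunk, chunks)
```

Each worker keeps its own connection; sharing a single connection across processes would serialize the writes and defeat the purpose of running them in parallel.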
3. Make sure you have enough database-side resources
Just like on your client side, your Postgres server needs sufficient resources to handle the incoming writes. We recommend having as many CPUs in the Timescale service as in your client.
If you’re not sure that you have enough resources in your Timescale service, you can use forks to make a copy of your database with more CPUs and test whether your ingest performance improves. Our customers find this very useful (and cost-effective).
4. Minimize network latency
Is your data crossing the world? Try to put your client and server closer together.
Network latency is a major factor in insert performance. To minimize it:
Host your client and database in the same region
If using cloud services, keep both in the same availability zone when possible
For example, if your PostgreSQL database is hosted in AWS us-east-1, your client application should also run in us-east-1 to minimize latency. This seemingly simple adjustment can yield substantial performance improvements.
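If you want to check where you stand, a rough latency measurement is easy to take from the client. The sketch below (assuming psycopg2 and a placeholder connection string) times a few trivial round trips; every row-by-row INSERT pays at least this cost.

```python
import time

import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # placeholder DSN

samples = []
with conn.cursor() as cur:
    for _ in range(20):
        start = time.monotonic()
        cur.execute("SELECT 1")  # trivial query, so the timing is dominated by the network round trip
        cur.fetchone()
        samples.append(time.monotonic() - start)
conn.close()

print(f"median round trip: {sorted(samples)[len(samples) // 2] * 1000:.1f} ms")
```

If the median round trip is, say, 5 ms, row-by-row inserts are capped at roughly 200 rows per second per connection, no matter how fast the database itself is.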
5. Batch your inserts (most important!)
This is perhaps the most crucial optimization: insert your data with many rows per INSERT statement. Don't insert row-by-row. Instead, try to write hundreds or thousands of rows in a single INSERT statement:
INSERT INTO mytable VALUES
  ('2022-06-01 12:00:00', 1, 1.11, 1.12),
  ('2022-06-01 13:00:00', 2, 2.21, 2.22);
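The same pattern from a client application can be sketched with psycopg2's execute_values (the hypothetical mytable and a placeholder DSN assumed): it folds many rows into each INSERT statement instead of sending one statement per row.

```python
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # placeholder DSN

rows = [
    ("2022-06-01 12:00:00", 1, 1.11, 1.12),
    ("2022-06-01 13:00:00", 2, 2.21, 2.22),
]

with conn, conn.cursor() as cur:
    # Slow pattern (avoid): one statement and one round trip per row.
    # for row in rows:
    #     cur.execute("INSERT INTO mytable VALUES (%s, %s, %s, %s)", row)

    # Batched pattern: many rows folded into each INSERT statement.
    execute_values(cur, "INSERT INTO mytable VALUES %s", rows, page_size=1000)
conn.close()
```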
For more PostgreSQL insert optimization techniques, check out this comprehensive guide: 13 Tips to Improve PostgreSQL Insert Performance.
You might also be interested in a benchmark comparing different PostgreSQL ingest methods: INSERT vs. Batch INSERT vs. COPY, which shows the dramatic performance differences between these approaches.
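For completeness, here is a minimal sketch of the COPY approach mentioned in that benchmark, again assuming psycopg2, a placeholder DSN, and the hypothetical mytable; COPY is usually the fastest bulk-load path PostgreSQL offers.

```python
import io

import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # placeholder DSN

# Build an in-memory CSV buffer; in practice you would stream from a file or a queue.
buf = io.StringIO()
for i in range(100_000):
    buf.write(f"2022-06-01 12:00:00,{i},1.11,1.12\n")
buf.seek(0)

with conn, conn.cursor() as cur:
    cur.copy_expert("COPY mytable FROM STDIN WITH (FORMAT csv)", buf)
conn.close()
```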
PostgreSQL for Time-Series Data
If you're working specifically with time-series data, TimescaleDB extends PostgreSQL with specialized capabilities while maintaining 100% compatibility with the PostgreSQL ecosystem. It's designed to handle the particular challenges of time-series workloads while preserving all the features you love about PostgreSQL.
Conclusion
Remember that the perfect database configuration depends on your specific use case—there's no one-size-fits-all solution in the database world. These tips provide a starting point, but you'll want to test and adjust based on your particular workload characteristics.
If you still haven’t tried Timescale, you can use it for free for 30 days. It is the most convenient way to use TimescaleDB in production. You’ll get automatic backups and recovery, high availability via replicas in multiple availability zones, flexible resizing with autoscaling, and one-click database forking for testing and creating developer environments. And, of course, the advice of the Timescale Team. 😎