Hoài Nhớ ( Nick )

Posted on Oct 22, 2024

Mastering Databases: From Optimizing Queries to Distributed Systems

#database #sql #nosql #databasedesign

Databases are at the core of modern applications, from e-commerce to social platforms, and large systems process enormous volumes of transactions every day. In this blog, we'll explore key concepts every software engineer should understand, ranging from JOINs, partitioning, sharding, and query optimization to security best practices. This post serves as a practical overview to help you understand, design, and maintain efficient databases.

1. SQL JOINs: Types and When to Use Them

SQL JOINs let you retrieve data from multiple related tables in a relational database. Knowing how the JOIN types differ helps you return exactly the rows you need without extra work.

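| JOIN Type | Description | Use Case |
|---|---|---|
| INNER JOIN | Returns only rows with matching values in both tables | The default (a plain JOIN is an INNER JOIN); use when you need only matched rows |
| LEFT JOIN | Returns all rows from the left table; right-side columns are NULL where there is no match | Use when you need every row from the left table, with or without matches |
| RIGHT JOIN | Returns all rows from the right table; left-side columns are NULL where there is no match | Rarely used; the mirror of LEFT JOIN, usually rewritten as a LEFT JOIN with the tables swapped |
| FULL OUTER JOIN | Returns all rows from both tables, matched where possible and NULL-filled otherwise | Use when you need all data from both sides regardless of matches |
| CROSS JOIN | Returns the Cartesian product of both tables (rows in A × rows in B) | Use with caution: the result grows multiplicatively |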

Performance Tip: Avoid unnecessary JOINs and filter data early to minimize the data processed. Index the columns used in join conditions (typically foreign keys) so the database can look up matching rows instead of scanning whole tables.
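To see the difference between the common JOIN types, here is a minimal sketch using Python's built-in sqlite3 module and a hypothetical customers/orders schema (the table and column names are illustrative, not from any real system). FULL OUTER JOIN and RIGHT JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

# Hypothetical schema: customers and their orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL NOT NULL
    );
    -- Index the join column, as the performance tip suggests.
    CREATE INDEX idx_orders_customer_id ON orders(customer_id);
    INSERT INTO customers (id, name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 99.0);
""")

# INNER JOIN: only customers that have at least one order (one row per order).
inner = conn.execute("""
    SELECT c.name, o.total
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None in Python) when there is no match.
left = conn.execute("""
    SELECT c.name, o.total
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# CROSS JOIN: every customer paired with every order (3 x 3 = 9 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders"
).fetchone()[0]

print(inner)        # [('Alice', 50.0), ('Alice', 20.0), ('Bob', 99.0)]
print(left)         # [('Alice', 50.0), ('Alice', 20.0), ('Bob', 99.0), ('Carol', None)]
print(cross_count)  # 9
```

Carol appears only in the LEFT JOIN result, which is exactly the "with or without matches" case from the table.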

2. Query Optimization Techniques

Optimizing queries ensures faster responses and reduced resource consumption. Below are techniques to keep in mind:

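1. Indexes: Create indexes on columns that frequently appear in WHERE, JOIN, and ORDER BY clauses to speed up lookups. Every index also slows down writes and takes storage, so index selectively.

2. Use LIMIT & OFFSET: Limit the rows fetched so you don't retrieve unnecessary data. For deep pages, large OFFSET values get slow because the database still reads and discards the skipped rows; keyset pagination (WHERE id > last_seen_id ORDER BY id LIMIT n) avoids this.

3. Avoid SELECT *: Select only the columns you need to reduce I/O and network overhead.

4. Partitioning: Divide large tables into smaller, more manageable parts.

5. Analyze Query Execution Plans: Use EXPLAIN to see how the database executes a query, for example whether it scans the whole table or uses an index. In PostgreSQL, EXPLAIN ANALYZE also runs the query and reports actual timings.

6. Caching: Use Redis or Memcached to cache frequently read data, and expire or invalidate entries when the underlying rows change.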
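Several of these techniques can be observed directly with SQLite, whose EXPLAIN QUERY PLAN plays the role of EXPLAIN. The sketch below assumes a hypothetical users table with 10,000 rows. It compares the plan for an email lookup before and after adding an index, and shows keyset pagination as an alternative to a large OFFSET. Plan text differs slightly between SQLite versions, so the comments describe it rather than quote it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO users (email, country) VALUES (?, ?)",
    [(f"user{i}@example.com", "VN" if i % 2 else "US") for i in range(10_000)],
)

def plan(sql: str, params: tuple = ()) -> list:
    """Return the 'detail' column (index 3) of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Select only the needed columns instead of SELECT *.
lookup = "SELECT id, email FROM users WHERE email = ?"

before = plan(lookup, ("user42@example.com",))  # expect a full table SCAN
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(lookup, ("user42@example.com",))   # expect a SEARCH using idx_users_email

# Keyset pagination: continue after the last id seen instead of using a large OFFSET,
# so the database seeks straight to the next page via the primary key.
def next_page(last_id: int, page_size: int = 100) -> list:
    return conn.execute(
        "SELECT id, email FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

first_page = next_page(0)
second_page = next_page(first_page[-1][0])

print(before)
print(after)
print(second_page[0])
```

The same workflow applies elsewhere: read the plan, add or adjust an index, and read the plan again to confirm the scan turned into an index lookup.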

3. Handling Transactions: Error Handling & Security Risks

Transactions ensure data consistency by grouping multiple operations into a single atomic unit: either all of them take effect or none do. However, they also need careful error handling.

• Rollback on Error: In SQL Server (T-SQL), wrap the transaction in a TRY...CATCH block and roll back if any statement fails:

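```sql
-- SQL Server (T-SQL). XACT_ABORT ON makes most runtime errors abort the whole transaction.
SET XACT_ABORT ON;
BEGIN TRY
  BEGIN TRANSACTION;
  INSERT INTO Orders VALUES (1, 'Product A', 100);
  INSERT INTO Payments VALUES (1, 'Card', 100);
  COMMIT TRANSACTION;
END TRY
BEGIN CATCH
  -- Roll back only if a transaction is still open.
  IF @@TRANCOUNT > 0
    ROLLBACK TRANSACTION;
  PRINT ERROR_MESSAGE();
  THROW;  -- re-raise so the caller knows the operation failed
END CATCH;
```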

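• Deadlocks: Occur when two or more transactions each hold a lock that another one needs, so none of them can proceed. Most databases detect this automatically and abort one transaction as the "victim" (error 1205 in SQL Server, SQLSTATE 40P01 in PostgreSQL), so catch that error and retry the transaction. Keeping transactions short, accessing tables in a consistent order, and setting lock timeouts reduce how often deadlocks happen.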

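• SQL Injection Attacks: Always use parameterized queries, which send user input to the database as data rather than as part of the SQL text, so the input cannot change the query's logic.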
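Both the rollback and the injection points can be tried locally with Python's built-in sqlite3 module. The sketch below uses a hypothetical accounts table. The first part mirrors the T-SQL rollback pattern (sqlite3's connection context manager commits on success and rolls back when the block raises), and the second shows why string-built SQL is injectable while a ? placeholder is not. SQLite only stands in for your real database here; placeholder syntax varies by driver (psycopg, for example, uses %s).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()  # persist the seed data before the demo transaction

# 1) Rollback on error: `with conn:` commits if the block succeeds and rolls back
#    if it raises, so either both statements apply or neither does.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE username = ?", ("alice",))
        # Fails: 'bob' already exists (PRIMARY KEY violation), so the UPDATE is undone too.
        conn.execute("INSERT INTO accounts VALUES (?, ?)", ("bob", 30))
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

alice_balance = conn.execute(
    "SELECT balance FROM accounts WHERE username = ?", ("alice",)
).fetchone()[0]

# 2) SQL injection: the quotes in this input rewrite the WHERE clause
#    when it is pasted into the SQL string.
malicious = "nobody' OR '1'='1"

# UNSAFE: string formatting turns the input into SQL, so every row matches.
unsafe_rows = conn.execute(
    f"SELECT username FROM accounts WHERE username = '{malicious}'"
).fetchall()

# SAFE: the ? placeholder binds the input as a plain value, so it matches nothing.
safe_rows = conn.execute(
    "SELECT username FROM accounts WHERE username = ?", (malicious,)
).fetchall()

print(alice_balance, len(unsafe_rows), len(safe_rows))  # 100 2 0
```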

4. Partitioning vs. Sharding: Scaling Databases

When scaling databases, engineers often need to decide between partitioning and sharding. Both techniques divide data, but they differ in scope and purpose.

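| Aspect | Partitioning | Sharding |
|---|---|---|
| Scope | Splits a table within one database | Spreads data across multiple databases or servers |
| Goal | Faster queries (the database skips irrelevant partitions) and easier maintenance | Horizontal scalability of storage and write load |
| Types | Range, List, Hash | Range-based, hash-based, directory-based (lookup table) |
| Example | Splitting sales by month | Each shard contains the users of one region |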

Use Cases:

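• Partitioning: Ideal for a single large table that still fits on one server but whose queries usually touch only a subset of the rows (for example, one month of sales).

• Sharding: Use when the data volume or write load exceeds what a single server can handle, so the dataset must be distributed across multiple servers.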
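Routing is the core of any sharding scheme: given a key, the application must know which server holds it. Below is a minimal sketch of two common strategies, assuming four hypothetical shard names and a hypothetical region-to-shard map; a real deployment would map each name to a connection.

```python
import hashlib

# Hypothetical shard names; in a real deployment each would map to a separate server.
SHARDS = ["users_shard_0", "users_shard_1", "users_shard_2", "users_shard_3"]

def shard_for(user_id: str) -> str:
    """Hash-based sharding: route a key to a shard by hashing it."""
    # Use a stable hash: Python's built-in hash() on str is randomized per process,
    # so it would send the same user to different shards after a restart.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Directory-based sharding, matching the "users by region" example:
# an explicit lookup table decides where each region's data lives.
REGION_TO_SHARD = {"asia": "users_asia", "europe": "users_eu", "americas": "users_us"}

def shard_for_region(region: str) -> str:
    return REGION_TO_SHARD[region]

print(shard_for("user-123"), shard_for_region("asia"))
```

One design caveat: with plain modulo hashing, changing the number of shards remaps most keys, forcing a large data move. Consistent hashing or a directory-based scheme keeps rebalancing smaller.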

5. Security Considerations: Preventing Token Abuse in Databases

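Security is paramount in any system. A critical issue arises when an attacker obtains both a user's access token and refresh token. To limit the damage:

1. Rotate Tokens: Keep access tokens short-lived and issue a new refresh token every time one is used, invalidating the old one. If an already-used refresh token is presented again, treat it as a sign of theft and revoke that session's tokens.

2. Blacklist Compromised Tokens: Maintain a blacklist of compromised tokens (or token IDs) and check it on every request, since stateless access tokens such as JWTs otherwise remain valid until they expire.

3. Store Refresh Tokens Securely: On the server, store refresh tokens hashed or encrypted; on the client, keep them in HttpOnly, Secure cookies so JavaScript cannot read them.

4. Detect Suspicious Activity: Monitor token usage and trigger alerts on anomalies, such as the same token being used from different IP addresses or locations within a short time.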
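Rotation, blacklisting, and hashed storage fit together in a small amount of code. The sketch below is an in-memory illustration under stated assumptions: the RefreshTokenStore class, its issue/rotate methods, and the idea of a token "family" (all tokens descended from one login) are hypothetical names chosen here, and a real system would keep this state in a database or Redis.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


def _digest(token: str) -> str:
    # Store only a hash of each refresh token, so a leaked table can't be replayed.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class RefreshTokenStore:
    """Hypothetical in-memory store: refresh token rotation with reuse detection."""
    active: Dict[str, str] = field(default_factory=dict)  # token hash -> family id
    used: Dict[str, str] = field(default_factory=dict)    # rotated token hash -> family id
    revoked_families: Set[str] = field(default_factory=set)  # acts as the blacklist

    def issue(self, family: Optional[str] = None) -> str:
        # A "family" groups every token descended from one login session.
        family = family or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self.active[_digest(token)] = family
        return token

    def rotate(self, token: str) -> str:
        h = _digest(token)
        if h in self.used:
            # An already-rotated token was presented again: assume it was stolen
            # and blacklist the whole family, including the legitimate latest token.
            self.revoked_families.add(self.used[h])
            raise PermissionError("refresh token reuse detected; session revoked")
        family = self.active.pop(h, None)
        if family is None or family in self.revoked_families:
            raise PermissionError("unknown or revoked refresh token")
        self.used[h] = family
        return self.issue(family)  # single use: the old token is no longer active


store = RefreshTokenStore()
t1 = store.issue()
t2 = store.rotate(t1)   # normal refresh: t1 is consumed, t2 is issued
try:
    store.rotate(t1)    # replay of t1, e.g. by an attacker who copied it
except PermissionError as exc:
    print(exc)
```

After the replay, even the legitimate t2 is rejected, which forces the real user to log in again and cuts off the attacker at the same time.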

6. Migration from MongoDB to PostgreSQL: Best Practices

Migrating 100 million records from MongoDB to PostgreSQL requires a well-planned strategy. Follow these steps:

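1. Schema Design: Convert MongoDB's document-based structure to a relational schema. Nested arrays usually become child tables with foreign keys, while loosely structured fields can be kept in a JSONB column.

2. ETL Process: Extract documents from MongoDB, transform them to match the new schema (types, dates, IDs), and load them into PostgreSQL in batches.

3. Batch Migration: Migrate data in smaller parts to reduce pressure on both systems. Page through the source by a stable key such as _id so an interrupted run can resume after the last completed batch.

4. Verification: Compare data between MongoDB and PostgreSQL after each batch (row counts at minimum, ideally checksums or sampled records) to ensure consistency.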
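The four steps can be sketched end to end without external services. In the sketch below, a Python list of dicts stands in for the MongoDB collection and an in-memory SQLite database stands in for PostgreSQL; the document shape, table names, and batch size are hypothetical. With pymongo, the read_batches helper would correspond to repeatedly calling find({"_id": {"$gt": last_id}}).sort("_id").limit(batch_size).

```python
import sqlite3
from typing import Iterator, List

# Stand-in for a MongoDB collection (hypothetical document shape).
source_docs = [
    {"_id": i, "name": f"customer {i}", "orders": [{"sku": "A", "qty": i % 3 + 1}]}
    for i in range(1, 1001)
]

# Stand-in for PostgreSQL. Schema design: the nested "orders" array becomes a child table.
target = sqlite3.connect(":memory:")
target.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_orders (
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        sku TEXT NOT NULL,
        qty INTEGER NOT NULL
    );
""")


def read_batches(docs: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Extract: yield documents ordered by _id, resuming after each batch's last _id."""
    ordered = sorted(docs, key=lambda d: d["_id"])
    last_id = None
    while True:
        batch = [d for d in ordered if last_id is None or d["_id"] > last_id][:batch_size]
        if not batch:
            return
        yield batch
        last_id = batch[-1]["_id"]


def count_in(table: str, column: str, ids: List[int]) -> int:
    # Table/column names are fixed constants in this script, never user input.
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} IN ({placeholders})"
    return target.execute(sql, ids).fetchone()[0]


def migrate(batch_size: int = 250) -> int:
    migrated = 0
    for batch in read_batches(source_docs, batch_size):
        # Transform: flatten each document into parent and child rows.
        customers = [(d["_id"], d["name"]) for d in batch]
        orders = [(d["_id"], o["sku"], o["qty"]) for d in batch for o in d.get("orders", [])]
        with target:  # Load: each batch commits (or rolls back) as one transaction
            target.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", customers)
            target.executemany(
                "INSERT INTO customer_orders (customer_id, sku, qty) VALUES (?, ?, ?)", orders
            )
        # Verify: target counts must match what this batch should have produced.
        ids = [d["_id"] for d in batch]
        if (count_in("customers", "id", ids) != len(customers)
                or count_in("customer_orders", "customer_id", ids) != len(orders)):
            raise RuntimeError(f"verification failed for batch ending at _id={ids[-1]}")
        migrated += len(batch)
    return migrated


total = migrate()
print(total)  # 1000
```

Count checks are the cheapest verification; for a production migration you would likely add per-batch checksums or compare a random sample of records field by field.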

7. Handling Data Loss and Recovery

Preventing and recovering from data loss requires multiple layers of backup and replication strategies:

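1. Backups: Back up data regularly, store copies in multiple locations (for example, another region), and periodically test that they can actually be restored.

2. Replication: Use database replication to maintain copies of data across servers. Replication protects against server failure, but it also replicates accidental deletes, so it complements backups rather than replacing them.

3. Snapshots: Take snapshots of the database at critical points, such as before a schema migration or bulk update.

4. Logging: Maintain transaction logs (the write-ahead log in PostgreSQL, the binary log in MySQL) so changes made after the last backup can be replayed to recover the latest data state.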
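As a small illustration of the snapshot idea, Python's sqlite3 module exposes SQLite's online backup API through Connection.backup. The sketch below assumes a hypothetical orders table and uses in-memory databases for brevity; in practice the snapshot would be written to a file on separate storage.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
live.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (20.0,)])
live.commit()

# Snapshot at a critical point, e.g. right before a risky bulk update.
snapshot = sqlite3.connect(":memory:")  # in practice: a file on separate storage
live.backup(snapshot)

# An accidental destructive change. A replica would faithfully copy this too.
live.execute("DELETE FROM orders")
live.commit()

# Recover by copying the snapshot back over the live database.
snapshot.backup(live)
restored = live.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(restored)  # 2
```

Restoring a snapshot discards everything written after it was taken, which is why transaction logs matter: they let you replay the changes that came after the snapshot.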

Conclusion

Managing databases effectively requires a solid understanding of query optimization, transactions, partitioning vs. sharding, and distributed systems. Whether you're building scalable applications, migrating data between systems, or preventing security breaches, mastering these concepts will set you apart as a seasoned software engineer.

The knowledge covered in this blog lays a solid foundation for managing modern databases efficiently, helping you handle data at scale, ensure security, and optimize performance across distributed systems.

Top comments (8)

DotNet Full Stack Dev • Oct 23 '24

@hoainhoblogdev Great article! Have a look at these optimizations; exploring them may add a few more techniques!
9 Costly SQL Operations - Optimized with easy solutions

Hoài Nhớ ( Nick ) • Oct 23 '24

Thanks a lot! I took a look at the article you mentioned, and it's useful. I hope you'll write more articles like this, but more advanced 👍🏼

DotNet Full Stack Dev • Oct 23 '24

Sure, Will do!

Martin Baun • Oct 25 '24

I think the biggest pet peeve for me is CTEs with window functions over large datasets. Or just lazy window functions in general. I see them everywhere and the majority of the time they're overkill.
Very valuable post!

Hoài Nhớ ( Nick ) • Oct 25 '24

Exactly, I hope that this post will help you optimize your database @martinbaun

Hoài Nhớ ( Nick ) • Oct 22 '24

After studying this article, you should have insight into databases and be able to answer these questions:

Questions:

DELETE vs TRUNCATE
Stored Procedure vs Trigger vs Function. Types of Triggers
Sync up SQL vs NoSQL
Aggregate vs full text search in NoSQL
LIKE vs full-text search in SQL
When to use SQL vs NoSQL
What are indices? Why do we need to index columns? What should you watch out for when indexing columns?
How to take full advantage of the CPU in Node.js
Isolation levels vs. locks in SQL
Handling errors in transactions / returning errors from stored procedures
Stuck transactions

Bùi Thắng • Oct 25 '24

Thank you bro! Great article!

Hoài Nhớ ( Nick ) • Oct 25 '24

Glad to hear that! I hope this article will be useful for you at work and in interviews as well.