Adam Furmanek for Metis

Posted on Feb 26, 2024 • Originally published at metisdata.io on Feb 29, 2024

Hey Managers! Don’t Let Developers Break Your Production! Let Metis Help You

#database #observability #management #sql

As a manager, you deal with many things. You manage people: you make sure they are happy, help them develop their skills, and keep them from leaving the company prematurely. You plan roadmaps and execution so that your team meets its deadlines. You decide which features to build and plan releases to please your customers the most. All of that can go to waste if your production system fails, and one of the easiest ways for it to fail is a broken database. Pretty often the cause is developers not testing their solutions well enough, or choosing speed over quality where they shouldn’t. But what if you could make sure that ongoing changes won’t break the database and won’t derail your planning? Read on.

But What Happens Exactly?

Many things may break when working with databases. Let’s quickly go through the most common ones and understand why they still happen today.

Inherently Slow Queries

Some queries that developers write (either directly or indirectly via libraries they use to talk to the database) can be inherently slow. For instance, they can join too many tables, fail to use an index, or rely on constructs that some engines optimize poorly (like Common Table Expressions). However, these queries run fast enough in developer environments because local databases tend to be small. Queries executed against them finish quickly and don’t reveal any issues.

The problems begin when such faulty code gets deployed to production. The queries are the same, but they run much slower because the databases are much bigger. This leads to performance issues that quickly leak to other parts of the ecosystem, including web servers or reporting solutions.

The problem is with how developers test their solutions. They typically focus on correctness only and disregard potential performance issues. They check whether queries read and write the correct data but not how the queries do that. They can run load tests at the end of the CI/CD pipeline, but load tests are expensive and happen very late in the process, which makes deploying changes take even longer. In short, developers need to test their code better.
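To make the difference between checking what a query returns and checking how it runs more concrete, here is a minimal sketch. It uses Python’s built-in sqlite3 module as a stand-in for a real database, and the orders table, the index name, and the helper functions are all hypothetical. It is not how Metis works internally; it only illustrates that a correctness check passes either way, while a look at the query plan reveals the full table scan.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable steps of SQLite's plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); the detail text is what we inspect.
    return [row[3] for row in rows]


def uses_full_scan(conn, sql, params=()):
    """True if any plan step reads a whole table instead of searching an index."""
    return any(step.startswith("SCAN") for step in plan_details(conn, sql, params))


# Hypothetical schema and data, small enough for a developer machine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

query = "SELECT id, total FROM orders WHERE customer_id = ?"

# Correctness check: it passes with or without an index.
rows = conn.execute(query, (7,)).fetchall()
assert len(rows) == 20

# Plan check: without an index the engine has to read every row.
scan_before = uses_full_scan(conn, query, (7,))

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
scan_after = uses_full_scan(conn, query, (7,))

print("full scan before index:", scan_before)
print("full scan after index:", scan_after)
```

A check like this is cheap enough to run next to ordinary unit tests, long before any load test at the end of the pipeline.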

Schema Changes

Scripts for schema changes modify tables, columns, indexes, and database configurations. Each script is typically executed once per database, so there is no need to run it many times. Developers typically test their scripts with local databases and make sure they work properly.

The problem begins when such a script leads to table rewrites. Sometimes it’s impossible to modify the database schema in place. In that case, the database engine needs to copy the data on the side, modify it there, and then bring it back. This is not a problem in local environments, but it is a big issue in production because it effectively takes the database offline for some time. Sometimes it’s seconds, sometimes it can be hours.
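The copy-on-the-side pattern can be sketched as follows. This is an illustration only: it uses Python’s built-in sqlite3 module, a hypothetical users table, and a made-up row count. Other engines perform rewrites internally and differ in which changes trigger them, but the shape is the same: an in-place change touches only the schema, while a rewrite has to copy every row.

```python
import sqlite3

ROW_COUNT = 10_000  # hypothetical size; production tables are often far larger

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, age TEXT)")
conn.executemany(
    "INSERT INTO users (age) VALUES (?)",
    [(str(20 + i % 50),) for i in range(ROW_COUNT)],
)

# In-place change: SQLite only updates the stored schema, existing rows are untouched.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")

# Changing a column type is not something SQLite's ALTER TABLE can do,
# so the table is rebuilt: create a copy on the side, move every row, then swap.
conn.execute(
    "CREATE TABLE users_new (id INTEGER PRIMARY KEY, age INTEGER, nickname TEXT)"
)
copied = conn.execute(
    "INSERT INTO users_new (id, age, nickname) "
    "SELECT id, CAST(age AS INTEGER), nickname FROM users"
).rowcount
conn.execute("DROP TABLE users")
conn.execute("ALTER TABLE users_new RENAME TO users")
conn.commit()

# The work done by the rewrite grows with the number of rows in the table.
print("rows copied during the rewrite:", copied)
```

On a laptop with a few thousand rows the copy is instant, which is exactly why the problem goes unnoticed until the same script meets a production-sized table.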

Unfortunately, developers don’t verify whether scripts will run promptly. All the automated tests run against the new database schema (after the changes), and there is no easy way to verify that the migration itself executes fast. Again, this should be a part of the testing procedure, but very often it’s skipped.

Execution Changes Over Time

Last but not least, databases change over time. They store more and more data, configurations change, and new extensions are installed. Queries that used to be fast can become much slower because of outdated statistics, redundant indexes, or invalid configurations.
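Outdated statistics are easy to demonstrate on a small scale. The sketch below uses Python’s built-in sqlite3 module with a hypothetical events table and index. It assumes SQLite’s behavior of storing planner statistics in the sqlite_stat1 table and refreshing them only when ANALYZE runs. Other engines refresh statistics differently (some do it automatically), so treat this as an illustration of the idea rather than a description of any particular production setup.

```python
import sqlite3


def recorded_row_estimate(conn, index_name):
    """Read the row count the planner has on record for an index."""
    stat = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()[0]
    # The first integer in the stat string is the row count seen by ANALYZE.
    return int(stat.split()[0])


def load_events(conn, start, stop):
    """Insert hypothetical event rows with ids in the range [start, stop)."""
    conn.executemany(
        "INSERT INTO events (kind) VALUES (?)",
        [(f"kind-{i % 10}",) for i in range(start, stop)],
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE INDEX idx_events_kind ON events (kind)")

# Statistics are collected while the table is still small.
load_events(conn, 0, 100)
conn.execute("ANALYZE")
initial_estimate = recorded_row_estimate(conn, "idx_events_kind")

# The table grows, but nobody refreshes the statistics.
load_events(conn, 100, 10_000)
stale_estimate = recorded_row_estimate(conn, "idx_events_kind")
actual_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# Refreshing the statistics brings the planner's view back in line with reality.
conn.execute("ANALYZE")
fresh_estimate = recorded_row_estimate(conn, "idx_events_kind")

print("recorded before growth:", initial_estimate)
print("recorded after growth, not refreshed:", stale_estimate)
print("actual rows:", actual_rows)
print("recorded after refresh:", fresh_estimate)
```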

Developers can neither predict nor monitor these issues from their local environments. Unfortunately, issues like these are currently triaged by the operations teams and database administrators, and then fixed by developers. This leads to slow and inefficient maintenance.

However, this can be improved by using tools that can build proper understanding and fix issues automatically. Read on to understand how you can benefit from them.

What’s In It For Me?

The issues that we covered above can lead to many problems in your organization. However, Metis lets you avoid these issues or minimize their impact. See what you can get from using Metis.

Prevention and Fast-Moving Forward

Currently, developers can verify their changes only manually or with expensive and slow load tests. However, Metis can indicate issues immediately in their local environments. Metis can find slow queries, analyze schema migrations, or suggest necessary configuration changes.

This way developers don’t need to go back to the drawing board at the end of the CI/CD pipeline. They can immediately see whether things are going to work or not. They don’t need to run slow load tests or analyze migrations manually to see the issues. Metis brings the missing piece to your CI/CD pipelines, which is database reviews, so you can rest assured that database issues are handled as part of your testing procedures.

This can help you as a manager with optimizing your team’s processes. Developers will simply work on their technical tasks faster and you can deliver more in a shorter time.

Recommended reading: What Is Database Monitoring & Why You Need It

Reliable Roadmap Planning

Database issues can be detrimental to your project planning. Any issue that pops up in production may require the whole team to help with investigation and fixes. Developers may need to work overtime and overnight to get the issues fixed as soon as possible. While this shows great ownership, it’s also detrimental to their work-life balance, happiness, and day-to-day activities.

However, Metis can prevent these issues from happening. Metis can identify problems that would appear in production before any code changes are deployed. This way there is no need to fix these issues in production because they never get there. Thanks to that, you can plan your roadmaps reliably. You can rely on the process going as usual with no critical issues turning your team’s work upside down.

Lower Maintenance Cost

When things break in production, many teams are involved because it’s unclear how to fix the issues and where they come from. Finding the right owner takes time and requires lots of communication. However, it can be automated.

Metis can analyze everything that happens in your databases and can give you the full picture, including deployments, schema migrations, code changes, configurations, and everything that can affect the database performance. Metis can pinpoint issues and automatically fix them or notify people who can act the fastest. This way communication is minimized and the time for fixing the issue is shortened.

How Does Metis Work?

Metis can analyze your code in three areas. Let’s see them one by one.

Prevention

Metis can analyze queries executed against the database and provide insights and instructions on how to improve the performance.

Metis extracts execution plans and analyzes them to understand if there are any performance issues. It can easily show issues with slow queries, missing indexes, wrong configuration, outdated statistics, and much more.
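As a rough idea of what analyzing an execution plan can look like, here is a minimal sketch. It assumes a plan in the shape of PostgreSQL’s EXPLAIN (FORMAT JSON) output; the sample plan is hand-written for illustration, and the threshold and function name are hypothetical. It is not Metis’s actual analysis, only one simple rule: flag sequential scans that are expected to read many rows.

```python
import json

# Hand-written sample in the shape of PostgreSQL's EXPLAIN (FORMAT JSON) output.
# The tables and row counts are made up for illustration.
SAMPLE_PLAN = json.loads("""
[{"Plan": {
    "Node Type": "Hash Join", "Plan Rows": 5000,
    "Plans": [
      {"Node Type": "Seq Scan", "Relation Name": "orders", "Plan Rows": 2000000},
      {"Node Type": "Hash", "Plan Rows": 50,
       "Plans": [
         {"Node Type": "Index Scan", "Relation Name": "customers", "Plan Rows": 50}
       ]}
    ]}}]
""")

ROW_THRESHOLD = 100_000  # hypothetical cut-off for "too many rows to scan"


def find_large_seq_scans(node, threshold=ROW_THRESHOLD):
    """Walk the plan tree and collect sequential scans over large tables."""
    findings = []
    if node.get("Node Type") == "Seq Scan" and node.get("Plan Rows", 0) >= threshold:
        findings.append((node.get("Relation Name"), node["Plan Rows"]))
    # Child nodes, if any, live under the "Plans" key.
    for child in node.get("Plans", []):
        findings.extend(find_large_seq_scans(child, threshold))
    return findings


findings = find_large_seq_scans(SAMPLE_PLAN[0]["Plan"])
for table, estimated_rows in findings:
    print(f"sequential scan on {table}: about {estimated_rows} rows, consider an index")
```

A real analysis would look at many more signals (actual timings, configuration, statistics), but even a single rule like this one catches the missing-index case described earlier.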

Metis can also analyze schema migrations and indicate issues that may happen when going to production.

This way developers can make sure their code will not break databases. They can do it early and in their developer environments.

Monitoring

Metis can monitor your databases and show issues. It won’t swamp you with raw metrics but will give you database-oriented dashboards like this one.

You can see the transactions (1), rows (2), temporary files (3), and cache hits (4). You can also examine table sizes (5), schema insights (6), indexes (7), queries (8) and extensions (9). Metis shows things important from the database administrator’s perspective. Metis can also analyze live queries, indexes, and configurations to provide live insights into how things perform.

With Metis, your team can see performance issues and quickly answer where things are slow.

Troubleshooting

Metis can aggregate data from various parts of the software development lifecycle. Metis can use deployment history, code changes, database metrics, and running configuration to reason about what happened and how to fix issues.

Having that, Metis can send notifications and alerts to teams that can fix issues. Metis can also automatically suggest solutions and be a part of your day-to-day process.
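To illustrate the kind of reasoning that combining deployment history with database metrics makes possible, here is a deliberately simple sketch. All names, timestamps, and numbers are hypothetical, and the rule (blame the most recent deployment before the first latency regression) is an assumption chosen for illustration, not a description of how Metis reasons.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Deployment:
    name: str
    deployed_at: datetime


@dataclass
class LatencySample:
    measured_at: datetime
    p95_ms: float


def first_regression(samples, baseline_ms, factor=2.0):
    """Return the earliest sample whose latency exceeds baseline * factor."""
    for sample in sorted(samples, key=lambda s: s.measured_at):
        if sample.p95_ms > baseline_ms * factor:
            return sample
    return None


def likely_cause(deployments, regression):
    """Return the most recent deployment at or before the regression, if any."""
    earlier = [d for d in deployments if d.deployed_at <= regression.measured_at]
    return max(earlier, key=lambda d: d.deployed_at, default=None)


# Hypothetical history for a single day.
deployments = [
    Deployment("add-orders-index", datetime(2024, 2, 26, 9, 0)),
    Deployment("new-report-query", datetime(2024, 2, 26, 13, 0)),
    Deployment("config-tweak", datetime(2024, 2, 26, 17, 0)),
]
samples = [
    LatencySample(datetime(2024, 2, 26, 10, 0), 40.0),
    LatencySample(datetime(2024, 2, 26, 12, 0), 42.0),
    LatencySample(datetime(2024, 2, 26, 14, 0), 180.0),
    LatencySample(datetime(2024, 2, 26, 16, 0), 175.0),
]

regression = first_regression(samples, baseline_ms=40.0)
suspect = likely_cause(deployments, regression) if regression else None
if suspect:
    print(f"latency regressed at {regression.measured_at}, suspect: {suspect.name}")
```

Once a suspect deployment is known, the alert can go straight to the team that owns it instead of to everyone.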

I’m Sold! What Do I Do?

Now it’s the right time to use Metis. See our live demo, check out our documentation, or watch videos presenting the platform.

Changing your organization and your teams takes time. It takes three different things to successfully maintain databases: tooling, processes, and mindset. You can speak with your platform engineers about adopting the tools and changing the processes, and then start moving the responsibility and ownership to your development teams.

Summary

Databases can break for various reasons. Unfortunately, very often this is due to how developers work. For you as a manager, it’s important to make sure that your teams work fast, that they don’t get unexpected and unplanned work, and that you don’t need to involve more people than necessary. Metis gives you all of that thanks to its prevention, monitoring, and troubleshooting mechanisms. You can help your developers become true owners and never break your production databases again.
