5 Trends In Big Data And SQL To Be Excited About In 2020

SeattleDataGuy ・ 8 min read

SQL is one of the most in-demand technical skills in the workplace today. Developed back in the 1970s, it is still the way we interface with most of our data systems. Regardless of what drag-and-drop tools come around or what new query paradigms try to overtake it, SQL has remained.

Many of the modern database technologies we will talk about in this article have had to adapt to SQL, rather than SQL having to adapt to them.

However, this isn't to say the SQL landscape hasn't changed a lot in the past few decades, or that it has stopped evolving. This is one reason SQL has stuck around for so long. It grows and evolves with the times.

This article will summarize some of the major trends currently occurring in the SQL and data analytics world that are impacting teams across the world.

We will discuss how SQL is becoming more collaborative and open, and how the majority of databases our world runs on are open source or switching to open source. We will also bring up a few technologies you may not have heard of but should watch out for.

With so much going on in the technical world, this will help provide a clear picture of some of the more important changes in the SQL and data world.

SQL Is Not Just For Data Engineers And Analysts In Data-Driven Companies

If you have ever worked at a FAANG company or even a technology-driven start-up like Instacart, then you have probably realized that data drives everything.

It has reached the point that analysts and product managers are learning SQL out of necessity. SQL is the language of data, and if you want to interact with data, you need to know it.

Do you want to easily figure out the average amount of time a user spends on your product, but don't want to wait for an analyst? You better figure out how to run a query.
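To make that concrete, here is a minimal sketch of such a query, run from Python against an in-memory SQLite database so it is self-contained. The `sessions` table, its columns, and the sample rows are assumptions made up for illustration; your own warehouse will have a different schema and likely a different SQL dialect.

```python
import sqlite3

# Hypothetical schema: one row per user session, with its length in seconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, duration_seconds REAL)")
conn.executemany(
    "INSERT INTO sessions (user_id, duration_seconds) VALUES (?, ?)",
    [(1, 120.0), (1, 240.0), (2, 60.0), (3, 300.0)],
)

# Average length of a single session, across all sessions.
(avg_session,) = conn.execute(
    "SELECT AVG(duration_seconds) FROM sessions"
).fetchone()

# Average total time per user: sum per user first, then average those sums.
(avg_per_user,) = conn.execute(
    """
    SELECT AVG(total_seconds)
    FROM (
        SELECT user_id, SUM(duration_seconds) AS total_seconds
        FROM sessions
        GROUP BY user_id
    )
    """
).fetchone()

print(f"Average session: {avg_session:.1f}s")
print(f"Average time per user: {avg_per_user:.1f}s")
```

Note that "average time a user spends" is ambiguous: the two queries answer different questions, which is exactly the kind of detail worth agreeing on with your analysts.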

This ability to run queries easily is also driven by the fact that SQL editors no longer need to be installed. With cloud-based data warehouses come SaaS SQL editors. We will talk about a SaaS SQL editor more in the next section.

The important point here is that you don't have to spend 30 minutes installing an editor and then deal with all the hassle of managing it.

Now you can just go to a URL and access your team's data warehouse. This has allowed anyone in the company easy access to their data.

We know this both from anecdotal experience and from indeed.com's job tracking, which as of 2019 showed steady demand for SQL skill sets over the previous five years.

Overall, we foresee a future where not just big tech companies are using SQL and analytics to drive good decisions. For that, we will need tools that make it easier for anyone to access their company's data.

SQL and Analytics Are Becoming More Collaborative 


SQL and analytics are becoming more collaborative. As discussed earlier, getting insights from data is becoming more widespread. That means more people are getting involved in creating queries, analytics, and metrics.

Collaborative work started with products like Google Sheets. The trend has continued to expand into SaaS products like Figma, which offers collaborative design, and PopSQL, which offers collaborative SQL.

Technologies like PopSQL offer the ability for your team to collaborate and track your work on queries easily through folders and version control.

Now you don't have to worry about someone accidentally changing your query on a report or dashboard. Version control allows you to revert a query to a previously saved state. This keeps your team on the same page about the SQL and the logic you are using to calculate your metrics.

You also can easily share queries, update them, fork them, and visualize data.

Also, tools like Figma, Google Sheets, and PopSQL integrate easily with other collaborative tools like Slack. These integrations further allow your team to share charts, queries, designs, and insights with ease.

Your team can easily see what work everyone else is doing, the changes they are making, and why they are making those changes.

With the concept of remote work becoming more and more of a reality for many companies, having tools that make it easy to collaborate will be important.

In the end, technologies like PopSQL are a great step in the direction of self-service analytics because they put the power of querying data into more hands than just those of analysts and data engineers.

Open Source Remains The Most Popular Database

Paid, licensed database management systems like Oracle and MSSQL might seem like very popular options for teams to develop upon. However, MySQL and Postgres, two open-source database management systems, are currently the most popular options for developers to use.

According to surveys conducted in 2018 and 2020 by EverSQL, MySQL continues to be the most popular database management system to develop on. Also, Postgres has recently surpassed MSSQL as the second most popular database according to Stack Overflow.

Postgres is what is called an object-relational database management system (ORDBMS). It borrows ideas from object-oriented programming, such as inheritance, where one table can inherit the columns of another. Some other nifty features of Postgres are that it supports array columns and has some pub/sub abilities.

This shift to open source isn't new. However, many companies are starting to drop the Oracles and Microsofts for the free option. They are opting to pay cloud costs rather than licensing costs.

Overall, we are seeing a lot of shifts in which databases developers are picking.

Cloud-First Open Source Databases Are Gaining Traction And Funding


Although Postgres has often been a common choice for companies switching from Oracle to an open-source solution, Postgres was not developed with cloud infrastructure and its complexities in mind.

This forces teams to develop complex cloud infrastructure to manage applications that are being used all across the world.

But there are other open-source solutions. In July 2019, YugabyteDB went 100% open source. Now many of you are probably asking (especially if you are in the US), what is YugabyteDB?

YugabyteDB is built on its own document-oriented storage format, a heavily customized form of RocksDB, which provides low-latency access and a high density of data. On top of that storage layer, it exposes popular, well-known APIs.

Yugabyte aims to fill in all the gaps. If you want a NoSQL database that is also ACID compliant, then Yugabyte is looking to take over that market.

It is looking to solve the problems that developers have when deploying SQL databases like MySQL that require sharding and complex infrastructure to run multi-region systems.

YugabyteDB does so through automatic sharding and load balancing, as well as several other features that take advantage of its cloud-first approach.
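For readers new to the idea, the sketch below shows hash-based sharding in general terms: each row key is hashed, and the hash decides which shard holds the row, so data spreads out without anyone assigning rows by hand. This is a toy illustration only and not YugabyteDB's actual implementation; the `shard_for` function, the shard count, and the key format are all hypothetical.

```python
import hashlib
from collections import Counter

def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard number using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_shards

NUM_SHARDS = 4  # assumed cluster size for this example
keys = [f"user-{i}" for i in range(1000)]

# Count how many keys land on each shard to see how evenly the load spreads.
placement = Counter(shard_for(key, NUM_SHARDS) for key in keys)

for shard in sorted(placement):
    print(f"shard {shard}: {placement[shard]} keys")
```

A simple modulo scheme like this forces most keys to move whenever the shard count changes, which is one reason production systems use more elaborate placement and rebalancing schemes.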

So why is Yugabyte in this update if it has been around since 2016?

This is because Wipro Ventures invested $30 million in Yugabyte earlier this month. Also, Wipro plans to take Yugabyte's open-source SQL database to its clients. That means there are 1,000 possible new companies that could be using Yugabyte.

Yugabyte is not in EverSQL's survey results yet, but in a couple of years, with more traction and users, it may be. Now, truth be told, a database that solves all the problems of both NoSQL and standard relational databases would be a miracle. So in many ways, we are surprised there isn't larger adoption.

We will be curious to see over the next few years whether Yugabyte delivers on that promise, or whether it disappears like so many other miracle technologies that were supposed to solve all of your organization's problems.

Distributed Databases For Data Warehousing Are The Norm Now

In the tech world, there are two main uses of databases: applications and analytics.

These two major use cases benefit from different database systems as well as different database designs.

In particular, analytical databases that run millions of calculations for thousands of analysts, data scientists, and data engineers at a single company often benefit from having some form of distributed or parallel component. Think of Redshift and how it relies on MPP (massively parallel processing).
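The core idea behind MPP can be sketched in a few lines: split the data into partitions, let each worker compute a partial aggregate over its own partition, then combine the partial results. The example below is a simplified illustration under stated assumptions, not how Redshift is implemented; threads stand in for separate compute nodes, and `partial_sum_count` and `parallel_average` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_count(partition):
    """Work done by one node: aggregate only its own slice of the data."""
    return sum(partition), len(partition)

def parallel_average(values, num_workers=4):
    """Average a non-empty list by combining per-partition partial aggregates."""
    # Distribute rows round-robin across workers.
    partitions = [values[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    # Final combine step: merge the (sum, count) pairs from every worker.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

order_totals = list(range(1, 101))  # stand-in for a large fact table column
result = parallel_average(order_totals)
print(result)
```

Each worker returns a sum and a count rather than its own average because averages of unevenly sized partitions cannot simply be averaged again.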

But there are a lot of new SQL and NoSQL technologies coming into this space.

For example, Starburst, a spin-off of Facebook's open-source Presto project, received an additional $42 million of funding. Starburst's goal is to create an enterprise version of Presto, since Presto by itself does not have access management, connectors to enterprise systems like Teradata, Snowflake, and DB2, or a management console where users can, for example, configure the cluster to auto-scale.

This makes utilizing Presto on its own difficult if not impossible for most companies. This is a shame because Presto allows you to easily run queries across databases without loading the data into a data warehouse.
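To give a feel for what querying across databases looks like, here is a small analogy using SQLite's `ATTACH DATABASE`, which lets a single connection join tables that live in two separate databases. This is not Presto and does not use Presto's API; in Presto, tables are addressed through catalogs in a similar dotted style. The `users` and `invoices` tables and their contents are made up for the example.

```python
import sqlite3

# Two separate in-memory databases reachable from one connection:
# "main" plays the application database, "billing" a second system.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (user_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO main.users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5)],
)

# One query joins across both databases; no data is copied between them first.
rows = conn.execute(
    """
    SELECT u.name, SUM(i.amount) AS total
    FROM main.users AS u
    JOIN billing.invoices AS i ON i.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
    """
).fetchall()

for name, total in rows:
    print(name, total)
```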

So the recent funding in Starburst is great to see. We look forward to seeing where this technology will go and hope that more companies can take advantage of Presto without the hassle of having to manage all of Presto's complexities.

While we are talking about distributed systems like Presto, another interesting development this month was the release of Spark 3.0. With this new version of Spark come many enhancements. Many of these enhancements are geared toward making Spark SQL more ANSI SQL compliant.

This is an important note. One pattern that seems to remain true is that you can't get rid of SQL. Many tools and technologies have tried to develop their own query languages. However, at the end of the day, SQL remains, unlike many programming languages that have died out.

SQL remains the language of data.

How Is Your Team Taking Advantage Of Your Data?

Databases and SQL are not going anywhere. If anything, they are becoming more ubiquitous. Tools like Starburst and PopSQL show the importance of having your engineers, analysts, and even your non-technical employees well versed in data.

These technologies, or similar ones, will almost certainly be used heavily by small and large companies alike to help improve their decision making.

We love seeing tools like PopSQL and Starburst. These tools are opening up the world of data by making SQL more collaborative and by simplifying the deployment of powerful technologies like Presto. This helps elevate companies' abilities to perform data analytics, make better decisions, and develop better data processes.

With that, we will wrap up this bi-weekly update of what is going on in the data and technology world. We aim to continue to provide future updates on up-and-coming technologies, VC investments, etc. So stay tuned!

Posted by: SeattleDataGuy (@seattledataguy)

Software Engineer | Consultant | Data Scientist
