With very large and complex datasets, Excel solution is not very convenient for an organization with that size of datasets. Excel files support a maximum size of: 1,048,576 rows by 16,384 columns: https://support.microsoft.com/en-us/office/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
and in practice the limit is such smaller than that because many personal computers can't handle that amount of data along with Excel's user interface at the same time.
Furthermore, while having a single very large table may not be so bad, the situation becomes more complicated when we have to look up the data in that table in another very large table, which in turn references another table, etc. After a bunch of VLOOKUP
s across several large tables, not only our Excel will crash, but we will also risk losing our data.
On top of that, collaboration is difficult in Excel, how would we allow our colleagues to work together on the same spreadsheet? One option could be to use Excel Online or Google Spreadsheets, but although these tools are excellent for collaboration on small datasets, they are not optimized for tasks like the one we are describing.
Add the need to input data from online forms and use it in online dashboards that can be accessed by many people at once, we need a database engine instead. This is a piece of software which is purpose-built to do all the things just mentioned in a very efficient and secure way.
The database is the classic location where modern organizations have chosen to store their data for professional use. An informal way to understand database is as "spreadsheets of spreadsheets", that is, spreadsheets that link other (potentially many) spreadsheets. Their advantage over Excel is that databases have very strict rules on what is allowed and what not, thus preventing chaos and data loss, and have extremely good performance. Some popular database engines are:
1) Microsoft SQL Server
2) Oracle Database
3) MySQL (open-source)
4) PostgreSQL (open-source)
5) SQLite (open-source)
We might have noticed that the letters "SQL" appear in several of the names of these products. That is because just as in Excel we have formulas, we can have formulas in databases, and the language we use to write them is called SQL (Structured Query Language). SQL syntax and functions are standardized across database engines for the most part.
SQLite stores tables by compressing them into a single file of extension .db
. Primary and foreign keys are very important concepts.
Primary keys:
1) Uniquely identify a record in the table. Their name usually includes the word "ID". For example, CustomerID
is the primary key of the Customer table, AgentID
is the primary key of the Agent table, and CallID
is the primary key of the call table
2) Do not accept null values (they shouldn't, because the are being used to identify the record)
3) Are limited to one per table.
On the other hand, foreign keys:
1) Are a field in the table that is the primary key in another table
2) Can accept null values
3) Are not limited in any way per table. For example, the a Call table has 2 foreign keys: AgentID
and CustomerID
pointing to the Agent and Customer tables, respectively.
Top comments (0)