Barret Blake

Posted on Feb 1, 2023 • Originally published at barretblake.dev on Nov 22, 2022

An Overview of SQL Data Types

#development #datatypes #database

I have been asked to do more posts on databases, so I’m returning to the world of relational databases this week. I posted an introduction to databases a few months back that you can take a look at for a 10,000-foot view. The next logical step seems to be to take a look at the types of data available across the SQL standard and the more common relational database systems out there.

Note: This is intended as a general overview. While I do have some experience with other database systems, the vast majority of my background is with SQL Server. Hopefully, I didn’t get any of my facts wrong in the comparisons, but if I did let me know and I’ll correct it.

There are several general categories that the data types can placed in to: Numerical, String, Date/Time, Binary, and Other.

Numerical

The numerical data types comprise, funnily enough, number data. This generally falls into two main groups: integer types and decimal point types. Integer data types hold only whole numbers and decimal point types hold anything with a decimal place.

Integer Types

It should be noted that in addition to these, MySQL also supports unsigned versions of each of these, which basically changes the range to be 0 to twice the listed top end. Also, Oracle only has one integer type, NUMBER, with a size value after it.

Decimal Point Types

Most of these systems break these down into two subcategories of types: Fixed point (also called Exact Value) types and Floating point (also called Approximate Value) types. Without delving into the semantic differences between the two which would probably require an advanced math degree to truly understand (at least in my brain), I’ll group them together in one lump for this introductory overview.

Also note that in my discussion, I’m using the US-culture definition of decimal points and commas. I realize that how numbers are formatted depends on where in the world you are and in many cultures, the comma and the decimal are swapped from what I’m used to.

For the most part, the range of numbers each type can hold depends on how you define it. This is because when you define the field, you tell it what level of precision (total number of digits) and scale (the number of digits after the decimal place) you want to be applied. So instead this will be an overview of the various types available and the name each system gives it.

If you want to delve into the nuances of each type and when to use it, I’ll leave that as an exercise for your own brain.

String

Most of the systems support CHAR, VARCHAR, and TEXT.

CHAR is used for fixed-length string storage. It generally doesn’t see much use these days as most people tend to stick with VARCHAR unless you know the field’s value is always going to be of a certain length. VARCHAR is used for variable-length string storage, with only the max size being defined. TEXT, while not deprecated, generally doesn’t see as much use either as again, people tend to use VARCHAR(max) instead of it. It used to be reserved for extremely large fields of text, but in most systems, VARCHAR can support the same data sizes and is often more performant, so TEXT has largely gone by the wayside.

The basic three aside, let’s focus on the aberrations and oddities of each of the systems. Most of these oddities arose in the early days of computing when hard drive storage space was at a premium and extremely expensive. As a result, vendors tried all kinds of things to improve the efficiency of character data storage, which was by far the most costly in terms of space and performance requirements.

SQL Server

The main thing to remember with SQL Server is that in older versions of the systems (pre-SQL Server 2019), CHAR and VARCHAR only supported 8-bit character sets (essentially the ASCII character set depending on how it was set up) and did not support the full Unicode character set. If you wished to use Unicode characters in SQL Server pre-2019, then you had to use the NCHAR, NVARCHAR, and NTEXT variations, which supported the full 16-bit character sets. Starting with 2019, however, this was remedied and full UTF-8 character sets are supported in CHAR/VARCHAR if you set up your database to do so. This is one of the biggest gotchas when working with SQL Server and has come back to burn many developers as any system they are developing expands from US-centric to world-centric in nature. The simple solution is to move to SQL Server 2019 or later or to Azure SQL, which have full Unicode support out of the box.

Oracle

Oracle has a couple of oddities. First, it doesn’t have a TEXT type. Instead, Oracle calls it LONG for some reason. But it’s essentially the same. Also, Oracle has a variation of VARCHAR called VARCHAR2. It’s essentially the same as VARCHAR but supports a couple of additional Oracle-specific standards beyond the standard ANSI definition for VARCHAR. Most Oracle developers just use VARCHAR2.

Oracle provides a couple of other types for large-character data storage: CLOB/NCLOB. These types were essentially designed to be TEXT (LONG), but have twice the storage capacity (4 GB vs 2 GB of data).

MySQL

MySQL provides a couple of additional string types: TINYTEXT, MEDIUMTEXT, and LONGTEXT. These essentially worked the same as TEXT, but have different maximum storage lengths.

Postgres

Postgres really doesn’t have any oddities with string It’s… odd.

Dates and Times

Dates and times are another one where there’s the ANSI-defined standard for SQL, then there’s what the various vendors have put into place.

SQL Server

It should be noted that SQL Server does have a data type called TIMESTAMP. However, this is a read-only field that stores a date-time value of when that record was created or last updated. It is not updatable and is in no way related to the TIMESTAMP data type that other systems (and the ANSI standard) support.

Oracle

Oracle also adds two additional data types called INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND which can be used to store an amount of time between two TIMESTAMP values.

MySQL

Postgres

In Postgres, TIMESTAMP and TIME support are stored with or without time zone information. Postgres also offers the INTERVAL data type to store the difference between two TIMESTAMP values.

Binary

While it’s generally a bad idea to store binary data in a relational database, data types are provided by each system to allow it. SQL Server provides BINARY, VARBINARY, and IMAGE. Oracle provides BINARY, VARBINARY, BLOB, RAW, and LONG RAW. MySQL provides BINARY, VARBINARY, and BLOB. And Postgres offers the BYTEA data type.

These types basically all work the same way. They are designed to store byte data from a file or stream input. The biggest difference is the supported sizes in each type and how it handles the data storage.

Miscellaneous

There are a lot of miscellaneous data types in each system to handle types of data that don’t readily fall into one of the other categories, or that could be put into another data type but the vendor has provided something more specific to that type. For instance: spatial geometry data such as latitude, longitude, and altitude or 3D coordinates; JSON and XML data; UUID/GUID values; enumerations; and so forth.

Conclusion

The main takeaway is this. Every one of the major relational databases supports the core set of data types and what you learn on one system can easily be transferred to one of the others with only minor tweaks. Learn the core principles and you can take that with you to pretty much any job where you need to work with a relational database system.

The post An Overview of SQL Data Types first appeared on Barret Codes.

DEV Community

An Overview of SQL Data Types

Numerical

Integer Types

Decimal Point Types

String

SQL Server

Oracle

MySQL

Postgres

Dates and Times

SQL Server

Oracle

MySQL

Postgres

Binary

Miscellaneous

Conclusion

Top comments (0)

Read next

Constant-Time Operations: The Art of Keeping Secrets... Secret! , Go Crypto 9

Free Google's Generative AI API Key | Gemini API

Beyond LIKE: Advanced Text Search and Keyword Matching in Postgres using Full Text Search

A Brief History of Video Conferencing: From the Beginning to Full Commercial Use