How to learn about database systems
Ozan Onay Feb 9 '17
This post was initially published on the Bradfield School of Computer Science blog
Unfortunately, most self-taught software engineers have a poor grasp of the fundamentals of database systems. While more and more engineers know SQL or can use PostgreSQL, few understand the underlying systems well enough to avoid outages, debug complex queries, or choose and correctly configure the right database management system for the job.
It is absolutely possible to learn these fundamentals outside of a university environment, but there are two barriers: (i) prerequisite knowledge and (ii) a lack of good resources. I hope that this short article helps you overcome both.
It is difficult to understand database systems without knowing:
- C and C++
- Undergraduate level computer architecture
- Undergraduate level operating systems
For C and computer architecture, I have written short guides, How to learn C and Learn how computers work. I have yet to give operating systems a similar treatment; until then, I suggest Operating Systems: Three Easy Pieces as an easily accessible, free resource.
There is also value in being familiar with compilers, although this is a weaker prerequisite.
C and C++ are important because most open-source database systems use one or both extensively, so being unable to read the source code can be a barrier. Also, as you start to write your own database systems, you will find yourself reaching for similarly low-level languages.
Computer architecture and operating systems are important because you will need a solid understanding of concepts such as the memory hierarchy, the I/O model, memory virtualization, and processes and threads.
All of these areas are important to learn about anyway, so don’t fret about prioritizing them over databases.
Unfortunately, there are basically no good, modern textbooks covering database systems. This may be because those who are most qualified to write textbooks are busy starting database companies or holding lucrative roles in industry. For our students, we suggest Database Management Systems by Ramakrishnan and Gehrke as the least worst option.
Fortunately, Joe Hellerstein has made available a great set of video lectures from Berkeley's CS186 course. If you are teaching yourself this material without much prior exposure at the undergraduate level, I would strongly suggest these videos as a starting point, using Wikipedia or a textbook only to fill in gaps.
Once you understand the basics, the next excellent resource is the Red Book, a collection of papers compiled by Peter Bailis, Joe Hellerstein (again), and Michael Stonebraker. These papers are seminal works in the field. They are not always as easily digestible as a resource targeted at undergraduates, but your patience will be rewarded with a deeper level of insight.
There are, of course, more resources, but between watching Joe Hellerstein's lectures and studying the Red Book readings, you should find yourself on track to be among the top few percentiles of database users.
We periodically run an in-person databases course in San Francisco for those who prefer a classroom environment over self-study.