Databases are complex systems with many moving parts that work together to store and retrieve data. There are many database systems, each solving a unique set of problems within its space. Postgres is a popular open source relational database; for our purposes, it is a general-purpose database that attempts to solve many different problems for the user.
The best way to understand such complex systems is to break them down into simple, easily understandable parts. Database systems are composed of the following sub-systems (not an exhaustive list):
- Query Planner
- Optimizer
- Executor
- Tokenizer/Parser
- Storage
- Transaction Management
- Write-Ahead Logging (WAL), etc.
I've recently begun going through the internals of Postgres, and I have written an extension, PG_WAL_EXT, that reads a WAL (Write-Ahead Log) file and generates back the raw SQL statements for Insert/Update/Delete commands.
For projects I'm working on, I usually create a simple and easy way to get them set up and running on my machine -- with the help of Google, of course :) . This makes it possible to tear the setup down and build it up again as fast as possible when I need to test something or when I mess up.
I am going to share my setup for Postgres. The Dockerfile looks like this:
```dockerfile
# I have an AMD CPU. Use the right image for your case
FROM amd64/ubuntu:latest

# Prepare the directories and files required
ENV APP_HOME=/postgres/ ID=991 USR=postgres USR_HOME=/home/postgres PG_FILES=/usr/local/pgsql/ BASH_PROFILE=/etc/bash.bashrc

# Permissions
RUN groupadd -g ${ID} ${USR} && \
    useradd -r -u ${ID} -g ${USR} ${USR}

# If you have the Postgres source code in the current directory, this should be
# enough to copy it into the image... You can also download it from the repo directly
# with `RUN git clone https://github.com/misachi/postgres.git`
ADD . ${APP_HOME}
WORKDIR ${APP_HOME}

RUN chown -R ${USR}:${USR} ${APP_HOME} && \
    mkdir -p ${USR_HOME} && \
    chown -R ${USR}:${USR} ${USR_HOME}

# Requirements installations
RUN apt-get update && apt-get install -y g++ gdb \
    zlib1g-dev \
    make curl \
    tar gzip \
    git nano \
    libreadline-dev \
    flex bison libicu-dev

# Build and install Postgres.
# We make a debug build since we want to develop with Postgres.
# If a debug build is not required, remove CFLAGS="-O2 -ggdb3" and --enable-debug.
# I had issues installing the icu library and making it work with PG, which is
# why I pass the --without-icu flag. It can be removed if required.
RUN CFLAGS="-O2 -ggdb3" ./configure --without-icu --enable-debug && \
    make && \
    make install

RUN echo "export PATH=$PATH:/usr/local/pgsql/bin/" >> ${BASH_PROFILE} && \
    chown -R ${USR}:${USR} ${PG_FILES}

USER ${USR}

# Post-Installation
RUN ${PG_FILES}/bin/pg_ctl -D /usr/local/pgsql/data initdb

# pg_ctl is not on PATH in the exec-form CMD, so use the full path.
# Note: "start" must be a separate argument from "-l" and "logfile".
CMD [ "/usr/local/pgsql/bin/pg_ctl", "-D", "/usr/local/pgsql/data", "-l", "logfile", "start" ]
```
The file is commented for clarity.
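One small addition that pairs well with `ADD . ${APP_HOME}`: a `.dockerignore` file keeps build artifacts and VCS data out of the build context, which speeds up rebuilds. A minimal sketch (the entries here are assumptions; adjust them for your source tree):

```
# .dockerignore -- exclude files that the image build does not need
.git
*.o
*.log
```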
Building the image is as simple as:
```shell
docker build -t <name>:<version> .
```
Once the image build is complete, you can create and run the container with:
```shell
docker run -d \
    --cap-add=SYS_PTRACE \
    --init --ulimit core=-1 \
    --mount type=bind,source=/tmp/,target=/tmp/ \
    --name postgres-test -p 5432:5432 <image_name> \
    /bin/bash -c '/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l logfile start && tail -f /dev/null'
```
Here we are mounting the host's `/tmp` directory into the container. This is where our core files will be dumped in case of a crash or any other issue. We also lift the size limit on core dump files with `--ulimit core=-1`, which sets it to unlimited. Finally, `--cap-add=SYS_PTRACE` lets us trace processes within the container.
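To confirm the limit actually applies, you can check it from a shell inside the running container (the container name `postgres-test` comes from the command above):

```shell
# Open a shell in the running container first:
#   docker exec -it postgres-test /bin/bash
# Then check the core file size limit; with --ulimit core=-1 it reports:
ulimit -c    # unlimited
```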
By default, core dump files are written to `/var/lib/apport/coredump/` on most Ubuntu systems; it might be different on your machine. The core pattern is a kernel-wide setting shared with containers, so you can redirect dumps to the `/tmp` directory (which we mounted into the container) by running the following command on the host:

```shell
echo '/tmp/core.%e.%t' | sudo tee /proc/sys/kernel/core_pattern
```
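You can verify that the new pattern took effect by reading the setting back (no root needed for reading):

```shell
# Should print the pattern we just wrote, e.g. /tmp/core.%e.%t
cat /proc/sys/kernel/core_pattern
```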
This also tells the kernel to name core dump files according to the `core.%e.%t` pattern. The format specifiers, from the core(5) manual, are as follows:
- `%%`  a single `%` character
- `%p`  PID of the dumped process
- `%i`  TID of the thread that triggered the core dump
- `%c`  core file size soft resource limit of the crashing process (since Linux 2.6.24)
- `%u`  numeric real UID of the dumped process
- `%g`  numeric real GID of the dumped process
- `%s`  number of the signal causing the dump
- `%t`  time of dump, expressed as seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC)
- `%h`  hostname (same as the nodename returned by uname(2))
- `%e`  executable filename (without path prefix)
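As an illustration of how the pattern expands (the values below are made up): with `core_pattern` set to `/tmp/core.%e.%t`, a crash of the `postgres` executable at Unix time 1700000000 would produce a file name like:

```shell
# %e -> executable name, %t -> epoch seconds of the dump (hypothetical values)
printf '/tmp/core.%s.%s\n' postgres 1700000000
# prints: /tmp/core.postgres.1700000000
```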
With this setup, you should be able to run a debugger inside the container, attach it to a Postgres process, and step through the code, e.g. to follow a function call or to help figure out an issue from a core dump file. I'll write about using the debugger with Postgres in another post.