DEV Community

Magichu Njoroge
Magichu Njoroge

Posted on • Edited on

A Data Engineer's First Day on a Linux Server: SSH, PostgreSQL & Everything In Between

From Zero to Data Engineering: The First Mile on a Thousand-Mile Road That Refuses to Be Straight.

With just basic and level-one intermediate tech skills and from a non-tech background, when I got this assignment, I actually thought it would be as easy as class examples, but it was far from that. I encountered a series of errors, mistakes, and confusion, and I'm sure that's how the guy in class did it moments, each showing something new. I'm writing exactly how it all took place.

The Environment

  • Server OS: Ubuntu 24.04.4 LTS (DigitalOcean cloud host)
  • Database: PostgreSQL 16.14
  • Local Machine: Windows (Powershell)
  • Dataset: kenya_counties.csv - 1,000 records generated via Mockaroo
  • Tools: SSH, SCP, psql, bash

Section 1: Getting into the server

Establishing a remote connection to the cloud host using SSH was the first step.

ssh root@159.65.222.96
Enter fullscreen mode Exit fullscreen mode

Easy enough. Using root credentials, I connected to port 22, the default SSH port.

What is SSH? The Secure Shell(SSH) protocol is a method for securely sending commands to a computer over an unsecured network. Used for controlling servers remotely, for managing infrastructure, and transferring files.CLICK HERE To read more on

Screenshot: ssh root.png—successful login showing Ubuntu 24.04.4 banner

Section 2: Creating a User - Things start being real; my first error.

The instructions given clearly stated the format for creating a Linux user account was first name + surname initials. My name is Magichu Njoroge. So the intended username was MagichuN

I ran.

sudo adduser MagichuN
Enter fullscreen mode Exit fullscreen mode

Then came the first wall:
Please enter a username matching the regular expression...

Error when creating user with capital letters

Root Cause: From my research, since this was a first, Linux enforces username rules. Usernames must all be lowercase. Capital letters are rejected at the system security layer to prevent directory pathing conflicts.

Fix: I simply changed to lowercase:

sudo adduser magichun
Enter fullscreen mode Exit fullscreen mode

Successfully creating user magichun

Worked like a charm. Got a home directory set up under /home/magichunthe user context registered.

With the user now recognized by the system, I promoted the user to the sudo group so it could run administrative commands:

sudo usermod -aG sudo magichun
Enter fullscreen mode Exit fullscreen mode

What is a sudo Group? The sudo group is the de facto standard for granting elevated privileges in modern Linux distributions. Its purpose is tightly linked to the sudo command, which allows users to execute commands with the permissions of another user (typically the root user, the superuser with unrestricted access). Read more: CLICK HERE To read more on sudo and admin groups in Linux

Adding user magichun to the sudo group with usermod command

Then counterchecked the user existed and had correct group membership:

id magichun
cat /etc/passwd | grep magichun
Enter fullscreen mode Exit fullscreen mode

Verifying user magichu group membership

Finally, I switched from root into my new user to confirm everything was working:

su magichun
Enter fullscreen mode Exit fullscreen mode

Switching active terminal session from root to magichun

whoami
Enter fullscreen mode Exit fullscreen mode

Confirming active identity is magichun

pwd
Enter fullscreen mode Exit fullscreen mode

Confirm working directory

Why not just hang onto root? Not ideal. Security best practice leans on minimal access rights. Root wields total control, one slip, whole machine at risk. Working as a named user with sudo access is the professional standard.CLICK HERE To read more

Section 3: PostgreSQL - The Role Wall

Back as root, I checked what lived on the server already:

psql --version
Enter fullscreen mode Exit fullscreen mode

PostgreSQL 16.14 version confirmed on the server

sudo systemctl status postgresql
Enter fullscreen mode Exit fullscreen mode

PostgreSQL service active and running

PostgreSQL 16.14 was already installed and running in an active state. No installation needed.

Now it was time to set up the database. Had to switch to the postgres admin shell.

sudo -i -u postgres
psql
Enter fullscreen mode Exit fullscreen mode

PostgreSQL 16.14 version confirmed on the server

After that, I attempted to log back as my new user magichun:

psql -U magichun -d magichun
Enter fullscreen mode Exit fullscreen mode

Then trouble showed up for the second time:
psql:error:connection to server on socket...failed: FATAL: role"magichun" does not exist

FATAL error - role magichun does not exist in PostgreSQL

It took me a minute to figure out what was going on; I had to seek help from a friend. This is what I understood from what he told me (I'm open to much simpler explanations): creating a Linux user does not automatically create a matching PostgreSQL role. One lives in the machine. The other only knows what you tell it. A user that exists in Linux is invisible to PostgreSQL until you explicitly register it there.
And that's exactly what I did

CREATE ROLE magichun WITH LOGIN PASSWORD 'StrongPassword123';
ALTER ROLE magichun CREATEDB;
CREATE DATABASE magichun OWNER magichun;
Enter fullscreen mode Exit fullscreen mode

Creating PostgreSQL role and database for user magichun

After that horrifying experience, I logged back as magichun and created the staging schema:

\c magichun
CREATE SCHEMA staging;
Enter fullscreen mode Exit fullscreen mode

Confirmed with:

\dn
Enter fullscreen mode Exit fullscreen mode

This experience made me realize I didn't really understand the database operations and needed to revisit and do much practice. All the corrections I was led on to were what to do, and I would be lying if I said I corrected them all by myself.

Section 4: Generating and Moving Real Data

Rather than inserting manual dummy records, I used Mockaroo to generate a realistic dataset.

Mockaroo: A free test data generator and API mocking tool. It lets you create custom CSV, JSON, SQL, and Excel datasets, perfect for testing database pipelines.

What is a data pipeline? A data pipeline is a set of tools and processes for collecting, processing, and delivering data from one or more sources to a destination where it can be analyzed and used. CLICK HERE To read more on data pipelines.

My dataset: kenya_counties.csv - 1,000 records with these columns:

Column Type
id Integer (Primary Key)
first_name VARCHAR
last_name VARCHAR
email VARCHAR
gender VARCHAR
ip_address VARCHAR
county_name VARCHAR
population Integer
area_sq_km Decimal
population_density Decimal
average_age Decimal

The SCP Error (Running It From the Wrong Place)

What is SCP?: The SCP (Secure Copy Protocol) command is used to securely transfer files and directories between systems over a network. It works over SSH, which means all transferred data is encrypted. It is commonly used by system administrators to move backup files, fetch logs, copy configuration files, and transfer data between servers safely and quickly. CLICK HERE To read more on SCP

scp "D:\Datasets\kenya_counties.csv" magichun@159.65.222.96:/home/magichun/
Enter fullscreen mode Exit fullscreen mode

Error:
ssh: could not resolve hostname d: Temporary failure in name resolution

SCP error when run from inside the SSH session

Root Cause: I used the command on the remote server's terminal; it had no context. D:\ means - that's a Windows drive path on my local machine. For SCP to work, it needs to run from the machine that holds the source file.

Fix: Ran Windows PowerShell locally.

scp "D:\Datasets\kenya_counties.csv" magichun@159.65.222.96:/home/magichun/
Enter fullscreen mode Exit fullscreen mode

Successfully transferring kenya_counties.csv from local machine to server

Then verified on the server:

ls -lh /home/magichun/
head -5 kenya_counties.csv
Enter fullscreen mode Exit fullscreen mode

Confirming kenya_counties.csv arrived on the server with ls -lh

Previewing first 5 rows of kenya_counties.csv with head command

File existed: 89K, headers confirmed.

Creating the Table

\c magichun 
SET search_path TO staging;

CREATE TABLE staging.kenya _counties (
    id INT,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR (150),
    gender VARCHAR (20),
    ip_address VARCHAR (20),
    county_name VARCHAR (100),
    population INT,
    area_sq_km DECIMAL(10,2),
    population_density DECIMAL (10,2),
    average_age DECIMAL (5,2)
);
Enter fullscreen mode Exit fullscreen mode

Creating kenya_counties table inside the staging schema

The COPY Permission Wall

COPY staging.kenya_counties
FROM '/home/magichun/kenya_counties.csv'
DELIMITER ',' CSV HEADER;
Enter fullscreen mode Exit fullscreen mode

Error:
ERROR: could not open file "/home/magichun/kenya_counties.csv" for reading: Permission denied

COPY command permission denied error when reading from home directory

Another error that had me in a chokehold: I didn't understand what was going on until I realized it was an issue with permissions.

Linux Permissions Linux file permissions form the foundation of the system's security model. They define who can read, write, or execute files and directories. CLICK HERE To read more on Linux Permissions

Fix - Two steps:

chmod 755 /home/magichun
chmod 644 /home/magichun/kenya_counties.csv 
Enter fullscreen mode Exit fullscreen mode

Then used the client-side \copywrapper instead, which streams the file through the active user session rather than the server process:

\copy staging.kenya_counties
FROM '/home/magichun/kenya_counties.csv'
DELIMITER ',' CSV HEADER;
Enter fullscreen mode Exit fullscreen mode

Result: copy 1000

Setting directory and file permissions with chmod 755 and 644

Verifying the Data

SELECT county_name, population_density
FROM staging.kenya_counties
ORDER BY population_density DESC
LIMIT 5;
Enter fullscreen mode Exit fullscreen mode

Mombasa came out as the county with the highest density.

Query output showing ingested kenya counties data

COPY vs \copy: Felt the need to touch on this as it confused me at first and it might help someone else confused about it.

1. COPY (SQL Command)
-
Runs on the PostgreSQL server side.
-File path is relative to the server, not your local machine
-Requires the PostgreSQL server process to have read/write
permissions to that file path.

COPY my_table FROM '/var/lib/postgresql/data/input.csv' CSV HEADER;


sql
-This will fail if the file is on your local machine but not on the server.

2. \copy (psql Meta-command)
-
Runs on the client side** (inside the psql terminal)
-Reads/writes files from your local machine where psql is running
-Uses the COPY command internally, but streams data between the client and server.

\copy my_table FROM 'C:/Users/you/data.csv' CSV HEADER;

-Works even if the PostgreSQL server is remote, because the file is used locally

Rule of thumb
-If your file is on the server, use COPY.
-If your file is on your local machine, use \copy in psql.

Section 5: The Reverse Pipeline - Exporting Back to windows.

The final step completed the loop, moving the file back to my local machine

\copy (SELECT county_name, population_density 
       FROM staging.kenya_counties
       WHERE population_density > 500)
TO '/home/magichun/high_density_counties.csv'
WITH CSV HEADER;
Enter fullscreen mode Exit fullscreen mode

Exporting high density counties to CSV file on the server

Output: COPY 205 - 205 high-density records exported

scp -p 22 magichu@159.65.222.96:/home/magichun/high_density_counties.csv "D:\Datasets"
Enter fullscreen mode Exit fullscreen mode

100% download confirmed

scp file download from server to local machine

Here's a quick reference of the commands used throughout this assignment:

# Command Purpose
1 ssh root@<IP> Remote server login
2 sudo adduser magichun Create a new system user
3 sudo usermod -aG sudo magichun Grant admin privileges
4 id magichun Verify user and group membership
5 cat /etc/passwd \ grep magichun Confirm user in system registry
6 su magichun Switch active user
7 whoami Check current identity
8 pwd Print working directory
9 psql --version Check PostgreSQL version
10 sudo systemctl status postgresql Check service status
11 sudo -i -u postgres Switch to postgres admin
12 ls -lh /home/magichun/ List files with sizes
13 head -5 kenya_counties.csv Preview file contents
14 chmod 755 /home/magichun Open directory permissions
15 chmod 644 /home/magichun/kenya_counties.csv Set file read permissions
16 scp "D:\file.csv" user@IP:/path/ Upload file to server
17 scp -P 22 user@IP:/path/file.csv "D:\dest" Download file from server

Clearly, this assignment wasn't a walk in the park, including writing this article, but I pushed through, and that's where growth happens. Perhaps the biggest lesson was realizing that some of the skills I was most confident in were the very ones that exposed my blind spots.
Working on the errors and mistakes through research and external help really brought growth to my skills and confidence.
As someone still early in this journey, I'd appreciate comments, suggestions, and corrections, as they are my real learning curve and the most valuable part of my journey.

*This article documents a real setup task, configuring servers and pulling data, a project completed as part of the LuxDevHQ Data Engineering Program.

Github Repository

View the full project on Github

Top comments (0)