Introduction
In this post, I'm going to share how I handled my first Data Engineering assignment.
It covers important Linux fundamentals that every beginner Data Engineer should know: navigating directories, creating files and folders, performing file operations, managing permissions, and reviewing command history.
Nothing complex, just a beginner's work.
What is Linux?
Linux is a free, open-source operating system (OS) based on Unix, designed for stability, security, and flexibility.
Why Linux Is Important to Data Engineers
Data Engineers often work with remote servers, cloud virtual machines, ETL pipelines, and automation scripts. Linux provides the environment where many of these tools and processes run efficiently.
Understanding Linux helps in:
- Managing files and directories
- Running scripts and automation tasks
- Working with remote servers
- Organizing datasets
- Securing data files with permissions
Here is how I performed the assignment, covering key Linux fundamentals: navigating directories, creating files and folders, performing file operations, managing permissions, and reviewing command history.
Login Verification
First, I logged into the Ubuntu server to confirm I could access my assigned account. The screenshot below shows a successful login and my current working directory.
I used the following commands:

`ssh root@143.110.224.135 -p 22`

- `root` is the username
- `143.110.224.135` is the server IP address
- `-p 22` specifies the SSH port

`whoami` prints the username of the currently logged-in user (its screenshot is in the PDF linked at the end).

`pwd` shows the exact directory I was currently working in (its screenshot is in the PDF linked at the end).
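The login check can be sketched as a short session (the `ssh` line uses the username, IP, and port from the assignment; the two verification commands run after logging in):

```shell
# Connect to the remote Ubuntu server (username, IP, and port from the assignment)
# ssh root@143.110.224.135 -p 22

# Once logged in, confirm who and where you are:
whoami   # prints the currently logged-in username
pwd      # prints the current working directory
```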
Folder & File Structure
After logging in, I created my main project directory using my name, and added subfolders for raw_data, processed_data, logs, and scripts.
Using `mkdir Caleb_data_engineering`, I created my main folder, then ran `mkdir raw_data processed_data logs scripts` to create the subfolders.
The screenshot below shows the directory structure via the `ls -R` command, which recursively lists all the files and folders I had created.
Each subfolder has a purpose:

- `raw_data` stores original data
- `processed_data` stores cleaned/modified data
- `logs` stores log files
- `scripts` stores shell scripts
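Put together, the structure above can be recreated with a few commands (a minimal sketch using the folder name from the assignment):

```shell
# Create the main project directory and enter it
mkdir -p Caleb_data_engineering
cd Caleb_data_engineering

# Create all four subfolders in a single command
mkdir -p raw_data processed_data logs scripts

# Recursively list the tree to confirm everything was created
ls -R
```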
I used `nano`, a terminal-based text editor, to open and edit files directly from the command line.
Raw and Processed Data Files
I created sample CSV files to simulate real data: stock_data.csv in raw_data and cleaned_stock_data.csv in processed_data.
These screenshots show the contents of each file.
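The sample files can also be seeded straight from the shell with a heredoc (the column names and rows below are illustrative, not the exact assignment data):

```shell
mkdir -p raw_data processed_data

# Simulated raw data (illustrative columns and rows)
cat > raw_data/stock_data.csv <<'EOF'
date,symbol,price
2024-01-02,AAPL,185.64
2024-01-03,AAPL,184.25
EOF

# Simulated cleaned data
cat > processed_data/cleaned_stock_data.csv <<'EOF'
date,symbol,price
2024-01-02,AAPL,185.64
2024-01-03,AAPL,184.25
EOF
```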
Scripts and Logs
I also created a simple script and a log file to track operations.
This screenshot shows the process_stock.sh script and the corresponding log file in action.
I used the `touch` command to create empty files in each subfolder.
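A minimal sketch of the script and log setup (the script body and the log-file name here are my assumptions; the actual files are shown in the screenshots):

```shell
mkdir -p scripts logs

# touch creates an empty file if it does not already exist
touch logs/process.log

# A minimal process_stock.sh that appends a timestamped entry to the log
cat > scripts/process_stock.sh <<'EOF'
#!/bin/bash
# Hypothetical script body: record each run in the log file
echo "$(date '+%Y-%m-%d %H:%M:%S') processed stock data" >> logs/process.log
EOF
```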
Permissions and Navigation
To practice Linux permissions, I restricted access to my main folder and made the script executable.
I also practiced navigating directories and listing hidden files.
At the end, I ran the `history` command to review all the commands I had executed during the assignment, as shown in the screenshot below.
More Commands Used in My Assignment
For file operations, I used `cp` to create a backup file, `mv` to move and rename files, and `rm` to delete unnecessary files and keep the workspace organized.
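These file operations look like this in practice (the file names here are illustrative):

```shell
mkdir -p raw_data processed_data

# Start with a sample file to operate on
printf 'date,price\n2024-01-02,185.64\n' > raw_data/stock_data.csv

# cp: create a backup copy
cp raw_data/stock_data.csv raw_data/stock_data_backup.csv

# mv: move the backup into another folder and rename it in one step
mv raw_data/stock_data_backup.csv processed_data/stock_data_v1.csv

# rm: delete a file that is no longer needed
rm processed_data/stock_data_v1.csv
```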
For permissions, I used chmod to control access to my files and folders:
chmod 700 ~/Caleb_data_engineering was used to restrict access to my main project folder so that only the owner could access it.
In Linux permission terms, 700 means:
7 = read, write, and execute for the owner
0 = no access for group
0 = no access for others
chmod 600 was used on important files such as the raw data file, cleaned data file, and log file.
This ensures that only the owner can read and write the file.
In this case:
6 = read and write for the owner
0 = no access for group
0 = no access for others
I also used chmod +x to make my shell script executable, allowing it to run directly from the terminal.
To confirm that the permission changes were applied correctly, I used ls -l, which displays file permissions and ownership details.
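The permission changes can be sketched on throwaway files (the names are illustrative; `stat -c` is the GNU coreutils form found on Ubuntu):

```shell
# Throwaway examples to demonstrate the chmod calls
mkdir -p demo_project
printf 'date,price\n' > demo_data.csv
printf '#!/bin/bash\necho "processing..."\n' > demo_script.sh

chmod 700 demo_project   # owner: rwx; group and others: no access
chmod 600 demo_data.csv  # owner: rw;  group and others: no access
chmod +x demo_script.sh  # add execute permission for the script

# Confirm the octal modes and the ls -l view
stat -c '%a %n' demo_project demo_data.csv
ls -l demo_data.csv demo_script.sh
```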
I then ran ./process_stock.sh to execute my script, ls -la to list all files including hidden ones, and !! to re-run the previous command.
Supporting Evidence
To keep this article concise, I placed the full assignment screenshots in a separate PDF document hosted on GitHub.
Caleb_data_engineering_assignment_screenshot.pdf
Conclusion
This assignment gave me practical exposure to essential Linux fundamentals for beginner Data Engineers.
By creating and organizing files, performing file operations, managing permissions, navigating directories, and reviewing command history, I strengthened my confidence in using the Linux terminal.
Mastering these basic skills is an important first step toward efficiently managing data workflows on remote servers.