What is Data Engineering?
Data engineering is the work of building and maintaining the data pipelines that allow organizations to collect, store, process and use data effectively.
Data engineers store data on cloud platforms such as DigitalOcean, Google Cloud Platform, Microsoft Azure and AWS. The servers rented on these platforms usually run Linux, so data engineers use the terminal to connect to them and to manage and monitor data flows.
Linux Essentials
Data engineers connect to virtual private servers (VPS) from different operating systems. The most common method is Secure Shell (SSH).
On Windows, SSH can be used through WSL (Windows Subsystem for Linux) or the OpenSSH client included in recent versions of Windows, while macOS and Linux support it out of the box.
Since I use Windows, I installed WSL and ran ssh <username>@<ip>, using the username and IP address I was given. The VPS is protected by a password; once you enter it, you get a shell on the server, which runs Ubuntu.
>ssh root@159.65.222.96
root@159.65.222.96's password:

When company restrictions prevent data engineers from reaching a server directly, they connect through a jump server: they log in to the jump server first and reach the restricted server from there.
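OpenSSH can make both hops in one command with its -J (ProxyJump) option. A minimal sketch, assuming made-up host names (user@jump.example.com, root@10.0.0.5) and a hypothetical helper called connect_via_jump; substitute the addresses your company provides:

```shell
# Hypothetical addresses: replace them with the jump server and the
# internal server your company gives you access to.
JUMP_HOST="user@jump.example.com"
TARGET_HOST="root@10.0.0.5"

# -J (ProxyJump) connects to JUMP_HOST first, then opens the SSH session
# to TARGET_HOST through it. Extra arguments, such as a command to run
# on the target, are passed on to ssh.
connect_via_jump() {
  ssh -J "$JUMP_HOST" "$TARGET_HOST" "$@"
}

# Usage: connect_via_jump           # interactive shell on the target
#        connect_via_jump uptime    # run one command, then return
```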
To find every file with a particular name across all folders, use find with the -name option, for example find . -name 'file_name' (the . starts the search in the current folder and includes all of its subfolders).
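A small sketch of find, run against a throwaway folder tree with made-up names so it is safe to try anywhere:

```shell
# Build a throwaway folder tree to search (names are made up for the demo).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/data/2023" "$demo_dir/data/2024" "$demo_dir/backup"
touch "$demo_dir/data/2023/rental_property.csv" \
      "$demo_dir/data/2024/rental_property.csv" \
      "$demo_dir/backup/notes.txt"

# find walks every subfolder under the starting point; -type f keeps
# regular files only and -name matches the exact file name.
matches=$(find "$demo_dir" -type f -name 'rental_property.csv')
printf '%s\n' "$matches"

# Remove the demo tree.
rm -r "$demo_dir"
```

Both copies of rental_property.csv are listed, one per folder, while notes.txt is ignored.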
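To copy files between your local machine and the server, use scp (secure copy protocol). The remote side is written as user@host:path, with a colon between the host and the path. From WSL, the Windows C: drive is available under /mnt/c.
a) Moving files from the local machine to the server:
scp /mnt/c/Users/Admin/Downloads/Rental_property root@159.65.222.96:/root/<folder>
b) Moving files from the server to the local machine:
scp root@159.65.222.96:/root/<folder>/rental_property.csv /mnt/c/Users/Admin/Desktop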
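The two directions can be wrapped in small shell functions. A minimal sketch: upload_to_server and download_from_server are hypothetical names, SERVER reuses the login from the SSH example earlier in this post, and the local paths assume you run it from WSL:

```shell
# Hypothetical helper names; SERVER is the login used for SSH earlier.
SERVER="root@159.65.222.96"

# Copy a local file to a folder on the server.
# The colon separates the remote host from the remote path.
upload_to_server() {
  scp "$1" "$SERVER:$2"
}

# Copy a file from the server into a local folder.
download_from_server() {
  scp "$SERVER:$1" "$2"
}

# Usage (from WSL, the Windows C: drive is mounted at /mnt/c):
# upload_to_server /mnt/c/Users/Admin/Downloads/Rental_property /root/<folder>
# download_from_server /root/<folder>/rental_property.csv /mnt/c/Users/Admin/Desktop
```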

To open a file on the server, you can use the vim or nano editor. In vim, press i to enter insert mode and edit the text. When you are done, press Esc, type :wq (write and quit) and press Enter to save the changes. Adding ! (:wq!) forces the write, for example when the file is marked read-only; to quit without saving, use :q! instead.
root@class:~# vim <filename>

To remove a file, use rm. To remove a folder together with everything inside it, add the -r (recursive) option:
rm <filename>
rm -r <foldername>
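A quick way to see the difference between rm and rm -r, using a throwaway folder with made-up file names:

```shell
# Throwaway workspace with made-up names, so nothing real is deleted.
work=$(mktemp -d)
touch "$work/old_report.csv"
mkdir -p "$work/old_exports/2023"
touch "$work/old_exports/2023/part1.csv"

# A single file: plain rm is enough.
rm "$work/old_report.csv"

# A folder and everything inside it needs -r (recursive);
# without it, rm refuses to delete a directory.
rm -r "$work/old_exports"

remaining=$(ls -A "$work")
echo "items left: ${remaining:-none}"

rm -r "$work"
```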
To view the contents of a file one screen at a time, use more or less:
root@ip:~# more <filename>
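To see the first lines of a file use head, and to see the last lines use tail. Both show 10 lines by default; use -n to choose a different number, for example 5:
root@ip:~# head -n 5 <filename>
root@ip:~# tail -n 5 <filename>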
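A sketch that checks this behaviour on a generated 20-line file (the data is just the numbers 1 to 20), comparing -n 5 with the 10-line default:

```shell
# Create a 20-line sample file: one number per line, 1 to 20.
sample=$(mktemp)
seq 1 20 > "$sample"

first_five=$(head -n 5 "$sample")   # lines 1-5
last_five=$(tail -n 5 "$sample")    # lines 16-20
default_head=$(head "$sample")      # no -n: the first 10 lines

echo "$first_five"
echo "$last_five"

rm "$sample"
```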
To refresh the list of available packages on the virtual private server, run sudo apt update; sudo apt upgrade then installs the available updates.
Working with Databases on the Server - PostgreSQL
Once you have access to the server, install PostgreSQL with sudo apt install postgresql postgresql-contrib -y. Then check that the service is running with sudo systemctl status postgresql, and check the installed version with psql --version.
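The installation and checks can be collected into one function so they are easy to rerun on a fresh server. A minimal sketch, assuming an Ubuntu or Debian server and an account with sudo rights; install_postgres is a hypothetical name:

```shell
# Hypothetical helper for an Ubuntu/Debian server; needs sudo rights.
# Steps: refresh the package lists, install the server and contrib modules,
# print the service status once (--no-pager avoids opening a pager),
# then print the psql client version. Each step runs only if the previous one succeeded.
install_postgres() {
  sudo apt update &&
    sudo apt install -y postgresql postgresql-contrib &&
    sudo systemctl status postgresql --no-pager &&
    psql --version
}

# Usage: install_postgres
```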

To add a user to the server, use sudo adduser <user>. If adduser rejects the format of the user name, use sudo adduser --force-badname <user>.

To check that the user was added, run more /etc/passwd and look at the last line of the output, since new users are appended to the end of the file.
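A sketch of the same check as a reusable function; user_exists is a hypothetical helper name, and it only reads /etc/passwd, so it needs no special privileges:

```shell
# Hypothetical helper: succeeds if the user name has an entry in /etc/passwd.
# The first colon-separated field of each line is the user name, so cut it
# out and look for an exact, fixed-string match (-x whole line, -F no regex).
user_exists() {
  cut -d: -f1 /etc/passwd | grep -xF -- "$1" > /dev/null
}

# adduser appends new accounts, so the newest one is normally the last line.
tail -n 1 /etc/passwd

if user_exists root; then
  echo "root is present"
fi
```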

To reach the psql shell, first switch to the postgres user that the installation created with sudo -i -u postgres, then run psql. From the psql shell you can create a database.
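Creating the database can also be done without opening the interactive shell. A minimal sketch, assuming the default postgres superuser; create_database and the database name rental_db are hypothetical:

```shell
# Hypothetical helper: create a database owned by the postgres superuser.
# createdb is installed with PostgreSQL and runs CREATE DATABASE for you.
create_database() {
  sudo -u postgres createdb "$1"
}

# Usage: create_database rental_db    # rental_db is a made-up name
#
# Interactive equivalent:
#   sudo -u postgres psql
#   postgres=# CREATE DATABASE rental_db;
```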

After you create a database, the original postgres superuser is the only role with read and write privileges on it, which makes it hard to connect to the new database as the user you added to manage it.

The solution I chose was to set a password for the postgres user, since none was set when PostgreSQL was installed.
To set it, run the following in the psql shell (the password must be enclosed in single quotes): ALTER USER postgres WITH PASSWORD '<newpassword>';
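If you prefer not to type the SQL in the interactive shell, the same statement can be sent from the terminal. A minimal sketch with a hypothetical helper name; it passes the password as a psql variable (-v) and inserts it with :'pw', which quotes it as an SQL string literal. Inside the psql shell, \password postgres does the same thing interactively and keeps the password out of your shell history.

```shell
# Hypothetical helper: set the postgres role's password from the terminal.
# -v pw=... stores the password in a psql variable, and :'pw' inserts it
# as a correctly quoted SQL string literal. Variables are not expanded in
# psql -c, so the statement is fed through standard input instead.
set_postgres_password() {
  sudo -u postgres psql -v pw="$1" <<'SQL'
ALTER USER postgres WITH PASSWORD :'pw';
SQL
}

# Usage: set_postgres_password 'choose-a-strong-password'
```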
After setting up the database and schema, you need to edit two PostgreSQL configuration files, which on Ubuntu live in /etc/postgresql/<version>/main/. The first is postgresql.conf: open it with vim, uncomment the listen_addresses line and set it to '*' so the server accepts connections from other machines instead of only from localhost. The second is pg_hba.conf, which controls client authentication (which hosts and users may connect, and how they authenticate). Add the following line at the end of it: host all all 0.0.0.0/0 md5.
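The two edits can be scripted. A minimal sketch, assuming the Ubuntu layout where both files sit in /etc/postgresql/<version>/main; enable_remote_access is a hypothetical name. It sets listen_addresses to '*' (all interfaces) and writes the address as 0.0.0.0/0, the CIDR form pg_hba.conf expects when no separate netmask column is given:

```shell
# Hypothetical helper: enable remote connections for PostgreSQL.
# $1 is the config directory (on Ubuntu usually /etc/postgresql/<version>/main).
# Run it as root, since ordinary users cannot write these files.
enable_remote_access() {
  local conf="$1/postgresql.conf"
  local hba="$1/pg_hba.conf"

  # Replace the listen_addresses line, commented out or not, so the server
  # listens on every network interface instead of localhost only.
  # Copying back with cat keeps the file's original owner and permissions.
  sed "s/^#\{0,1\}listen_addresses *=.*/listen_addresses = '*'/" "$conf" > "$conf.tmp" &&
    cat "$conf.tmp" > "$conf" &&
    rm "$conf.tmp"

  # Allow md5 password logins to all databases, for all users,
  # from any IPv4 address (0.0.0.0/0 in CIDR notation).
  printf 'host    all    all    0.0.0.0/0    md5\n' >> "$hba"
}

# Usage: enable_remote_access /etc/postgresql/<version>/main
```

A change to listen_addresses only takes effect after PostgreSQL is restarted.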
Restart PostgreSQL with sudo systemctl restart postgresql. A change to listen_addresses only takes effect after a restart; without it, clients such as DBeaver still fail to connect.
To check whether the database has data, open the psql shell and use these commands:
\l : list all databases
\c database_name : connect to a different database
\dt : list all tables in the current database
\du : list all database users and their roles
\conninfo : display the current connection details
\q : exit psql
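Each of these meta-commands can also be run straight from the terminal, because psql -c accepts a single backslash command. A minimal sketch with hypothetical helper names, assuming you run it as a user with sudo rights:

```shell
# Hypothetical helpers: run one psql meta-command and return to the terminal.
# psql -c accepts a single backslash command such as \l or \dt.
list_databases() {
  sudo -u postgres psql -c '\l'
}

# $1 is the database to inspect; -d connects to it, like \c does in the shell.
list_tables() {
  sudo -u postgres psql -d "$1" -c '\dt'
}

# Usage: list_databases
#        list_tables rental_db    # rental_db is a made-up database name
```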
