NOTE:
In this project I will be using the Ubuntu terminal; to follow along, use the same.
We will also be using fake data generated with the Faker package: https://pypi.org/project/Faker/
You will need to open four different terminals: one each for ZooKeeper, the Kafka server, the producer, and the consumer.
-Let us begin-
Step 1:
Log in to your server:
ssh <user>@<your_server_ip_address>
e.g. ssh main@70.158.43.200
You will then be prompted for your password; once you enter it correctly, you will be logged into the server.
Step 2:
Once in the server, you will first need to refresh the package index using the following command.
sudo apt update
You will then check whether Java and Python are available on your server using these commands:
java --version
python3 --version
If you confirm that you have Java and Python, we can proceed to the next step. Otherwise, you will need to install the missing one(s) using the install command apt suggests in your terminal.
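For reference, on Ubuntu the missing pieces can usually be installed as below (package names vary by release; default-jdk is just one common choice for Java, and python3-venv is worth adding now since we create a virtual environment in Step 6):
sudo apt install default-jdk
sudo apt install python3 python3-pip python3-venv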
Step 3:
Go to your browser and navigate to this page: https://kafka.apache.org/downloads
Choose the version you want and then copy the link address from among the binary downloads.
For this, I will be using version 3.7.0 with Scala 2.12, via the link below.
https://archive.apache.org/dist/kafka/3.7.0/kafka_2.12-3.7.0.tgz
Back on our terminal, we use wget (GNU Wget is a program that retrieves content from web servers; its name derives from "World Wide Web" and "get", an HTTP request method, and it supports downloading via HTTP, HTTPS, and FTP) to download the Kafka version whose link address we copied.
wget https://archive.apache.org/dist/kafka/3.7.0/kafka_2.12-3.7.0.tgz
As the downloaded Kafka is in a compressed format (.tgz), we will need to extract it.
tar -xvzf kafka_2.12-3.7.0.tgz
If you now list the directory contents with ls
you will see two entries: the old compressed archive and the newly extracted directory (one with .tgz and one without).
We then remove the compressed file (.tgz), as we no longer have any use for it.
rm kafka_2.12-3.7.0.tgz
Run ls
again and you will see that only one entry remains.
The remaining directory has an unnecessarily long name (kafka_2.12-3.7.0), so we rename it to something shorter (kafka).
mv kafka_2.12-3.7.0 kafka
Step 4
Now that we have Kafka on our server, we proceed to start ZooKeeper and the Kafka server.
Zookeeper
NOTE: This is run in its own different terminal
Navigate into the kafka folder with cd kafka
and then list its contents with ls
You will see several entries there, but we are only interested in two folders for now: bin
and config
Inside the bin folder, run ls
to see the files therein. In the context of ZooKeeper, we are interested in one file here: zookeeper-server-start.sh
And inside the config folder, we need one file: zookeeper.properties
With these two files located, we move back to the kafka folder and run the following command:
bin/zookeeper-server-start.sh config/zookeeper.properties
HINT: This command simply gives the paths to the two files we need to start ZooKeeper.
Qn: Why do we need to start ZooKeeper before starting our Kafka server?
Ans: In this (pre-KRaft) setup, Kafka keeps its cluster metadata in ZooKeeper and relies on it for broker registration and controller election, so a broker started without a running ZooKeeper will simply fail to connect and shut down.
Kafka Server
NOTE: This is run in its own different terminal
The next step is to start our Kafka server. Once again we will use the two folders we mentioned, bin and config.
Inside the bin folder, we need one file: kafka-server-start.sh
And inside the config folder, we need: server.properties
We then move to the kafka folder and run the following command:
bin/kafka-server-start.sh config/server.properties
Running these two commands in different terminals starts our ZooKeeper and our Kafka server respectively.
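The producer below writes to a topic named fakes. With Kafka's default settings (auto.create.topics.enable=true) the topic is created automatically on first use, but if you prefer to create it explicitly, here is a minimal sketch using kafka-python's admin client (this needs the kafka-python package we install in Step 6; the single-partition settings are just this tutorial's choice):

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the broker we just started on localhost:9092.
admin = KafkaAdminClient(bootstrap_servers='localhost:9092')

# One partition with replication factor 1 is enough for a single-broker demo.
admin.create_topics([NewTopic(name='fakes', num_partitions=1, replication_factor=1)])
admin.close()
```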
Step 5
Now on to the producer and the consumer
faker_producer.py code:
```python
from kafka import KafkaProducer
from faker import Faker
import json
import time

fake = Faker()

# Serialize each message as UTF-8 encoded JSON before sending it to the broker.
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Build one fake user record per call.
def generate_users():
    return {
        'name': fake.name(),
        'email': fake.email(),
        'address': fake.address(),
        'phone_no': fake.phone_number()
    }

# Send a new fake user to the 'fakes' topic every 5 seconds.
while True:
    user = generate_users()
    producer.send('fakes', user)
    print(f"sent: {user}")
    time.sleep(5)
```
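One thing worth knowing about this loop: producer.send() is asynchronous, so kafka-python buffers records and ships them to the broker in the background. For a never-ending loop like this that is fine, but in a script that exits you would call producer.flush() before quitting so no buffered messages are lost.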
faker_consumer.py code:
```python
from kafka import KafkaConsumer
import json

# Subscribe to the 'fakes' topic and decode each message from UTF-8 JSON.
consumer = KafkaConsumer(
    'fakes',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

# Block indefinitely, printing each record as it arrives.
for message in consumer:
    print(f"received: {message.value}")
```
Afterwards, we push our code to the server. It is preferable to first create a folder on your server, mkdir your_name
which we will push the code into.
On your local machine, run the following command to push the file(s) to the server:
scp name_of_your_file user@your_server_ip_address:your_path
scp faker_producer.py main@70.100.22.221:/home/main/Ptoo
scp faker_consumer.py main@70.100.22.221:/home/main/Ptoo
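scp will prompt you for the same server password you used with ssh; once each transfer completes, you can run ls in the target folder on the server to verify the files arrived.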
Step 6
After confirming that our code has successfully been pushed to the server, we create a virtual environment, preferably in the folder we created:
python3 -m venv env_name
e.g. python3 -m venv fakerenv
We then activate the environment and install kafka-python and the other missing dependencies using pip, so that the packages land inside the virtual environment rather than system-wide:
source fakerenv/bin/activate
pip install kafka-python
pip install faker
And so on.
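As an optional sanity check, you can confirm both packages import cleanly from the activated environment (kafka-python exposes its version as kafka.__version__):
python3 -c "import kafka, faker; print(kafka.__version__)"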
Step 7
We then move to our third terminal, meant for the producer, and log in to the server accordingly.
We will then activate our virtual environment:
source fakerenv/bin/activate
After activation, we will then run our producer file:
python3 faker_producer.py
Our never-ending stream of fake user data should now be visible in the terminal, with one "sent:" line printed every five seconds.
Step 8
On our fourth terminal, meant for the consumer, we log in to the server as usual and confirm that our files are there.
We will then activate our virtual environment:
source fakerenv/bin/activate
Then, moving to the folder we created, we run our faker_consumer file.
python3 faker_consumer.py
If you open the producer and consumer terminals side by side, you will be able to watch the data stream across both screens in real time.
And that is it.