<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akalanka Weerasooriya</title>
    <description>The latest articles on DEV Community by Akalanka Weerasooriya (@aawgit).</description>
    <link>https://dev.to/aawgit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F386970%2F761eed21-d946-41f2-abdd-09a8da96407c.png</url>
      <title>DEV Community: Akalanka Weerasooriya</title>
      <link>https://dev.to/aawgit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aawgit"/>
    <language>en</language>
    <item>
      <title>Building a (somewhat) intelligent agent</title>
      <dc:creator>Akalanka Weerasooriya</dc:creator>
      <pubDate>Sat, 09 Nov 2024 15:14:56 +0000</pubDate>
      <link>https://dev.to/aawgit/building-a-somewhat-intelligent-agent-ghe</link>
      <guid>https://dev.to/aawgit/building-a-somewhat-intelligent-agent-ghe</guid>
      <description>&lt;p&gt;This started on a Saturday night. If you are very social like me, you would know that there is no better time to do some coding than a peaceful Saturday night. So I opened up a pet project I've been working on and realized that it wasn't pushed to Github yet. I didn't remember the commands to set a remote repo and could have easily Googled or "ChatGPTed" it. But, wouldn't it be cooler to add another layer of abstraction and just tell the computer to "set this projects remote as such and such", specially in this era of intelligent agents? And wouldn't it be even cooler to build that agent?. And that's exactly what I did, instead of spending a few seconds on finding the commands to set the remote repo. &lt;/p&gt;

&lt;p&gt;I started solving the problem backwards. I would need a way to run shell commands from a program. That's easy: Python's subprocess module.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import subprocess

def run_shell_command(command):
    try:
        # Run the shell command
        result = subprocess.run(command, shell=True, check=True, text=True, capture_output=True)
        # Return the command output
        return result.stdout
    except subprocess.CalledProcessError as e:
        # Return the error output if the command fails
        return e.stderr

print(run_shell_command('pwd'))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now I need a way to decide which commands to run. That's where the intelligence comes in. It needs to take the natural language input and convert it to shell commands. Large Language Models (LLMs) are good at this sort of thing. So I tried the following prompt on ChatGPT.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I'm computer program created to assist a human. Currently the human is working on a DevOps task. He asked me to "ssh to the development server and install python". Please tell me what shell commands to run to do the activity. If you need more information please let me know what information you need. Please respond in the following manner and don't include anything else in the response.
Type: [Can be either "Commands" or "More information"]
Response: [List of shell commands or list of more information needed, separated by commas]

Example response 1:
Type: More information
Response: user id, key path

Example response 2:
Type: Commands
Response: ssh -i 'keyfile.pem' user1@server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;It worked surprisingly well most of the time. This was the response:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Type: More information
Response: user id, server IP or hostname, key path or password, operating system type (e.g., Ubuntu, CentOS)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
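&lt;p&gt;The agent has to turn that "Type:/Response:" text back into data it can act on. Here is a minimal sketch of such a parser; the function name parse_llm_reply is my own invention, not necessarily what the project does:&lt;/p&gt;

```python
def parse_llm_reply(reply):
    # Split the two expected lines and drop the "Type:" / "Response:" labels.
    lines = [line.strip() for line in reply.strip().splitlines()]
    kind = lines[0].split(':', 1)[1].strip()
    items = [part.strip() for part in lines[1].split(':', 1)[1].split(',')]
    return kind, items
```

Note that naive comma-splitting breaks on values that themselves contain commas; this only illustrates the shape of the protocol.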



&lt;p&gt;And after passing the inputs, it returned the list of commands as,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Type: Commands
Response: ssh -i 'key.pem' user1@192.168.1.8, sudo yum install python3 -y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not exactly production ready, but this is a promising start. On this high note I had to stop for the day, or rather the hour, since I'm no longer the young man I once was and it was already 10 PM.&lt;/p&gt;

&lt;p&gt;A week later...&lt;/p&gt;

&lt;p&gt;Zooming out a little bit: "how would I use this?". I would open up a project in the terminal and type "set the remote repo for this project as ...". Then the agent will ask the LLM for the commands to run. If it needs more information, it will prompt me. After getting the information, the agent will send it to the LLM, which will either give the commands or ask for more information. This repeats until a command runs. If the command succeeds, the agent stops. But if it returns errors, the agent will prompt the LLM for commands to resolve the issue. Also, with each request to the LLM, the agent will send the conversation history in a window of suitable size. This provides the context to the LLM. &lt;/p&gt;
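&lt;p&gt;The loop described above can be sketched roughly as follows. Everything here is hypothetical: ask_llm, run_command, and ask_user stand in for the real LLM call, the subprocess wrapper, and input().&lt;/p&gt;

```python
def agent_loop(task, ask_llm, run_command, ask_user, max_turns=5):
    # The conversation history doubles as the context window sent to the LLM.
    history = [task]
    for _ in range(max_turns):
        kind, items = ask_llm(history)  # ('more_info', [...]) or ('commands', [...])
        if kind == 'more_info':
            # Prompt the human for each missing piece and remember the answers.
            history.extend(ask_user(question) for question in items)
        else:
            for command in items:
                ok, output = run_command(command)
                if not ok:
                    # Feed the error back so the LLM can suggest a fix.
                    history.append('Command failed: ' + output)
                    break
            else:
                return 'done'
    return 'gave up'
```

The max_turns cap is my addition, to keep a confused model from looping forever.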

&lt;p&gt;We would need to make the queries to the LLM a little abstract so the agent can handle a wider range of tasks. After all, it wouldn't be very useful if it's only capable of setting remote repo URLs. At the same time, we need to clearly define its scope. In this case it would be an agent for running shell commands. To help it handle a range of commands, we can parameterize the prompt. Those parameters would be:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The natural language input from the human.&lt;/li&gt;
&lt;li&gt;Context: This is a little tricky; I will use the conversation history for now.&lt;/li&gt;
&lt;li&gt;Any errors returned by running a command.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In addition to that, we will have to maintain state, such as whether we are executing a command or gathering more information. &lt;/p&gt;

&lt;p&gt;Let's code it. I've changed the LLM's output to a JSON-formatted string, since it's easier to write the processing part that way. &lt;/p&gt;
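&lt;p&gt;For illustration, the processing side might look like this. The field names here are my assumption, not the project's actual schema:&lt;/p&gt;

```python
import json

# Hypothetical reply in the JSON format described above.
raw = '{"type": "commands", "commands": ["git remote add origin git@github.com:user/repo.git"]}'

reply = json.loads(raw)
if reply["type"] == "commands":
    # A structured list is far easier to iterate over than free text.
    to_run = reply["commands"]
```
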

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9i69pzwigitro00aqpw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp9i69pzwigitro00aqpw.png" alt="Image description" width="800" height="689"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested it with a few simple commands and they worked as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcrhv9ufosx4a5o1hec5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcrhv9ufosx4a5o1hec5.png" alt="Image description" width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Seems alright. Let's try another one.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclns04rb0imoqncq3d9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fclns04rb0imoqncq3d9w.png" alt="Image description" width="800" height="353"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's not what I asked for. Maybe we need to be more specific.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwn1y6uqcoi31sipri0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwn1y6uqcoi31sipri0k.png" alt="Image description" width="800" height="340"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's more like it. Although I should definitely add a mechanism to verify the commands before running them. That should prevent the agent from doing something crazy. Also, explaining a command before it runs would be a good feature, but not for now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;answer = input(f" Shall I run '{command}'? (Yes/ No) ")
                if answer.lower()=='yes': # Execute the command
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So, it kind of works, but we need to make it easily accessible. Creating an alias did the trick. I added the following to ~/.bashrc.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alias shelly='/home/akalanka/projects/shelly/venv/bin/python3 /home/akalanka/projects/shelly/main.py'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
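&lt;p&gt;With the alias in place, everything typed after &lt;code&gt;shelly&lt;/code&gt; can be treated as the task. A sketch of what the entry point in main.py might do; this is my guess at the wiring, not the repo's actual code:&lt;/p&gt;

```python
import sys

def task_from_argv(argv):
    # Join everything after the script name into one natural-language task.
    task = ' '.join(argv[1:]).strip()
    if not task:
        return None
    return task

if __name__ == '__main__':
    print(task_from_argv(sys.argv) or 'usage: shelly describe what you want done')
```
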



&lt;p&gt;Let's see how well "Shelly" fulfills her purpose. First I told Shelly to create the remote repo, but it didn't work, because she was trying to set up GitHub CLI (gh) authentication, which was too complex for a simple tool like this. So I created the remote repo myself and then asked her to set it as the origin of the local repo, which also failed the first time. But after improving the prompt template, I asked her to correct the mistake, which actually worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnad08k7z5n06ae9u2cb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnnad08k7z5n06ae9u2cb.png" alt="Image description" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I went ahead and asked her to commit and push her own code, which also was done nicely enough (ignoring the fact that she ignored the instruction about the commit message).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvxzvbob6sner1p8ub67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgvxzvbob6sner1p8ub67.png" alt="Image description" width="800" height="196"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's not very useful for commands I use frequently and remember, because it's quicker and more reliable to run the shell command directly. But for other cases this actually seems to help.&lt;/p&gt;

&lt;p&gt;So, about a week later, I was finally able to set the remote repo for the project. Great success! What a way to spend weekend evenings! &lt;/p&gt;

&lt;p&gt;Obviously, a lot can be done to improve this. To start, some way of persisting the user inputs between invocations could smooth things out. Using LangChain could be a good idea. Let me know what you think. Also, feel free to check out the &lt;a href="https://github.com/aawgit/shelly" rel="noopener noreferrer"&gt;source code&lt;/a&gt; and open a PR to make it more intelligent. It could use some help. Hey, you can use Shelly to push your feature, hopefully.&lt;/p&gt;

&lt;p&gt;P.S. This was entirely written by a human. Absolutely no intelligence, artificial or otherwise, was involved in the writing. &lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>ai</category>
      <category>shell</category>
      <category>bash</category>
    </item>
    <item>
      <title>Building a Spark cluster with two PCs and a Raspberry Pi.</title>
      <dc:creator>Akalanka Weerasooriya</dc:creator>
      <pubDate>Thu, 28 May 2020 18:39:16 +0000</pubDate>
      <link>https://dev.to/aawgit/building-a-spark-cluster-with-two-pcs-and-a-raspberry-pi-3fk2</link>
      <guid>https://dev.to/aawgit/building-a-spark-cluster-with-two-pcs-and-a-raspberry-pi-3fk2</guid>
      <description>&lt;p&gt;I read &lt;a href="https://towardsdatascience.com/a-journey-into-big-data-with-apache-spark-part-1-5dfcc2bccdd2" rel="noopener noreferrer"&gt;this &lt;/a&gt; brilliant post by Ashley Broadley which explains how to set up a Spark cluster with docker compose. It inspired me to try out something a little bit different, to use different devices in the same LAN as nodes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn4zm61q1pq53lhb702t7.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fn4zm61q1pq53lhb702t7.jpg" alt="Alt Text" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post describes how to set up a cluster in Spark Standalone mode, which is easier than using other cluster managers.&lt;br&gt;
The following devices were used as nodes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Worker 1: A PC running on Windows, with Docker installed.&lt;/li&gt;
&lt;li&gt;Worker 2: A PC running on Ubuntu, with Docker installed.&lt;/li&gt;
&lt;li&gt;Master: A Raspberry Pi 3 model B running on Raspbian.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The steps are pretty simple and straightforward. Here we go…&lt;/p&gt;
&lt;h2&gt;
  
  
  Setting up Spark in Raspberry Pi and starting the Master
&lt;/h2&gt;

&lt;p&gt;I used SSH to log in to the Raspberry Pi and used it in headless mode, just to avoid keeping another monitor and keyboard around. But if you don't mind that, skip the SSH setup and continue with the JDK installation in the RPi terminal.&lt;/p&gt;
&lt;h3&gt;
  
  
  Setting up SSH server and opening up port 22
&lt;/h3&gt;

&lt;p&gt;The SSH server of the RPi is not enabled by default. There are broadly two options for enabling it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Placing an 'ssh' file on the SD card from another machine.&lt;/li&gt;
&lt;li&gt;Using the RPi desktop (yes, for this you need to plug in a monitor once).
The RPi documentation explains these two options under items 2 and 3.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To test the SSH connection, first find the IP address of the RPi using ifconfig. Then, from another machine on the same network, enter the command&lt;br&gt;
&lt;br&gt;
 &lt;code&gt;ssh &amp;lt;username&amp;gt;@&amp;lt;ip address of the RPi&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;If the IP address is correct and SSH server is running, you will get a prompt for the password. Type in the login password of the RPi for the user.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fiex04lvzr4d1heueury1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fiex04lvzr4d1heueury1.png" alt="Alt Text" width="734" height="205"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;However, there are security issues involved with allowing remote login, even if you have set a password. &lt;a href="https://askubuntu.com/questions/1107987/connect-two-computers-with-ssh-in-a-home-lan/1108044#1108044" rel="noopener noreferrer"&gt;This&lt;/a&gt; answer suggests that a key based authentication method should be used.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing JDK
&lt;/h3&gt;

&lt;p&gt;Spark runs on Java, so we need Java installed on the RPi. However, most RPis in use today come with a JDK installed on Raspbian; in that case, this step is not necessary. Otherwise, execute the following commands on the RPi to install the Java Runtime Environment.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sudo apt-get update&lt;br&gt;
sudo apt-get install openjdk-8-jre&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing Spark on the RPi and starting the master
&lt;/h3&gt;

&lt;p&gt;From the Spark documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To install Spark Standalone mode, you simply place a compiled version of Spark on each node on the cluster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Execute the following commands to install Spark.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;wget https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz&lt;br&gt;
tar -xzf spark-2.4.5-bin-hadoop2.7.tgz &amp;amp;&amp;amp; \&lt;br&gt;
    mv spark-2.4.5-bin-hadoop2.7 /spark &amp;amp;&amp;amp; \&lt;br&gt;
    rm spark-2.4.5-bin-hadoop2.7.tgz&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;To start the master, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/spark/bin/spark-class org.apache.spark.deploy.master.Master --port 7077 --webui-port 8080&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;This tells Spark to start a master, listen on port 7077, and use port 8080 for the web user interface.&lt;br&gt;
If everything goes well, you should see a bunch of logs scrolling on the screen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fz6ts1qfoqczr90dsnr0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fz6ts1qfoqczr90dsnr0o.png" alt="Alt Text" width="725" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, you should be able to see the web UI of the master. If you have a monitor for the RPi, the UI can be accessed at localhost:8080; otherwise, point a browser at &amp;lt;ip-of-master&amp;gt;:8080 on any other PC in the LAN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fchfwezewi8uds7ddck2l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fchfwezewi8uds7ddck2l.png" alt="Alt Text" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Seems like the master is running fine. Let's fire up some workers and see what happens.&lt;/p&gt;
&lt;h2&gt;
  
  
  Starting the worker nodes using Docker
&lt;/h2&gt;

&lt;p&gt;I used the same Dockerfile as in Ashley's article, with the Spark download link updated. Here it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM openjdk:8-alpine
RUN apk --update add wget tar bash
RUN wget https://downloads.apache.org/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
RUN tar -xzf spark-2.4.5-bin-hadoop2.7.tgz &amp;amp;&amp;amp; \
    mv spark-2.4.5-bin-hadoop2.7 /spark &amp;amp;&amp;amp; \
    rm spark-2.4.5-bin-hadoop2.7.tgz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will build a Docker image with Java and Spark installed. Build the image, start the container, and open its shell using the following commands.&lt;br&gt;
First, set the environment variable MYNAME with&lt;br&gt;
&lt;br&gt;
 &lt;code&gt;MYNAME=&amp;lt;your name&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
 in an Ubuntu terminal, or with&lt;br&gt;
&lt;br&gt;
 &lt;code&gt;set MYNAME=&amp;lt;your name&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
 in Windows CMD. Also, you may need to execute the following with sudo on Ubuntu.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;docker build -t $MYNAME/spark:latest .&lt;br&gt;
docker run -it --rm $MYNAME/spark:latest /bin/sh&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;Then start the worker in the Docker container with the following:&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spark/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8080 spark://&amp;lt;ip-of-master&amp;gt;:7077&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;This tells Spark to start a worker and connect it to the master at the given IP. Let's go back to the UI of the master:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1rdg9zwlawr7c4lxyudn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F1rdg9zwlawr7c4lxyudn.png" alt="Alt Text" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yes! The master has accepted us.&lt;br&gt;
Since I had another laptop lying around, I added that to the cluster as well. The more the merrier.&lt;br&gt;
Adding another worker is no different from the above.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You can build a Docker image on the second machine from the above Dockerfile, or use a copy of the one built on the first machine. Use&lt;br&gt;
&lt;br&gt;
 &lt;code&gt;sudo docker save -o &amp;lt;some name&amp;gt;.tar $MYNAME/spark:latest&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
 to save the image to a tar file, copy it to the second machine, and use&lt;br&gt;
&lt;br&gt;
 &lt;code&gt;docker load -i &amp;lt;path to image tar file&amp;gt;&lt;/code&gt;&lt;br&gt;
&lt;br&gt;
 to load the image.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fiuv1ofpyl4vscoxi6uls.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fiuv1ofpyl4vscoxi6uls.jpeg" alt="Alt Text" width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Submitting a job
&lt;/h2&gt;

&lt;p&gt;I used one of the examples that come with the Spark installation, which calculates the value of Pi. Execute the following from the RPi to submit the job.&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/spark/bin/spark-submit --master spark://&amp;lt;master-ip-address&amp;gt;:7077 --class org.apache.spark.examples.SparkPi /spark/examples/jars/spark-examples_2.11-2.4.5.jar 1000&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;p&gt;org.apache.spark.examples.SparkPi is the entry point of our application, and /spark/examples/jars/spark-examples_2.11-2.4.5.jar is the path to the jar containing the application and its dependencies. Since our application is one that ships with the Spark installation, it's available on all nodes of the cluster. 1000 is an application argument, which in this case controls the number of partitions across which the data set is distributed.&lt;br&gt;
You can check the job status on the UI.&lt;/p&gt;
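&lt;p&gt;As an aside, the SparkPi example estimates Pi by Monte Carlo sampling: throw random points into the unit square and count the fraction that lands inside the quarter circle, which approaches Pi/4. A single-machine Python sketch of the same idea (Spark, roughly speaking, spreads this sampling across the partitions):&lt;/p&gt;

```python
import random

def estimate_pi(samples, seed=42):
    # Count random points from the unit square that land inside the quarter circle.
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y > 1.0:
            continue
        inside += 1
    return 4.0 * inside / samples
```

With 100,000 samples the estimate is typically within a few hundredths of 3.14159.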

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fase0w3j7qg95xk7k44lc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fase0w3j7qg95xk7k44lc.png" alt="Alt Text" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There will also be some log entries in the master and worker terminals. After successful completion of the job, it will show the result in the terminal where the job was submitted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fccmgfhih461wobb6iu51.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fccmgfhih461wobb6iu51.png" alt="Alt Text" width="726" height="57"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>spark</category>
      <category>hadoop</category>
      <category>bigdata</category>
      <category>raspberrypi</category>
    </item>
  </channel>
</rss>
