As a passionate developer with many years of experience, I've witnessed the evolution of technology and its impact on the way we handle data. While traditional methods have served us well, the advent of Generative AI (GenAI) presents a transformative opportunity for data analysis. Today, I want to share my journey into this exciting new world by exploring how GenAI can revolutionize the way we interact with and analyze data exported from Jira.
Jira, a popular project management tool, is widely used to track tasks, bugs, and project progress. It offers the capability to export data into CSV files, which can then be analyzed to gain insights. However, the manual process of sifting through these CSV files can be tedious and time-consuming. This is where GenAI steps in.
Imagine being able to query your Jira exports using natural language, extracting meaningful insights effortlessly. With the combination of LangChain, OpenAI's powerful language models, and the pandas library, we can create an intelligent agent capable of interpreting and executing complex queries on Jira data exports. This not only simplifies the analysis process, it also makes those insights accessible without writing pandas code by hand.
In this article, I'll walk you through how GenAI can be applied to Jira CSV exports, transforming raw data into actionable insights with ease. We'll delve into a project that leverages these cutting-edge technologies to demonstrate the potential of GenAI in revolutionizing our approach to data analysis.
While it is possible to use the Jira agent developed by LangChain, I chose to work with a CSV file that closely resembles the format of a Jira export. This decision was made to simplify the demonstration and to avoid connecting to your company's Jira instance :). The data in the CSV file used for this project was generated more or less at random, purely to serve as an example.
Let's Start
To get started with this project, you'll need an OpenAI API key. You can obtain one by subscribing to the OpenAI service on their website. Once you have your API key, save it in a `.env` file at the root of the project. The `.env` file should look like this:

```
OPENAI_API_KEY=your-api-key-here
```
Next, you need to install the necessary libraries. You can do this using pipenv with the following command:

```shell
pipenv install pandas numpy python-dotenv langchain langchain-openai openai langchain-experimental tabulate streamlit streamlit-chat
```
For this project, I used PyCharm, but you can use any other Python IDE such as VS Code or any editor of your choice.
The entire project is available in my GitHub repository langchain_pandas_agent. In this repository, under the csv_dir directory, you will find a CSV file containing tasks for a fictional eCommerce project.
The teams Baggio, Buffon, Del Piero, Mancini, Materazzi, and Zola are working on this project, and the file includes both Technical and Functional tasks recorded over three sprints.
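To give a rough idea of the shape of the data, a few pipe-separated rows could look like the sketch below. Note that `Issue Key`, `Status`, and `Labels` are the columns the rest of the article actually relies on; the `Summary` and `Sprint` column names here are my own guess at the remaining fields, so check the repository file for the exact layout.

```
Issue Key|Summary|Status|Labels|Sprint
ECOM-1|Implement checkout API|Done|Technical,Baggio|Sprint 1
ECOM-2|Define refund policy flow|In Progress|Functional,Zola|Sprint 2
```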
The Main Script
The core of our project is encapsulated in the main.py script, which orchestrates the interaction between various components. Let's take a closer look at how it's structured:
```python
from dotenv import load_dotenv
from agent_manager import getAgentExecutorChatOpenAI
from df_manager import read_df


def ask(query):
    df = read_df()
    agent_executor = getAgentExecutorChatOpenAI(df)
    response = agent_executor.invoke(query)
    return response


def start_web_app():
    from web_app import build_web_app
    build_web_app(ask=ask)


if __name__ == '__main__':
    load_dotenv()
    start_web_app()
```
Explanation
- **Imports**: The script begins by importing the necessary modules: `load_dotenv` from `dotenv` to load environment variables, `getAgentExecutorChatOpenAI` from `agent_manager` to create the agent executor, and `read_df` from `df_manager` to read the DataFrame.
- **`ask` function**: The `ask` function is designed to handle the queries. It reads the DataFrame using `read_df()`, initializes the agent executor with `getAgentExecutorChatOpenAI(df)`, and invokes the agent with the given query to obtain a response. This function essentially acts as the bridge between the user input and the underlying data processing logic.
- **`start_web_app` function**: This function sets up the web application. It imports `build_web_app` from `web_app` and calls it, passing the `ask` function as a parameter, so the web app is equipped to handle user queries through a streamlined interface.
- **Main execution**: The script concludes with the standard Python convention of checking whether the script is being run directly. If it is, it loads the environment variables using `load_dotenv()` and starts the web application by calling `start_web_app()`.
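To make the bridge role of `ask` concrete, here is a minimal sketch with a stubbed agent in place of the real LangChain executor. The `FakeAgent` class and the sample DataFrame are invented for illustration; the real code builds the executor with `getAgentExecutorChatOpenAI(df)` and needs an OpenAI key.

```python
import pandas as pd

# FakeAgent is a hypothetical stand-in for the LangChain agent executor:
# the only contract ask() relies on is an .invoke(query) method that
# returns a dict with an "output" key.
class FakeAgent:
    def __init__(self, df):
        self.df = df

    def invoke(self, query):
        # A real agent would let the LLM generate and run pandas code;
        # here we hard-code a single kind of answer.
        if "how many" in query.lower():
            return {"output": f"There are {len(self.df)} issues."}
        return {"output": "Query not understood."}

def ask(query, df):
    # In the article's main.py this line is getAgentExecutorChatOpenAI(df)
    agent_executor = FakeAgent(df)
    return agent_executor.invoke(query)

df = pd.DataFrame({"Issue Key": ["ECOM-1", "ECOM-2", "ECOM-3"]})
print(ask("How many issues are there?", df)["output"])  # There are 3 issues.
```

The point is that the web layer never touches pandas or LangChain directly: it only sees a callable that takes a question and returns a response dict.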
Key Components
- Web Application: The web application is built using the Streamlit framework, which allows for the creation of interactive web interfaces with minimal effort. Streamlit is ideal for data apps, providing a smooth and intuitive user experience.
- Dataframe reader: This module contains the logic related to the reading and lightweight processing of the pandas dataframe.
- **Agent Manager**: The `agent_manager` module contains the logic related to LangChain and OpenAI. It is responsible for creating and managing the intelligent agent that processes natural language queries and interacts with the data.
Reading the CSV: df_manager.py
The `df_manager.py` file contains a function to read and preprocess our CSV data from Jira.
```python
import pandas as pd


def read_df():
    file_name = "csv_dir/Jira_Issues_Simulation_Final_With_Dates.csv"
    df = pd.read_csv(file_name, sep="|")
    df['Category'] = df['Labels'].apply(lambda x: str(x).split(',')[0].strip() if ',' in str(x) else '')
    df['Team'] = df['Labels'].apply(lambda x: str(x).split(',')[1].strip() if ',' in str(x) else '')
    return df
```
It mainly consists of three steps:

- **Read the CSV**: The `read_df` function reads the CSV file located in the `csv_dir` directory using pandas, with a pipe (`|`) as the separator.
- **Process labels**: It extracts `Category` and `Team` from the `Labels` column, assuming labels are comma-separated.
- **Return DataFrame**: Finally, it returns the processed DataFrame, ready for analysis.
This simple script ensures that our data is correctly loaded and preprocessed, making it ready for the natural language queries handled by our LangChain agent.
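As a quick sanity check of the label-splitting logic, here is the same `apply` expression run on a tiny hand-made DataFrame. The two sample label values are hypothetical, but follow the "Category,Team" convention described above.

```python
import pandas as pd

# Two made-up rows following the "Category,Team" label convention.
df = pd.DataFrame({"Labels": ["Technical,Baggio", "Functional,Zola"]})

# Same expressions as in read_df(): first part -> Category, second -> Team.
df["Category"] = df["Labels"].apply(lambda x: str(x).split(",")[0].strip() if "," in str(x) else "")
df["Team"] = df["Labels"].apply(lambda x: str(x).split(",")[1].strip() if "," in str(x) else "")

print(df["Category"].tolist())  # ['Technical', 'Functional']
print(df["Team"].tolist())      # ['Baggio', 'Zola']
```

Note that any row whose `Labels` value contains no comma ends up with empty strings in both new columns, which is the fallback the original lambda chooses.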
Creating the Agent: agent_manager.py
The `agent_manager.py` file contains the logic for creating the LangChain agent using OpenAI's GPT-3.5-turbo model.
```python
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI, OpenAI


def getAgentExecutorChatOpenAI(df):
    prefix = """
    Use the column Team to retrieve the team;
    technical or functional refer to the column category;
    Consider that the single row can be called ECOM;
    Consider that state refers to column Status;
    The URL of an ECOM is https://robby.com/{Issue Key}, where {Issue Key} is the value of the column "Issue Key"
    """
    agent = create_pandas_dataframe_agent(
        ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
        df,
        prefix=prefix,
        verbose=True,
        return_intermediate_steps=True,
        agent_type="openai-tools",
        max_iterations=10,
        agent_executor_kwargs={"handle_parsing_errors": True}
    )
    return agent
```
Here too, we can describe mainly three steps:

- **Imports**: The script imports the necessary modules: `create_pandas_dataframe_agent` from `langchain_experimental.agents.agent_toolkits`, plus `ChatOpenAI` and `OpenAI` from `langchain_openai`.
- **`getAgentExecutorChatOpenAI` function**: This function creates and configures the LangChain agent.
  - **Prefix instructions**: A prefix is provided to guide the agent on how to interpret certain columns in the DataFrame. I added these prefix statements after many attempts. To be honest, I was very impressed that I was able to explain how to build the URL, even if I'm not an expert in prompt engineering.
  - **Agent creation**: The agent is created using the `create_pandas_dataframe_agent` function with several parameters:
    - `ChatOpenAI`: The language model is set to `gpt-3.5-turbo-0613` with a temperature of 0 for deterministic responses.
    - `df`: The DataFrame that the agent will interact with.
    - `prefix`: Instructions to guide the agent.
    - `verbose`: Enables detailed logging.
    - `return_intermediate_steps`: Returns intermediate steps for debugging.
    - `agent_type`: Specifies the type of agent as `openai-tools`.
    - `max_iterations`: Limits the number of iterations to 10.
    - `agent_executor_kwargs`: Additional arguments to handle parsing errors gracefully.
- **Return agent**: Finally, the configured agent is returned for use in querying the DataFrame.
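The URL rule in the prefix boils down to simple string concatenation over the "Issue Key" column. Done directly in pandas (with hypothetical issue keys), what we are asking the agent to figure out is just:

```python
import pandas as pd

# Hypothetical issue keys; the real ones come from the Jira export.
df = pd.DataFrame({"Issue Key": ["ECOM-101", "ECOM-102"]})

# What the prefix describes: the URL of an issue is https://robby.com/{Issue Key}
df["URL"] = "https://robby.com/" + df["Issue Key"]
print(df["URL"].tolist())
# ['https://robby.com/ECOM-101', 'https://robby.com/ECOM-102']
```

The interesting part is that the agent derives this pandas code on its own from the one-line natural-language rule in the prefix.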
This setup equips our LangChain agent with the ability to process natural language queries on the Jira CSV data, leveraging the power of OpenAI's model. It's worth noting that in this case we've used the GPT-3.5-turbo model, but with a few changes it is possible to switch to a local LLM or to a different one (e.g., Google Gemini).
Building the Web Application: web_app.py
The `web_app.py` file sets up a web interface using Streamlit, allowing users to interact with the LangChain agent through natural language queries.
```python
import streamlit as st
from streamlit_chat import message
import uuid


def build_web_app(ask):
    def answer_request():
        if st.session_state.text_input:
            with st.spinner("Generating response..."):
                generated_response = ask(st.session_state.text_input)
                st.session_state.chat_history.insert(0, (st.session_state.text_input, generated_response["output"]))
                st.session_state.user_prompt_history.insert(0, st.session_state.text_input)
                st.session_state.chat_answers_history.insert(0, generated_response['output'])
                st.session_state.text_input = ''

    st.header("LangChain🦜🔗 Jira facilitator example")

    if (
        "chat_answers_history" not in st.session_state
        and "user_prompt_history" not in st.session_state
        and "chat_history" not in st.session_state
        and "text_input" not in st.session_state
    ):
        st.session_state["chat_answers_history"] = []
        st.session_state["user_prompt_history"] = []
        st.session_state["chat_history"] = []
        st.session_state.text_input = ''

    prompt = st.text_area("Prompt", key='text_input', on_change=answer_request)

    if st.session_state["chat_answers_history"]:
        for generated_response, user_query in zip(
            st.session_state["chat_answers_history"],
            st.session_state["user_prompt_history"],
        ):
            message(user_query, is_user=True, key=str(uuid.uuid4()))
            message(generated_response, is_user=False, key=str(uuid.uuid4()))
```
Two main points here:

- **Imports**: The script imports the necessary modules: `streamlit` as `st` for creating the web interface, `message` from `streamlit_chat` for chat message formatting, and `uuid` for generating unique keys for chat messages.
- **`build_web_app` function**: This function sets up the Streamlit web application.
  - **`answer_request` function**: Handles user queries. When the user inputs a query, it invokes the `ask` function to generate a response and updates the chat history.
  - **Header**: Sets the header of the web app.
  - **Session state initialization**: Initializes session state variables for storing chat history, user prompts, and generated responses.
  - **Prompt input**: Creates a text area for user input. When the input changes, `answer_request` is called.
  - **Display chat history**: Iterates over the chat history and displays user queries and generated responses using the `message` function.
This file creates a user-friendly web interface for interacting with the LangChain agent, enabling real-time natural language queries on the Jira CSV data.
Conclusion
I'm not a guru in Large Language Models (LLMs) or Generative AI (GenAI); I'm just beginning to explore the vast potential these technologies offer. The project we've discussed here is a small step into the transformative world of GenAI. The possibilities are wide-ranging, from simplifying data analysis to creating intelligent agents that can understand and process natural language queries.
This project is just an example of what can be achieved, and I encourage you to experiment and build upon it. As we continue to learn and adapt, we can look forward to a future where AI plays an integral role in our development processes.
Let's embrace this transformation and explore the endless opportunities that GenAI has to offer.