DEV Community

Cover image for GenAI meets Jira: Transforming CSV Exports into Insights
rosidotidev
rosidotidev

Posted on

GenAI meets Jira: Transforming CSV Exports into Insights

As a passionate developer with many years of experience, I've witnessed the evolution of technology and its impact on the way we handle data. While traditional methods have served us well, the advent of Generative AI (GenAI) presents a transformative opportunity for data analysis. Today, I want to share my journey into this exciting new world by exploring how GenAI can revolutionize the way we interact with and analyze data exported from Jira.

Jira, a popular project management tool, is widely used to track tasks, bugs, and project progress. It offers the capability to export data into CSV files, which can then be analyzed to gain insights. However, the manual process of sifting through these CSV files can be tedious and time-consuming. This is where GenAI steps in.
Imagine being able to query your Jira exports using natural language, extracting meaningful insights effortlessly. With the combination of LangChain, OpenAI's powerful language model, and the pandas library, we can create an intelligent agent capable of interpreting and executing complex queries on Jira data exports. This not only simplifies the analysis process.
In this article, I'll walk you through how GenAI can be applied to Jira CSV exports, transforming raw data into actionable insights with ease. We'll delve into a project that leverages these cutting-edge technologies to demonstrate the potential of GenAI in revolutionizing our approach to data analysis.

While it is possible to use a Jira agent developed by LangChain, I chose to work with a CSV file that closely resembles the format of an export from Jira. This decision was made to simplify the demonstration and to avoid to connect to the Jira instance of your Company :). The data in the CSV file used for this project was generated in a nearly random manner to serve as an example.

Let's Start

To get started with this project, you'll need an OpenAI API key. You can obtain this key by subscribing to the OpenAI service on their website. Once you have your API key, save it in a .env file at the root of the project. The .env file should look like this:

OPENAI_API_KEY=your-api-key-here
Enter fullscreen mode Exit fullscreen mode

Next, you need to install the necessary libraries. You can do this using pipenv with the following command:

pipenv install pandas numpy python-dotenv langchain langchain-openai openai langchain-experimental tabulate streamlit streamlit-chat
Enter fullscreen mode Exit fullscreen mode

For this project, I used PyCharm, but you can use any other Python IDE such as VS Code or any editor of your choice.

The entire project is available in my GitHub repository langchain_pandas_agent. In this repository, under the csv_dir directory, you will find a CSV file containing tasks for a fictional eCommerce project.

Image description
The teams Baggio, Buffon, Del Piero, Mancini, Materazzi, and Zola are working on this project, and the file includes both Technical and Functional tasks recorded over three sprints.

The Main Script

The core of our project is encapsulated in the main.py script, which orchestrates the interaction between various components. Let's take a closer look at how it's structured:

from dotenv import load_dotenv
from agent_manager import getAgentExecutorChatOpenAI
from df_manager import read_df

def ask(query):
    df = read_df()
    agent_executor = getAgentExecutorChatOpenAI(df)
    response = agent_executor.invoke(query)
    return response

def start_web_app():
    from web_app import build_web_app
    build_web_app(ask=ask)

if __name__ == '__main__':
    load_dotenv()
    start_web_app()
Enter fullscreen mode Exit fullscreen mode

Explanation

  1. Imports: The script begins by importing necessary modules:

    • load_dotenv from dotenv to load environment variables.
    • getAgentExecutorChatOpenAI from agent_manager to create the agent executor.
    • read_df from df_manager to read the DataFrame.
  2. ask Function: The ask function is designed to handle the queries. It reads the DataFrame using read_df(), initializes the agent executor with getAgentExecutorChatOpenAI(df), and invokes the agent with the given query to obtain a response. This function essentially acts as the bridge between the user input and the underlying data processing logic.

  3. start_web_app Function: This function sets up the web application. It imports build_web_app from web_app and calls it, passing the ask function as a parameter. This function ensures that the web app is equipped to handle user queries through a streamlined interface.

  4. Main Execution: The script concludes with a standard Python convention: checking if the script is being run directly. If it is, it loads the environment variables using load_dotenv() and starts the web application by calling start_web_app().

Key Components

  • Web Application: The web application is built using the Streamlit framework, which allows for the creation of interactive web interfaces with minimal effort. Streamlit is ideal for data apps, providing a smooth and intuitive user experience.
  • Dataframe reader: This module contains the logic related to the reading and lightweight processing of the pandas dataframe.
  • Agent Manager: The agent_manager module contains the logic related to LangChain and OpenAI. It is responsible for creating and managing the intelligent agent that processes natural language queries and interacts with the data.

Reading the CSV: df_manager.py

The df_manager.py file contains a function to read and preprocess our CSV data from Jira.

import pandas as pd

def read_df():
    file_name = "csv_dir/Jira_Issues_Simulation_Final_With_Dates.csv"
    df = pd.read_csv(file_name, sep="|")
    df['Category'] = df['Labels'].apply(lambda x: str(x).split(',')[0].strip() if ',' in str(x) else '')
    df['Team'] = df['Labels'].apply(lambda x: str(x).split(',')[1].strip() if ',' in str(x) else '')
    return df
Enter fullscreen mode Exit fullscreen mode

It mainly uses three steps:

  1. Read the CSV: The read_df function reads the CSV file located in the csv_dir directory using pandas, with a pipe (|) as the separator.
  2. Process Labels: It extracts Category and Team from the Labels column, assuming labels are comma-separated.
  3. Return DataFrame: Finally, it returns the processed DataFrame ready for analysis.

This simple script ensures that our data is correctly loaded and preprocessed, making it ready for the natural language queries handled by our LangChain agent.

Creating the Agent: agent_manager.py

The agent_manager.py file contains the logic for creating the LangChain agent using OpenAI's GPT-3.5-turbo model.

from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI, OpenAI

def getAgentExecutorChatOpenAI(df):
    prefix = """
           Use the column Team to retrieve the team;
           technical or functional refer to the column category;
           Consider that the single row can be called ECOM;
           Consider that state refers to column Status;
           The URL of an ECOM is https://robby.com/{Issue Key}, where {Issue Key} is the value of the column "Issue Key"
          """
    agent = create_pandas_dataframe_agent(
        ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613"),
        df,
        prefix=prefix,
        verbose=True,
        return_intermediate_steps=True,
        agent_type="openai-tools",
        max_iterations=10,
        agent_executor_kwargs={"handle_parsing_errors": True}
    )
    return agent
Enter fullscreen mode Exit fullscreen mode

Even here we can describe mainly three steps:

  1. Imports: The script imports necessary modules:

    • create_pandas_dataframe_agent from langchain_experimental.agents.agent_toolkits.
    • ChatOpenAI and OpenAI from langchain_openai.
  2. getAgentExecutorChatOpenAI Function: This function creates and configures the LangChain agent.

    • Prefix Instructions: A prefix is provided to guide the agent on how to interpret certain columns in the DataFrame. I've added these prefix statements after many attempts. To be honest, I was very impressed when I was able to explain how to build the URL even id I'm not an expert on prompt engineering
    • Agent Creation: The agent is created using the create_pandas_dataframe_agent function with several parameters:
      • ChatOpenAI: The language model is set to gpt-3.5-turbo-0613 with a temperature of 0 for deterministic responses.
      • df: The DataFrame that the agent will interact with.
      • prefix: Instructions to guide the agent.
      • verbose: Enables detailed logging.
      • return_intermediate_steps: Returns intermediate steps for debugging.
      • agent_type: Specifies the type of agent as openai-tools.
      • max_iterations: Limits the number of iterations to 10.
      • agent_executor_kwargs: Additional arguments to handle parsing errors gracefully.
  3. Return Agent: Finally, the configured agent is returned for use in querying the DataFrame.

This setup equips our LangChain agent with the ability to process natural language queries on the Jira CSV data, leveraging the power of OpenAI's model. It's worth describing that in this case we've used GPT-3.5-turbo model, but with few changes it is possible to switch to a local llm or to a different llm (ie: Google Gemini)

Building the Web Application: web_app.py

The web_app.py file sets up a web interface using Streamlit, allowing users to interact with the LangChain agent through natural language queries.

import streamlit as st
from streamlit_chat import message
import uuid

def build_web_app(ask):
    def answer_request():
        if st.session_state.text_input:
            with st.spinner("Generating response..."):
                generated_response = ask(st.session_state.text_input)
                st.session_state.chat_history.insert(0, (st.session_state.text_input, generated_response["output"]))
                st.session_state.user_prompt_history.insert(0, st.session_state.text_input)
                st.session_state.chat_answers_history.insert(0, generated_response['output'])
            st.session_state.text_input=''

    st.header("LangChain🦜🔗 Jira facilitator example")
    if (
            "chat_answers_history" not in st.session_state
            and "user_prompt_history" not in st.session_state
            and "chat_history" not in st.session_state
            and "text_input" not in st.session_state
    ):
        st.session_state["chat_answers_history"] = []
        st.session_state["user_prompt_history"] = []
        st.session_state["chat_history"] = []
        st.session_state.text_input=''

    prompt = st.text_area("Prompt", key='text_input', on_change=answer_request)
    if st.session_state["chat_answers_history"]:

        for generated_response, user_query in zip(
                st.session_state["chat_answers_history"],
                st.session_state["user_prompt_history"],
        ):
            message(
                user_query,
                is_user=True,
                key=str(uuid.uuid4())
            )
            message(
                generated_response,
                is_user=False,
                key=str(uuid.uuid4())
            )
Enter fullscreen mode Exit fullscreen mode

Two main points here:

  1. Imports: The script imports necessary modules:

    • streamlit as st for creating the web interface.
    • message from streamlit_chat for chat message formatting.
    • uuid for generating unique keys for chat messages.
  2. build_web_app Function: This function sets up the Streamlit web application.

    • answer_request Function: Handles user queries. When the user inputs a query, it invokes the ask function to generate a response and updates the chat history.
    • Header: Sets the header of the web app.
    • Session State Initialization: Initializes session state variables for storing chat history, user prompts, and generated responses.
    • Prompt Input: Creates a text area for user input. When the input changes, answer_request is called.
    • Display Chat History: Iterates over the chat history and displays user queries and generated responses using the message function.

This file creates a user-friendly web interface for interacting with the LangChain agent, enabling real-time natural language queries on the Jira CSV data.

Conclusion

I'm not a guru in Large Language Models (LLM) or Generative AI (GenAI), I am just beginning to explore the vast potential these technologies offer. The project we've discussed here is a small step into the transformative world of GenAI. The possibilities are truly endless, from simplifying data analysis to creating intelligent agents that can understand and process natural language queries.

This project is just an example of what can be achieved, and I encourage you to experiment and build upon it. As we continue to learn and adapt, we can look forward to a future where AI plays an integral role in our development processes.

Let's embrace this transformation and explore the endless opportunities that GenAI has to offer.

Top comments (0)