Akshay Keerthi
Multifile Analyzer: A Guide to Efficient Data Analysis with Lyzr SDK

In today’s data-driven world, businesses and individuals alike are inundated with vast amounts of data. However, extracting meaningful insights from it can be a daunting task without the right tools and methodologies. In this blog post, we’ll explore how the Lyzr SDK helps in building a comprehensive data analysis tool that streamlines the process of deriving insights from diverse datasets.

The Multifile Analyzer offers a precise, streamlined approach to data analysis across multiple file formats, including CSV, Excel, and JSON.

What sets the Multifile Analyzer apart is its ability to integrate seamlessly across formats. Whether dealing with CSV, Excel, JSON, or a mix of all three, users get a unified platform for analysis, consolidating their efforts and gaining a comprehensive view of their data landscape. This versatility enhances decision-making and drives meaningful outcomes for users.

Why use the Lyzr SDK?

With the Lyzr SDK, crafting your own GenAI application is a breeze, requiring only a few lines of code to get up and running.

Check out the Lyzr SDKs

Let's get started!
Create a new file, app.py, and add the following imports:

import os
from pathlib import Path
import streamlit as st
from utils import utils
from lyzr import DataConnector, DataAnalyzr
import pandas as pd

This snippet imports the libraries the app relies on: os for interacting with the operating system, pathlib for handling file paths, streamlit for building the interactive web interface, utils for the app's custom helper functions, lyzr's DataConnector and DataAnalyzr for loading and analyzing data, and pandas for reading JSON files.

Next, set up the OpenAI API key using Streamlit's secrets management. Replace "apikey" with the key name under which your OpenAI API key is stored in your Streamlit secrets.

# Set OpenAI API key
os.environ["OPENAI_API_KEY"] = st.secrets["apikey"]
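For reference, the secret itself lives in Streamlit's secrets file. A minimal sketch, assuming the key name apikey used above (keep this file out of version control):

```toml
# .streamlit/secrets.toml -- hypothetical layout; the key value is a placeholder
apikey = "sk-..."
```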
def data_uploader():
    st.subheader("Upload your files here")

    # Directories for uploaded data and generated plots
    # (adjust these paths to match your project layout)
    data = "data"
    plot = "plots"

    # Dictionary to map file types to their respective extensions
    file_types = {"CSV": ["csv"], "Excel": ["xlsx", "xls"], "JSON": ["json"]}

    # File type selection
    file_type = st.radio("Select file type:", list(file_types.keys()))

    # Upload a file of the selected type
    uploaded_file = st.file_uploader(f"Choose {file_type} file", type=file_types[file_type])

    # Save the uploaded file, or clear stale files if nothing is uploaded
    if uploaded_file is not None:
        utils.save_uploaded_file(uploaded_file)
    else:
        utils.remove_existing_files(data)
        utils.remove_existing_files(plot)

The data_uploader function provides a streamlined Streamlit interface for uploading CSV, Excel, or JSON files. After selecting a file type, users upload a file, which is saved via the save_uploaded_file utility; if no file is uploaded, any existing data and plot files are removed. Note that data and plot are paths to the directories holding uploaded data and generated plots, so make sure they point to the correct locations in your project.

def analyzr():
    # Get the list of files in the data directory
    # (file_checker is a helper defined elsewhere in the app)
    files = file_checker()

    # Check if any files are available
    if len(files) > 0:
        # Assuming the first file in the list is the desired file
        file_path = files[0]

        # Determine file extension
        file_extension = Path(file_path).suffix.lower()

        # Load data based on file type
        if file_extension == '.csv':
            dataframe = DataConnector().fetch_dataframe_from_csv(file_path=Path(file_path))
        elif file_extension in ('.xlsx', '.xls'):
            dataframe = DataConnector().fetch_dataframe_from_excel(file_path=Path(file_path))
        elif file_extension == '.json':
            dataframe = pd.read_json(file_path)  # Load JSON file using pandas
        else:
            st.error("Unsupported file format. Please upload a CSV, Excel, or JSON file.")
            return None

        # Initialize DataAnalyzr instance
        analyzr_instance = DataAnalyzr(df=dataframe, api_key=st.secrets["apikey"])
        return analyzr_instance
    else:
        st.error("Please upload a CSV, Excel, or JSON file.")
        return None

The analyzr function checks the data directory for files and loads the first one found, assuming it to be the desired file. It determines the file's type from its extension and loads the data with the appropriate method: Lyzr's DataConnector for CSV and Excel, and pandas for JSON. If the format is unsupported, it displays an error message and returns None. On successful loading, it initializes a DataAnalyzr instance with the DataFrame and the OpenAI API key and returns it for further analysis; if no files are found, it prompts the user to upload one.
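The same extension-based dispatch can be exercised outside Streamlit with plain pandas (substituting pandas readers for Lyzr's DataConnector; function name and structure here are a sketch, not the SDK's API):

```python
from pathlib import Path
import pandas as pd

def load_dataframe(file_path):
    """Load a CSV, Excel, or JSON file into a pandas DataFrame
    based on its extension, raising on unsupported formats."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(file_path)
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(file_path)
    if suffix == ".json":
        return pd.read_json(file_path)
    raise ValueError(f"Unsupported file format: {suffix}")
```

Normalizing the suffix with .lower() means files named DATA.CSV or report.Xlsx are still matched correctly.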

# Function to display the dataset description
def display_description(analyzr):
    description = analyzr.dataset_description()
    if description is not None:
        st.subheader("Dataset Description:")
        st.write(description)

The display_description function showcases a description of the dataset analyzed by the provided analyzr instance. It calls the instance's dataset_description method and, if a description is returned, displays it under a "Dataset Description" subheader using Streamlit's st.subheader and st.write functions, giving users a clear overview of the dataset being analyzed.

# Function to display queries
def display_queries(analyzr):
    queries = analyzr.ai_queries_df()
    if queries is not None:
        st.subheader("These Queries you can run on the data:")
        st.write(queries)

The display_queries function surfaces a set of AI-generated queries that can be executed on the dataset. It retrieves a DataFrame of suggested queries via the instance's ai_queries_df method and, if any are available, displays them under the subheader "These Queries you can run on the data" using st.subheader and st.write, giving users a list of predefined queries for further analysis or exploration.

In a world where data comes in myriad formats, the Multifile Analyzer emerges as a game-changer, empowering users to extract insights from CSV, Excel, and JSON files with unparalleled ease and efficiency. By streamlining the analysis process across diverse formats, the Multifile Analyzer heralds a new era of data-driven decision-making, enabling organizations and individuals to make informed choices and drive meaningful outcomes.

Watch the tutorial : https://www.youtube.com/watch?v=aayfOIqwwwk

References
Lyzr Website: Lyzr
Book a Demo: Demo
Lyzr Community Channels: Discord, Slack
