Mike Young

Posted on • Originally published at notes.aimodels.fyi

A Beginner's Guide to Unstructured Data Analysis with LangChain and DeepInfra

Hey there, startup founders and developers! In today's digital age, making decisions based solely on intuition is no longer enough for businesses to thrive. The key to success lies in data-driven insights, which makes the process of data analysis and interpretation crucial for strategic decision-making.

That's where LangChain comes into the picture: a framework for building language-model applications that are data-aware (connected to other sources of data) and agentic (able to interact with their environment). When combined with DeepInfra's hosted inference API, LangChain becomes a potent tool for extracting insights from both structured and unstructured data, helping businesses chart their path to growth.

In this post, I'll guide you through using LangChain and DeepInfra for unstructured data analysis. We'll explore their capabilities, understand the importance of data-driven decisions, and learn how to extract valuable insights from structured and unstructured data. Get ready to uncover hidden patterns and make informed choices using these powerful tools. Let's dive in!

What is DeepInfra?

DeepInfra is a powerful machine learning platform that offers fast and scalable inference for top AI models. With its simple API, you can easily run AI models and pay only for what you use. It provides a low-cost, production-ready infrastructure that allows you to turn models into scalable APIs with just a few clicks. DeepInfra is designed to be a self-serve platform, making it easy for developers to deploy their machine learning models and benefit from its efficient and cost-effective infrastructure.

Understanding the Magic of LangChain for Data Analysis

The true power of LangChain lies in its ability to unlock valuable insights from both structured and unstructured data. Now, structured data is already organized in a way that machines can easily understand. However, unstructured data, like social media posts, text documents, and customer reviews, is a bit trickier to handle because it lacks inherent organization. Yet, this type of data often holds a goldmine of untapped insights just waiting to be discovered and used for strategic decision-making.

Let's take the example of a collection of customer reviews, full of unstructured yet vital data. LangChain can pass this text to a language model, which can then perform sentiment analysis and summarize customer attitudes towards a product or service. Similarly, by analyzing social media posts in the same way, businesses can identify emerging trends and align their strategies with current market dynamics.
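
To make the customer-review example concrete, here is a minimal sketch of how a review might be turned into a sentiment prompt. The prompt wording, the `classify_sentiment` helper, and the `fake_llm` stand-in are all assumptions made for illustration. In practice you would pass a real model as the `llm` argument, meaning any callable that maps a prompt string to a completion string.

```python
from typing import Callable

# Hypothetical prompt wording; adjust it to suit your model.
SENTIMENT_PROMPT = (
    "Classify the sentiment of the following customer review as "
    "positive, negative, or neutral.\n\n"
    "Review: {review}\n"
    "Sentiment:"
)


def classify_sentiment(review: str, llm: Callable[[str], str]) -> str:
    """Fill the prompt with one review and return the model's normalized reply."""
    prompt = SENTIMENT_PROMPT.format(review=review.strip())
    return llm(prompt).strip().lower()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline (not a real classifier)."""
    return " Positive " if "love" in prompt.lower() else " Negative "


if __name__ == "__main__":
    for review in ["I love this product!", "It broke after two days."]:
        print(review, "->", classify_sentiment(review, fake_llm))
```

Keeping the model behind a plain callable makes it easy to swap the stand-in for a hosted model later without touching the prompt logic.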

But LangChain isn't limited to unstructured data. It's equally effective at analyzing structured data. For instance, it can be used to analyze sales data and uncover trends over time, find top-selling products, or spot patterns in customer buying behavior. However, in this guide, we'll focus primarily on unstructured data and how LangChain, with the help of the FLAN-T5 model, handles it.

Using the FLAN-T5 Model to Analyze the Data

The FLAN-T5 model is an instruction-tuned version of Google's T5 language model. It has been fine-tuned on a diverse array of over a thousand tasks and performs well across a range of benchmarks. Its authors report strong few-shot performance, meaning it learns from only a handful of examples, even when compared with much larger models.

What's more, the FLAN-T5 model isn't just efficient; it's also versatile in terms of language support. It can handle a wide range of languages, from commonly spoken ones like English, Spanish, French, and German to lesser-known languages such as Yoruba, Kurdish, and Zhuang. However, it's important to exercise caution when using FLAN-T5, or any AI model for that matter, as it does have its limitations, which you can read about here.

Step-by-Step Guide: Using LangChain for Data Analysis with DeepInfra

Now that we have a good understanding of LangChain and the FLAN-T5 model, let's dive into how we can leverage them for data analysis by using DeepInfra. The following is a step-by-step guide to analyze an example file with unstructured data, in this case, the State of the Union address. You can find the file we’ll be evaluating here.

Setting up your environment

To get started, you need to import the necessary libraries and set up your DeepInfra API token. The code below prompts for the token with getpass, so you never have to write it into the script itself. Here's the code:

from langchain import ConversationChain, LLMChain, PromptTemplate
from langchain.llms import DeepInfra
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains.question_answering import load_qa_chain
from getpass import getpass
import os

# Prompt for the token without echoing it to the terminal
DEEPINFRA_API_TOKEN = getpass()
# Store the entered token where the DeepInfra wrapper looks for it
os.environ["DEEPINFRA_API_TOKEN"] = DEEPINFRA_API_TOKEN
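
If you'd rather not type the token on every run, one option is to reuse a value that is already present in the environment and only prompt when it is missing. The `ensure_token` helper below is a hypothetical convenience, not part of LangChain or DeepInfra. It relies only on the `DEEPINFRA_API_TOKEN` variable name used above.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_token(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    name: str = "DEEPINFRA_API_TOKEN",
) -> str:
    """Return the token from env, prompting for it (and storing it) only if absent."""
    token = env.get(name, "").strip()
    if not token:
        token = prompt(f"Enter {name}: ").strip()
        if not token:
            raise ValueError(f"{name} must not be empty")
        env[name] = token
    return token
```

Calling `ensure_token()` with no arguments works against the real environment. The `env` and `prompt` parameters exist so the helper can be exercised with a plain dictionary and a fake prompt.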

Create the DeepInfra instance

For this demonstration, we'll be using the 'google/flan-t5-xl' model. Here's the code you need - so short!

llm = DeepInfra(model_id="google/flan-t5-xl")

Load your documents

You can load your unstructured data text files into LangChain. In this example, we're using a file named 'state_of_the_union.txt'. Here's the code:

loader = TextLoader('./state_of_the_union.txt')
docs = loader.load()
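
Language models accept only a limited amount of input, so a very long file passed in a single prompt may be truncated or rejected. A common workaround is to split the text into overlapping chunks first. The sketch below does this in plain Python to show the idea. `split_text` and its default sizes are assumptions for illustration, and LangChain ships its own text splitter classes that you would more likely use in practice.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

With the loader above, you could call `split_text(docs[0].page_content)` to chunk the loaded speech. Character counts are only a rough proxy for the model's token limit, so treat the sizes as a starting point.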

Query your data

Now you can ask questions about the loaded documents. For instance, if you want to know what the speech in the 'state_of_the_union.txt' file says about freedom, you would define a query like this:

query = "What did the president say about freedom?"

Run the question answering chain

Finally, run the question answering chain using the loaded documents and your query. Here's the code:

chain = load_qa_chain(llm)
output = chain.run(input_documents=docs, question=query)
print(output)

What output do you get? Here’s what I got:

freedom will always triumph over tyranny
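
Conceptually, a question-answering chain of this kind combines the document text and your question into a single prompt for the model. The sketch below illustrates that idea in plain Python. The prompt wording and the `build_qa_prompt` and `answer` helpers are assumptions for illustration, not LangChain's actual template or API, and `echo_llm` is a stand-in so the code runs without network access.

```python
from typing import Callable, List


def build_qa_prompt(passages: List[str], question: str) -> str:
    """Join the non-empty passages into one context block and append the question."""
    context = "\n\n".join(p.strip() for p in passages if p.strip())
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(passages: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Send the combined prompt to the model and return its trimmed reply."""
    return llm(build_qa_prompt(passages, question)).strip()


def echo_llm(prompt: str) -> str:
    """Stand-in model: reports the prompt length instead of answering."""
    return f"(received {len(prompt)} characters)"
```

Seeing the prompt as one string also explains why the model's reply can be short: it completes the text after "Answer:" using whatever context fit into its input.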

Resources and Examples

To dive deeper into data analysis using LangChain and DeepInfra, here are some resources worth exploring:

  1. LangChain’s guide to question-answering over docs

  2. The QA conceptual guide

  3. Introduction to LangChain Use Cases with DeepInfra

  4. Question Answering and Document Analysis with LangChain and DeepInfra

  5. Building a Customer Support Chatbot with LangChain and DeepInfra: A Step-by-Step Guide

Conclusion

In conclusion, LangChain and DeepInfra provide startups with powerful tools for data analysis. By leveraging LangChain's data-aware and agentic framework along with DeepInfra's scalable infrastructure, businesses can extract valuable insights from structured and unstructured data to drive informed decision-making.

Embrace the power of LangChain and DeepInfra to extract insights from data. Have fun!
