DEV Community

Andrew Panfilov

LLM pipeline for marketing research insights

This article was originally published on LinkedIn: https://www.linkedin.com/pulse/llm-pipeline-marketing-research-insights-andrew-panfilov-gt1bf/

The past year marked a significant evolution in the marketing research landscape. The advent of widely available consumer-grade Large Language Models (LLMs) has transformed the traditional dichotomy between quantitative and qualitative research methods into a more integrated approach. This new paradigm leverages chatbots to interact with participants, eliciting insights about, for example, their brand preferences. This shift has led to a more nuanced type of research, blurring the lines between quantitative and qualitative methodologies.

High-level representation of the Conversational AI pipeline for marketing research:

This innovative approach can be conceptualized as a four-phase process:

  1. Chatbot AI and Survey Integration: This initial phase involves setting up an AI-driven chatbot conversation, sometimes integrated with a traditional survey framework.
    • Transition to Phase 2: Once the conversation flow is thoroughly tested, it's published to become accessible to respondents.
  2. Engagement through Conversations: In this phase, the chatbot engages in conversations with respondents.
    • Transition to Phase 3: The conversation sessions are concluded once the required respondent data is gathered.
  3. Data Processing and LLM Analysis: This critical stage involves cleansing and analyzing the collected data using LLMs.
    • Transition to Phase 4: This phase concludes once the data analysis is complete.
  4. Visual Reporting: The final phase focuses on creating visual reports that effectively communicate the insights derived from the analysis.
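The four phases above can be sketched as a simple linear state machine. This is an illustrative model only; the phase names and transition conditions are taken from the list above, not from any specific framework.

```python
from enum import Enum

class Phase(Enum):
    DESIGN = 1       # Chatbot AI and survey integration
    ENGAGEMENT = 2   # Conversations with respondents
    ANALYSIS = 3     # Data cleansing and LLM analysis
    REPORTING = 4    # Visual reporting

# Each transition is gated by the condition noted in the list above.
TRANSITIONS = {
    Phase.DESIGN: (Phase.ENGAGEMENT, "conversation flow tested and published"),
    Phase.ENGAGEMENT: (Phase.ANALYSIS, "required respondent data gathered"),
    Phase.ANALYSIS: (Phase.REPORTING, "data analysis complete"),
}

def next_phase(phase: Phase) -> Phase:
    """Advance to the next phase; REPORTING is terminal."""
    nxt = TRANSITIONS.get(phase)
    return nxt[0] if nxt else phase
```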

In the first stage, researchers develop a finite-state machine to guide the chatbot's conversations. This involves creating a series of prompts, including statements, questions, and follow-up questions used by the LLM during interactions with respondents. In this phase, the researcher employs a method akin to the Read-Eval-Print Loop (REPL): after any modification to the dialogue descriptors, the researcher can observe and assess the chatbot's behaviour in real time during a conversation. This stage is crucial for quality assurance, ensuring that the chatbot's conversations are relevant and error-free. Additionally, the LLM can be utilized for tasks like translating content into various languages.
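One way the dialogue descriptors might look is as a mapping from named states to prompts, with a small dry-run helper playing the role of the REPL. This is a minimal sketch under assumed names (`DialogueState`, `FLOW`, `run_repl` are all hypothetical), not the article's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogueState:
    prompt: str                      # statement/question the LLM builds on
    next_state: Optional[str] = None  # None marks the end of the conversation

# A tiny brand-preference flow; real descriptors would also carry
# follow-up logic and branching conditions.
FLOW = {
    "intro": DialogueState("Greet the respondent and ask which brands they use.", "why"),
    "why": DialogueState("Ask a follow-up about why they prefer those brands.", "wrapup"),
    "wrapup": DialogueState("Thank the respondent and close the conversation.", None),
}

def run_repl(flow, start="intro"):
    """REPL-style dry run: walk the flow and collect the prompts in order,
    so the researcher can inspect the conversation skeleton after each edit."""
    prompts, state = [], start
    while state is not None:
        prompts.append(flow[state].prompt)
        state = flow[state].next_state
    return prompts
```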

Respondents engage with the chatbot during the second phase, with each conversation driven by the finite-state machine established in the first stage. The dialogue elements are fed into the LLM to generate subsequent statements or questions. A key requirement here is a responsive LLM with low latency (ideally a few seconds or less) to prevent long waits for the chatbot's responses. While the OpenAI API may occasionally fail with 502 and 503 HTTP status codes or connection timeouts, its practical value in production for marketing research remains significant, especially since failed respondent interactions incur no cost.
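Handling those transient 502/503 errors and timeouts typically comes down to a retry wrapper with exponential backoff. The sketch below assumes a generic `call` standing in for whatever client method produces the next chatbot turn; `TransientAPIError` is a hypothetical placeholder for the client library's retryable exceptions.

```python
import random
import time

class TransientAPIError(Exception):
    """Stand-in for retryable failures: 502/503 responses, timeouts."""

def with_retries(call, max_attempts=4, base_delay=0.5):
    """Invoke `call`, retrying transient failures with exponential
    backoff plus jitter; re-raise once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise
            # Back off 0.5s, 1s, 2s, ... plus jitter, keeping waits short
            # so the respondent is not left staring at a silent chatbot.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```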

The third stage involves processing the collected chatbot-respondent dialogues with the LLM, following the removal of Personally Identifiable Information (PII). This processing is done in batches to avoid exceeding the LLM's token limits. The LLM's role here is to perform tasks such as categorization, tagging, entity extraction, and sentiment analysis, all essential for deriving meaningful insights from the data. This analysis can be time-consuming, requiring tens of minutes to hours. The system is designed to distribute the dialogues evenly to prevent overloading the LLM API, and includes a retry mechanism to handle potential API issues like 5XX HTTP status codes or timeouts. The LLM may also be employed to validate the results of data aggregation.
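The PII scrub and token-aware batching could be sketched as follows. The regexes and the ~4-characters-per-token estimate are rough heuristics of my own choosing, not the article's implementation; a production system would use a dedicated PII-detection tool and the model's real tokenizer.

```python
import re

def scrub_pii(text: str) -> str:
    """Naively mask e-mail addresses and phone-like digit runs
    before transcripts are sent to the LLM."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    text = re.sub(r"\+?\d[\d\s()-]{7,}\d", "[PHONE]", text)
    return text

def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token)."""
    return max(1, len(text) // 4)

def batch_dialogues(dialogues, max_tokens=3000):
    """Greedily pack scrubbed dialogues into batches that stay
    under the model's token budget."""
    batches, current, used = [], [], 0
    for d in map(scrub_pii, dialogues):
        cost = estimate_tokens(d)
        if current and used + cost > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(d)
        used += cost
    if current:
        batches.append(current)
    return batches
```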

In the final stage, the LLM is not utilized. Instead, the aggregated data, now representing valuable marketing insights, is presented in an easily understandable format on a dashboard. This stage focuses on visualizing the insights and offers options to export the data as PDF or PowerPoint reports, facilitating further analysis and presentation.
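As an illustration of this LLM-free final stage, the sentiment tags produced in Phase 3 might be rolled up into percentages that a dashboard widget or a PDF/PowerPoint export can render directly. The function name and input shape are assumptions for the sketch.

```python
from collections import Counter

def sentiment_breakdown(tagged_dialogues):
    """tagged_dialogues: iterable of (dialogue_id, sentiment) pairs.
    Returns sentiment -> percentage of dialogues, most common first."""
    counts = Counter(sentiment for _, sentiment in tagged_dialogues)
    total = sum(counts.values())
    return {s: round(100 * n / total, 1) for s, n in counts.most_common()}
```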

Components diagram for Phases 1, 3, and 4:

Components diagram for Phase 2:

The diagrams illustrate that the LLM is utilized as an opaque entity accessed through an API. In practical applications, especially for marketing research, there is no requirement for fine-tuning the LLM. Currently, gpt-3.5-turbo is the preferred option due to its optimal combination of cost-effectiveness and features. It's important to note that the system represented in these diagrams is a simplified version for the sake of clarity. A real-world implementation would include additional deployable artifacts and specific characteristics. Despite these simplifications, the basic concept remains easily comprehensible.
