
Jag Thind for Super Payments


How We Use OpenAI and Gemini Batch APIs to Qualify Thousands of Sales Leads

This blog details how the Data team used AI to solve a specific problem for our Marketing and Sales teams: qualify 3,000 websites (Salesforce Accounts) to determine whether they are ecommerce and can take payments.

It is broken down into:

  • The problem at hand and what we are trying to solve
  • Process design
  • Why use LLMs from 2 AI providers
  • Prompt engineering and using prompt templates
  • Scaling up using the OpenAI Batch API and Google Batch Predictions

TL;DR

We implemented a batch data enrichment pipeline that uses OpenAI and Gemini Large Language Models (LLMs) via the OpenAI Batch API and Google Batch Predictions, a cost-effective way to enrich data at scale with the power of LLMs.

To ensure maximum accuracy and minimise the effects of hallucinations from the LLMs, we use a simple consensus system: each website is checked three times by each AI, and only results where they agree are accepted. Yes, this makes it more expensive, but we optimised for time to value and getting good leads into the hands of the Sales team.

We used a prompt template configured to use the web search tool to ground the LLM with real-time information about the website, overcoming the model's static knowledge cutoff date.

We trained the Marketing team in writing effective prompts for the LLMs before we scaled up using the batch mode.

It is a great example of tech and the business working together to achieve a shared outcome while spreading the use of AI across the business.


Problem at Hand

The Marketing team periodically builds lists of potential merchants that can integrate Super as a payment method on their website checkout. These leads are then provided to Account Executives (AEs) to sign up.

When assigned a website, the first thing AEs do is manually double-check whether the website is ecommerce:

  • Can you buy products on the website?
  • Is there a checkout on the website?
  • Does it accept card payments?

⚠️ Many websites were not ecommerce ⚠️, resulting in:

  • AEs wasting time doing manual checking
  • Many leads getting disqualified at the top of the sales funnel
  • AEs getting frustrated with leads they were assigned
  • AEs resorting to self-sourcing leads and taking them away from their core responsibilities of closing deals

What are we trying to solve?

Questions we asked ourselves:

  • Can we increase the number of leads at the top of the sales funnel?
  • Can we automate the "is ecommerce" check instead of manually qualifying each website?
  • Can we scale this check across N (hundreds/thousands) websites?

Process Design

Before we dive into the details of prompt engineering and how the batch pipeline works, the diagrams below illustrate the process and its two parts.

Process Design 1

Process Design 2


Why use LLMs from 2 AI Providers?

Even though it was more costly to do so, we needed to be confident in the accuracy of what we were telling the AEs in the Sales team. Instead of relying on a single AI, we used LLMs from two different AI providers, then based our final decision on their consensus.

Think of it like getting a second opinion from a trusted expert. If two independent specialists examine the same data and come to the same conclusion, your confidence in that outcome increases dramatically.

Some benefits include:

  • Accuracy Through Consensus: The core of our strategy is built on consensus. An ecommerce qualification is only confirmed if both LLMs independently agree. This simple rule acts as a powerful filter, significantly reducing the risk of a single LLM making a mistake, hallucinating, or misinterpreting a site.
  • Mitigating Model-Specific Weaknesses: Every LLM has its own unique architecture, training data, and inherent biases. One LLM might be brilliant at identifying traditional retail sites but struggle with subscription services, while the other might have the opposite strengths. Using a single LLM means you also inherit all of its blind spots. By using two, we diversify our "cognitive portfolio," allowing the strengths of one LLM to compensate for the weaknesses of the other, leading to a more balanced and consistently accurate outcome.
  • Automatic Quality Control: Perhaps the most valuable benefit is what happens when the LLMs disagree. A disagreement is a critical signal. It tells us that a website is ambiguous, an edge case, or complex in a way that could have easily fooled a single AI. Our system automatically flags these disagreements for manual review (a sketch of the consensus rule follows below).
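
To make the rule concrete, here is a minimal sketch of the consensus check, assuming each model's three runs have already been collected into lists of "Y"/"N" answers. All names here are illustrative, not our production code:

from collections import Counter

def majority(answers: list[str]) -> str | None:
    """Return the majority answer across a model's runs, or None on a tie."""
    top, top_count = Counter(answers).most_common(1)[0]
    return top if top_count > len(answers) / 2 else None

def consensus(openai_runs: list[str], gemini_runs: list[str]) -> str:
    """Accept a verdict only when both providers' majority answers agree."""
    openai_verdict = majority(openai_runs)   # e.g. ["Y", "Y", "N"] -> "Y"
    gemini_verdict = majority(gemini_runs)
    if openai_verdict is not None and openai_verdict == gemini_verdict:
        return openai_verdict                # confirmed "Y" or "N"
    return "REVIEW"                          # disagreement -> manual review

print(consensus(["Y", "Y", "Y"], ["Y", "N", "Y"]))  # -> "Y"
print(consensus(["Y", "Y", "N"], ["N", "N", "N"]))  # -> "REVIEW"

Anything returned as REVIEW goes to a human instead of straight into the Sales queue.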

Prompt Engineering

Prompt engineering is the process of writing effective instructions for an LLM, such that it consistently generates content that meets your requirements.

We used the OpenAI developer platform to iteratively develop a reusable prompt template that could be used in the responses API. The platform allows testing different versions of a prompt side-by-side to evaluate changes.

Advantages of using a Prompt Template

You can use variables via {{placeholder}} syntax, and your integration code remains the same, e.g.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.responses.create(
    model="gpt-4.1",
    prompt={
        "id": "pmpt_abc123",
        "version": "2",
        "variables": {
            "website_url": "xyz.com"
        }
    }
)

You can also configure the prompt to use the web search tool to allow the LLM to search the web for the latest information before generating a response:

{
    "type": "web_search_preview",
    "user_location": {
        "type": "approximate",
        "country": "GB"
    },
    "search_context_size": "high"
}
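In our pipeline the tool was configured on the saved prompt in the platform, but the same configuration can also be passed per request. A minimal sketch, where the model name and the question are placeholders:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Sketch: attaching the web search tool per request instead of on the saved prompt.
response = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "web_search_preview",
        "user_location": {"type": "approximate", "country": "GB"},
        "search_context_size": "high",
    }],
    input="Is xyz.com an ecommerce website? Answer Y or N.",
)
print(response.output_text)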

Prompt Template

The Marketing team produced a prompt template that had clear instructions for the LLM to check if a single website URL is ecommerce.

Please research the website {{url}} provided by the user. You must only return the data requested in the "InformationRequested" section and in a format according to the "OutputFormat" section. Do not include any explanations, reasoning, or commentary.

## InformationRequested
- url: {{url}}
- is_url_valid: Y/N — Is the URL valid and accessible?
- is_ecommerce: Y/N — You MUST use the rules from the section "Evaluation Rules for column is_ecommerce"

## OutputFormat
Output as JSON with the following fields. Do not include markdown around the JSON:
- url
- is_url_valid
- is_ecommerce

## Evaluation Rules for column is_ecommerce

*Mark "Y" only if all of the following are true, based on explicit evidence available*:
* rule 1
* rule 2
* etc

*Mark "N" in any of the following cases*:
* rule 1
* rule 2
* etc

## Final Reminder

- You must only return the data requested in the "InformationRequested" section.
- You must only return it in the format according to the "OutputFormat" section.
- You must not include any explanations, reasoning, or commentary.
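Because the prompt pins the output to bare JSON with fixed fields, each response can be parsed and validated mechanically before it enters the consensus check. A minimal sketch, where the helper and its error handling are illustrative and the field names follow the template above:

import json

EXPECTED_FIELDS = {"url", "is_url_valid", "is_ecommerce"}

def parse_result(raw: str) -> dict:
    """Parse one model response and enforce the contract from the prompt."""
    result = json.loads(raw)  # raises ValueError if the model added commentary
    missing = EXPECTED_FIELDS - result.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for field in ("is_url_valid", "is_ecommerce"):
        if result[field] not in ("Y", "N"):
            raise ValueError(f"{field} must be Y or N, got {result[field]!r}")
    return result

print(parse_result('{"url": "xyz.com", "is_url_valid": "Y", "is_ecommerce": "Y"}'))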

Scaling it up - OpenAI Batch API

OpenAI has a Batch API that allows you to send asynchronous groups of requests with 50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time. The workflow is:

OpenAI Batch API Workflow

The uploaded batch file containing the requests has one line per website, as below:

{"custom_id": "request-[1756480801.159196]-xyz.com", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1", "input": "Run the following prompt", "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": "xyz.com"}}}}
{"custom_id": "request-[1756480802.1434196]-abc.com", "method": "POST", "url": "/v1/responses", "body": {"model": "gpt-4.1", "input": "Run the following prompt", "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": "abc.com"}}}}
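As a rough illustration, such a file can be generated and submitted in a few lines. This sketch uses the documented upload-then-create flow of the Batch API; the prompt ID, file name, and website list are placeholders:

import json
import time

from openai import OpenAI

client = OpenAI()
websites = ["xyz.com", "abc.com"]

# Write one request per line, mirroring the JSONL format above.
with open("batch_requests.jsonl", "w") as f:
    for url in websites:
        request = {
            "custom_id": f"request-[{time.time()}]-{url}",
            "method": "POST",
            "url": "/v1/responses",
            "body": {
                "model": "gpt-4.1",
                "input": "Run the following prompt",
                "prompt": {"id": "pmpt_XXX", "version": "2", "variables": {"url": url}},
            },
        }
        f.write(json.dumps(request) + "\n")

# Upload the file, then create the batch job against the responses endpoint.
batch_file = client.files.create(file=open("batch_requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/responses",
    completion_window="24h",
)
print(batch.id, batch.status)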

The benefits of this are:

  • Significant Cost Reduction: The 50% discount on pricing is a major advantage for processing thousands of URLs, leading to substantial cost savings compared to using the real-time API.
  • Increased Throughput: The much higher rate limits allow for processing a large volume of requests in parallel, drastically reducing the overall time it takes to enrich a large dataset.
  • Asynchronous "Fire-and-Forget" Workflow: You can submit a large batch job and not have to wait for it to complete. This is perfect for non-time-sensitive, offline processing tasks, as you can retrieve the results later without keeping a connection open.
  • Simplified Client-Side Logic: It removes the need for you to build and maintain complex logic to handle rate limiting, concurrent requests, and retries. You simply prepare and upload a file.
  • Enhanced Resilience and Error Handling: Since requests are independent, the success or failure of one doesn't impact others. The output file clearly indicates the status of each request, making it easy to identify and retry only the failed ones (see the retrieval sketch below).
  • Up to date context: The prompt template is configured to use the web search tool to ground the LLM with real-time information about the website. This search is performed independently for each website.
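
Once the job completes, the output file contains one result line per request, keyed by the custom_id from the input file. A minimal sketch of retrieving it and splitting successes from failures, where the batch ID is illustrative:

import json

from openai import OpenAI

client = OpenAI()

batch = client.batches.retrieve("batch_abc123")  # illustrative batch ID
if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    succeeded, failed = [], []
    for line in output.text.splitlines():
        result = json.loads(line)
        # Each line carries the custom_id plus the per-request response or error.
        if result.get("error") or result["response"]["status_code"] != 200:
            failed.append(result["custom_id"])   # candidates for a retry batch
        else:
            succeeded.append(result)
    print(f"{len(succeeded)} succeeded, {len(failed)} to retry")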

Scaling it up - Google Batch Predictions

Google Batch Predictions also allows you to generate predictions from Gemini models using a batch job. The workflow is:

Google Batch Predictions Workflow

Similar to OpenAI, the batch job file contains one request per line, but you cannot use a prompt template, so each request in the file carries the full prompt. Also, Gemini's web search tools are not available via Batch Predictions, but we still found the results to be accurate.
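
Since there is no template, we render the full prompt into every request before writing the file. A minimal sketch of building the input file, assuming the JSONL shape used by Vertex AI batch prediction for Gemini; PROMPT_TEMPLATE and the URLs are illustrative:

import json

# Illustrative: the full prompt text, with the URL substituted in by our code
# since Batch Predictions has no prompt-template support.
PROMPT_TEMPLATE = "Please research the website {url} provided by the user. ..."

def gemini_batch_line(url: str) -> str:
    """One JSONL line in the Vertex AI Gemini batch prediction request format."""
    request = {
        "request": {
            "contents": [
                {"role": "user", "parts": [{"text": PROMPT_TEMPLATE.format(url=url)}]}
            ]
        }
    }
    return json.dumps(request)

with open("gemini_batch_requests.jsonl", "w") as f:
    for url in ["xyz.com", "abc.com"]:
        f.write(gemini_batch_line(url) + "\n")

# The file is then uploaded to Cloud Storage (or a BigQuery table) and
# referenced when creating the batch prediction job for the chosen Gemini model.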


Where we Ended Up

We now have a repeatable way to enrich data using the power of LLMs for a large number of websites. We have already started using it to conduct other checks.

The Salesforce Accounts we enriched with is_ecommerce = Y/N were used to create a better-qualified list at the top of the sales funnel.

AEs no longer reported their assigned websites as not being ecommerce.

A job well done by AI and Humans!
