We've been working on a new AI + data course that shows how to build an AI chatbot: move e-commerce data from Stripe into PGVector running on Supabase, use the Airbyte PGVector connector to create OpenAI embeddings along the way, and then use the OpenAI client libraries to add natural language support to an app. This is a pretty common "intelligent data stack" app pattern that many of our customers are implementing. The source and destination may change, but the pattern (data source > move data and create embeddings > vector-enabled data store > web app with OpenAI) stays the same.
Since the course is intended for folks to get hands-on, we wanted to make setup as easy as possible. A big part of this was creating sufficient test data in Stripe so there would be a reasonable dataset for the chatbot to interact with. If you've used Stripe before, you know they have a great Sandbox where you can experiment. The only problem is that it doesn't come with sample data pre-loaded.
There are a few sample datasets you can load via the CLI fixtures command, but these didn't fit our needs. We wanted a larger dataset, and since this material will be used online and in workshops, asking learners to install something like the CLI on their local machines opens you up to a whole bunch of complexity. You never know what OS version the user is running, whether they have the correct permissions to install things, and so much more. I've been burnt too many times to go down that road.
Thankfully, Stripe also has fantastic APIs and a great Python client, which meant we could quickly create a Colab notebook for learners to run and insert the data we wanted.
After installing the stripe library with !pip install stripe and passing in a test key using Google Colab secrets, we had to set up some random names for customers and products. The goal was to insert a random collection of customers, products with different prices, and purchases. This way, when we ask the chatbot questions like "Who made the cheapest purchase? How much did they pay, and what did they buy?", there is sufficient data to answer them.
import stripe
import random
from google.colab import userdata
stripe.api_key = userdata.get('STRIPE_TEST_KEY')
# Sample data for generating random names
first_names = ["Alice", "Bob", "Charlie", "Diana", "Eve", "Frank", "Grace", "Hank", "Ivy", "Jack", "Quinton", "Akriti", "Justin", "Marcos"]
last_names = ["Smith", "Johnson", "Williams", "Jones", "Brown", "Davis", "Miller", "Wilson", "Moore", "Taylor", "Wall", "Chau", "Keswani", "Marx"]
# Sample clothing product names
clothing_names = [
    "T-Shirt", "Jeans", "Jacket", "Sweater", "Hoodie",
    "Shorts", "Dress", "Blouse", "Skirt", "Pants",
    "Shoes", "Sandals", "Sneakers", "Socks", "Hat",
    "Scarf", "Gloves", "Coat", "Belt", "Tie",
    "Tank Top", "Cardigan", "Overalls", "Tracksuit", "Polo Shirt",
    "Cargo Pants", "Capris", "Dungarees", "Boots", "Cufflinks",
    "Raincoat", "Peacoat", "Blazer", "Slippers", "Underwear",
    "Leggings", "Windbreaker", "Tracksuit Bottoms", "Beanie", "Bikini"
]
# List of random colors
colors = [
    "Red", "Blue", "Green", "Yellow", "Black", "White", "Gray",
    "Pink", "Purple", "Orange", "Brown", "Teal", "Navy", "Maroon",
    "Gold", "Silver", "Beige", "Lavender", "Turquoise", "Coral"
]
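One thing to keep in mind with lists like these: picking a first and last name independently can produce the same customer name (and email) more than once. If that matters for your dataset, here is a minimal sketch of one way around it. The helper name pick_unique_names and the short stand-in lists are hypothetical and not part of our notebook; the idea is to sample from all possible name pairs, with an optional seed so every workshop run produces the same customers.

```python
import itertools
import random

# Short stand-ins for the first_names / last_names lists above
sample_first_names = ["Alice", "Bob", "Charlie", "Diana"]
sample_last_names = ["Smith", "Johnson", "Williams"]


def pick_unique_names(first, last, count, seed=None):
    """Return `count` distinct (first, last) pairs; a seed makes the pick repeatable."""
    combos = list(itertools.product(first, last))
    if count > len(combos):
        raise ValueError(f"Only {len(combos)} unique name combinations available")
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    return rng.sample(combos, count)


pairs = pick_unique_names(sample_first_names, sample_last_names, count=5, seed=42)
for first_name, last_name in pairs:
    print(f"{first_name} {last_name} <{first_name.lower()}.{last_name.lower()}@example.com>")
```

With the full lists (14 first names and 14 last names) there are 196 possible pairs, so a few dozen customers fit comfortably.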
Next, it was time to add functions for each of the data types in Stripe that we needed.
# Function to create sample customers with random names
def create_customers(count=5):
    customers = []
    for _ in range(count):
        first_name = random.choice(first_names)
        last_name = random.choice(last_names)
        name = f"{first_name} {last_name}"
        email = f"{first_name.lower()}.{last_name.lower()}@example.com"
        customer = stripe.Customer.create(
            name=name,
            email=email,
            description="Sample customer for testing"
        )
        customers.append(customer)
        print(f"Created Customer: {customer['name']} (ID: {customer['id']})")
    return customers
# Function to create sample products with random clothing names and colors
def create_products(count=3):
    products = []
    for _ in range(count):
        color = random.choice(colors)
        product_name = random.choice(clothing_names)
        full_name = f"{color} {product_name}"
        product = stripe.Product.create(
            name=full_name,
            description=f"This is a {color.lower()} {product_name.lower()}"
        )
        products.append(product)
        print(f"Created Product: {product['name']} (ID: {product['id']})")
    return products
# Function to create prices for the products with random unit_amount
def create_prices(products, min_price=500, max_price=5000):
    prices = []
    for product in products:
        unit_amount = random.randint(min_price, max_price)  # Random amount in cents
        price = stripe.Price.create(
            unit_amount=unit_amount,
            currency="usd",
            product=product['id']
        )
        prices.append(price)
        print(f"Created Price: ${unit_amount / 100:.2f} for Product {product['name']} (ID: {price['id']})")
    return prices
# Function to create random purchases for each customer
def create_purchases(customers, prices, max_purchases_per_customer=5):
    purchases = []
    for customer in customers:
        num_purchases = random.randint(1, max_purchases_per_customer)  # Random number of purchases per customer
        for _ in range(num_purchases):
            price = random.choice(prices)  # Randomly select a product's price
            purchase = stripe.PaymentIntent.create(
                amount=price['unit_amount'],  # Amount in cents
                currency=price['currency'],
                customer=customer['id'],
                payment_method_types=["card"],  # Simulate card payment
                description=f"Purchase of {price['product']} by {customer['name']}"
            )
            purchases.append(purchase)
            print(f"Created Purchase for Customer {customer['name']} (Amount: ${price['unit_amount'] / 100:.2f})")
    return purchases
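A note on the purchases: a PaymentIntent that is created without a payment method is not confirmed, so it should sit in the requires_payment_method status rather than show up as a completed payment. That was enough for our chatbot data, but if you want the payments to actually succeed in the Sandbox, one option is to attach a test card and confirm at creation time. The sketch below is not what our notebook does; the helper build_confirmed_intent_params and the example IDs are hypothetical, and it assumes Stripe's test payment method pm_card_visa is available for your test key.

```python
def build_confirmed_intent_params(price, customer):
    """Build PaymentIntent arguments that also confirm the payment with a test card."""
    return {
        "amount": price["unit_amount"],  # Amount in cents
        "currency": price["currency"],
        "customer": customer["id"],
        "payment_method_types": ["card"],
        "payment_method": "pm_card_visa",  # Assumption: Stripe's test-mode Visa payment method
        "confirm": True,  # Confirm immediately instead of leaving the intent pending
        "description": f"Purchase of {price['product']} by {customer['name']}",
    }


# Plain dicts standing in for Stripe objects (hypothetical IDs)
example_price = {"unit_amount": 1999, "currency": "usd", "product": "prod_example"}
example_customer = {"id": "cus_example", "name": "Alice Smith"}

params = build_confirmed_intent_params(example_price, example_customer)
print(params)

# In the notebook, with stripe.api_key set to a test key:
# purchase = stripe.PaymentIntent.create(**params)
```

Keeping the arguments in a plain dict makes them easy to inspect before any API call is made.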
All that was left was to run the script and specify how much data we needed.
# Main function to create sample data
def main():
    print("Creating sample customers with random names...")
    customers = create_customers(count=20)
    print("\nCreating sample products with random clothing names and colors...")
    products = create_products(count=30)
    print("\nCreating prices for products with random amounts...")
    prices = create_prices(products, min_price=500, max_price=5000)
    print("\nCreating random purchases for each customer...")
    purchases = create_purchases(customers, prices, max_purchases_per_customer=10)
    print("\nSample data creation complete!")
    print(f"Created {len(customers)} customers, {len(products)} products, and {len(purchases)} purchases.")

if __name__ == "__main__":
    main()
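Before wiring anything up to the chatbot, it can help to answer one of the test questions by hand so you know what a correct answer looks like. Here is a rough sketch of that check for "who made the cheapest purchase?". The helper cheapest_purchase and the sample records are hypothetical; the records are plain dicts that mimic the amount, customer, and description fields of the PaymentIntents created above.

```python
def cheapest_purchase(purchases):
    """Return the purchase with the smallest amount (in cents)."""
    if not purchases:
        raise ValueError("No purchases to compare")
    return min(purchases, key=lambda purchase: purchase["amount"])


# Hypothetical records shaped like the PaymentIntents from create_purchases
sample_purchases = [
    {"customer": "cus_a", "amount": 2450, "description": "Purchase of prod_1 by Alice Smith"},
    {"customer": "cus_b", "amount": 725, "description": "Purchase of prod_2 by Bob Jones"},
    {"customer": "cus_c", "amount": 4999, "description": "Purchase of prod_3 by Eve Moore"},
]

cheapest = cheapest_purchase(sample_purchases)
print(f"Cheapest purchase: ${cheapest['amount'] / 100:.2f} ({cheapest['description']})")
```

In the notebook you could pass it the list returned by create_purchases instead of the sample records, then compare the result with what the chatbot says.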
With the data loaded into our Stripe Sandbox, hooking it up to Airbyte only took a few minutes by using the Connector Builder to map the API endpoints to streams for each data type and setting up a sync job.
Problem solved! Our Colab Python script makes it super easy for learners to insert test data into Stripe. Hope it's helpful for someone else doing similar testing.