Price is the single most important factor in e-commerce purchase decisions. Survey after survey finds that the large majority of online shoppers compare prices across retailers before buying. If you're a seller, a deal hunter, or a business analyst, having real-time price data from Amazon, Walmart, and Target gives you a significant edge.
In this guide, I'll walk you through building a complete multi-retailer price monitoring system — from data collection through normalization, alerting, and visualization. We'll use Apify actors for reliable scraping at scale and Node.js for the processing pipeline.
Why Build Your Own Price Monitor?
Commercial price tracking tools like Prisync, Competera, or Price2Spy charge $100-500+/month and lock you into their dashboards. Building your own gives you:
- Full data ownership — export, query, and visualize however you want
- Custom alerting logic — not just "price dropped" but complex conditions like "price dropped below competitor by more than 5% on a product with 4+ star rating"
- Historical data — build years of pricing history for analysis
- Scalability — monitor 100 or 100,000 products at the same marginal cost
- Integration — pipe data into your existing systems (ERP, POS, BI tools)
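As a concrete example of custom alerting logic, the compound condition above can be written as a declarative rule object. The shape here is illustrative (the `productId` and field names are mine), but it matches the rule types the alert engine supports in Step 4:

```javascript
// Hypothetical rule: fire when our price at one retailer undercuts a
// competitor by more than 5%, but only for well-reviewed products.
const rule = {
  type: 'COMPETITOR_UNDERCUT',
  productId: 42,            // illustrative ID
  competitor: 'walmart',
  minUndercutPercent: 5,
  filters: { minRating: 4.0 },
};

// A tiny evaluator for this one rule type, given already-fetched data.
function evaluateUndercut(rule, ourPrice, competitorPrice, rating) {
  if (rating < rule.filters.minRating) return false;
  const undercutPct = ((competitorPrice - ourPrice) / competitorPrice) * 100;
  return undercutPct >= rule.minUndercutPercent;
}
```

The point is that rules are data, not code, so end users can create them without redeploying the system.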
System Architecture
Here's the full architecture of what we're building:
┌─────────────────────────────────────────────────────────┐
│                  Data Collection Layer                  │
├──────────────────┬──────────────────┬───────────────────┤
│      Amazon      │      Walmart     │      Target       │
│      Scraper     │      Scraper     │      Scraper      │
│      (Apify)     │      (Apify)     │      (Apify)      │
└────────┬─────────┴────────┬─────────┴─────────┬─────────┘
         │                  │                   │
         └──────────────────┼───────────────────┘
                            │
                  ┌─────────▼─────────┐
                  │   Normalization   │
                  │    & Matching     │
                  │      Engine       │
                  └─────────┬─────────┘
                            │
                  ┌─────────▼─────────┐
                  │    PostgreSQL     │
                  │  (Price History)  │
                  └─────────┬─────────┘
                            │
           ┌────────────────┼────────────────┐
           │                │                │
     ┌─────▼─────┐    ┌─────▼─────┐    ┌─────▼─────┐
     │   Alert   │    │  REST API │    │ Dashboard │
     │  System   │    │           │    │   (Web)   │
     └───────────┘    └───────────┘    └───────────┘
Step 1: Data Collection with Apify Actors
The hardest part of price monitoring is reliable data collection. Amazon, Walmart, and Target all have sophisticated anti-bot systems. Rather than building and maintaining scrapers ourselves, we'll use proven Apify actors that handle proxy rotation, CAPTCHA solving, and page rendering. One caveat: actor input and output field names differ between actors and change between versions, so treat the field mappings in the snippets below as a starting point and verify them against each actor's documentation.
Amazon Product Scraping
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({
token: process.env.APIFY_TOKEN,
});
async function scrapeAmazonProducts(asins) {
const run = await client.actor('junglee/amazon-crawler').call({
asins: asins,
country: 'US',
maxItems: asins.length,
proxy: {
useApifyProxy: true,
apifyProxyGroups: ['RESIDENTIAL'],
},
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
return items.map(item => ({
retailer: 'amazon',
asin: item.asin,
title: item.title,
price: parseFloat(item.price?.replace(/[^0-9.]/g, '')) || null,
originalPrice: parseFloat(item.listPrice?.replace(/[^0-9.]/g, '')) || null,
currency: 'USD',
rating: item.stars,
reviewCount: item.reviewsCount,
inStock: item.inStock,
seller: item.seller,
url: item.url,
imageUrl: item.thumbnailImage,
scrapedAt: new Date().toISOString(),
}));
}
// Example: Monitor specific products by ASIN
const amazonData = await scrapeAmazonProducts([
'B0BSHF7WHW', // MacBook Air M2
'B0CHX3QBCH', // iPad Air
'B0D1XD1ZV3', // AirPods Pro
]);
Walmart Product Scraping
async function scrapeWalmartProducts(urls) {
const run = await client.actor('epctex/walmart-scraper').call({
startUrls: urls.map(url => ({ url })),
maxItems: urls.length,
proxy: {
useApifyProxy: true,
},
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
return items.map(item => ({
retailer: 'walmart',
productId: item.id || item.usItemId,
title: item.name || item.title,
price: parseFloat(item.price || item.currentPrice) || null,
originalPrice: parseFloat(item.wasPrice || item.listPrice) || null,
currency: 'USD',
rating: item.rating?.average || item.averageRating,
reviewCount: item.rating?.count || item.numberOfReviews,
inStock: item.availabilityStatus === 'IN_STOCK',
seller: item.sellerName || 'Walmart',
url: item.url || item.productUrl,
imageUrl: item.imageUrl || item.thumbnailUrl,
scrapedAt: new Date().toISOString(),
}));
}
Target Product Scraping
async function scrapeTargetProducts(urls) {
const run = await client.actor('epctex/target-scraper').call({
startUrls: urls.map(url => ({ url })),
maxItems: urls.length,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
return items.map(item => ({
retailer: 'target',
tcin: item.tcin || item.id,
title: item.title || item.name,
price: parseFloat(item.price?.current || item.currentPrice) || null,
originalPrice: parseFloat(item.price?.regular || item.regularPrice) || null,
currency: 'USD',
rating: item.rating,
reviewCount: item.reviewCount,
inStock: item.availability !== 'OUT_OF_STOCK',
url: item.url,
imageUrl: item.image,
scrapedAt: new Date().toISOString(),
}));
}
Step 2: Product Matching Across Retailers
The biggest challenge in multi-retailer comparison is matching the same product across different stores. A "Samsung Galaxy S24 128GB" on Amazon might be listed as "Samsung Galaxy S24 Unlocked 128GB Smartphone" on Walmart.
UPC/EAN Matching (Most Reliable)
When you have UPC codes, matching is trivial:
class ProductMatcher {
constructor(db) {
this.db = db;
}
// Primary matching strategy: UPC/EAN codes
async matchByUPC(products) {
const matched = new Map();
for (const product of products) {
if (product.upc) {
const existing = matched.get(product.upc);
if (existing) {
existing.retailers[product.retailer] = product;
} else {
matched.set(product.upc, {
upc: product.upc,
canonicalTitle: product.title,
retailers: { [product.retailer]: product },
});
}
}
}
return Array.from(matched.values());
}
// Fallback: fuzzy title matching
async matchByTitle(products) {
const groups = [];
for (const product of products) {
let bestMatch = null;
let bestScore = 0;
for (const group of groups) {
const score = this.titleSimilarity(
product.title,
group.canonicalTitle
);
if (score > bestScore && score > 0.75) {
bestScore = score;
bestMatch = group;
}
}
if (bestMatch) {
bestMatch.retailers[product.retailer] = product;
} else {
groups.push({
canonicalTitle: product.title,
retailers: { [product.retailer]: product },
});
}
}
return groups.filter(g => Object.keys(g.retailers).length > 1);
}
titleSimilarity(a, b) {
// Normalize titles
const normalize = (s) => s.toLowerCase()
.replace(/[^a-z0-9\s]/g, '')
.replace(/\s+/g, ' ')
.trim();
const tokensA = new Set(normalize(a).split(' '));
const tokensB = new Set(normalize(b).split(' '));
const intersection = new Set([...tokensA].filter(x => tokensB.has(x)));
const union = new Set([...tokensA, ...tokensB]);
return intersection.size / union.size; // Jaccard similarity
}
}
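To see the Jaccard check in action, here is the same similarity logic as a standalone helper, applied to the Samsung example from earlier:

```javascript
// Standalone version of titleSimilarity above: Jaccard similarity
// over normalized word tokens.
function titleSimilarity(a, b) {
  const normalize = (s) => s.toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
  const tokensA = new Set(normalize(a).split(' '));
  const tokensB = new Set(normalize(b).split(' '));
  const intersection = new Set([...tokensA].filter(x => tokensB.has(x)));
  const union = new Set([...tokensA, ...tokensB]);
  return intersection.size / union.size;
}

const score = titleSimilarity(
  'Samsung Galaxy S24 128GB',
  'Samsung Galaxy S24 Unlocked 128GB Smartphone'
);
// {samsung, galaxy, s24, 128gb} vs. 6 tokens: intersection 4, union 6
console.log(score.toFixed(2)); // 0.67
```

Note that this real-world pair scores about 0.67, below the 0.75 threshold used above. Fuzzy matching thresholds need tuning against your actual catalog; too strict misses real matches, too loose merges distinct products.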
Manual Product Mapping Configuration
For high-value products, create explicit mappings:
const productCatalog = [
{
name: 'MacBook Air M2 13-inch 256GB',
category: 'laptops',
mappings: {
amazon: { asin: 'B0BSHF7WHW' },
walmart: { url: 'https://www.walmart.com/ip/MacBook-Air-M2/1234567' },
target: { tcin: '87654321' },
},
},
{
name: 'AirPods Pro 2nd Gen',
category: 'audio',
mappings: {
amazon: { asin: 'B0D1XD1ZV3' },
walmart: { url: 'https://www.walmart.com/ip/AirPods-Pro/9876543' },
target: { tcin: '12345678' },
},
},
// ... hundreds more products
];
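The catalog can then be folded into the per-retailer inputs the scraper functions expect. A small helper sketch (the function and field names are mine, not part of any API):

```javascript
// Collect per-retailer identifiers from the manual product catalog.
function catalogToScrapeInputs(catalog) {
  const inputs = { amazonAsins: [], walmartUrls: [], targetTcins: [] };
  for (const product of catalog) {
    const { amazon, walmart, target } = product.mappings;
    if (amazon?.asin) inputs.amazonAsins.push(amazon.asin);
    if (walmart?.url) inputs.walmartUrls.push(walmart.url);
    if (target?.tcin) inputs.targetTcins.push(target.tcin);
  }
  return inputs;
}
```

With the catalog above, `catalogToScrapeInputs(productCatalog)` produces the ASIN and URL lists to pass to `scrapeAmazonProducts` and friends, so the catalog stays the single source of truth for what gets monitored.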
Step 3: Price Storage and History
We need a database schema that supports historical tracking, fast queries, and analytics:
-- Core tables for price monitoring
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(500) NOT NULL,
category VARCHAR(100),
upc VARCHAR(20) UNIQUE,  -- UNIQUE is required for the ON CONFLICT (upc) upsert below
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE product_listings (
id SERIAL PRIMARY KEY,
product_id INTEGER REFERENCES products(id),
retailer VARCHAR(50) NOT NULL,
retailer_product_id VARCHAR(200),
url TEXT,
UNIQUE(product_id, retailer)
);
CREATE TABLE price_snapshots (
id SERIAL PRIMARY KEY,
listing_id INTEGER REFERENCES product_listings(id),
price DECIMAL(10, 2),
original_price DECIMAL(10, 2),
in_stock BOOLEAN DEFAULT TRUE,
seller VARCHAR(200),
scraped_at TIMESTAMP DEFAULT NOW()
);
-- Index for fast time-series queries
CREATE INDEX idx_snapshots_listing_time
ON price_snapshots(listing_id, scraped_at DESC);
-- Index for finding price drops
CREATE INDEX idx_snapshots_price
ON price_snapshots(listing_id, price, scraped_at DESC);
-- Materialized view for current prices
CREATE MATERIALIZED VIEW current_prices AS
SELECT DISTINCT ON (pl.id)
p.id AS product_id,
p.name AS product_name,
p.category,
pl.retailer,
ps.price,
ps.original_price,
ps.in_stock,
ps.scraped_at
FROM products p
JOIN product_listings pl ON p.id = pl.product_id
JOIN price_snapshots ps ON pl.id = ps.listing_id
ORDER BY pl.id, ps.scraped_at DESC;
-- REFRESH MATERIALIZED VIEW CONCURRENTLY (used in the ingestion code below)
-- requires a unique index on the view
CREATE UNIQUE INDEX idx_current_prices_uniq
ON current_prices (product_id, retailer);
Recording Prices
import pg from 'pg';
const pool = new pg.Pool({
connectionString: process.env.DATABASE_URL,
});
async function recordPrices(matchedProducts) {
const client = await pool.connect();
try {
await client.query('BEGIN');
for (const product of matchedProducts) {
// Upsert the product keyed on UPC. Note: rows with a NULL upc never
// conflict in Postgres, so UPC-less products must be deduplicated
// upstream (e.g. by title matching) or they will be re-inserted on
// every run.
const { rows: [dbProduct] } = await client.query(
`INSERT INTO products (name, category, upc)
VALUES ($1, $2, $3)
ON CONFLICT (upc) DO UPDATE SET name = EXCLUDED.name
RETURNING id`,
[product.canonicalTitle, product.category, product.upc]
);
// Record each retailer's price
for (const [retailer, data] of Object.entries(product.retailers)) {
const { rows: [listing] } = await client.query(
`INSERT INTO product_listings (product_id, retailer, retailer_product_id, url)
VALUES ($1, $2, $3, $4)
ON CONFLICT (product_id, retailer)
DO UPDATE SET url = EXCLUDED.url
RETURNING id`,
[dbProduct.id, retailer, data.asin || data.productId || data.tcin, data.url]
);
await client.query(
`INSERT INTO price_snapshots (listing_id, price, original_price, in_stock, seller)
VALUES ($1, $2, $3, $4, $5)`,
[listing.id, data.price, data.originalPrice, data.inStock, data.seller]
);
}
}
await client.query('COMMIT');
// Refresh the materialized view
await client.query('REFRESH MATERIALIZED VIEW CONCURRENTLY current_prices');
} catch (err) {
await client.query('ROLLBACK');
throw err;
} finally {
client.release();
}
}
Step 4: Alert System
The alert system is what makes a price monitor truly useful. We want to support complex conditions, not just simple thresholds:
class PriceAlertEngine {
constructor(db) {
this.db = db;
this.alertHandlers = [];
}
onAlert(handler) {
this.alertHandlers.push(handler);
}
async checkAlerts(rules) {
const alerts = [];
for (const rule of rules) {
const triggered = await this.evaluateRule(rule);
if (triggered) {
alerts.push(triggered);
}
}
// Notify all handlers
for (const alert of alerts) {
for (const handler of this.alertHandlers) {
await handler(alert);
}
}
return alerts;
}
async evaluateRule(rule) {
switch (rule.type) {
case 'PRICE_DROP':
return this.checkPriceDrop(rule);
case 'PRICE_BELOW':
return this.checkPriceBelow(rule);
case 'CHEAPEST_RETAILER':
return this.checkCheapestRetailer(rule);
case 'PRICE_HISTORY_LOW':
return this.checkHistoryLow(rule);
case 'COMPETITOR_UNDERCUT':
return this.checkCompetitorUndercut(rule);
default:
console.warn(`Unknown rule type: ${rule.type}`);
return null;
}
}
async checkPriceDrop(rule) {
const { rows } = await this.db.query(`
SELECT
ps1.price AS current_price,
ps2.price AS previous_price,
p.name,
pl.retailer,
((ps2.price - ps1.price) / ps2.price * 100) AS drop_pct
FROM price_snapshots ps1
JOIN price_snapshots ps2 ON ps1.listing_id = ps2.listing_id
JOIN product_listings pl ON ps1.listing_id = pl.id
JOIN products p ON pl.product_id = p.id
WHERE pl.product_id = $1
AND ps1.scraped_at = (
SELECT MAX(scraped_at) FROM price_snapshots
WHERE listing_id = ps1.listing_id
)
AND ps2.scraped_at = (
SELECT MAX(scraped_at) FROM price_snapshots
WHERE listing_id = ps2.listing_id
AND scraped_at < ps1.scraped_at
)
AND ((ps2.price - ps1.price) / ps2.price * 100) >= $2
`, [rule.productId, rule.minDropPercent]);
if (rows.length > 0) {
return {
type: 'PRICE_DROP',
product: rows[0].name,
details: rows.map(r => ({
retailer: r.retailer,
currentPrice: r.current_price,
previousPrice: r.previous_price,
dropPercent: parseFloat(r.drop_pct).toFixed(1),
})),
};
}
return null;
}
async checkHistoryLow(rule) {
const { rows } = await this.db.query(`
WITH current AS (
SELECT DISTINCT ON (pl.id)
pl.id AS listing_id, pl.retailer, ps.price
FROM product_listings pl
JOIN price_snapshots ps ON pl.id = ps.listing_id
WHERE pl.product_id = $1
ORDER BY pl.id, ps.scraped_at DESC
),
historical_min AS (
SELECT
ps.listing_id,
MIN(ps.price) AS min_price
FROM price_snapshots ps
JOIN product_listings pl ON ps.listing_id = pl.id
WHERE pl.product_id = $1
AND ps.scraped_at < NOW() - INTERVAL '7 days'
GROUP BY ps.listing_id
)
SELECT
c.retailer,
c.price AS current_price,
h.min_price AS historical_min,
p.name
FROM current c
JOIN historical_min h ON c.listing_id = h.listing_id
JOIN product_listings pl ON c.listing_id = pl.id
JOIN products p ON pl.product_id = p.id
WHERE c.price <= h.min_price
`, [rule.productId]);
if (rows.length > 0) {
return {
type: 'ALL_TIME_LOW',
product: rows[0].name,
details: rows.map(r => ({
retailer: r.retailer,
currentPrice: r.current_price,
previousLow: r.historical_min,
})),
};
}
return null;
}
}
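The orchestration script in the final section calls `loadAlertRules(pool)`, which hasn't been defined yet. Here is a minimal sketch, assuming a hypothetical `alert_rules` table with the rule parameters stored as JSONB (the table and column names are my own convention, not something created earlier):

```javascript
// Assumed schema, not part of the migrations above:
//   CREATE TABLE alert_rules (
//     id SERIAL PRIMARY KEY,
//     rule_type VARCHAR(50) NOT NULL,
//     product_id INTEGER REFERENCES products(id),
//     params JSONB DEFAULT '{}',
//     enabled BOOLEAN DEFAULT TRUE
//   );

// Convert a DB row into the flat rule shape PriceAlertEngine expects,
// e.g. { type: 'PRICE_DROP', productId: 1, minDropPercent: 10 }.
function rowToRule(row) {
  return {
    type: row.rule_type,
    productId: row.product_id,
    ...row.params,
  };
}

async function loadAlertRules(pool) {
  const { rows } = await pool.query(
    'SELECT rule_type, product_id, params FROM alert_rules WHERE enabled'
  );
  return rows.map(rowToRule);
}
```

Storing rules in a table rather than hardcoding them means you can add or disable alerts without redeploying the pipeline.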
Notification Channels
// Slack notification
async function notifySlack(webhookUrl, alert) {
const blocks = [
{
type: 'header',
text: {
type: 'plain_text',
text: `Price Alert: ${alert.type}`,
},
},
{
type: 'section',
text: {
type: 'mrkdwn',
text: `*${alert.product}*`,
},
},
];
for (const detail of alert.details) {
blocks.push({
type: 'section',
fields: [
{ type: 'mrkdwn', text: `*Retailer:* ${detail.retailer}` },
{ type: 'mrkdwn', text: `*Price:* $${detail.currentPrice}` },
{ type: 'mrkdwn', text: `*Previous:* $${detail.previousPrice || detail.previousLow}` },
{ type: 'mrkdwn', text: `*Drop:* ${detail.dropPercent || 'N/A'}%` },
],
});
}
await fetch(webhookUrl, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ blocks }),
});
}
// Email notification via SendGrid
import sgMail from '@sendgrid/mail';
sgMail.setApiKey(process.env.SENDGRID_API_KEY);
async function notifyEmail(to, alert) {
await sgMail.send({
to,
from: 'alerts@pricebot.example.com',
subject: `Price Alert: ${alert.product} - ${alert.type}`,
html: renderAlertEmail(alert),
});
}
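The `renderAlertEmail` helper used above is left as an exercise; a minimal sketch that renders the alert into an HTML table might look like this (the markup is purely illustrative):

```javascript
// Render a triggered alert into a simple HTML email body.
function renderAlertEmail(alert) {
  const rows = alert.details.map(d => `
    <tr>
      <td>${d.retailer}</td>
      <td>$${d.currentPrice}</td>
      <td>$${d.previousPrice ?? d.previousLow ?? 'n/a'}</td>
      <td>${d.dropPercent ? d.dropPercent + '%' : ''}</td>
    </tr>`).join('');
  return `
    <h2>${alert.type}: ${alert.product}</h2>
    <table border="1" cellpadding="6">
      <tr><th>Retailer</th><th>Price</th><th>Previous</th><th>Drop</th></tr>
      ${rows}
    </table>`;
}
```

In production you would escape the product title before interpolating it, since scraped titles can contain arbitrary characters.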
Step 5: REST API for Querying Price Data
Expose the data through a clean API so dashboards and external tools can consume it:
import express from 'express';
const app = express();
// Get current prices for a product across retailers
app.get('/api/products/:id/prices', async (req, res) => {
const { rows } = await pool.query(`
SELECT retailer, price, original_price, in_stock, scraped_at
FROM current_prices
WHERE product_id = $1
ORDER BY price ASC
`, [req.params.id]);
// node-postgres returns DECIMAL columns as strings; parse them so the
// JSON response carries numbers
const prices = rows.map(r => ({
...r,
price: r.price === null ? null : parseFloat(r.price),
original_price: r.original_price === null ? null : parseFloat(r.original_price),
}));
res.json({
productId: req.params.id,
prices,
cheapest: prices[0] || null,
priceRange: prices.length > 0 ? {
min: prices[0].price,
max: prices[prices.length - 1].price,
spread: prices[prices.length - 1].price - prices[0].price,
} : null,
});
});
// Get price history for a product at a specific retailer
app.get('/api/products/:id/history', async (req, res) => {
const { retailer, days = 30 } = req.query;
const { rows } = await pool.query(`
SELECT ps.price, ps.in_stock, ps.scraped_at, pl.retailer
FROM price_snapshots ps
JOIN product_listings pl ON ps.listing_id = pl.id
WHERE pl.product_id = $1
AND ($2::text IS NULL OR pl.retailer = $2)
AND ps.scraped_at > NOW() - INTERVAL '1 day' * $3
ORDER BY ps.scraped_at ASC
`, [req.params.id, retailer || null, days]);
res.json({
productId: req.params.id,
period: `${days} days`,
dataPoints: rows,
});
});
// Get products with biggest current price drops
app.get('/api/deals', async (req, res) => {
const { minDrop = 10, category, limit = 20 } = req.query;
const { rows } = await pool.query(`
WITH latest AS (
SELECT DISTINCT ON (listing_id)
listing_id, price, scraped_at
FROM price_snapshots
ORDER BY listing_id, scraped_at DESC
),
previous AS (
SELECT DISTINCT ON (ps.listing_id)
ps.listing_id, ps.price
FROM price_snapshots ps
JOIN latest l ON ps.listing_id = l.listing_id
WHERE ps.scraped_at < l.scraped_at
ORDER BY ps.listing_id, ps.scraped_at DESC
)
SELECT
p.id, p.name, p.category,
pl.retailer,
l.price AS current_price,
prev.price AS previous_price,
ROUND((prev.price - l.price) / prev.price * 100, 1) AS drop_pct
FROM latest l
JOIN previous prev ON l.listing_id = prev.listing_id
JOIN product_listings pl ON l.listing_id = pl.id
JOIN products p ON pl.product_id = p.id
WHERE (prev.price - l.price) / prev.price * 100 >= $1
AND ($2::text IS NULL OR p.category = $2)
ORDER BY drop_pct DESC
LIMIT $3
`, [minDrop, category || null, limit]);
res.json({ deals: rows });
});
app.listen(3000, () => console.log('Price API running on port 3000'));
Step 6: Dashboard Visualization Ideas
While a full dashboard implementation is beyond this article's scope, here are the key views you should build:
Price Comparison View
A simple table showing the same product across all three retailers with current price, stock status, and a "best deal" indicator. Use color coding — green for the cheapest option, red for the most expensive.
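The backing logic for that view is small. A sketch that tags each retailer row for color coding, given rows from the `current_prices` view (the tag names are my own convention):

```javascript
// Annotate retailer rows for the comparison table: the cheapest in-stock
// option gets "best", the most expensive gets "worst".
function tagComparisonRows(rows) {
  const inStock = rows.filter(r => r.in_stock && r.price != null);
  if (inStock.length === 0) return rows.map(r => ({ ...r, tag: null }));
  const min = Math.min(...inStock.map(r => r.price));
  const max = Math.max(...inStock.map(r => r.price));
  return rows.map(r => ({
    ...r,
    tag: !r.in_stock || r.price == null ? null
      : r.price === min ? 'best'
      : r.price === max && max !== min ? 'worst'
      : null,
  }));
}
```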
Historical Price Chart
A line chart with time on the x-axis and price on the y-axis. One line per retailer, color-coded. Mark significant events (Prime Day, Black Friday, new product launches) with vertical annotations.
Deal Feed
A real-time feed showing the latest price drops sorted by magnitude. Each card shows the product, the price drop percentage, the old and new prices, and a direct link to buy. Think of it as a personalized CamelCamelCamel.
Category Analytics
Aggregate views showing average prices by category, which retailer is cheapest most often, price volatility by product type, and seasonal trends.
Scheduling the Full Pipeline
Here's how to orchestrate everything using an Apify actor on a schedule:
import { Actor } from 'apify';
await Actor.init();
const input = await Actor.getInput();
// 1. Scrape all three retailers in parallel
const [amazonData, walmartData, targetData] = await Promise.all([
scrapeAmazonProducts(input.amazonAsins),
scrapeWalmartProducts(input.walmartUrls),
scrapeTargetProducts(input.targetUrls),
]);
console.log(`Collected: Amazon=${amazonData.length}, Walmart=${walmartData.length}, Target=${targetData.length}`);
// 2. Match products across retailers
const matcher = new ProductMatcher(pool);
const allProducts = [...amazonData, ...walmartData, ...targetData];
const matched = await matcher.matchByTitle(allProducts);
console.log(`Matched ${matched.length} products across retailers`);
// 3. Record prices to database
await recordPrices(matched);
// 4. Check alert rules
const alertEngine = new PriceAlertEngine(pool);
alertEngine.onAlert(async (alert) => {
await notifySlack(input.slackWebhook, alert);
if (alert.type === 'ALL_TIME_LOW') {
await notifyEmail(input.alertEmail, alert);
}
});
const rules = await loadAlertRules(pool);
const triggeredAlerts = await alertEngine.checkAlerts(rules);
console.log(`Triggered ${triggeredAlerts.length} alerts`);
// 5. Save results
await Actor.pushData({
scrapeDate: new Date().toISOString(),
productsScraped: allProducts.length,
productsMatched: matched.length,
alertsTriggered: triggeredAlerts.length,
});
await Actor.exit();
Schedule this to run every 4-6 hours on Apify for near real-time monitoring at reasonable cost.
Cost Breakdown
Running this system on Apify is much cheaper than commercial alternatives:
| Component | Monthly Cost |
|---|---|
| Amazon scraper (500 products, 6x/day) | ~$30-50 |
| Walmart scraper (500 products, 6x/day) | ~$20-40 |
| Target scraper (500 products, 6x/day) | ~$20-35 |
| Apify platform (compute) | ~$10-20 |
| PostgreSQL (managed, small) | ~$15-25 |
| Total | ~$95-170/month |
Compare to Prisync ($99-399/month for fewer features) or building a custom scraping infrastructure from scratch ($500+/month in server costs plus engineering time).
Production Tips
Rate limiting: Don't scrape too aggressively. 4-6 times per day is sufficient for most use cases. For competitive repricing, you might need hourly updates on specific products.
Error handling: Scrapers will occasionally fail. Build retry logic and track scrape success rates. If a particular product consistently fails, investigate whether the listing changed.
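A retry wrapper along these lines works for any of the scrape functions (the attempt count and delays are arbitrary starting points):

```javascript
// Retry an async operation with exponential backoff.
async function withRetry(fn, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${i + 1} failed: ${err.message}`);
      if (i < attempts - 1) {
        await new Promise(r => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Usage: const data = await withRetry(() => scrapeAmazonProducts(asins));
```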
Data validation: Always sanity-check scraped prices. If a $999 laptop suddenly shows as $9.99, that's likely a scraping error, not a deal. Set bounds based on historical prices.
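One way to encode that sanity check, comparing each new price against the median of recent observations (the 0.2x/3x bounds are arbitrary defaults to tune per category):

```javascript
// Reject prices wildly outside the recent historical range.
function isPlausiblePrice(newPrice, recentPrices, { lowerFactor = 0.2, upperFactor = 3 } = {}) {
  if (newPrice == null || newPrice <= 0) return false;
  if (recentPrices.length === 0) return true; // nothing to compare against yet
  const median = [...recentPrices].sort((a, b) => a - b)[Math.floor(recentPrices.length / 2)];
  return newPrice >= median * lowerFactor && newPrice <= median * upperFactor;
}
```

Implausible prices should be logged for review rather than silently dropped, since occasionally the "error" really is a pricing mistake worth knowing about.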
Proxy rotation: Apify handles this, but if you're building custom scrapers, residential proxies are essential for Amazon and Walmart.
Legal considerations: Price scraping for personal use and competitive analysis is generally acceptable. Republishing scraped prices at scale or using them to undermine retailers' terms of service may raise legal issues. Consult with legal counsel if building a commercial product.
Conclusion
Building a multi-retailer price monitor is a highly practical project that delivers immediate value. Whether you're tracking prices for personal shopping, competitive intelligence, or building a price comparison service, the architecture outlined here scales from monitoring 10 products to 100,000.
The combination of Apify actors for reliable data collection, PostgreSQL for historical storage, and a custom alert engine gives you capabilities that rival commercial tools at a fraction of the cost — with full control over your data and logic.
Start with a small product catalog, prove the system works, then scale up. The hardest part isn't the code — it's curating the right product list and tuning your alert thresholds to minimize noise while catching genuine deals.
Ready to start? Browse Apify's marketplace for Amazon, Walmart, and Target scrapers, or build custom actors with the Apify SDK.