<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abdullah Ahmed</title>
    <description>The latest articles on DEV Community by Abdullah Ahmed (@abdullah_ahmed_76eb910dac).</description>
    <link>https://dev.to/abdullah_ahmed_76eb910dac</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3602554%2F6aac3ce3-8248-41c2-b3ff-bb5d6f4e4fe2.png</url>
      <title>DEV Community: Abdullah Ahmed</title>
      <link>https://dev.to/abdullah_ahmed_76eb910dac</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abdullah_ahmed_76eb910dac"/>
    <language>en</language>
    <item>
      <title>Data Normalization in Frontend Development: Simplifying Complex State</title>
      <dc:creator>Abdullah Ahmed</dc:creator>
      <pubDate>Sat, 08 Nov 2025 18:15:01 +0000</pubDate>
      <link>https://dev.to/abdullah_ahmed_76eb910dac/data-normalization-in-frontend-development-simplifying-complex-stateintroduction-2c0e</link>
      <guid>https://dev.to/abdullah_ahmed_76eb910dac/data-normalization-in-frontend-development-simplifying-complex-stateintroduction-2c0e</guid>
      <description>&lt;p&gt;When working on modern frontend applications, especially those with dynamic and interconnected data (like posts, comments, and users), managing state can quickly become complicated. APIs often return deeply nested data, and performing updates, deletions, or additions on such structures can feel like navigating a maze.&lt;/p&gt;

&lt;p&gt;For instance, imagine an app like Facebook or Reddit. Each post has comments, each comment has users, and each user might appear in multiple comments or posts. If you try to edit or delete a user, you’d have to find and update that user’s data in every place they appear. This approach quickly becomes unmanageable.&lt;/p&gt;

&lt;p&gt;That’s where data normalization comes in.&lt;/p&gt;

&lt;p&gt;What Is Data Normalization?&lt;/p&gt;

&lt;p&gt;Normalization means organizing your data so that each piece of information exists in exactly one place.&lt;br&gt;
Instead of keeping a deeply nested structure, you store data in flat, relational objects, similar to how relational databases work.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instead of keeping users nested inside comments, which are nested inside posts,&lt;/li&gt;
&lt;li&gt;you separate them into different collections (posts, comments, users),&lt;/li&gt;
&lt;li&gt;then use IDs to link them together.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it much easier to update, delete, or add new data consistently.&lt;/p&gt;

&lt;p&gt;The Problem: Nested Data Example&lt;/p&gt;

&lt;p&gt;Let’s say our API returns this data for posts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const posts = [
  {
    id: 1,
    title: "Understanding React",
    comments: [
      {
        id: 101,
        text: "Great post!",
        user: {
          id: 1001,
          name: "Abdallah Ahmed",
        },
      },
      {
        id: 102,
        text: "Thanks for sharing",
        user: {
          id: 1002,
          name: "Sara Ali",
        },
      },
    ],
  },
];
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, imagine you need to update the user’s name (for example, Abdallah changes his name).&lt;br&gt;
You’ll have to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;loop through each post,&lt;/li&gt;
&lt;li&gt;then through each comment,&lt;/li&gt;
&lt;li&gt;then find the user,&lt;/li&gt;
&lt;li&gt;and finally update the name.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a lot of unnecessary traversal — and if the same user appears in multiple posts or comments, you’ll need to update every occurrence manually.&lt;br&gt;
This leads to data duplication and inconsistent states.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution: Normalized Data Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After normalization, we can represent the same data like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const normalizedData = {
  posts: {
    1: { id: 1, title: "Understanding React", comments: [101, 102] },
  },
  comments: {
    101: { id: 101, text: "Great post!", user: 1001 },
    102: { id: 102, text: "Thanks for sharing", user: 1002 },
  },
  users: {
    1001: { id: 1001, name: "Abdallah Ahmed" },
    1002: { id: 1002, name: "Sara Ali" },
  },
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, all entities (posts, comments, users) are stored separately and referenced by IDs.&lt;/p&gt;
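&lt;p&gt;This flat shape doesn’t have to be written by hand. A minimal sketch of the transformation is shown below (the &lt;code&gt;normalizePosts&lt;/code&gt; name is ours, purely illustrative; libraries such as normalizr automate the same idea):&lt;/p&gt;

```javascript
// Minimal sketch: flatten nested posts into { posts, comments, users }.
// `normalizePosts` is an illustrative name, not a library API.
function normalizePosts(posts) {
  const state = { posts: {}, comments: {}, users: {} };
  for (const post of posts) {
    const commentIds = [];
    for (const comment of post.comments) {
      state.users[comment.user.id] = comment.user; // one copy per user id
      state.comments[comment.id] = { ...comment, user: comment.user.id };
      commentIds.push(comment.id);
    }
    state.posts[post.id] = { ...post, comments: commentIds };
  }
  return state;
}
```

&lt;p&gt;Running it on the nested &lt;code&gt;posts&lt;/code&gt; array from earlier produces exactly the &lt;code&gt;normalizedData&lt;/code&gt; object above.&lt;/p&gt;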

&lt;p&gt;Example Operations&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Updating a User’s Name&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Before normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Update user's name (nested approach)
posts.forEach(post =&amp;gt; {
  post.comments.forEach(comment =&amp;gt; {
    if (comment.user.id === 1001) {
      comment.user.name = "Abdallah A.";
    }
  });
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Simple and efficient
normalizedData.users[1001].name = "Abdallah A.";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only one update is needed — and all parts of your app that reference this user will automatically get the updated name.&lt;/p&gt;
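&lt;p&gt;The reverse direction, rebuilding a nested view for rendering, is just a lookup per ID. A minimal selector sketch (&lt;code&gt;getPostView&lt;/code&gt; is our name, not part of any framework):&lt;/p&gt;

```javascript
// Rebuild a nested post object from the normalized store, for rendering.
// `getPostView` is an illustrative name, not a framework API.
function getPostView(state, postId) {
  const post = state.posts[postId];
  if (!post) return null;
  return {
    ...post,
    comments: post.comments.map(commentId => {
      const comment = state.comments[commentId];
      return { ...comment, user: state.users[comment.user] };
    }),
  };
}
```

&lt;p&gt;Because the view is derived on demand, the rename above is reflected everywhere the next time a component reads the store.&lt;/p&gt;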

&lt;ol start="2"&gt;
&lt;li&gt;Deleting a Comment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Before normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Remove comment 101
posts[0].comments = posts[0].comments.filter(c =&amp;gt; c.id !== 101);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Delete from comment list
delete normalizedData.comments[101];

// Remove reference from the post

normalizedData.posts[1].comments = normalizedData.posts[1].comments.filter(
  id =&amp;gt; id !== 101
);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clean, simple, and consistent — no deeply nested loops.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Adding a New Comment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Before normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
const newComment = {
  id: 103,
  text: "Very helpful!",
  user: { id: 1003, name: "Ali Mohamed" },
};

posts[0].comments.push(newComment);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After normalization:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;normalizedData.users[1003] = { id: 1003, name: "Ali Mohamed" };
normalizedData.comments[103] = { id: 103, text: "Very helpful!", user: 1003 };
normalizedData.posts[1].comments.push(103);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data remains structured, and each entity is tracked in one place.&lt;/p&gt;
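&lt;p&gt;The three writes above can be bundled into a single helper; an immutable (Redux-style) version would return a new state object instead of mutating. A sketch, with &lt;code&gt;addComment&lt;/code&gt; as an assumed name:&lt;/p&gt;

```javascript
// Add a comment (and its author) to a normalized store without mutation.
// `addComment` is an illustrative helper name, not a library API.
function addComment(state, postId, comment) {
  const userId = comment.user.id;
  return {
    ...state,
    users: { ...state.users, [userId]: comment.user },
    comments: { ...state.comments, [comment.id]: { ...comment, user: userId } },
    posts: {
      ...state.posts,
      [postId]: {
        ...state.posts[postId],
        comments: [...state.posts[postId].comments, comment.id],
      },
    },
  };
}
```

&lt;p&gt;Returning a fresh object keeps reference-equality checks cheap, which is what lets frameworks like React skip re-rendering untouched branches.&lt;/p&gt;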

&lt;p&gt;Benefits of Normalization&lt;/p&gt;

&lt;p&gt;✅ Easier updates – Change data in one place, not everywhere.&lt;br&gt;
✅ Less duplication – Each entity exists only once.&lt;br&gt;
✅ Simplified logic – Easier to add, remove, or merge entities.&lt;br&gt;
✅ Improved performance – Less re-rendering and simpler lookups in UI frameworks like React.&lt;br&gt;
✅ Scalable structure – Works great as your app and data grow in complexity.&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>frontend</category>
      <category>javascript</category>
    </item>
    <item>
      <title>🚀 Building an Express.js API for the Amazon Scraper</title>
      <dc:creator>Abdullah Ahmed</dc:creator>
      <pubDate>Sat, 08 Nov 2025 14:31:44 +0000</pubDate>
      <link>https://dev.to/abdullah_ahmed_76eb910dac/building-an-expressjs-api-for-the-amazon-scraper-20eg</link>
      <guid>https://dev.to/abdullah_ahmed_76eb910dac/building-an-expressjs-api-for-the-amazon-scraper-20eg</guid>
      <description>&lt;p&gt;&lt;strong&gt;How I Finally Beat Amazon’s Bot Detection (and Built a Powerful Web Scraper That Works!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once your scraper function (scrapeAmazonProductPage) is ready, the next step is to wrap it inside a simple Express.js API.&lt;br&gt;
This allows you (or any client app) to send a request with a product URL and get structured data in return.&lt;/p&gt;

&lt;p&gt;📦 Step 1 — Install Dependencies&lt;/p&gt;

&lt;p&gt;If you haven’t already:&lt;/p&gt;

&lt;p&gt;npm install express puppeteer cheerio crawler cors&lt;/p&gt;

&lt;p&gt;You should now have these main dependencies:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;{&lt;br&gt;
  "cheerio": "^1.0.0-rc.12",&lt;br&gt;
  "cors": "^2.8.5",&lt;br&gt;
  "crawler": "^1.5.0",&lt;br&gt;
  "puppeteer": "^16.2.0",&lt;br&gt;
  "express": "^4.19.2"&lt;br&gt;
}&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Make sure your Node.js version is 20 or above for optimal Puppeteer compatibility.&lt;/p&gt;

&lt;p&gt;🧱 Step 2 — Create Project Structure&lt;/p&gt;

&lt;p&gt;Here’s a suggested folder layout:&lt;/p&gt;

&lt;p&gt;amazon-scraper/&lt;br&gt;
├── package.json&lt;br&gt;
├── server.js&lt;br&gt;
└── src/&lt;br&gt;
    ├── scraper/&lt;br&gt;
    │   └── amazon.js&lt;br&gt;
    └── services/&lt;br&gt;
        └── scrapping.js&lt;/p&gt;

&lt;p&gt;server.js → entry point for Express&lt;/p&gt;

&lt;p&gt;src/scraper/amazon.js → your scraper logic (the code you already have)&lt;/p&gt;

&lt;p&gt;src/services/scrapping.js → optional, for error logging (you can mock this for now)&lt;/p&gt;

&lt;p&gt;🧠 Step 3 — Example Mock for Error Saver&lt;/p&gt;

&lt;p&gt;Create a dummy service in src/services/scrapping.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// src/services/scrapping.js
async function saveScrappingErrors(errorObj) {
    console.error("Scraping error:", errorObj);
}
module.exports = { saveScrappingErrors };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;🧠 Step 4 — The Scraper (amazon.js)&lt;/p&gt;

&lt;p&gt;Use your scraper function exactly as before. Let’s clean it up slightly for API use and export it properly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
// src/scraper/amazon.js
const puppeteer = require('puppeteer');
const cheerio = require('cheerio');
const Crawler = require('crawler');
const { saveScrappingErrors } = require('../services/scrapping');

const crawlPage = (url, browser) =&amp;gt; {
    return new Promise((resolve, reject) =&amp;gt; {
        const c = new Crawler({
            maxConnections: 100000,
            skipDuplicates: true,
            callback: async (error, res, done) =&amp;gt; {
                if (error) return reject(error);
                try {
                    const $ = cheerio.load(res.body);
                    // NOTE: retries until the histogram renders; a persistent captcha page will loop forever
                    if (!$('#histogramTable').length) return resolve(await crawlPage(url, browser));

                    const reviews = [];
                    const reviewElements = $('.a-section.review[data-hook="review"]');
                    const review_rating = $('[data-hook="average-star-rating"]').text().trim();
                    const review_count = $('[data-hook="total-review-count"]').text().trim().split(' ')[0];
                    const name = $('#productTitle').text().trim();
                    const description = $('#feature-bullets .a-list-item').text().trim();
                    const product_author = $('#bylineInfo').text().trim();

                    const regex = /\b\d+(\.\d+)?\b/;
                    reviewElements.each((_, el) =&amp;gt; {
                        const author = $(el).find('.a-profile-name').text().trim();
                        const content = $(el).find('.review-text').text().trim();
                        const title = $(el).find('[data-hook="review-title"]').text().trim();
                        const date = $(el).find('[data-hook="review-date"]').text().trim();
                        let stars = $(el).find('.review-rating span').text().trim();
                        const match = stars.match(regex);
                        stars = match ? parseFloat(match[0]) : '';
                        reviews.push({ author, content, title, date, rating: stars });
                    });

                    const extractStars = () =&amp;gt; {
                        const starsPercentageArray = [];
                        $('#histogramTable .a-histogram-row').each((_, el) =&amp;gt; {
                            const percentageText = $(el).find('.a-text-right a').text();
                            const percentage = parseInt(percentageText.replace('%', ''), 10);
                            const starsText = $(el).find('a.a-size-base').text();
                            const number_of_stars = parseInt(starsText, 10);
                            starsPercentageArray.push({ percentage: percentage || 0, number_of_stars });
                        });
                        return starsPercentageArray;
                    };

                    const extractMainImage = () =&amp;gt; $('#imgTagWrapperId img').attr('src') || '';

                    const core_price = $('#corePriceDisplay_desktop_feature_div .a-section .aok-offscreen').text().trim();
                    const currencyPattern = /\$\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?/;
                    const match = core_price.match(currencyPattern);
                    const extractedCurrency = match ? match[0] : "";

                    const extractImages = async () =&amp;gt; {
                        const htmlContent = res.body;
                        const page = await browser.newPage();
                        await page.setContent(htmlContent, { waitUntil: 'load', timeout: 0 });
                        const thumbnails = await page.$$('#altImages ul .imageThumbnail');
                        for (const thumbnail of thumbnails) {
                            await page.evaluate(el =&amp;gt; el instanceof HTMLElement &amp;amp;&amp;amp; el.scrollIntoView(), thumbnail);
                            await thumbnail.hover();
                        }
                        await page.waitForTimeout(1000);
                        const productData = await page.evaluate(() =&amp;gt; {
                            const images = [];
                            document.querySelectorAll('.a-unordered-list .image .imgTagWrapper img').forEach(img =&amp;gt; {
                                if (img &amp;amp;&amp;amp; img.src &amp;amp;&amp;amp; !img.src.endsWith('.svg')) images.push(img.src);
                            });
                            return images;
                        });
                        return productData;
                    };

                    const images_data = await extractImages();
                    resolve({
                        websiteName: 'Amazon',
                        reviews,
                        product_images_links: images_data,
                        review_rating,
                        review_count,
                        price: extractedCurrency,
                        name,
                        description,
                        product_author,
                        stars: extractStars(),
                        image_url: extractMainImage(),
                    });
                } catch (err) {
                    reject(err);
                } finally {
                    done();
                }
            },
        });
        c.queue(url);
    });
};

async function scrapeAmazonProductPage(homeUrl) {
    const browser = await puppeteer.launch({
        headless: true,
        ignoreHTTPSErrors: true,
        args: [
            "--disable-gpu",
            "--disable-dev-shm-usage",
            "--disable-setuid-sandbox",
            "--no-sandbox",
        ],
    });
    try {
        const data = await crawlPage(homeUrl, browser);
        return data;
    } catch (e) {
        await saveScrappingErrors({ error: e.message || e, url: homeUrl });
        return null;
    } finally {
        await browser.close();
    }
}

module.exports = { scrapeAmazonProductPage };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
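&lt;p&gt;One caveat: &lt;code&gt;crawlPage&lt;/code&gt; recurses unconditionally whenever &lt;code&gt;#histogramTable&lt;/code&gt; is missing, so a persistent captcha page would retry forever. A bounded-retry wrapper is one way to cap that (a sketch; &lt;code&gt;withRetries&lt;/code&gt; is our own name, not part of the crawler library):&lt;/p&gt;

```javascript
// Retry an async operation a fixed number of times before giving up.
// `withRetries` is an illustrative helper, not a crawler/puppeteer API.
async function withRetries(operation, maxAttempts = 3) {
  let lastError;
  for (let attemptsLeft = maxAttempts; attemptsLeft > 0; attemptsLeft--) {
    try {
      return await operation();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError; // all attempts exhausted
}
```

&lt;p&gt;You could then call &lt;code&gt;withRetries(() =&amp;gt; crawlPage(homeUrl, browser), 3)&lt;/code&gt; inside &lt;code&gt;scrapeAmazonProductPage&lt;/code&gt; and have the crawler reject instead of recursing when the page is blocked.&lt;/p&gt;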



&lt;p&gt;⚡ Step 5 — Create Express API&lt;/p&gt;

&lt;p&gt;Now create server.js in the root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// server.js
const express = require('express');
const cors = require('cors');
const { scrapeAmazonProductPage } = require('./src/scraper/amazon');

const app = express();
app.use(express.json());
app.use(cors());

// Health check
app.get('/', (req, res) =&amp;gt; {
    res.send('✅ Amazon Scraper API is running...');
});

// Main API endpoint
app.post('/api/scrape', async (req, res) =&amp;gt; {
    const { url } = req.body;

    if (!url || !url.includes('amazon')) {
        return res.status(400).json({ error: 'Invalid or missing Amazon URL' });
    }

    try {
        const data = await scrapeAmazonProductPage(url);
        if (!data) {
            return res.status(500).json({ error: 'Failed to scrape product data' });
        }
        res.json(data);
    } catch (error) {
        console.error('Scrape failed:', error);
        res.status(500).json({ error: error.message || 'Unexpected error' });
    }
});

const PORT = process.env.PORT || 4000;
app.listen(PORT, () =&amp;gt; console.log(`🚀 Server running on port ${PORT}`));

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🧪 Step 6 — Test the API&lt;/p&gt;

&lt;p&gt;Run the server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
node server.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use Postman, curl, or any HTTP client to test:&lt;/p&gt;

&lt;p&gt;Request:&lt;br&gt;
POST &lt;a href="http://localhost:4000/api/scrape" rel="noopener noreferrer"&gt;http://localhost:4000/api/scrape&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Content-Type: application/json&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "url": "https://www.amazon.com/dp/B0BP9Z7K5V"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "websiteName": "Amazon",
  "name": "Apple AirPods (3rd Generation)",
  "price": "$169.99",
  "review_rating": "4.7 out of 5 stars",
  "review_count": "145,201",
  "description": "Spatial Audio with dynamic head tracking...",
  "product_author": "Apple",
  "stars": [
    { "number_of_stars": 5, "percentage": 85 },
    { "number_of_stars": 4, "percentage": 10 }
  ],
  "product_images_links": [
    "https://m.media-amazon.com/images/I/61ZRU9gnbxL._AC_SL1500_.jpg",
    "https://m.media-amazon.com/images/I/61dw1VHfwbL._AC_SL1500_.jpg"
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;⚙️ Step 7 — Tips for Production&lt;/p&gt;

&lt;p&gt;✅ Use rate limiting to avoid Amazon blocking.&lt;br&gt;
✅ Deploy behind a proxy or rotating IP system if scraping frequently.&lt;br&gt;
✅ Consider puppeteer-extra-plugin-stealth for better evasion.&lt;br&gt;
✅ Cache results in a database if you’ll reuse them often.&lt;/p&gt;

</description>
      <category>node</category>
      <category>tutorial</category>
      <category>api</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
