Introduction
Ever tried to download historical stock data from the National Stock Exchange (NSE) of India, only to wait hours while a Selenium-based scraper crawls through web pages? I built a solution that's 10x faster using direct API calls and wrapped it in a beautiful full-stack web application.
🔗 Live Demo: https://nse-scrap.onrender.com
📁 GitHub Repository: https://github.com/singhanuj620/nse_scrap
🚀 What We're Building
A complete stock data scraping solution that includes:
- Backend API scraper using Node.js with direct NSE API calls
- React frontend with real-time progress tracking
- Session management for concurrent scraping jobs
- Downloadable ZIP exports of all scraped data
- Responsive UI with modern design
🎯 The Problem with Traditional Scraping
Most stock data scrapers rely on browser automation tools like Selenium or Puppeteer. While functional, they have significant drawbacks:
- Slow: Loading full web pages for each request
- Unreliable: Breaking when UI changes
- Resource-heavy: Requires browser instances
- Limited scalability: Can't handle many concurrent requests
💡 The Solution: Direct API Approach
Instead of scraping web pages, I discovered that NSE provides direct API endpoints for historical data. Here's the core approach:
```javascript
class NSEAPIClient {
  constructor() {
    this.baseUrl = 'https://www.nseindia.com';
    this.headers = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept': 'application/json, text/plain, */*',
      'Referer': 'https://www.nseindia.com/report-detail/eq_security',
      // ... other headers for authentication
    };
  }

  async initializeSession() {
    // Hit the report page first to pick up NSE's session cookies
    const response = await this.makeRequest(`${this.baseUrl}/report-detail/eq_security`);
    this.cookies = extractCookies(response);
  }

  async fetchData(symbol, year) {
    const fromDate = `01-01-${year}`;
    const toDate = `31-12-${year}`;
    const url = `${this.baseUrl}/api/historicalOR/generateSecurityWiseHistoricalData?from=${fromDate}&to=${toDate}&symbol=${symbol}&type=priceVolumeDeliverable&series=EQ&csv=true`;
    return this.makeRequest(url);
  }
}
```
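The snippet above leans on two helpers that aren't shown: `extractCookies` and `makeRequest` (methods on the class; shown here as standalone functions for brevity). Here's a minimal sketch of what they can look like with Node's native `https` module — illustrative only, the repository's exact implementation may differ:

```javascript
const https = require('https');

// Pull "name=value" pairs out of the Set-Cookie response headers
function extractCookies(response) {
  const setCookieHeaders = response.headers['set-cookie'] || [];
  return setCookieHeaders.map(cookie => cookie.split(';')[0]).join('; ');
}

// Promise wrapper around https.get that resolves with the status,
// headers, and accumulated body of the response
function makeRequest(url, headers = {}) {
  return new Promise((resolve, reject) => {
    https.get(url, { headers }, (res) => {
      let body = '';
      res.on('data', chunk => (body += chunk));
      res.on('end', () => resolve({ statusCode: res.statusCode, headers: res.headers, body }));
    }).on('error', reject);
  });
}
```

The extracted cookies then get attached to every data request; as the lessons learned below note, proper cookie handling and headers are crucial for NSE API access.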
🏗️ Architecture Overview
Backend (Node.js + Express)
The backend handles three main responsibilities:
- API Scraping Engine: Direct calls to NSE APIs
- Session Management: Track multiple scraping jobs
- File Management: Organize and zip downloaded data
```javascript
// Session tracking for concurrent scraping jobs
const activeSessions = new Map();

app.post('/api/start-scraping', async (req, res) => {
  const { stocks, startYear, endYear, sessionId } = req.body;

  // Initialize session tracking
  const session = {
    sessionId,
    total: stocks.length * (endYear - startYear + 1),
    completed: 0,
    failed: 0,
    currentStock: null,
    currentYear: null,
    status: 'starting',
    results: [],
    startTime: new Date()
  };
  activeSessions.set(sessionId, session);

  // Start scraping in background
  scrapingProcess(stocks, startYear, endYear, sessionId);

  res.json({
    message: 'Scraping started successfully',
    sessionId,
    total: session.total
  });
});
```
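`scrapingProcess` itself runs outside the request/response cycle and mutates the shared session object as it goes, which is what the progress endpoint reads from. A condensed sketch of that loop (error handling trimmed; `saveToFile` is a hypothetical helper that writes a CSV to disk):

```javascript
async function scrapingProcess(stocks, startYear, endYear, sessionId) {
  const session = activeSessions.get(sessionId);
  const client = new NSEAPIClient();
  await client.initializeSession();
  session.status = 'running';

  for (const [stockIndex, stock] of stocks.entries()) {
    for (let year = startYear; year <= endYear; year++) {
      session.currentStock = stock;
      session.currentYear = year;
      try {
        const csv = await client.fetchData(stock, year);
        await saveToFile(sessionId, stock, year, csv); // hypothetical helper
        session.completed++;
        session.results.push({ stock, year, status: 'success' });
      } catch (err) {
        session.failed++;
        session.results.push({ stock, year, status: 'failed', error: err.message });
      }
      // 1-second pause between requests (see rate limiting below)
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
    // 2-second pause before moving to the next stock
    if (stockIndex < stocks.length - 1) {
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }
  session.status = 'completed';
}
```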
Frontend (React + Vite)
The frontend provides a clean interface with:
- Stock management: Add/remove stocks to scrape
- Real-time progress: Live updates during scraping
- Result visualization: Success/failure status for each stock
- Download functionality: One-click ZIP download
```jsx
import { useState, useEffect } from 'react';
import axios from 'axios';

const NSEScraper = () => {
  const [stocks, setStocks] = useState(['RELIANCE', 'TCS', 'HDFCBANK']);
  const [progress, setProgress] = useState(0);
  const [isRunning, setIsRunning] = useState(false);
  const [sessionId, setSessionId] = useState(null);
  const [currentStock, setCurrentStock] = useState(null);
  const [currentYear, setCurrentYear] = useState(null);
  const [completed, setCompleted] = useState(0);
  const [failed, setFailed] = useState(0);

  // Real-time progress polling
  useEffect(() => {
    if (sessionId && isRunning) {
      const interval = setInterval(async () => {
        const { data } = await axios.get(`/api/progress/${sessionId}`);

        setProgress(data.progress);
        setCurrentStock(data.currentStock);
        setCurrentYear(data.currentYear);
        setCompleted(data.completed);
        setFailed(data.failed);

        if (data.status === 'completed' || data.status === 'error') {
          setIsRunning(false);
        }
      }, 1000);
      return () => clearInterval(interval);
    }
  }, [sessionId, isRunning]);

  return (
    <div className="nse-scraper">
      {/* Beautiful UI components */}
    </div>
  );
};
```
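Starting a job from the UI is a single POST to the backend. A plausible handler that would live inside the component above, assuming the session ID is generated client-side (the exact ID scheme and year inputs in the repo may differ):

```jsx
const startScraping = async () => {
  // Illustrative client-side session ID scheme
  const id = `session_${Date.now()}`;
  setSessionId(id);
  setIsRunning(true);

  await axios.post('/api/start-scraping', {
    stocks,
    startYear: 2022, // would come from form inputs in the real UI
    endYear: 2024,
    sessionId: id,
  });
};
```

Once `sessionId` is set and `isRunning` flips to true, the polling effect above kicks in automatically.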
🔧 Key Technical Features
1. Session Management
Each scraping job gets a unique session ID, allowing multiple users to run concurrent scraping operations without conflicts.
```javascript
// Progress tracking endpoint
app.get('/api/progress/:sessionId', (req, res) => {
  const sessionId = req.params.sessionId;
  const session = activeSessions.get(sessionId);

  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }

  res.json({
    sessionId,
    total: session.total,
    completed: session.completed,
    failed: session.failed,
    currentStock: session.currentStock,
    currentYear: session.currentYear,
    status: session.status,
    progress: session.total > 0 ? (session.completed / session.total) * 100 : 0,
    results: session.results
  });
});
```
2. Real-Time Progress Updates
The app provides live feedback showing:
- Current stock being processed
- Current year being scraped
- Overall progress percentage
- Success/failure counts
- Detailed results for each operation
3. Intelligent Rate Limiting
To avoid overwhelming NSE servers:
- 1-second delay between individual requests
- 2-second delay between different stocks
- Proper session initialization with cookies
```javascript
// Add delay between requests
await new Promise(resolve => setTimeout(resolve, 1000));

// Longer delay between stocks
if (stockIndex < stocks.length - 1) {
  await new Promise(resolve => setTimeout(resolve, 2000));
}
```
4. Automated File Organization
Downloaded files are automatically organized and can be downloaded as a ZIP:
```javascript
app.get('/api/download/:sessionId', async (req, res) => {
  const { sessionId } = req.params;

  // Create zip file; dataDir is the folder holding this session's CSVs
  const archive = archiver('zip', { zlib: { level: 9 } });

  res.attachment(`nse_data_${sessionId}.zip`);
  res.setHeader('Content-Type', 'application/zip');

  archive.pipe(res);
  archive.directory(dataDir, false);
  await archive.finalize();
});
```
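On the client side, triggering that download is a matter of requesting the archive as a blob and handing it to the browser; a minimal sketch:

```javascript
const downloadZip = async (sessionId) => {
  // Fetch the archive as binary data
  const response = await axios.get(`/api/download/${sessionId}`, {
    responseType: 'blob',
  });

  // Create a temporary link to trigger the browser's save dialog
  const url = URL.createObjectURL(response.data);
  const link = document.createElement('a');
  link.href = url;
  link.download = `nse_data_${sessionId}.zip`;
  link.click();
  URL.revokeObjectURL(url);
};
```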
📊 Performance Comparison
| Method | Time for 10 stocks (3 years) | Reliability | Resource Usage |
|---|---|---|---|
| Selenium | ~45 minutes | Medium | High |
| Direct API | ~4.5 minutes | High | Low |
🎨 UI/UX Highlights
The frontend features a modern, responsive design with:
- Clean stock management: Easy add/remove interface with validation
- Visual progress indicators: Real-time progress bars and status icons
- Live updates: Current stock and year display without page refresh
- Responsive design: Works seamlessly on desktop and mobile
- Modern styling: Clean, professional interface using Lucide React icons
🚀 Deployment & Architecture
Flexible Deployment Options
The project supports multiple deployment strategies:
Single Deployment (Full-Stack)
```javascript
// Serve static files only if frontend is not deployed separately
if (!process.env.FRONTEND_URL) {
  app.use(express.static('frontend/dist'));
}

// Configure CORS based on deployment type
const corsOptions = {
  origin: process.env.FRONTEND_URL
    ? [process.env.FRONTEND_URL, 'http://localhost:5173']
    : true,
  credentials: true
};
```
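Wiring those options into Express is one line with the standard `cors` package (assumed here; any equivalent middleware works):

```javascript
const cors = require('cors');
app.use(cors(corsOptions));
```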
Separate Deployments
- Backend: Railway, Render, or any Node.js hosting
- Frontend: Vercel, Netlify, or any static hosting
Live Demo Deployment
The live demo is hosted on Render with:
- Automatic builds from GitHub
- Environment variable configuration
- Zero-downtime deployments
💾 Data Format & Quality
Each CSV file contains comprehensive trading data:
- Price Data: Open, High, Low, Close, Previous Close, Last Traded Price, VWAP
- Volume Data: Total Traded Quantity, Total Traded Value, Number of Trades
- Delivery Data: Delivery Quantity, Delivery Percentage
- Metadata: Symbol, Series, Date, Timestamps
Sample data structure:
```
Date,Symbol,Series,Open,High,Low,Close,Last,Prevclose,TOTTRDQTY,TOTTRDVAL,TIMESTAMP,TOTALTRADES,ISIN,DELIVERYQTY,DELIVERYPER
01-Jan-2024,RELIANCE,EQ,2915.00,2932.00,2901.05,2920.15,2920.15,2918.75,5234567,15234567890,01-JAN-2024,89234,INE002A01018,2617283,50.01
```
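Once downloaded, each file is plain CSV and straightforward to work with programmatically. A minimal sketch that maps rows to objects keyed by the header names above (no quoted-field edge cases handled):

```javascript
// Parse an NSE CSV export into an array of row objects
function parseCsv(text) {
  const [headerLine, ...rows] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map(row => {
    const values = row.split(',');
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}

// Usage: parseCsv(csvText)[0].Close -> "2920.15"
```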
🔮 Future Enhancements
Planned improvements include:
- Data visualization: Built-in charts and analytics dashboard
- Scheduled scraping: Automated daily/weekly downloads with cron jobs
- Database integration: Store data in PostgreSQL/MongoDB for persistence
- REST API: Additional endpoints for programmatic data access
- Export formats: JSON, Excel, and direct database exports
- User authentication: Personal dashboards and saved configurations
📝 Key Lessons Learned
- API Discovery: Sometimes the best solution is finding the right API endpoint rather than scraping
- Session Management: Proper cookie handling and headers are crucial for API access
- User Experience: Real-time feedback transforms the user experience
- Error Handling: Always plan for network failures and implement retry mechanisms (a sketch follows this list)
- Performance: Direct API calls can be orders of magnitude faster than browser automation
- Deployment Flexibility: Supporting both monolithic and microservice architectures increases adoption
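On that retry point: a small exponential-backoff wrapper goes a long way against transient NSE failures. A hedged sketch, not the repository's exact code:

```javascript
// Retry an async operation with exponential backoff
async function withRetry(fn, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err;
      // Wait 1s, 2s, 4s, ... before the next attempt
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage: const csv = await withRetry(() => client.fetchData('RELIANCE', 2024));
```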
🛠️ Tech Stack Deep Dive
Backend Technologies:
- Node.js: Runtime environment for server-side JavaScript
- Express.js: Fast, unopinionated web framework
- fs-extra: Enhanced file system operations
- archiver: ZIP file creation for bulk downloads
- https: Native Node.js module for API calls
Frontend Technologies:
- React 18: Modern React with hooks
- Vite: Lightning-fast build tool and dev server
- Axios: Promise-based HTTP client
- Lucide React: Beautiful, customizable icons
- CSS3: Modern styling with flexbox and grid
DevOps & Deployment:
- Render: Cloud hosting platform
- GitHub Actions: CI/CD pipeline (potential future addition)
- Environment Variables: Configuration management
🎯 Getting Started
Ready to try it yourself?
```bash
# Clone the repository
git clone https://github.com/singhanuj620/nse_scrap.git
cd nse_scrap

# Install all dependencies
npm run install-all

# Start development server (runs both frontend and backend)
npm run dev

# Or run backend only
npm run server

# Build for production
npm run build
```
Quick Test:
```bash
# Test with sample stocks
npm test

# Download all configured stocks
npm start
```
🌟 Try It Live
Don't want to set up locally? Try the live demo:
👉 https://nse-scrap.onrender.com
Features available in the live demo:
- Add/remove stocks from the default list
- Set custom date ranges
- Real-time progress tracking
- Download complete datasets as ZIP files
- Mobile-responsive interface
💭 Conclusion
Building this NSE scraper taught me the value of choosing the right approach over the obvious one. By leveraging direct API calls instead of browser automation, we achieved a solution that's:
- 10x faster than traditional scrapers
- More reliable with fewer points of failure
- Easier to maintain without UI dependency
- Better user experience with real-time feedback
The full-stack implementation with session management and real-time progress tracking creates a professional tool that makes bulk data downloading actually enjoyable.
Whether you're a quantitative analyst, researcher, or developer working with Indian stock market data, this approach can save you hours of waiting time and provide more reliable data access.
🔗 Connect & Contribute
Found this helpful? Let's connect!
- 💼 LinkedIn: Anuj Singh
- 🐙 GitHub: @singhanuj620
- 📁 Project Repository: nse_scrap
Have suggestions, found a bug, or want to contribute? Feel free to:
- Open an issue on GitHub
- Submit a pull request
- Connect with me on LinkedIn