Anuj Singh

Building a Lightning-Fast NSE Stock Data Scraper: From API Calls to Full-Stack Web App

Introduction

Ever tried to download historical stock data from the National Stock Exchange (NSE) of India and found yourself waiting hours for Selenium-based scrapers to crawl through web pages? I built a solution that's 10x faster using direct API calls and wrapped it in a beautiful full-stack web application.

🔗 Live Demo: https://nse-scrap.onrender.com

📁 GitHub Repository: https://github.com/singhanuj620/nse_scrap

🚀 What We're Building

A complete stock data scraping solution that includes:

  • Backend API scraper using Node.js with direct NSE API calls
  • React frontend with real-time progress tracking
  • Session management for concurrent scraping jobs
  • Downloadable ZIP exports of all scraped data
  • Responsive UI with modern design

🎯 The Problem with Traditional Scraping

Most stock data scrapers rely on browser automation tools like Selenium or Puppeteer. While functional, they have significant drawbacks:

  • Slow: Loading full web pages for each request
  • Unreliable: Breaking when UI changes
  • Resource-heavy: Requires browser instances
  • Limited scalability: Can't handle many concurrent requests

💡 The Solution: Direct API Approach

Instead of scraping web pages, I discovered that NSE provides direct API endpoints for historical data. Here's the core approach:

class NSEAPIClient {
    constructor() {
        this.baseUrl = 'https://www.nseindia.com';
        this.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/json, text/plain, */*',
            'Referer': 'https://www.nseindia.com/report-detail/eq_security',
            // ... other headers for authentication
        };
    }

    async initializeSession() {
        // Hit the report page once so NSE issues the session cookies
        // that the historical-data API expects on later calls.
        // (Node's https.get is callback-based, so wrap it in a Promise.)
        const response = await new Promise((resolve, reject) => {
            https.get(`${this.baseUrl}/report-detail/eq_security`, { headers: this.headers }, resolve)
                .on('error', reject);
        });
        this.cookies = extractCookies(response); // helper (not shown) that parses Set-Cookie headers
    }

    async fetchData(symbol, year) {
        const fromDate = `01-01-${year}`;
        const toDate = `31-12-${year}`;

        const url = `${this.baseUrl}/api/historicalOR/generateSecurityWiseHistoricalData?from=${fromDate}&to=${toDate}&symbol=${symbol}&type=priceVolumeDeliverable&series=EQ&csv=true`;

        return await this.makeRequest(url);
    }
}
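
The class above leans on two helpers that aren't shown in full: extractCookies (parsing Set-Cookie headers) and makeRequest. Here's a minimal sketch of what makeRequest could look like, wrapping Node's callback-style https.get in a Promise and replaying the session cookies on every call; the repo's actual implementation may differ:

// Sketch of the makeRequest helper assumed above (illustrative):
// resolves with the response body as a string.
makeRequest(url) {
    return new Promise((resolve, reject) => {
        const options = {
            headers: { ...this.headers, Cookie: this.cookies || '' }
        };
        https.get(url, options, (res) => {
            let body = '';
            res.on('data', (chunk) => { body += chunk; });
            res.on('end', () => resolve(body));
        }).on('error', reject);
    });
}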

๐Ÿ—๏ธ Architecture Overview

Backend (Node.js + Express)

The backend handles three main responsibilities:

  1. API Scraping Engine: Direct calls to NSE APIs
  2. Session Management: Track multiple scraping jobs
  3. File Management: Organize and zip downloaded data

// Session tracking for concurrent scraping jobs
const activeSessions = new Map();

app.post('/api/start-scraping', async (req, res) => {
    const { stocks, startYear, endYear, sessionId } = req.body;

    // Initialize session tracking
    const session = {
        sessionId,
        total: stocks.length * (endYear - startYear + 1),
        completed: 0,
        failed: 0,
        currentStock: null,
        currentYear: null,
        status: 'starting',
        results: [],
        startTime: new Date()
    };

    activeSessions.set(sessionId, session);

    // Start scraping in background
    scrapingProcess(stocks, startYear, endYear, sessionId);

    res.json({ 
        message: 'Scraping started successfully', 
        sessionId,
        total: session.total 
    });
});
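
The scrapingProcess function called above runs in the background and updates the shared session object as it goes. A simplified sketch of that loop, reusing the names from the earlier snippets (file writing and retries trimmed; the repo's actual implementation may differ):

// Simplified sketch of the background worker called above.
async function scrapingProcess(stocks, startYear, endYear, sessionId) {
    const session = activeSessions.get(sessionId);
    session.status = 'running';

    for (const stock of stocks) {
        for (let year = startYear; year <= endYear; year++) {
            session.currentStock = stock;
            session.currentYear = year;
            try {
                const csv = await client.fetchData(stock, year); // NSEAPIClient instance from earlier
                // ...write csv to this session's data folder...
                session.completed++;
                session.results.push({ stock, year, status: 'success' });
            } catch (err) {
                session.failed++;
                session.results.push({ stock, year, status: 'failed', error: err.message });
            }
            // Polite pause between requests (see rate limiting below)
            await new Promise(resolve => setTimeout(resolve, 1000));
        }
    }
    session.status = 'completed';
}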

Frontend (React + Vite)

The frontend provides a clean interface with:

  • Stock management: Add/remove stocks to scrape
  • Real-time progress: Live updates during scraping
  • Result visualization: Success/failure status for each stock
  • Download functionality: One-click ZIP download

import { useState, useEffect } from 'react';
import axios from 'axios';

const NSEScraper = () => {
  const [stocks, setStocks] = useState(['RELIANCE', 'TCS', 'HDFCBANK']);
  const [progress, setProgress] = useState(0);
  const [isRunning, setIsRunning] = useState(false);
  // Session and progress state used by the polling effect below
  const [sessionId, setSessionId] = useState(null);
  const [currentStock, setCurrentStock] = useState(null);
  const [currentYear, setCurrentYear] = useState(null);
  const [completed, setCompleted] = useState(0);
  const [failed, setFailed] = useState(0);

  // Real-time progress polling
  useEffect(() => {
    if (sessionId && isRunning) {
      const interval = setInterval(async () => {
        const response = await axios.get(`/api/progress/${sessionId}`);
        const data = response.data;

        setProgress(data.progress);
        setCurrentStock(data.currentStock);
        setCurrentYear(data.currentYear);
        setCompleted(data.completed);
        setFailed(data.failed);

        if (data.status === 'completed' || data.status === 'error') {
          setIsRunning(false);
        }
      }, 1000);

      return () => clearInterval(interval);
    }
  }, [sessionId, isRunning]);

  return (
    <div className="nse-scraper">
      {/* Beautiful UI components */}
    </div>
  );
};

🔧 Key Technical Features

1. Session Management

Each scraping job gets a unique session ID, allowing multiple users to run concurrent scraping operations without conflicts.

// Progress tracking endpoint
app.get('/api/progress/:sessionId', (req, res) => {
  const sessionId = req.params.sessionId;
  const session = activeSessions.get(sessionId);

  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }

  res.json({
    sessionId,
    total: session.total,
    completed: session.completed,
    failed: session.failed,
    currentStock: session.currentStock,
    currentYear: session.currentYear,
    status: session.status,
    progress: session.total > 0 ? (session.completed / session.total) * 100 : 0,
    results: session.results
  });
});

2. Real-Time Progress Updates

The app provides live feedback (sample payload below), showing:

  • Current stock being processed
  • Current year being scraped
  • Overall progress percentage
  • Success/failure counts
  • Detailed results for each operation
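
Concretely, a single poll of the progress endpoint above returns a payload shaped like this (values illustrative, results truncated):

{
  "sessionId": "abc123",
  "total": 30,
  "completed": 12,
  "failed": 1,
  "currentStock": "TCS",
  "currentYear": 2023,
  "status": "running",
  "progress": 40,
  "results": [
    { "stock": "RELIANCE", "year": 2022, "status": "success" }
  ]
}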

3. Intelligent Rate Limiting

To avoid overwhelming NSE servers:

  • 1-second delay between individual requests
  • 2-second delay between different stocks
  • Proper session initialization with cookies

// Add delay between requests
await new Promise(resolve => setTimeout(resolve, 1000));

// Longer delay between stocks
if (stockIndex < stocks.length - 1) {
  await new Promise(resolve => setTimeout(resolve, 2000));
}

4. Automated File Organization

Downloaded files are automatically organized and can be downloaded as a ZIP:

app.get('/api/download/:sessionId', async (req, res) => {
  // Folder holding this session's CSVs (path layout here is illustrative)
  const dataDir = path.join(__dirname, 'data', req.params.sessionId);

  // Create zip file
  const archive = archiver('zip', { zlib: { level: 9 } });

  res.attachment(`nse_data_${sessionId}.zip`);
  res.setHeader('Content-Type', 'application/zip');

  archive.pipe(res);
  archive.directory(dataDir, false);
  await archive.finalize();
});

📊 Performance Comparison

Method     | Time for 10 stocks (3 years) | Reliability | Resource Usage
-----------|------------------------------|-------------|---------------
Selenium   | ~45 minutes                  | Medium      | High
Direct API | ~4.5 minutes                 | High        | Low

That's 30 stock-year files either way: roughly 9 seconds per file with direct API calls (including the polite delays) versus about a minute and a half per file with browser automation.

🎨 UI/UX Highlights

The frontend features a modern, responsive design with:

  • Clean stock management: Easy add/remove interface with validation
  • Visual progress indicators: Real-time progress bars and status icons
  • Live updates: Current stock and year display without page refresh
  • Responsive design: Works seamlessly on desktop and mobile
  • Modern styling: Clean, professional interface using Lucide React icons

🚀 Deployment & Architecture

Flexible Deployment Options

The project supports multiple deployment strategies:

Single Deployment (Full-Stack)

// Serve static files only if frontend is not deployed separately
if (!process.env.FRONTEND_URL) {
  app.use(express.static('frontend/dist'));
}

// Configure CORS based on deployment type
const corsOptions = {
  origin: process.env.FRONTEND_URL ? 
    [process.env.FRONTEND_URL, 'http://localhost:5173'] : true,
  credentials: true
};

Separate Deployments

  • Backend: Railway, Render, or any Node.js hosting
  • Frontend: Vercel, Netlify, or any static hosting

Live Demo Deployment

The live demo is hosted on Render with:

  • Automatic builds from GitHub
  • Environment variable configuration
  • Zero-downtime deployments

💾 Data Format & Quality

Each CSV file contains comprehensive trading data:

  • Price Data: Open, High, Low, Close, Previous Close, Last Traded Price, VWAP
  • Volume Data: Total Traded Quantity, Total Traded Value, Number of Trades
  • Delivery Data: Delivery Quantity, Delivery Percentage
  • Metadata: Symbol, Series, Date, Timestamps

Sample data structure:

Date,Symbol,Series,Open,High,Low,Close,Last,Prevclose,TOTTRDQTY,TOTTRDVAL,TIMESTAMP,TOTALTRADES,ISIN,DELIVERYQTY,DELIVERYPER
01-Jan-2024,RELIANCE,EQ,2915.00,2932.00,2901.05,2920.15,2920.15,2918.75,5234567,15234567890,01-JAN-2024,89234,INE002A01018,2617283,50.01

🔮 Future Enhancements

Planned improvements include:

  • Data visualization: Built-in charts and analytics dashboard
  • Scheduled scraping: Automated daily/weekly downloads with cron jobs
  • Database integration: Store data in PostgreSQL/MongoDB for persistence
  • REST API: Additional endpoints for programmatic data access
  • Export formats: JSON, Excel, and direct database exports
  • User authentication: Personal dashboards and saved configurations

📝 Key Lessons Learned

  1. API Discovery: Sometimes the best solution is finding the right API endpoint rather than scraping
  2. Session Management: Proper cookie handling and headers are crucial for API access
  3. User Experience: Real-time feedback transforms the user experience
  4. Error Handling: Always plan for network failures and implement retry mechanisms (see the sketch after this list)
  5. Performance: Direct API calls can be orders of magnitude faster than browser automation
  6. Deployment Flexibility: Supporting both monolithic and microservice architectures increases adoption
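
On the error-handling point, a minimal retry wrapper with exponential backoff might look like this (an illustrative helper, not the repo's exact code):

// Illustrative retry helper: retries a failing async call with
// exponential backoff (1s, 2s, 4s, ...).
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
    for (let i = 0; i < attempts; i++) {
        try {
            return await fn();
        } catch (err) {
            if (i === attempts - 1) throw err; // out of attempts, surface the error
            await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
        }
    }
}

// Usage: await withRetry(() => client.fetchData('RELIANCE', 2024));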

🛠️ Tech Stack Deep Dive

Backend Technologies:

  • Node.js: Runtime environment for server-side JavaScript
  • Express.js: Fast, unopinionated web framework
  • fs-extra: Enhanced file system operations
  • archiver: ZIP file creation for bulk downloads
  • https: Native Node.js module for API calls

Frontend Technologies:

  • React 18: Modern React with hooks
  • Vite: Lightning-fast build tool and dev server
  • Axios: Promise-based HTTP client
  • Lucide React: Beautiful, customizable icons
  • CSS3: Modern styling with flexbox and grid

DevOps & Deployment:

  • Render: Cloud hosting platform
  • GitHub Actions: CI/CD pipeline (potential future addition)
  • Environment Variables: Configuration management (example below)
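
For the separate-deployment setup described above, configuration largely comes down to a couple of environment variables. The values below are placeholders; FRONTEND_URL comes from the CORS snippet earlier, while PORT is the conventional Express setting rather than something confirmed from the repo:

# Example environment configuration (placeholder values)
FRONTEND_URL=https://your-frontend.example.com   # set only when the frontend is deployed separately
PORT=3000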

🎯 Getting Started

Ready to try it yourself?

# Clone the repository
git clone https://github.com/singhanuj620/nse_scrap.git
cd nse_scrap

# Install all dependencies
npm run install-all

# Start development server (runs both frontend and backend)
npm run dev

# Or run backend only
npm run server

# Build for production
npm run build

Quick Test:

# Test with sample stocks
npm test

# Download all configured stocks
npm start

🌟 Try It Live

Don't want to set up locally? Try the live demo:
👉 https://nse-scrap.onrender.com

Features available in the live demo:

  • Add/remove stocks from the default list
  • Set custom date ranges
  • Real-time progress tracking
  • Download complete datasets as ZIP files
  • Mobile-responsive interface

💭 Conclusion

Building this NSE scraper taught me the value of choosing the right approach over the obvious one. By leveraging direct API calls instead of browser automation, we achieved a solution that's:

  • 10x faster than traditional scrapers
  • More reliable with fewer points of failure
  • Easier to maintain without UI dependency
  • Better user experience with real-time feedback

The full-stack implementation with session management and real-time progress tracking creates a professional tool that makes bulk data downloading actually enjoyable.

Whether you're a quantitative analyst, researcher, or developer working with Indian stock market data, this approach can save you hours of waiting time and provide more reliable data access.

🔗 Connect & Contribute

Found this helpful? Let's connect!

Have suggestions, found a bug, or want to contribute? Feel free to:

  • Open an issue on GitHub
  • Submit a pull request
  • Connect with me on LinkedIn
