Anuj Singh

Building a Lightning-Fast NSE Stock Data Scraper: From API Calls to Full-Stack Web App

Introduction

Ever tried to download historical stock data from the National Stock Exchange (NSE) of India and found yourself waiting hours for Selenium-based scrapers to crawl through web pages? I built a solution that's roughly 10x faster using direct API calls, and wrapped it in a clean full-stack web application.

🔗 Live Demo: https://nse-scrap.onrender.com

📁 GitHub Repository: https://github.com/singhanuj620/nse_scrap

🚀 What We're Building

A complete stock data scraping solution that includes:

  • Backend API scraper using Node.js with direct NSE API calls
  • React frontend with real-time progress tracking
  • Session management for concurrent scraping jobs
  • Downloadable ZIP exports of all scraped data
  • Responsive UI with modern design

🎯 The Problem with Traditional Scraping

Most stock data scrapers rely on browser automation tools like Selenium or Puppeteer. While functional, they have significant drawbacks:

  • Slow: loads a full web page for every request
  • Unreliable: breaks whenever the UI changes
  • Resource-heavy: requires running browser instances
  • Limited scalability: can't handle many concurrent requests

💡 The Solution: Direct API Approach

Instead of scraping web pages, I discovered that NSE provides direct API endpoints for historical data. Here's the core approach:

class NSEAPIClient {
    constructor() {
        this.baseUrl = 'https://www.nseindia.com';
        this.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
            'Accept': 'application/json, text/plain, */*',
            'Referer': 'https://www.nseindia.com/report-detail/eq_security',
            // ... other headers for authentication
        };
    }

    async initializeSession() {
        // Visit the report page first so NSE issues the session cookies
        // that every subsequent API call must carry
        const response = await this.makeRequest(`${this.baseUrl}/report-detail/eq_security`);
        this.cookies = extractCookies(response); // parse the Set-Cookie headers
    }

    async fetchData(symbol, year) {
        const fromDate = `01-01-${year}`;
        const toDate = `31-12-${year}`;

        const url = `${this.baseUrl}/api/historicalOR/generateSecurityWiseHistoricalData?from=${fromDate}&to=${toDate}&symbol=${symbol}&type=priceVolumeDeliverable&series=EQ&csv=true`;

        return await this.makeRequest(url);
    }
}
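
The makeRequest helper used above is a thin Promise wrapper around Node's callback-based https.get. Here's a minimal standalone sketch of the idea (inside the class it would be a method reading this.headers and this.cookies; the repo's exact implementation may differ):

const https = require('https');

// Minimal sketch: wrap https.get in a Promise, attach the stored headers
// and session cookies, and resolve with the full response body.
function makeRequest(url, headers, cookies) {
    return new Promise((resolve, reject) => {
        const options = { headers: { ...headers, Cookie: cookies || '' } };
        https.get(url, options, (res) => {
            let body = '';
            res.on('data', (chunk) => (body += chunk));
            res.on('end', () => resolve({ statusCode: res.statusCode, headers: res.headers, body }));
        }).on('error', reject);
    });
}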

🏗️ Architecture Overview

Backend (Node.js + Express)

The backend handles three main responsibilities:

  1. API Scraping Engine: Direct calls to NSE APIs
  2. Session Management: Track multiple scraping jobs
  3. File Management: Organize and zip downloaded data

// Session tracking for concurrent scraping jobs
const activeSessions = new Map();

app.post('/api/start-scraping', async (req, res) => {
    const { stocks, startYear, endYear, sessionId } = req.body;

    // Initialize session tracking
    const session = {
        sessionId,
        total: stocks.length * (endYear - startYear + 1),
        completed: 0,
        failed: 0,
        currentStock: null,
        currentYear: null,
        status: 'starting',
        results: [],
        startTime: new Date()
    };

    activeSessions.set(sessionId, session);

    // Start scraping in background
    scrapingProcess(stocks, startYear, endYear, sessionId);

    res.json({ 
        message: 'Scraping started successfully', 
        sessionId,
        total: session.total 
    });
});
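
scrapingProcess runs in the background and mutates the shared session object as it goes. A simplified sketch of what that loop looks like (assuming the NSEAPIClient from earlier; the repo's version also writes each CSV to disk):

async function scrapingProcess(stocks, startYear, endYear, sessionId) {
    const session = activeSessions.get(sessionId);
    const client = new NSEAPIClient();
    await client.initializeSession();
    session.status = 'running';

    for (const stock of stocks) {
        for (let year = startYear; year <= endYear; year++) {
            session.currentStock = stock;
            session.currentYear = year;
            try {
                const csv = await client.fetchData(stock, year);
                // persist csv into this session's data folder here
                session.completed++;
                session.results.push({ stock, year, status: 'success' });
            } catch (err) {
                session.failed++;
                session.results.push({ stock, year, status: 'failed', error: err.message });
            }
            await new Promise(resolve => setTimeout(resolve, 1000)); // per-request delay
        }
    }
    session.status = 'completed';
}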

Frontend (React + Vite)

The frontend provides a clean interface with:

  • Stock management: Add/remove stocks to scrape
  • Real-time progress: Live updates during scraping
  • Result visualization: Success/failure status for each stock
  • Download functionality: One-click ZIP download

import { useState, useEffect } from 'react';
import axios from 'axios';

const NSEScraper = () => {
  const [stocks, setStocks] = useState(['RELIANCE', 'TCS', 'HDFCBANK']);
  const [progress, setProgress] = useState(0);
  const [isRunning, setIsRunning] = useState(false);
  const [sessionId, setSessionId] = useState(null);
  const [currentStock, setCurrentStock] = useState(null);
  const [currentYear, setCurrentYear] = useState(null);
  const [completed, setCompleted] = useState(0);
  const [failed, setFailed] = useState(0);

  // Real-time progress polling
  useEffect(() => {
    if (sessionId && isRunning) {
      const interval = setInterval(async () => {
        const response = await axios.get(`/api/progress/${sessionId}`);
        const data = response.data;

        setProgress(data.progress);
        setCurrentStock(data.currentStock);
        setCurrentYear(data.currentYear);
        setCompleted(data.completed);
        setFailed(data.failed);

        if (data.status === 'completed' || data.status === 'error') {
          setIsRunning(false);
        }
      }, 1000);

      return () => clearInterval(interval);
    }
  }, [sessionId, isRunning]);

  return (
    <div className="nse-scraper">
      {/* Beautiful UI components */}
    </div>
  );
};

🔧 Key Technical Features

1. Session Management

Each scraping job gets a unique session ID, allowing multiple users to run concurrent scraping operations without conflicts.

// Progress tracking endpoint
app.get('/api/progress/:sessionId', (req, res) => {
  const sessionId = req.params.sessionId;
  const session = activeSessions.get(sessionId);

  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }

  res.json({
    sessionId,
    total: session.total,
    completed: session.completed,
    failed: session.failed,
    currentStock: session.currentStock,
    currentYear: session.currentYear,
    status: session.status,
    progress: session.total > 0 ? (session.completed / session.total) * 100 : 0,
    results: session.results
  });
});
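
How the session ID gets minted is an implementation detail; one simple approach (an assumption on my part, not necessarily what the repo does) is to generate it client-side before calling /api/start-scraping:

// crypto.randomUUID() is built into modern browsers (and Node 19+)
const sessionId = crypto.randomUUID();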

2. Real-Time Progress Updates

The app provides live feedback showing:

  • Current stock being processed
  • Current year being scraped
  • Overall progress percentage
  • Success/failure counts
  • Detailed results for each operation

3. Intelligent Rate Limiting

To avoid overwhelming NSE servers:

  • 1-second delay between individual requests
  • 2-second delay between different stocks
  • Proper session initialization with cookies

// Add delay between requests
await new Promise(resolve => setTimeout(resolve, 1000));

// Longer delay between stocks
if (stockIndex < stocks.length - 1) {
  await new Promise(resolve => setTimeout(resolve, 2000));
}

4. Automated File Organization

Downloaded files are automatically organized and can be downloaded as a ZIP:

// const archiver = require('archiver'); at the top of the server file

app.get('/api/download/:sessionId', async (req, res) => {
  const { sessionId } = req.params;
  // dataDir points to the folder where this session's CSVs were saved

  // Stream the whole session folder as a single zip
  const archive = archiver('zip', { zlib: { level: 9 } });

  res.attachment(`nse_data_${sessionId}.zip`);
  res.setHeader('Content-Type', 'application/zip');

  archive.pipe(res);
  archive.directory(dataDir, false);
  await archive.finalize();
});
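
On the frontend, the one-click download can then be as simple as pointing the browser at this endpoint so the zip streams straight down (handleDownload is a hypothetical handler name):

// Hypothetical click handler for the download button
const handleDownload = () => {
  window.location.href = `/api/download/${sessionId}`;
};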

📊 Performance Comparison

Method       Time for 10 stocks (3 years)   Reliability   Resource Usage
Selenium     ~45 minutes                    Medium        High
Direct API   ~4.5 minutes                   High          Low

🎨 UI/UX Highlights

The frontend features a modern, responsive design with:

  • Clean stock management: Easy add/remove interface with validation
  • Visual progress indicators: Real-time progress bars and status icons
  • Live updates: Current stock and year display without page refresh
  • Responsive design: Works seamlessly on desktop and mobile
  • Modern styling: Clean, professional interface using Lucide React icons

🚀 Deployment & Architecture

Flexible Deployment Options

The project supports multiple deployment strategies:

Single Deployment (Full-Stack)

// Serve static files only if frontend is not deployed separately
if (!process.env.FRONTEND_URL) {
  app.use(express.static('frontend/dist'));
}

// Configure CORS based on deployment type
const corsOptions = {
  origin: process.env.FRONTEND_URL ? 
    [process.env.FRONTEND_URL, 'http://localhost:5173'] : true,
  credentials: true
};
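
Assuming the standard cors middleware package, these options get wired in with one line:

const cors = require('cors');
app.use(cors(corsOptions));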

Separate Deployments

  • Backend: Railway, Render, or any Node.js hosting
  • Frontend: Vercel, Netlify, or any static hosting

Live Demo Deployment

The live demo is hosted on Render with:

  • Automatic builds from GitHub
  • Environment variable configuration
  • Zero-downtime deployments

💾 Data Format & Quality

Each CSV file contains comprehensive trading data:

  • Price Data: Open, High, Low, Close, Previous Close, Last Traded Price, VWAP
  • Volume Data: Total Traded Quantity, Total Traded Value, Number of Trades
  • Delivery Data: Delivery Quantity, Delivery Percentage
  • Metadata: Symbol, Series, Date, Timestamps

Sample data structure:

Date,Symbol,Series,Open,High,Low,Close,Last,Prevclose,TOTTRDQTY,TOTTRDVAL,TIMESTAMP,TOTALTRADES,ISIN,DELIVERYQTY,DELIVERYPER
01-Jan-2024,RELIANCE,EQ,2915.00,2932.00,2901.05,2920.15,2920.15,2918.75,5234567,15234567890,01-JAN-2024,89234,INE002A01018,2617283,50.01
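
If you want to consume these files programmatically, naive comma-splitting is enough because no field in this data contains commas. A quick parsing sketch (parseNseCsv is a hypothetical helper, not part of the repo):

const fs = require('fs');

// Parse one downloaded CSV into an array of row objects keyed by header
function parseNseCsv(filePath) {
  const [headerLine, ...rows] = fs.readFileSync(filePath, 'utf8').trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}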

🔮 Future Enhancements

Planned improvements include:

  • Data visualization: Built-in charts and analytics dashboard
  • Scheduled scraping: Automated daily/weekly downloads with cron jobs
  • Database integration: Store data in PostgreSQL/MongoDB for persistence
  • REST API: Additional endpoints for programmatic data access
  • Export formats: JSON, Excel, and direct database exports
  • User authentication: Personal dashboards and saved configurations

📝 Key Lessons Learned

  1. API Discovery: Sometimes the best solution is finding the right API endpoint rather than scraping
  2. Session Management: Proper cookie handling and headers are crucial for API access
  3. User Experience: Real-time feedback transforms the user experience
  4. Error Handling: Always plan for network failures and implement retry mechanisms (see the retry sketch after this list)
  5. Performance: Direct API calls can be orders of magnitude faster than browser automation
  6. Deployment Flexibility: Supporting both monolithic and microservice architectures increases adoption
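
For point 4, the pattern I'd reach for is a generic retry wrapper with exponential backoff (withRetry is a hypothetical helper, not code from the repo):

// Retry an async operation up to `attempts` times with exponential backoff
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i)); // 1s, 2s, 4s...
    }
  }
}

// Usage: const csv = await withRetry(() => client.fetchData('RELIANCE', 2024));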

🛠️ Tech Stack Deep Dive

Backend Technologies:

  • Node.js: Runtime environment for server-side JavaScript
  • Express.js: Fast, unopinionated web framework
  • fs-extra: Enhanced file system operations
  • archiver: ZIP file creation for bulk downloads
  • https: Native Node.js module for API calls

Frontend Technologies:

  • React 18: Modern React with hooks
  • Vite: Lightning-fast build tool and dev server
  • Axios: Promise-based HTTP client
  • Lucide React: Beautiful, customizable icons
  • CSS3: Modern styling with flexbox and grid

DevOps & Deployment:

  • Render: Cloud hosting platform
  • GitHub Actions: CI/CD pipeline (potential future addition)
  • Environment Variables: Configuration management

🎯 Getting Started

Ready to try it yourself?

# Clone the repository
git clone https://github.com/singhanuj620/nse_scrap.git
cd nse_scrap

# Install all dependencies
npm run install-all

# Start development server (runs both frontend and backend)
npm run dev

# Or run backend only
npm run server

# Build for production
npm run build

Quick Test:

# Test with sample stocks
npm test

# Download all configured stocks
npm start

🌟 Try It Live

Don't want to set up locally? Try the live demo:
👉 https://nse-scrap.onrender.com

Features available in the live demo:

  • Add/remove stocks from the default list
  • Set custom date ranges
  • Real-time progress tracking
  • Download complete datasets as ZIP files
  • Mobile-responsive interface

💭 Conclusion

Building this NSE scraper taught me the value of choosing the right approach over the obvious one. By leveraging direct API calls instead of browser automation, we achieved a solution that's:

  • 10x faster than traditional scrapers
  • More reliable with fewer points of failure
  • Easier to maintain without UI dependency
  • Nicer to use thanks to real-time feedback

The full-stack implementation with session management and real-time progress tracking creates a professional tool that makes bulk data downloading actually enjoyable.

Whether you're a quantitative analyst, researcher, or developer working with Indian stock market data, this approach can save you hours of waiting time and provide more reliable data access.

🔗 Connect & Contribute

Found this helpful? Let's connect!

Have suggestions, found a bug, or want to contribute? Feel free to:

  • Open an issue on GitHub
  • Submit a pull request
  • Connect with me on LinkedIn
