Introduction
Ever tried to download historical stock data from the National Stock Exchange (NSE) of India, only to wait hours while a Selenium-based scraper crawls through web pages? I built a solution that's 10x faster using direct API calls and wrapped it in a beautiful full-stack web application.
🔗 Live Demo: https://nse-scrap.onrender.com
📁 GitHub Repository: https://github.com/singhanuj620/nse_scrap
🚀 What We're Building
A complete stock data scraping solution that includes:
- Backend API scraper using Node.js with direct NSE API calls
- React frontend with real-time progress tracking
- Session management for concurrent scraping jobs
- Downloadable ZIP exports of all scraped data
- Responsive UI with modern design
🎯 The Problem with Traditional Scraping
Most stock data scrapers rely on browser automation tools like Selenium or Puppeteer. While functional, they have significant drawbacks:
- Slow: Loading full web pages for each request
- Unreliable: Breaking when UI changes
- Resource-heavy: Requires browser instances
- Limited scalability: Can't handle many concurrent requests
💡 The Solution: Direct API Approach
Instead of scraping web pages, I discovered that NSE provides direct API endpoints for historical data. Here's the core approach:
```javascript
class NSEAPIClient {
  constructor() {
    this.baseUrl = 'https://www.nseindia.com';
    this.headers = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept': 'application/json, text/plain, */*',
      'Referer': 'https://www.nseindia.com/report-detail/eq_security',
      // ... other headers for authentication
    };
  }

  async initializeSession() {
    // Hit the report page first to pick up NSE's session cookies
    const response = await this.makeRequest(`${this.baseUrl}/report-detail/eq_security`);
    this.cookies = extractCookies(response);
  }

  async fetchData(symbol, year) {
    const fromDate = `01-01-${year}`;
    const toDate = `31-12-${year}`;
    const url = `${this.baseUrl}/api/historicalOR/generateSecurityWiseHistoricalData?from=${fromDate}&to=${toDate}&symbol=${symbol}&type=priceVolumeDeliverable&series=EQ&csv=true`;
    return this.makeRequest(url);
  }
}
```
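The snippet above leans on two helpers that aren't shown: `extractCookies` and `makeRequest` (methods on the class; shown here as standalone functions for brevity). Here's a minimal sketch of what they can look like with Node's native `https` module — illustrative only, the repository's exact implementation may differ:

```javascript
const https = require('https');

// Pull "name=value" pairs out of the Set-Cookie response headers
function extractCookies(response) {
  const setCookieHeaders = response.headers['set-cookie'] || [];
  return setCookieHeaders.map(cookie => cookie.split(';')[0]).join('; ');
}

// Promise wrapper around https.get that resolves with the status,
// headers, and accumulated body of the response
function makeRequest(url, headers = {}) {
  return new Promise((resolve, reject) => {
    https.get(url, { headers }, (res) => {
      let body = '';
      res.on('data', chunk => (body += chunk));
      res.on('end', () => resolve({ statusCode: res.statusCode, headers: res.headers, body }));
    }).on('error', reject);
  });
}
```

The extracted cookies then get attached to every data request; as the lessons learned below note, proper cookie handling and headers are crucial for NSE API access.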
🏗️ Architecture Overview
Backend (Node.js + Express)
The backend handles three main responsibilities:
- API Scraping Engine: Direct calls to NSE APIs
- Session Management: Track multiple scraping jobs
- File Management: Organize and zip downloaded data
```javascript
// Session tracking for concurrent scraping jobs
const activeSessions = new Map();

app.post('/api/start-scraping', async (req, res) => {
  const { stocks, startYear, endYear, sessionId } = req.body;

  // Initialize session tracking
  const session = {
    sessionId,
    total: stocks.length * (endYear - startYear + 1),
    completed: 0,
    failed: 0,
    currentStock: null,
    currentYear: null,
    status: 'starting',
    results: [],
    startTime: new Date()
  };
  activeSessions.set(sessionId, session);

  // Start scraping in background
  scrapingProcess(stocks, startYear, endYear, sessionId);

  res.json({
    message: 'Scraping started successfully',
    sessionId,
    total: session.total
  });
});
```
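`scrapingProcess` itself runs outside the request/response cycle and mutates the shared session object as it goes, which is what the progress endpoint reads from. A condensed sketch of that loop (error handling trimmed; `saveToFile` is a hypothetical helper that writes a CSV to disk):

```javascript
async function scrapingProcess(stocks, startYear, endYear, sessionId) {
  const session = activeSessions.get(sessionId);
  const client = new NSEAPIClient();
  await client.initializeSession();
  session.status = 'running';

  for (const [stockIndex, stock] of stocks.entries()) {
    for (let year = startYear; year <= endYear; year++) {
      session.currentStock = stock;
      session.currentYear = year;
      try {
        const csv = await client.fetchData(stock, year);
        await saveToFile(sessionId, stock, year, csv); // hypothetical helper
        session.completed++;
        session.results.push({ stock, year, status: 'success' });
      } catch (err) {
        session.failed++;
        session.results.push({ stock, year, status: 'failed', error: err.message });
      }
      // 1-second pause between requests (see rate limiting below)
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
    // 2-second pause before moving to the next stock
    if (stockIndex < stocks.length - 1) {
      await new Promise(resolve => setTimeout(resolve, 2000));
    }
  }
  session.status = 'completed';
}
```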
Frontend (React + Vite)
The frontend provides a clean interface with:
- Stock management: Add/remove stocks to scrape
- Real-time progress: Live updates during scraping
- Result visualization: Success/failure status for each stock
- Download functionality: One-click ZIP download
```jsx
import { useState, useEffect } from 'react';
import axios from 'axios';

const NSEScraper = () => {
  const [stocks, setStocks] = useState(['RELIANCE', 'TCS', 'HDFCBANK']);
  const [progress, setProgress] = useState(0);
  const [isRunning, setIsRunning] = useState(false);
  const [sessionId, setSessionId] = useState(null);
  const [currentStock, setCurrentStock] = useState(null);
  const [currentYear, setCurrentYear] = useState(null);
  const [completed, setCompleted] = useState(0);
  const [failed, setFailed] = useState(0);

  // Real-time progress polling
  useEffect(() => {
    if (sessionId && isRunning) {
      const interval = setInterval(async () => {
        const { data } = await axios.get(`/api/progress/${sessionId}`);

        setProgress(data.progress);
        setCurrentStock(data.currentStock);
        setCurrentYear(data.currentYear);
        setCompleted(data.completed);
        setFailed(data.failed);

        if (data.status === 'completed' || data.status === 'error') {
          setIsRunning(false);
        }
      }, 1000);
      return () => clearInterval(interval);
    }
  }, [sessionId, isRunning]);

  return (
    <div className="nse-scraper">
      {/* Beautiful UI components */}
    </div>
  );
};
```
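Starting a job from the UI is a single POST to the backend. A plausible handler that would live inside the component above, assuming the session ID is generated client-side (the exact ID scheme and year inputs in the repo may differ):

```jsx
const startScraping = async () => {
  // Illustrative client-side session ID scheme
  const id = `session_${Date.now()}`;
  setSessionId(id);
  setIsRunning(true);

  await axios.post('/api/start-scraping', {
    stocks,
    startYear: 2022, // would come from form inputs in the real UI
    endYear: 2024,
    sessionId: id,
  });
};
```

Once `sessionId` is set and `isRunning` flips to true, the polling effect above kicks in automatically.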
🔧 Key Technical Features
1. Session Management
Each scraping job gets a unique session ID, allowing multiple users to run concurrent scraping operations without conflicts.
```javascript
// Progress tracking endpoint
app.get('/api/progress/:sessionId', (req, res) => {
  const sessionId = req.params.sessionId;
  const session = activeSessions.get(sessionId);

  if (!session) {
    return res.status(404).json({ error: 'Session not found' });
  }

  res.json({
    sessionId,
    total: session.total,
    completed: session.completed,
    failed: session.failed,
    currentStock: session.currentStock,
    currentYear: session.currentYear,
    status: session.status,
    progress: session.total > 0 ? (session.completed / session.total) * 100 : 0,
    results: session.results
  });
});
```
2. Real-Time Progress Updates
The app provides live feedback showing:
- Current stock being processed
- Current year being scraped
- Overall progress percentage
- Success/failure counts
- Detailed results for each operation
3. Intelligent Rate Limiting
To avoid overwhelming NSE servers:
- 1-second delay between individual requests
- 2-second delay between different stocks
- Proper session initialization with cookies
```javascript
// Add delay between requests
await new Promise(resolve => setTimeout(resolve, 1000));

// Longer delay between stocks
if (stockIndex < stocks.length - 1) {
  await new Promise(resolve => setTimeout(resolve, 2000));
}
```
4. Automated File Organization
Downloaded files are automatically organized and can be downloaded as a ZIP:
```javascript
app.get('/api/download/:sessionId', async (req, res) => {
  const { sessionId } = req.params;

  // Create zip file; dataDir is the folder holding this session's CSVs
  const archive = archiver('zip', { zlib: { level: 9 } });

  res.attachment(`nse_data_${sessionId}.zip`);
  res.setHeader('Content-Type', 'application/zip');

  archive.pipe(res);
  archive.directory(dataDir, false);
  await archive.finalize();
});
```
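On the client side, triggering that download is a matter of requesting the archive as a blob and handing it to the browser; a minimal sketch:

```javascript
const downloadZip = async (sessionId) => {
  // Fetch the archive as binary data
  const response = await axios.get(`/api/download/${sessionId}`, {
    responseType: 'blob',
  });

  // Create a temporary link to trigger the browser's save dialog
  const url = URL.createObjectURL(response.data);
  const link = document.createElement('a');
  link.href = url;
  link.download = `nse_data_${sessionId}.zip`;
  link.click();
  URL.revokeObjectURL(url);
};
```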
📊 Performance Comparison
| Method | Time for 10 stocks (3 years) | Reliability | Resource Usage |
|---|---|---|---|
| Selenium | ~45 minutes | Medium | High |
| Direct API | ~4.5 minutes | High | Low |
🎨 UI/UX Highlights
The frontend features a modern, responsive design with:
- Clean stock management: Easy add/remove interface with validation
- Visual progress indicators: Real-time progress bars and status icons
- Live updates: Current stock and year display without page refresh
- Responsive design: Works seamlessly on desktop and mobile
- Modern styling: Clean, professional interface using Lucide React icons
🚀 Deployment & Architecture
Flexible Deployment Options
The project supports multiple deployment strategies:
Single Deployment (Full-Stack)
```javascript
// Serve static files only if frontend is not deployed separately
if (!process.env.FRONTEND_URL) {
  app.use(express.static('frontend/dist'));
}

// Configure CORS based on deployment type
const corsOptions = {
  origin: process.env.FRONTEND_URL
    ? [process.env.FRONTEND_URL, 'http://localhost:5173']
    : true,
  credentials: true
};
```
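Wiring those options into Express is one line with the standard `cors` package (assumed here; any equivalent middleware works):

```javascript
const cors = require('cors');
app.use(cors(corsOptions));
```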
Separate Deployments
- Backend: Railway, Render, or any Node.js hosting
- Frontend: Vercel, Netlify, or any static hosting
Live Demo Deployment
The live demo is hosted on Render with:
- Automatic builds from GitHub
- Environment variable configuration
- Zero-downtime deployments
💾 Data Format & Quality
Each CSV file contains comprehensive trading data:
- Price Data: Open, High, Low, Close, Previous Close, Last Traded Price, VWAP
- Volume Data: Total Traded Quantity, Total Traded Value, Number of Trades
- Delivery Data: Delivery Quantity, Delivery Percentage
- Metadata: Symbol, Series, Date, Timestamps
Sample data structure:
```
Date,Symbol,Series,Open,High,Low,Close,Last,Prevclose,TOTTRDQTY,TOTTRDVAL,TIMESTAMP,TOTALTRADES,ISIN,DELIVERYQTY,DELIVERYPER
01-Jan-2024,RELIANCE,EQ,2915.00,2932.00,2901.05,2920.15,2920.15,2918.75,5234567,15234567890,01-JAN-2024,89234,INE002A01018,2617283,50.01
```
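Once downloaded, each file is plain CSV and straightforward to work with programmatically. A minimal sketch that maps rows to objects keyed by the header names above (no quoted-field edge cases handled):

```javascript
// Parse an NSE CSV export into an array of row objects
function parseCsv(text) {
  const [headerLine, ...rows] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map(row => {
    const values = row.split(',');
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
}

// Usage: parseCsv(csvText)[0].Close -> "2920.15"
```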
🔮 Future Enhancements
Planned improvements include:
- Data visualization: Built-in charts and analytics dashboard
- Scheduled scraping: Automated daily/weekly downloads with cron jobs
- Database integration: Store data in PostgreSQL/MongoDB for persistence
- REST API: Additional endpoints for programmatic data access
- Export formats: JSON, Excel, and direct database exports
- User authentication: Personal dashboards and saved configurations
📝 Key Lessons Learned
- API Discovery: Sometimes the best solution is finding the right API endpoint rather than scraping
- Session Management: Proper cookie handling and headers are crucial for API access
- User Experience: Real-time feedback transforms the user experience
- Error Handling: Always plan for network failures and implement retry mechanisms (a sketch follows this list)
- Performance: Direct API calls can be orders of magnitude faster than browser automation
- Deployment Flexibility: Supporting both monolithic and microservice architectures increases adoption
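On that retry point: a small exponential-backoff wrapper goes a long way against transient NSE failures. A hedged sketch, not the repository's exact code:

```javascript
// Retry an async operation with exponential backoff
async function withRetry(fn, retries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err;
      // Wait 1s, 2s, 4s, ... before the next attempt
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}

// Usage: const csv = await withRetry(() => client.fetchData('RELIANCE', 2024));
```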
🛠️ Tech Stack Deep Dive
Backend Technologies:
- Node.js: Runtime environment for server-side JavaScript
- Express.js: Fast, unopinionated web framework
- fs-extra: Enhanced file system operations
- archiver: ZIP file creation for bulk downloads
- https: Native Node.js module for API calls
Frontend Technologies:
- React 18: Modern React with hooks
- Vite: Lightning-fast build tool and dev server
- Axios: Promise-based HTTP client
- Lucide React: Beautiful, customizable icons
- CSS3: Modern styling with flexbox and grid
DevOps & Deployment:
- Render: Cloud hosting platform
- GitHub Actions: CI/CD pipeline (potential future addition)
- Environment Variables: Configuration management
🎯 Getting Started
Ready to try it yourself?
```bash
# Clone the repository
git clone https://github.com/singhanuj620/nse_scrap.git
cd nse_scrap

# Install all dependencies
npm run install-all

# Start development server (runs both frontend and backend)
npm run dev

# Or run backend only
npm run server

# Build for production
npm run build
```
Quick Test:
```bash
# Test with sample stocks
npm test

# Download all configured stocks
npm start
```
🌟 Try It Live
Don't want to set up locally? Try the live demo:
👉 https://nse-scrap.onrender.com
Features available in the live demo:
- Add/remove stocks from the default list
- Set custom date ranges
- Real-time progress tracking
- Download complete datasets as ZIP files
- Mobile-responsive interface
💭 Conclusion
Building this NSE scraper taught me the value of choosing the right approach over the obvious one. By leveraging direct API calls instead of browser automation, we achieved a solution that's:
- 10x faster than traditional scrapers
- More reliable with fewer points of failure
- Easier to maintain without UI dependency
- Better user experience with real-time feedback
The full-stack implementation with session management and real-time progress tracking creates a professional tool that makes bulk data downloading actually enjoyable.
Whether you're a quantitative analyst, researcher, or developer working with Indian stock market data, this approach can save you hours of waiting time and provide more reliable data access.
🔗 Connect & Contribute
Found this helpful? Let's connect!
- 💼 LinkedIn: Anuj Singh
- 🐙 GitHub: @singhanuj620
- 📁 Project Repository: nse_scrap
Have suggestions, found a bug, or want to contribute? Feel free to:
- Open an issue on GitHub
- Submit a pull request
- Connect with me on LinkedIn