Introduction
Ever tried to download historical stock data from the National Stock Exchange (NSE) of India and found yourself waiting hours for Selenium-based scrapers to crawl through web pages? I built a solution that's 10x faster using direct API calls and wrapped it in a full-stack web application.
Live Demo: https://nse-scrap.onrender.com
GitHub Repository: https://github.com/singhanuj620/nse_scrap
What We're Building
A complete stock data scraping solution that includes:
- Backend API scraper using Node.js with direct NSE API calls
- React frontend with real-time progress tracking
- Session management for concurrent scraping jobs
- Downloadable ZIP exports of all scraped data
- Responsive UI with modern design
The Problem with Traditional Scraping
Most stock data scrapers rely on browser automation tools like Selenium or Puppeteer. While functional, they have significant drawbacks:
- Slow: Loading full web pages for each request
- Unreliable: Breaking when UI changes
- Resource-heavy: Requires browser instances
- Limited scalability: Can't handle many concurrent requests
The Solution: Direct API Approach
Instead of scraping web pages, I discovered that NSE provides direct API endpoints for historical data. Here's the core approach:
const https = require('https');

class NSEAPIClient {
  constructor() {
    this.baseUrl = 'https://www.nseindia.com';
    this.cookies = '';
    this.headers = {
      'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
      'Accept': 'application/json, text/plain, */*',
      'Referer': 'https://www.nseindia.com/report-detail/eq_security',
      // ... other headers for authentication
    };
  }

  // Promise wrapper around https.get that sends our headers and
  // captures any session cookies NSE sets along the way
  makeRequest(url) {
    return new Promise((resolve, reject) => {
      https.get(url, { headers: { ...this.headers, Cookie: this.cookies } }, (res) => {
        const setCookie = res.headers['set-cookie'];
        if (setCookie) this.cookies = setCookie.map((c) => c.split(';')[0]).join('; ');
        let body = '';
        res.on('data', (chunk) => (body += chunk));
        res.on('end', () => resolve(body));
      }).on('error', reject);
    });
  }

  async initializeSession() {
    // Visit the report page once so NSE issues the session cookies
    // the historical-data API expects
    await this.makeRequest(`${this.baseUrl}/report-detail/eq_security`);
  }

  async fetchData(symbol, year) {
    const fromDate = `01-01-${year}`;
    const toDate = `31-12-${year}`;
    const url = `${this.baseUrl}/api/historicalOR/generateSecurityWiseHistoricalData?from=${fromDate}&to=${toDate}&symbol=${symbol}&type=priceVolumeDeliverable&series=EQ&csv=true`;
    return this.makeRequest(url);
  }
}
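Using the client is a three-step dance: initialize the session, fetch, save. A minimal sketch (the data/ path is just illustrative):

const fs = require('fs-extra');

(async () => {
  const client = new NSEAPIClient();
  await client.initializeSession(); // cookies first, or the API rejects us
  const csv = await client.fetchData('RELIANCE', 2024);
  await fs.outputFile('data/RELIANCE_2024.csv', csv); // creates data/ if needed
})();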
Architecture Overview
Backend (Node.js + Express)
The backend handles three main responsibilities:
- API Scraping Engine: Direct calls to NSE APIs
- Session Management: Track multiple scraping jobs
- File Management: Organize and zip downloaded data
// Session tracking for concurrent scraping jobs
const activeSessions = new Map();
app.post('/api/start-scraping', async (req, res) => {
const { stocks, startYear, endYear, sessionId } = req.body;
// Initialize session tracking
const session = {
sessionId,
total: stocks.length * (endYear - startYear + 1),
completed: 0,
failed: 0,
currentStock: null,
currentYear: null,
status: 'starting',
results: [],
startTime: new Date()
};
activeSessions.set(sessionId, session);
// Start scraping in background
scrapingProcess(stocks, startYear, endYear, sessionId);
res.json({
message: 'Scraping started successfully',
sessionId,
total: session.total
});
});
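The scrapingProcess function runs outside the request/response cycle and mutates the session object as it goes, which is what the progress endpoint reads from. The repo's version has more going on, but a simplified sketch of the shape looks like this (the file layout and exact field updates are assumptions on my part):

// Simplified sketch of the background worker (details assumed)
const fs = require('fs-extra');

async function scrapingProcess(stocks, startYear, endYear, sessionId) {
  const session = activeSessions.get(sessionId);
  const client = new NSEAPIClient();
  await client.initializeSession();
  session.status = 'running';

  for (let i = 0; i < stocks.length; i++) {
    const symbol = stocks[i];
    for (let year = startYear; year <= endYear; year++) {
      session.currentStock = symbol;
      session.currentYear = year;
      try {
        const csv = await client.fetchData(symbol, year);
        await fs.outputFile(`data/${sessionId}/${symbol}_${year}.csv`, csv);
        session.completed++;
        session.results.push({ symbol, year, status: 'success' });
      } catch (err) {
        session.failed++;
        session.results.push({ symbol, year, status: 'failed', error: err.message });
      }
      // 1-second delay between individual requests (see rate limiting below)
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
    // 2-second delay between different stocks
    if (i < stocks.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, 2000));
    }
  }
  session.status = 'completed';
}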
Frontend (React + Vite)
The frontend provides a clean interface with:
- Stock management: Add/remove stocks to scrape
- Real-time progress: Live updates during scraping
- Result visualization: Success/failure status for each stock
- Download functionality: One-click ZIP download
import { useState, useEffect } from 'react';
import axios from 'axios';

const NSEScraper = () => {
  const [stocks, setStocks] = useState(['RELIANCE', 'TCS', 'HDFCBANK']);
  const [sessionId, setSessionId] = useState(null); // set when a job starts (not shown)
  const [progress, setProgress] = useState(0);
  const [isRunning, setIsRunning] = useState(false);
  const [currentStock, setCurrentStock] = useState(null);
  const [currentYear, setCurrentYear] = useState(null);
  const [completed, setCompleted] = useState(0);
  const [failed, setFailed] = useState(0);

  // Poll the backend once a second while a job is running
  useEffect(() => {
    if (sessionId && isRunning) {
      const interval = setInterval(async () => {
        const { data } = await axios.get(`/api/progress/${sessionId}`);
        setProgress(data.progress);
        setCurrentStock(data.currentStock);
        setCurrentYear(data.currentYear);
        setCompleted(data.completed);
        setFailed(data.failed);
        // Stop polling once the backend reports a terminal state
        if (data.status === 'completed' || data.status === 'error') {
          setIsRunning(false);
        }
      }, 1000);
      return () => clearInterval(interval);
    }
  }, [sessionId, isRunning]);

  return (
    <div className="nse-scraper">
      {/* Beautiful UI components */}
    </div>
  );
};
Key Technical Features
1. Session Management
Each scraping job gets a unique session ID, allowing multiple users to run concurrent scraping operations without conflicts.
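The id itself just needs to be unique per job; the client mints one before calling the start endpoint. A sketch of one reasonable approach (crypto.randomUUID needs a modern browser, and the repo may generate ids differently):

// Frontend: mint a unique id per job and register it with the backend
// (assumed approach -- any collision-resistant id works)
const startScraping = async (stocks, startYear, endYear) => {
  const sessionId = crypto.randomUUID();
  const { data } = await axios.post('/api/start-scraping', {
    stocks, startYear, endYear, sessionId,
  });
  return { sessionId, total: data.total };
};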
// Progress tracking endpoint
app.get('/api/progress/:sessionId', (req, res) => {
const sessionId = req.params.sessionId;
const session = activeSessions.get(sessionId);
if (!session) {
return res.status(404).json({ error: 'Session not found' });
}
res.json({
sessionId,
total: session.total,
completed: session.completed,
failed: session.failed,
currentStock: session.currentStock,
currentYear: session.currentYear,
status: session.status,
progress: session.total > 0 ? (session.completed / session.total) * 100 : 0,
results: session.results
});
});
2. Real-Time Progress Updates
The app provides live feedback showing:
- Current stock being processed
- Current year being scraped
- Overall progress percentage
- Success/failure counts
- Detailed results for each operation
3. Intelligent Rate Limiting
To avoid overwhelming NSE servers:
- 1-second delay between individual requests
- 2-second delay between different stocks
- Proper session initialization with cookies
// Add delay between requests
await new Promise(resolve => setTimeout(resolve, 1000));
// Longer delay between stocks
if (stockIndex < stocks.length - 1) {
await new Promise(resolve => setTimeout(resolve, 2000));
}
4. Automated File Organization
Downloaded files are automatically organized and can be downloaded as a ZIP:
const archiver = require('archiver');
const path = require('path');

app.get('/api/download/:sessionId', async (req, res) => {
  const { sessionId } = req.params;
  // Assumed layout: one data folder per session
  const dataDir = path.join(__dirname, 'data', sessionId);

  // Stream a ZIP of the whole session directory to the client
  const archive = archiver('zip', { zlib: { level: 9 } });
  res.attachment(`nse_data_${sessionId}.zip`);
  res.setHeader('Content-Type', 'application/zip');
  archive.pipe(res);
  archive.directory(dataDir, false);
  await archive.finalize();
});
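On the frontend, triggering that download doesn't need anything fancier than pointing the browser at the endpoint (one option among several; the repo may wire it to a styled button that does the same thing):

// Let the browser handle the streamed ZIP attachment
const downloadZip = (sessionId) => {
  window.location.href = `/api/download/${sessionId}`;
};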
Performance Comparison

| Method | Time for 10 stocks (3 years) | Reliability | Resource Usage |
|---|---|---|---|
| Selenium | ~45 minutes | Medium | High |
| Direct API | ~4.5 minutes | High | Low |
UI/UX Highlights
The frontend features a modern, responsive design with:
- Clean stock management: Easy add/remove interface with validation
- Visual progress indicators: Real-time progress bars and status icons
- Live updates: Current stock and year display without page refresh
- Responsive design: Works seamlessly on desktop and mobile
- Modern styling: Clean, professional interface using Lucide React icons
Deployment & Architecture
Flexible Deployment Options
The project supports multiple deployment strategies:
Single Deployment (Full-Stack)
const cors = require('cors');

// Serve the built frontend only if it isn't deployed separately
if (!process.env.FRONTEND_URL) {
  app.use(express.static('frontend/dist'));
}

// Configure CORS based on deployment type
const corsOptions = {
  origin: process.env.FRONTEND_URL
    ? [process.env.FRONTEND_URL, 'http://localhost:5173']
    : true,
  credentials: true,
};
app.use(cors(corsOptions));
Separate Deployments
- Backend: Railway, Render, or any Node.js hosting
- Frontend: Vercel, Netlify, or any static hosting
Live Demo Deployment
The live demo is hosted on Render with:
- Automatic builds from GitHub
- Environment variable configuration
- Zero-downtime deployments
Data Format & Quality
Each CSV file contains comprehensive trading data:
- Price Data: Open, High, Low, Close, Previous Close, Last Traded Price, VWAP
- Volume Data: Total Traded Quantity, Total Traded Value, Number of Trades
- Delivery Data: Delivery Quantity, Delivery Percentage
- Metadata: Symbol, Series, Date, Timestamps
Sample data structure:
Date,Symbol,Series,Open,High,Low,Close,Last,Prevclose,TOTTRDQTY,TOTTRDVAL,TIMESTAMP,TOTALTRADES,ISIN,DELIVERYQTY,DELIVERYPER
01-Jan-2024,RELIANCE,EQ,2915.00,2932.00,2901.05,2920.15,2920.15,2918.75,5234567,15234567890,01-JAN-2024,89234,INE002A01018,2617283,50.01
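If you'd rather work with the rows as objects than raw CSV, a few lines of parsing are enough. A minimal sketch that assumes no quoted or comma-containing fields (true for this data):

// Naive CSV -> array-of-objects parser (no quoted fields expected)
const parseCsv = (csv) => {
  const [headerLine, ...rows] = csv.trim().split('\n');
  const headers = headerLine.split(',');
  return rows.map((row) => {
    const values = row.split(',');
    return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
  });
};

// parseCsv(csvText)[0].Close === '2920.15'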
Future Enhancements
Planned improvements include:
- Data visualization: Built-in charts and analytics dashboard
- Scheduled scraping: Automated daily/weekly downloads with cron jobs
- Database integration: Store data in PostgreSQL/MongoDB for persistence
- REST API: Additional endpoints for programmatic data access
- Export formats: JSON, Excel, and direct database exports
- User authentication: Personal dashboards and saved configurations
Key Lessons Learned
- API Discovery: Sometimes the best solution is finding the right API endpoint rather than scraping
- Session Management: Proper cookie handling and headers are crucial for API access
- User Experience: Real-time feedback transforms the user experience
- Error Handling: Always plan for network failures and implement retry mechanisms (see the sketch after this list)
- Performance: Direct API calls can be orders of magnitude faster than browser automation
- Deployment Flexibility: Supporting both monolithic and microservice architectures increases adoption
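On that error-handling point, the pattern I'd reach for is a small retry wrapper with exponential backoff. This one is illustrative rather than lifted from the repo:

// Generic retry-with-backoff wrapper (illustrative, not from the repo)
async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === attempts) throw err;
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage: const csv = await withRetry(() => client.fetchData('TCS', 2023));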
Tech Stack Deep Dive
Backend Technologies:
- Node.js: Runtime environment for server-side JavaScript
- Express.js: Fast, unopinionated web framework
- fs-extra: Enhanced file system operations
- archiver: ZIP file creation for bulk downloads
- https: Native Node.js module for API calls
Frontend Technologies:
- React 18: Modern React with hooks
- Vite: Lightning-fast build tool and dev server
- Axios: Promise-based HTTP client
- Lucide React: Beautiful, customizable icons
- CSS3: Modern styling with flexbox and grid
DevOps & Deployment:
- Render: Cloud hosting platform
- GitHub Actions: CI/CD pipeline (potential future addition)
- Environment Variables: Configuration management
Getting Started
Ready to try it yourself?
# Clone the repository
git clone https://github.com/singhanuj620/nse_scrap.git
cd nse_scrap
# Install all dependencies
npm run install-all
# Start development server (runs both frontend and backend)
npm run dev
# Or run backend only
npm run server
# Build for production
npm run build
Quick Test:
# Test with sample stocks
npm test
# Download all configured stocks
npm start
Try It Live
Don't want to set up locally? Try the live demo:
https://nse-scrap.onrender.com
Features available in the live demo:
- Add/remove stocks from the default list
- Set custom date ranges
- Real-time progress tracking
- Download complete datasets as ZIP files
- Mobile-responsive interface
Conclusion
Building this NSE scraper taught me the value of choosing the right approach over the obvious one. By leveraging direct API calls instead of browser automation, we achieved a solution that's:
- 10x faster than traditional scrapers
- More reliable with fewer points of failure
- Easier to maintain without UI dependency
- Better user experience with real-time feedback
The full-stack implementation with session management and real-time progress tracking creates a professional tool that makes bulk data downloading actually enjoyable.
Whether you're a quantitative analyst, researcher, or developer working with Indian stock market data, this approach can save you hours of waiting time and provide more reliable data access.
Connect & Contribute
Found this helpful? Let's connect!
- LinkedIn: Anuj Singh
- GitHub: @singhanuj620
- Project Repository: nse_scrap
Have suggestions, found a bug, or want to contribute? Feel free to:
- Open an issue on GitHub
- Submit a pull request
- Connect with me on LinkedIn