Introduction
In the digital age, web scraping has become an essential tool for extracting valuable information from websites. Subdomains of a primary domain often host significant resources and data. Moreover, misconfigured CORS (Cross-Origin Resource Sharing) settings can lead to data leakage or unauthorized access to resources. In this article, we introduce an advanced script using Node.js that identifies subdomains, extracts valuable data, and examines CORS misconfigurations.
Important Note: Artificial Intelligence has been used in preparing this paper.
What is CORS?
CORS is a browser-enforced security mechanism that lets a server specify which origins (scheme, host, and port) are permitted to read its responses. By default, the same-origin policy blocks such cross-origin reads. CORS relaxes that policy selectively, so a page served from an unauthorized origin cannot read a server's data.
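To make the idea concrete, here is a minimal sketch of how a server might decide which CORS headers to send. The allowlist entries and the function name corsHeadersFor are hypothetical and chosen only for illustration. A real application would usually rely on its framework's CORS middleware.

```javascript
// Hypothetical allowlist; these origins are placeholders, not from the article.
const ALLOWED_ORIGINS = new Set([
  'https://app.example.com',
  'https://admin.example.com',
]);

// Build the CORS response headers for the Origin header of a request.
// Returns an empty object when the origin is not allowlisted, in which
// case the browser blocks the cross-origin read.
function corsHeadersFor(origin, allowed = ALLOWED_ORIGINS) {
  if (!origin || !allowed.has(origin)) {
    return {};
  }
  return {
    'Access-Control-Allow-Origin': origin,
    // Tell caches that the response varies with the request's Origin header.
    'Vary': 'Origin',
  };
}

console.log(corsHeadersFor('https://app.example.com'));
console.log(corsHeadersFor('http://evil.com'));
```

The allowlisted origin receives a matching Access-Control-Allow-Origin header, while the unknown origin receives no CORS headers at all.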
What is CORS Misconfiguration?
A CORS misconfiguration occurs when the server improperly grants access to requests from unauthorized domains. This issue can lead to data leakage or abuse of server resources. An example of a misconfigured CORS setup:
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
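Problem:
- Access-Control-Allow-Origin: * allows any origin to read the server's responses.
- Access-Control-Allow-Credentials: true allows responses to requests that carry cookies or other credentials to be read cross-origin.
Browsers refuse to honor the wildcard together with credentials, so the combination that is exploitable in practice is a server that echoes the request's Origin header back in Access-Control-Allow-Origin while also sending Access-Control-Allow-Credentials: true.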
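The distinction between these header combinations can be sketched as a small classifier. The function name classifyCors and the category labels are assumptions made for this example, not part of any standard or library.

```javascript
// Classify a response's CORS headers relative to the Origin the probe sent.
// The category names are hypothetical labels chosen for this sketch.
function classifyCors(probeOrigin, allowOrigin, allowCredentials) {
  const withCredentials = allowCredentials === 'true';
  if (allowOrigin === probeOrigin && withCredentials) {
    // The server echoed an arbitrary origin and allows credentials.
    return 'reflected-origin-with-credentials';
  }
  if (allowOrigin === probeOrigin) {
    // The origin is echoed, but credentialed reads are not allowed.
    return 'reflected-origin';
  }
  if (allowOrigin === '*') {
    // Any origin may read responses to requests sent without credentials.
    return 'wildcard';
  }
  return 'none';
}

console.log(classifyCors('http://evil.com', 'http://evil.com', 'true'));
console.log(classifyCors('http://evil.com', '*', undefined));
console.log(classifyCors('http://evil.com', undefined, undefined));
```

Ranking findings this way helps separate a reflected origin with credentials, which usually deserves immediate attention, from a plain wildcard on public content, which is often intentional.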
Script Features
- Subdomain Discovery: Identifies subdomains using tools like Subfinder.
- Data Scraping: Sends requests to subdomains to retrieve HTML or JSON data.
- CORS Misconfiguration Detection: Identifies subdomains with misconfigured CORS settings.
- Reporting: Saves results as a JSON file for further analysis.
Node.js Script for Advanced Scraping and CORS Detection
const axios = require('axios');
const { execFile } = require('child_process');
const fs = require('fs');

// Discover subdomains using Subfinder
function findSubdomains(domain) {
  return new Promise((resolve, reject) => {
    console.log(`[+] Running Subfinder for ${domain}...`);
    // execFile passes the domain as an argument without invoking a shell,
    // so shell metacharacters in the input cannot inject commands.
    // The -silent flag limits stdout to the discovered subdomains.
    execFile('subfinder', ['-d', domain, '-silent'], (error, stdout) => {
      if (error) {
        return reject(new Error(`Error finding subdomains: ${error.message}`));
      }
      // Subfinder writes its banner and progress messages to stderr,
      // so stderr output alone is not treated as a failure.
      const subdomains = stdout
        .split('\n')
        .map((line) => line.trim())
        .filter(Boolean); // Filter out empty lines
      console.log(`[+] Found ${subdomains.length} subdomains.`);
      resolve(subdomains);
    });
  });
}

// Check CORS settings
async function checkCors(url) {
  console.log(`[+] Checking CORS for ${url}`);
  const testOrigin = 'http://evil.com';
  try {
    const response = await axios.get(url, {
      headers: { Origin: testOrigin },
      timeout: 5000,
      // Inspect CORS headers on any status code, not only 2xx responses
      validateStatus: () => true,
    });
    const corsHeader = response.headers['access-control-allow-origin'];
    const credentials = response.headers['access-control-allow-credentials'];
    // Flag a wildcard, or a reflected test origin combined with credentials
    if (corsHeader === '*' || (corsHeader === testOrigin && credentials === 'true')) {
      console.log(`[!] Vulnerable CORS configuration found on ${url}`);
      return { url, vulnerable: true, corsHeader, credentials };
    }
    console.log(`[-] No CORS misconfiguration on ${url}`);
    return { url, vulnerable: false };
  } catch (error) {
    console.log(`[!] Error checking CORS for ${url}: ${error.message}`);
    return { url, vulnerable: false, error: error.message };
  }
}

// Scrape data from a URL
async function scrapeData(url) {
  console.log(`[+] Scraping data from ${url}`);
  try {
    const response = await axios.get(url, { timeout: 5000 });
    console.log(`[+] Data retrieved from ${url}`);
    return { url, data: response.data };
  } catch (error) {
    console.log(`[!] Error scraping ${url}: ${error.message}`);
    return { url, data: null, error: error.message };
  }
}

// Main process
async function main(domain) {
  console.log(`[+] Starting Scraping Process for ${domain}`);

  // Discover subdomains
  let subdomains;
  try {
    subdomains = await findSubdomains(domain);
  } catch (error) {
    // The rejection already carries a descriptive message
    console.error(`[!] ${error.message || error}`);
    return;
  }

  // Scrape data and check CORS for each subdomain
  const results = [];
  for (const subdomain of subdomains) {
    const urls = [`http://${subdomain}`, `https://${subdomain}`];
    for (const url of urls) {
      const corsResult = await checkCors(url);
      if (corsResult.vulnerable) {
        results.push(corsResult);
      }
      const scrapeResult = await scrapeData(url);
      results.push(scrapeResult);
    }
  }

  // Save results
  console.log(`[+] Saving results to scraping_results.json`);
  fs.writeFileSync('scraping_results.json', JSON.stringify(results, null, 2));
}
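
// Script execution
const domain = process.argv[2];
if (!domain) {
  console.error('Usage: node scraper.js <domain>');
  process.exit(1);
}
// Report unexpected failures (for example, a failed file write) instead of
// leaving an unhandled promise rejection
main(domain).catch((error) => {
  console.error(`[!] Unexpected error: ${error.message}`);
  process.exit(1);
});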
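Because the domain argument comes straight from the command line and is handed to an external tool, it may be worth validating it first. The following is a minimal sketch, assuming plain ASCII hostnames. The helper name isValidDomain and the pattern are illustrative, and the pattern rejects some legitimate inputs such as punycode top-level domains.

```javascript
// Conservative hostname check (hypothetical helper, not part of the script above).
// Each label is 1-63 characters of letters, digits, or hyphens, does not start
// or end with a hyphen, and the name ends in an alphabetic top-level domain.
const DOMAIN_PATTERN =
  /^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/i;

function isValidDomain(value) {
  return typeof value === 'string' && DOMAIN_PATTERN.test(value);
}

console.log(isValidDomain('example.com'));
console.log(isValidDomain('example.com; rm -rf /'));
```

A check like this could run right after reading process.argv[2], so that malformed input is rejected before any subprocess is started.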
Sample Output
The script saves its results in a file called scraping_results.json in the following format:
[
  {
    "url": "http://subdomain1.example.com",
    "vulnerable": true,
    "corsHeader": "*",
    "credentials": "true"
  },
  {
    "url": "https://subdomain2.example.com",
    "data": "<html><head>...</head><body>...</body></html>"
  }
]
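Since CORS findings and scrape results share one array, a short post-processing step can separate them. This is a sketch only: the summarize function is hypothetical, and the sample entries below are illustrative values shaped like the output above, not real scan results.

```javascript
// Sample entries shaped like the script's output; the values are illustrative.
// With a real run, load them instead with:
//   JSON.parse(require('fs').readFileSync('scraping_results.json', 'utf8'))
const results = [
  { url: 'http://subdomain1.example.com', vulnerable: true, corsHeader: '*', credentials: 'true' },
  { url: 'https://subdomain2.example.com', data: '<html>...</html>' },
  { url: 'https://subdomain3.example.com', data: null, error: 'request failed' },
];

// Group result URLs into CORS findings, successful scrapes, and failures.
function summarize(entries) {
  return {
    corsFindings: entries.filter((e) => e.vulnerable === true).map((e) => e.url),
    scraped: entries.filter((e) => 'data' in e && e.data !== null).map((e) => e.url),
    failed: entries.filter((e) => typeof e.error === 'string').map((e) => e.url),
  };
}

console.log(summarize(results));
```

Grouping by URL makes it easier to see which hosts need a closer look and which requests should be retried.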
Conclusion
This script identifies subdomains, retrieves their content, and flags CORS settings that warrant review. Misconfigured CORS settings pose significant security risks by allowing unauthorized origins to read sensitive data. With this script, you can analyze subdomains, detect misconfigurations, and save the results as JSON for further analysis.