Cybersecurity reconnaissance is the first and most critical step in understanding a target’s digital footprint. As a beginner, knowing where to look, how to look, and what tools to use can dramatically increase your effectiveness in network security, penetration testing, and OSINT investigations.
Python has become the go-to language for security specialists due to its simplicity, versatility, and powerful libraries. This guide breaks down the foundations of recon, DNS enumeration, and network scanning using Python, so beginners can grasp concepts practically and immediately start experimenting.
This is not just theory. By following this guide, you’ll get hands-on examples of Python scripts for discovering subdomains, probing hosts, and automating routine recon tasks.
What You Will Achieve
By the end of this guide, you will be able to:
- Understand the difference between passive and active reconnaissance.
- Define the scope and target effectively for any recon project.
- Use Python to query DNS records, subdomains, and certificate transparency logs.
- Perform asynchronous network scans for host and service discovery.
- Organise recon data efficiently for analysis and reporting.
- Apply OPSEC and rate-limiting to stay stealthy during recon.
This knowledge forms a solid foundation for penetration testing, bug bounty hunting, and OSINT investigations.
Tools to Use
Before starting, ensure you have a Python 3.12 environment ready. The following libraries are recommended for recon tasks:
| Library | Purpose |
|---|---|
| asyncio / trio | Handle thousands of tasks concurrently without threads |
| httpx | Async HTTP/HTTPS requests with HTTP/2, proxy, and SOCKS support |
| aiodns | Asynchronous DNS resolution with DNSSEC support |
| ipwhois | ASN and prefix lookups |
| rich | Pretty terminal output with progress bars |
| pandas | Data organization, CSV/HTML export |
Setup:
python3 -m venv recon
source recon/bin/activate
pip install "httpx[http2]" aiodns ipwhois rich pandas requests
Recon Fundamentals: Active vs Passive
Recon is classified into passive and active:
Passive Recon – You do not touch the target. Sources include WHOIS, crt.sh, Shodan, GitHub, and leaked databases. It is stealthy and leaves no logs on the target.
Active Recon – Direct probing via DNS queries, port scanning, banner grabbing, and web crawling. Active recon is powerful but generates logs and may trigger firewalls.
Rule: Always start with passive recon. It’s safer, cost-free, and helps narrow down what to probe actively.
Canonical Workflow:
- Scope Definition – define IP ranges, domains, and employee aliases.
- Passive Recon – gather public artefacts.
- Correlation & Pivot – deduplicate, enrich, generate leads.
- Active Recon – probe live hosts, services, and versions.
- Reporting – export structured JSON or CSV for analysis.
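The correlation and reporting steps above can be sketched with the standard library alone. This is a minimal illustration rather than a fixed schema: the field names (`host`, `source`), the helper names, and the sample findings are all assumptions made for this example.

```python
import csv
import io
import json


def deduplicate(findings):
    """Merge findings that share a host (case-insensitive), keeping every source."""
    merged = {}
    for item in findings:
        # Normalise case and drop a trailing dot so duplicates collapse
        host = item["host"].strip().lower().rstrip(".")
        merged.setdefault(host, set()).add(item["source"])
    return [
        {"host": host, "sources": sorted(sources)}
        for host, sources in sorted(merged.items())
    ]


def to_json(records):
    """Serialise records as pretty-printed JSON."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialise records as CSV text; multiple sources are joined with ';'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["host", "sources"])
    for record in records:
        writer.writerow([record["host"], ";".join(record["sources"])])
    return buffer.getvalue()


# Hypothetical findings gathered from two passive sources
findings = [
    {"host": "API.example.com", "source": "crt.sh"},
    {"host": "api.example.com.", "source": "dorks"},
    {"host": "dev.example.com", "source": "crt.sh"},
]
records = deduplicate(findings)
print(to_json(records))
print(to_csv(records))
```

Keeping the source of each finding makes it easier to judge later how much to trust a lead before probing it actively.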
DNS Basics & Subdomain Discovery
DNS (Domain Name System) is the foundation of how the Internet identifies and routes traffic.
In reconnaissance, DNS provides some of the most valuable early insights into an organisation’s online structure. By understanding the different DNS record types and how they interact, analysts can map infrastructure, uncover hidden assets, and expand the attack surface systematically.
At its core, DNS is responsible for translating human‑readable domain names into machine‑readable IP addresses. This translation happens through different “record types,” each serving a specific function within a domain’s configuration. The most relevant records for recon are outlined below.
- A / AAAA → Host to IP mapping
- CNAME → Aliases and CDNs
- NS → Authoritative servers hinting at internal structure
- MX → Email services
- TXT → SPF, DMARC, and validation tokens
- SRV → Services like LDAP or SIP
A & AAAA Records — Direct Host Mapping
A and AAAA records map a domain or subdomain to an IP address: A records hold IPv4 addresses and AAAA records hold IPv6 addresses.
These records represent the most fundamental part of DNS. When a domain resolves to an IP, it provides the first clue about where the service is hosted (e.g., cloud provider, on‑prem infrastructure, shared hosting).
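To make the A versus AAAA distinction concrete, here is a small sketch that uses only the standard library. The helper names `split_by_family` and `resolve` are made up for this example, and `resolve` goes through the operating system's resolver, so it returns whatever your local DNS configuration returns.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Sort IP strings into IPv4 (A record) and IPv6 (AAAA record) buckets."""
    result = {"A": [], "AAAA": []}
    for text in addresses:
        ip = ipaddress.ip_address(text)
        key = "A" if ip.version == 4 else "AAAA"
        if text not in result[key]:  # skip duplicates
            result[key].append(text)
    return result


def resolve(hostname):
    """Resolve a hostname through the system resolver; return [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return [info[4][0] for info in infos]


# Offline demonstration with documentation-range addresses
print(split_by_family(["192.0.2.10", "2001:db8::10", "192.0.2.10"]))
```

For a live lookup, which needs network access, you could call `split_by_family(resolve("example.com"))`.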
CNAME Records — Indirect Mapping and Service Outsourcing
A CNAME (Canonical Name) record does not point to an IP. Instead, it points one domain to another domain.
Example:
blog.example.com → cname → example-blog.hosting.net
CNAME chains often reveal third‑party services such as:
- CDN providers (Cloudflare, Akamai)
- Email platforms
- SaaS dashboards
- Cloud hosting environments (AWS, GCP, Azure)
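One way to turn CNAME targets into leads is to match them against known provider suffixes. The sketch below assumes you have already collected the CNAME targets. The suffix table is illustrative and far from complete, and `classify_cname` is a name invented for this example.

```python
# Illustrative suffix table; extend it with the providers you care about
CNAME_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront (CDN)",
    "amazonaws.com": "AWS",
    "azurewebsites.net": "Azure App Service",
    "cdn.cloudflare.net": "Cloudflare (CDN)",
    "edgekey.net": "Akamai (CDN)",
    "github.io": "GitHub Pages",
}


def classify_cname(target):
    """Return the provider whose suffix matches the CNAME target, or None."""
    host = target.lower().rstrip(".")  # DNS answers often end with a dot
    for suffix, provider in CNAME_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


# Hypothetical CNAME targets
for target in ["d111111abcdef8.cloudfront.net.", "example-blog.hosting.net"]:
    print(target, "->", classify_cname(target))
```

Matching on a leading dot plus the suffix avoids false hits such as `notcloudfront.net`.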
NS Records — Authority and Infrastructure Insight
NS (Name Server) records define which servers are authoritative for a domain. They tell the internet where to go to learn everything else about the domain.
Analysing NS records helps you understand:
- The hosting provider
- Whether DNS is self‑managed or outsourced
- Redundancy and failover configuration
- Possible subdomains through zone misconfiguration
Note - When organisations self-host their name servers, it often indicates a large internal infrastructure.
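A rough first check for self-managed versus outsourced DNS is to see whether the name server hostnames sit inside the target's own domain. The sketch below assumes you already have the NS hostnames. `ns_management` is a hypothetical helper, and the result is only a hint, since some organisations give provider-run servers names inside their own domain.

```python
def ns_management(domain, nameservers):
    """Label each name server as 'self-managed' (in-zone name) or 'outsourced'."""
    domain = domain.lower().rstrip(".")
    report = {}
    for ns in nameservers:
        host = ns.lower().rstrip(".")
        in_zone = host == domain or host.endswith("." + domain)
        report[host] = "self-managed" if in_zone else "outsourced"
    return report


# Hypothetical NS answers for example.com
print(ns_management("example.com", ["ns1.example.com.", "ns-cloud.example.net."]))
```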
MX Records — Email Routing and Third‑Party Dependencies
MX (Mail Exchange) records indicate the mail servers responsible for receiving emails for a domain.
These records reveal:
- Whether the domain uses Google Workspace, Microsoft 365, or a custom mail server
- Whether legacy or insecure mail systems are still in use
- Additional subdomains possibly tied to mail infrastructure
Because email is a highly targeted attack vector, MX records are essential in understanding an organization’s communication layer.
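As a sketch of how MX data can be interpreted, the function below takes MX answers as `(priority, host)` pairs and guesses the mail provider from the preferred host. The suffix table is a small illustrative sample, and `mail_provider` is a name chosen for this example.

```python
# Illustrative sample of well-known MX suffixes
MX_SUFFIXES = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "mail.protection.outlook.com": "Microsoft 365",
}


def mail_provider(mx_records):
    """Take (priority, host) pairs and name the provider of the preferred host."""
    if not mx_records:
        return "no MX records"
    # Sending servers try the lowest priority value first
    _, host = min(mx_records)
    host = host.lower().rstrip(".")
    for suffix, provider in MX_SUFFIXES.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return f"custom or unrecognised ({host})"


# Hypothetical MX answers
print(mail_provider([(10, "alt1.aspmx.l.google.com."), (1, "aspmx.l.google.com.")]))
print(mail_provider([(0, "example-com.mail.protection.outlook.com.")]))
print(mail_provider([(10, "mail.example.com.")]))
```

An unrecognised host is often the more interesting result, because it may point at a self-hosted mail server worth noting for later.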
TXT Records — Security Policies and Verification Artefacts
TXT records store arbitrary text and are commonly used for:
- Email authentication policies (SPF, DKIM, DMARC)
- Cloud and SaaS verification tokens
- Public security disclosures
- Domain metadata
Note - DMARC, DKIM, and SPF are three email authentication methods. Together, they help prevent spammers, phishers, and other unauthorised parties from sending emails on behalf of a domain they do not own.
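SPF records are a good example of how much a single TXT value can reveal, because each `include:` names a third party allowed to send mail for the domain. The parser below is a simplified sketch that handles only the `include`, `ip4`, `ip6`, and `all` mechanisms. `parse_spf` and the sample record are assumptions for illustration.

```python
def parse_spf(txt_record):
    """Extract include: domains, ip4:/ip6: ranges, and the 'all' qualifier.

    Returns None when the TXT value is not an SPF record.
    """
    terms = txt_record.strip().strip('"').split()
    if not terms or terms[0].lower() != "v=spf1":
        return None
    result = {"include": [], "ip": [], "all": None}
    for term in terms[1:]:
        # A mechanism may start with a qualifier; '+' (pass) is the default
        qualifier = term[0] if term[0] in "+-~?" else "+"
        mechanism = term.lstrip("+-~?")
        lowered = mechanism.lower()
        if lowered.startswith("include:"):
            result["include"].append(mechanism[len("include:"):])
        elif lowered.startswith(("ip4:", "ip6:")):
            result["ip"].append(mechanism[4:])
        elif lowered == "all":
            result["all"] = qualifier
    return result


# Hypothetical SPF record
print(parse_spf("v=spf1 include:_spf.google.com ip4:192.0.2.0/24 ~all"))
```

The `ip4:` and `ip6:` ranges are worth recording as well, since they often belong to infrastructure the organisation runs itself.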
SRV Records — Service Discovery
SRV (Service) records indicate the location (hostname and port) of specific services such as:
- SIP
- LDAP
- Kerberos
- VoIP
- Microsoft services
- Game servers
SRV records are particularly useful because they often identify:
- Internal authentication services
- Directory services
- Infrastructure dependencies not visible on the public web
From a recon standpoint, SRV records provide directional clues about internal architecture and commonly overlooked services.
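SRV lookups use names of the form `_service._proto.domain`, and each answer carries a priority, weight, port, and target. The sketch below builds candidate query names and parses one answer given as text. The service list is a small sample, and both helper names are invented for this example.

```python
# A small sample of service/protocol pairs that are commonly published
COMMON_SRV = [
    ("ldap", "tcp"),
    ("kerberos", "tcp"),
    ("kerberos", "udp"),
    ("sip", "tcp"),
    ("sip", "udp"),
    ("sips", "tcp"),
    ("xmpp-server", "tcp"),
]


def srv_names(domain):
    """Build the SRV query names to try for a domain."""
    return [f"_{service}._{proto}.{domain}" for service, proto in COMMON_SRV]


def parse_srv(rdata):
    """Split 'priority weight port target' into a dictionary."""
    priority, weight, port, target = rdata.split()
    return {
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),
        "target": target.rstrip("."),
    }


for name in srv_names("example.com"):
    print(name)
# Hypothetical answer for _ldap._tcp.example.com
print(parse_srv("0 100 389 dc1.example.com."))
```

The generated names can be passed to whichever resolver library you use for the actual SRV queries.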
Subdomain Discovery — Expanding the Attack Surface
Subdomains represent functional units within a domain and often expose additional, less-secure services. Each subdomain may host a unique application, API, admin panel, or onboarding system.
Common examples include:
- api.example.com
- vpn.example.com
- dev.example.com
- staging.example.com
- admin.example.com
Subdomain discovery typically follows two approaches: passive enumeration and active enumeration. This guide covers the passive approach.
Passive Enumeration
Passive enumeration focuses on collecting information from external sources. These sources already monitor the internet, archive changes, or index public data. You simply query them.
There are three major categories we focus on here:
- Certificate Transparency (CT) logs
- Historical DNS
- Search-engine dorks
Each one reveals different layers of how a domain has evolved.
- Certificate Transparency Logs (crt.sh and bufferover)
Every HTTPS website needs an SSL/TLS certificate issued by a certificate authority (CA). Modern browsers require publicly trusted certificates to be recorded in Certificate Transparency logs. This means whenever a certificate is issued for one of a company's hostnames, it's logged, whether that subdomain was meant to stay private or not.
Note: If a company creates beta.example.com and obtains a certificate for it, CT logs will expose it. Even if the subdomain is never linked on the website, a security researcher can find it.
Two popular sources are crt.sh and bufferover.
Example
api.example.com
dev.example.com
staging-api.example.com
internal-vpn.example.com
A beginner-friendly Python example to fetch CT logs:
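import requests

domain = "example.com"
# %25 is a URL-encoded "%" wildcard, so the query matches every name under the domain
url = f"https://crt.sh/?q=%25.{domain}&output=json"

try:
    data = requests.get(url, timeout=10).json()
    # name_value can hold several newline-separated names per certificate
    subdomains = {name for entry in data for name in entry["name_value"].splitlines()}
    for sub in sorted(subdomains):
        print(sub)
except ValueError:
    print("CT logs returned non-JSON (likely rate-limited).")
except requests.RequestException as exc:
    print(f"Request to crt.sh failed: {exc}")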
This script searches for all certificates issued for example.com and extracts the subdomains. If crt.sh responds with HTML instead of JSON (which happens often), the except branch prints a short message instead of crashing, so the logic stays simple for a beginner.
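Raw CT results usually need cleaning before they are useful: one certificate can list several names, wildcard entries such as `*.dev.example.com` appear often, and unrelated names can slip in. The sketch below shows one way to normalise them. `clean_ct_names` and the sample entries are assumptions for illustration, and the function works on data shaped like the `name_value` field used in the script above.

```python
def clean_ct_names(entries, domain):
    """Return sorted, de-duplicated hostnames under `domain` from CT entries."""
    domain = domain.lower()
    found = set()
    for entry in entries:
        # One certificate may list several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # keep the parent of a wildcard entry
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


# Hypothetical entries shaped like the JSON used above
sample = [
    {"name_value": "api.example.com\n*.dev.example.com"},
    {"name_value": "API.example.com"},
    {"name_value": "example.org"},
]
print(clean_ct_names(sample, "example.com"))
```

Filtering on the domain suffix keeps out-of-scope names from leaking into later steps.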
- Historical DNS Records (DNSDB, SecurityTrails)
DNS changes over time. Companies delete services, migrate infrastructure, or abandon old endpoints. But historical DNS databases keep copies of previous DNS answers, making them extremely useful for security analysis.
These historical views help you answer questions like:
- What subdomains existed two years ago?
- What IP used to host the corporate website?
- Did they previously expose an admin panel?
- Did the company only recently move behind Cloudflare?
Two major providers are:
- DNSDB – one of the oldest DNS history datasets (discontinued)
- SecurityTrails – commercial but very rich historical DNS API
For example, suppose vpn.example.com used to resolve to a public IP that is now offline. Even though it’s gone today, historical DNS reveals it used to exist, which means attackers may still probe it.
We can use Python to query the SecurityTrails API in the following format:
Note - Get an API key before continuing.
import requests

api_key = "QAJFqjeHA1wlkfFgO4rYeoHrtR....."  # replace with your own API key
domain = "learnhubafrica.org"
headers = {"APIKEY": api_key}

# Request the historical A records for the domain
res = requests.get(
    f"https://api.securitytrails.com/v1/history/{domain}/dns/a",
    headers=headers,
    timeout=10,
)
print(res.text)
This returns historical A records, showing old servers previously linked to the domain.
- Search Engine Dorks (site:*.example.com -www)
Search engines crawl everything they can reach, including forgotten subdomains. Dorks, which are queries built from advanced search operators, let you extract those results:
site:*.example.com -www
This searches all subdomains except the main website.
In this query:
- site: tells Google to restrict results to a domain
- *. means “any subdomain”
- -www excludes the www host, which is usually the main website
Beginners often underestimate how powerful this is. Search engines accidentally index:
- Internal dashboards
- Debug pages
- Test environments
- Misconfigured S3 buckets
A simple Python snippet to automate the creation of dorks:
domain = "example.com"
dork = f"site:*.{domain} -www"
print("Use this Google dork:", dork)
Search engine dorks don’t require coding, but generating them consistently helps when working with multiple domains.
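For several domains at once, the same idea can be wrapped in two small helpers. This is a sketch: `build_dork` and `search_url` are hypothetical names, and the URL simply places the URL-encoded dork in Google's `q` search parameter so it can be opened in a browser.

```python
from urllib.parse import quote_plus


def build_dork(domain, exclude=("www",)):
    """Build a subdomain dork that excludes the given labels."""
    parts = [f"site:*.{domain}"] + [f"-{label}" for label in exclude]
    return " ".join(parts)


def search_url(dork):
    """Return a Google search URL for the dork."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


for domain in ["example.com", "example.org"]:
    dork = build_dork(domain)
    print(dork, "->", search_url(dork))
```

Passing more labels, for example `exclude=("www", "blog")`, narrows the results as you rule out hosts you already know about.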
In the next article, we will dive into active recon tools and see how we can use Python to automate them.
FAQ
1: Do I need prior Python knowledge?
A: Basic Python (loops, functions, async) is enough. Advanced topics like asyncio are explained step-by-step.
2: Can I run these scripts on Windows?
A: Yes, but Linux is preferred for compatibility with networking tools.
3: Are passive recon scripts safe?
A: Passive scripts query public sources and are generally safe. Active recon carries a higher risk.
4: How do I avoid false positives in subdomain discovery?
A: Always check for wildcard DNS entries before trusting results.
5: How do I store and analyse results?
A: Use pandas DataFrames and export to CSV or HTML for easy reporting.
Conclusion
Reconnaissance with Python is a continuous, iterative process. Learning how to find, enumerate, and automate the collection of data creates more opportunities to understand your target's architecture in depth.


