Scofield Idehen

Posted on • Originally published at blog.learnhubafrica.org

Best Beginner’s Guide For Cybersecurity Recon with Python

Cybersecurity reconnaissance is the first and most critical step in understanding a target’s digital footprint. As a beginner, knowing where to look, how to look, and what tools to use can dramatically increase your effectiveness in network security, penetration testing, and OSINT investigations.

Python has become the go-to language for security specialists due to its simplicity, versatility, and powerful libraries. This guide breaks down the foundations of recon, DNS enumeration, and network scanning using Python, so beginners can grasp concepts practically and immediately start experimenting.

This is not just theory. By following this guide, you’ll get hands-on examples of Python scripts for discovering subdomains, probing hosts, and automating routine recon tasks.

What You Will Achieve

By the end of this guide, you will be able to:

  • Understand the difference between passive and active reconnaissance.
  • Define the scope and target effectively for any recon project.
  • Use Python to query DNS records, subdomains, and certificate transparency logs.
  • Perform asynchronous network scans for host and service discovery.
  • Organise recon data efficiently for analysis and reporting.
  • Apply OPSEC and rate-limiting to stay stealthy during recon.

This knowledge forms a solid foundation for penetration testing, bug bounty hunting, and OSINT investigations.

Tools to Use

Before starting, ensure you have a Python 3.12 environment ready. The following libraries are recommended for recon tasks:

Library          Purpose
asyncio / trio   Handle thousands of tasks concurrently without threads
httpx            Async HTTP/HTTPS requests with HTTP/2, proxy, and SOCKS support
aiodns           Asynchronous DNS resolution
ipwhois          ASN and prefix lookups
rich             Pretty terminal output with progress bars
pandas           Data organization, CSV/HTML export

Setup:

python3 -m venv recon
source recon/bin/activate
pip install "httpx[http2]" aiodns ipwhois rich pandas requests

Recon Fundamentals: Active vs Passive

Recon is classified into passive and active:

  • Passive Recon – You do not touch the target. Sources include WHOIS, crt.sh, Shodan, GitHub, and leaked databases. It is stealthy and leaves no logs on the target.

  • Active Recon – Direct probing via DNS queries, port scanning, banner grabbing, and web crawling. Active recon is powerful but generates logs and may trigger firewalls.

Rule: Always start with passive recon. It’s safer, cost-free, and helps narrow down what to probe actively.

Canonical Workflow:

  1. Scope Definition – define IP ranges, domains, and employee aliases.
  2. Passive Recon – gather public artefacts.
  3. Correlation & Pivot – deduplicate, enrich, generate leads.
  4. Active Recon – probe live hosts, services, and versions.
  5. Reporting – export structured JSON or CSV for analysis.
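
Step 1 of the workflow can be made concrete with a small scope check that every later step consults before touching a host. The sketch below assumes a scope made of domain suffixes and CIDR ranges. The `Scope` class, its field names, and the sample values are illustrative and not part of any library.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical container for the targets you are authorised to test."""
    domains: list[str] = field(default_factory=list)   # e.g. "example.com"
    networks: list[str] = field(default_factory=list)  # CIDR ranges

    def contains(self, target: str) -> bool:
        """Return True if an IP address or hostname falls inside the scope."""
        try:
            ip = ipaddress.ip_address(target)
        except ValueError:
            # Not an IP address, so treat it as a hostname
            host = target.lower().rstrip(".")
            return any(host == d or host.endswith("." + d) for d in self.domains)
        return any(ip in ipaddress.ip_network(n) for n in self.networks)


scope = Scope(domains=["example.com"], networks=["192.0.2.0/24"])
for candidate in ["api.example.com", "notexample.com", "192.0.2.45", "198.51.100.7"]:
    print(candidate, "->", "in scope" if scope.contains(candidate) else "out of scope")
```

Matching on "." plus the domain, rather than on a bare suffix, keeps look-alike names such as notexample.com out of scope.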

DNS Basics & Subdomain Discovery

DNS (Domain Name System) is the foundation of how the Internet identifies and routes traffic.

In reconnaissance, DNS provides some of the most valuable early insights into an organisation’s online structure. By understanding the different DNS record types and how they interact, analysts can map infrastructure, uncover hidden assets, and expand the attack surface systematically.

At its core, DNS is responsible for translating human‑readable domain names into machine‑readable IP addresses. This translation happens through different “record types,” each serving a specific function within a domain’s configuration. The most relevant records for recon are outlined below.

  • A / AAAA → Host to IP mapping
  • CNAME → Aliases and CDNs
  • NS → Authoritative servers hinting at internal structure
  • MX → Email services
  • TXT → SPF, DMARC, and validation tokens
  • SRV → Services like LDAP or SIP

A & AAAA Records — Direct Host Mapping

A and AAAA records map a domain or subdomain to an IP address:

  • A Record: points to an IPv4 address
  • AAAA Record: points to an IPv6 address

These records represent the most fundamental part of DNS. When a domain resolves to an IP, it provides the first clue about where the service is hosted (e.g., cloud provider, on‑prem infrastructure, shared hosting).
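
As a rough illustration of reading A and AAAA answers, the sketch below resolves a name with the standard library and labels each address. `resolve` and `describe` are hypothetical helper names. `resolve` performs a live lookup through the system resolver, so it is defined here but not called.

```python
import ipaddress
import socket


def resolve(host: str) -> set[str]:
    """Return the IPv4 and IPv6 addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
    return {info[4][0] for info in infos}


def describe(address: str) -> str:
    """Label an address with its record type and whether it is publicly routable."""
    ip = ipaddress.ip_address(address)
    record = "A" if ip.version == 4 else "AAAA"
    reach = "public" if ip.is_global else "non-public"
    return f"{record} {address} ({reach})"


# Literal addresses keep the demo offline; swap in resolve("example.com") for real use
for addr in ["8.8.8.8", "10.0.0.5", "2001:4860:4860::8888"]:
    print(describe(addr))
```

A non-public address in a public DNS answer is worth noting, since it can hint at internal addressing that leaked into the external zone.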

CNAME Records — Indirect Mapping and Service Outsourcing

A CNAME (Canonical Name) record does not point to an IP. Instead, it points one domain to another domain.

Example:

blog.example.com → cname → example-blog.hosting.net

CNAME chains often reveal third‑party services such as:

  • CDN providers (Cloudflare, Akamai)
  • Email platforms
  • SaaS dashboards
  • Cloud hosting environments (AWS, GCP, Azure)

NS Records — Authority and Infrastructure Insight

NS (Name Server) records define which servers are authoritative for a domain. They tell the internet where to go to learn everything else about the domain.

Analysing NS records helps you understand:

  • The hosting provider
  • Whether DNS is self‑managed or outsourced
  • Redundancy and failover configuration
  • Possible subdomain leaks through zone misconfiguration (for example, an unrestricted zone transfer)

Note - When organisations self-host their name servers, it often indicates a large internal infrastructure.

MX Records — Email Routing and Third‑Party Dependencies

MX (Mail Exchange) records indicate the mail servers responsible for receiving emails for a domain.

These records reveal:

  • Whether the domain uses Google Workspace, Microsoft 365, or a custom mail server
  • Whether legacy or insecure mail systems are still in use
  • Additional subdomains possibly tied to mail infrastructure

Because email is a highly targeted attack vector, MX records are essential in understanding an organization’s communication layer.
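
The MX hostnames themselves usually give the provider away. The sketch below assumes two well-known suffix patterns (Google and Microsoft 365 mail hosts). `mail_provider` and the sample records are hypothetical.

```python
def mail_provider(mx_host: str) -> str:
    """Guess the mail platform from an MX hostname (assumed suffix rules)."""
    host = mx_host.lower().rstrip(".")
    if host.endswith((".google.com", ".googlemail.com")):
        return "Google Workspace"
    if host.endswith(".mail.protection.outlook.com"):
        return "Microsoft 365"
    return "custom or other"


# Hypothetical MX answers as (priority, host); the lowest priority value is tried first
mx_records = [
    (10, "alt1.aspmx.l.google.com."),
    (1, "aspmx.l.google.com."),
    (20, "mail.example.com."),
]
for priority, host in sorted(mx_records):
    print(priority, host, "->", mail_provider(host))
```

A low-preference MX that points somewhere other than the main provider, like mail.example.com here, is often a legacy server worth a closer look.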

TXT Records — Security Policies and Verification Artefacts

TXT records store arbitrary text and are commonly used for:

  • SPF (Sender Policy Framework)
  • DMARC (Domain-based Message Authentication, Reporting and Conformance)
  • DKIM configuration
  • Cloud and SaaS verification tokens
  • Public security disclosures
  • Domain metadata

Note - SPF, DKIM, and DMARC are three email authentication methods. Together, they help prevent spammers, phishers, and other unauthorised parties from sending emails on behalf of a domain they do not own.
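
SPF records are especially useful for recon because their `include:` and `ip4:` terms name third-party senders and owned IP ranges. The following is a minimal sketch of pulling those terms out. `parse_spf` is a hypothetical helper that handles only the include, ip4, and ip6 mechanisms, and the sample record is made up.

```python
def parse_spf(txt_record: str) -> dict[str, list[str]]:
    """Split an SPF record into its include, ip4, and ip6 mechanisms."""
    found = {"include": [], "ip4": [], "ip6": []}
    if not txt_record.lower().startswith("v=spf1"):
        return found  # not an SPF record
    for term in txt_record.split()[1:]:
        # Drop an optional qualifier such as "+", "-", "~", or "?"
        term = term.lstrip("+-~?")
        name, sep, value = term.partition(":")
        if sep and name.lower() in found:
            found[name.lower()].append(value)
    return found


# Hypothetical TXT value for illustration
record = "v=spf1 include:_spf.google.com ip4:192.0.2.0/24 ~all"
print(parse_spf(record))
```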

SRV Records — Service Discovery

SRV (Service) records indicate the location (hostname and port) of specific services such as:

  • SIP
  • LDAP
  • Kerberos
  • VoIP
  • Microsoft services
  • Game servers

SRV records are particularly useful because they often identify:

  • Internal authentication services
  • Directory services
  • Infrastructure dependencies that are not visible on the public web

From a recon standpoint, SRV records provide directional clues about internal architecture and commonly overlooked services.
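
SRV lookups use names of the form `_service._proto.domain`, so the first step is building the candidate names to query. The sketch below only generates those names. The `COMMON_SRV` shortlist is an assumption, and `srv_names` is a hypothetical helper.

```python
# Assumed shortlist of (service, protocol) pairs worth checking
COMMON_SRV = [
    ("ldap", "tcp"),
    ("kerberos", "tcp"),
    ("sip", "tcp"),
    ("sip", "udp"),
    ("xmpp-server", "tcp"),
]


def srv_names(domain: str) -> list[str]:
    """Build the _service._proto.domain names used to look up SRV records."""
    domain = domain.strip(".").lower()
    return [f"_{service}._{proto}.{domain}" for service, proto in COMMON_SRV]


for name in srv_names("example.com"):
    print(name)
```

Each generated name can then be passed to whichever DNS client you use, with the record type set to SRV.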

Subdomain Discovery — Expanding the Attack Surface

Subdomains represent functional units within a domain and often expose additional, less-secure services. Each subdomain may host a unique application, API, admin panel, or onboarding system.

Common examples include:

  • api.example.com
  • vpn.example.com
  • dev.example.com
  • staging.example.com
  • admin.example.com

Subdomain discovery typically follows two approaches:

Passive Enumeration

Passive enumeration focuses on collecting information from external sources. These sources already monitor the internet, archive changes, or index public data. You simply query them.
There are three major categories we focus on here:

  • Certificate Transparency (CT) logs
  • Historical DNS
  • Search-engine dorks

Each one reveals different layers of how a domain has evolved.

  • Certificate Transparency Logs (crt.sh and bufferover)

Every HTTPS website needs an SSL/TLS certificate issued by a certificate authority. Modern browsers require publicly trusted certificates to be recorded in Certificate Transparency logs. This means that whenever a company obtains a certificate, the hostnames on it are logged publicly, whether or not that subdomain was meant to stay private.

Note: If a company obtains a certificate for beta.example.com, CT logs will expose that name. Even if the subdomain is never linked on the website, a security researcher can find it.

Two popular sources are:

  • crt.sh – a public CT log search engine

  • bufferover.run – offering CT, DNS, and reverse lookup datasets

Example

api.example.com
dev.example.com
staging-api.example.com
internal-vpn.example.com

A beginner-friendly Python example to fetch CT logs:

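    import requests

    domain = "example.com"
    # %25 is a URL-encoded "%" wildcard, so the query matches *.example.com
    url = f"https://crt.sh/?q=%25.{domain}&output=json"

    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        data = response.json()
    except ValueError:
        print("CT logs returned non-JSON (likely rate-limited).")
    except requests.RequestException as exc:
        print(f"Request to crt.sh failed: {exc}")
    else:
        # name_value may hold several names separated by newlines
        subdomains = set()
        for entry in data:
            subdomains.update(entry["name_value"].splitlines())
        for sub in sorted(subdomains):
            print(sub)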

This script searches crt.sh for all certificates issued for example.com and extracts the subdomains they name. crt.sh sometimes responds with HTML instead of JSON (often when it is overloaded or rate-limiting); the except branch catches that case and prints a message instead of crashing.
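
Raw CT results tend to contain wildcard entries, mixed case, duplicates, and names from unrelated domains on the same certificate. A minimal cleanup sketch follows, assuming entries shaped like the JSON used in the script above. `clean_ct_names` and the sample data are hypothetical.

```python
def clean_ct_names(entries: list[dict], domain: str) -> list[str]:
    """Flatten crt.sh-style entries into a sorted list of unique hostnames."""
    names = set()
    for entry in entries:
        # A single name_value can list several names, one per line
        for raw in entry.get("name_value", "").splitlines():
            name = raw.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            # Keep only the domain itself and names under it
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


# Hypothetical sample for illustration, not real crt.sh output
sample = [
    {"name_value": "api.example.com\n*.dev.example.com"},
    {"name_value": "API.example.com"},
    {"name_value": "mail.example.org"},
]
print(clean_ct_names(sample, "example.com"))
```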

  • Historical DNS Records (DNSDB, SecurityTrails)

DNS changes over time. Companies delete services, migrate infrastructure, or abandon old endpoints. But historical DNS databases keep copies of previous DNS answers, making them extremely useful for security analysis.

These historical views help you answer questions like:

  • What subdomains existed two years ago?
  • What IP used to host the corporate website?
  • Did they previously expose an admin panel?
  • Did the company only recently move behind Cloudflare?

Two major providers are:

  • DNSDB – one of the oldest DNS history datasets (discontinued)
  • SecurityTrails – commercial but very rich historical DNS API

For example, suppose vpn.example.com used to resolve to a public IP that is now offline. Even though it’s gone today, historical DNS reveals it used to exist, which means attackers may still probe it.

We can use Python to query the SecurityTrails API in the following format:

Note - Get an API key from SecurityTrails before continuing.

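    import requests

    # Replace with your own key; avoid committing real keys to source control
    api_key = "QAJFqjeHA1wlkfFgO4rYeoHrtR....."
    domain = "learnhubafrica.org"
    headers = {"APIKEY": api_key}

    # Request the historical A records for the domain
    res = requests.get(
        f"https://api.securitytrails.com/v1/history/{domain}/dns/a",
        headers=headers,
        timeout=10,
    )
    res.raise_for_status()  # stop early on HTTP errors such as 401 or 429
    print(res.text)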

This returns historical A records, showing old servers previously linked to the domain.

  • Search Engine Dorks (site:*.example.com -www)

Search engines crawl everything they can reach, including forgotten subdomains. Dorks are queries built from advanced search operators, and they let you pull those results out.

site:*.example.com -www

This searches all indexed subdomains except www, the main website.

How the dork breaks down:

  • site: tells Google to restrict results to a domain
  • *. means “any subdomain”
  • -www excludes results that mention www, which filters out the main site

Beginners often underestimate how powerful this is. Search engines accidentally index:

  • Internal dashboards
  • Debug pages
  • Test environments
  • Misconfigured S3 buckets

A simple Python snippet to automate the creation of dorks:

domain = "example.com"
dork = f"site:*.{domain} -www"
print("Use this Google dork:", dork)

Search engine dorks don’t require coding, but generating them consistently helps when working with multiple domains.
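
For several domains at once, the same idea can be extended with a few templates and ready-to-open search URLs. This is a sketch under assumptions: the `TEMPLATES` list and the `build_dorks` helper are illustrative, and the operators used (`site:`, `inurl:`, `filetype:`) are standard Google search operators.

```python
from urllib.parse import quote_plus

# Assumed dork templates; {domain} is filled in for each target
TEMPLATES = [
    "site:*.{domain} -www",
    "site:{domain} inurl:admin",
    "site:{domain} filetype:pdf",
]


def build_dorks(domains: list[str]) -> list[tuple[str, str]]:
    """Return (dork, search URL) pairs for every domain and template."""
    results = []
    for domain in domains:
        for template in TEMPLATES:
            dork = template.format(domain=domain)
            url = "https://www.google.com/search?q=" + quote_plus(dork)
            results.append((dork, url))
    return results


for dork, url in build_dorks(["example.com", "example.org"]):
    print(dork, "->", url)
```

The URLs are meant to be opened by hand in a browser. Automated scraping of search results is a separate matter and is usually restricted by the search engine's terms.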

In our next article, we will dive into active recon tools and see how we can use Python to automate them.

If you enjoyed this story, consider joining our mailing list. We share real stories, guides, and curated insights on web development, cybersecurity, blockchain, and cloud computing, no spam, just content worth your time.

FAQ

1: Do I need prior Python knowledge?
A: Basic Python (loops, functions, async) is enough. Advanced topics like asyncio are explained step-by-step.

2: Can I run these scripts on Windows?
A: Yes, but Linux is preferred for compatibility with networking tools.

3: Are passive recon scripts safe?
A: Passive scripts query public sources and are generally safe. Active recon carries a higher risk.

4: How do I avoid false positives in subdomain discovery?
A: Always check for wildcard DNS entries before trusting results.

5: How do I store and analyse results?
A: Use pandas DataFrames and export to CSV or HTML for easy reporting.
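
The wildcard check from question 4 can be sketched as follows: ask for a few random labels that should not exist, and if any of them resolves, treat the zone as wildcarded. `system_resolves` and `has_wildcard` are hypothetical helpers. The resolver is passed in as a parameter so the demo below runs offline with stand-ins instead of live DNS.

```python
import secrets
import socket
from typing import Callable


def system_resolves(host: str) -> bool:
    """Return True if the system resolver finds an address for host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def has_wildcard(domain: str, resolves: Callable[[str], bool] = system_resolves,
                 attempts: int = 3) -> bool:
    """Probe random labels; if any resolves, the zone likely uses wildcard DNS."""
    for _ in range(attempts):
        label = secrets.token_hex(8)  # 16 random hex characters, unlikely to exist
        if resolves(f"{label}.{domain}"):
            return True
    return False


# Offline demo with stand-in resolvers instead of live DNS
print(has_wildcard("example.com", resolves=lambda host: True))   # True
print(has_wildcard("example.com", resolves=lambda host: False))  # False
```

When a wildcard is detected, a name that merely resolves proves nothing, so candidates need a second signal, such as an address that differs from the wildcard answer.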

Conclusion

Reconnaissance with Python is a continuous, iterative process. Learning how to find, enumerate, and automate the collection of data creates more opportunities to understand your target's architecture in depth.
