The recent flagging of archive.today as "C&C/Botnet" by Cloudflare Radar, leading to its non-resolution via the 1.1.1.2 recursive DNS resolver, represents a significant event for internet infrastructure and security. This incident necessitates a deep technical analysis of the mechanisms involved in domain categorization, the operational aspects of large-scale DNS resolution, and the broader implications for accessibility and threat intelligence. This article will dissect the technical underpinnings of such actions, exploring how domains are categorized, the functionality of recursive resolvers, and the impact on both service providers and end-users.
Cloudflare's DNS Infrastructure and Threat Intelligence
Cloudflare operates one of the internet's largest and most widely used recursive DNS resolvers, 1.1.1.1 (and 1.1.1.2, which offers malware blocking). This service processes trillions of DNS queries daily, providing a crucial component of internet navigation. The underlying architecture leverages a global Anycast network, ensuring low-latency resolution by directing user queries to the geographically closest Cloudflare data center. This distributed nature is fundamental to its performance and resilience.
Recursive DNS Resolution Mechanics
A recursive DNS resolver, such as 1.1.1.2, is responsible for performing the entire DNS lookup process on behalf of an end-user client. When a client requests to resolve a domain name (e.g., archive.today), the recursive resolver performs a series of queries:
- Root DNS Servers: Queries a root server for a referral to the Top-Level Domain (TLD) servers (e.g., for .today).
- TLD DNS Servers: Queries the TLD server for a referral to the authoritative DNS servers for the specific domain (e.g., archive.today).
- Authoritative DNS Servers: Queries the authoritative server for the A/AAAA record (IP address) of archive.today.
- Response to Client: Returns the resolved IP address to the client, caching the result for future requests.
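The referral chain above can be sketched with a toy in-memory model. The zone data and IP address below are invented for illustration (the IP is from the TEST-NET-3 documentation range); a real resolver speaks the DNS wire protocol over UDP/TCP rather than reading dictionaries.

```python
# Toy model of iterative resolution: each "server" maps a query name to either
# a referral (the next server to ask) or a final answer. Zone data is invented
# for illustration only.

ROOT = {"today.": ("referral", "tld-today")}
TLD_TODAY = {"archive.today.": ("referral", "auth-archive")}
AUTH_ARCHIVE = {"archive.today.": ("answer", "203.0.113.10")}  # documentation IP

SERVERS = {"root": ROOT, "tld-today": TLD_TODAY, "auth-archive": AUTH_ARCHIVE}

def resolve(name: str, cache: dict) -> str:
    """Follow referrals from the root until an answer is found, then cache it."""
    if name in cache:
        return cache[name]
    server = "root"
    while True:
        zone = SERVERS[server]
        # Match the query name against the zone (toy zones hold one entry each).
        key = next(k for k in zone if name.endswith(k))
        kind, value = zone[key]
        if kind == "answer":
            cache[name] = value
            return value
        server = value  # follow the referral down the delegation chain

cache = {}
print(resolve("archive.today.", cache))  # walks root -> TLD -> authoritative
print(resolve("archive.today.", cache))  # second lookup served from the cache
```

The cache is what makes recursive resolvers efficient in practice: most queries never leave the resolver at all, which is also why a policy change at the resolver propagates to users almost immediately.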
When a domain is flagged for security reasons, the recursive resolver can intervene at various stages. For a service like 1.1.1.2, which explicitly offers "malware blocking," this intervention is a designed feature. Instead of returning the legitimate A record, the resolver might return an NXDOMAIN (Non-Existent Domain) response, a SERVFAIL (Server Failure) response, or, in some cases, an IP address pointing to a sinkhole or a block page. The effect is that users relying on 1.1.1.2 cannot access the flagged domain.
Cloudflare Radar and Domain Categorization
Cloudflare Radar is a public-facing platform that provides insights into internet traffic patterns, attacks, and outages. More broadly, Cloudflare leverages extensive telemetry from its DNS resolver, CDN, WAF, and other security products to build a comprehensive threat intelligence picture. This intelligence feeds into various security services, including domain categorization.
The categorization of a domain as "C&C/Botnet" is a serious accusation, indicating a belief that the domain is actively used by attackers to control compromised machines (bots) or to coordinate malicious activities. The methodologies for such categorizations are complex and typically involve a combination of automated analysis and, in some cases, human review.
Technical Methodologies for "C&C/Botnet" Detection
Identifying a Command and Control (C&C) server or a domain associated with a botnet relies on observing anomalous patterns and specific indicators of compromise (IoCs) across various layers of network traffic and domain behavior.
Passive DNS Analysis
Passive DNS (pDNS) databases aggregate historical DNS resolution data. By analyzing pDNS records, security systems can identify:
- Fast Flux: Rapid and frequent changes in a domain's A records, where multiple IP addresses are rotated in quick succession. This technique is often used by botnets to hide their C&C infrastructure behind a constantly changing set of compromised hosts or proxies.
- Domain Generation Algorithms (DGAs): Botnets often use DGAs to generate a large number of pseudo-random domain names daily. If a bot client cannot reach its primary C&C, it can iterate through DGA-generated domains to find a new C&C. Detecting these algorithmic patterns, often through entropy analysis or machine learning, is a key indicator.
- Sinkholing Overlap: If a domain resolves to an IP address known to be a security sinkhole, it suggests that the domain was previously associated with malicious activity and is now being intercepted.
- Geographical Distribution and Traffic Volume: Unusual spikes in traffic from disparate geographic locations, especially those not typically associated with the domain's legitimate user base, can be suspicious.
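The entropy analysis mentioned for DGA detection can be illustrated with a Shannon-entropy heuristic over the leftmost DNS label. The 3.5 bits/char threshold and the sample DGA-style name are illustrative, not production values:

```python
import math
from collections import Counter

def label_entropy(domain: str) -> float:
    """Shannon entropy (bits per character) of the leftmost DNS label."""
    label = domain.split(".")[0]
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# DGA-style labels tend toward higher entropy than dictionary-based names;
# the 3.5-bit cutoff below is an illustrative threshold, not a tuned one.
for name in ["archive.today", "xkqzjw4p9fvb2m.today"]:
    e = label_entropy(name)
    print(f"{name}: {e:.2f} bits/char, {'suspicious' if e > 3.5 else 'ok'}")
```

Real detectors combine entropy with n-gram frequency models or trained classifiers, since single-feature thresholds like this one misfire on short labels and on legitimate random-looking names (CDN hostnames, tracking subdomains).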
Traffic Pattern Analysis
Beyond DNS resolution, real-time traffic analysis plays a critical role:
- Payload Analysis: While challenging due to encryption, unencrypted traffic or metadata can reveal suspicious communication patterns. Botnet C&C traffic often involves small, frequent beacons, data exfiltration attempts, or commands for further malicious actions (e.g., DDoS attacks, spam distribution).
- Connection Metadata: Analyzing source/destination IPs, ports, protocols, and connection durations can identify unusual communication. For instance, a domain with legitimate web services might show HTTP/S traffic on standard ports, whereas a C&C server might see unusual ports or non-standard protocols.
- Client Fingerprinting: Analyzing HTTP headers (User-Agent strings), TLS client hello messages (JA3/JA4 hashes), or other network characteristics can help identify known bot families communicating with a domain.
- DDoS Activity Correlation: If a domain is observed orchestrating or participating in DDoS attacks, it's a strong indicator of botnet involvement.
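The "small, frequent beacons" pattern above is often detected by looking at how regular a client's connection intervals are. A minimal sketch, using the coefficient of variation of inter-arrival times (the 0.1 threshold and the timestamp series are illustrative):

```python
from statistics import mean, pstdev

def looks_like_beaconing(timestamps: list[float], cv_threshold: float = 0.1) -> bool:
    """Flag near-constant inter-arrival times (low coefficient of variation),
    a common trait of C&C heartbeats. Threshold is illustrative, not tuned."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 3:
        return False  # too few samples to judge
    return pstdev(gaps) / mean(gaps) < cv_threshold

beacon = [0, 60.1, 120.0, 179.9, 240.2, 300.0]   # ~60 s heartbeat with jitter
human  = [0, 3.2, 47.0, 52.5, 140.8, 141.1]      # bursty, irregular browsing
print(looks_like_beaconing(beacon), looks_like_beaconing(human))
```

Malware authors counter this with randomized beacon jitter, so production systems score regularity alongside payload sizes, destinations, and time-of-day patterns rather than relying on any single signal.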
Malware Analysis and IoCs
Direct analysis of malware samples is often the most definitive way to link a domain to malicious activity.
- Static and Dynamic Analysis: Researchers execute malware in sandboxed environments to observe its behavior, including network connections. If a sample attempts to resolve and connect to archive.today, it becomes a strong IoC.
- YARA Rules and Signatures: Malware analysts develop signatures (e.g., YARA rules) to detect specific families of malware. If these families are known to communicate with particular domains, those domains are flagged.
- Threat Intelligence Feeds: Integration with numerous public and private threat intelligence feeds (e.g., MISP, Abuse.ch) provides lists of known malicious IPs, domains, and hashes.
Reputation Scoring and Machine Learning
Large security vendors build complex reputation systems for IP addresses, domains, and files. These systems aggregate data from various sources (spam traps, honeypots, email security gateways, WAF logs, antivirus detections) and assign a dynamic score.
- Anomaly Detection: Machine learning models are trained on vast datasets of both benign and malicious internet activity. They can identify deviations from normal behavior that might indicate a new C&C or botnet domain, even without prior knowledge of specific IoCs.
- Graph Analysis: Building graphs of relationships between domains, IP addresses, autonomous systems (ASNs), and registrar information can uncover clusters of malicious infrastructure that might share common characteristics.
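A simplified version of the graph analysis idea: merge domains into one infrastructure cluster whenever they resolve to a shared IP, using union-find. The observation data is invented for illustration; real systems add ASN, registrar, and certificate edges to the graph:

```python
from collections import defaultdict

def cluster_by_shared_ip(resolutions: list[tuple[str, str]]) -> list[set[str]]:
    """Group domains that share resolution IPs into clusters (union-find)."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    ip_to_domain: dict[str, str] = {}
    for domain, ip in resolutions:
        find(domain)  # register the domain even if its IP is unique
        if ip in ip_to_domain:
            union(domain, ip_to_domain[ip])  # shared IP => same cluster
        else:
            ip_to_domain[ip] = domain

    groups = defaultdict(set)
    for d in {d for d, _ in resolutions}:
        groups[find(d)].add(d)
    return list(groups.values())

obs = [("a.example", "198.51.100.1"), ("b.example", "198.51.100.1"),
       ("b.example", "198.51.100.2"), ("c.example", "198.51.100.2"),
       ("d.example", "203.0.113.9")]
print(cluster_by_shared_ip(obs))  # a/b/c form one cluster; d stands alone
```

This also hints at the false-positive risk discussed next: shared hosting and CDNs put thousands of unrelated domains behind the same IPs, so naive IP-sharing edges must be down-weighted for known multi-tenant infrastructure.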
The Challenge of False Positives
Automated detection systems, while powerful, are not infallible. The complexity of internet traffic and the ever-evolving tactics of attackers mean that legitimate services can sometimes exhibit characteristics that mimic malicious behavior.
- For instance, Content Delivery Networks (CDNs) and legitimate load balancing setups can use fast flux-like IP rotation.
- Services that crawl the internet, archive content, or perform security research might generate traffic patterns that could be misconstrued as bot-like.
- A compromised sub-domain or a single compromised host within a larger legitimate network could lead to the entire top-level domain being flagged if not carefully isolated.
Technical Mechanisms of Non-Resolution by 1.1.1.2
When archive.today is flagged as malicious in Cloudflare's threat intelligence (the categorization surfaced publicly by Cloudflare Radar), the 1.1.1.2 recursive resolver implements a block. This is typically achieved by altering the response to DNS queries for the flagged domain.
Consider a standard DNS query using dig:
dig archive.today @8.8.8.8 +short
Output (example with Google DNS, if archive.today were resolving):
185.73.187.168
Now, consider a query to 1.1.1.2 for a domain flagged as malicious. The expected behavior is that the resolver will not return a legitimate A record. Instead, it might return an NXDOMAIN response, indicating that the domain does not exist, or a SERVFAIL, suggesting a server error.
dig archive.today @1.1.1.2
Output (illustrative of a blocked domain):
; <<>> DiG 9.18.1-1ubuntu1.8-Ubuntu <<>> archive.today @1.1.1.2
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 34188
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;archive.today. IN A
;; AUTHORITY SECTION:
. 842 IN SOA a.root-servers.net. nstld.iana.org. 2023082501 1800 900 604800 86400
;; Query time: 10 msec
;; SERVER: 1.1.1.2#53(1.1.1.2) (UDP)
;; WHEN: Fri Aug 25 10:30:00 UTC 2023
;; MSG SIZE rcvd: 115
In this output, status: NXDOMAIN is the critical indicator. It means that archive.today is effectively invisible to clients using this specific resolver. This is not a failure of the authoritative DNS server for archive.today; rather, it is an intentional policy decision by the recursive resolver to prevent resolution.
This blocking mechanism operates at a fundamental layer of internet access. When a browser, application, or operating system attempts to resolve archive.today via 1.1.1.2, it receives an NXDOMAIN and thus cannot establish a connection to the intended server. Users would experience this as the domain being unreachable or "not found," without necessarily understanding the underlying cause is a deliberate DNS block.
Impact on archive.today and the Broader Internet Ecosystem
The flagging of archive.today carries several significant implications.
Reduced Accessibility and Reach
For archive.today, the primary impact is a reduction in accessibility for a segment of internet users. While 1.1.1.2 users represent a fraction of global internet users, they are often more privacy-conscious or technically savvy, making this a notable disruption. Users running Cloudflare WARP with malware blocking enabled, or those with routers configured to use 1.1.1.2, would be unable to access the service.
This incident underscores the power wielded by large recursive DNS resolvers. As internet traffic consolidates through a few major players (Cloudflare, Google, Cisco Umbrella, Quad9), their individual policy decisions can significantly affect global internet access to specific domains.
Reputation Damage and Cascading Effects
Being labeled as "C&C/Botnet" is extremely damaging to a domain's reputation. While Cloudflare's public radar specifically highlights this, other security vendors and threat intelligence feeds might either ingest Cloudflare's data or arrive at similar conclusions independently. This could lead to:
- Further Blacklisting: Other DNS resolvers, firewalls, email security gateways, and web filtering solutions might independently add archive.today to their blocklists.
- Reduced Trust: Users and organizations might perceive the service as untrustworthy, even if the flagging is contested or eventually reversed.
- Service Provider Scrutiny: Upstream ISPs, hosting providers, or CDN services (if archive.today uses any) might increase their scrutiny or even terminate services if they believe archive.today is facilitating malicious activity.
The Nuance of Archiving Services and Potential Misinterpretation
Services like archive.today operate by crawling and storing snapshots of web pages. From a purely technical perspective, this involves:
- Automated Fetching: Making numerous HTTP/S requests to various domains, often in rapid succession.
- Diverse Sources: Retrieving content from potentially millions of unique domains, including those that might themselves be compromised or host malware.
- Behavioral Similarities: These actions, when observed in isolation or through certain heuristics, could potentially mimic patterns associated with web scrapers, vulnerability scanners, or even botnet reconnaissance. For example, a large volume of requests from a single source IP to diverse, previously unseen domains, could trigger anomaly detection systems.
It is crucial to differentiate between a service archiving content that might contain malicious elements (e.g., a malware download link on an archived page) and the service itself being a C&C server or part of a botnet. The technical challenge for automated systems lies in making this distinction accurately. If archive.today inadvertently hosts a file or provides a link to content that later becomes a C&C, or if its infrastructure is ever compromised, it could trigger these flags. Also, the nature of web archiving can lead to traffic patterns that are unusual compared to typical user-driven browsing, potentially leading to false positives in some automated detection systems.
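One way to see why an archiver trips anomaly detectors: a common heuristic scores clients by the fraction of their requests that hit previously unseen domains. A crawler and a scanning bot both score high on this signal, which is precisely where the false positive comes from. A minimal sketch with invented traffic data:

```python
def unseen_domain_ratio(requests: list[str], known: set[str]) -> float:
    """Fraction of a client's requests targeting domains not seen before.
    Both a benign archiver and a scanning bot can score high here, which is
    exactly why this signal alone produces false positives."""
    if not requests:
        return 0.0
    return sum(1 for d in requests if d not in known) / len(requests)

known_popular = {"example.com", "example.org"}
browser = ["example.com"] * 8 + ["example.org"] * 2   # typical user traffic
crawler = ["site%d.example" % i for i in range(10)]   # archiver-style fan-out
print(unseen_domain_ratio(browser, known_popular))  # 0.0
print(unseen_domain_ratio(crawler, known_popular))  # 1.0
```

Disambiguating the two requires additional context the ratio alone cannot provide: request depth per domain, adherence to robots.txt, stable reverse DNS, and published crawler IP ranges.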
Technical Considerations for Domain Owners and Recourse
For any domain owner facing such a categorization, immediate technical investigation and action are paramount.
Proactive Measures and Continuous Monitoring
Prevention is key. Domain owners should implement robust security practices:
- Regular Security Audits: Conduct penetration testing and vulnerability assessments of their infrastructure.
- Web Application Firewall (WAF): Deploy a WAF to mitigate common web attacks and monitor suspicious traffic.
- Intrusion Detection/Prevention Systems (IDS/IPS): Monitor network traffic for anomalies and known attack signatures.
- Log Management and SIEM: Centralize logs from servers, applications, and network devices for comprehensive threat detection and forensic analysis.
- Threat Intelligence Integration: Monitor relevant threat intelligence feeds and blocklists to see if their domain or associated IPs appear.
- DNS Monitoring: Track DNS resolution for their domain, looking for unexpected changes or unusual query patterns.
- HTTPS Everywhere: Ensure all traffic is encrypted to protect against eavesdropping and tampering.
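The DNS monitoring item above reduces, in its simplest form, to diffing the records you intend to serve against what resolvers actually return. A minimal sketch with invented record sets (a real monitor would query multiple public resolvers on a schedule and alert on drift):

```python
def dns_record_drift(expected: dict[str, set[str]],
                     observed: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Compare intended records against observed resolution; unexpected
    additions can indicate hijacking or unauthorized dynamic updates."""
    drift = {}
    for name, want in expected.items():
        got = observed.get(name, set())
        added, removed = got - want, want - got
        if added or removed:
            drift[name] = {"unexpected": added, "missing": removed}
    return drift

expected = {"www.example.com": {"203.0.113.10"}}
observed = {"www.example.com": {"203.0.113.10", "198.51.100.66"}}  # rogue record
print(dns_record_drift(expected, observed))
```

An unexpected record appearing at public resolvers, while the authoritative zone looks clean, is a classic symptom of registrar-account compromise and is worth alerting on immediately.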
Technical Remediation and Appeal Process
If a domain is flagged, the owner must conduct a thorough technical investigation:
- Forensic Analysis: Examine server logs, network traffic, and application code for signs of compromise, malware, or misconfiguration that could explain the "C&C/Botnet" categorization. This includes checking for:
- Outbound connections to suspicious IPs/domains.
- Unusual inbound connection patterns.
- Presence of unknown files or processes.
- Unexpected changes to website content or configuration.
- DNS record manipulation (e.g., dynamic DNS updates not controlled by the owner).
- Mitigation: If a compromise is found, isolate affected systems, clean malware, patch vulnerabilities, and rotate all credentials.
- Documentation: Document all findings, the steps taken for remediation, and evidence of a clean bill of health.
- Appeal: Submit this evidence to the flagging entity (in this case, Cloudflare, via their abuse reporting or appeal mechanism for Radar listings). The technical details provided must be compelling, demonstrating a clear understanding of the alleged issue and effective remediation. This process often involves demonstrating that the domain no longer exhibits characteristics associated with C&C activity.
# Example of forensic steps for a suspected compromised web server
# 1. Check open ports and active connections
sudo netstat -tulpn | grep LISTEN
sudo netstat -tupn | grep ESTABLISHED  # note: -l would restrict output to listening sockets
# 2. Review web server access logs for unusual requests
grep "POST /" /var/log/apache2/access.log | less
grep "php?s=" /var/log/apache2/access.log | less # Common for PHP vulnerabilities
# 3. Check for recently modified files
sudo find /var/www/html -mtime -7 -ls # Files modified in the last 7 days
# 4. Examine running processes for suspicious activity
ps aux | grep -v "grep"
# 5. Check scheduled tasks (cron jobs)
sudo crontab -l
sudo find /etc/cron.* -type f -exec cat {} \;
# 6. Analyze DNS queries originating from the server (if applicable)
# Requires specific logging or tools like tcpdump/wireshark if no prior logging
sudo tcpdump -i eth0 port 53
The challenge for flagged legitimate services is the lack of transparency in the initial flagging criteria. While security vendors like Cloudflare need to protect their methodologies, this can make it difficult for domain owners to diagnose and fix the specific issue that triggered the block, especially if it's a false positive or an obscure technical nuance.
Future Outlook: Centralization, DoH/DoT, and Decentralization Efforts
This incident highlights ongoing tensions in the internet ecosystem:
- Centralization of DNS: The reliance on a few large recursive resolvers concentrates significant power in their hands. While beneficial for performance and security aggregation, it also means that a single entity's policy decision can disproportionately affect global access.
- DNS over HTTPS/TLS (DoH/DoT): The adoption of encrypted DNS protocols further shifts DNS resolution towards these large providers. While improving privacy by preventing ISP snooping, it also makes it harder for local networks to implement their own filtering or monitoring, consolidating filtering power at the recursive resolver level.
- Decentralization Efforts: Conversely, there's a growing interest in decentralized DNS solutions (e.g., Handshake, ENS, leveraging blockchain) to mitigate the risks of centralization and single points of control or failure. However, these technologies are still nascent and face significant adoption challenges.
The incident serves as a critical reminder of the complex interplay between security, accessibility, and the architectural choices underlying the internet. For engineers and system architects, it emphasizes the importance of understanding how domain reputation is built, how DNS policies are enforced, and the potential impact of being caught in the crossfire of automated threat detection systems. Building resilient and robust internet services requires continuous vigilance over security posture and active engagement with the ecosystem's gatekeepers to ensure legitimate services remain accessible.
For deep-dive technical analysis, network security assessments, or guidance on navigating complex infrastructure challenges, please visit https://www.mgatc.com for expert consulting services.
Originally published in Spanish at www.mgatc.com/blog/cloudflare-flags-archivetoday-cnc-botnet/