DEV Community: daud99

Public Suffix List

daud99 — Sun, 27 Jul 2025 13:22:47 +0000

Public Suffix List (PSL) - Quick Reference

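Key Rule: a name listed in the PSL CANNOT set cookies that are shared with its subdomains (a cookie's Domain attribute may not be a public suffix)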

`example.com` in PSL:

Who can set cookies: Only subdomains (a.example.com, b.example.com)
Cookie sharing: None (each subdomain isolated)
Think: "Each apartment rents independently, no shared lobby"

`*.example.com` in PSL:

Who can set cookies:
- ✅ example.com (shares to ALL descendants)
- ❌ a.example.com, b.example.com (they're public suffixes)
- ✅ child.a.example.com, child.b.example.com (but only for themselves)
Cookie sharing: Everyone reads example.com's cookies, but children can't share with each other
Think: "Hotel owner controls lobby, guests can't set room rules, but guests' visitors can"

Memory trick:

No * = Subdomains are independent owners
With * = Parent owns everything, subdomains are just public spaces (but their children can own again)

Bottom line: PSL creates a "cookie boundary" - determines who gets to host vs who just receives.
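
The two cases above can be checked with a small matcher. This is a minimal sketch, not the full PSL algorithm: the rule sets are the hypothetical ones used in this note, exception rules (the `!` prefix in the real list) are ignored, and `public_suffix` and `can_share_cookies` are illustrative names rather than any browser's API.

```python
def public_suffix(host, rules):
    """Return the longest suffix of host that matches a PSL-style rule."""
    labels = host.lower().rstrip(".").split(".")
    best = 1  # implicit default rule "*": the last label alone is a suffix
    for rule in rules:
        rule_labels = rule.split(".")
        n = len(rule_labels)
        if n > len(labels):
            continue
        tail = labels[-n:]
        # "*" in a rule matches exactly one label
        if all(r == "*" or r == t for r, t in zip(rule_labels, tail)):
            best = max(best, n)
    return ".".join(labels[-best:])


def can_share_cookies(host, rules):
    """A host can set Domain=host cookies only if it is not a public suffix."""
    return public_suffix(host, rules) != host.lower().rstrip(".")


plain = {"com", "example.com"}       # `example.com` in PSL (hypothetical)
wildcard = {"com", "*.example.com"}  # `*.example.com` in PSL (hypothetical)

for name in ["example.com", "a.example.com", "child.a.example.com"]:
    print(name, can_share_cookies(name, plain), can_share_cookies(name, wildcard))
```

With the plain rule, only `example.com` itself is blocked; with the wildcard rule, `example.com` is allowed again and `a.example.com` becomes the blocked public suffix.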

DNS Zone Files: The Blueprints of Domain Mapping

daud99 — Sun, 18 May 2025 14:44:07 +0000

In our previous blog post, we explored how DNS works to translate domain names into IP addresses. Today, we're going deeper into a critical component of DNS: zone files. These files are the actual blueprints that make the domain name system work behind the scenes.

What Is a DNS Zone File?

A zone file is simply a text file that lives on authoritative DNS servers. It contains the mapping between domain names and their corresponding IP addresses, along with other important resource records. Think of a zone file as a detailed address book for a specific section of the internet.

Each line in a zone file represents a different record, and each record serves a specific purpose in directing internet traffic to the right destination.

Where Do Zone Files Exist?

Zone files exist primarily on authoritative nameservers. Remember from our previous blog post that authoritative nameservers are the final authority for a particular domain. When you register a domain and set up its DNS, you're essentially creating and configuring the zone file that will live on those authoritative servers.

Here's where zone files fit in the DNS hierarchy:

Root servers serve the root zone, whose zone file lists the NS and glue records that delegate each TLD
TLD servers (like .com or .org) maintain zone files for their domains
Authoritative nameservers for your specific domain (like yourdomain.com) host the zone file containing all your domain's DNS records

When your registrar or DNS provider gives you a control panel to manage your domain's DNS settings, you're actually editing the information that will be written to the zone file on your authoritative nameservers.

Types of DNS Records in Zone Files

Let's look at the common types of records found in zone files, with examples of how each one looks.

1. Start of Authority (SOA) Record

The SOA record is like the cover page of your zone file. It marks the beginning of a zone and contains essential administrative information.

example.com.  IN  SOA  ns1.example.com. admin.example.com. (
                2023042601  ; Serial number
                3600        ; Refresh (1 hour)
                1800        ; Retry (30 minutes)
                604800      ; Expire (1 week)
                86400 )     ; Minimum TTL (24 hours)

This record tells us:

The primary nameserver is ns1.example.com
The administrator's email is admin@example.com (note that the @ is replaced with a dot in the record)
The serial number (2023042601) is like a version number that increases whenever you update the zone
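The time values tell secondary DNS servers how often to check for updates (refresh), how soon to try again after a failed check (retry), and when to stop serving the zone if the primary stays unreachable (expire); the last value is used today as the TTL for negative ("no such name") answers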
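
To make the SOA fields concrete, here is a minimal Python sketch that decodes two of them. The helper names `rname_to_email` and `split_serial` are made up for illustration. Reading the serial as a date plus revision (YYYYMMDDnn) is a common convention, not a requirement, and the sketch does not handle mailbox names that contain an escaped dot (`\.`).

```python
def rname_to_email(rname):
    """Turn an SOA mailbox like 'admin.example.com.' into 'admin@example.com'."""
    # The first dot stands in for the '@'; the rest is the mail domain.
    local, _, domain = rname.rstrip(".").partition(".")
    return f"{local}@{domain}"


def split_serial(serial):
    """Read a serial as YYYYMMDDnn (a convention only) -> (date, revision)."""
    text = str(serial)
    return f"{text[:4]}-{text[4:6]}-{text[6:8]}", int(text[8:])


print(rname_to_email("admin.example.com."))
print(split_serial(2023042601))
```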

Every name belongs to exactly one DNS zone at any given time, and each zone has exactly one SOA record at its apex that marks where the zone begins.

2. Name Server (NS) Records

NS records specify which servers are authoritative for the domain. These are the servers that have the definitive information about your domain.

example.com.  IN  NS  ns1.example.com.
example.com.  IN  NS  ns2.example.com.

This example shows that two nameservers (ns1 and ns2) are authoritative for example.com.

3. Address (A) Records

A records are the most common type of DNS record. They map a domain or subdomain to an IPv4 address.

example.com.      IN  A  203.0.113.10
www.example.com.  IN  A  203.0.113.10
blog.example.com. IN  A  203.0.113.11

In this example:

The main domain points to 203.0.113.10
The www subdomain points to the same address
The blog subdomain points to a different server at 203.0.113.11

4. AAAA Records

AAAA records are just like A records, but for IPv6 addresses instead of IPv4.

example.com.  IN  AAAA  2001:0db8:85a3:0000:0000:8a2e:0370:7334

This record directs traffic for example.com to the IPv6 address shown.
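
IPv6 addresses can be written in a long and a short form, which sometimes makes two identical AAAA values look different. A small sketch with Python's standard `ipaddress` module shows both forms; `record_type` is a hypothetical helper added here for illustration.

```python
import ipaddress

addr = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")
print(addr.version)     # 6
print(addr.compressed)  # 2001:db8:85a3::8a2e:370:7334
print(addr.exploded)    # 2001:0db8:85a3:0000:0000:8a2e:0370:7334


def record_type(address):
    """Pick the record type that can hold this address: A for IPv4, AAAA for IPv6."""
    return "A" if ipaddress.ip_address(address).version == 4 else "AAAA"


print(record_type("203.0.113.10"))
print(record_type("2001:db8::1"))
```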

5. Canonical Name (CNAME) Records

CNAME records create an alias from one domain name to another. They're useful for creating multiple domain names that all point to the same website.

shop.example.com.  IN  CNAME  example.com.
store.example.com. IN  CNAME  example.com.

With these records, both shop.example.com and store.example.com will direct users to the same place as example.com.

6. Mail Exchanger (MX) Records

MX records specify which servers handle email for your domain. They include a priority number (lower numbers have higher priority).

example.com.  IN  MX  10  mail1.example.com.
example.com.  IN  MX  20  mail2.example.com.

This configuration tells email servers to try delivering mail to mail1.example.com first. If that server is unavailable, they'll try mail2.example.com.
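
The "lower number first" rule can be sketched in a few lines of Python. This is only an illustration of the ordering: `delivery_order` is a made-up helper, and real mail servers usually randomize among records that share the same preference instead of sorting them by name as this sketch does.

```python
# (preference, mail server) pairs, deliberately listed out of order
mx_records = [
    (20, "mail2.example.com."),
    (10, "mail1.example.com."),
]


def delivery_order(records):
    """Return mail servers in the order a sender would try them."""
    # Tuples sort by preference first, so the lowest number comes first.
    return [host for _, host in sorted(records)]


print(delivery_order(mx_records))
```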

7. Text (TXT) Records

TXT records can hold arbitrary text and are often used for domain verification or security policies.

example.com.  IN  TXT  "v=spf1 include:_spf.example.com ~all"

This example shows an SPF (Sender Policy Framework) record that helps prevent email spoofing.

How Zone Files Work in Practice

Let's see how these records work together in a simplified zone file for example.com:

; Zone file for example.com
$TTL 86400 ; Default TTL is 24 hours
example.com.  IN  SOA  ns1.example.com. admin.example.com. (
                2023042601  ; Serial number
                3600        ; Refresh
                1800        ; Retry
                604800      ; Expire
                86400 )     ; Minimum TTL

; Nameservers
example.com.  IN  NS  ns1.example.com.
example.com.  IN  NS  ns2.example.com.

; A records for nameservers
ns1.example.com.  IN  A  203.0.113.1
ns2.example.com.  IN  A  203.0.113.2

; Main domain and www subdomain
example.com.      IN  A  203.0.113.10
www.example.com.  IN  CNAME  example.com.

; Email configuration
example.com.  IN  MX  10  mail.example.com.
mail.example.com.  IN  A  203.0.113.20

; Text record for email security
example.com.  IN  TXT  "v=spf1 include:_spf.example.com ~all"

When someone types www.example.com into their browser, DNS resolvers follow this chain of records to find the right IP address:

They see the CNAME pointing www.example.com to example.com
They look up the A record for example.com
They find the IP address 203.0.113.10
The browser connects to that IP address
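
The lookup chain above can be mimicked with a dictionary standing in for the zone. This is a toy sketch under that assumption: `ZONE` and `resolve_a` are illustrative names, no network queries are made, and the hop limit is an arbitrary safeguard against CNAME loops.

```python
# A tiny in-memory stand-in for the example.com zone
ZONE = {
    ("www.example.com.", "CNAME"): "example.com.",
    ("example.com.", "A"): "203.0.113.10",
}


def resolve_a(name, zone, max_hops=8):
    """Follow CNAME records until an A record is found."""
    for _ in range(max_hops):
        if (name, "A") in zone:
            return zone[(name, "A")]
        if (name, "CNAME") in zone:
            name = zone[(name, "CNAME")]  # restart the lookup at the alias target
            continue
        return None  # no record for this name
    raise RuntimeError("CNAME chain too long or looping")


print(resolve_a("www.example.com.", ZONE))
```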

Zone Files vs. DNS Zones

It's important to distinguish between a zone file and a DNS zone:

A DNS zone is a portion of the domain namespace for which a specific organization or administrator is responsible
A zone file is the physical text file that contains the record information for that zone

A single DNS zone might be spread across multiple nameservers for redundancy, but they all use copies of the same zone file information.

Managing Your Zone Files

Most domain owners never need to edit zone files directly. Instead:

If you use a registrar's DNS service, you'll manage records through their web interface
If you host your own DNS, you might edit zone files on your servers or use DNS management software
If you use a DNS service like Cloudflare or Route 53, you'll use their control panels

The changes you make through these interfaces eventually translate to updates in the zone files on the authoritative nameservers.

What Records Do TLD Servers Actually Store?

A common point of confusion about DNS is understanding exactly what information is stored at each level of the hierarchy. Let's clarify what records TLD servers (like those for .com, .org, etc.) actually keep.

TLD Servers: Minimalist by Design

TLD servers are surprisingly minimalist in what they store. For each domain under their authority, they typically maintain only:

NS records - These point to the authoritative nameservers for each domain
Glue records - These are A (or AAAA) records for those nameservers (but only when necessary)

That's it! TLD servers don't store:

Regular A records for websites
CNAME records for subdomains
MX records for email
TXT records for verification
Any other resources for domains under them

Here's what the .com TLD servers might store for google.com:

google.com.  IN  NS  ns1.google.com.
google.com.  IN  NS  ns2.google.com.
google.com.  IN  NS  ns3.google.com.
google.com.  IN  NS  ns4.google.com.

; Glue records
ns1.google.com.  IN  A  216.239.32.10
ns2.google.com.  IN  A  216.239.34.10
ns3.google.com.  IN  A  216.239.36.10
ns4.google.com.  IN  A  216.239.38.10

The TLD servers know nothing about www.google.com, mail.google.com, or any Google services. They only know "if someone asks about google.com, send them to these nameservers."

This delegation approach is what makes DNS scalable. Imagine if Verisign (who manages .com) had to store every single DNS record for every .com domain in the world!

How TLD Servers Deliver Both Nameservers and Their IPs

When your computer asks a .com TLD server about google.com, something clever happens in a single transaction:

The TLD server returns the NS records saying "ask Google's nameservers"
In the same response, it includes the glue records with the IP addresses of those nameservers

This all happens in one query-response cycle. Here's what that response looks like:

;; QUESTION SECTION:
;google.com.            IN    NS

;; AUTHORITY SECTION:
google.com.        172800    IN    NS    ns1.google.com.
google.com.        172800    IN    NS    ns2.google.com.
google.com.        172800    IN    NS    ns3.google.com.
google.com.        172800    IN    NS    ns4.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.    172800    IN    A    216.239.32.10
ns2.google.com.    172800    IN    A    216.239.34.10
ns3.google.com.    172800    IN    A    216.239.36.10
ns4.google.com.    172800    IN    A    216.239.38.10

The "ADDITIONAL SECTION" contains those essential glue records with IP addresses.

Why Glue Records Are Necessary: Breaking the Circular Dependency

Without glue records, we'd face a classic chicken-and-egg problem:

"I need to ask ns1.google.com about google.com"
"But I don't know ns1.google.com's IP address"
"To find ns1.google.com's IP, I need to resolve that domain name"
"But that would require asking the nameservers for google.com..."
And we're back to step 1 in an infinite loop!

Glue records break this circular dependency by providing the IP addresses directly in the TLD response, allowing your DNS resolver to immediately connect to the authoritative nameservers without additional queries.
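
Whether glue is needed comes down to one question: does the nameserver's own name sit inside the domain it serves? A minimal Python sketch of that check follows. `needs_glue` is a hypothetical helper, and actual registry policies for when glue is published can be more nuanced than this rule of thumb.

```python
def needs_glue(zone, nameserver):
    """True when the nameserver's name is at or below the zone it serves."""
    zone = zone.lower().rstrip(".")
    ns = nameserver.lower().rstrip(".")
    return ns == zone or ns.endswith("." + zone)


# ns1.google.com lives inside google.com, so the loop must be broken with glue
print(needs_glue("google.com.", "ns1.google.com."))
# A nameserver under another domain can be resolved independently
print(needs_glue("yourbusiness.com.", "ns1.hostgator.com."))
```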

Think of it like calling a business and asking for a specific department. Instead of just saying "you need to speak with our Technical Department," they also give you the direct extension number so you don't have to call the main line again.

Conclusion

Zone files are the hidden blueprints that make DNS work. They contain all the crucial information needed to direct internet traffic to the right destinations. By understanding the different types of DNS records and how they function together in zone files, you gain valuable insight into how your domain's presence on the internet is defined and managed.

The beauty of DNS lies in its hierarchical delegation model. Each level knows just enough information to direct queries to the next level, with TLD servers playing a crucial role in directing traffic without needing to store excessive information.

Whether you're troubleshooting DNS issues, migrating to a new hosting provider, or simply curious about how the internet works, knowing about zone files helps you understand the fundamental address system of the web.

Understanding DNS: How Domain Names Become IP Addresses

daud99 — Sun, 18 May 2025 01:24:08 +0000

In our previous blog post, we covered domains and their various aspects. Now, let's dive deeper into DNS (Domain Name System) - the backbone of internet navigation.

What is DNS?

DNS, or Domain Name System, is essentially a translator for the internet. Its primary function is simple yet crucial: it converts human-friendly domain names (like example.com) into machine-readable IP addresses that computers use to identify each other. Without DNS, you'd need to memorize numeric IP addresses instead of easy-to-remember domain names.

Key Players in the Domain World: Registries vs. Registrars

Before we explore how DNS works, let's clarify two important roles that make the domain system possible:

Registry: The Domain Database Managers

A registry is an organization that manages a specific top-level domain (TLD)
For example, Verisign manages the .com and .net TLDs, while Public Interest Registry manages .org
Registries maintain the authoritative database for their TLDs
They operate the TLD nameservers that direct traffic to the correct domain
Registries don't sell domains directly to users

The Domain Hierarchy: Who Controls What

Think of the domain system as a tree:

Root (.)
  |
  ├── .com (managed by Verisign)
  |     |
  |     ├── amazon.com (managed by Amazon's nameservers)
  |     |
  |     └── google.com (managed by Google's nameservers)
  |
  └── .org (managed by Public Interest Registry)
        |
        └── wikipedia.org (managed by Wikimedia's nameservers)

Each registry only manages its specific level in this tree:

Verisign operates nameservers authoritative for .com itself
- These TLD nameservers know which nameservers are responsible for each individual .com domain
- But they don't store the actual IP addresses for websites
Domain owners (like Amazon) operate nameservers authoritative for their specific domains
- For illustration, imagine Amazon runs ns1.amazon.com, ns2.amazon.com, etc. (the real server names may differ)
- These servers contain all the DNS records (IP addresses, mail servers, etc.) for amazon.com

The registry simply maintains a database that says: "For information about amazon.com, ask Amazon's nameservers."

Why TLDs Don't Store IP Addresses: The Power of Delegation

You might wonder: "If the .com TLD servers already have a list of all .com domains, why don't they just store the IP addresses directly? Wouldn't that be faster by removing a step?"

This delegation approach is actually a brilliant design decision for several reasons:

Scalability: There are hundreds of millions of .com domains, each with multiple DNS records. By delegating to authoritative nameservers, TLD servers remain manageable and efficient.
Distributed Control: Domain owners can update their DNS records (change IPs, add subdomains, configure email) without involving the registry. You control your domain through your nameservers.
Flexible Updates: Websites change servers, companies add new services, and IP addresses get updated frequently. If all these changes had to go through the TLD servers, it would create a massive bottleneck.
Separation of Responsibilities: Verisign (the .com registry) focuses on maintaining the integrity of the TLD, while you focus on managing your specific domain's records.
System Resilience: Distributing DNS across thousands of nameservers creates redundancy. If all .com records were in one place, it would be a single point of failure.

This delegation model is like a phone company maintaining a list of office building addresses, but letting each building manage its own internal directory of employee extensions. It's more efficient for everyone!

Flexible Nameserver Arrangements: Breaking the Hierarchy

While the domain system is hierarchical, nameservers don't have to follow this hierarchy. This creates flexibility in how domains are managed:

A single nameserver can handle domains across different levels and TLDs:

Your hosting provider's nameserver (like ns1.hostgator.com) might handle:
- yourbusiness.com
- yourfriend.org
- blog.someoneelse.net

This is like having one receptionist who knows about multiple unrelated businesses!

Examples of flexible nameserver arrangements:

Hosting providers manage millions of unrelated domains on the same nameservers
Large companies might use their nameservers for multiple brands (Google's nameservers handle google.com, youtube.com, gmail.com)
Specialized services might take over part of your domain (blog.yoursite.com might use your blogging platform's nameservers)

What matters is not who owns the nameserver, but which nameserver is registered as authoritative for each domain. You can mix and match however works best for your needs.

Registrar: Your Domain Service Provider

A registrar (like GoDaddy or Namecheap) is accredited by ICANN to sell domains
They act as the middlemen between you and the registry
Registrars handle domain registration, renewals, transfers, and DNS management
When you buy a domain, your registrar communicates with the appropriate registry to record your ownership

How Domain Registration Works

When you register a domain like "yourblog.com":

You visit a registrar's website and check if the domain is available
The registrar queries the .com registry (Verisign) to verify availability
You purchase the domain through the registrar
The registrar sends your information to the registry
The registry adds your domain to its database
The registry updates its nameservers with information about your domain's authoritative nameservers
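The TLD nameservers pick up the change quickly, but resolvers elsewhere keep serving cached answers until their TTLs expire, which is why changes can take 24-48 hours to be visible everywhere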

This centralized registry system ensures that no matter which registrar you use, there's only one authoritative source of truth for each TLD.

The DNS Resolution Process: A Step-by-Step Journey

When you type a URL into your browser, a fascinating sequence of lookups begins. Let's walk through this journey:

1. Browser and OS Cache Check

Your system first checks if it already knows the answer:

Browser DNS Cache: Your browser keeps a temporary record of recent DNS lookups. You can view this in some browsers (in Edge, type "edge://net-internals/#dns" in the address bar).
Operating System Cache: If not found in the browser, your OS checks its own cache. This local resolver is called a stub resolver.

2. The Recursive Resolver: Your DNS Detective

If the domain isn't found locally, the query leaves your computer with the Recursion Desired (RD) flag set, heading to a recursive resolver (also called a DNS recursor).

Think of the DNS recursor as a detective - it takes your case and investigates until it finds an answer. This server is typically provided by your Internet Service Provider (ISP) or public DNS services like Google's 8.8.8.8 or Cloudflare's 1.1.1.1.

The recursor first checks its own cache. If the information isn't there, it begins a journey through the DNS hierarchy.

3. The DNS Hierarchy: A Tree of Servers

The DNS system is structured as a hierarchical tree:

Root Servers: The recursor first contacts one of the 13 root server networks (labeled A through M). Despite being only 13 logical entities, these represent more than a thousand server instances distributed globally, operated by 12 independent organizations.
Top-Level Domain (TLD) Servers: The root server points the recursor to the appropriate TLD server (like .com, .org, or .net).
Authoritative Nameservers: The TLD server directs the recursor to the authoritative nameservers for the specific domain. These servers hold the actual DNS records (including IP addresses) for the domain you're looking for.

4. Finding the Final Answer

The recursor contacts the authoritative nameserver, which responds with the IP address for the requested domain. This information then flows back through the chain to your browser, which can finally connect to the website.
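
The walk from root to TLD to authoritative server can be simulated with plain dictionaries. This is a toy model only: the server names, the `SERVERS` table, and the `resolve` function are invented for illustration, and no real DNS traffic is involved.

```python
# Each toy "server" maps a name suffix to a referral or a final answer.
SERVERS = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"example.com.": ("referral", "ns1.example.com")},
    "ns1.example.com": {"www.example.com.": ("answer", "203.0.113.10")},
}


def resolve(name, servers, start="root", max_steps=10):
    """Follow referrals from the root until some server gives an answer."""
    server, path = start, [start]
    for _ in range(max_steps):
        table = servers[server]
        # Use the entry that matches the longest suffix of the queried name.
        matches = [k for k in table if name == k or name.endswith("." + k)]
        if not matches:
            return None, path
        kind, value = table[max(matches, key=len)]
        if kind == "answer":
            return value, path
        server = value  # referral: ask the next server down the tree
        path.append(server)
    raise RuntimeError("too many referrals")


print(resolve("www.example.com.", SERVERS))
```

The returned path lists every server that was asked, which mirrors the root, TLD, and authoritative steps described above.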

Intelligent Shortcuts: How DNS Optimizes Lookups

DNS resolvers are smart - they don't just cache complete domain resolutions. They remember:

Addresses of root servers
Addresses of TLD servers (like .com)
Addresses of authoritative nameservers

This strategic caching means that:

If a resolver has seen a .com domain before, it can skip the root server step and go directly to the .com TLD server
If it recognizes the authoritative nameservers for a domain, it can bypass both root and TLD servers

These shortcuts significantly speed up DNS resolution for frequently accessed domains.

Solving the Chicken-and-Egg Problem: Glue Records

Here's an interesting puzzle: If nameservers often have domain names themselves (like ns1.example.com), how do we resolve their domains without creating an infinite loop?

The solution is glue records. When a domain is registered, the registrar provides not just the nameserver's domain name but also its direct IP address to the registry, which publishes both on the TLD servers. This breaks the circular dependency, allowing resolvers to find the nameserver's IP without having to resolve another domain name.

Types of DNS Queries

The DNS resolution process involves three distinct query types:

Recursive Queries: Like asking a librarian to find a book for you. You expect a complete answer (the book) or a definitive "we don't have it."
Iterative Queries: Like asking a librarian which section to look in, then going there yourself. The server gives you directions to the next stop, but you continue the journey yourself.
Non-recursive Queries: Like asking for a book the librarian is already holding. These are quick responses for information the DNS server already has in its cache or is directly responsible for.

Reverse DNS: Looking Up Names from IP Addresses

While standard DNS answers "What IP address does example.com have?", Reverse DNS answers "What domain name is using IP address 93.184.216.34?"

This process uses the special in-addr.arpa domain (for IPv4) or ip6.arpa (for IPv6), both of which sit under the .arpa TLD. Reverse DNS is commonly used for:

Email server verification (reducing spam)
Server logging (showing domain names instead of IP addresses)
Network troubleshooting
Security monitoring
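
A reverse lookup is an ordinary query for a PTR record at a specially built name. Python's standard `ipaddress` module can build that name, as this short sketch shows; `ptr_name` is a hypothetical wrapper, and the sketch only constructs the name without sending any query.

```python
import ipaddress


def ptr_name(ip):
    """Return the name whose PTR record holds the reverse mapping for ip."""
    return ipaddress.ip_address(ip).reverse_pointer


# IPv4: the four octets are reversed and placed under in-addr.arpa
print(ptr_name("93.184.216.34"))  # 34.216.184.93.in-addr.arpa

# IPv6: every hex digit becomes its own label under ip6.arpa
print(ptr_name("2001:db8::1"))
```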

Conclusion

The Domain Name System is a marvel of internet engineering. Its distributed, hierarchical design allows billions of DNS queries to be resolved daily with remarkable efficiency.

Understanding DNS involves recognizing the roles of different entities:

Registries maintain the authoritative databases for TLDs
Registrars provide the interface between users and registries
DNS Servers (from root servers to your ISP's resolvers) work together to translate domains to IP addresses

This knowledge helps you:

Troubleshoot website connection issues
Make smarter decisions about hosting and domain management
Better understand who controls different aspects of your online presence
Appreciate how the internet maintains its user-friendly face

The DNA of a Domain: Understanding DNS, FQDNs, and Domain Structures

daud99 — Sat, 17 May 2025 22:37:34 +0000

Domains are to the internet what names are to humans, making identification simple and intuitive for everyone. Just as we give people names instead of describing their physical characteristics each time we refer to them, domains give websites and online resources readable names instead of complex numerical addresses. When you type 'google.com' instead of having to remember a string of numbers like '172.217.168.238', you're benefiting from the domain name system that makes the internet accessible to everyone.

One key difference from human names is that while many people can share the same name in the real world, each domain name is unique: it has a single registrant and a single authoritative set of DNS records at any given time (even if those records list several server addresses). However, you can have multiple domains all pointing to the same resource, similar to having several nicknames that all refer to you. Domains essentially translate the technical infrastructure of the internet into a language we can easily understand and remember, bridging the gap between complex technology and everyday human interaction.

Let's see this DNS resolution in action with a simple command-line tool:

# Using 'dig' to see how a domain resolves to an IP address
$ dig google.com +short
142.250.72.110

# Using 'nslookup' for the same purpose
$ nslookup google.com
Server:     192.168.1.1
Address:    192.168.1.1#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.72.110

This translation happens behind the scenes every time you visit a website, allowing you to use memorable names instead of numerical addresses.

Domain Characteristics

Domain names can only contain letters (a-z, A-Z), numbers (0-9), and hyphens. A hyphen cannot appear at the beginning or end of any label (the parts between dots), each label can be at most 63 characters long, and the full name at most 253 characters. Any other characters are considered invalid for standard domains. The dot character serves a special purpose in domains - it separates different levels of the domain hierarchy rather than being part of the domain name itself.

An important characteristic to remember is that domains are case-insensitive, meaning GOOGLE.COM and google.com are treated as identical. This makes domains even more user-friendly, as you don't need to worry about uppercase or lowercase when typing a web address.

For validating domains, I recommend the regex used in IOCSEARCHER as a reference.
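
As a rough illustration of the character rules for standard domains, here is a minimal validator in Python. It is a sketch written for this post, not the IOCSEARCHER regex: `LABEL` and `is_valid_hostname` are made-up names, the 63 and 253 character limits are the standard DNS length limits, and the check does not verify that the TLD actually exists.

```python
import re

# One label: 1-63 letters, digits or hyphens, with no hyphen at either end
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name):
    """Check the letters/digits/hyphen rules for an ASCII domain name."""
    if name.endswith("."):
        name = name[:-1]  # allow the optional trailing root dot
    if not name or len(name) > 253:
        return False
    return all(LABEL.fullmatch(label) for label in name.split("."))


for candidate in ["example.com", "GOOGLE.COM", "-bad.com", "bad-.com", "exa_mple.com"]:
    print(candidate, is_valid_hostname(candidate))
```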

Internationalized Domain Names

Initially, domains were limited to a small subset of ASCII (letters, digits, and hyphens out of the 128 ASCII characters). This restriction prevented many languages from using their native scripts in domain names. To address this limitation, Internationalized Domain Names (IDNs) were developed to support characters from languages like Chinese, Russian, Hindi, and many others.

While you can register domains with these non-ASCII characters, they're ultimately converted to Punycode through the Bootstring algorithm specified in RFC 3492. When an internationalized domain is processed, each label containing non-ASCII characters is encoded and prefixed with "xn--". For example, café.com becomes xn--caf-dma.com, and مثال.إختبار becomes xn--mgbh0fb.xn--kgbechtv. This is why domains cannot have hyphens as the third and fourth characters unless they're IDNs - it would conflict with this encoding system.

You can see this conversion in action with some simple Python code:

import idna

# Convert internationalized domain names to Punycode
examples = ['café.com', 'привет.рф', 'よろしく.jp']

for domain in examples:
    punycode = idna.encode(domain).decode('ascii')
    print(f"{domain} → {punycode}")

# Output:
# café.com → xn--caf-dma.com
# привет.рф → xn--b1agh1afp.xn--p1ai
# よろしく.jp → xn--28j2a3ar1p.jp

This internationalization has unfortunately led to security concerns like homograph attacks, where visually similar characters from different scripts can create convincing fake domains. For instance, раypal.com using the Cyrillic 'р' looks nearly identical to paypal.com but leads to a completely different website. Modern browsers have implemented protections against many of these tricks, but users should remain vigilant.
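
One crude way to spot such lookalikes is to check whether a single label mixes scripts. The sketch below uses the first word of each character's Unicode name as a stand-in for its script. That shortcut is an assumption made here for brevity, and it will also flag legitimate names (for example Japanese labels mixing hiragana and kanji), so treat it as a heuristic rather than how browsers actually decide.

```python
import unicodedata


def scripts_in(label):
    """Approximate the scripts in a label via Unicode character names."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def looks_mixed_script(label):
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


genuine = "paypal"
spoofed = "\u0440\u0430ypal"  # starts with Cyrillic 'er' and 'a', rest is Latin

print(scripts_in(genuine), looks_mixed_script(genuine))
print(scripts_in(spoofed), looks_mixed_script(spoofed))
```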

Domain Levels and the DNS Hierarchy

The domain hierarchy is organized into levels, each separated by a dot. The Top-Level Domain (TLD) like .com is the highest level. Adding sections creates new levels - one.com is a Second-Level Domain (2LD), while two.one.com represents a Third-Level Domain (3LD).

Here's a visual breakdown of domain levels:

┌─ 3rd Level Domain ─┐ ┌─ 2nd Level Domain ─┐ ┌─ TLD ─┐
         blog         .        example       .   com
└───────────────────────────── FQDN ────────────────────┘

Common examples of domain levels in practice:

TLD: .com, .org, .net, .edu
2LD: google.com, wikipedia.org, amazon.com
3LD: mail.google.com, en.wikipedia.org, aws.amazon.com
4LD: support.mail.google.com

Each level serves a specific organizational purpose in the hierarchical domain name system.
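
Counting levels is just counting labels from the right. A short Python sketch (with `domain_levels` as an illustrative helper name) makes the numbering explicit:

```python
def domain_levels(name):
    """Map each level number to the domain formed at that level."""
    labels = name.lower().rstrip(".").split(".")
    # Level 1 is the TLD; each further level adds one label on the left.
    return {level: ".".join(labels[-level:]) for level in range(1, len(labels) + 1)}


for level, domain in domain_levels("support.mail.google.com").items():
    print(level, domain)
```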

SLD and eSLD: Understanding Domain Registration Boundaries

SLD (Second-Level Domain)

The SLD is the label immediately to the left of the public suffix (not just the TLD)
It's the specific part that identifies the registrant's domain
Examples:
- In google.com, the SLD is google (left of the .com public suffix)
- In example.co.uk, the SLD is example (left of the .co.uk public suffix)
- In blog.wordpress.com, the SLD is blog (left of the wordpress.com public suffix)

eSLD (Effective Second-Level Domain)

The eSLD is the complete registrable domain (often called eTLD+1) - the domain at which registration occurs
It consists of the SLD plus the public suffix
It represents the boundary of administrative control
Examples:
- In google.com, the eSLD is google.com
- In example.co.uk, the eSLD is example.co.uk
- In user.github.io, the eSLD is user.github.io
- In mypage.blogspot.com, the eSLD is mypage.blogspot.com

Key Distinction

SLD: Just the identifying label portion controlled by the registrant
eSLD: The complete domain that represents the unit of ownership/registration, including both the SLD and its public suffix

The critical factor is understanding the public suffix list, which defines which domain suffixes are available for public registration (like .com, .co.uk, github.io, etc.).
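
The SLD and eSLD examples above can be reproduced with a lookup against a suffix set. This is a simplified sketch: `SUFFIXES` is a tiny hand-picked excerpt rather than the real list, `split_registrable` is a made-up helper, and wildcard and exception rules are not handled.

```python
# Hypothetical excerpt of public suffixes, for illustration only
SUFFIXES = {"com", "co.uk", "github.io", "blogspot.com"}


def split_registrable(host, suffixes=SUFFIXES):
    """Return (SLD, eSLD) for host, or None if no registrable domain exists."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # try the longest candidate suffix first
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the host is itself a public suffix
            return labels[i - 1], ".".join(labels[i - 1:])
    return None


for host in ["google.com", "example.co.uk", "user.github.io", "www.mypage.blogspot.com"]:
    print(host, split_registrable(host))
```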

Apex Domain (Root Domain or Naked Domain)

The apex domain (also called the root domain or naked domain) refers to a domain without any subdomain prefix. It's the base domain that you register with a domain registrar.

Key characteristics:

It has no subdomain part (no "www" or other prefix)
It's directly at the "apex" of your domain namespace
It cannot have a CNAME record in standard DNS, because a CNAME cannot coexist with the SOA and NS records every apex must carry (A, AAAA, MX, TXT, etc. are fine)
It's the entry point to your domain's DNS zone

Examples:

example.com (not www.example.com)
github.io (not username.github.io)
mydomain.co.uk (not blog.mydomain.co.uk)

The apex domain is particularly important in DNS configuration and web hosting setups. Many CDNs and cloud providers have special requirements or limitations for apex domains due to DNS constraints. Some services offer workarounds like ANAME, ALIAS, or CNAME flattening to overcome these limitations.

Understanding the apex domain is crucial when configuring websites, email services, and other internet resources, as it represents the foundation of your domain's identity on the internet.

FQDN vs Domain Name: What's the Real Difference?

When navigating the world of DNS and internet naming, terms like FQDN (Fully Qualified Domain Name) and domain name often get used interchangeably — but they're not the same. Understanding the distinction is essential for developers, sysadmins, and anyone dealing with network configuration or web services.

What Is a Domain Name?

A domain name is a human-readable address used to identify resources on the internet. It typically consists of:

A second-level domain (SLD) like example
A top-level domain (TLD) like .com, .org, or .net

Examples:

example.com
openai.org

These are domain names — they can represent a website, a zone in DNS, or even serve as a base for email routing.

What Is an FQDN (Fully Qualified Domain Name)?

An FQDN is the complete address of a host within the DNS hierarchy, including all levels of the domain, right up to the root (.).

Structure of an FQDN:

hostname.subdomain.domain.tld.

✔️ The trailing dot (.) is optional in most real-world usage but technically represents the DNS root.

Examples:

www.example.com.
mail.google.com.
api.openai.org.

An FQDN unambiguously identifies a specific resource (usually a host or service) on the internet.
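
Because the trailing dot is optional in practice and names are case-insensitive, it helps to normalize names before comparing them. A minimal sketch, with `to_fqdn` and `same_name` as illustrative helper names:

```python
def to_fqdn(name):
    """Lowercase a name and make the root dot explicit."""
    name = name.strip().lower()
    return name if name.endswith(".") else name + "."


def same_name(a, b):
    """Compare two names the way DNS does: ignoring case and the trailing dot."""
    return to_fqdn(a) == to_fqdn(b)


print(to_fqdn("WWW.Example.COM"))
print(same_name("mail.google.com", "MAIL.GOOGLE.COM."))
```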

Key Differences Between Domain Name and FQDN

| Feature | Domain Name | FQDN |
| --- | --- | --- |
| Hierarchy Depth | Partial | Full |
| Includes Hostname? | Not necessarily | Yes |
| Ends with Root Dot? | No (implied) | Yes (optional, implied) |
| Example | `example.com` | `www.example.com.` |
| DNS Resolution? | Yes, if configured | Yes, if configured |

Can a Subdomain Be at the Leaf (Instead of a Hostname)?

Yes. The leftmost part of a name like blog.example.com could be:

A hostname
A subdomain

Bottom line: The leaf node in an FQDN is not always a hostname. It depends on how DNS records are configured.

Are Both Domain Names and FQDNs Resolvable?

Yes, as long as they have the necessary DNS records.

If there are no DNS records, then neither will resolve.

Domain Name System Hierarchy Explained

The Domain Name System (DNS) has a hierarchical structure similar to a family tree or an organizational chart. Here's how it works in simple terms:

The Root - At the very top of the hierarchy is what's called the "root," represented by a single dot (.).
Top-Level Domains (TLDs) - The next level down contains domains like:
- Generic TLDs: .com, .org, .net, .edu
- Country-code TLDs: .uk, .fr, .jp, .ca

Official TLD List: Authoritative Sources

The official list of Top-Level Domains (TLDs) is maintained by the Internet Assigned Numbers Authority (IANA), which operates under the Internet Corporation for Assigned Names and Numbers (ICANN). This authoritative registry contains all recognized TLDs in the global DNS root zone.

Where to find the official TLD list:

IANA Root Zone Database: The most authoritative source, available at https://www.iana.org/domains/root/db
ICANN TLD Program: Information about new gTLDs: https://newgtlds.icann.org/
Public Suffix List: Maintained by Mozilla, this list includes both TLDs and public suffixes: https://publicsuffix.org/

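TLDs are commonly grouped into several types (the IANA Root Zone Database itself labels each entry as generic, country-code, sponsored, generic-restricted, infrastructure, or test):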

gTLD (Generic Top-Level Domain): .com, .org, .net, .info
ccTLD (Country Code Top-Level Domain): .us, .uk, .jp, .de
sTLD (Sponsored Top-Level Domain): .edu, .gov, .mil
IDN ccTLD (Internationalized Country Code): .рф (Russia), .中国 (China)
New gTLD: .app, .blog, .dev, .shop

The number of TLDs has expanded dramatically since 2013 when ICANN's New gTLD Program introduced hundreds of new generic TLDs. The root zone is regularly updated as new TLDs are approved and added to the global DNS system.

For developers and security professionals, programmatic access to the IANA database is possible, and many APIs and libraries offer routines to check or validate domains against the current TLD list.
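One simple route is the plain-text list that IANA publishes at https://data.iana.org/TLD/tlds-alpha-by-domain.txt: a comment line followed by one uppercase TLD per line. The sketch below parses text in that format and checks a domain's rightmost label against it. The `SAMPLE` text is a shortened, made-up stand-in for the real file (including its version line), and the function names are hypothetical.

```python
# Abbreviated, illustrative sample in the same layout as the IANA text file.
SAMPLE = """\
# Version 2025010100, Last Updated Wed Jan  1 07:07:01 2025 UTC
COM
ORG
UK
XN--P1AI
"""


def parse_tld_list(text):
    """Return a set of lowercase TLDs, skipping comment and blank lines."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    }


def has_known_tld(domain, tlds):
    """Check whether the rightmost label of `domain` appears in `tlds`."""
    last_label = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return last_label in tlds


tlds = parse_tld_list(SAMPLE)
print(has_known_tld("example.com", tlds))      # True
print(has_known_tld("example.invalid", tlds))  # False
```

In real use the text would be downloaded and cached rather than embedded; internationalized TLDs appear in their Punycode form (for example `XN--P1AI` for .рф).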

Second-Level Domains - These are the names organizations register, like "google" in google.com or "bbc" in bbc.co.uk.
Subdomains - These are additional levels that organizations can create, like "mail" in mail.google.com or "news" in news.bbc.co.uk.

Think of it like a mailing address:

The root is like the planet
The TLD is like the country
The second-level domain is like the city
Subdomains are like the street and building

When you type a web address, your computer follows this hierarchy from right to left to find the correct destination. It starts at the root, then follows the path down through each level until it reaches the specific website or service you're looking for.
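The right-to-left walk can be sketched as follows. This is a simplification: it lists every label boundary, whereas a real resolver only follows actual delegations between nameservers. The function name `resolution_path` is made up for this example.

```python
def resolution_path(name):
    """List the names visited from the root down to the full name."""
    parts = name.rstrip(".").split(".")
    path = ["."]  # start at the root
    # Add one label at a time, moving from the rightmost label to the left.
    for i in range(len(parts) - 1, -1, -1):
        path.append(".".join(parts[i:]) + ".")
    return path


print(resolution_path("news.bbc.co.uk"))
# ['.', 'uk.', 'co.uk.', 'bbc.co.uk.', 'news.bbc.co.uk.']
```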

Domain Constraints

There are technical limitations to domains. A complete domain name cannot exceed 253 characters, with each label (section between dots) limited to 63 characters. The domain system allows for up to 127 labels, including the root level, though such deep hierarchies are rarely used in practice.

These constraints ensure that domain names remain manageable and compatible with the underlying DNS infrastructure. While most domain registrations use just two or three levels, the system's flexibility allows for more complex organizational structures when needed.
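The two length limits above are easy to check in code. The following is a minimal sketch that only looks at lengths (it does not validate which characters are allowed); `check_domain_limits` is a hypothetical helper name.

```python
MAX_NAME_LENGTH = 253   # whole name, without the optional root dot
MAX_LABEL_LENGTH = 63   # each section between dots


def check_domain_limits(name):
    """Return a list of limit violations (empty when the name is within limits)."""
    problems = []
    bare = name[:-1] if name.endswith(".") else name
    if len(bare) > MAX_NAME_LENGTH:
        problems.append("name exceeds 253 characters")
    for label in bare.split("."):
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL_LENGTH:
            problems.append("label exceeds 63 characters")
    return problems


print(check_domain_limits("www.example.com"))           # []
print(check_domain_limits("a" * 64 + ".example.com"))   # ['label exceeds 63 characters']
```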

What is Rootkit?

daud99 — Mon, 01 Aug 2022 08:38:09 +0000

When you hear about rootkits, and if you are a Linux user, the first thing that comes to mind is probably that it has something to do with the root user. You are not wrong, but that is only part of it. Let's define it formally.

A rootkit is a program that can hide itself, as well as other running processes, files, and network connections, from the host on which it is running.

What is the ultimate goal of a rootkit?

The main goal is to run incognito, meaning running in the background for as long as possible.

What are the typical characteristics of a rootkit?

1- Stealth Functionality

It aims to hide the traces of the intruder by manipulating processes, open files, and network activity, and by changing the access rights/permissions of files and directories.

2- Backdoor

One of the main goals of a rootkit is to make sure that the intruder has full remote access to the victim's computer at all times. For example, a rootkit may establish a backdoor using SSH tunneling.

3- Sniffing

It also enables the attacker to wiretap and intercept various system components, for example by sending data to a particular endpoint or installing a keylogger.

What is the biggest challenge to the attacker?

The biggest challenge, which also differentiates rootkits from other types of malware, is that a rootkit needs to be installed with root privileges in the first place.

Types of Rootkits

User-mode rootkit: A user-mode rootkit covertly replaces common UNIX binaries or libraries with infected versions to hide its existence and to gain root privileges if needed.
Kernel-mode rootkit: A kernel-mode rootkit operates at the system level and modifies or replaces parts of the kernel, which may be compromised during the boot process.

This is a good blog on Linux boot process.

Installing Ubuntu using VMware Fusion Tech Preview on an M1 (Apple silicon) Mac

daud99 — Sun, 24 Jul 2022 09:44:00 +0000

Download Ubuntu ISO

Download the ISO named "ubuntu-20.04.4-live-server-arm64.iso" from here.

Create a new Virtual Machine

1- Click on "Install from disk or image".

2- Browse to and select the downloaded Ubuntu ISO file.

3- Click Finish.

It will automatically start the Virtual Machine. Simply shut it down.

Disabling Network Adapter

1- Go to the Settings

2- Select Network Adapter

3- Deselect Connect network adapter

Installing Ubuntu

1- Now, start the VM.
2- Click on "Install Ubuntu Server"
3- Select all the defaults and keep on going.
4- Eventually, you will reach here.

5- Once the installation finishes, "installing system" changes to "Install Complete!". Then select "Reboot Now" and press "Enter" again.

6- The VM will get stuck on the following screen; simply press "Enter" to continue.

7- Enable the network adapter again by going to Settings > Network Adapter and selecting the "Connect Network Adapter" option.

8- You will be prompted to enter the username and password that you configured while installing the operating system.
9- You will reach the following screen.

10- Also remove the ISO by going to Settings and deselecting "CD/DVD (SATA)".

Updating repositories and rebooting

sudo apt update
sudo reboot

Installing Desktop Environment

sudo apt install tasksel
sudo tasksel install ubuntu-desktop
sudo reboot

Install VMWare tools

sudo apt install -y open-vm-tools-desktop
sudo reboot now

Take Snapshot for fresh installation

We can revert to it in case there is an issue.

Perceptron

daud99 — Fri, 08 Jul 2022 16:38:21 +0000

Perceptron Model

You can think of it as the basis of a neural network. The motivation for the perceptron is taken from the biological neuron.

Inputs go into the neuron, which performs some computation on them to produce the output. The function applied inside the neuron is often referred to as the activation function.

Now, in order to learn, we need to adjust some parameters. These parameters are known as weights, and they are multiplied by the inputs.

There is still a problem. What if the input is zero? No matter how we change the weight w, nothing is going to happen. To solve this problem, we add a bias to each input.

The interesting thing is that the product of an input and its weight has to overcome the bias in order to have some effect on the output.

The values of both the weight and the bias can be positive or negative.

Mathematically our generalisation is
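A common way to write this generalisation is output = f(w1*x1 + w2*x2 + ... + wn*xn + b), where f is the activation function. The original figure is not available here, so the sketch below rests on two assumptions: a single combined bias term b, and a simple step activation. The function name `perceptron` and the example numbers are illustrative only.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a step activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z > 0 else 0  # assumed step activation


# With a bias of -1.5 the weighted sum must exceed 1.5 before the neuron fires,
# so with unit weights this behaves like a logical AND of two binary inputs.
print(perceptron([1, 1], [1.0, 1.0], -1.5))  # 1
print(perceptron([1, 0], [1.0, 1.0], -1.5))  # 0
print(perceptron([0, 0], [1.0, 1.0], -1.5))  # 0
```

The last call also shows why the bias matters: with all inputs at zero, the weights have no effect and only the bias decides the output.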

Feature Scaling In Machine Learning

daud99 — Thu, 02 Jun 2022 16:09:38 +0000

Feature Scaling

Feature scaling is the process of bringing all the features or independent variables (variables other than the target variable) onto roughly the same scale, so that no feature dominates simply because of its units.

Example:

This is a dataset that contains a dependent variable (Purchased) and 3 independent variables (Country, Age, and Salary). We can easily notice that the variables are not on the same scale: Age ranges from 27 to 50, while Salary ranges from 48 K to 83 K. The range of Salary is much wider than the range of Age. This will cause issues in our models, since many machine learning models such as k-means clustering and nearest neighbor classification are based on the Euclidean distance.

Methods for Feature Scaling

There are different methods of feature scaling.

Standardization (Z-score Normalization)
Max-Min Normalization (Min-Max Scaling)
Standard Deviation Method
Range Method

1. Standardization

Standardization means transforming your data so that each feature is centred and rescaled: the features are rescaled such that their mean and standard deviation are 0 and 1, respectively.

A normal distribution with mean 0 and standard deviation 1 is the Standard Normal Distribution, which is why this is also known as Z-Score Normalization. Note that standardization changes only the location and scale of the data, not the shape of its distribution.

Well, the idea is simple. Variables that are measured at different scales do not contribute equally to the model fitting and the learned function, and might end up creating a bias. Thus, to deal with this potential problem, feature-wise standardization (μ=0, σ=1) is usually applied prior to model fitting.

Standardization comes into the picture when features of the input data set have large differences between their ranges, or simply when they are measured in different units (e.g., pounds, meters, miles, etc.).

These differences in the ranges of the initial features cause trouble for many machine learning models. For example, for models that are based on distance computation, if one of the features has a broad range of values, the distance will be governed by this particular feature.

To illustrate this with an example : say we have a 2-dimensional data set with two features, Height in Meters and Weight in Pounds, that range respectively from [1 to 2] Meters and [10 to 200] Pounds. No matter what distance based model you perform on this data set, the Weight feature will dominate over the Height feature and will have more contribution to the distance computation, just because it has bigger values compared to the Height. So, to prevent this problem, transforming features to comparable scales using standardization is the solution.

The following formula is used to perform Standardization for each value of the feature.
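In the usual notation that formula is z = (x - μ) / σ, where μ is the mean of the feature and σ is its standard deviation. Here is a minimal pure-Python sketch of it; the `standardize` helper and the sample ages are illustrative. It uses the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std dev."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


ages = [27, 30, 38, 40, 50]  # illustrative values
z = standardize(ages)
print([round(v, 2) for v in z])
print(round(fmean(z), 6), round(pstdev(z), 6))  # mean ~0, std ~1
```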

Python Implementation

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler() 
data_scaled = scaler.fit_transform(data)

2. Max-Min Normalization

It is also known as Min-Max Scaling. In some write-ups it is simply called Scaling.

However, in most of the places I came across, it is simply known as Normalization.

I know this is confusing. Lol! But this is how I understand it.

It is defined as

"Technique in which values are shifted and rescaled so that they end up ranging between 0 and 1."

Here's the formula

Here, Xmax and Xmin are the maximum and the minimum values of the feature respectively.
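In the usual notation this is X' = (X - Xmin) / (Xmax - Xmin). Below is a minimal pure-Python sketch; the `min_max_scale` helper and the salary values are illustrative, and the sketch assumes the feature is not constant (otherwise the denominator would be zero).

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


salaries = [48000, 54000, 61000, 72000, 83000]  # illustrative values
scaled = min_max_scale(salaries)
print([round(v, 3) for v in scaled])
```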

Python Implementation

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler() 
data_scaled = scaler.fit_transform(data)

3. Robust Scaling

Use the RobustScaler, which scales the features using statistics that are robust to outliers. This scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).

Scaling using the median and quantiles consists of subtracting the median from all the observations and then dividing by the interquartile range.

The interquartile range is the difference between the 75th and 25th percentiles:

IQR = 75th percentile - 25th percentile

The equation to calculate scaled values:

X_scaled = (X - X.median) / IQR
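To see the effect of an outlier, here is a small pure-Python sketch of the same equation. The `robust_scale` helper and the data are illustrative; quartiles are computed with the standard library's "inclusive" method, which uses linear interpolation between data points.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Subtract the median and divide by the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    med = median(values)
    iqr = q3 - q1
    return [(v - med) / iqr for v in values]


data = [1, 2, 3, 4, 100]  # 100 is an outlier
print(robust_scale(data))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The four ordinary points keep a sensible spread around 0, while the outlier stays clearly visible instead of compressing everything else.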

Python Implementation

from sklearn.preprocessing import RobustScaler
scaler = RobustScaler() 
data_scaled = scaler.fit_transform(data)

What are the main questions when it comes to feature scaling?

We mainly need to answer two questions.

Do we need to do feature scaling?
If yes, which method of feature scaling should we use: Standardization, Normalization, etc.?

When should we use feature scaling?

1- Gradient Descent Based Algorithms.

Linear Regression
Logistic Regression
Neural Networks
etc.

2- Distance Based Algorithms

KNN
K-means
SVM

When to perform standardization?

As seen above, for distance based models, standardization is performed to prevent features with wider ranges from dominating the distance metric. But the reason we standardize data is not the same for all machine learning models, and differs from one model to another.

So before which ML models and methods do you have to standardize your data, and why?

1- BEFORE PCA:

In Principal Component Analysis, features with high variances/wide ranges, get more weight than those with low variance, and consequently, they end up illegitimately dominating the First Principal Components (Components with maximum variance). I used the word “Illegitimately” here, because the reason these features have high variances compared to the other ones is just because they were measured in different scales.

Standardization can prevent this by giving the same weight to all features.

2- BEFORE CLUSTERING:

Clustering models are distance based algorithms, in order to measure similarities between observations and form clusters they use a distance metric. So, features with high ranges will have a bigger influence on the clustering. Therefore, standardization is required before building a clustering model.

3- BEFORE KNN:

k-nearest neighbors is a distance based classifier that classifies new observations based on similarity measures (e.g., distance metrics) with labeled observations of the training set. Standardization makes all variables contribute equally to the similarity measures.

4- BEFORE SVM

Support Vector Machine tries to maximize the distance between the separating plane and the support vectors. If one feature has very large values, it will dominate over other features when calculating the distance. So Standardization gives all features the same influence on the distance metric.

5- BEFORE MEASURING VARIABLE IMPORTANCE IN REGRESSION MODELS

You can measure variable importance in regression analysis, by fitting a regression model using the standardized independent variables and comparing the absolute value of their standardized coefficients. But, if the independent variables are not standardized, comparing their coefficients becomes meaningless.

This one is also known as Feature importance measuring.

6- BEFORE LASSO AND RIDGE REGRESSION

LASSO and Ridge regressions place a penalty on the magnitude of the coefficients associated with each variable, and the scale of the variables affects how much penalty is applied to their coefficients: coefficients of variables with large variance are small and thus less penalized. Therefore, standardization is required before fitting both regressions.

When is standardization not needed?

LOGISTIC REGRESSION AND TREE BASED MODELS

Tree based algorithms such as Decision Tree, Random Forest and Gradient Boosting are not sensitive to the magnitude of variables, so standardization is not needed before fitting these kinds of models. The predictions of an unregularized Logistic Regression are likewise unaffected by rescaling the features, but scaling still helps when it is trained with gradient descent or with a regularization penalty, as noted above.

When to do Normalization?

Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbours and Neural Networks.
However, at the end of the day, the choice of using normalization or standardization will depend on your problem and the machine learning algorithm you are using.
There is no hard and fast rule to tell you when to normalize or standardize your data. You can always start by fitting your model to raw, normalized, and standardized data and compare the performance for the best results.

Difference b/w normalization and standardization?

Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbours and Neural Networks.
Standardization, on the other hand, can be helpful in cases where the data follows a Gaussian distribution. However, this does not have to be necessarily true. Also, unlike normalization, standardization does not have a bounding range. So, even if you have outliers in your data, they are not squeezed into a fixed interval; note, though, that outliers still influence the mean and standard deviation used for scaling.
It is good practice to fit the scaler on the training data and then use it to transform the testing data. This avoids data leakage during the model testing process. Also, scaling of target values is generally not required.
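The fit-on-train, transform-on-test pattern can be sketched without any library. `SimpleStandardScaler` below is a hypothetical stand-in that only mirrors the fit/transform split of scikit-learn's scalers; the numbers are made up.

```python
from statistics import fmean, pstdev


class SimpleStandardScaler:
    """Hypothetical minimal scaler: learns mean/std in fit, reuses them in transform."""

    def fit(self, values):
        self.mean_ = fmean(values)
        self.std_ = pstdev(values)
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train = [10.0, 20.0, 30.0]
test = [40.0]

scaler = SimpleStandardScaler().fit(train)   # statistics come from training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # test data never influences mean_/std_
print(test_scaled)
```

Because the test value is scaled with the training mean and standard deviation, it can legitimately fall outside the range seen during training.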

Visualizing unscaled, normalized and standardized data?

After Normalization

After Standardization

How are outliers dealt with in Standardization vs. Normalization?

For Standardized data, outliers remain outliers just as they are in the original data, because there is no bounded range. In Normalized data, the cost of having a bounded range is that a single extreme value stretches the range, so the remaining values are squeezed into a narrow band with a smaller standard deviation. Both methods are therefore sensitive to outliers; if outliers are a concern, Robust Scaling is the safer choice.

Points worth noting

We can see that the Normalized data have different means. As the mean changes, so does the standard deviation. However, the Standardized data all have the same mean.
Normalized data have a fixed range, i.e. between 0 and 1. However, the range for Standardized data varies.

1. https://www.linkedin.com/pulse/standardization-machine-learning-sachin-vinay/?trk=public_profile_article_view

2. https://www.kaggle.com/code/rtatman/data-cleaning-challenge-scale-and-normalize-data/notebook#Scaling-vs.-Normalization:-What's-the-difference?

3. https://www.kdnuggets.com/2020/04/data-transformation-standardization-normalization.html

4. https://builtin.com/data-science/when-and-why-standardize-your-data

5. https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/

6. https://towardsai.net/p/data-science/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff

7. https://towardsdatascience.com/how-and-why-to-standardize-your-data-996926c2c832

P-Value (Significance Value)

daud99 — Sun, 29 May 2022 17:26:00 +0000

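The p-value is a probability between 0 and 1 that quantifies how compatible our observed data are with the Null Hypothesis: it is the probability of seeing a result at least as extreme as the one observed if the Null Hypothesis were true. The smaller the value, the stronger the evidence against the Null Hypothesis; a large value means we have no grounds to reject it, not that it has been proven true.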

Explanation

Say we know the mean traffic coming to your website; then you make some changes and want to know whether the mean traffic has changed. You start by establishing the Null Hypothesis and the Alternative Hypothesis. The Null Hypothesis is the default, and the Alternative Hypothesis is what we want to prove. In this case, the Null Hypothesis is that the average traffic doesn't change, and the Alternative Hypothesis is that the average traffic to the website increases.

We then decide on the significance level. To check whether the Alternative Hypothesis holds, you take a sample and calculate its mean/average traffic. Then we calculate the probability of getting a mean at least that extreme given that the Null Hypothesis is true; this is nothing but the p-value. If it is less than the significance level, we reject the Null Hypothesis; if not, we fail to reject it. Rejecting the Null Hypothesis is the same as saying that the sample mean lies too far from the mean expected under the Null Hypothesis (measured as a z-score, i.e. in standard deviations of a normal distribution) to be explained by chance alone.
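The traffic example can be worked through as a one-sided z-test. Everything numeric below is made up for illustration (a historical mean of 1000 visits, a known standard deviation of 100, and a sample of 36 days averaging 1030), and the sketch assumes the sample mean is approximately normally distributed.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(sample_mean, null_mean, sigma, n):
    """P(mean at least this large | Null Hypothesis true), for a z-test."""
    z = (sample_mean - null_mean) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


alpha = 0.05  # significance level chosen before looking at the data
p = one_sided_p_value(sample_mean=1030, null_mean=1000, sigma=100, n=36)

# z = 30 / (100 / 6) = 1.8, so p is roughly 0.036
print(round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
```

With these numbers the p-value falls below 0.05, so the Null Hypothesis of unchanged traffic would be rejected; at a stricter level of 0.01 it would not.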

Interpreting P-value

The closer the p-value is to zero, the stronger the evidence against the Null Hypothesis and in favour of the Alternative Hypothesis.

If the p-value is less than the significance level, which is usually 0.05, then we say that the observed event is so far from the mean, so extreme, that we reject the Null Hypothesis.

Here, the grey shaded area represents the region in which a value falls when its p-value is below the significance level, say 0.05, which is equivalent to saying that we reject the Null Hypothesis.

The p-value doesn't tell us how far our value is from the actual value; it only tells us how surprising our observation would be if the Null Hypothesis were true.

Significance Level

Usually, the value for the significance level is 0.05. However, it may be suggested by a domain expert. It is also known as the decision threshold.

If our problem is not that sensitive and we can allow a greater number of false positives, we can use a larger significance level such as 0.20. Similarly, for a sensitive problem such as predicting cancer, we will try to use a smaller significance level such as 0.01.

Rejecting a null hypothesis at the .01 level means that there is less than a 1 in 100 chance of observing a result in this range if the Null Hypothesis were true.

False Positive

Rejecting the Null Hypothesis when it is actually true, i.e. getting a p-value below the significance level purely by chance, is known as a False Positive (Type I error). The significance level is the false-positive rate we are willing to accept.

References

General confusion related to Feature Selection

daud99 — Fri, 27 May 2022 09:49:23 +0000

Should I do Feature Selection on the entire dataset?

The answer is NO.

The reason is that this results in bias and data leakage. As a matter of fact, we always make sure that our TEST data is completely unseen and is only used to assess the performance of our machine learning model. If we perform Feature Selection on the entire dataset, this no longer holds true.

The model has an unfair advantage as the Features are selected based on all the samples.

When should we do the feature selection?

First, split your data into Train and Test data.
Then, do the feature selection on the Training data.
Once you have done the feature selection on the Training data, you can train your model.
Now, you can select the same features from the Testing data and perform the prediction.

How is our feature selection affected when using K-Fold Cross Validation?

The order remains the same: first split, and then do the Feature Selection inside each fold.

"CV methods are proven to be unbiased only if all the various aspects of classifier training takes place inside the CV loop. This means that all aspects of training a classifier e.g. feature selection, classifier type selection and classifier parameter tuning takes place on the data not left out during each CV loop. It has been shown that violating this principle in some ways can result in very biased estimates of the true error. "

The right way to Cross Validate with feature selection

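import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# x (2-D NumPy array of features) and y (1-D target array) are assumed to exist already.
clf = LinearRegression()  # assumption: any scikit-learn regressor can be used here
scores = []

for train, test in KFold(n_splits=5).split(x):
    xtrain, xtest, ytrain, ytest = x[train], x[test], y[train], y[test]

    # Select the features using the training fold only
    b = SelectKBest(f_regression, k=2)
    b.fit(xtrain, ytrain)
    xtrain = xtrain[:, b.get_support()]
    xtest = xtest[:, b.get_support()]

    clf.fit(xtrain, ytrain)
    scores.append(clf.score(xtest, ytest))

    # Plot predictions against observations for this fold
    yp = clf.predict(xtest)
    plt.plot(yp, ytest, 'o')
    plt.plot(ytest, ytest, 'r-')

plt.xlabel("Predicted")
plt.ylabel("Observed")

print("CV Score is ", np.mean(scores))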

Should I do Feature encoding such as One hot or Ordinal encoding before or after the Feature Selection?

One should do Feature encoding before Feature selection. The intuition is that, since the model will use the encoded features, we should measure their importance in the same form in which the model will see them.

References

Embedded Methods for Feature Selection

daud99 — Thu, 26 May 2022 18:01:11 +0000

L1 Regularized Logistic Regression

Let's have a brief overview of Regularization.

Regularization helps us with the problem of overfitting the model to the training dataset. Instead of just decreasing the LOSS function, we also penalize model complexity.

There are different forms of Regularization.

L1 Regularization.
L2 Regularization.

We will be discussing them in detail in the future blogs.

Using L1 (LASSO) Regularization

L1 regularization can also be used as a method of feature selection.

We will discuss the loss-minimising part in some other blog. For this discussion, the interesting part is the L1 norm, which plays the main role in feature selection. The L1 norm is a measure of how big the weights are. Here, m is the number of features in the dataset, and the L1 norm is the sum of the absolute values |w| of all m weights. You can think of Lambda as a scaling factor; it is a hyperparameter that we have to tune in practice. The L1 norm acts as a penalty added to the LOSS function: the greater the complexity, the greater the penalty, and vice versa.

Our final LOSS function is

Our goal is to minimize the overall LOSS function, but the L1 norm adds a non-negative term to it. So, if we want to minimize the loss, we also need to keep this term small, which is only possible with small weights (a less complex model). Our goal is therefore to find weights that are not only good for predictions but also as small as possible, so that the overall loss stays low.
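In symbols, the penalised loss is usually written as LOSS + λ · (|w1| + |w2| + ... + |wm|). The sketch below computes that penalty for two hypothetical weight vectors; the data-loss value of 0.25 and the weights are made up purely to show how the penalty changes the total.

```python
def l1_penalty(weights, lam):
    """Lambda times the sum of absolute weights (the L1 norm)."""
    return lam * sum(abs(w) for w in weights)


def total_loss(data_loss, weights, lam):
    """Prediction loss plus the L1 penalty."""
    return data_loss + l1_penalty(weights, lam)


dense = [0.5, -0.5, 0.5, -0.5]   # every feature gets some weight
sparse = [1.0, 0.0, -0.5, 0.0]   # two features are switched off entirely

# Assume, for illustration, both models fit the data equally well (loss 0.25).
print(total_loss(0.25, dense, lam=0.5))   # 1.25
print(total_loss(0.25, sparse, lam=0.5))  # 1.0
```

For the same fit, the model with zeroed-out weights gets the lower total loss, which is the pressure that makes L1 regularization act as a feature selector.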

With a large Lambda, the best trade-off between minimizing the prediction loss and minimizing the penalty typically lies at a point where one or, more usually, several of the weights are exactly zero.

How can we use it for feature selection?

Provided the features are on comparable scales, the larger the absolute weight, the more important the feature. So, we can remove the features with zero or very small weights, or select the features with the largest weights.

Most of this info is derived from this video.

Using Decision Trees & Random Forest

In Logistic Regression, we use all of the features unless we use L1 regularization, which zeroes out some of them. In a Decision Tree, however, feature selection is done implicitly: at each split, the feature that reduces the entropy the most is selected, with the goal of driving the entropy towards 0. Other impurity criteria, such as Gini, can be used instead. The reduction in impurity achieved by a split is known as Information Gain.

Decision trees perform feature selection implicitly.
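To make the entropy criterion concrete, here is a small pure-Python sketch of entropy and information gain for one candidate split. The function names and the toy labels are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
# A feature that separates the classes perfectly:
print(information_gain(parent, ["yes", "yes"], ["no", "no"]))  # 1.0
# A feature that tells us nothing:
print(information_gain(parent, ["yes", "no"], ["yes", "no"]))  # 0.0
```

A tree-growing algorithm evaluates every candidate feature this way at each node and splits on the one with the highest gain.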

Feature Selection for Dimensionality Reduction

daud99 — Thu, 26 May 2022 14:05:15 +0000

There are three broad categories of methods for Feature Selection.

Filter Methods
Embedded Methods
Wrapper Methods

Filter Methods.

These methods are based on the intrinsic (natural) properties of the features themselves. We don't use any classifier or model at this point.

Univariate Statistics

If the variance of a feature is large, so that its data points are very spread out, the feature is likely to be useful for distinguishing between different training examples: it is easier to come up with boundaries that separate data points when there is variance. As a rough rule, the larger the variance the better, so we can simply remove the features with very low variance. This is known as a univariate statistic, as only a single feature is involved. Another term often used here is Information Gain, i.e. how much a feature contributes to distinguishing different data points.

1. Using a Simple Threshold

The advantage of calculating variance is that it's really fast. The major disadvantage is that it doesn't take into account the relationships among features.

from sklearn.feature_selection import VarianceThreshold

# Separating features from the target variable
features = dataset.loc[:, dataset.columns != 'Label'].astype('float64')
labels = dataset['Label']

# Flag features whose variance is at or below the threshold.
# Note: 0.05 is an absolute variance, not a percentage; for a 0/1 column it
# roughly corresponds to about 95% of the values being identical.
var_constant = 0.05
var_thr = VarianceThreshold(threshold=var_constant)
var_thr.fit(features)
variance_stat = var_thr.get_support()  # True = feature is kept
low_variance = features.columns[~variance_stat]
print(f"{len(low_variance)} out of {len(features.columns)} features have variance at or below {var_constant}.")
print("Following are the features with low variance:")
print(low_variance)

Bivariate Statistics

When the computation involves two variables at a time, it is called bivariate statistics.

1. Pairwise Correlation

When two features are highly correlated, one of them is redundant, and we can probably remove it from the dataset without losing too much information.

import matplotlib.pyplot as plt
import seaborn as sns

feature_corr_matrix = features.corr()
plt.figure(figsize=(100,100)) # (width, height) in inches; reduce for small feature sets
sns.heatmap(feature_corr_matrix, annot=True,cmap="RdYlGn")
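Beyond eyeballing the heatmap, the redundant features can be picked out programmatically. The sketch below uses a hand-written Pearson correlation and a made-up threshold of 0.9; the function names and the toy columns are illustrative, not part of the code above.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def redundant_features(columns, threshold=0.9):
    """Return features that are highly correlated with an already-kept feature."""
    kept, dropped = [], []
    for name, values in columns.items():
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            dropped.append(name)
        else:
            kept.append(name)
    return dropped


columns = {
    "height_cm": [150, 160, 170, 180],
    "height_in": [59.1, 63.0, 66.9, 70.9],  # same information in other units
    "shoe_size": [9, 6, 10, 7],
}
print(redundant_features(columns))  # ['height_in']
```

Which member of a correlated pair gets dropped depends on column order here; in practice that choice is often guided by domain knowledge.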

2. Correlation with target Variable

Features that are highly correlated with the target variable are good features to use, especially in the case of Linear Regression.

def corelationHeatMap(col_name):
  # Correlation of every column with the chosen column, shown as a single-column heatmap
  corr_matrix = dataset.corr()
  plt.figure(figsize=(20,60)) # (width, height)
  sns.heatmap(corr_matrix[[col_name]], annot=True,cmap="RdYlGn")

Pass in the name of your Target variable to the above function.

corelationHeatMap("Label")

3. Using Anova

We know that the standard deviation tells us how spread out the data is, in other words how much our data points deviate from the mean on average. Variance is simply the square of the standard deviation.

ANOVA (analysis of variance) compares the variance between groups with the variance within groups, which helps us judge whether a feature is related to the target variable.

When we use ANOVA, we end up with a value known as the F ratio or F statistic. It tells us how confidently we can say there is a relationship between the variables. The Null Hypothesis says there is no relationship between the variables, and the Alternate Hypothesis says there is one. The larger the F ratio, the smaller its associated p-value; if that p-value is less than the significance level, we reject the Null Hypothesis in favour of the Alternate Hypothesis.

When can we use the ANOVA test? We can use it when the features are numerical and the target variable is categorical. For a numerical target, an analogous F-test based on linear regression is typically used instead.

How is ANOVA used for Feature Selection?

The F ratio is calculated for each feature against the target variable. We select the features with the highest F ratio/score, as they are the most important.
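For intuition, the one-way ANOVA F ratio can be computed by hand: it is the variance between the class means divided by the variance within the classes. The sketch below does this for two made-up features, each split into two groups by a binary target; `anova_f` is an illustrative helper, not a library function.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Feature values grouped by target class (class 0, class 1)
informative = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]  # class means differ a lot
noisy = [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0]]        # class means are identical

print(anova_f(informative))  # 54.0
print(anova_f(noisy))        # 0.0
```

Ranking features by this score and keeping the top k is the idea behind selectors such as scikit-learn's SelectKBest with f_classif.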

Embedded Methods

These methods actually involves the model. The model is used with the goal of optimising by selecting the best features. For example: Decision Tree each time we split the node we compare all the different features to select the feature with maximum information gain. So, our goal is to find the features which maximize the information gain when we use it for splitting. We can say that decision tree is actually selecting features while growing the tree. We will be selecting the features that result in most information gain. Usually the features which are used more higher up are the most important one as they have the maximum information gain in the decision tree.

This is just one of many examples. We will look at each of them in great detail in the upcoming blogs.

Wrapper Methods

They are based on our main objective. For instance, we might be interested in improving prediction performance, prediction time, or maybe training time. So the best features may differ depending on our objective.

What we basically do is fit our model on different subsets of features and see how the model performs on each subset as per our main objective. This helps us select the best features.

The Wrapper Method is really expensive, as it takes a lot of time to compute the result for each subset of features compared to Univariate Statistics such as variance. So it is quite computationally expensive, but it is also very good because it deals directly with the intended result.
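The idea can be sketched with an exhaustive search: fit and score a model on every non-empty subset of features and keep the best one. In this sketch the "model" is a tiny 1-nearest-neighbour classifier scored with leave-one-out accuracy, chosen only to keep the example self-contained. The function names and the data are assumptions for illustration; any model and any objective could be plugged in instead.

```python
from itertools import combinations


def loo_accuracy(rows, labels, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using only `subset`."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue  # never compare a row with itself
            dist = sum((row[f] - other[f]) ** 2 for f in subset)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def best_subset(rows, labels, feature_names):
    """Score every non-empty feature subset and return the best one with its score."""
    best, best_score = None, -1.0
    for size in range(1, len(feature_names) + 1):
        for subset in combinations(feature_names, size):
            score = loo_accuracy(rows, labels, subset)
            if score > best_score:  # strict comparison: ties keep the smaller subset
                best, best_score = subset, score
    return best, best_score


# Hypothetical toy data: "signal" tracks the label, "noise" is misleading.
rows = [
    {"signal": 0.0, "noise": 5.0},
    {"signal": 0.1, "noise": 1.0},
    {"signal": 1.0, "noise": 5.1},
    {"signal": 1.1, "noise": 0.9},
]
labels = [0, 0, 1, 1]
subset, score = best_subset(rows, labels, ["noise", "signal"])
print(subset, score)
```

With n features there are 2^n - 1 non-empty subsets to evaluate, which is exactly why this approach gets expensive so quickly and why greedy variants that add or drop one feature at a time are often used instead.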
