<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: daud99</title>
    <description>The latest articles on DEV Community by daud99 (@daud99).</description>
    <link>https://dev.to/daud99</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F270797%2F33859045-2214-4081-9eb0-9a8a1c952589.JPG</url>
      <title>DEV Community: daud99</title>
      <link>https://dev.to/daud99</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/daud99"/>
    <language>en</language>
    <item>
      <title>Public Suffix List</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Sun, 27 Jul 2025 13:22:47 +0000</pubDate>
      <link>https://dev.to/daud99/public-suffix-list-2bo</link>
      <guid>https://dev.to/daud99/public-suffix-list-2bo</guid>
      <description>&lt;h1&gt;
  
  
  Public Suffix List (PSL) - Quick Reference
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Key Rule: a PSL entry &lt;strong&gt;CANNOT be used as a cookie's Domain&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;example.com&lt;/code&gt; in PSL:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Who can set cookies&lt;/strong&gt;: Only subdomains (&lt;code&gt;a.example.com&lt;/code&gt;, &lt;code&gt;b.example.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cookie sharing&lt;/strong&gt;: None (each subdomain isolated)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Think&lt;/strong&gt;: "Each apartment rents independently, no shared lobby"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;*.example.com&lt;/code&gt; in PSL:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Who can set cookies&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;✅ &lt;code&gt;example.com&lt;/code&gt; (shares to ALL descendants)&lt;/li&gt;
&lt;li&gt;❌ &lt;code&gt;a.example.com&lt;/code&gt;, &lt;code&gt;b.example.com&lt;/code&gt; (they're public suffixes)&lt;/li&gt;
&lt;li&gt;✅ &lt;code&gt;child.a.example.com&lt;/code&gt;, &lt;code&gt;child.b.example.com&lt;/code&gt; (but only for themselves)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Cookie sharing&lt;/strong&gt;: Everyone reads &lt;code&gt;example.com&lt;/code&gt;'s cookies, but children can't share with each other&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Think&lt;/strong&gt;: "Hotel owner controls lobby, guests can't set room rules, but guests' visitors can"&lt;/li&gt;

&lt;/ul&gt;
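
&lt;p&gt;The two cases can be laid out side by side. The sketch below uses the PSL file's own rule syntax (one rule per line, &lt;code&gt;//&lt;/code&gt; for comments). The &lt;code&gt;example.com&lt;/code&gt; rules are hypothetical, and the sketch assumes &lt;code&gt;com&lt;/code&gt; is already listed, as it is in the real list.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Case 1: hypothetical rule without a wildcard
example.com
//   example.com           public suffix: cannot be a cookie Domain
//   a.example.com         registrable: cookies shared with its own subdomains only
//   b.example.com         registrable: separate from a.example.com

// Case 2: hypothetical wildcard rule (instead of the rule above)
*.example.com
//   example.com           registrable under "com": cookies reach every descendant
//   a.example.com         public suffix: cannot be a cookie Domain
//   child.a.example.com   registrable: cookies shared with its own subdomains only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;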

&lt;h2&gt;
  
  
  Memory trick:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No &lt;code&gt;*&lt;/code&gt;&lt;/strong&gt; = Subdomains are independent owners&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With &lt;code&gt;*&lt;/code&gt;&lt;/strong&gt; = Parent owns everything, subdomains are just public spaces (but their children can own again)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bottom line&lt;/strong&gt;: The PSL draws a "cookie boundary". A public suffix can never be a cookie's Domain, and the name one label below a suffix is the widest scope across which a site can share cookies.&lt;/p&gt;

</description>
      <category>psl</category>
      <category>cookies</category>
      <category>websecurity</category>
      <category>mozilla</category>
    </item>
    <item>
      <title>DNS Zone Files: The Blueprints of Domain Mapping</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Sun, 18 May 2025 14:44:07 +0000</pubDate>
      <link>https://dev.to/daud99/dns-zone-files-the-blueprints-of-domain-mapping-316n</link>
      <guid>https://dev.to/daud99/dns-zone-files-the-blueprints-of-domain-mapping-316n</guid>
      <description>&lt;p&gt;In our previous blog post, we explored how DNS works to translate domain names into IP addresses. Today, we're going deeper into a critical component of DNS: zone files. These files are the actual blueprints that make the domain name system work behind the scenes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a DNS Zone File?
&lt;/h2&gt;

&lt;p&gt;A zone file is simply a text file that lives on authoritative DNS servers. It contains the mapping between domain names and their corresponding IP addresses, along with other important resource records. Think of a zone file as a detailed address book for a specific section of the internet.&lt;/p&gt;

&lt;p&gt;Most lines in a zone file define one resource record (a long record such as the SOA can be wrapped across several lines with parentheses), and each record serves a specific purpose in directing internet traffic to the right destination.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Do Zone Files Exist?
&lt;/h2&gt;

&lt;p&gt;Zone files exist primarily on authoritative nameservers. Remember from our previous blog post that authoritative nameservers are the final authority for a particular domain. When you register a domain and set up its DNS, you're essentially creating and configuring the zone file that will live on those authoritative servers.&lt;/p&gt;

&lt;p&gt;Here's where zone files fit in the DNS hierarchy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Root servers serve the root zone, whose zone file holds the NS records (plus glue) that delegate each TLD&lt;/li&gt;
&lt;li&gt;TLD servers (like .com or .org) maintain zone files for their domains&lt;/li&gt;
&lt;li&gt;Authoritative nameservers for your specific domain (like yourdomain.com) host the zone file containing all your domain's DNS records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When your registrar or DNS provider gives you a control panel to manage your domain's DNS settings, you're actually editing the information that will be written to the zone file on your authoritative nameservers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of DNS Records in Zone Files
&lt;/h2&gt;

&lt;p&gt;Let's look at the common types of records found in zone files, with examples of how each one looks.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Start of Authority (SOA) Record
&lt;/h3&gt;

&lt;p&gt;The SOA record is like the cover page of your zone file. It marks the beginning of a zone and contains essential administrative information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.  IN  SOA  ns1.example.com. admin.example.com. (
                2023042601  ; Serial number
                3600        ; Refresh (1 hour)
                1800        ; Retry (30 minutes)
                604800      ; Expire (1 week)
                86400 )     ; Minimum TTL (24 hours)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This record tells us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The primary nameserver is ns1.example.com&lt;/li&gt;
&lt;li&gt;The administrator's email is &lt;a href="mailto:admin@example.com"&gt;admin@example.com&lt;/a&gt; (note that the @ is replaced with a dot in the record)&lt;/li&gt;
&lt;li&gt;The serial number (2023042601) is like a version number that increases whenever you update the zone&lt;/li&gt;
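&lt;li&gt;The time values tell secondary nameservers how often to check for updates (Refresh), how soon to try again after a failed check (Retry), and when to stop answering if the primary stays unreachable (Expire); under RFC 2308 the last value also sets how long resolvers cache "no such record" answers&lt;/li&gt;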
&lt;/ul&gt;

&lt;p&gt;Every domain name belongs to exactly one DNS zone at any given time, and each zone has exactly one SOA record, placed at the top (apex) of the zone.&lt;/p&gt;
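
&lt;p&gt;One detail is worth a small sketch: because the first unescaped dot in the mailbox field stands for the @, a dot inside the local part of the address has to be escaped with a backslash. The address john.doe@example.com below is a made-up example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Hypothetical admin address john.doe@example.com
; The escaped dot (\.) stays in the local part; the next unescaped dot stands for @
example.com.  IN  SOA  ns1.example.com. john\.doe.example.com. (
                2023042601  ; Serial number
                3600        ; Refresh (1 hour)
                1800        ; Retry (30 minutes)
                604800      ; Expire (1 week)
                86400 )     ; Minimum TTL (24 hours)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;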

&lt;h3&gt;
  
  
  2. Name Server (NS) Records
&lt;/h3&gt;

&lt;p&gt;NS records specify which servers are authoritative for the domain. These are the servers that have the definitive information about your domain.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.  IN  NS  ns1.example.com.
example.com.  IN  NS  ns2.example.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example shows that two nameservers (ns1 and ns2) are authoritative for example.com.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Address (A) Records
&lt;/h3&gt;

&lt;p&gt;A records are the most common type of DNS record. They map a domain or subdomain to an IPv4 address.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.      IN  A  203.0.113.10
www.example.com.  IN  A  203.0.113.10
blog.example.com. IN  A  203.0.113.11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The main domain points to 203.0.113.10&lt;/li&gt;
&lt;li&gt;The www subdomain points to the same address&lt;/li&gt;
&lt;li&gt;The blog subdomain points to a different server at 203.0.113.11&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. AAAA Records
&lt;/h3&gt;

&lt;p&gt;AAAA records are just like A records, but for IPv6 addresses instead of IPv4.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.  IN  AAAA  2001:0db8:85a3:0000:0000:8a2e:0370:7334
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This record directs traffic for example.com to the IPv6 address shown.&lt;/p&gt;
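
&lt;p&gt;Zone files also accept the shortened IPv6 notation, where leading zeros in each group are dropped and one run of all-zero groups collapses to &lt;code&gt;::&lt;/code&gt;. As a sketch, this is the same address as above in compressed form:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Same address as 2001:0db8:85a3:0000:0000:8a2e:0370:7334
example.com.  IN  AAAA  2001:db8:85a3::8a2e:370:7334
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;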

&lt;h3&gt;
  
  
  5. Canonical Name (CNAME) Records
&lt;/h3&gt;

&lt;p&gt;CNAME records create an alias from one domain name to another. They're useful for creating multiple domain names that all point to the same website.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;shop.example.com.  IN  CNAME  example.com.
store.example.com. IN  CNAME  example.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With these records, both shop.example.com and store.example.com will direct users to the same place as example.com.&lt;/p&gt;
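
&lt;p&gt;One constraint is worth sketching here: a name that has a CNAME record cannot also carry other ordinary records such as A, MX, or TXT (RFC 1034). That is why the zone apex, which must hold the SOA and NS records, cannot be a CNAME. The name hosting.example.net below is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Valid: shop is only an alias
shop.example.com.  IN  CNAME  example.com.

; Invalid (commented out): a CNAME cannot share its name with other records
; shop.example.com.  IN  MX  10  mail1.example.com.

; Invalid (commented out): the apex already holds SOA and NS records
; example.com.  IN  CNAME  hosting.example.net.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;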

&lt;h3&gt;
  
  
  6. Mail Exchanger (MX) Records
&lt;/h3&gt;

&lt;p&gt;MX records specify which servers handle email for your domain. They include a priority number (lower numbers have higher priority).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.  IN  MX  10  mail1.example.com.
example.com.  IN  MX  20  mail2.example.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration tells email servers to try delivering mail to mail1.example.com first. If that server is unavailable, they'll try mail2.example.com.&lt;/p&gt;
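
&lt;p&gt;As a variation (a sketch, not part of the configuration above), two MX records with the same priority tell sending servers to treat both hosts as equally preferred, which spreads deliveries across them. The host backup.example.com is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Equal priority: senders choose between mail1 and mail2 in random order
example.com.  IN  MX  10  mail1.example.com.
example.com.  IN  MX  10  mail2.example.com.
; Higher number: tried only if the servers above cannot be reached
example.com.  IN  MX  20  backup.example.com.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;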

&lt;h3&gt;
  
  
  7. Text (TXT) Records
&lt;/h3&gt;

&lt;p&gt;TXT records can hold arbitrary text and are often used for domain verification or security policies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com.  IN  TXT  "v=spf1 include:_spf.example.com ~all"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example shows an SPF (Sender Policy Framework) record that helps prevent email spoofing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Zone Files Work in Practice
&lt;/h2&gt;

&lt;p&gt;Let's see how these records work together in a simplified zone file for example.com:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Zone file for example.com
$TTL 86400 ; Default TTL is 24 hours
example.com.  IN  SOA  ns1.example.com. admin.example.com. (
                2023042601  ; Serial number
                3600        ; Refresh
                1800        ; Retry
                604800      ; Expire
                86400 )     ; Minimum TTL

; Nameservers
example.com.  IN  NS  ns1.example.com.
example.com.  IN  NS  ns2.example.com.

; A records for nameservers
ns1.example.com.  IN  A  203.0.113.1
ns2.example.com.  IN  A  203.0.113.2

; Main domain and www subdomain
example.com.      IN  A  203.0.113.10
www.example.com.  IN  CNAME  example.com.

; Email configuration
example.com.  IN  MX  10  mail.example.com.
mail.example.com.  IN  A  203.0.113.20

; Text record for email security
example.com.  IN  TXT  "v=spf1 include:_spf.example.com ~all"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When someone types &lt;a href="http://www.example.com" rel="noopener noreferrer"&gt;www.example.com&lt;/a&gt; into their browser, DNS resolvers follow this chain of records to find the right IP address:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;They see the CNAME pointing &lt;a href="http://www.example.com" rel="noopener noreferrer"&gt;www.example.com&lt;/a&gt; to example.com&lt;/li&gt;
&lt;li&gt;They look up the A record for example.com&lt;/li&gt;
&lt;li&gt;They find the IP address 203.0.113.10&lt;/li&gt;
&lt;li&gt;The browser connects to that IP address&lt;/li&gt;
&lt;/ol&gt;
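
&lt;p&gt;Zone files in the wild are usually written more compactly. The sketch below is the same zone using standard zone-file shorthand: &lt;code&gt;$ORIGIN&lt;/code&gt; sets the suffix appended to any name without a trailing dot, &lt;code&gt;@&lt;/code&gt; stands for the origin itself, and a line that begins with whitespace reuses the previous owner name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Same zone as above, written with relative names
$ORIGIN example.com.
$TTL 86400
@     IN  SOA  ns1 admin (
                2023042601  ; Serial number
                3600        ; Refresh
                1800        ; Retry
                604800      ; Expire
                86400 )     ; Minimum TTL
      IN  NS   ns1          ; blank owner: still example.com.
      IN  NS   ns2
      IN  A    203.0.113.10
      IN  MX   10 mail
      IN  TXT  "v=spf1 include:_spf.example.com ~all"
ns1   IN  A    203.0.113.1  ; ns1 expands to ns1.example.com.
ns2   IN  A    203.0.113.2
www   IN  CNAME  @          ; alias to example.com.
mail  IN  A    203.0.113.20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;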

&lt;h2&gt;
  
  
  Zone Files vs. DNS Zones
&lt;/h2&gt;

&lt;p&gt;It's important to distinguish between a zone file and a DNS zone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;DNS zone&lt;/strong&gt; is a portion of the domain namespace for which a specific organization or administrator is responsible&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;zone file&lt;/strong&gt; is the physical text file that contains the record information for that zone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single DNS zone is usually served by multiple nameservers for redundancy, and they all hold copies of the same zone data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing Your Zone Files
&lt;/h2&gt;

&lt;p&gt;Most domain owners never need to edit zone files directly. Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you use a registrar's DNS service, you'll manage records through their web interface&lt;/li&gt;
&lt;li&gt;If you host your own DNS, you might edit zone files on your servers or use DNS management software&lt;/li&gt;
&lt;li&gt;If you use a DNS service like Cloudflare or Route 53, you'll use their control panels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The changes you make through these interfaces eventually translate to updates in the zone files on the authoritative nameservers.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Records Do TLD Servers Actually Store?
&lt;/h2&gt;

&lt;p&gt;A common point of confusion about DNS is understanding exactly what information is stored at each level of the hierarchy. Let's clarify what records TLD servers (like those for .com, .org, etc.) actually keep.&lt;/p&gt;

&lt;h3&gt;
  
  
  TLD Servers: Minimalist by Design
&lt;/h3&gt;

&lt;p&gt;TLD servers are surprisingly minimalist in what they store. For each domain under their authority, they typically maintain only:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;NS records&lt;/strong&gt; - These point to the authoritative nameservers for each domain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Glue records&lt;/strong&gt; - These are A (or AAAA) records for those nameservers (but only when necessary)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it! TLD servers don't store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular A records for websites&lt;/li&gt;
&lt;li&gt;CNAME records for subdomains&lt;/li&gt;
&lt;li&gt;MX records for email&lt;/li&gt;
&lt;li&gt;TXT records for verification&lt;/li&gt;
&lt;li&gt;Any other resources for domains under them&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what the .com TLD servers might store for google.com:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;google.com.  IN  NS  ns1.google.com.
google.com.  IN  NS  ns2.google.com.
google.com.  IN  NS  ns3.google.com.
google.com.  IN  NS  ns4.google.com.

; Glue records
ns1.google.com.  IN  A  216.239.32.10
ns2.google.com.  IN  A  216.239.34.10
ns3.google.com.  IN  A  216.239.36.10
ns4.google.com.  IN  A  216.239.38.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The TLD servers know nothing about &lt;a href="http://www.google.com" rel="noopener noreferrer"&gt;www.google.com&lt;/a&gt;, mail.google.com, or any Google services. They only know "if someone asks about google.com, send them to these nameservers."&lt;/p&gt;

&lt;p&gt;This delegation approach is what makes DNS scalable. Imagine if Verisign (who manages .com) had to store every single DNS record for every .com domain in the world!&lt;/p&gt;

&lt;h3&gt;
  
  
  How TLD Servers Deliver Both Nameservers and Their IPs
&lt;/h3&gt;

&lt;p&gt;When a recursive resolver asks a .com TLD server about google.com on your behalf, something clever happens in a single transaction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The TLD server returns the NS records saying "ask Google's nameservers"&lt;/li&gt;
&lt;li&gt;In the same response, it includes the glue records with the IP addresses of those nameservers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This all happens in one query-response cycle. Here's what that response looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;;; QUESTION SECTION:
;google.com.            IN    NS

;; AUTHORITY SECTION:
google.com.        172800    IN    NS    ns1.google.com.
google.com.        172800    IN    NS    ns2.google.com.
google.com.        172800    IN    NS    ns3.google.com.
google.com.        172800    IN    NS    ns4.google.com.

;; ADDITIONAL SECTION:
ns1.google.com.    172800    IN    A    216.239.32.10
ns2.google.com.    172800    IN    A    216.239.34.10
ns3.google.com.    172800    IN    A    216.239.36.10
ns4.google.com.    172800    IN    A    216.239.38.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "ADDITIONAL SECTION" contains those essential glue records with IP addresses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Glue Records Are Necessary: Breaking the Circular Dependency
&lt;/h3&gt;

&lt;p&gt;Without glue records, we'd face a classic chicken-and-egg problem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"I need to ask ns1.google.com about google.com"&lt;/li&gt;
&lt;li&gt;"But I don't know ns1.google.com's IP address"&lt;/li&gt;
&lt;li&gt;"To find ns1.google.com's IP, I need to resolve that domain name"&lt;/li&gt;
&lt;li&gt;"But that would require asking the nameservers for google.com..."&lt;/li&gt;
&lt;li&gt;And we're back to step 1 in an infinite loop!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Glue records break this circular dependency by providing the IP addresses directly in the TLD response, allowing your DNS resolver to immediately connect to the authoritative nameservers without additional queries.&lt;/p&gt;

&lt;p&gt;Think of it like calling a business and asking for a specific department. Instead of just saying "you need to speak with our Technical Department," they also give you the direct extension number so you don't have to call the main line again.&lt;/p&gt;
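
&lt;p&gt;To make the "only when necessary" part concrete, here is a sketch of one delegation as it might appear in the .com zone. The nameserver names and the address are illustrative, using reserved example names and a documentation IP range.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; Delegation of example.com as it might appear in the .com zone
example.com.      IN  NS  ns1.example.com.   ; inside example.com: needs glue
example.com.      IN  NS  ns2.example.net.   ; under .net: resolved through .net, no glue here
ns1.example.com.  IN  A   203.0.113.1        ; glue for ns1 only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;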

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Zone files are the hidden blueprints that make DNS work. They contain all the crucial information needed to direct internet traffic to the right destinations. By understanding the different types of DNS records and how they function together in zone files, you gain valuable insight into how your domain's presence on the internet is defined and managed.&lt;/p&gt;

&lt;p&gt;The beauty of DNS lies in its hierarchical delegation model. Each level knows just enough information to direct queries to the next level, with TLD servers playing a crucial role in directing traffic without needing to store excessive information.&lt;/p&gt;

&lt;p&gt;Whether you're troubleshooting DNS issues, migrating to a new hosting provider, or simply curious about how the internet works, knowing about zone files helps you understand the fundamental address system of the web.&lt;/p&gt;

</description>
      <category>dns</category>
      <category>zonefile</category>
      <category>web</category>
      <category>dnsrecords</category>
    </item>
    <item>
      <title>Understanding DNS: How Domain Names Become IP Addresses</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Sun, 18 May 2025 01:24:08 +0000</pubDate>
      <link>https://dev.to/daud99/understanding-dns-how-domain-names-become-ip-addresses-2b9h</link>
      <guid>https://dev.to/daud99/understanding-dns-how-domain-names-become-ip-addresses-2b9h</guid>
      <description>&lt;h1&gt;
  
  
  Understanding DNS: How Domain Names Become IP Addresses
&lt;/h1&gt;

&lt;p&gt;In our previous blog post, we covered domains and their various aspects. Now, let's dive deeper into DNS (Domain Name System) - the backbone of internet navigation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is DNS?
&lt;/h2&gt;

&lt;p&gt;DNS, or Domain Name System, is essentially a translator for the internet. Its primary function is simple yet crucial: it converts human-friendly domain names (like example.com) into machine-readable IP addresses that computers use to identify each other. Without DNS, you'd need to memorize numeric IP addresses instead of easy-to-remember domain names.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Players in the Domain World: Registries vs. Registrars
&lt;/h2&gt;

&lt;p&gt;Before we explore how DNS works, let's clarify two important roles that make the domain system possible:&lt;/p&gt;

&lt;h3&gt;
  
  
  Registry: The Domain Database Managers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;registry&lt;/strong&gt; is an organization that manages a specific top-level domain (TLD)&lt;/li&gt;
&lt;li&gt;For example, Verisign manages the .com and .net TLDs, while Public Interest Registry manages .org&lt;/li&gt;
&lt;li&gt;Registries maintain the authoritative database for their TLDs&lt;/li&gt;
&lt;li&gt;They operate the TLD nameservers that direct traffic to the correct domain&lt;/li&gt;
&lt;li&gt;Registries don't sell domains directly to users&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Domain Hierarchy: Who Controls What
&lt;/h4&gt;

&lt;p&gt;Think of the domain system as a tree:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Root (.)
  |
  ├── .com (managed by Verisign)
  |     |
  |     ├── amazon.com (managed by Amazon's nameservers)
  |     |
  |     └── google.com (managed by Google's nameservers)
  |
  └── .org (managed by Public Interest Registry)
        |
        └── wikipedia.org (managed by Wikimedia's nameservers)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each registry only manages its specific level in this tree:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Verisign&lt;/strong&gt; operates nameservers authoritative for &lt;strong&gt;.com&lt;/strong&gt; itself&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;These TLD nameservers know which nameservers are responsible for each individual .com domain&lt;/li&gt;
&lt;li&gt;But they don't store the actual IP addresses for websites&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Domain owners&lt;/strong&gt; (like Amazon) operate nameservers authoritative for their specific domains&lt;/p&gt;

&lt;ul&gt;
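&lt;li&gt;Amazon chooses which nameservers are authoritative for amazon.com (written here as ns1.amazon.com, ns2.amazon.com, etc. for illustration; the real hostnames may differ)&lt;/li&gt;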
&lt;li&gt;These servers contain all the DNS records (IP addresses, mail servers, etc.) for amazon.com&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The registry simply maintains a database that says: "For information about amazon.com, ask Amazon's nameservers."&lt;/p&gt;

&lt;h3&gt;
  
  
  Why TLDs Don't Store IP Addresses: The Power of Delegation
&lt;/h3&gt;

&lt;p&gt;You might wonder: "If the .com TLD servers already have a list of all .com domains, why don't they just store the IP addresses directly? Wouldn't that be faster by removing a step?"&lt;/p&gt;

&lt;p&gt;This delegation approach is actually a brilliant design decision for several reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: There are well over a hundred million .com domains, each with multiple DNS records. By delegating to authoritative nameservers, TLD servers remain manageable and efficient.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distributed Control&lt;/strong&gt;: Domain owners can update their DNS records (change IPs, add subdomains, configure email) without involving the registry. You control your domain through your nameservers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexible Updates&lt;/strong&gt;: Websites change servers, companies add new services, and IP addresses get updated frequently. If all these changes had to go through the TLD servers, it would create a massive bottleneck.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Separation of Responsibilities&lt;/strong&gt;: Verisign (the .com registry) focuses on maintaining the integrity of the TLD, while you focus on managing your specific domain's records.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;System Resilience&lt;/strong&gt;: Distributing DNS across thousands of nameservers creates redundancy. If all .com records were in one place, it would be a single point of failure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This delegation model is like a phone company maintaining a list of office building addresses, but letting each building manage its own internal directory of employee extensions. It's more efficient for everyone!&lt;/p&gt;

&lt;h3&gt;
  
  
  Flexible Nameserver Arrangements: Breaking the Hierarchy
&lt;/h3&gt;

&lt;p&gt;While the domain system is hierarchical, nameservers don't have to follow this hierarchy. This creates flexibility in how domains are managed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eo9n86pvsxfm7p0maj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eo9n86pvsxfm7p0maj5.png" alt="Nameserver Flexibility" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A single nameserver can handle domains across different levels and TLDs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your hosting provider's nameserver (like ns1.hostgator.com) might handle:

&lt;ul&gt;
&lt;li&gt;yourbusiness.com&lt;/li&gt;
&lt;li&gt;yourfriend.org&lt;/li&gt;
&lt;li&gt;blog.someoneelse.net&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;This is like having one receptionist who knows about multiple unrelated businesses!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples of flexible nameserver arrangements:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Hosting providers&lt;/strong&gt; manage millions of unrelated domains on the same nameservers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large companies&lt;/strong&gt; might use their nameservers for multiple brands (Google's nameservers handle google.com, youtube.com, gmail.com)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized services&lt;/strong&gt; might take over part of your domain (blog.yoursite.com might use your blogging platform's nameservers)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What matters is not who owns the nameserver, but which nameserver is registered as authoritative for each domain. You can mix and match however works best for your needs.&lt;/p&gt;
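
&lt;p&gt;As a sketch of the third arrangement, handing a subdomain to another provider takes only NS records in the parent zone. The platform's nameserver names below are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; In the zone file for yoursite.com
blog.yoursite.com.  IN  NS  ns1.blogplatform.example.
blog.yoursite.com.  IN  NS  ns2.blogplatform.example.
; Records at and below blog.yoursite.com are now answered by the platform's servers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;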

&lt;h3&gt;
  
  
  Registrar: Your Domain Service Provider
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;registrar&lt;/strong&gt; (like GoDaddy, Namecheap, or Squarespace Domains, which absorbed Google Domains in 2023) is accredited by ICANN to sell domains&lt;/li&gt;
&lt;li&gt;They act as the middlemen between you and the registry&lt;/li&gt;
&lt;li&gt;Registrars handle domain registration, renewals, transfers, and DNS management&lt;/li&gt;
&lt;li&gt;When you buy a domain, your registrar communicates with the appropriate registry to record your ownership&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How Domain Registration Works
&lt;/h3&gt;

&lt;p&gt;When you register a domain like "yourblog.com":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You visit a registrar's website and check if the domain is available&lt;/li&gt;
&lt;li&gt;The registrar queries the .com registry (Verisign) to verify availability&lt;/li&gt;
&lt;li&gt;You purchase the domain through the registrar&lt;/li&gt;
&lt;li&gt;The registrar sends your information to the registry&lt;/li&gt;
&lt;li&gt;The registry adds your domain to its database&lt;/li&gt;
&lt;li&gt;The registry updates its nameservers with information about your domain's authoritative nameservers&lt;/li&gt;
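&lt;li&gt;A brand-new delegation is typically visible soon after the registry publishes it; changes to an existing domain's nameservers can take 24-48 hours to be seen everywhere, because resolvers keep the old NS records cached until their TTL expires&lt;/li&gt;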
&lt;/ol&gt;

&lt;p&gt;This centralized registry system ensures that no matter which registrar you use, there's only one authoritative source of truth for each TLD.&lt;/p&gt;

&lt;h2&gt;
  
  
  The DNS Resolution Process: A Step-by-Step Journey
&lt;/h2&gt;

&lt;p&gt;When you type a URL into your browser, a fascinating sequence of lookups begins. Let's walk through this journey:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Browser and OS Cache Check
&lt;/h3&gt;

&lt;p&gt;Your system first checks if it already knows the answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Browser DNS Cache&lt;/strong&gt;: Your browser keeps a temporary record of recent DNS lookups. You can view this in some browsers (in Edge, type "edge://net-internals/#dns" in the address bar).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operating System Cache&lt;/strong&gt;: If not found in the browser, your OS checks its own cache. This local resolver is called a &lt;strong&gt;stub resolver&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. The Recursive Resolver: Your DNS Detective
&lt;/h3&gt;

&lt;p&gt;If the domain isn't found locally, the query leaves your computer with the Recursion Desired (RD) flag set, heading to a &lt;strong&gt;DNS recursor&lt;/strong&gt; server.&lt;/p&gt;

&lt;p&gt;Think of the DNS recursor as a detective - it takes your case and investigates until it finds an answer. This server is typically provided by your Internet Service Provider (ISP) or public DNS services like Google's 8.8.8.8 or Cloudflare's 1.1.1.1.&lt;/p&gt;

&lt;p&gt;The recursor first checks its own cache. If the information isn't there, it begins a journey through the DNS hierarchy.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The DNS Hierarchy: A Tree of Servers
&lt;/h3&gt;

&lt;p&gt;The DNS system is structured as a hierarchical tree:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Root Servers&lt;/strong&gt;: The recursor first contacts one of the 13 root server networks (labeled A through M). Despite being only 13 logical entities, these represent well over a thousand anycast server instances distributed globally, operated by 12 independent organizations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Top-Level Domain (TLD) Servers&lt;/strong&gt;: The root server points the recursor to the appropriate TLD server (like .com, .org, or .net).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Authoritative Nameservers&lt;/strong&gt;: The TLD server directs the recursor to the authoritative nameservers for the specific domain. These servers hold the actual DNS records (including IP addresses) for the domain you're looking for.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Finding the Final Answer
&lt;/h3&gt;

&lt;p&gt;The recursor contacts the authoritative nameserver, which responds with the IP address for the requested domain. This information then flows back through the chain to your browser, which can finally connect to the website.&lt;/p&gt;
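
&lt;p&gt;Put together, each step of the lookup is answered from a different zone. The records below are a simplified sketch for www.example.com: a.gtld-servers.net is one of the real .com nameservers, while the example.com nameserver and the addresses are illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;; 1. Root zone: which servers handle .com?
com.              IN  NS  a.gtld-servers.net.

; 2. .com zone: which servers handle example.com?
example.com.      IN  NS  ns1.example.com.
ns1.example.com.  IN  A   203.0.113.1      ; glue

; 3. example.com zone: the final answer
www.example.com.  IN  A   203.0.113.10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;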

&lt;h3&gt;
  
  
  Intelligent Shortcuts: How DNS Optimizes Lookups
&lt;/h3&gt;

&lt;p&gt;DNS resolvers are smart - they don't just cache complete domain resolutions. They remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Addresses of root servers&lt;/li&gt;
&lt;li&gt;Addresses of TLD servers (like .com)&lt;/li&gt;
&lt;li&gt;Addresses of authoritative nameservers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This strategic caching means that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If a resolver has seen a .com domain before, it can skip the root server step and go directly to the .com TLD server&lt;/li&gt;
&lt;li&gt;If it recognizes the authoritative nameservers for a domain, it can bypass both root and TLD servers&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These shortcuts significantly speed up DNS resolution for frequently accessed domains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solving the Chicken-and-Egg Problem: Glue Records
&lt;/h2&gt;

&lt;p&gt;Here's an interesting puzzle: If nameservers often have domain names themselves (like ns1.example.com), how do we resolve their domains without creating an infinite loop?&lt;/p&gt;

&lt;p&gt;The solution is &lt;strong&gt;glue records&lt;/strong&gt;. When a nameserver's name lies inside the domain it serves, the registrar submits not just the nameserver's name but also its IP address to the TLD registry, which returns both in its referral. This breaks the circular dependency, allowing resolvers to reach the nameserver without first having to resolve its name.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of DNS Queries
&lt;/h2&gt;

&lt;p&gt;The DNS resolution process involves three distinct query types:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Recursive Queries&lt;/strong&gt;: Like asking a librarian to find a book for you. You expect a complete answer (the book) or a definitive "we don't have it."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Iterative Queries&lt;/strong&gt;: Like asking a librarian which section to look in, then going there yourself. The server gives you directions to the next stop, but you continue the journey yourself.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Non-recursive Queries&lt;/strong&gt;: Like asking for a book the librarian is already holding. These are quick responses for information the DNS server already has in its cache or is directly responsible for.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Reverse DNS: Looking Up Names from IP Addresses
&lt;/h2&gt;

&lt;p&gt;While standard DNS answers "What IP address does example.com have?", Reverse DNS answers "What domain name is using IP address 93.184.216.34?"&lt;/p&gt;

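&lt;p&gt;This process uses the special &lt;code&gt;in-addr.arpa&lt;/code&gt; domain (for IPv4) or &lt;code&gt;ip6.arpa&lt;/code&gt; (for IPv6), both of which sit under the &lt;code&gt;.arpa&lt;/code&gt; TLD. The address is written in reverse order and looked up as a PTR record. Reverse DNS is commonly used for:&lt;/p&gt;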

&lt;ul&gt;
&lt;li&gt;Email server verification (reducing spam)&lt;/li&gt;
&lt;li&gt;Server logging (showing domain names instead of IP addresses)&lt;/li&gt;
&lt;li&gt;Network troubleshooting&lt;/li&gt;
&lt;li&gt;Security monitoring&lt;/li&gt;
&lt;/ul&gt;
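&lt;p&gt;To see what name a reverse lookup actually queries, Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module can build it. This is a small sketch; the helper name &lt;code&gt;reverse_lookup_name&lt;/code&gt; is made up here, and the code only constructs the name without sending a query.&lt;/p&gt;

```python
import ipaddress


def reverse_lookup_name(ip_text):
    """Build the name that a PTR query for this address would ask about."""
    return ipaddress.ip_address(ip_text).reverse_pointer


print(reverse_lookup_name("93.184.216.34"))  # 34.216.184.93.in-addr.arpa
print(reverse_lookup_name("2001:db8::1"))    # one hex digit per label, ending in ip6.arpa
```

&lt;p&gt;To perform the lookup itself, &lt;code&gt;socket.gethostbyaddr()&lt;/code&gt; from the standard library asks the system resolver for the PTR record.&lt;/p&gt;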

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Domain Name System is a marvel of internet engineering. Its distributed, hierarchical design allows billions of DNS queries to be resolved daily with remarkable efficiency.&lt;/p&gt;

&lt;p&gt;Understanding DNS involves recognizing the roles of different entities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Registries&lt;/strong&gt; maintain the authoritative databases for TLDs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Registrars&lt;/strong&gt; provide the interface between users and registries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DNS Servers&lt;/strong&gt; (from root servers to your ISP's resolvers) work together to translate domains to IP addresses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This knowledge helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Troubleshoot website connection issues&lt;/li&gt;
&lt;li&gt;Make smarter decisions about hosting and domain management&lt;/li&gt;
&lt;li&gt;Better understand who controls different aspects of your online presence&lt;/li&gt;
&lt;li&gt;Appreciate how the internet maintains its user-friendly face&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://howdns.works/ep7/" rel="noopener noreferrer"&gt;How DNS Works - DNS Explained in Comic Form&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cloudflare.com/learning/dns/what-is-dns/" rel="noopener noreferrer"&gt;Cloudflare - What is DNS?&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>web</category>
      <category>dns</category>
      <category>registrar</category>
      <category>registry</category>
    </item>
    <item>
      <title>The DNA of a Domain: Understanding DNS, FQDNs, and Domain Structures</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Sat, 17 May 2025 22:37:34 +0000</pubDate>
      <link>https://dev.to/daud99/the-dna-of-a-domain-understanding-dns-fqdns-and-domain-structures-1in2</link>
      <guid>https://dev.to/daud99/the-dna-of-a-domain-understanding-dns-fqdns-and-domain-structures-1in2</guid>
      <description>&lt;p&gt;Domains are to the internet what names are to humans, making identification simple and intuitive for everyone. Just as we give people names instead of describing their physical characteristics each time we refer to them, domains give websites and online resources readable names instead of complex numerical addresses. When you type 'google.com' instead of having to remember a string of numbers like '172.217.168.238', you're benefiting from the domain name system that makes the internet accessible to everyone.&lt;/p&gt;

&lt;p&gt;One key difference from human names is that while many people can share the same name in the real world, every domain name is globally unique: it can be registered by only one owner at a time. However, you can have multiple domains all pointing to the same resource, similar to having several nicknames that all refer to you. Domains essentially translate the technical infrastructure of the internet into a language we can easily understand and remember, bridging the gap between complex technology and everyday human interaction.&lt;/p&gt;

&lt;p&gt;Let's see this DNS resolution in action with a simple command-line tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using 'dig' to see how a domain resolves to an IP address&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;dig google.com +short
142.250.72.110

&lt;span class="c"&gt;# Using 'nslookup' for the same purpose&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;nslookup google.com
Server:     192.168.1.1
Address:    192.168.1.1#53

Non-authoritative answer:
Name:   google.com
Address: 142.250.72.110
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This translation happens behind the scenes every time you visit a website, allowing you to use memorable names instead of numerical addresses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain Characteristics
&lt;/h2&gt;

&lt;p&gt;Domain names can only contain letters (a-z, A-Z), numbers (0-9), and hyphens. A hyphen cannot appear at the beginning or end of a label (the part between two dots). Any other characters are considered invalid for standard domains. The dot character serves a special purpose in domains - it separates different levels of the domain hierarchy rather than being part of a label itself.&lt;/p&gt;

&lt;p&gt;An important characteristic to remember is that domains are case-insensitive, meaning GOOGLE.COM and google.com are treated as identical. This makes domains even more user-friendly, as you don't need to worry about uppercase or lowercase when typing a web address.&lt;/p&gt;

&lt;p&gt;For validating domains, I recommend the regular expression used in IOCSEARCHER as a reference.&lt;/p&gt;
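&lt;p&gt;As an illustration of the character rules above, here is a minimal sketch of a label check. It is not the IOCSEARCHER regular expression; the pattern and the function name &lt;code&gt;has_valid_labels&lt;/code&gt; are assumptions written for this example, and length limits are deliberately left out.&lt;/p&gt;

```python
import re

# One label: letters, digits and hyphens, with no hyphen at either end.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def has_valid_labels(domain):
    """Check only the character rules; length limits are not enforced here."""
    labels = domain.rstrip(".").split(".")
    return all(LABEL_RE.fullmatch(label) for label in labels)


print(has_valid_labels("GOOGLE.COM"))        # True
print(has_valid_labels("xn--caf-dma.com"))   # True
print(has_valid_labels("-bad.example.com"))  # False
print(has_valid_labels("under_score.com"))   # False
```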

&lt;h2&gt;
  
  
  Internationalized Domain Names
&lt;/h2&gt;

&lt;p&gt;Initially, domain names were limited to ASCII, a set of only 128 characters, of which domains use just the letters, digits, and hyphen. This restriction prevented many languages from using their native scripts in domain names. To address this limitation, Internationalized Domain Names (IDNs) were developed to support characters from languages like Chinese, Russian, Hindi, and many others.&lt;/p&gt;

&lt;p&gt;While you can register domains with these non-ASCII characters, they're ultimately converted to Punycode, an instance of the Bootstring algorithm specified in RFC 3492. When an internationalized domain is processed, each label containing non-ASCII characters is encoded and given the prefix "xn--". For example, café.com becomes xn--caf-dma.com, and مثال.إختبار becomes xn--mgbh0fb.xn--kgbechtv. This is why a label cannot have hyphens as its third and fourth characters unless it is an IDN - it would conflict with this encoding system.&lt;/p&gt;

&lt;p&gt;You can see this conversion in action with some simple Python code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;idna&lt;/span&gt;

&lt;span class="c1"&gt;# Convert internationalized domain names to Punycode
&lt;/span&gt;&lt;span class="n"&gt;examples&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;café.com&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;привет.рф&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;よろしく.jp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;domain&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;examples&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;punycode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;idna&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ascii&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;domain&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;punycode&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Output:
# café.com → xn--caf-dma.com
# привет.рф → xn--b1agh1afp.xn--p1ai
# よろしく.jp → xn--28j2a3ar1p.jp
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This internationalization has unfortunately led to security concerns like homograph attacks, where visually similar characters from different scripts can create convincing fake domains. For instance, раypal.com using the Cyrillic 'р' looks nearly identical to paypal.com but leads to a completely different website. Modern browsers have implemented protections against many of these tricks, but users should remain vigilant.&lt;/p&gt;
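&lt;p&gt;One simple way to inspect a suspicious name is to list its non-ASCII characters together with their Unicode names. The sketch below uses only the standard library; the helper name &lt;code&gt;describe_non_ascii&lt;/code&gt; is made up for this example, and it is an inspection aid rather than a complete homograph detector.&lt;/p&gt;

```python
import unicodedata


def describe_non_ascii(domain):
    """List every non-ASCII character in the name with its Unicode name."""
    return [(ch, unicodedata.name(ch, "UNNAMED")) for ch in domain if not ch.isascii()]


lookalike = "\u0440aypal.com"  # first letter is U+0440 (Cyrillic), not Latin "p"
print(describe_non_ascii(lookalike))     # [('р', 'CYRILLIC SMALL LETTER ER')]
print(describe_non_ascii("paypal.com"))  # []
```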

&lt;h2&gt;
  
  
  Domain Levels and the DNS Hierarchy
&lt;/h2&gt;

&lt;p&gt;The domain hierarchy is organized into levels, each separated by a dot. The Top-Level Domain (TLD) like .com is the highest level. Adding sections creates new levels - one.com is a Second-Level Domain (2LD), while two.one.com represents a Third-Level Domain (3LD).&lt;/p&gt;

&lt;p&gt;Here's a visual breakdown of domain levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─ 3rd Level Domain ─┐ ┌─ 2nd Level Domain ─┐ ┌─ TLD ─┐
         blog         .        example       .   com
└───────────────────────────── FQDN ────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common examples of domain levels in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TLD: .com, .org, .net, .edu&lt;/li&gt;
&lt;li&gt;2LD: google.com, wikipedia.org, amazon.com&lt;/li&gt;
&lt;li&gt;3LD: mail.google.com, en.wikipedia.org, aws.amazon.com&lt;/li&gt;
&lt;li&gt;4LD: support.mail.google.com&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each level serves a specific organizational purpose in the hierarchical domain name system.&lt;/p&gt;

&lt;h3&gt;
  
  
  SLD and eSLD: Understanding Domain Registration Boundaries
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;SLD (Second-Level Domain)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The SLD is the label &lt;strong&gt;immediately to the left&lt;/strong&gt; of the public suffix (not just the TLD)&lt;/li&gt;
&lt;li&gt;It's the specific part that identifies the registrant's domain&lt;/li&gt;
&lt;li&gt;Examples:

&lt;ul&gt;
&lt;li&gt;In &lt;code&gt;google.com&lt;/code&gt;, the SLD is &lt;code&gt;google&lt;/code&gt; (left of the &lt;code&gt;.com&lt;/code&gt; public suffix)&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;example.co.uk&lt;/code&gt;, the SLD is &lt;code&gt;example&lt;/code&gt; (left of the &lt;code&gt;.co.uk&lt;/code&gt; public suffix)&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;blog.wordpress.com&lt;/code&gt;, the SLD is &lt;code&gt;blog&lt;/code&gt; (left of the &lt;code&gt;wordpress.com&lt;/code&gt; public suffix)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;eSLD (Effective Second-Level Domain)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The eSLD is the &lt;strong&gt;complete registrable domain&lt;/strong&gt; - the domain at which registration occurs&lt;/li&gt;
&lt;li&gt;It consists of the SLD plus the public suffix&lt;/li&gt;
&lt;li&gt;It represents the boundary of administrative control&lt;/li&gt;
&lt;li&gt;Examples:

&lt;ul&gt;
&lt;li&gt;In &lt;code&gt;google.com&lt;/code&gt;, the eSLD is &lt;code&gt;google.com&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;example.co.uk&lt;/code&gt;, the eSLD is &lt;code&gt;example.co.uk&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;user.github.io&lt;/code&gt;, the eSLD is &lt;code&gt;user.github.io&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;code&gt;mypage.blogspot.com&lt;/code&gt;, the eSLD is &lt;code&gt;mypage.blogspot.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Distinction&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SLD&lt;/strong&gt;: Just the identifying label portion controlled by the registrant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;eSLD&lt;/strong&gt;: The complete domain that represents the unit of ownership/registration, including both the SLD and its public suffix&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical factor is understanding the public suffix list, which defines which domain suffixes are available for public registration (like &lt;code&gt;.com&lt;/code&gt;, &lt;code&gt;.co.uk&lt;/code&gt;, &lt;code&gt;github.io&lt;/code&gt;, etc.).&lt;/p&gt;
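&lt;p&gt;The SLD and eSLD definitions above can be sketched with a tiny, hard-coded sample of suffixes. This is an assumption-heavy illustration: &lt;code&gt;SAMPLE_SUFFIXES&lt;/code&gt; and &lt;code&gt;split_registrable&lt;/code&gt; are hypothetical names, the sample is nowhere near the full list, and the wildcard and exception rules of the real Public Suffix List are not handled.&lt;/p&gt;

```python
# A handful of suffixes for illustration only; the real list has thousands of rules.
SAMPLE_SUFFIXES = {"com", "uk", "co.uk", "io", "github.io", "blogspot.com"}


def split_registrable(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return (SLD, eSLD) for hostname, or None if no registrable part exists."""
    labels = hostname.lower().rstrip(".").split(".")
    # Candidates run from the whole name down to the last label,
    # so the longest matching suffix is found first.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                return None  # the hostname is itself a public suffix
            sld = labels[i - 1]
            return sld, sld + "." + candidate
    return None  # unknown suffix


print(split_registrable("mail.google.com"))    # ('google', 'google.com')
print(split_registrable("www.example.co.uk"))  # ('example', 'example.co.uk')
print(split_registrable("user.github.io"))     # ('user', 'user.github.io')
print(split_registrable("co.uk"))              # None
```

&lt;p&gt;In real code, load the published list (or a library built on it) instead of a hand-written set, because the rules change over time.&lt;/p&gt;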

&lt;h3&gt;
  
  
  Apex Domain (Root Domain or Naked Domain)
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;apex domain&lt;/strong&gt; (also called the &lt;strong&gt;root domain&lt;/strong&gt; or &lt;strong&gt;naked domain&lt;/strong&gt;) refers to a domain without any subdomain prefix. It's the base domain that you register with a domain registrar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It has no subdomain part (no "www" or other prefix)&lt;/li&gt;
&lt;li&gt;It's directly at the "apex" of your domain namespace&lt;/li&gt;
&lt;li&gt;It cannot have a CNAME record in standard DNS (only A, AAAA, MX, TXT, etc.)&lt;/li&gt;
&lt;li&gt;It's the entry point to your domain's DNS zone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;example.com&lt;/code&gt; (not &lt;code&gt;www.example.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;github.io&lt;/code&gt; (not &lt;code&gt;username.github.io&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mydomain.co.uk&lt;/code&gt; (not &lt;code&gt;blog.mydomain.co.uk&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The apex domain is particularly important in DNS configuration and web hosting setups. Many CDNs and cloud providers have special requirements or limitations for apex domains due to DNS constraints. Some services offer workarounds like ANAME, ALIAS, or CNAME flattening to overcome these limitations.&lt;/p&gt;

&lt;p&gt;Understanding the apex domain is crucial when configuring websites, email services, and other internet resources, as it represents the foundation of your domain's identity on the internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  FQDN vs Domain Name: What's the Real Difference?
&lt;/h2&gt;

&lt;p&gt;When navigating the world of DNS and internet naming, terms like &lt;strong&gt;FQDN (Fully Qualified Domain Name)&lt;/strong&gt; and &lt;strong&gt;domain name&lt;/strong&gt; often get used interchangeably — but they're &lt;strong&gt;not the same&lt;/strong&gt;. Understanding the distinction is essential for developers, sysadmins, and anyone dealing with network configuration or web services.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is a Domain Name?
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;domain name&lt;/strong&gt; is a human-readable address used to identify resources on the internet. It typically consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;second-level domain (SLD)&lt;/strong&gt; like &lt;code&gt;example&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;top-level domain (TLD)&lt;/strong&gt; like &lt;code&gt;.com&lt;/code&gt;, &lt;code&gt;.org&lt;/code&gt;, or &lt;code&gt;.net&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;example.com&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;openai.org&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are domain names — they can represent a website, a zone in DNS, or even serve as a base for email routing.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is an FQDN (Fully Qualified Domain Name)?
&lt;/h3&gt;

&lt;p&gt;An &lt;strong&gt;FQDN&lt;/strong&gt; is the complete address of a host &lt;strong&gt;within the DNS hierarchy&lt;/strong&gt;, including &lt;strong&gt;all levels of the domain&lt;/strong&gt;, right up to the root (&lt;code&gt;.&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure of an FQDN:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hostname.subdomain.domain.tld.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;✔️ The trailing dot (&lt;code&gt;.&lt;/code&gt;) is optional in most real-world usage but technically represents the &lt;strong&gt;DNS root&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Examples:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;www.example.com.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mail.google.com.&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;api.openai.org.&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An FQDN &lt;strong&gt;unambiguously identifies a specific resource&lt;/strong&gt; (usually a host or service) on the internet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Differences Between Domain Name and FQDN
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Domain Name&lt;/th&gt;
&lt;th&gt;FQDN&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hierarchy Depth&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Includes Hostname?&lt;/td&gt;
&lt;td&gt;Not necessarily&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ends with Root Dot?&lt;/td&gt;
&lt;td&gt;No (implied)&lt;/td&gt;
&lt;td&gt;Yes (optional, implied)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Example&lt;/td&gt;
&lt;td&gt;&lt;code&gt;example.com&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;www.example.com.&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DNS Resolution?&lt;/td&gt;
&lt;td&gt;Yes, if configured&lt;/td&gt;
&lt;td&gt;Yes, if configured&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Can a Subdomain Be at the Leaf (Instead of a Hostname)?
&lt;/h3&gt;

&lt;p&gt;Yes. The leftmost part of a name like &lt;code&gt;blog.example.com&lt;/code&gt; could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;hostname&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;subdomain&lt;/strong&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bottom line:&lt;/strong&gt; The &lt;strong&gt;leaf node in an FQDN&lt;/strong&gt; is not always a hostname. It depends on how DNS records are configured.&lt;/p&gt;

&lt;h3&gt;
  
  
  Are Both Domain Names and FQDNs Resolvable?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Yes&lt;/strong&gt;, as long as they have the necessary DNS records.&lt;/p&gt;

&lt;p&gt;If there are &lt;strong&gt;no DNS records&lt;/strong&gt;, then neither will resolve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain Name System Hierarchy Explained
&lt;/h3&gt;

&lt;p&gt;The Domain Name System (DNS) has a hierarchical structure similar to a family tree or an organizational chart. Here's how it works in simple terms:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Root&lt;/strong&gt; - At the very top of the hierarchy is what's called the "root," represented by a single dot (.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top-Level Domains (TLDs)&lt;/strong&gt; - The next level down contains domains like:

&lt;ul&gt;
&lt;li&gt;Generic TLDs: .com, .org, .net, .edu&lt;/li&gt;
&lt;li&gt;Country-code TLDs: .uk, .fr, .jp, .ca&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Official TLD List: Authoritative Sources
&lt;/h3&gt;

&lt;p&gt;The official list of Top-Level Domains (TLDs) is maintained by the &lt;strong&gt;Internet Assigned Numbers Authority (IANA)&lt;/strong&gt;, which operates under the Internet Corporation for Assigned Names and Numbers (ICANN). This authoritative registry contains all recognized TLDs in the global DNS root zone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where to find the official TLD list:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;IANA Root Zone Database&lt;/strong&gt;: The most authoritative source, available at &lt;a href="https://www.iana.org/domains/root/db" rel="noopener noreferrer"&gt;https://www.iana.org/domains/root/db&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ICANN TLD Program&lt;/strong&gt;: Information about new gTLDs: &lt;a href="https://newgtlds.icann.org/" rel="noopener noreferrer"&gt;https://newgtlds.icann.org/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public Suffix List&lt;/strong&gt;: Maintained by Mozilla, this list includes both TLDs and public suffixes: &lt;a href="https://publicsuffix.org/" rel="noopener noreferrer"&gt;https://publicsuffix.org/&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

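&lt;p&gt;TLDs are commonly grouped into several types (the IANA Root Zone Database itself labels each entry as generic, country-code, sponsored, infrastructure, generic-restricted, or test):&lt;/p&gt;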

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;gTLD&lt;/strong&gt; (Generic Top-Level Domain): .com, .org, .net, .info&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ccTLD&lt;/strong&gt; (Country Code Top-Level Domain): .us, .uk, .jp, .de&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sTLD&lt;/strong&gt; (Sponsored Top-Level Domain): .edu, .gov, .mil&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IDN ccTLD&lt;/strong&gt; (Internationalized Country Code): .рф (Russia), .中国 (China)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New gTLD&lt;/strong&gt;: .app, .blog, .dev, .shop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The number of TLDs has expanded dramatically since 2013 when ICANN's New gTLD Program introduced hundreds of new generic TLDs. The root zone is regularly updated as new TLDs are approved and added to the global DNS system.&lt;/p&gt;

&lt;p&gt;For developers and security professionals, programmatic access to the IANA database is possible, and many APIs and libraries offer routines to check or validate domains against the current TLD list.&lt;/p&gt;
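&lt;p&gt;As a sketch of such a check: IANA publishes a plain-text list at &lt;code&gt;https://data.iana.org/TLD/tlds-alpha-by-domain.txt&lt;/code&gt;, with one upper-case TLD per line after a comment header. The example below parses a made-up four-entry excerpt in that format rather than downloading the file; the function names and the sample text are assumptions for illustration.&lt;/p&gt;

```python
# Abbreviated, made-up excerpt in the same layout as the IANA text file.
SAMPLE_TLD_FILE = """# Version 0, Last Updated (placeholder header line)
COM
ORG
UK
XN--P1AI
"""


def parse_tld_list(text):
    """Return the TLDs as a lower-case set, skipping comments and blank lines."""
    tlds = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            tlds.add(entry.lower())
    return tlds


def has_known_tld(hostname, tlds):
    """Check whether the last label of hostname appears in the TLD set."""
    last_label = hostname.rstrip(".").rsplit(".", 1)[-1]
    return last_label.lower() in tlds


tlds = parse_tld_list(SAMPLE_TLD_FILE)
print(has_known_tld("bbc.co.uk", tlds))            # True
print(has_known_tld("printer.localdomain", tlds))  # False
```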

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Second-Level Domains&lt;/strong&gt; - These are the names organizations register, like "google" in google.com or "bbc" in bbc.co.uk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subdomains&lt;/strong&gt; - These are additional levels that organizations can create, like "mail" in mail.google.com or "news" in news.bbc.co.uk.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Think of it like a mailing address:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The root is like the planet&lt;/li&gt;
&lt;li&gt;The TLD is like the country&lt;/li&gt;
&lt;li&gt;The second-level domain is like the city&lt;/li&gt;
&lt;li&gt;Subdomains are like the street and building&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you type a web address, your computer follows this hierarchy from right to left to find the correct destination. It starts at the root, then follows the path down through each level until it reaches the specific website or service you're looking for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain Constraints
&lt;/h2&gt;

&lt;p&gt;There are technical limitations to domains. A complete domain name cannot exceed 253 characters in its usual text form (255 octets in the DNS wire format), with each label (section between dots) limited to 63 characters. Since every label needs at least one character plus one length octet, a name can hold at most 127 labels below the root, though such deep hierarchies are rarely used in practice.&lt;/p&gt;

&lt;p&gt;These constraints ensure that domain names remain manageable and compatible with the underlying DNS infrastructure. While most domain registrations use just two or three levels, the system's flexibility allows for more complex organizational structures when needed.&lt;/p&gt;
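&lt;p&gt;The limits can be checked with a little arithmetic. In the DNS wire format each label costs its length plus one length octet, and the root adds one final octet; the whole name may take at most 255 octets. The sketch below assumes ASCII (already Punycode-encoded) names, and its function names are made up for this example.&lt;/p&gt;

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255


def wire_length(name):
    """Octets needed on the wire: (1 + length) per label, plus 1 for the root."""
    labels = name.rstrip(".").split(".")
    return sum(len(label.encode("ascii")) + 1 for label in labels) + 1


def within_limits(name):
    """Check the per-label and whole-name size limits (not the character rules)."""
    labels = name.rstrip(".").split(".")
    labels_ok = all(len(label) in range(1, MAX_LABEL_OCTETS + 1) for label in labels)
    return labels_ok and wire_length(name) in range(1, MAX_NAME_OCTETS + 1)


deepest = ".".join(["a"] * 127)    # 127 one-character labels
print(len(deepest))                # 253
print(wire_length(deepest))        # 255
print(within_limits(deepest))      # True
print(within_limits("a." + deepest))  # False
```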

</description>
      <category>fqdn</category>
      <category>domain</category>
      <category>tld</category>
      <category>domainstructure</category>
    </item>
    <item>
      <title>What is Rootkit?</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Mon, 01 Aug 2022 08:38:09 +0000</pubDate>
      <link>https://dev.to/daud99/what-is-rootkit-4974</link>
      <guid>https://dev.to/daud99/what-is-rootkit-4974</guid>
      <description>&lt;p&gt;If you are a Linux user, the first thing that comes to mind when you hear about rootkits is probably that &lt;strong&gt;this has something to do with the root user&lt;/strong&gt;. You are not wrong, but that is only part of it. Let's define it formally.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;A rootkit is a program that can hide itself, as well as other running processes, files, and network connections, from the host on which it is running.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is the ultimate goal of a rootkit?
&lt;/h2&gt;

&lt;p&gt;The main goal is to run &lt;strong&gt;incognito&lt;/strong&gt;, that is, to keep running in the background undetected for as long as possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the typical functions and characteristics of a rootkit?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1- Stealth Functionality
&lt;/h3&gt;

&lt;p&gt;It aims to hide the intruder's traces by manipulating processes, open files, and network activity, and by changing the access rights or permissions of files and directories.&lt;/p&gt;

&lt;h3&gt;
  
  
  2- Backdoor
&lt;/h3&gt;

&lt;p&gt;One of the main goals of a rootkit is to make sure that the intruder has full remote access to the victim's computer at all times. For example, a rootkit may establish a backdoor using SSH tunneling.&lt;/p&gt;

&lt;h3&gt;
  
  
  3- Sniffing
&lt;/h3&gt;

&lt;p&gt;It also enables the attacker to wiretap and intercept various system components, for example by sending captured data to a particular endpoint or by installing a keylogger.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is the biggest challenge to the attacker?
&lt;/h2&gt;

&lt;p&gt;The biggest challenge, which also differentiates rootkits from other types of malware, is that a rootkit needs to be installed with root privileges in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of Rootkits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;User-mode rootkit: A user-mode rootkit covertly replaces common UNIX binaries or libraries with infected versions to hide its existence and to gain root privileges if needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kernel-mode rootkit: A kernel-mode rootkit operates at the system level and modifies or replaces parts of the kernel, which may already be compromised during the boot process.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a good &lt;a href="https://www.freecodecamp.org/news/the-linux-booting-process-6-steps-described-in-detail/" rel="noopener noreferrer"&gt;blog&lt;/a&gt; on the Linux boot process.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>rootkit</category>
      <category>malware</category>
      <category>rootkitdetection</category>
      <category>security</category>
    </item>
    <item>
      <title>Installing Ubuntu Using VMware Fusion Tech Preview on an M1 (Apple Silicon) Mac</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Sun, 24 Jul 2022 09:44:00 +0000</pubDate>
      <link>https://dev.to/daud99/installing-ubuntu-using-vmware-fusion-tech-preview-on-mac-m1-silicon-4b0e</link>
      <guid>https://dev.to/daud99/installing-ubuntu-using-vmware-fusion-tech-preview-on-mac-m1-silicon-4b0e</guid>
      <description>&lt;h2&gt;
  
  
  Download Ubuntu ISO
&lt;/h2&gt;

&lt;p&gt;Download the ISO named &lt;strong&gt;"ubuntu-20.04.4-live-server-arm64.iso"&lt;/strong&gt; from &lt;a href="https://cdimage.ubuntu.com/releases/20.04/release/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a new Virtual Machine
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wiagh2goqhq9jkz7vjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wiagh2goqhq9jkz7vjr.png" alt="Image description" width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;1- Click on "Install from disk or image".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjacntiy2bebo9h7ypc64.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjacntiy2bebo9h7ypc64.png" alt="Image description" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2- Browse to and select the downloaded Ubuntu ISO file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt9a9v77gqc6ai1jj8il.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqt9a9v77gqc6ai1jj8il.png" alt="Image description" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3- Click Finish.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8i7y3oj198kyf1c5nno.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8i7y3oj198kyf1c5nno.png" alt="Image description" width="800" height="668"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It will automatically start the Virtual Machine. Simply shut it down.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Disabling Network Adapter
&lt;/h2&gt;

&lt;p&gt;1- Go to the Settings&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flry6kvvatmhwge8kkej2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flry6kvvatmhwge8kkej2.png" alt="Image description" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2- Select Network Adapter&lt;/p&gt;

&lt;p&gt;3- Deselect &lt;strong&gt;Connect network adapter&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnotj0arkcxsf8z7mqapp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnotj0arkcxsf8z7mqapp.png" alt="Image description" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Ubuntu
&lt;/h2&gt;

&lt;p&gt;1- Now, start the VM.&lt;br&gt;
2- Click on "Install Ubuntu Server"&lt;br&gt;
3- Select all the defaults and keep on going.&lt;br&gt;
4- Eventually, you will reach here.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4md2wktxs3yo8wwcilh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy4md2wktxs3yo8wwcilh.png" alt="Image description" width="800" height="590"&gt;&lt;/a&gt;&lt;br&gt;
5- Once the installation finishes, "installing system" changes to "Install Complete!". Then select "Reboot Now" and press "Enter" again.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ry3j1cxil343a0xihwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ry3j1cxil343a0xihwi.png" alt="Image description" width="800" height="630"&gt;&lt;/a&gt;&lt;br&gt;
6- The VM will get stuck on the following screen; simply press "Enter" to continue.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fauaeechd9dt7ak0g9010.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fauaeechd9dt7ak0g9010.png" alt="Image description" width="800" height="630"&gt;&lt;/a&gt;&lt;br&gt;
7- Re-enable the network adapter by going to Settings &amp;gt; Network Adapter and selecting the "Connect Network Adapter" option.&lt;br&gt;&lt;br&gt;
8- You will be prompted to enter the username and password that you configured while installing the operating system.&lt;br&gt;
9- You will reach the following screen.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fynk02usjod00igykjpu7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fynk02usjod00igykjpu7.png" alt="Image description" width="800" height="432"&gt;&lt;/a&gt; &lt;br&gt;
10- Also remove the ISO by going to Settings and deselecting "CD/DVD (SATA)".&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsntj0j22uwgvcn9xqog1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsntj0j22uwgvcn9xqog1.png" alt="Image description" width="800" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Updating repositories and rebooting
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Installing Desktop Environment
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;tasksel
&lt;span class="nb"&gt;sudo &lt;/span&gt;tasksel &lt;span class="nb"&gt;install &lt;/span&gt;ubuntu-desktop
&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Install VMWare tools
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; open-vm-tools-desktop
&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot now
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Take Snapshot for fresh installation
&lt;/h2&gt;

&lt;p&gt;We can revert back to it in case there is some issue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikijre64mv3tf3ipqduw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fikijre64mv3tf3ipqduw.png" alt="Image description" width="800" height="267"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wvugbb69g8oskt9n51f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wvugbb69g8oskt9n51f.png" alt="Image description" width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld8itpasvtewh44cdbpp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld8itpasvtewh44cdbpp.png" alt="Image description" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>vmwarefusion</category>
      <category>mac</category>
      <category>m1</category>
      <category>ubuntuinstallation</category>
    </item>
    <item>
      <title>Perceptron</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Fri, 08 Jul 2022 16:38:21 +0000</pubDate>
      <link>https://dev.to/daud99/perceptron-2kmk</link>
      <guid>https://dev.to/daud99/perceptron-2kmk</guid>
      <description>&lt;h2&gt;
  
  
  Perceptron Model
&lt;/h2&gt;

&lt;p&gt;You can think of the perceptron as the basic building block of a neural network. It is inspired by the human neuron.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmog4085lzcb1pizep1g1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmog4085lzcb1pizep1g1.png" alt="Image description" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inputs go into the neuron, which performs some computation on them to produce the output. The function applied inside the neuron is often referred to as the &lt;strong&gt;Activation Function&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Now, in order to learn, we need some parameters that can be adjusted. These parameters are known as &lt;strong&gt;weights&lt;/strong&gt;, and each weight is multiplied by its input.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28xs51zdd7z2ks3mpakv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28xs51zdd7z2ks3mpakv.png" alt="Image description" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is still a problem. What if the input is zero? No matter how we change w, the product stays zero and nothing happens. To solve this problem, we add a bias to each weighted input.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F908m8hu8kkyuzlucw9uf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F908m8hu8kkyuzlucw9uf.png" alt="Image description" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The interesting thing is that the product of an input and its weight has to overcome the bias in order to have some effect on the output.&lt;/p&gt;

&lt;p&gt;Both the weight and the bias can be positive or negative.&lt;/p&gt;

&lt;p&gt;Mathematically, the generalisation to many inputs is &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyk96r4df4irj14t7siak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyk96r4df4irj14t7siak.png" alt="Image description" width="800" height="313"&gt;&lt;/a&gt;&lt;/p&gt;
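
&lt;p&gt;As a rough illustration of the idea above, here is a minimal Python sketch of a single perceptron: it multiplies each input by its weight, adds the bias and passes the result through an activation function. The function names, the sample numbers and the choice of a sigmoid activation are my own assumptions for illustration; they are not fixed by the model described above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def sigmoid(z):
    # Squash any real number into the range (0, 1); assumed activation for this sketch.
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, then the activation function.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# With all inputs at zero the weights have no effect; only the bias moves the output.
zero_input = perceptron([0.0, 0.0], [0.7, -1.2], bias=0.0)
zero_input_with_bias = perceptron([0.0, 0.0], [0.7, -1.2], bias=2.0)
print(zero_input, zero_input_with_bias)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Changing the weights in the first call would not change its output at all, which is exactly the zero-input problem the bias is meant to solve.&lt;/p&gt;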

</description>
      <category>deeplearning</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>perceptron</category>
    </item>
    <item>
      <title>Feature Scaling In Machine Learning</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Thu, 02 Jun 2022 16:09:38 +0000</pubDate>
      <link>https://dev.to/daud99/feature-scaling-in-machine-learning-47be</link>
      <guid>https://dev.to/daud99/feature-scaling-in-machine-learning-47be</guid>
      <description>&lt;h2&gt;
  
  
  Feature Scaling
&lt;/h2&gt;

&lt;p&gt;Feature scaling is the process of bringing all the features, or independent variables (variables other than the target variable), onto almost the same scale so that each feature is treated as equally important.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example:
&lt;/h3&gt;

&lt;p&gt;Consider a dataset that contains a dependent variable (Purchased) and 3 independent variables (Country, Age, and Salary). We can easily notice that the variables are not on the same scale: Age ranges from 27 to 50, while Salary ranges from 48 K to 83 K. The range of Salary is much wider than the range of Age. This will cause issues in our models, since many machine learning models, such as k-means clustering and nearest neighbor classification, are based on the Euclidean distance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Methods for Feature Scaling
&lt;/h2&gt;

&lt;p&gt;There are different methods of feature scaling.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Standardization (Z-score Normalization)&lt;/li&gt;
&lt;li&gt;Max-Min Normalization (Min-Max Scaling)&lt;/li&gt;
&lt;li&gt;Standard Deviation Method&lt;/li&gt;
&lt;li&gt;Range Method&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  1. Standardization
&lt;/h3&gt;

&lt;p&gt;Standardization means rescaling each feature so that its mean and standard deviation are 0 and 1, respectively. Unlike Min-Max scaling, it does not force the values into a specific range such as 0-100 or 0-1.&lt;/p&gt;

&lt;p&gt;A normal distribution with mean 0 and standard deviation 1 is the &lt;strong&gt;Standard Normal Distribution&lt;/strong&gt;. Standardization gives every feature that same mean and standard deviation, although it does not change the shape of the distribution. This is also known as &lt;strong&gt;Z-Score Normalization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Well, the idea is &lt;strong&gt;simple&lt;/strong&gt;. Variables that are measured at different scales do not contribute equally to the model fitting and the learned function, and might end up creating a bias. Thus, to deal with this potential problem, feature-wise standardization (μ=0, σ=1) is usually applied prior to model fitting.&lt;/p&gt;

&lt;p&gt;Standardization comes into the picture when the features of the input data set have large differences between their ranges, or simply when they are measured in different units (e.g., pounds, meters, miles, etc.).&lt;/p&gt;

&lt;p&gt;These differences in the ranges of the initial features cause trouble for many machine learning models. For example, for models that are based on distance computation, if one of the features has a broad range of values, the distance will be governed by this particular feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0yz4urcpqhjkvz51bueg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0yz4urcpqhjkvz51bueg.png" alt="Image description" width="743" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To illustrate this with an example: say we have a 2-dimensional data set with two features, Height in Meters and Weight in Pounds, that range respectively from [1 to 2] Meters and [10 to 200] Pounds. No matter what distance based model you fit on this data set, the Weight feature will dominate over the Height feature and will contribute more to the distance computation, just because it has bigger values compared to the Height. So, to prevent this problem, transforming features to comparable scales using standardization is the solution. &lt;/p&gt;
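
&lt;p&gt;The Height and Weight example can be checked with a few lines of Python. This is only a sketch: the three people and their measurements are made-up values, and &lt;code&gt;math.dist&lt;/code&gt; (Python 3.8+) is used for the Euclidean distance.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

# Hypothetical people described as (height in meters, weight in pounds).
person_a = (1.60, 150.0)
person_b = (1.95, 152.0)  # very different height, similar weight
person_c = (1.61, 170.0)  # almost the same height, different weight

# Euclidean distance on the raw, unscaled features.
dist_ab = math.dist(person_a, person_b)
dist_ac = math.dist(person_a, person_c)
print(round(dist_ab, 2), round(dist_ac, 2))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Even though person_b is 35 cm taller than person_a, the raw distance puts them much closer together than person_a and person_c, because the difference in pounds swamps the difference in meters.&lt;/p&gt;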

&lt;p&gt;The following formula is used to perform Standardization for each value of the feature.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4udy3is7p0cyy5ib0je.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4udy3is7p0cyy5ib0je.png" alt="Image description" width="335" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Python Implementation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StandardScaler&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StandardScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;span class="n"&gt;data_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
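
&lt;p&gt;To see what the scaler does under the hood, here is a small sketch that applies the standardization formula by hand using only the standard library. The list of ages is made up for illustration. It uses the population standard deviation, which is what &lt;code&gt;StandardScaler&lt;/code&gt; uses as well.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics

# Hypothetical ages, used only for illustration.
ages = [27.0, 30.0, 38.0, 40.0, 50.0]

mean = statistics.mean(ages)
std = statistics.pstdev(ages)  # population standard deviation

# z = (x - mean) / std for every value of the feature.
ages_standardized = [(x - mean) / std for x in ages]

print(ages_standardized)
# The rescaled feature has mean 0 and standard deviation 1 (up to rounding).
print(statistics.mean(ages_standardized), statistics.pstdev(ages_standardized))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;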



&lt;h3&gt;
  
  
  2. Max-Min Normalization
&lt;/h3&gt;

&lt;p&gt;It is also known as Min-Max Scaling, and some blogs simply call it &lt;strong&gt;Scaling&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;However, in most of the places I came across, it is simply known as &lt;strong&gt;Normalization&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;I know this is confusing. Lol! But this is how I understand it.&lt;/p&gt;

&lt;p&gt;It is defined as&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"&lt;/em&gt;&lt;em&gt;Technique in which values are shifted and rescaled so that they end up ranging between 0 and 1.&lt;/em&gt;&lt;em&gt;"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the formula&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhw6vfxk9s7l9u3dwo3f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhw6vfxk9s7l9u3dwo3f.png" alt="Image description" width="181" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, Xmax and Xmin are the maximum and the minimum values of the feature respectively.&lt;/p&gt;

&lt;h4&gt;
  
  
  Python Implementation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MinMaxScaler&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MinMaxScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;span class="n"&gt;data_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
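
&lt;p&gt;The same formula can be applied by hand. The sketch below uses a made-up list of salaries and plain Python; the variable names are my own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical salaries (in thousands), used only for illustration.
salaries = [48.0, 52.0, 61.0, 72.0, 83.0]

x_min = min(salaries)
x_max = max(salaries)

# x_scaled = (x - x_min) / (x_max - x_min) for every value of the feature.
salaries_normalized = [(x - x_min) / (x_max - x_min) for x in salaries]

print(salaries_normalized)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The smallest salary maps to 0, the largest to 1, and everything else lands in between.&lt;/p&gt;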



&lt;h3&gt;
  
  
  3. Robust Scaling
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;RobustScaler&lt;/code&gt; also scales the features, but it does so using &lt;strong&gt;statistics that are robust to outliers&lt;/strong&gt;. This scaler removes the &lt;strong&gt;median&lt;/strong&gt; and &lt;strong&gt;scales&lt;/strong&gt; the data according to the &lt;strong&gt;quantile&lt;/strong&gt; &lt;strong&gt;range&lt;/strong&gt; (defaults to &lt;strong&gt;IQR&lt;/strong&gt;: Interquartile Range). &lt;em&gt;The IQR is the range between the 1st quartile (25th percentile) and the 3rd quartile (75th percentile).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Scaling using the median and quantiles consists of subtracting the median from all the observations and then dividing by the interquartile difference. Both are &lt;a href="https://towardsai.net/p/statistics/descriptive-statistics-21fc0196c1df" rel="noopener noreferrer"&gt;statistics&lt;/a&gt; that are barely influenced by a few extreme values.&lt;/p&gt;

&lt;p&gt;The interquartile difference is the difference between the 75th and 25th percentiles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IQR = 75th percentile - 25th percentile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The equation to calculate scaled values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X_scaled = (X - X.median) / IQR
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Python Implementation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.preprocessing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RobustScaler&lt;/span&gt;
&lt;span class="n"&gt;scaler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RobustScaler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
&lt;span class="n"&gt;data_scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scaler&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
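
&lt;p&gt;Here is a hand-rolled sketch of the same equation using only the standard library. The values are made up, with one deliberate outlier. The quartiles are computed with the "inclusive" method of &lt;code&gt;statistics.quantiles&lt;/code&gt; (Python 3.8+), which interpolates linearly between data points; I am assuming this is close enough to what the scaler does for the purpose of illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics

# Hypothetical feature with one obvious outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]

median = statistics.median(values)
# Quartiles with linear interpolation over the data range ("inclusive" method).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# X_scaled = (X - median) / IQR
values_robust = [(x - median) / iqr for x in values]

print(values_robust)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The four ordinary values stay evenly spread around 0, because the median and the IQR ignore how extreme the outlier is. The outlier itself is still clearly visible after scaling.&lt;/p&gt;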



&lt;h2&gt;
  
  
  What are the main questions when it comes to feature scaling?
&lt;/h2&gt;

&lt;p&gt;We mainly need to answer two questions.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Do we need to do feature scaling?&lt;/li&gt;
&lt;li&gt;If yes, which method of feature scaling should we use: Standardization, Normalization, etc.?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When should we use feature scaling?
&lt;/h2&gt;

&lt;p&gt;1- Gradient Descent Based Algorithms.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linear Regression&lt;/li&gt;
&lt;li&gt;Logistic Regression&lt;/li&gt;
&lt;li&gt;Neural Networks&lt;/li&gt;
&lt;li&gt;etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;2- Distance Based Algorithms&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KNN&lt;/li&gt;
&lt;li&gt;K-means&lt;/li&gt;
&lt;li&gt;SVM&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to perform standardization?
&lt;/h2&gt;

&lt;p&gt;As seen above, for distance based models, standardization is performed to prevent features with wider ranges from dominating the distance metric. But the reason we standardize data is not the same for all machine learning models, and differs from one model to another.&lt;/p&gt;

&lt;p&gt;So, before which ML models and methods do you have to standardize your data, and why?&lt;/p&gt;

&lt;h3&gt;
  
  
  1- BEFORE PCA:
&lt;/h3&gt;

&lt;p&gt;In &lt;a href="https://builtin.com/data-science/step-step-explanation-principal-component-analysis" rel="noopener noreferrer"&gt;Principal Component Analysis&lt;/a&gt;, features with high variances or wide ranges get more weight than those with low variance, and consequently they end up illegitimately dominating the first principal components (the components with maximum variance). I used the word “illegitimately” here because these features have high variances compared to the other ones only because they were measured on different scales.&lt;/p&gt;

&lt;p&gt;Standardization can prevent this by giving the same weightage to all features.&lt;/p&gt;

&lt;h3&gt;
  
  
  2- BEFORE CLUSTERING:
&lt;/h3&gt;

&lt;p&gt;Clustering models are distance based algorithms: in order to measure similarities between observations and form clusters, they use a distance metric. So, features with wide ranges will have a bigger influence on the clustering. Therefore, standardization is required before building a clustering model.&lt;/p&gt;

&lt;h3&gt;
  
  
  3- BEFORE KNN:
&lt;/h3&gt;

&lt;p&gt;k-nearest neighbors is a distance based classifier that classifies new observations based on similarity measures (e.g., distance metrics) with labeled observations of the training set. Standardization makes all variables contribute equally to the similarity measures.&lt;/p&gt;

&lt;h3&gt;
  
  
  4- BEFORE SVM
&lt;/h3&gt;

&lt;p&gt;Support Vector Machine tries to maximize the distance between the separating plane and the support vectors. If one feature has very large values, it will dominate over other features when calculating the distance. So Standardization gives all features the same influence on the distance metric.&lt;/p&gt;

&lt;h3&gt;
  
  
  5- BEFORE MEASURING VARIABLE IMPORTANCE IN REGRESSION MODELS
&lt;/h3&gt;

&lt;p&gt;You can measure variable importance in regression analysis, by fitting a regression model using the &lt;strong&gt;standardized&lt;/strong&gt; independent variables and comparing the absolute value of their standardized coefficients. But, if the independent variables are not standardized, comparing their coefficients becomes meaningless.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This one is also known as &lt;strong&gt;Feature importance&lt;/strong&gt; measuring.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  6- BEFORE LASSO AND RIDGE REGRESSION
&lt;/h3&gt;

&lt;p&gt;LASSO and Ridge regressions place a penalty on the magnitude of the coefficients associated with each variable, and the scale of the variables affects how much penalty is applied to their coefficients: the coefficients of variables with large variance are small and thus less penalized. Therefore, standardization is required before fitting both regressions.&lt;/p&gt;
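
&lt;p&gt;To make the link between a variable's scale and the size of its coefficient concrete, here is a small sketch with made-up data. It fits an ordinary one-variable least-squares line (not LASSO or Ridge themselves) to the same distances expressed once in kilometers and once in meters. The helper &lt;code&gt;ols_slope&lt;/code&gt; is my own illustrative function.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics

def ols_slope(xs, ys):
    # Slope of a one-variable least-squares fit: cov(x, y) / var(x).
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var

# Hypothetical data: the same distances expressed in kilometers and in meters.
distance_km = [1.0, 2.0, 3.0, 4.0]
distance_m = [x * 1000.0 for x in distance_km]
price = [12.0, 19.0, 31.0, 38.0]

slope_km = ols_slope(distance_km, price)
slope_m = ols_slope(distance_m, price)
print(slope_km, slope_m)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The relationship is identical in both cases, but the coefficient in meters is 1000 times smaller. A penalty on coefficient magnitude would therefore treat the two versions of the same feature very differently, which is why the features are standardized first.&lt;/p&gt;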

&lt;h2&gt;
  
  
  When is standardization not needed?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  LOGISTIC REGRESSION AND TREE BASED MODELS
&lt;/h3&gt;

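&lt;p&gt;Unregularized Logistic Regression and tree based algorithms such as Decision Tree, Random Forest and Gradient Boosting are not sensitive to the magnitude of variables when it comes to the predictions they produce. So standardization is not strictly needed before fitting this kind of model, although, as listed above, scaling can still help Logistic Regression when it is trained with gradient descent.&lt;/p&gt;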

&lt;h2&gt;
  
  
  When to do Normalization?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbours and Neural Networks.&lt;/li&gt;
&lt;li&gt;However, at the end of the day, the choice of using normalization or standardization will depend on your problem and the machine learning algorithm you are using.&lt;/li&gt;
&lt;li&gt;There is no hard and fast rule to tell you when to normalize or standardize your data. You can always start by fitting your model to raw, normalized, and standardized data and compare the performance for the best results.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Difference between normalization and standardization
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Normalization is good to use when you know that the distribution of your data does not follow a Gaussian distribution. This can be useful in algorithms that do not assume any distribution of the data like K-Nearest Neighbours and Neural Networks.&lt;/li&gt;
&lt;li&gt;Standardization, on the other hand, can be helpful in cases where the data follows a Gaussian distribution. However, this does not necessarily have to be true. Also, unlike normalization, standardization does not have a bounding range. So, even if you have outliers in your data, standardization will not squeeze them into a fixed range, although they still influence the mean and standard deviation used for scaling.&lt;/li&gt;
&lt;li&gt;It is good practice to fit the scaler on the training data and then use it to transform the testing data. This avoids any data leakage during the model testing process. Also, scaling the target values is generally not required.&lt;/li&gt;
&lt;/ul&gt;
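
&lt;p&gt;The "fit on the training data, then transform the testing data" advice can be sketched by hand as follows. The numbers are made up. In scikit-learn the same idea corresponds to calling &lt;code&gt;fit&lt;/code&gt; on the training set and &lt;code&gt;transform&lt;/code&gt; on both sets.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics

# Hypothetical split of one feature into training and testing values.
train = [27.0, 30.0, 38.0, 40.0, 50.0]
test = [35.0, 48.0]

# "Fit": learn the statistics from the training data only.
train_mean = statistics.mean(train)
train_std = statistics.pstdev(train)

# "Transform": reuse the training statistics for both sets.
train_scaled = [(x - train_mean) / train_std for x in train]
test_scaled = [(x - train_mean) / train_std for x in test]

print(test_scaled)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing about the test values is used when computing the mean and standard deviation, so no information leaks from the test set into the preprocessing step.&lt;/p&gt;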

&lt;h2&gt;
  
  
  Visualizing unscaled, normalized and standardized data
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96m9zxah57y0y8b18hef.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96m9zxah57y0y8b18hef.png" alt="Image description" width="483" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  After Normalization
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj2ly2a214h7jf80qwl1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj2ly2a214h7jf80qwl1.png" alt="Image description" width="471" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  After Standardization
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx04qk6ckbhoth7qu5qs3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx04qk6ckbhoth7qu5qs3.png" alt="Image description" width="454" height="393"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How are outliers dealt with in Standardization vs. Normalization?
&lt;/h3&gt;

&lt;p&gt;For &lt;strong&gt;Standardized data&lt;/strong&gt;, outliers exist just as they exist in the &lt;strong&gt;Original data&lt;/strong&gt;. In contrast to standardization, in &lt;strong&gt;Normalized data&lt;/strong&gt; the cost of having a bounded range is that we end up with smaller standard deviations, which can suppress the effect of outliers. However, &lt;strong&gt;Normalization&lt;/strong&gt; is still sensitive to outliers, because a single extreme value sets the minimum or maximum and squeezes the remaining values into a narrow band.&lt;/p&gt;
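
&lt;p&gt;A quick way to get a feel for this is to scale a small made-up feature with one outlier using both methods and look at how far apart the ordinary values end up. This is only a sketch with hypothetical numbers, using plain Python.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import statistics

# Hypothetical feature with one outlier (100).
values = [1.0, 2.0, 3.0, 4.0, 100.0]

low, high = min(values), max(values)
normalized = [(x - low) / (high - low) for x in values]

mean, std = statistics.mean(values), statistics.pstdev(values)
standardized = [(x - mean) / std for x in values]

# Spread of the four "ordinary" values after each transformation.
print(normalized[3] - normalized[0])
print(standardized[3] - standardized[0])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In both cases the single outlier compresses the ordinary values into a small band, while the outlier itself remains the largest value after scaling.&lt;/p&gt;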

&lt;h2&gt;
  
  
  Points worth noting
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;We can see that the &lt;strong&gt;Normalized data&lt;/strong&gt; have different means, and as the &lt;strong&gt;MEAN&lt;/strong&gt; changes, so does the Standard Deviation. However, the &lt;strong&gt;Standardized data&lt;/strong&gt; all have the same &lt;strong&gt;MEAN&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalized data&lt;/strong&gt; have a fixed range, i.e. between 0 and 1. However, the range of &lt;strong&gt;Standardized data&lt;/strong&gt; varies. &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Further Readings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/pulse/standardization-machine-learning-sachin-vinay/?trk=public_profile_article_view" rel="noopener noreferrer"&gt;https://www.linkedin.com/pulse/standardization-machine-learning-sachin-vinay/?trk=public_profile_article_view&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/code/rtatman/data-cleaning-challenge-scale-and-normalize-data/notebook#Scaling-vs.-Normalization:-What's-the-difference" rel="noopener noreferrer"&gt;https://www.kaggle.com/code/rtatman/data-cleaning-challenge-scale-and-normalize-data/notebook#Scaling-vs.-Normalization:-What's-the-difference&lt;/a&gt;?&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kdnuggets.com/2020/04/data-transformation-standardization-normalization.html" rel="noopener noreferrer"&gt;https://www.kdnuggets.com/2020/04/data-transformation-standardization-normalization.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://builtin.com/data-science/when-and-why-standardize-your-data" rel="noopener noreferrer"&gt;https://builtin.com/data-science/when-and-why-standardize-your-data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/" rel="noopener noreferrer"&gt;https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsai.net/p/data-science/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff" rel="noopener noreferrer"&gt;https://towardsai.net/p/data-science/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsdatascience.com/how-and-why-to-standardize-your-data-996926c2c832" rel="noopener noreferrer"&gt;https://towardsdatascience.com/how-and-why-to-standardize-your-data-996926c2c832&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>featurescaling</category>
      <category>preprocessing</category>
    </item>
    <item>
      <title>P-Value (Significance Value)</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Sun, 29 May 2022 17:26:00 +0000</pubDate>
      <link>https://dev.to/daud99/p-value-significane-value-5e48</link>
      <guid>https://dev.to/daud99/p-value-significane-value-5e48</guid>
      <description>&lt;p&gt;The p-value is a probability between 0 and 1: the probability of observing a result at least as extreme as the one we got, assuming that the Null Hypothesis is true. The smaller the p-value, the stronger the evidence against the Null Hypothesis; a large p-value only means that the data are consistent with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Explanation
&lt;/h2&gt;

&lt;p&gt;Say we know the mean traffic coming to our website. We then make some changes and want to know whether the mean traffic has changed. We start by establishing the Null Hypothesis and the Alternative Hypothesis. The Null Hypothesis is the default assumption, and the Alternative Hypothesis is what we want to find evidence for. In this case, the Null Hypothesis is &lt;strong&gt;average traffic doesn't change&lt;/strong&gt; and the Alternative Hypothesis is &lt;strong&gt;average traffic to the website increases&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Next, we decide on the significance level. To check whether the Alternative Hypothesis holds, we take a sample and calculate its mean traffic. Then we calculate the probability of getting a sample mean at least that extreme given that the Null Hypothesis is true; this is nothing but the &lt;strong&gt;p-value&lt;/strong&gt;. If it is less than the significance level, we reject the Null Hypothesis; if not, we fail to reject it. Rejecting the Null Hypothesis is the same as saying that the sample mean lies too far from the mean expected under the Null Hypothesis, measured as a z-score (for example, beyond about 1.96 standard errors for a two-sided test at the 0.05 level), to be explained by random sampling alone.&lt;/p&gt;
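&lt;p&gt;As a rough sketch of this procedure, the snippet below runs a one-sided z-test for the traffic example using only the Python standard library. All numbers (&lt;code&gt;baseline_mean&lt;/code&gt;, &lt;code&gt;baseline_std&lt;/code&gt;, &lt;code&gt;sample_mean&lt;/code&gt;, &lt;code&gt;n&lt;/code&gt;) are hypothetical, and it assumes the standard deviation of daily traffic is known.&lt;/p&gt;

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration
baseline_mean = 1000.0   # average daily visits under the Null Hypothesis
baseline_std = 100.0     # assumed known standard deviation of daily visits
sample_mean = 1030.0     # average daily visits observed after the change
n = 36                   # number of days in the sample
alpha = 0.05             # significance level

# z-score of the sample mean under the Null Hypothesis
standard_error = baseline_std / sqrt(n)
z = (sample_mean - baseline_mean) / standard_error

# One-sided p-value: P(sample mean at least this large | Null Hypothesis is true)
p_value = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject the Null Hypothesis" if alpha > p_value else "Fail to reject the Null Hypothesis")
```

&lt;p&gt;When the standard deviation has to be estimated from a small sample, a t-test would normally be used instead of the z-test sketched here.&lt;/p&gt;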

&lt;h2&gt;
  
  
  Interpreting P-value
&lt;/h2&gt;

&lt;p&gt;The closer the p-value is to zero, the stronger the evidence against the Null Hypothesis and in favour of the Alternative Hypothesis.&lt;/p&gt;

&lt;p&gt;If the p-value is less than the significance level, which is usually 0.05, we say that the observed result lies so far from the mean expected under the Null Hypothesis that we reject the Null Hypothesis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87c1adhfntt9v9wlst2h.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87c1adhfntt9v9wlst2h.jpg" alt="Image description" width="535" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the grey shaded area represents the rejection region: if the observed value falls inside it, the p-value is below the significance level (say 0.05), which is equivalent to saying that we reject the Null Hypothesis.&lt;/p&gt;

&lt;p&gt;The p-value doesn't tell us how far our value is from the actual value (the size of the effect); it only tells us how unlikely a result like ours would be if the Null Hypothesis were true.&lt;/p&gt;

&lt;h2&gt;
  
  
  Significance Level
&lt;/h2&gt;

&lt;p&gt;The usual value for the significance level is 0.05, although a domain expert may suggest a different one. It is also known as the decision threshold.&lt;/p&gt;

&lt;p&gt;If the problem is not that sensitive and we can tolerate a greater number of false positives, we can use a larger significance level such as 0.20. Similarly, for a sensitive problem such as predicting cancer, we would choose a smaller significance level such as 0.01.&lt;/p&gt;

&lt;p&gt;Rejecting a Null Hypothesis at the 0.01 level means that there is less than a 1 in 100 chance of observing a result in this range if the Null Hypothesis were true.&lt;/p&gt;

&lt;h2&gt;
  
  
  False Positive
&lt;/h2&gt;

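&lt;p&gt;Getting a p-value smaller than the significance level, and therefore rejecting the Null Hypothesis, when the Null Hypothesis is actually true is known as a &lt;strong&gt;False Positive&lt;/strong&gt; (Type I error). The significance level is the false positive rate we are willing to accept.&lt;/p&gt;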

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=vemZtEM63GY&amp;amp;t=10s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=vemZtEM63GY&amp;amp;t=10s&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=zII6KLR4Lb4&amp;amp;t=3s" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=zII6KLR4Lb4&amp;amp;t=3s&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=KS6KEWaoOOE" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=KS6KEWaoOOE&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=-FtlH4svqx4" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=-FtlH4svqx4&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>pvalue</category>
      <category>significancevalue</category>
      <category>statistics</category>
      <category>datascience</category>
    </item>
    <item>
      <title>General confusion related to Feature Selection</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Fri, 27 May 2022 09:49:23 +0000</pubDate>
      <link>https://dev.to/daud99/general-confusion-related-to-feature-selection-2jdi</link>
      <guid>https://dev.to/daud99/general-confusion-related-to-feature-selection-2jdi</guid>
      <description>&lt;h2&gt;
  
  
  Should I do Feature Selection on the entire dataset?
&lt;/h2&gt;

&lt;p&gt;The answer is &lt;strong&gt;NO&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The reason is that this results in bias and data leakage. We must always make sure that our test data stays completely unseen and is used only to assess the performance of our machine learning model. If we perform feature selection on the entire dataset, this no longer holds true.&lt;/p&gt;

&lt;p&gt;The model gets an unfair advantage because the features are selected based on all the samples, including the test samples.&lt;/p&gt;

&lt;h2&gt;
  
  
  When should we do the feature selection?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;First, split your data into train and test data.&lt;/li&gt;
&lt;li&gt;Then, do the feature selection on the training data only.&lt;/li&gt;
&lt;li&gt;Once you have done the feature selection on the training data, train your model.&lt;/li&gt;
&lt;li&gt;Finally, select the same features from the test data and perform the prediction.&lt;/li&gt;
&lt;/ol&gt;
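&lt;p&gt;The four steps above can be sketched with scikit-learn as follows. The synthetic data from &lt;code&gt;make_regression&lt;/code&gt;, the choice of &lt;code&gt;SelectKBest&lt;/code&gt; with &lt;code&gt;k=3&lt;/code&gt;, and the &lt;code&gt;LinearRegression&lt;/code&gt; model are assumptions made purely for illustration.&lt;/p&gt;

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset (an assumption for illustration)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# 1. Split first, so the test data stays unseen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Fit the selector on the training data only
selector = SelectKBest(f_regression, k=3).fit(X_train, y_train)

# 3. Train the model on the selected training features
model = LinearRegression().fit(selector.transform(X_train), y_train)

# 4. Select the same features from the test data and predict
X_test_selected = selector.transform(X_test)
y_pred = model.predict(X_test_selected)
print("R^2 on test data:", model.score(X_test_selected, y_test))
```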

&lt;h2&gt;
  
  
  How is feature selection affected when using K-Fold Cross Validation?
&lt;/h2&gt;

&lt;p&gt;The order remains the same: first split, then do the feature selection, separately inside every fold.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"CV methods are proven to be unbiased only if all the various aspects of classifier training takes place inside the CV loop. This means that all aspects of training a classifier e.g. feature selection, classifier type selection and classifier parameter tuning takes place on the data not left out during each CV loop. It has been shown that violating this principle in some ways can result in very biased estimates of the true error. "&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The right way to Cross Validate with feature selection&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
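&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Assumes x (2-D NumPy array), y (1-D NumPy array) and clf (a scikit-learn estimator) are already defined
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.feature_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SelectKBest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f_regression&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.model_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KFold&lt;/span&gt;

&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="c1"&gt;# KFold(n_splits=5).split(x) yields the train/test index arrays of each fold
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nc"&gt;KFold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_splits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;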
    &lt;span class="n"&gt;xtrain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;xtest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytrain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;train&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SelectKBest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f_regression&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xtrain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytrain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;xtrain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xtrain&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_support&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
    &lt;span class="n"&gt;xtest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;xtest&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_support&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;

    &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xtrain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytrain&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xtest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytest&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;yp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;xtest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;o&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ytest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ytest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r-&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Predicted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Observed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CV Score is &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
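&lt;p&gt;A more compact way to get the same guarantee is to wrap the selector and the model in a scikit-learn &lt;code&gt;Pipeline&lt;/code&gt; and let &lt;code&gt;cross_val_score&lt;/code&gt; run the loop. This is only a sketch: the synthetic data and the &lt;code&gt;LinearRegression&lt;/code&gt; model are assumptions, not part of the original example.&lt;/p&gt;

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data standing in for a real dataset (an assumption for illustration)
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, random_state=0)

# The selector is re-fitted on the training part of every fold,
# so the held-out fold never influences which features are chosen
pipeline = make_pipeline(SelectKBest(f_regression, k=2), LinearRegression())
scores = cross_val_score(pipeline, X, y, cv=5)
print("CV score is", scores.mean())
```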



&lt;h2&gt;
  
  
  Should I do Feature encoding such as One hot or Ordinal encoding before or after the Feature Selection?
&lt;/h2&gt;

&lt;p&gt;One should do feature encoding &lt;strong&gt;before&lt;/strong&gt; feature selection. One intuition behind this: since the model will consume the encoded features, we should measure their importance in the same form in which the model will use them.&lt;/p&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/56308116/should-feature-selection-be-done-before-train-test-split-or-after" rel="noopener noreferrer"&gt;https://stackoverflow.com/questions/56308116/should-feature-selection-be-done-before-train-test-split-or-after&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nodalpoint.com/not-perform-feature-selection/" rel="noopener noreferrer"&gt;https://www.nodalpoint.com/not-perform-feature-selection/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nbviewer.org/github/cs109/content/blob/master/lec_10_cross_val.ipynb" rel="noopener noreferrer"&gt;https://nbviewer.org/github/cs109/content/blob/master/lec_10_cross_val.ipynb&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stats.stackexchange.com/questions/64825/should-feature-selection-be-performed-only-on-training-data-or-all-data" rel="noopener noreferrer"&gt;https://stats.stackexchange.com/questions/64825/should-feature-selection-be-performed-only-on-training-data-or-all-data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://followthedata.wordpress.com/2013/10/30/the-importance-of-proper-cross-validation-and-experimental-design/" rel="noopener noreferrer"&gt;https://followthedata.wordpress.com/2013/10/30/the-importance-of-proper-cross-validation-and-experimental-design/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://datascience.stackexchange.com/questions/95071/should-i-do-one-hot-encoding-before-feature-selection-and-how-should-i-perform-f" rel="noopener noreferrer"&gt;https://datascience.stackexchange.com/questions/95071/should-i-do-one-hot-encoding-before-feature-selection-and-how-should-i-perform-f&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stats.stackexchange.com/questions/440372/feature-selection-before-or-after-encoding" rel="noopener noreferrer"&gt;https://stats.stackexchange.com/questions/440372/feature-selection-before-or-after-encoding&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>featureselection</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>scikitlearn</category>
    </item>
    <item>
      <title>Embedded Methods for Feature Selection</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Thu, 26 May 2022 18:01:11 +0000</pubDate>
      <link>https://dev.to/daud99/embedded-methods-for-feature-selection-419d</link>
      <guid>https://dev.to/daud99/embedded-methods-for-feature-selection-419d</guid>
      <description>&lt;h2&gt;
  
  
  L1 Regularized Logistic Regression
&lt;/h2&gt;

&lt;p&gt;Let's have a brief overview of &lt;strong&gt;Regularization&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Regularization helps us with the problem of a model overfitting the training dataset. Instead of only decreasing the LOSS function, we also penalize the model complexity.&lt;/p&gt;

&lt;p&gt;There are different forms of regularization.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;L1 Regularization.&lt;/li&gt;
&lt;li&gt;L2 Regularization.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We will discuss them in detail in future blogs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using L1 (LASSO) Regularization
&lt;/h2&gt;

&lt;p&gt;L1 regularization can also be used as a method of feature selection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowqq8ur8572odjnjrbxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowqq8ur8572odjnjrbxd.png" alt="Image description" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will discuss the loss-minimising part in some other blog. For this discussion, the interesting part is the &lt;strong&gt;L1 norm&lt;/strong&gt;, which plays the main role in feature selection. The &lt;strong&gt;L1 norm&lt;/strong&gt; is a measure of how big the weights are. Here, &lt;strong&gt;m&lt;/strong&gt; is the number of features in the dataset, whereas |W| is the sum of the absolute values of all the weights. You can think of &lt;strong&gt;Lambda&lt;/strong&gt; as a scaling factor; it is a hyperparameter that we have to tune in practice. The &lt;strong&gt;L1 norm&lt;/strong&gt; term acts as a penalty added to the LOSS function: the greater the complexity, the greater the penalty, and vice versa.&lt;/p&gt;

&lt;p&gt;Our final LOSS function is&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tfct8lki2aktfp9sy2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tfct8lki2aktfp9sy2a.png" alt="Image description" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our goal is to minimize the overall LOSS function, but the &lt;strong&gt;L1 norm&lt;/strong&gt; term adds a positive number to it. So, to minimize the loss we also need to minimize this term, which is only possible with small weights (a less complex model). Our goal is therefore to find &lt;strong&gt;weights&lt;/strong&gt; that are not only good for predictions but also as small as possible, so that the overall loss is low.&lt;/p&gt;

&lt;p&gt;If Lambda is large, the best trade-off between minimizing the original loss term and minimizing the penalty term tends to lie at a point where one of the weights, or usually more than one, is exactly zero.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can we use it for feature selection?
&lt;/h3&gt;

&lt;p&gt;For features on comparable scales, the greater the magnitude of a weight, the more important the feature. So, we can remove the features with zero or very small weights, or select the features with the largest weights.&lt;/p&gt;

&lt;p&gt;Most of this info is derived from this &lt;a href="https://www.youtube.com/watch?v=_aGWjt7GKBE&amp;amp;list=PLTKMiZHVd_2KyGirGEvKlniaWeLOHhUF3&amp;amp;index=89" rel="noopener noreferrer"&gt;video&lt;/a&gt;.&lt;/p&gt;
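&lt;p&gt;As a sketch of this idea in scikit-learn, the snippet below fits an L1-regularized logistic regression and keeps only the features with non-zero weights. The synthetic data and the value &lt;code&gt;C=0.05&lt;/code&gt; are assumptions chosen for illustration; in scikit-learn, &lt;code&gt;C&lt;/code&gt; is the inverse of the regularization strength, so a smaller &lt;code&gt;C&lt;/code&gt; plays the role of a larger Lambda.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for a real dataset (an assumption for illustration)
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)  # put the weights on a comparable scale

# Smaller C means stronger L1 regularization (C is the inverse of Lambda)
l1_model = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(X, y)

# Keep only the features whose weight was not driven to (effectively) zero
selector = SelectFromModel(l1_model, prefit=True)
X_selected = selector.transform(X)
print("non-zero weights:", int((l1_model.coef_ != 0).sum()), "of", X.shape[1])
print("shape after selection:", X_selected.shape)
```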

&lt;h2&gt;
  
  
  Using Decision Trees &amp;amp; Random Forest
&lt;/h2&gt;

&lt;p&gt;In Logistic Regression we use all of the features, unless we use L1 regularization, which zeroes out the weights of some of them. In a Decision Tree, however, feature selection is done implicitly: at each split, the feature that reduces the entropy the most is selected, with the goal of getting the entropy down to 0. Other criteria, such as Gini or any other impurity measure, can be used as well. The reduction in impurity achieved by a split is also known as Information Gain.&lt;/p&gt;

&lt;p&gt;In short, decision trees, and therefore the random forests built from them, perform feature selection implicitly.&lt;/p&gt;
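&lt;p&gt;One way to inspect this implicit selection is the impurity-based &lt;code&gt;feature_importances_&lt;/code&gt; attribute of a fitted scikit-learn forest. The sketch below uses synthetic data and an entropy criterion as illustrative assumptions.&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data standing in for a real dataset (an assumption for illustration)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

forest = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0).fit(X, y)

# Importances are impurity-based and sum to 1 across all features
ranked = sorted(enumerate(forest.feature_importances_), key=lambda pair: pair[1], reverse=True)
for index, importance in ranked[:3]:
    print(f"feature {index}: importance {importance:.3f}")
```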

</description>
      <category>filtermethod</category>
      <category>featureselection</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>Feature Selection for Dimensionality Reduction</title>
      <dc:creator>daud99</dc:creator>
      <pubDate>Thu, 26 May 2022 14:05:15 +0000</pubDate>
      <link>https://dev.to/daud99/feature-selection-for-dimensionality-reduction-kha</link>
      <guid>https://dev.to/daud99/feature-selection-for-dimensionality-reduction-kha</guid>
      <description>&lt;p&gt;There are three broad categories of methods for Feature Selection.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Filter Methods&lt;/li&gt;
&lt;li&gt;Embedded Methods&lt;/li&gt;
&lt;li&gt;Wrapper Methods&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Filter Methods.
&lt;/h2&gt;

&lt;p&gt;These methods are based on the &lt;strong&gt;intrinsic&lt;/strong&gt; (natural) properties of the features themselves. We don't use any classifier or model at this point.&lt;/p&gt;

&lt;h3&gt;
  
  
  Univariate Statistics
&lt;/h3&gt;

&lt;p&gt;If the variance of a feature is large, i.e. its data points are very spread out, the feature is likely to be useful for distinguishing between different training examples: it is easier to come up with boundaries that separate data points when there is variance. The larger the variance the better, so we can simply remove the features with low variance. This is a &lt;strong&gt;univariate statistic&lt;/strong&gt;, as only a single feature is involved. Another term often used here is &lt;strong&gt;Information Gain&lt;/strong&gt;: how much a feature contributes to distinguishing different data points.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Using a Simple Threshold
&lt;/h4&gt;

&lt;p&gt;The advantage of calculating the variance is that it's really fast. The major disadvantage is that it doesn't take into account the relationships among features.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Separating features from the target variable
&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;float64&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Label&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.feature_selection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VarianceThreshold&lt;/span&gt;
&lt;span class="c1"&gt;# Flag near-constant columns: features whose variance does not exceed the threshold
# Note: the threshold is an absolute variance (for a binary feature, 0.05 is roughly 95% identical values), not a percentage
&lt;/span&gt;&lt;span class="n"&gt;var_constant&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;
&lt;span class="n"&gt;var_thr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VarianceThreshold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;var_constant&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;span class="n"&gt;var_thr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;variance_stat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;var_thr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_support&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variance_stat&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;variance_stat&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; features have a variance no greater than &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;var_constant&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Following are the features with low variance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;variance_stat&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Bivariate Statistics
&lt;/h3&gt;

&lt;p&gt;When the computation involves two features at a time, we are dealing with Bivariate Statistics.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Pairwise Correlation
&lt;/h4&gt;

&lt;p&gt;When two features are highly correlated, one of them is redundant and we can probably remove it from the dataset without losing too much information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
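&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;feature_corr_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;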
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# (width, height)
&lt;/span&gt;&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;feature_corr_matrix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RdYlGn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
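&lt;p&gt;Beyond inspecting the heatmap, the redundant column of each highly correlated pair can be dropped programmatically. The following is a sketch with pandas; the toy columns &lt;code&gt;a&lt;/code&gt;, &lt;code&gt;b&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt; and the cut-off of 0.9 are assumptions for illustration.&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Hypothetical data: "b" is almost a copy of "a", "c" is independent
rng = np.random.default_rng(0)
a = rng.normal(size=200)
features = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

threshold = 0.9  # assumed cut-off; tune it for your data
corr = features.corr().abs()

# Look only at the upper triangle so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

reduced = features.drop(columns=to_drop)
print("dropped:", to_drop)
print("remaining:", list(reduced.columns))
```

&lt;p&gt;Which member of a correlated pair to keep is a judgment call; this sketch simply keeps the column that appears first.&lt;/p&gt;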



&lt;h4&gt;
  
  
  2. Correlation with target Variable
&lt;/h4&gt;

&lt;p&gt;Features that are highly correlated with the target variable are good features to use, especially in the case of Linear Regression.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;corelationHeatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;corr_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;corr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
  &lt;span class="n"&gt;corr_matrix_cols&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;corr_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
  &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# (width, height)
&lt;/span&gt;  &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="nf"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corr_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;col_name&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corr_matrix&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RdYlGn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pass in the name of your Target variable to the above function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;corelationHeatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Using Anova
&lt;/h4&gt;

&lt;p&gt;We know that Standard Deviation tells us how spread out the data is, in other words how much our data points deviate from the mean on average. Variance is simply the square of the Standard Deviation, and comparing variances also helps us understand the relationship between variables.&lt;/p&gt;

&lt;p&gt;Anova (Analysis of Variance) uses this idea to test whether variables are related.&lt;/p&gt;

&lt;p&gt;When we use Anova we end up with a value known as the F Ratio or F Statistic, which compares the variance between groups to the variance within groups. It tells us how confidently we can say there is a relationship between the variables. The Null Hypothesis says there is no relationship between the variables, and the Alternate Hypothesis says there is one. A large F Ratio corresponds to a small p-value, so if the p-value of the F Ratio is less than the significance level, we reject the Null Hypothesis and accept the Alternate Hypothesis.&lt;/p&gt;
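
&lt;p&gt;To make the F Ratio concrete, here is a minimal sketch that computes it by hand for one feature whose values have been grouped by class label. The function name &lt;code&gt;f_ratio&lt;/code&gt; and the sample numbers are made up for illustration; in practice a library routine would normally be used.&lt;/p&gt;

```python
from statistics import mean


def f_ratio(groups):
    """One-way Anova F Ratio for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)      # number of groups (classes)
    n = len(all_values)  # total number of observations

    # Between-group variability: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each value is from its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical feature values, one inner list per class label
well_separated = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]
overlapping = [[1.0, 5.0, 9.0], [2.0, 5.0, 8.0]]

print(f_ratio(well_separated))  # 54.0
print(f_ratio(overlapping))     # 0.0
```

&lt;p&gt;The feature whose class means are far apart gets a large F Ratio, while the feature whose class means coincide gets an F Ratio of zero.&lt;/p&gt;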

&lt;p&gt;When can we use the Anova Test? We can use it when the features are Numerical and the Target Variable is Categorical. When the Target Variable is Numerical, a closely related F-test based on linear regression can be used instead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6zax3d51zanoku41hfi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm6zax3d51zanoku41hfi.png" alt="Image description" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How is Anova used for Feature Selection?&lt;/p&gt;

&lt;p&gt;The F Ratio is calculated for each feature against the Target variable. We select the features with the highest F Ratio/Score, as they are the most important.&lt;/p&gt;
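
&lt;p&gt;As a rough sketch of this selection step, scikit-learn's &lt;code&gt;SelectKBest&lt;/code&gt; with the &lt;code&gt;f_classif&lt;/code&gt; score function ranks features by their Anova F score. The Iris dataset and &lt;code&gt;k=2&lt;/code&gt; are assumptions made only for this example; swap in your own features and target.&lt;/p&gt;

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Iris is only a stand-in dataset: 4 numerical features, categorical target
data = load_iris()
X, y = data.data, data.target

# Score every feature against the target with the Anova F-test, keep the top 2
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Pair each feature name with its F score, highest first
ranked = sorted(
    zip(data.feature_names, selector.scores_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: F = {score:.1f}")

# Names of the features that were kept
selected_names = [
    name
    for name, keep in zip(data.feature_names, selector.get_support())
    if keep
]
print("Selected:", selected_names)
```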

&lt;h2&gt;
  
  
  Embedded Methods
&lt;/h2&gt;

&lt;p&gt;These methods involve the model itself: feature selection happens as part of training. For example, in a &lt;strong&gt;Decision Tree&lt;/strong&gt;, each time we split a node we compare all the different features and pick the one with the maximum information gain. So the decision tree is effectively selecting features while it grows. Usually the features used higher up in the tree are the most important ones, as they give the maximum information gain.&lt;/p&gt;

&lt;p&gt;This is just one of many examples. We will look at each of them in detail in upcoming blogs.&lt;/p&gt;
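
&lt;p&gt;The decision tree example can be sketched as follows. This assumes scikit-learn's &lt;code&gt;DecisionTreeClassifier&lt;/code&gt; and the Iris dataset as a placeholder; &lt;code&gt;criterion="entropy"&lt;/code&gt; is chosen so that splits are based on information gain.&lt;/p&gt;

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Placeholder dataset; replace with your own features and target
data = load_iris()
X, y = data.data, data.target

# criterion="entropy" makes the tree choose splits by information gain
tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X, y)

# feature_importances_ holds the normalized total impurity reduction
# contributed by each feature across all the splits that used it
importances = dict(zip(data.feature_names, tree.feature_importances_))
for name, value in sorted(importances.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

&lt;p&gt;Features with an importance of zero were never used for a split, so the tree has effectively dropped them.&lt;/p&gt;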

&lt;h2&gt;
  
  
  Wrapper Methods
&lt;/h2&gt;

&lt;p&gt;These methods are driven by our main objective. For instance, we may be interested in improving prediction performance, prediction time, or training time. The best features can therefore differ depending on the objective.&lt;/p&gt;

&lt;p&gt;We fit the model on different subsets of features and measure how it performs on each subset with respect to our main objective. This helps us select the best features.&lt;/p&gt;

&lt;p&gt;The Wrapper Method is computationally expensive, because the model has to be fitted and evaluated for each subset of features, which takes far longer than computing Univariate Statistics such as variance. In return, it works well because it directly optimises the result we actually care about.&lt;/p&gt;
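
&lt;p&gt;Here is a minimal sketch of the idea, under several assumptions: the objective is cross-validated accuracy, the model is scikit-learn's &lt;code&gt;LogisticRegression&lt;/code&gt;, the data is the Iris dataset, and only subsets of exactly two features are tried to keep the search small.&lt;/p&gt;

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder dataset; replace with your own features and target
data = load_iris()
X, y = data.data, data.target
n_features = X.shape[1]

# Fit the same model on every 2-feature subset and record the
# mean 5-fold cross-validated accuracy for that subset
scores = {}
for subset in combinations(range(n_features), 2):
    model = LogisticRegression(max_iter=1000)
    scores[subset] = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()

# The subset with the best score for our objective wins
best_subset = max(scores, key=scores.get)
print("Best subset:", [data.feature_names[i] for i in best_subset])
print("Mean accuracy:", round(scores[best_subset], 3))
```

&lt;p&gt;Even with four features this trains the model 30 times (6 subsets, 5 folds each), which shows why the cost grows quickly as the number of features increases.&lt;/p&gt;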

</description>
      <category>featureselection</category>
      <category>filtermethod</category>
      <category>wrapper</category>
      <category>embedded</category>
    </item>
  </channel>
</rss>
