<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Erika Dietrick</title>
    <description>The latest articles on DEV Community by Erika Dietrick (@erika_thedev).</description>
    <link>https://dev.to/erika_thedev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1363469%2Fb6c97be8-cf48-4af7-a61d-407e3b6b4638.jpg</url>
      <title>DEV Community: Erika Dietrick</title>
      <link>https://dev.to/erika_thedev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/erika_thedev"/>
    <language>en</language>
    <item>
      <title>Answering The Top 10 Questions I'm Asked About GitHub Copilot</title>
      <dc:creator>Erika Dietrick</dc:creator>
      <pubDate>Mon, 09 Sep 2024 12:06:00 +0000</pubDate>
      <link>https://dev.to/erika_thedev/answering-the-top-10-questions-im-asked-about-github-copilot-2ag6</link>
      <guid>https://dev.to/erika_thedev/answering-the-top-10-questions-im-asked-about-github-copilot-2ag6</guid>
      <description>&lt;p&gt;The advent of AI coding assistants has been somewhat surreal. Suddenly, after years of raiding Stack Overflow, dog-earing textbooks, and spilling all of my deepest, darkest secrets to a rubber ducky, we now have tools that are smart enough to generate code for us.&lt;/p&gt;

&lt;p&gt;This sounds great in theory, but in practice, plenty of questions remain to be answered. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;(...And even if you're eager to jump in, how does one even know which coding assistant to choose?)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Well, we all have to start somewhere! I initially settled on learning &lt;a href="https://github.com/features/copilot" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt; - a leading AI coding assistant on the market (at least, as of February 2024, when my journey all started). &lt;/p&gt;

&lt;p&gt;Earlier this year, I went on a winding journey of trial and error so that I could ultimately, and definitively, answer one question: "How useful or reliable is GitHub Copilot &lt;em&gt;really&lt;/em&gt;?"&lt;/p&gt;

&lt;p&gt;I gave a &lt;a href="https://www.ciscolive.com/on-demand/on-demand-library.html?search=erika%20dietrick#/session/1717269199375001te2K" rel="noopener noreferrer"&gt;Cisco Live talk&lt;/a&gt; on this for your enjoyment, but in this article, I'll be answering the top 10 questions I get asked most frequently about adopting GitHub Copilot. &lt;/p&gt;

&lt;p&gt;Let's dive in!&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Are they training on your data?
&lt;/h2&gt;

&lt;p&gt;If you tend to mistrust corporations and their handling of our data, I won't try to convince you otherwise. However, I can point you to GitHub Copilot's Responsible AI policy, which at this moment states that, "The model that powers Copilot is trained on a broad collection of publicly accessible code, which may include copyrighted code, and Copilot’s suggestions (in rare instances) may resemble the code its model was trained on." In settings, you can also &lt;a href="https://docs.github.com/en/copilot/managing-copilot/managing-copilot-as-an-individual-subscriber/managing-copilot-policies-as-an-individual-subscriber" rel="noopener noreferrer"&gt;disable prompt and suggestion collection&lt;/a&gt;. &lt;/p&gt;

&lt;h2&gt;
  
  
  2. Why Copilot over another coding assistant?
&lt;/h2&gt;

&lt;p&gt;LLMs are constantly improving and competing with one another to be "the best" or gold standard. In most cases, I do not specifically advocate for using Copilot, but for using a coding assistant that was trained specifically for coding, which improves the accuracy of suggestions. Beyond this, the answer to whether you should use Copilot is likely "it depends." Are you working in GitHub? You may appreciate the GitHub-specific features. Do the alternatives you're considering work within your IDE? If not, Copilot will provide more contextually relevant responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Should new developers be using Copilot?
&lt;/h2&gt;

&lt;p&gt;Yes, but after learning foundational concepts and writing their first functional app/script. While many may shudder that I would suggest a novice use an AI coding assistant, what I actually preach is that beginners use Copilot to learn and ask questions via Copilot Chat. Students already heavily rely on Google, and in my experience, Copilot Chat far exceeds Google in terms of quality of result and speed to finding a good result.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Do you have to pay for a subscription?
&lt;/h2&gt;

&lt;p&gt;Yes. The subscription for individuals is currently $10 USD/month, but anyone can do a free trial, and it appears to be free for certain populations (e.g., students). There are also packages for Business ($19) and Enterprise ($39).&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Is it just intelligent autocomplete?
&lt;/h2&gt;

&lt;p&gt;The autocomplete is great, but it really just scratches the surface of why you would benefit from using Copilot in your workflow. Copilot can assist with documentation and test generation, with debugging, and with refactoring (e.g., improved variable names). The chat functionality is phenomenal for learning and is contextually relevant to your codebase. I also go into what I call "Comment-Driven Development" in my Cisco Live talk.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Can I actually trust the answers it provides?
&lt;/h2&gt;

&lt;p&gt;It depends on what you're trying to do as well as what you mean by "trust." If by trust you mean blindly generate and accept code suggestions, the answer, at this time, will always be no. To boil down a long talk - the smaller the task that you're asking Copilot to complete, the more likely it is to generate something helpful. But at the end of the day, LLMs are using advanced machine learning algorithms to make predictions; no matter how effective those algorithms are, there is always a chance that the predicted - and provided - response is wrong. &lt;/p&gt;

&lt;h2&gt;
  
  
  7. Will it create entire applications from scratch?
&lt;/h2&gt;

&lt;p&gt;No (at least, not at the time of writing). I've meticulously tested various prompts and - except in very basic use cases - was unable to scaffold entire scripts or apps without the need for significant review and modification. In my opinion, this makes code scaffolding not worth trying at this point in time.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Do you have to use it with GitHub, or can you use another repo?
&lt;/h2&gt;

&lt;p&gt;Copilot works within your IDE; so as long as you're using a compatible IDE, you're good to go! (Keep in mind, however, that depending on the package you're subscribed to, you may be missing out on some GitHub-specific features, like pull request description and summarization.)&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Is using Copilot just asking to introduce security vulnerabilities?
&lt;/h2&gt;

&lt;p&gt;Copilot is trained on public data, meaning you do run the risk of adopting insecure code or code practices. That being said, this is really no different from when we were copying and pasting code from the internet. As you will hear me say often - what's most important here is that you are not blindly generating code without evaluating it. (And in fact, in my Cisco Live talk, I mention how Copilot can be used to &lt;em&gt;improve&lt;/em&gt; code security.)&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Do we still need to learn to code?
&lt;/h2&gt;

&lt;p&gt;Absolutely. In order for code generation and assistance to be effective, we have to keep in mind that "assistance" is the key word here. You are still the brain. Your knowledge, skill, and experience are required, both for effectively leveraging Copilot and for evaluating the output it generates. (And as far as job security and all that... well... we're nowhere near being replaced quite yet.) &lt;/p&gt;

&lt;h2&gt;
  
  
  BONUS - 11. Does GitHub own my code if I use Copilot?
&lt;/h2&gt;

&lt;p&gt;Not according to GitHub Copilot's Responsible AI policy. "If a suggestion is capable of being owned, our terms are clear: GitHub does not claim ownership."&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>productivity</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Automate identification of uncommon DNS requests with Cisco Umbrella API</title>
      <dc:creator>Erika Dietrick</dc:creator>
      <pubDate>Fri, 22 Mar 2024 18:25:37 +0000</pubDate>
      <link>https://dev.to/cisco-devnet/automate-identification-of-uncommon-dns-requests-with-cisco-umbrella-api-gm7</link>
      <guid>https://dev.to/cisco-devnet/automate-identification-of-uncommon-dns-requests-with-cisco-umbrella-api-gm7</guid>
      <description>&lt;p&gt;Many corporate networks are processing massive amounts of internet traffic, which poses a monitoring and security challenge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With so much activity, how do we know what should be investigated?&lt;/li&gt;
&lt;li&gt;Better yet... how can we &lt;em&gt;proactively&lt;/em&gt; identify internet traffic that is worth investigation &lt;em&gt;before&lt;/em&gt; there's a security incident?&lt;/li&gt;
&lt;li&gt;And most importantly... &lt;em&gt;can we automate this?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We answer all of these questions for Cisco Umbrella users in this article. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Not an Umbrella customer? Check out our &lt;a href="https://devnetsandbox.cisco.com/DevNet/catalog/umbrella-secure-internet-gateway"&gt;always-on sandbox&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Cisco Umbrella?&lt;/li&gt;
&lt;li&gt;
Generating an Umbrella Admin API Key
&lt;ul&gt;
&lt;li&gt;Securely using the API Key and Key Secret&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
Planning our script

&lt;ul&gt;
&lt;li&gt;Authenticating with Umbrella API&lt;/li&gt;
&lt;li&gt;Umbrella Reports API&lt;/li&gt;
&lt;li&gt;Leveraging Umbrella's global usage data&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
Filtering and formatting files for comparison

&lt;ul&gt;
&lt;li&gt;Cleaning up our top_destinations.csv file&lt;/li&gt;
&lt;li&gt;Cleaning up the Top 1-Million CSV file&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Finding uncommon domains&lt;/li&gt;
&lt;li&gt;Removing old files&lt;/li&gt;
&lt;li&gt;Running the script&lt;/li&gt;
&lt;li&gt;Cisco DevNet sample code&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What is Cisco Umbrella? &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fews3l4oap7k26howhmvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fews3l4oap7k26howhmvg.png" alt="A portion of the Umbrella dashboard displaying Activity Search" width="800" height="525"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're unfamiliar with Umbrella, we colloquially refer to it as "internet security" or DNS security. While all Umbrella packages allow you to create DNS policies and access a variety of reports on your network's internet activity, other packages have additional features ranging from web policy to data loss prevention (DLP) policy to the Investigate API (proactive threat research).&lt;/p&gt;

&lt;p&gt;The image above shows a small snippet of the Umbrella interface, in which I navigated to the Activity Search. (This is because I had just configured Umbrella, so my dashboard of the past 24 hours was empty.)&lt;/p&gt;

&lt;h2&gt;
  
  
  Generating an Umbrella Admin API Key &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;In the left-side menu, navigate to Admin &amp;gt; API Keys.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqd2u8k29k7oomgefbcdm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqd2u8k29k7oomgefbcdm.png" alt="The Umbrella navigation menu with API Keys highlighted" width="279" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the top right corner of the screen, you should then see a circular Add button. Click that, then fill out the information for creating a new Admin API Key.&lt;/p&gt;

&lt;p&gt;The screenshot below provides an example of how to fill this out. What's important is that you choose the correct scope (Reports &amp;gt; Aggregations: Read-Only) and an expiration date that isn't today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrahbp9vqx8nrxfc3bca.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrahbp9vqx8nrxfc3bca.png" alt="Admin API Key creation form" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you've filled out the form above, click the Create Key button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz88r3qk7ypnrso81djb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyz88r3qk7ypnrso81djb.png" alt="Generated Admin API Key and Key Secret" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You'll now see an API Key and Key Secret, as shown above. Copy both of these -- they will only be displayed once.&lt;/p&gt;

&lt;h3&gt;
  
  
  Securely using the API Key and Key Secret &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;We'll need the API Key and Key Secret we just generated in order to communicate with Umbrella; but if we're using a version-controlled repository, hardcoding those credentials into the script and pushing them to the repository will expose our sensitive credentials to others.&lt;/p&gt;

&lt;p&gt;While you can secure credentials in multiple ways, we'll create a .env file to store them in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API_KEY=apikeygoeshere
KEY_SECRET=keysecretgoeshere
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, to ensure this .env file is not pushed to our repository, we'll create a .gitignore file using this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;touch .gitignore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the .gitignore file is created, we'll add the name of any files we want to be ignored inside the file -- in this case, only .env.&lt;/p&gt;

&lt;p&gt;Finally, our script will need to access these credentials despite the fact that they aren't hardcoded into the script. &lt;/p&gt;

&lt;p&gt;In Python, we'll include the following import statements to accomplish this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import os
from dotenv import load_dotenv 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we'll load the credentials into the script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;load_dotenv()

# Environment variables should contain your org's values from the .env file.
client_key = os.environ['API_KEY']
client_secret = os.environ['KEY_SECRET']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
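
&lt;p&gt;If a value is missing from the .env file, os.environ raises a bare KeyError. As a minimal sketch (the helper names missing_credentials and require_credentials are hypothetical and not part of the original script), one could check for both values up front and report which ones are missing:&lt;/p&gt;

```python
import os

# Hypothetical helper (not in the original script): list the credential
# names that are absent or empty in the given environment mapping.
def missing_credentials(environ, required=("API_KEY", "KEY_SECRET")):
    return [name for name in required if not environ.get(name)]

# Hypothetical helper: raise one readable error instead of a bare KeyError.
def require_credentials(environ=None):
    # Fall back to the real process environment when no mapping is passed in
    environ = os.environ if environ is None else environ
    missing = missing_credentials(environ)
    if missing:
        raise RuntimeError("Missing values in .env: " + ", ".join(missing))
    return environ["API_KEY"], environ["KEY_SECRET"]
```

&lt;p&gt;Calling require_credentials() right after load_dotenv() would return the key and secret as a pair, or stop the script before any API call is attempted.&lt;/p&gt;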



&lt;h2&gt;
  
  
  Planning our script &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Okay, so now we have an API Key and Key Secret with which we can retrieve DNS traffic from Umbrella. Next, we have 3 questions to answer: &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What's required to authenticate with the Umbrella API? &lt;/li&gt;
&lt;li&gt;Which API call should we be making to Umbrella?&lt;/li&gt;
&lt;li&gt;How can we sift through the DNS traffic to determine what is "uncommon"?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Authenticating with Umbrella API &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Authentication with any Umbrella API requires not just an API Key and Key Secret (which we generated earlier), but an &lt;a href="https://developer.cisco.com/docs/cloud-security/#!authentication/generate-an-api-access-token"&gt;access token&lt;/a&gt;, which expires after 1 hour. &lt;/p&gt;

&lt;p&gt;In Python, we'll first need to import the requests library to make an API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, we write a function that generates an access token using the correct API endpoint. Because we'll run this script on a weekly basis, but the access token only lasts an hour, we'll make sure we call this function first each time the script runs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Relevant v2 Umbrella API endpoints
base_url = "https://api.umbrella.com"
access_token_endpoint = f"{base_url}/auth/v2/token"

# Generate new access token as these expire after 1 hour. Requires a valid and unexpired Umbrella API Key and Key Secret.
def generate_access_token():

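    # HTTP Basic auth: the API Key is the username and the Key Secret is the password
    response = requests.post(url=access_token_endpoint, auth=(client_key, client_secret))
    # Stop early with a clear error if the credentials were rejected
    response.raise_for_status()
    access_token = response.json()['access_token']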

    return access_token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
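
&lt;p&gt;Because this script runs once a week, requesting a fresh token on every run is enough. If the same logic were moved into a longer-running process, the 1-hour lifetime would need tracking. The sketch below is one hypothetical way to do that: TokenCache, its 60-second safety margin, and the injectable clock are all assumptions for illustration, not part of the original script or of the Umbrella API.&lt;/p&gt;

```python
import time

# Hypothetical wrapper: reuse a token until shortly before it expires.
class TokenCache:
    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=60,
                 clock=time.monotonic):
        self._fetch_token = fetch_token    # e.g. generate_access_token
        self._lifetime = lifetime_seconds  # 1 hour, as stated in the article
        self._margin = margin_seconds      # assumed safety margin
        self._clock = clock                # injectable so it can be tested
        self._token = None
        self._refresh_at = 0.0

    def get(self):
        now = self._clock()
        # Fetch on first use, or once the refresh time has been reached
        if self._token is None or now >= self._refresh_at:
            self._token = self._fetch_token()
            self._refresh_at = now + self._lifetime - self._margin
        return self._token
```

&lt;p&gt;With this in place, TokenCache(generate_access_token).get() could replace direct calls to generate_access_token() wherever a token is needed.&lt;/p&gt;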



&lt;h3&gt;
  
  
  Umbrella Reports API &lt;a&gt;
&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;By reviewing the &lt;a href="https://developer.cisco.com/docs/cloud-security/#!authentication"&gt;Umbrella API documentation&lt;/a&gt;, we see that the &lt;a href="https://developer.cisco.com/docs/cloud-security/#!reporting-overview/reporting"&gt;Reports API&lt;/a&gt; retrieves information about the traffic coming through the Umbrella network. Specifically, we'll use the &lt;a href="https://developer.cisco.com/docs/cloud-security/#!reporting-api-reference-api-top-destinations-top-destinations"&gt;getTopDestinations&lt;/a&gt; endpoint.&lt;/p&gt;

&lt;p&gt;We'll first create a variable for the Top Destinations endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Relevant v2 Umbrella API endpoints
base_url = "https://api.umbrella.com"
access_token_endpoint = f"{base_url}/auth/v2/token"
top_destinations_endpoint = f"{base_url}/reports/v2/top-destinations"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we define the headers and parameters (as defined in the API documentation) before making the Top Destinations API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Get the Top Destinations visited from 7 days ago until now. Top 1000 domains are returned.
def get_top_destinations(access_token):  

    headers = {
        "Authorization": "Bearer " + access_token,
        "Content-Type": "application/json",
        "Accept": "application/json"
    }

    params = {
        "from": "-7days",
        "to": "now",
        "offset": 0,
        "limit": 1000
    }

    top_destinations_request = requests.get(top_destinations_endpoint, headers=headers, params=params)
    # Raise an exception on an HTTP error status instead of parsing an error body
    top_destinations_request.raise_for_status()
    top_destinations = top_destinations_request.json()

    return top_destinations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll notice that we've set parameters to pull Top Destinations from 7 days ago until now. (Specifying &lt;em&gt;now&lt;/em&gt; is a supported option based on documentation.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Leveraging Umbrella's global usage data &lt;a&gt;
&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The best way to determine what is abnormal or uncommon? Find a way to establish a baseline or "normal." &lt;/p&gt;

&lt;p&gt;Fortunately, Umbrella posts a &lt;a href="https://s3-us-west-1.amazonaws.com/umbrella-static/index.html"&gt;Popularity List&lt;/a&gt; daily. According to Cisco Umbrella, this list "contains our most queried domains based on passive DNS usage across our Umbrella global network of more than 100 Billion requests per day with 65 million unique active users, in more than 165 countries."&lt;/p&gt;

&lt;p&gt;One million of the most commonly queried domains should be a sufficient baseline of "normal." &lt;/p&gt;

&lt;p&gt;We'll make a GET API call to retrieve the Umbrella Top 1-Million. (Yes, I'm realizing now I should have stored the URL in a variable to be consistent.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Download the Umbrella top 1 million destinations, unzip file, format file. 
def get_top_million():

    # API call to get Umbrella Top 1 Million as a zip file
    get_top_1million_zip = requests.get("http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function isn't yet complete. Let's discuss what else needs to be considered before we finish this function. &lt;/p&gt;

&lt;h2&gt;
  
  
  Filtering and formatting files for comparison &lt;a&gt;
&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We've now made two API calls: one to retrieve the top 1000 destinations seen by our Umbrella network over the past week, and one to retrieve the Top 1-Million domains seen by the Umbrella network globally.&lt;/p&gt;

&lt;p&gt;We'll want to clean up these files so that they're easier to compare and the resulting file is meaningful. &lt;/p&gt;

&lt;h3&gt;
  
  
  Cleaning up our top_destinations.csv file &lt;a&gt;
&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The Top Destinations API call successfully returns our network's top destinations, but not all of those destinations are domains -- we'll also see IP addresses. &lt;/p&gt;

&lt;p&gt;While those IP addresses may be worth investigating, they cannot be compared to Umbrella's Top 1-Million, which is a list of domains only. For this reason, we'll want to filter out IP addresses.&lt;/p&gt;

&lt;p&gt;First we'll import Python libraries that will help us check for IP addresses and handle CSV files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from IPy import IP
import csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we'll add logic that checks if something is an IP address.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Return True if the value parses as an IP address, False otherwise.
def isIP(value):
    try:
        IP(value)
    except ValueError:
        return False
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
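
&lt;p&gt;The check above relies on the third-party IPy package. As an alternative sketch, Python's standard-library ipaddress module supports the same try/except pattern. The function name is_ip_address is hypothetical, and the two libraries may disagree on edge cases such as network prefixes, so treat this as a swap to test rather than a drop-in replacement:&lt;/p&gt;

```python
import ipaddress

# Standard-library alternative to IPy (an assumption for illustration;
# the original script uses IPy).
def is_ip_address(value):
    try:
        # Accepts IPv4 and IPv6 literals; raises ValueError for anything else
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True

print(is_ip_address("192.0.2.10"))   # True
print(is_ip_address("2001:db8::1"))  # True
print(is_ip_address("example.com"))  # False
```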



&lt;p&gt;We'll then incorporate our logic in a new function that takes the Top Destinations API response as a parameter. For each destination in the response, if it &lt;em&gt;isn't&lt;/em&gt; an IP address, we'll add it to a list called destinations_list.&lt;/p&gt;

&lt;p&gt;After that, we'll write that "domains only" list to a new csv.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# If destination in Top Destinations is a domain, write it as a new line in a CSV called top_destinations.csv.
def top_destinations_to_csv(top_destinations):

    destinations_list = []

    for destination in top_destinations['data']:
        if not isIP(destination['domain']):
            destinations_list.append(destination['domain'])

    # newline='' lets the csv module control line endings; the with block closes the file for us
    with open('top_destinations.csv', 'w', newline='') as top_destinations_csvfile:
        filewriter = csv.writer(top_destinations_csvfile, delimiter=',',
            quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for destination in destinations_list:
            filewriter.writerow([destination])

    return top_destinations_csvfile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cleaning up the Top 1-Million CSV file &lt;a&gt;
&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Our top destinations file is now just rows upon rows of domains, but how about our Top 1-Million file?&lt;/p&gt;

&lt;p&gt;When we made our GET API call, we received a zipfile in return. We'll first import a library to work with that zipfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import zipfile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then need to write that zip file to disk, create a fresh csv file to save the cleaned up version to, and unzip the file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Download the Umbrella top 1 million destinations, unzip file, format file. 
def get_top_million():

    # API call to get Umbrella Top 1 Million as a zip file
    get_top_1million_zip = requests.get("http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip")

    # Write the zip file to disk
    with open('top-1m.csv.zip', 'wb') as zip_file:
        zip_file.write(get_top_1million_zip.content)

    # Create a new CSV file to write the cleaned up Top 1 Million to
    top_1million_csv = 'top-1m.csv'

    # Unzip the file
    with zipfile.ZipFile('top-1m.csv.zip', 'r') as zip_ref: 
        zip_ref.extractall('.')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The unzipped Top 1-Million file looks something like this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70l5nusx4v3prb1zb6k8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70l5nusx4v3prb1zb6k8.png" alt="Abbreviated example of Umbrella Top 1-Million output" width="258" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to remove that rank order so that we can just compare domains. To do this, we import a Python library to help us format the csv:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import pandas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we drop that rank order column:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    # Removing rank order in first column so that we can compare domains to Top Destinations. 
    # The file has no header row, so header=None keeps the first domain as data
    top_1million_csv = pandas.read_csv('top-1m.csv', header=None)
    first_column = top_1million_csv.columns[0]
    top_1million_csv = top_1million_csv.drop([first_column], axis=1)
    top_1million_csv.to_csv('top_1million_csv', index=False, header=False)

    return top_1million_csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
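
&lt;p&gt;For readers who would rather avoid the pandas dependency, the same "drop the rank column" step can be sketched with the standard-library csv module. The helper name domains_without_rank and the sample rows are made up for illustration; the only assumption carried over from the article is that each row has the form rank,domain:&lt;/p&gt;

```python
import csv
import io

# Hypothetical helper: read "rank,domain" rows and keep only the domains.
def domains_without_rank(csv_text):
    reader = csv.reader(io.StringIO(csv_text))
    # Skip blank or malformed rows that do not have exactly two fields
    return [row[1] for row in reader if len(row) == 2]

# Made-up sample rows, not real Top 1-Million data
sample = "1,example.com\n2,example.net\n3,example.org\n"
print(domains_without_rank(sample))  # ['example.com', 'example.net', 'example.org']
```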



&lt;h2&gt;
  
  
  Finding uncommon domains &lt;a&gt;
&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We're finally ready to find those uncommon domains! To do this, we compare both CSVs by opening and reading each line of the files. For each domain in our network's top destinations that does &lt;em&gt;not&lt;/em&gt; appear in the Umbrella Top 1-Million, we write that domain as a line in our final CSV named uncommon_domains.csv.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Compares each domain in top_destinations.csv to top_1million_csv (the cleaned Umbrella Top 1 Million) and writes any domains that are not in the Top 1 Million.
def find_uncommon_domains():

    top_destinations_file_path = "./top_destinations.csv"
    top_1million_file_path = "./top_1million_csv"

    uncommon_domains_file_path = "./uncommon_domains.csv"

    with open(uncommon_domains_file_path, 'w') as uncommon_domains_csv:

        # Read both files inside with blocks so that they are closed after reading
        with open(top_destinations_file_path) as top_destinations_file:
            top_destinations = top_destinations_file.readlines()
        with open(top_1million_file_path) as top_1million_file:
            top_1million = top_1million_file.readlines()

        for domain in top_destinations:
            if domain not in top_1million:
                uncommon_domains_csv.write(domain)

    print("Uncommon domains have been written to uncommon_domains.csv in your current directory.")

    return uncommon_domains_csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
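
&lt;p&gt;The comparison above checks each of our domains against a Python list of roughly a million lines, which means a linear scan per lookup. As a hedged sketch of a faster variant, the baseline can be loaded into a set once. The helper name find_uncommon, the lowercasing step, and the sample data are assumptions added for illustration:&lt;/p&gt;

```python
# Hypothetical helper: list the observed domains that are absent from the baseline.
def find_uncommon(observed_domains, baseline_domains):
    # Build the set once; domain names are case-insensitive, so normalise case
    baseline = {domain.strip().lower() for domain in baseline_domains}
    uncommon = []
    for domain in observed_domains:
        cleaned = domain.strip().lower()
        # Ignore blank lines and keep the order in which domains were seen
        if cleaned and cleaned not in baseline:
            uncommon.append(cleaned)
    return uncommon

# Made-up sample data, for illustration only
observed = ["example.com\n", "Rare-Site.example\n", "example.org\n"]
baseline = ["example.com\n", "example.org\n", "example.net\n"]
print(find_uncommon(observed, baseline))  # ['rare-site.example']
```

&lt;p&gt;Both arguments can be the lists returned by readlines(), since the helper strips the trailing newline from every entry before comparing.&lt;/p&gt;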



&lt;h2&gt;
  
  
  Removing old files &lt;a&gt;
&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;This part is a nicety, but let's remove all of the intermediate files besides the resulting uncommon_domains.csv to avoid confusion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Clean up files used to determine uncommon domains.
def clean_up_files():
    os.remove('top-1m.csv.zip')
    os.remove('top-1m.csv')
    os.remove('top_destinations.csv')
    os.remove('top_1million_csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
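
&lt;p&gt;One caveat: os.remove raises FileNotFoundError when a file does not exist, so the cleanup above fails if an earlier step never produced its file. A minimal sketch of a more forgiving version, assuming Python 3.8 or later, uses pathlib instead. The function name clean_up_files_safely and the directory parameter are hypothetical:&lt;/p&gt;

```python
from pathlib import Path

# Intermediate file names taken from the script above
INTERMEDIATE_FILES = (
    "top-1m.csv.zip",
    "top-1m.csv",
    "top_destinations.csv",
    "top_1million_csv",
)

# Hypothetical variant of clean_up_files that tolerates missing files.
def clean_up_files_safely(directory="."):
    for name in INTERMEDIATE_FILES:
        # missing_ok=True (Python 3.8+) skips files that were never created
        (Path(directory) / name).unlink(missing_ok=True)
```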



&lt;h2&gt;
  
  
  Running the script &lt;a&gt;
&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We'll write a main function that runs when the script runs, calling the relevant functions in order:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Main function
def main():

    access_token = generate_access_token()
    top_destinations = get_top_destinations(access_token)
    top_destinations_csvfile = top_destinations_to_csv(top_destinations)
    cleaned_top_1million_csv = get_top_million()
    uncommon_domains_csv = find_uncommon_domains()
    clean_up_files()

    return uncommon_domains_csv

if __name__ == "__main__":
    main()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can find an example of the output provided in the resulting uncommon_domains.csv below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtimd6mfdyt317fg0tfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjtimd6mfdyt317fg0tfg.png" alt="Example output of uncommon_domains.csv" width="291" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Cisco DevNet sample code &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;This is an official submission in Cisco Code Exchange, including a suggested use case, available &lt;a href="https://developer.cisco.com/codeexchange/github/repo/erdietri/UmbrellaUncommonDomains/"&gt;here&lt;/a&gt;. You can also access the sample code directly on &lt;a href="https://github.com/erdietri/UmbrellaUncommonDomains"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>cisco</category>
      <category>api</category>
      <category>automation</category>
    </item>
  </channel>
</rss>
