<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brian</title>
    <description>The latest articles on DEV Community by Brian (@bdmorin).</description>
    <link>https://dev.to/bdmorin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F27114%2F31b72044-3ef1-4411-b88b-08aaf2c678e1.jpeg</url>
      <title>DEV Community: Brian</title>
      <link>https://dev.to/bdmorin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bdmorin"/>
    <language>en</language>
    <item>
      <title>AWS: Wherein I do things the hardest way possible.</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Fri, 09 Apr 2021 20:08:26 +0000</pubDate>
      <link>https://dev.to/bdmorin/aws-wherein-i-do-things-the-hardest-way-possible-3b71</link>
      <guid>https://dev.to/bdmorin/aws-wherein-i-do-things-the-hardest-way-possible-3b71</guid>
      <description>&lt;p&gt;I've used AWS for various projects, but not gotten extremely in depth. I've recently taken a contract that has stretched my AWS skills to the limit, and I've had to learn lots. I'm going to catalog some things senarios that others might find useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Give me all external Interface IPs so I can scan them
&lt;/h2&gt;

&lt;p&gt;So, I need to lock down an environment, which means I need all the IP addresses. There are likely tons of ways to do this; here's how I tortured myself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-network-interfaces \
  --query "NetworkInterfaces[].Association.PublicIp" \
  --output yaml | sort -V | awk '{print $2}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lists ALL the network interfaces in your AWS account. This might be preferable depending on your needs.&lt;/p&gt;

&lt;p&gt;So, you can list your interfaces and output them in &lt;code&gt;json&lt;/code&gt;, &lt;code&gt;yaml&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, or &lt;code&gt;table&lt;/code&gt;. Now, I would've thought that table would give me one IP address per line. It doesn't; it gloms them all onto the same line, probably with a tab delimiter or something. I was so disgusted with that output I switched to YAML, since it required little processing. &lt;/p&gt;

&lt;p&gt;Alternatively, I could've used &lt;code&gt;jq&lt;/code&gt; to process the default JSON output, but &lt;code&gt;jq&lt;/code&gt; isn't installed everywhere, so I opted for commonly available tools. &lt;code&gt;sort -V&lt;/code&gt; properly sorts IP addresses, and &lt;code&gt;awk&lt;/code&gt; strips the &lt;code&gt;-&lt;/code&gt; from the YAML output. &lt;/p&gt;
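&lt;p&gt;For reference, here's roughly what the &lt;code&gt;jq&lt;/code&gt; route could look like. This is a sketch run against a hand-typed sample in the shape &lt;code&gt;describe-network-interfaces&lt;/code&gt; returns (the addresses are made up), not real account output:&lt;/p&gt;

```shell
# Trimmed-down stand-in for `aws ec2 describe-network-interfaces --output json`
# (field shapes match the response; the IPs are made up for illustration)
printf '%s' '{
  "NetworkInterfaces": [
    {"Association": {"PublicIp": "54.10.2.3"}},
    {"Attachment": {"Status": "attached"}},
    {"Association": {"PublicIp": "3.91.0.7"}}
  ]
}' > sample.json

# `// empty` drops interfaces with no public association, so the output
# is already one address per line with no nulls to awk away
jq -r '.NetworkInterfaces[].Association.PublicIp // empty' sample.json | sort -V
```

&lt;p&gt;In a real run you'd pipe the &lt;code&gt;aws&lt;/code&gt; command's JSON straight into the same &lt;code&gt;jq&lt;/code&gt; filter.&lt;/p&gt;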

&lt;p&gt;macOS users, consider using the GNU CLI utilities instead of Apple's. &lt;code&gt;awk&lt;/code&gt; isn't different, but &lt;code&gt;sort&lt;/code&gt; is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ sort --version
2.3-Apple (106)
❯ gsort --version
sort (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &amp;lt;https://gnu.org/licenses/gpl.html&amp;gt;.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I don't know what Apple's sort is, but gsort is POSIX compliant, and the arguments you learn with it will translate to Linux, or to &lt;code&gt;cygwin&lt;/code&gt;/&lt;code&gt;WSL&lt;/code&gt; on Windows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install coreutils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
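&lt;p&gt;As a toy illustration (made-up addresses) of why the sort implementation matters for IP lists: plain lexicographic &lt;code&gt;sort&lt;/code&gt; puts &lt;code&gt;.10&lt;/code&gt; before &lt;code&gt;.2&lt;/code&gt;, while GNU sort's &lt;code&gt;-V&lt;/code&gt; compares the octets numerically.&lt;/p&gt;

```shell
# Three made-up addresses that trip up lexicographic sorting
printf '10.0.0.10\n10.0.0.2\n10.0.0.9\n' > ips.txt

sort ips.txt      # lexicographic: 10.0.0.10 sorts before 10.0.0.2
sort -V ips.txt   # version sort: 10.0.0.2, 10.0.0.9, 10.0.0.10
```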



&lt;p&gt;THEN.. I discovered why it was putting everything on one line. If you don't wrap the final value in brackets, the CLI just outputs the bare value with nothing else. Always surround the final value in brackets, and you'll get the one-line-per-value behavior you'd assume is the default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-network-interfaces \
  --query "NetworkInterfaces[].Association.[PublicIp]" \
  --output=text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].[PublicIpAddress]" \
  --output=text | sort -V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will list only IP addresses from your &lt;code&gt;instances&lt;/code&gt;. It won't list public IP addresses from other AWS services you might be using.&lt;/p&gt;

&lt;p&gt;Oddly, again, the output threw me. I was getting an annoying &lt;code&gt;None&lt;/code&gt; in the output. Ugh.. Fine. I'll &lt;code&gt;grep -v None&lt;/code&gt; it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].[PublicIpAddress]" \
  --output=text | sort -V | grep -v None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;but... why?&lt;/p&gt;

&lt;p&gt;Turns out in this output, &lt;code&gt;PublicIpAddress&lt;/code&gt; is handled differently in the raw data that comes back from AWS. Specifically, calling it out as &lt;code&gt;[PublicIpAddress]&lt;/code&gt; creates a condition where you're WANTING null values to be output. The fix for this was removing the &lt;code&gt;[]&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].PublicIpAddress" \
  --output=text | sort -V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom. Output is 1 address per line, suitable for a text file I can feed to &lt;code&gt;nmap&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  NMAP ... or shodan?!
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nmap -p- -sT -T4 -vvvv -Pn --open -iL scanip -oA scanmeip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So.. you wanna scan a range? Start with &lt;code&gt;nmap&lt;/code&gt; or &lt;code&gt;zenmap&lt;/code&gt;. If you aren't sure what all this does, &lt;a href="https://explainshell.com/explain?cmd=nmap+-p-+-sT+-T4+-vvvv+-Pn+--open+-iL+scanip+-oA+scanmeip"&gt;ExplainShell&lt;/a&gt; might be able to help. It's a great site, though it doesn't stay current on every argument. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;-p-&lt;/code&gt; is the same as &lt;code&gt;-p 1-65535&lt;/code&gt;, meaning scan every possible port.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-sT&lt;/code&gt; performs a standard TCP connect check. No fancy SYN stuff; we're just looking to find out what ports are open.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-T4&lt;/code&gt; is the timing template: 1 is nearly benign, while 5 might actually miss open ports because it'll tax your network connection.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-vvvv&lt;/code&gt; veeerrrbbbooosssiitttyyyyyy&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-Pn&lt;/code&gt; don't ping, assume it's alive&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--open&lt;/code&gt; report ONLY on open ports&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-iL scanip&lt;/code&gt; reads the target list from the file &lt;code&gt;scanip&lt;/code&gt;, which I created from the &lt;code&gt;aws ec2 describe-network-interfaces&lt;/code&gt; output above. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;-oA scanmeip&lt;/code&gt; will create 3 files, a standard report, a greppable output, and most importantly an XML file you can transform into an HTML report!&lt;/p&gt;

&lt;p&gt;Once you're done. Clone this repo: &lt;a href="https://github.com/honze-net/nmap-bootstrap-xsl"&gt;https://github.com/honze-net/nmap-bootstrap-xsl&lt;/a&gt; and ensure you have an xml processor installed (macOS comes with a workable &lt;code&gt;xsltproc&lt;/code&gt; binary).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xsltproc -o scanmeip.html ./nmap-bootstrap-xsl/nmap-bootstrap.xsl scanmeip.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next thing you know, you have a beautifully formatted report showing all the open ports on all your externally visible IP addresses!&lt;/p&gt;

&lt;p&gt;Then you get to cry a little at how much work securing everything will be, because the contract you walked into is way larger than you expected, no one ever set anything up correctly, you wonder why database ports are exposed to the public, someone took the time to install fail2ban but never enabled or configured it, all the ssh listeners are configured for AWS but everyone uses the &lt;code&gt;ubuntu&lt;/code&gt; login, and the list goes on and on.. Life of ops nerds, lol.&lt;/p&gt;

&lt;h4&gt;
  
  
  Shodan?
&lt;/h4&gt;

&lt;p&gt;Bonus points for anyone using shodan.io to scan your external IP addresses using the cli. If anyone is interested, I'll post how I did it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;wait.. there's points?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>awscli</category>
    </item>
    <item>
      <title>Need a blackhole solution for put.io?</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Thu, 30 Jul 2020 02:43:08 +0000</pubDate>
      <link>https://dev.to/bdmorin/need-a-blackhole-solution-for-put-io-cf5</link>
      <guid>https://dev.to/bdmorin/need-a-blackhole-solution-for-put-io-cf5</guid>
      <description>&lt;p&gt;Need a solution for Sonarr, and putio? &lt;/p&gt;

&lt;p&gt;This is what I came up with. Configure Sonarr to dump torrents to a blackhole directory, modify &lt;a href="https://gist.github.com/bdmorin/d3c53e0f495f4947bf61fb55e1c85310"&gt;this&lt;/a&gt; script. Dump it into your launchd/cron solution and you can upload magnets from your blackhole solution to put.io. &lt;/p&gt;

&lt;p&gt;How you download is on you. I might upload my jank solution later. &lt;/p&gt;

&lt;p&gt;Use of "blackhole" is &lt;a href="https://github.com/rails/rails/issues/33677"&gt;problematic&lt;/a&gt;. Once I have sometime to perform some generic replacement, I'll update this.&lt;/p&gt;

&lt;p&gt;Good Luck.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


</description>
      <category>putio</category>
      <category>sonarr</category>
      <category>blackhole</category>
    </item>
    <item>
      <title>Comcast is proxying all unencrypted content</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Thu, 29 Nov 2018 18:27:58 +0000</pubDate>
      <link>https://dev.to/bdmorin/comcast-is-proxying-all-unencrypted-content-34nh</link>
      <guid>https://dev.to/bdmorin/comcast-is-proxying-all-unencrypted-content-34nh</guid>
      <description>

&lt;p&gt;I originally posted this on that notoriously unsecure platform, &lt;a href="https://www.facebook.com/bdmorin/posts/2188976868082331"&gt;facebook&lt;/a&gt;. I should edit this for grammar, but I just wanted to bang it out, because you know... job/work.&lt;/p&gt;

&lt;p&gt;I cannot stand #comcast; no one who knows me finds this surprising, since I'm forever ranting about them. The fact that Comcast is the only option for so many people is ridiculously sad. My job requires me to be on the internet constantly, and I do a lot of security research and general research. &lt;/p&gt;

&lt;p&gt;Today, I found the most horrific thing a security nerd can find. Comcast is &lt;em&gt;FORCING&lt;/em&gt; all unencrypted traffic through Comcast proxy servers. I don't have a choice; I wasn't asked or notified (I'm sure the TOS that's 938429 pages long mentions it). This enables Comcast to inject anything they want into your unencrypted web browsing. &lt;/p&gt;

&lt;p&gt;If you want to see technical details about what these jackholes are doing, see here: &lt;a href="https://gist.github.com/bdmorin/7bd16b34cf75c0f6dd56155301793c4d"&gt;https://gist.github.com/bdmorin/7bd16b34cf75c0f6dd56155301793c4d&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested a popular website, tvmaze.com (an http-only website), with and without a VPN on, and the difference in the HTML delivered was Comcast's injection, which included third-party asset calls, analytics tracking, etc. &lt;/p&gt;

&lt;p&gt;I want to protect my entire network (including all those people in my home) against this kind of absolutely unacceptable spying, but it gets fugly: as cord cutters, we use streaming services, and Netflix and Hulu are NOT VPN friendly. These services actively block VPNs because viewers can appear to be in a different geographic location (ODIN FORBID YOU NOT BE IN AN AUTHORIZED AREA), so if I run my whole house through a VPN, we won't be able to use streaming services. &lt;/p&gt;

&lt;p&gt;I've been considering deploying a local forced proxy so that any port 80 traffic goes through a VPN connection at MY gateway and not Comcast's. Nearly every streaming service uses HTTPS, so this wouldn't diddle with streaming services. &lt;/p&gt;

&lt;p&gt;The point of this rant is to SHAME Comcast, not that they care in the least about consumers. You may constantly see ads for VPNs as you browse online, and this is why: you absolutely CANNOT trust your local service provider when it hijacks your content and modifies it before it gets to you. Ask China what it's like to have all your traffic monitored and modified before it reaches you. Comcast could potentially change anything before you have a chance to read the original version. If Comcast obtains a CA that browsers accept, they would then be able to hijack your HTTPS connections too, which is ABSOLUTELY conceivable at this point. &lt;/p&gt;

&lt;p&gt;Websites that use web application firewall services like Cloudflare are subject to the same kind of risk. Cloudflare inspects all traffic to and from origin servers, so it's a single point that could modify, track, and potentially block content. If a BlackHat were to compromise Cloudflare, thousands of ecommerce businesses could be at risk of having traffic snooped. Same with Comcast: if (AND WHEN) they are compromised, they could modify YOUR traffic so that you're seeing what someone else wants you to see.&lt;/p&gt;

&lt;p&gt;Trust no one. Especially worthless corporations like Comcast.&lt;/p&gt;


</description>
      <category>comcast</category>
      <category>proxy</category>
      <category>security</category>
      <category>nothingissecure</category>
    </item>
    <item>
      <title>I want to re-write metasploit? Really?</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Thu, 30 Aug 2018 21:54:37 +0000</pubDate>
      <link>https://dev.to/bdmorin/i-want-to-re-write-metasploit-really-2cca</link>
      <guid>https://dev.to/bdmorin/i-want-to-re-write-metasploit-really-2cca</guid>
      <description>&lt;p&gt;I've had a flurry of app ideas and notions I want to put to code recently. I've been hitting the bits hard sharpening my Python skills. I'm developing some scrapy based web scrapers for a few apis I want to make public, I'm learning &lt;a href="https://www.dartlang.org/"&gt;Dart&lt;/a&gt;/&lt;a href="https://flutter.io/"&gt;Flutter&lt;/a&gt; because I want to write a mobile app to consume the afore mentioned APIs I'm developing. Javascript is a constant learning process, and NodeJS seems to be a requirement or preference in nearly every modern project; however, I can't look at Javascript/NodeJS the same after learning about Typescript -- ARGH!&lt;/p&gt;

&lt;p&gt;None of that's the point of this post, though. I've had a business need on several occasions to perform network (IP/service) discovery and dump that information so it can be mangled and jangled in many different ways. I've not found a single &lt;a href="https://en.wikipedia.org/wiki/Free_and_open-source_software"&gt;FOSS&lt;/a&gt; &lt;a href="https://en.wikipedia.org/wiki/IP_address_management"&gt;IPAM&lt;/a&gt; solution that lets me do discovery, delta tracking over time, and annotation of discovered hosts and services.&lt;/p&gt;

&lt;p&gt;Digital Ocean released a tool called &lt;a href="https://github.com/digitalocean/netbox"&gt;Netbox&lt;/a&gt; that is wonderful, only.. there's no discovery, and only basic service management. Netbox does have a wonderful &lt;a href="https://netbox.readthedocs.io/en/latest/api/overview/"&gt;API&lt;/a&gt; that I could potentially use for discovery, but the inability to handle network deltas and services really limits the use case of this otherwise brilliant software. &lt;/p&gt;

&lt;p&gt;You know what does network asset management well? &lt;a href="https://www.metasploit.com/"&gt;Metasploit&lt;/a&gt;. Metasploit is a penetration testing tool; you know what else it does really freaking well? Asset management. I keep several databases; I scan work things, home things, other things... Metasploit does a great job of handling my scans and updating discovery. But Metasploit, even though a community edition exists, is meant to be a tester's tool. What I need is a management tool.&lt;/p&gt;

&lt;p&gt;So.. I'm considering cobbling together other projects into a solution that handles discovery, deltas, and presentation. Projects like &lt;a href="https://nmap.org/"&gt;nmap&lt;/a&gt; and &lt;a href="https://github.com/robertdavidgraham/masscan"&gt;masscan&lt;/a&gt; handle discovery &lt;em&gt;very&lt;/em&gt; well, and export data in nearly any format I need. I could easily set up a 0MQ/MQTT system that runs regular scans, triggers in-depth scans, and generally manages discovery. I figure a graph database (neo4j?) to handle relationships, paired with a document database (I love Elasticsearch) for metadata, and Django or Meteor as a framework, and I might just have an interesting project on my hands.&lt;/p&gt;

&lt;p&gt;Now I'm in the dilemma of "should I do this?" - I have other projects I could be working on, and I'd like to find a project that actually pays. This one could be interesting, and hopefully helpful to other admins.&lt;/p&gt;

&lt;p&gt;I guess we'll see.&lt;/p&gt;

&lt;p&gt;Update:&lt;br&gt;
Here are some more resources for IPAM:&lt;br&gt;
&lt;a href="https://alternativeto.net/software/netbox/?license=opensource"&gt;https://alternativeto.net/software/netbox/?license=opensource&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/IP_address_management"&gt;https://en.wikipedia.org/wiki/IP_address_management&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/kahun/awesome-sysadmin#service-discovery"&gt;https://github.com/kahun/awesome-sysadmin#service-discovery&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.g2crowd.com/categories/service-discovery"&gt;https://www.g2crowd.com/categories/service-discovery&lt;/a&gt;&lt;br&gt;
&lt;a href="https://arcentry.com/api-docs/"&gt;https://arcentry.com/api-docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/photos/FXFz-sW0uwo"&gt;Markus Spiske&lt;/a&gt; &lt;/p&gt;

</description>
      <category>metasploit</category>
      <category>ipam</category>
      <category>network</category>
      <category>management</category>
    </item>
    <item>
      <title>Web Scraping Lunch and Learn</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Wed, 15 Aug 2018 18:35:37 +0000</pubDate>
      <link>https://dev.to/bdmorin/web-scraping-lunch-and-learn-184j</link>
      <guid>https://dev.to/bdmorin/web-scraping-lunch-and-learn-184j</guid>
      <description>&lt;p&gt;&lt;em&gt;Forward&lt;/em&gt;: Where I work, we have these things called Lunch and Learn where people in the company talk about something to everyone else. Sometimes it's a client overview, other times it's about scuba diving, sometimes it's just to introduce new people. I gave a talk about web scraping and how it could help your day to day business, personal, or other work. This is the presentation I gave, it might not make a ton of sense stand alone, but I wanted to share. Link to original &lt;a href="https://docs.google.com/document/d/e/2PACX-1vS9wX_LaXUCuBD_P_PhCxzMiWduKpKhqhFIHdqDzs_eWpyoVzxk3qsJe4NqqjhF1TNSawjge_bi5sv1/pub"&gt;presentation&lt;/a&gt;.&lt;/p&gt;


&lt;h1&gt;
  
  
  Web Scraping L&amp;amp;L
&lt;/h1&gt;

&lt;h2&gt;
  
  
  I’ll take structured data for 100 Alex.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;The purpose of web scraping, or data mining, is to transform web data into structured data you can work with in different formats. There is a huge industry around data mining, web automation, and web scraping. I’ve put together an example method for how to do a simple scrape if you run into data you need to structure yourself.&lt;/p&gt;

&lt;h4&gt;
  
  
  Presentation Tools
&lt;/h4&gt;

&lt;p&gt;These are the tools I used during the presentation&lt;br&gt;
&lt;a href="https://data-miner.io/"&gt;https://data-miner.io/&lt;/a&gt; (chrome extension)&lt;br&gt;
&lt;a href="https://data-miner.io/quick-guides/menu"&gt;https://data-miner.io/quick-guides/menu&lt;/a&gt;&lt;br&gt;
&lt;a href="https://sheets.google.com"&gt;https://sheets.google.com&lt;/a&gt; importxml() and importhtml() functions&lt;/p&gt;
&lt;h4&gt;
  
  
  Sites we scraped from:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://vigilante.pw/"&gt;https://vigilante.pw/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://broadbandnow.com/Cable"&gt;https://broadbandnow.com/Cable&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/"&gt;https://www.npmjs.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/mynetwork/invite-connect/connections/"&gt;https://www.linkedin.com/mynetwork/invite-connect/connections/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://admin.google.com"&gt;https://admin.google.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Challenges
&lt;/h3&gt;

&lt;p&gt;To scrape websites with Data Miner, you'll save yourself a lot of time by watching the tutorial videos first. They show how to use the tool effectively in basic situations. As you need more advanced features, you may need to learn CSS, jQuery, or XPath selectors. For more complex scraping tasks, you may need a commercial data-miner.io account, or to move to an open source framework like scrapy/portia.&lt;/p&gt;
&lt;h4&gt;
  
  
  Javascript
&lt;/h4&gt;

&lt;p&gt;One of the biggest challenges in web scraping is dealing with Javascript. Sites that use Angular, Vue, or React will not render well for a typical request-based web scraper. Data Miner already handles this well for basic use cases, since it scrapes your browser's post-rendered HTML. A scraping library needs to deal with the Javascript first, either via a headless browser or another option. There are commercial proxy-rendering services that will pre-render sites before your parser analyzes the HTML, and there are projects like Puppeteer that give you a headless Chrome browser running natively (not the same as PhantomJS/CasperJS). &lt;/p&gt;

&lt;p&gt;The scrapy ecosystem has a great project called Splash, a Dockerized headless web browser that's API driven. Your spider simply makes requests to the API, and Splash handles the rendering. Splash has been very useful in cases where an automated scraper needs to deal with a login page that requires Javascript.&lt;/p&gt;
&lt;h3&gt;
  
  
  Scrapy/Portia
&lt;/h3&gt;

&lt;p&gt;Scrapy and Portia are open source endeavors with commercial services if you need them. Scrapy is a Python framework for deploying web scrapers, spiders, and crawlers. It's easy to use and start out with, and scales to very advanced use if the need arises. Portia is an open source application that provides a visual method for developing scraping recipes. Portia can be self-hosted or hosted as a service. I run a local Portia instance via Docker, and while it's neat, it's problematic and crashes frequently. That would be frustrating for new users.&lt;br&gt;
&lt;a href="https://github.com/scrapinghub/portia"&gt;https://github.com/scrapinghub/portia&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/scrapy/scrapy"&gt;https://github.com/scrapy/scrapy&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/scrapinghub/learn.scrapinghub.com"&gt;https://github.com/scrapinghub/learn.scrapinghub.com&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/scrapy-plugins/scrapy-splash"&gt;https://github.com/scrapy-plugins/scrapy-splash&lt;/a&gt;&lt;br&gt;
&lt;a href="https://django-dynamic-scraper.readthedocs.io/en/latest/"&gt;https://django-dynamic-scraper.readthedocs.io/en/latest/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Frameworkless Python
&lt;/h3&gt;

&lt;p&gt;If you would like to write a scraping bot from scratch with no framework overhead, BeautifulSoup4 and Requests are a great way to go. You can develop multistage scrapers in about 20 lines of code, but you need to understand the libraries and methods ahead of time. BS4 has excellent documentation, as does Requests, and nearly any beginner pythonista could get started with them. There is also a very handy Python library called Newspaper3k that automatically pulls core content (like newspaper article text) from pages. If you're looking to pull a large corpus of content for tasks like AI or ML, it's a great module that lets you focus not on scraping, but on what to do with the content you're scraping.&lt;br&gt;
&lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/"&gt;https://www.crummy.com/software/BeautifulSoup/bs4/doc/&lt;/a&gt;&lt;br&gt;
&lt;a href="http://docs.python-requests.org/en/master/"&gt;http://docs.python-requests.org/en/master/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://newspaper.readthedocs.io/en/latest/"&gt;https://newspaper.readthedocs.io/en/latest/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Node Scraping
&lt;/h3&gt;

&lt;p&gt;I haven’t done much research on scraping with Node, but I’ve read a lot of articles about it. The biggest barrier to entry for me was that any request library that didn’t use promises hung up too easily. I tried some, but I really enjoy developing in Python/Jupyter. Here are some resources for starting web scraping in Node.&lt;br&gt;
Framework: &lt;a href="https://expressjs.com/"&gt;https://expressjs.com/&lt;/a&gt;&lt;br&gt;
Request library: &lt;a href="https://github.com/mikeal/request"&gt;https://github.com/mikeal/request&lt;/a&gt; or &lt;a href="https://github.com/axios/axios"&gt;https://github.com/axios/axios&lt;/a&gt;&lt;br&gt;
HTML Parser: &lt;a href="https://github.com/MatthewMueller/cheerio"&gt;https://github.com/MatthewMueller/cheerio&lt;/a&gt; or &lt;a href="https://github.com/jsdom/jsdom"&gt;https://github.com/jsdom/jsdom&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Command Line
&lt;/h3&gt;

&lt;p&gt;Sometimes, you just want to grab data directly from the command line. There are two tools that make this remarkably simple: &lt;a href="https://github.com/ericchiang/pup"&gt;pup&lt;/a&gt; and &lt;a href="https://stedolan.github.io/jq/"&gt;jq&lt;/a&gt;.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s "https://vigilante.pw/" \
| pup 'table tr json{}' \
| jq ' .[] | {"entries": .children[0].text, "database": .children[1].text, "hashing": .children[2].text, "category": .children[3].text, "date": .children[4].text, "acknowledged": .children[5].text }' | head -40


{
  "entries": "34,368",
  "database": "000webhost.com Forum",
  "hashing": "vB",
  "category": "Hosting",
  "date": "2015-10",
  "acknowledged": null
}
{
  "entries": "632,595",
  "database": "000webhost.com Mailbox",
  "hashing": "plaintext",
  "category": "Hosting",
  "date": "2015-10",
  "acknowledged": null
}
{
  "entries": "15,311,565",
  "database": "000webhost.com Main",
  "hashing": "plaintext",
  "category": "Hosting",
  "date": "2015-10",
  "acknowledged": null
}
{
  "entries": "5,344",
  "database": "007.no",
  "hashing": "SHA-1 *MISSING SALTS*",
  "category": "Gaming",
  "date": null,
  "acknowledged": null
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example uses the vigilante.pw website we looked at earlier. On the command line, &lt;code&gt;curl&lt;/code&gt; acts as the requestor, &lt;code&gt;pup&lt;/code&gt; extracts just the table’s rows and transforms them into JSON, then &lt;code&gt;jq&lt;/code&gt; processes the JSON into a workable dataset you could use in any other application. &lt;code&gt;jq&lt;/code&gt; could further remove the commas from numbers and normalize other text if needed.&lt;/p&gt;
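&lt;p&gt;For instance, stripping the commas so the entry counts become real numbers is one extra &lt;code&gt;jq&lt;/code&gt; filter. Shown here on a single hand-typed row in the same shape as the output above, rather than a live fetch:&lt;/p&gt;

```shell
# One row shaped like the pup/jq output above, with a comma-laden count
echo '{"entries": "15,311,565", "database": "000webhost.com Main"}' \
  | jq '.entries |= (gsub(",";"") | tonumber)'
```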

&lt;h3&gt;
  
  
  Bonus Round
&lt;/h3&gt;

&lt;p&gt;Put this in a Google Sheets cell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IMPORTHTML("https://vigilante.pw/","table",1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can import nearly any XPath you like into Google Sheets, enabling you to create custom dashboards of web content.&lt;/p&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/photos/nZcMWOKAJrY?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Maik Jonietz&lt;/a&gt; &lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>scrapy</category>
      <category>jq</category>
      <category>pup</category>
    </item>
    <item>
      <title>my first octolapse</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Wed, 15 Aug 2018 01:18:55 +0000</pubDate>
      <link>https://dev.to/bdmorin/my-first-octolapse-2a0p</link>
      <guid>https://dev.to/bdmorin/my-first-octolapse-2a0p</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/FBltPLZQpF4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Check out octolapse on your octopi!!&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--i3JOwpme--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/github-logo-ba8488d21cd8ee1fee097b8410db9deaa41d0ca30b004c0c63de0a479114156f.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/FormerLurker"&gt;
        FormerLurker
      &lt;/a&gt; / &lt;a href="https://github.com/FormerLurker/Octolapse"&gt;
        Octolapse
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Stabilized timelapses for Octoprint
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;h1&gt;
Octolapse&lt;/h1&gt;
&lt;div&gt;
    &lt;a href="https://github.com/FormerLurker/Octolapse/wiki/V0.4---Octolapse-Tab" title="Get more information about this feature from the Octolapse Wiki"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U5ZlJGmw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/wiki/FormerLurker/Octolapse/version/0.4/assets/images/tab/octolapse_tab_mini.png" alt="The Octolapse Tab"&gt;
    &lt;/a&gt;&lt;br&gt;
    &lt;a href="https://github.com/FormerLurker/Octolapse/wiki/V0.4---Octolapse-Tab"&gt;
        The New and Improved Octolapse Tab
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Octolapse is provided without warranties of any kind.  By installing Octolapse, you agree to accept all liability for any damage caused directly or indirectly by Octolapse.  Use caution and never leave your printer unattended.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
What Octolapse Does&lt;/h2&gt;
&lt;p&gt;Octolapse is designed to make stabilized timelapses of your prints with as little hassle as possible, and it's extremely configurable.  Now you can create a silky smooth timelapse without a custom camera mount, and no GCode customizations are required.&lt;/p&gt;
&lt;div&gt;
    &lt;a href="https://www.youtube.com/watch?v=er0VCYen1MY" rel="nofollow"&gt;
        &lt;img src="https://camo.githubusercontent.com/bdba910272c084663872b6ef33dceec15543cad2c81aa6ddac9d0dc0f883f23c/68747470733a2f2f696d672e796f75747562652e636f6d2f76692f657230564359656e314d592f687164656661756c742e6a7067" alt="Double Spiral Vase Timelapse Taken with Octolapse" title="Watch on Youtube"&gt;
    &lt;/a&gt;&lt;br&gt;
    &lt;a href="https://www.thingiverse.com/thing:570288" alt="Link to the model from this video" title="view model on thingiverse" rel="nofollow"&gt;
            A Timelapse of a Double Spiral Vase Made with Octolapse
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;Octolapse moves the print bed and extruder into position before taking each snapshot, giving you a crisp image in every frame.  Snapshots can be taken at each layer change, at specific height increments, after a period of time has elapsed, or when certain GCodes are detected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;:  &lt;em&gt;Octolapse requires OctoPrint v1.3.9 or higher, and some features&lt;/em&gt;…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/FormerLurker/Octolapse"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Here's an excellent &lt;a href="https://hackaday.com/2018/07/02/coolest-way-to-watch-3d-printing-lights-camera-octolapse/"&gt;tutorial&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;sup&gt;&lt;em&gt;cover photo credit: &lt;a href="https://unsplash.com/@powwpic?utm_medium=referral&amp;amp;utm_campaign=photographer-credit&amp;amp;utm_content=creditBadge"&gt;Ines Álvarez Fdez&lt;/a&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;

</description>
      <category>3dprinting</category>
      <category>fdm</category>
      <category>octolapse</category>
      <category>octopi</category>
    </item>
    <item>
      <title>Windows, I never missed you.</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Tue, 14 Aug 2018 23:35:39 +0000</pubDate>
      <link>https://dev.to/bdmorin/windows-i-never-missed-you-1f3b</link>
      <guid>https://dev.to/bdmorin/windows-i-never-missed-you-1f3b</guid>
      <description>&lt;p&gt;Thank you to &lt;a href="https://www.joshbeam.com/2017/11/23/making-a-bootable-windows-10-usb-drive-on-macos-high-sierra/"&gt;Josh Beam&lt;/a&gt; for a very easy method of making a bootable Windows 10 USB drive.&lt;/p&gt;

&lt;p&gt;It'd been almost a decade since I'd used Windows in any real way (aside from a Steam gaming PC, which is still Windows 7). I was recently asked to fix an HP Stream PC that'd been completely demolished by malware. I had to wipe the whole thing and start over. I downloaded a fresh Windows 10 image and found I couldn't write it to a USB drive. I didn't have any DVD-Rs anymore, because... WHY?! Apple says to use Boot Camp Assistant to create an image, but on High Sierra, Boot Camp Assistant will only write to the local hard drive. I couldn't get my Kali VM to write the ISO to a USB drive either.&lt;/p&gt;

&lt;p&gt;Turns out it's ridiculously simple: format the USB drive as something horrifically Windows, then literally copy the files onto it with /bin/cp. &lt;em&gt;BOOM&lt;/em&gt;, bootable USB drive. &lt;/p&gt;
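&lt;p&gt;For reference, that method looks roughly like this on macOS. This is a sketch, not gospel: the disk identifier, the ISO filename, and the mounted volume name below are all assumptions, so check &lt;code&gt;diskutil list&lt;/code&gt; and &lt;code&gt;ls /Volumes&lt;/code&gt; on your own machine first.&lt;/p&gt;

```shell
# WARNING: this erases the USB drive. /dev/disk2 is an assumption;
# run `diskutil list` and double-check the identifier first.
diskutil eraseDisk MS-DOS "WIN10" MBR /dev/disk2

# mount the downloaded ISO (filename here is hypothetical)
hdiutil mount ~/Downloads/Win10_English_x64.iso

# copy the installer files onto the freshly formatted drive
# (the ISO's volume name varies by release; check /Volumes after mounting)
cp -rp /Volumes/ISO_VOLUME_NAME/* /Volumes/WIN10/
```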

&lt;p&gt;Thank you Josh.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;cover photo by &lt;a href="https://unsplash.com/photos/Bb_X4JgSqIM"&gt;Ines Álvarez Fdez&lt;/a&gt; on Unsplash&lt;/em&gt;&lt;/p&gt;

</description>
      <category>windows</category>
      <category>windows10</category>
    </item>
    <item>
      <title>Level up your terminal game</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Wed, 01 Aug 2018 16:56:12 +0000</pubDate>
      <link>https://dev.to/bdmorin/level-up-your-terminal-game-18kh</link>
      <guid>https://dev.to/bdmorin/level-up-your-terminal-game-18kh</guid>
      <description>&lt;p&gt;A co-worker posted a link about &lt;a href="https://github.com/jwilm/alacritty"&gt;alacritty&lt;/a&gt; in our company Slack channel. Alacritty is a GPU-accelerated terminal emulator written in Rust. I'm on a Mac on my desktop, so I had to fumble around with Rust to get it going; thanks to &lt;a href="https://brew.sh/"&gt;homebrew&lt;/a&gt; it was pretty easy. &lt;/p&gt;

&lt;p&gt;I was pretty floored to find out just how amazing its performance is. I've used &lt;a href="https://www.iterm2.com/"&gt;iTerm&lt;/a&gt; since I switched to Mac around 2009 and have a lot of my workflow integrated with iTerm. What I really wasn't prepared for was just how much more I should be expecting out of a terminal application.&lt;/p&gt;

&lt;p&gt;I ran some basic tests using &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;rg&lt;/a&gt; that I knew would pour output to the terminal. For my test, I sat in my projects directory and simply searched for 'perl'. It returns a mixed bag of ASCII and binary text: 176,002 lines, to be exact.&lt;/p&gt;

&lt;p&gt;So I timed each run in my terminal. This is &lt;em&gt;NOT&lt;/em&gt; the best method for a test, I understand that. However, the results were striking regardless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/jwilm/alacritty"&gt;alacritty&lt;/a&gt;: rg perl 7.28s user 28.04s system 21% cpu 2:43.80 total&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.iterm2.com/"&gt;iTerm&lt;/a&gt;: rg perl 7.15s user 26.87s system 0% cpu 58:14.69 total&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kovidgoyal/kitty"&gt;kitty&lt;/a&gt;: rg perl 6.26s user 19.47s system 11% cpu 3:35.96 total&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hyper.is/"&gt;hyper&lt;/a&gt;: rg perl 6.65s user 16.02s system 25% cpu 1:27.68 total&lt;/li&gt;
&lt;/ul&gt;
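&lt;p&gt;If you want to reproduce this kind of test without my projects directory, a stand-in that floods the terminal with a known number of lines works too. This uses &lt;code&gt;seq&lt;/code&gt; as a substitute for the rg search (176002 matches the line count above); the absolute numbers will differ, but the relative gap between emulators shows up the same way.&lt;/p&gt;

```shell
# flood the terminal with ~176k lines, the same order of magnitude
# as the rg search in the post, and time how long rendering takes
time seq 1 176002
```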

&lt;p&gt;Alacritty was astounding: it scrolled data incredibly fast, and scrollback was as fast as I could spin my mouse wheel.&lt;/p&gt;

&lt;p&gt;iTerm2 was exactly how I expected it to be: fine.&lt;/p&gt;

&lt;p&gt;Kitty felt just like Alacritty, but comes across as a much more mature project.&lt;/p&gt;

&lt;p&gt;Hyper &lt;em&gt;was&lt;/em&gt; the terminal I thought I was going to move to, because I enjoyed its configurability and the wingdings it adds. In everyday feel, though, it performed only marginally better than iTerm2. My test result unfortunately didn't support my feeling either: Hyper's 'time' result came back worse than iTerm2's, but it definitely finished faster. &lt;/p&gt;

&lt;p&gt;So, the differences were intense and mind-blowing. It took Alacritty about 3 minutes to print everything; iTerm2 took nearly an hour; Kitty was about a minute behind Alacritty. The takeaway was clear: Alacritty, Kitty, and GPU-based terminal emulators in general provide unbelievable performance.&lt;/p&gt;

&lt;p&gt;For me, Kitty is a great choice. It feels a little more mature, was available via homebrew (brew install kitty), and works great on my Antergos Linux desktop. Scrolling in vim and tmux scrollback are silky smooth, and the fonts (once you work them out) are amazing. &lt;del&gt;I use powerlevel9k and prezto&lt;/del&gt;, and Kitty handles my prompt no problem.&lt;/p&gt;

&lt;p&gt;Kitty and Alacritty require manual configuration files, and getting your fonts right can be a challenge. If you're on a Mac, you can save yourself a shitload of time with this command: &lt;code&gt;fc-list : family | rg -i powerline&lt;/code&gt;. That will give you the exact font family name to put in the config. &lt;/p&gt;

&lt;p&gt;If you have a better way to test terminal performance I'd love to try it.&lt;/p&gt;

</description>
      <category>terminal</category>
      <category>kitty</category>
      <category>iterm2</category>
      <category>alacritty</category>
    </item>
  </channel>
</rss>
