<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brian</title>
    <description>The latest articles on DEV Community by Brian (@bdmorin).</description>
    <link>https://dev.to/bdmorin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F27114%2F31b72044-3ef1-4411-b88b-08aaf2c678e1.jpeg</url>
      <title>DEV Community: Brian</title>
      <link>https://dev.to/bdmorin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bdmorin"/>
    <language>en</language>
    <item>
      <title>AWS: Wherein I do things the hardest way possible.</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Fri, 09 Apr 2021 20:08:26 +0000</pubDate>
      <link>https://dev.to/bdmorin/aws-wherein-i-do-things-the-hardest-way-possible-3b71</link>
      <guid>https://dev.to/bdmorin/aws-wherein-i-do-things-the-hardest-way-possible-3b71</guid>
      <description>&lt;p&gt;I've used AWS for various projects, but not gotten extremely in depth. I've recently taken a contract that has stretched my AWS skills to the limit, and I've had to learn lots. I'm going to catalog some things senarios that others might find useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Give me all external Interface IPs so I can scan them
&lt;/h2&gt;

&lt;p&gt;So, I need to lock down an environment, which means I need all the IP addresses. There are likely tons of ways to do this; here's how I tortured myself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-network-interfaces \
  --query "NetworkInterfaces[].Association.PublicIp" \
  --output yaml | sort -V | awk '{print $2}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lists ALL the network interfaces in your AWS account. This might be preferable depending on your needs.&lt;/p&gt;

&lt;p&gt;So, you can list your interfaces and output them in &lt;code&gt;json&lt;/code&gt;, &lt;code&gt;yaml&lt;/code&gt;, &lt;code&gt;text&lt;/code&gt;, or &lt;code&gt;table&lt;/code&gt;. Now, I would've thought that table would give me one IP address per line. It doesn't; it gloms them all onto the same line, probably with a tab delimiter or something. I was so disgusted with that output I switched to YAML, since it required little processing. &lt;/p&gt;

&lt;p&gt;Alternatively, I could've used &lt;code&gt;jq&lt;/code&gt; to process the default JSON output, but &lt;code&gt;jq&lt;/code&gt; isn't installed everywhere, so I opted for commonly available tools. &lt;code&gt;sort -V&lt;/code&gt; properly sorts IP addresses, and &lt;code&gt;awk&lt;/code&gt; strips the &lt;code&gt;-&lt;/code&gt; from the YAML output. &lt;/p&gt;
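&lt;p&gt;For reference, here's roughly what the &lt;code&gt;jq&lt;/code&gt; route could look like. This is a sketch run against a hand-typed sample in the shape &lt;code&gt;describe-network-interfaces&lt;/code&gt; returns (the addresses are made up), not real account output:&lt;/p&gt;

```shell
# Trimmed-down stand-in for `aws ec2 describe-network-interfaces --output json`
# (field shapes match the response; the IPs are made up for illustration)
printf '%s' '{
  "NetworkInterfaces": [
    {"Association": {"PublicIp": "54.10.2.3"}},
    {"Attachment": {"Status": "attached"}},
    {"Association": {"PublicIp": "3.91.0.7"}}
  ]
}' > sample.json

# `// empty` drops interfaces with no public association, so the output
# is already one address per line with no nulls to awk away
jq -r '.NetworkInterfaces[].Association.PublicIp // empty' sample.json | sort -V
```

&lt;p&gt;In a real run you'd pipe the &lt;code&gt;aws&lt;/code&gt; command's JSON straight into the same &lt;code&gt;jq&lt;/code&gt; filter.&lt;/p&gt;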

&lt;p&gt;macOS users, consider using the GNU CLI utilities instead of Apple's. &lt;code&gt;awk&lt;/code&gt; isn't different, but &lt;code&gt;sort&lt;/code&gt; is.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❯ sort --version
2.3-Apple (106)
❯ gsort --version
sort (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later &amp;lt;https://gnu.org/licenses/gpl.html&amp;gt;.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I don't know what Apple's sort is, but gsort is POSIX compliant, and the arguments you learn with it will translate to Linux, or to &lt;code&gt;cygwin&lt;/code&gt;/&lt;code&gt;WSL&lt;/code&gt; on Windows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install coreutils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
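&lt;p&gt;As a toy illustration (made-up addresses) of why the sort implementation matters for IP lists: plain lexicographic &lt;code&gt;sort&lt;/code&gt; puts &lt;code&gt;.10&lt;/code&gt; before &lt;code&gt;.2&lt;/code&gt;, while GNU sort's &lt;code&gt;-V&lt;/code&gt; compares the octets numerically.&lt;/p&gt;

```shell
# Three made-up addresses that trip up lexicographic sorting
printf '10.0.0.10\n10.0.0.2\n10.0.0.9\n' > ips.txt

sort ips.txt      # lexicographic: 10.0.0.10 sorts before 10.0.0.2
sort -V ips.txt   # version sort: 10.0.0.2, 10.0.0.9, 10.0.0.10
```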



&lt;p&gt;THEN.. I discovered why it was putting everything on one line. If you don't wrap the final value in brackets, the CLI just outputs the bare value with nothing else. Always surround the final value in brackets, and you'll get the one-line-per-value behavior you'd assume is the default.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-network-interfaces \
  --query "NetworkInterfaces[].Association.[PublicIp]" \
  --output=text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].[PublicIpAddress]" \
  --output=text | sort -V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will list only IP addresses from your &lt;code&gt;instances&lt;/code&gt;. It won't list public IP addresses from other AWS services you might be using.&lt;/p&gt;

&lt;p&gt;Oddly, again, the output threw me. I was getting an annoying &lt;code&gt;None&lt;/code&gt; in the output. Ugh.. Fine. I'll &lt;code&gt;grep -v None&lt;/code&gt; it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].[PublicIpAddress]" \
  --output=text | sort -V | grep -v None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;but... why?&lt;/p&gt;

&lt;p&gt;Turns out in this output, &lt;code&gt;PublicIpAddress&lt;/code&gt; is handled differently in the raw data that comes back from AWS. Specifically, calling it out as &lt;code&gt;[PublicIpAddress]&lt;/code&gt; creates a condition where you're WANTING null values to be output. The fix for this was removing the &lt;code&gt;[]&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws ec2 describe-instances \
  --query "Reservations[*].Instances[*].PublicIpAddress" \
  --output=text | sort -V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Boom. Output is 1 address per line, suitable for a text file I can feed to &lt;code&gt;nmap&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  NMAP ... or shodan?!
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nmap -p- -sT -T4 -vvvv -Pn --open -iL scanip -oA scanmeip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So.. you wanna scan a range? Start with &lt;code&gt;nmap&lt;/code&gt; or &lt;code&gt;zenmap&lt;/code&gt;. If you aren't sure what all this does, &lt;a href="https://explainshell.com/explain?cmd=nmap+-p-+-sT+-T4+-vvvv+-Pn+--open+-iL+scanip+-oA+scanmeip"&gt;ExplainShell&lt;/a&gt; might be able to help. It's a great site, though it doesn't stay current on every argument. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;-p-&lt;/code&gt; is the same as &lt;code&gt;-p 1-65535&lt;/code&gt;, meaning scan every possible port.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-sT&lt;/code&gt; performs a standard TCP connect check. No fancy SYN stuff; we're just looking to find out what ports are open.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-T4&lt;/code&gt; is the timing template: 1 is nearly benign, while 5 might actually miss open ports because it'll tax your network connection.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-vvvv&lt;/code&gt; veeerrrbbbooosssiitttyyyyyy&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-Pn&lt;/code&gt; don't ping, assume it's alive&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--open&lt;/code&gt; report ONLY on open ports&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-iL scanip&lt;/code&gt; reads the target list from the file &lt;code&gt;scanip&lt;/code&gt;, which I created from the &lt;code&gt;aws ec2 describe-network-interfaces&lt;/code&gt; output above. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;-oA scanmeip&lt;/code&gt; will create 3 files, a standard report, a greppable output, and most importantly an XML file you can transform into an HTML report!&lt;/p&gt;

&lt;p&gt;Once you're done. Clone this repo: &lt;a href="https://github.com/honze-net/nmap-bootstrap-xsl"&gt;https://github.com/honze-net/nmap-bootstrap-xsl&lt;/a&gt; and ensure you have an xml processor installed (macOS comes with a workable &lt;code&gt;xsltproc&lt;/code&gt; binary).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xsltproc -o scanmeip.html ./nmap-bootstrap-xsl/nmap-bootstrap.xsl scanmeip.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Next thing you know, you have a beautifully formatted report showing all the open ports on all your externally visible IP addresses!&lt;/p&gt;

&lt;p&gt;Then you get to cry a little at how much work securing everything will be, because the contract you walked into is way larger than you expected, no one ever set anything up correctly, you wonder why database ports are exposed to the public, someone took the time to install fail2ban but never enabled or configured it, all the ssh listeners are configured for AWS but everyone uses the &lt;code&gt;ubuntu&lt;/code&gt; login, and the list goes on and on.. Life of ops nerds, lol.&lt;/p&gt;

&lt;h4&gt;
  
  
  Shodan?
&lt;/h4&gt;

&lt;p&gt;Bonus points for anyone using shodan.io to scan your external IP addresses using the cli. If anyone is interested, I'll post how I did it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;wait.. there's points?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>awscli</category>
    </item>
    <item>
      <title>Need a blackhole solution for put.io?</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Thu, 30 Jul 2020 02:43:08 +0000</pubDate>
      <link>https://dev.to/bdmorin/need-a-blackhole-solution-for-put-io-cf5</link>
      <guid>https://dev.to/bdmorin/need-a-blackhole-solution-for-put-io-cf5</guid>
      <description>&lt;p&gt;Need a solution for Sonarr, and putio? &lt;/p&gt;

&lt;p&gt;This is what I came up with. Configure Sonarr to dump torrents to a blackhole directory, modify &lt;a href="https://gist.github.com/bdmorin/d3c53e0f495f4947bf61fb55e1c85310"&gt;this&lt;/a&gt; script. Dump it into your launchd/cron solution and you can upload magnets from your blackhole solution to put.io. &lt;/p&gt;

&lt;p&gt;How you download is on you. I might upload my jank solution later. &lt;/p&gt;

&lt;p&gt;Use of "blackhole" is &lt;a href="https://github.com/rails/rails/issues/33677"&gt;problematic&lt;/a&gt;. Once I have sometime to perform some generic replacement, I'll update this.&lt;/p&gt;

&lt;p&gt;Good Luck.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


</description>
      <category>putio</category>
      <category>sonarr</category>
      <category>blackhole</category>
    </item>
    <item>
      <title>Comcast is proxying all unencrypted content</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Thu, 29 Nov 2018 18:27:58 +0000</pubDate>
      <link>https://dev.to/bdmorin/comcast-is-proxying-all-unencrypted-content-34nh</link>
      <guid>https://dev.to/bdmorin/comcast-is-proxying-all-unencrypted-content-34nh</guid>
      <description>

&lt;p&gt;I originally posted this on that notoriously unsecure platform, &lt;a href="https://www.facebook.com/bdmorin/posts/2188976868082331"&gt;facebook&lt;/a&gt;. I should edit this for grammar, but I just wanted to bang it out, because you know... job/work.&lt;/p&gt;

&lt;p&gt;I cannot stand #comcast; no one who knows me finds this surprising, since I'm forever ranting about them. The fact that Comcast is the only option for so many people is ridiculously sad. My job requires me to be on the internet constantly, and I do a lot of security research and general research. &lt;/p&gt;

&lt;p&gt;Today, I found the most horrific thing a security nerd can find. Comcast is &lt;em&gt;FORCING&lt;/em&gt; all unencrypted traffic through Comcast proxy servers. I don't have a choice; I wasn't asked or notified (I'm sure the TOS that's 938429 pages long mentions it). This enables Comcast to inject anything they want into your unencrypted web browsing. &lt;/p&gt;

&lt;p&gt;If you want to see technical details about what these jackholes are doing, see here: &lt;a href="https://gist.github.com/bdmorin/7bd16b34cf75c0f6dd56155301793c4d"&gt;https://gist.github.com/bdmorin/7bd16b34cf75c0f6dd56155301793c4d&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested a popular website, tvmaze.com (an http-only website), with and without a VPN on, and the difference in the HTML delivered was Comcast's injection, which included third-party asset calls, analytics tracking, etc. &lt;/p&gt;

&lt;p&gt;I want to protect my entire network (including all those people in my home) against this kind of absolutely unacceptable spying, but it gets fugly: as cord cutters, we use streaming services, and Netflix and Hulu are NOT VPN friendly. These services actively block VPNs because viewers can appear to be in a different geographic location (ODIN FORBID YOU NOT BE IN AN AUTHORIZED AREA), so if I run my whole house through a VPN, we won't be able to use streaming services. &lt;/p&gt;

&lt;p&gt;I've been considering deploying a local forced proxy so that any port 80 traffic goes through a VPN connection at MY gateway and not Comcast's. Nearly every streaming service uses HTTPS, so this wouldn't diddle with streaming services. &lt;/p&gt;

&lt;p&gt;The point of this rant is to SHAME Comcast, not that they care in the least about consumers. You may constantly see ads for VPNs as you browse online, and this is why: you absolutely CANNOT trust your local service provider when it hijacks your content and modifies it before it gets to you. Ask China what it's like to have all your traffic monitored and modified before it reaches you. Comcast could potentially change anything before you have a chance to read the original version. If Comcast obtains a CA that browsers accept, they would then be able to hijack your HTTPS connections too, which is ABSOLUTELY conceivable at this point. &lt;/p&gt;

&lt;p&gt;Websites that use web application firewall services like Cloudflare are subject to the same kind of risk. Cloudflare inspects all traffic to and from origin servers, so it's a single point that could modify, track, and potentially block content. If a BlackHat were to compromise Cloudflare, thousands of ecommerce businesses could be at risk of having traffic snooped. Same with Comcast: if (AND WHEN) they are compromised, they could modify YOUR traffic so that you're seeing what someone else wants you to see.&lt;/p&gt;

&lt;p&gt;Trust no one. Especially worthless corporations like Comcast.&lt;/p&gt;


</description>
      <category>comcast</category>
      <category>proxy</category>
      <category>security</category>
      <category>nothingissecure</category>
    </item>
    <item>
      <title>I want to re-write metasploit? Really?</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Thu, 30 Aug 2018 21:54:37 +0000</pubDate>
      <link>https://dev.to/bdmorin/i-want-to-re-write-metasploit-really-2cca</link>
      <guid>https://dev.to/bdmorin/i-want-to-re-write-metasploit-really-2cca</guid>
      <description>&lt;p&gt;I've had a flurry of app ideas and notions I want to put to code recently. I've been hitting the bits hard sharpening my Python skills. I'm developing some scrapy based web scrapers for a few apis I want to make public, I'm learning &lt;a href="https://www.dartlang.org/"&gt;Dart&lt;/a&gt;/&lt;a href="https://flutter.io/"&gt;Flutter&lt;/a&gt; because I want to write a mobile app to consume the afore mentioned APIs I'm developing. Javascript is a constant learning process, and NodeJS seems to be a requirement or preference in nearly every modern project; however, I can't look at Javascript/NodeJS the same after learning about Typescript -- ARGH!&lt;/p&gt;

&lt;p&gt;None of that's the point of this post, though. I've had a business need on several occasions to perform network (IP/service) discovery and dump that information so it can be mangled and jangled in many different ways. I've not found a single &lt;a href="https://en.wikipedia.org/wiki/Free_and_open-source_software"&gt;FOSS&lt;/a&gt; &lt;a href="https://en.wikipedia.org/wiki/IP_address_management"&gt;IPAM&lt;/a&gt; solution that lets me do discovery, delta tracking over time, and annotation of discovered hosts and services.&lt;/p&gt;

&lt;p&gt;Digital Ocean released a tool called &lt;a href="https://github.com/digitalocean/netbox"&gt;Netbox&lt;/a&gt; that is wonderful, only.. there's no discovery, and only basic service management. Netbox does have a wonderful &lt;a href="https://netbox.readthedocs.io/en/latest/api/overview/"&gt;API&lt;/a&gt; that I could potentially use for discovery, but the inability to handle network deltas and services really limits the use case of this otherwise brilliant software. &lt;/p&gt;

&lt;p&gt;You know what does network asset management well? &lt;a href="https://www.metasploit.com/"&gt;Metasploit&lt;/a&gt;. Metasploit is a penetration testing tool; you know what else it does really freaking well? Asset management. I keep several databases; I scan work things, home things, other things... Metasploit does a great job of handling my scans and updating discovery. But Metasploit, even though a community edition exists, is meant to be a tester's tool. What I need is a management tool.&lt;/p&gt;

&lt;p&gt;So.. I'm considering cobbling together other projects into a solution that handles discovery, deltas, and presentation. Projects like &lt;a href="https://nmap.org/"&gt;nmap&lt;/a&gt; and &lt;a href="https://github.com/robertdavidgraham/masscan"&gt;masscan&lt;/a&gt; handle discovery &lt;em&gt;very&lt;/em&gt; well, and export data in nearly any format I need. I could easily set up a 0MQ/MQTT system that runs regular scans, triggers in-depth scans, and generally manages discovery. I figure a graph database (neo4j?) to handle relationships, paired with a document database (I love Elasticsearch) for metadata, and Django or Meteor as a framework, and I might just have an interesting project on my hands.&lt;/p&gt;

&lt;p&gt;Now I'm in the dilemma of "should I do this?" - I have other projects I could be working on, and I'd like to find a project that actually pays. This one could be interesting, and hopefully helpful to other admins.&lt;/p&gt;

&lt;p&gt;I guess we'll see.&lt;/p&gt;

&lt;p&gt;Update:&lt;br&gt;
Here are some more resources for IPAM:&lt;br&gt;
&lt;a href="https://alternativeto.net/software/netbox/?license=opensource"&gt;https://alternativeto.net/software/netbox/?license=opensource&lt;/a&gt;&lt;br&gt;
&lt;a href="https://en.wikipedia.org/wiki/IP_address_management"&gt;https://en.wikipedia.org/wiki/IP_address_management&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/kahun/awesome-sysadmin#service-discovery"&gt;https://github.com/kahun/awesome-sysadmin#service-discovery&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.g2crowd.com/categories/service-discovery"&gt;https://www.g2crowd.com/categories/service-discovery&lt;/a&gt;&lt;br&gt;
&lt;a href="https://arcentry.com/api-docs/"&gt;https://arcentry.com/api-docs/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/photos/FXFz-sW0uwo"&gt;Markus Spiske&lt;/a&gt; &lt;/p&gt;

</description>
      <category>metasploit</category>
      <category>ipam</category>
      <category>network</category>
      <category>management</category>
    </item>
    <item>
      <title>Web Scraping Lunch and Learn</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Wed, 15 Aug 2018 18:35:37 +0000</pubDate>
      <link>https://dev.to/bdmorin/web-scraping-lunch-and-learn-184j</link>
      <guid>https://dev.to/bdmorin/web-scraping-lunch-and-learn-184j</guid>
      <description>&lt;p&gt;&lt;em&gt;Forward&lt;/em&gt;: Where I work, we have these things called Lunch and Learn where people in the company talk about something to everyone else. Sometimes it's a client overview, other times it's about scuba diving, sometimes it's just to introduce new people. I gave a talk about web scraping and how it could help your day to day business, personal, or other work. This is the presentation I gave, it might not make a ton of sense stand alone, but I wanted to share. Link to original &lt;a href="https://docs.google.com/document/d/e/2PACX-1vS9wX_LaXUCuBD_P_PhCxzMiWduKpKhqhFIHdqDzs_eWpyoVzxk3qsJe4NqqjhF1TNSawjge_bi5sv1/pub"&gt;presentation&lt;/a&gt;.&lt;/p&gt;


&lt;h1&gt;
  
  
  Web Scraping L&amp;amp;L
&lt;/h1&gt;

&lt;h2&gt;
  
  
  I’ll take structured data for 100 Alex.
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overview
&lt;/h3&gt;

&lt;p&gt;The purpose of web scraping, or data mining, is to transform web data into structured data you can work with in different formats. There is a huge industry around data mining, web automation, and web scraping. I’ve put together an example method for how to do a simple scrape if you run into data you need to structure yourself.&lt;/p&gt;

&lt;h4&gt;
  
  
  Presentation Tools
&lt;/h4&gt;

&lt;p&gt;These are the tools I used during the presentation&lt;br&gt;
&lt;a href="https://data-miner.io/"&gt;https://data-miner.io/&lt;/a&gt; (chrome extension)&lt;br&gt;
&lt;a href="https://data-miner.io/quick-guides/menu"&gt;https://data-miner.io/quick-guides/menu&lt;/a&gt;&lt;br&gt;
&lt;a href="https://sheets.google.com"&gt;https://sheets.google.com&lt;/a&gt; importxml() and importhtml() functions&lt;/p&gt;
&lt;h4&gt;
  
  
  Sites we scraped from:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://vigilante.pw/"&gt;https://vigilante.pw/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://broadbandnow.com/Cable"&gt;https://broadbandnow.com/Cable&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/"&gt;https://www.npmjs.com/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.linkedin.com/mynetwork/invite-connect/connections/"&gt;https://www.linkedin.com/mynetwork/invite-connect/connections/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://admin.google.com"&gt;https://admin.google.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Challenges
&lt;/h3&gt;

&lt;p&gt;To scrape websites with Data Miner, you'll save yourself a lot of time by watching the tutorial videos first. They show how to use the tool effectively in basic situations. As you need more advanced features, you may need to learn CSS, jQuery, or XPath selectors. For more complex scraping tasks, you may need a commercial data-miner.io account, or to move to an open source framework like scrapy/portia.&lt;/p&gt;
&lt;h4&gt;
  
  
  Javascript
&lt;/h4&gt;

&lt;p&gt;One of the biggest challenges in web scraping is dealing with Javascript. Sites that use Angular, Vue, or React will not render well for a typical request-based web scraper. Data Miner already handles this well for basic use cases, since it scrapes your browser's post-rendered HTML. A scraping library needs to deal with the Javascript first, either via a headless browser or another option. There are commercial proxy-rendering services that will pre-render sites before your parser analyzes the HTML, and there are projects like Puppeteer that give you a headless Chrome browser running natively (not the same as PhantomJS/CasperJS). &lt;/p&gt;

&lt;p&gt;The scrapy ecosystem has a great project called Splash, a Dockerized headless web browser that's API driven. Your spider simply makes requests to the API, and Splash handles the rendering. Splash has been very useful in cases where an automated scraper needs to deal with a login page that requires Javascript.&lt;/p&gt;
&lt;h3&gt;
  
  
  Scrapy/Portia
&lt;/h3&gt;

&lt;p&gt;Scrapy and Portia are open source endeavors with commercial services if you need them. Scrapy is a Python framework for deploying web scrapers, spiders, and crawlers. It's easy to use and start out with, and scales to very advanced use if the need arises. Portia is an open source application that provides a visual method for developing scraping recipes. Portia can be self-hosted or hosted as a service. I run a local Portia instance via Docker, and while it's neat, it's problematic and crashes frequently. That would be frustrating for new users.&lt;br&gt;
&lt;a href="https://github.com/scrapinghub/portia"&gt;https://github.com/scrapinghub/portia&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/scrapy/scrapy"&gt;https://github.com/scrapy/scrapy&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/scrapinghub/learn.scrapinghub.com"&gt;https://github.com/scrapinghub/learn.scrapinghub.com&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/scrapy-plugins/scrapy-splash"&gt;https://github.com/scrapy-plugins/scrapy-splash&lt;/a&gt;&lt;br&gt;
&lt;a href="https://django-dynamic-scraper.readthedocs.io/en/latest/"&gt;https://django-dynamic-scraper.readthedocs.io/en/latest/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Frameworkless Python
&lt;/h3&gt;

&lt;p&gt;If you would like to write a scraping bot from scratch with no framework overhead, BeautifulSoup4 and Requests are a great way to go. You can develop multistage scrapers in about 20 lines of code, but you need to understand the libraries and methods ahead of time. BS4 has excellent documentation, as does Requests, and nearly any beginner pythonista could get started with them. There is also a very handy Python library called Newspaper3k that automatically pulls core content (like newspaper article text) from pages. If you're looking to pull a large corpus of content for tasks like AI or ML, it's a great module that lets you focus not on scraping, but on what to do with the content you're scraping.&lt;br&gt;
&lt;a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/"&gt;https://www.crummy.com/software/BeautifulSoup/bs4/doc/&lt;/a&gt;&lt;br&gt;
&lt;a href="http://docs.python-requests.org/en/master/"&gt;http://docs.python-requests.org/en/master/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://newspaper.readthedocs.io/en/latest/"&gt;https://newspaper.readthedocs.io/en/latest/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Node Scraping
&lt;/h3&gt;

&lt;p&gt;I haven’t done much research on scraping with Node, but I’ve read a lot of articles about it. The biggest barrier to entry for me was that any request library that didn’t use promises hung up too easily. I tried some, but I really enjoy developing in Python/Jupyter. Here are some resources for starting web scraping in Node.&lt;br&gt;
Framework: &lt;a href="https://expressjs.com/"&gt;https://expressjs.com/&lt;/a&gt;&lt;br&gt;
Request library: &lt;a href="https://github.com/mikeal/request"&gt;https://github.com/mikeal/request&lt;/a&gt; or &lt;a href="https://github.com/axios/axios"&gt;https://github.com/axios/axios&lt;/a&gt;&lt;br&gt;
HTML Parser: &lt;a href="https://github.com/MatthewMueller/cheerio"&gt;https://github.com/MatthewMueller/cheerio&lt;/a&gt; or &lt;a href="https://github.com/jsdom/jsdom"&gt;https://github.com/jsdom/jsdom&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Command Line
&lt;/h3&gt;

&lt;p&gt;Sometimes, you just want to grab data directly from the command line. There are two tools that make this remarkably simple: &lt;a href="https://github.com/ericchiang/pup"&gt;pup&lt;/a&gt; and &lt;a href="https://stedolan.github.io/jq/"&gt;jq&lt;/a&gt;.&lt;br&gt;
Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s "https://vigilante.pw/" \
| pup 'table tr json{}' \
| jq ' .[] | {"entries": .children[0].text, "database": .children[1].text, "hashing": .children[2].text, "category": .children[3].text, "date": .children[4].text, "acknowledged": .children[5].text }' | head -40


{
  "entries": "34,368",
  "database": "000webhost.com Forum",
  "hashing": "vB",
  "category": "Hosting",
  "date": "2015-10",
  "acknowledged": null
}
{
  "entries": "632,595",
  "database": "000webhost.com Mailbox",
  "hashing": "plaintext",
  "category": "Hosting",
  "date": "2015-10",
  "acknowledged": null
}
{
  "entries": "15,311,565",
  "database": "000webhost.com Main",
  "hashing": "plaintext",
  "category": "Hosting",
  "date": "2015-10",
  "acknowledged": null
}
{
  "entries": "5,344",
  "database": "007.no",
  "hashing": "SHA-1 *MISSING SALTS*",
  "category": "Gaming",
  "date": null,
  "acknowledged": null
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example uses the vigilante.pw website we looked at earlier. On the command line, &lt;code&gt;curl&lt;/code&gt; acts as the requestor, &lt;code&gt;pup&lt;/code&gt; extracts just the table’s rows and transforms them into JSON, then &lt;code&gt;jq&lt;/code&gt; processes the JSON into a workable dataset you could use in any other application. &lt;code&gt;jq&lt;/code&gt; could further remove the commas from numbers and normalize other text if needed.&lt;/p&gt;
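&lt;p&gt;For instance, stripping the commas so the entry counts become real numbers is one extra &lt;code&gt;jq&lt;/code&gt; filter. Shown here on a single hand-typed row in the same shape as the output above, rather than a live fetch:&lt;/p&gt;

```shell
# One row shaped like the pup/jq output above, with a comma-laden count
echo '{"entries": "15,311,565", "database": "000webhost.com Main"}' \
  | jq '.entries |= (gsub(",";"") | tonumber)'
```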

&lt;h3&gt;
  
  
  Bonus Round
&lt;/h3&gt;

&lt;p&gt;Put this in a Google Sheets cell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=IMPORTHTML("https://vigilante.pw/","table",1)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can import nearly any XPath you like into Google Sheets, enabling you to create custom dashboards of web content.&lt;/p&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/photos/nZcMWOKAJrY?utm_source=unsplash&amp;amp;utm_medium=referral&amp;amp;utm_content=creditCopyText"&gt;Maik Jonietz&lt;/a&gt; &lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>scrapy</category>
      <category>jq</category>
      <category>pup</category>
    </item>
    <item>
      <title>my first octolapse</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Wed, 15 Aug 2018 01:18:55 +0000</pubDate>
      <link>https://dev.to/bdmorin/my-first-octolapse-2a0p</link>
      <guid>https://dev.to/bdmorin/my-first-octolapse-2a0p</guid>
      <description>&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/FBltPLZQpF4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Check out octolapse on your octopi!!&lt;br&gt;
&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--i3JOwpme--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev.to/assets/github-logo-ba8488d21cd8ee1fee097b8410db9deaa41d0ca30b004c0c63de0a479114156f.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/FormerLurker"&gt;
        FormerLurker
      &lt;/a&gt; / &lt;a href="https://github.com/FormerLurker/Octolapse"&gt;
        Octolapse
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Stabilized timelapses for Octoprint
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;h1&gt;
Octolapse&lt;/h1&gt;
&lt;div&gt;
    &lt;a href="https://github.com/FormerLurker/Octolapse/wiki/V0.4---Octolapse-Tab" title="Get more information about this feature from the Octolapse Wiki"&gt;
        &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--U5ZlJGmw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://raw.githubusercontent.com/wiki/FormerLurker/Octolapse/version/0.4/assets/images/tab/octolapse_tab_mini.png" alt="The Octolapse Tab"&gt;
    &lt;/a&gt;&lt;br&gt;
    &lt;a href="https://github.com/FormerLurker/Octolapse/wiki/V0.4---Octolapse-Tab"&gt;
        The New and Improved Octolapse Tab
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Octolapse is provided without warranties of any kind.  By installing Octolapse, you agree to accept all liability for any damage caused directly or indirectly by Octolapse.  Use caution and never leave your printer unattended.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;h2&gt;
What Octolapse Does&lt;/h2&gt;
&lt;p&gt;Octolapse is designed to make stabilized timelapses of your prints with as little hassle as possible, and it's extremely configurable.  Now you can create a silky smooth timelapse without a custom camera mount, and no GCode customizations are required.&lt;/p&gt;
&lt;div&gt;
    &lt;a href="https://www.youtube.com/watch?v=er0VCYen1MY" rel="nofollow"&gt;
        &lt;img src="https://camo.githubusercontent.com/bdba910272c084663872b6ef33dceec15543cad2c81aa6ddac9d0dc0f883f23c/68747470733a2f2f696d672e796f75747562652e636f6d2f76692f657230564359656e314d592f687164656661756c742e6a7067" alt="Double Spiral Vase Timelapse Taken with Octolapse" title="Watch on Youtube"&gt;
    &lt;/a&gt;&lt;br&gt;
    &lt;a href="https://www.thingiverse.com/thing:570288" alt="Link to the model from this video" title="view model on thingiverse" rel="nofollow"&gt;
            A Timelapse of a Double Spiral Vase Made with Octolapse
    &lt;/a&gt;
&lt;/div&gt;
&lt;p&gt;Octolapse moves the print bed and extruder into position before taking each snapshot, giving you a crisp image in every frame.  Snapshots can be taken at each layer change, at specific height increments, after a period of time has elapsed, or when certain GCodes are detected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Important&lt;/strong&gt;:  &lt;em&gt;Octolapse requires OctoPrint v1.3.9 or higher, and some features&lt;/em&gt;…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/FormerLurker/Octolapse"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;p&gt;Here's an excellent &lt;a href="https://hackaday.com/2018/07/02/coolest-way-to-watch-3d-printing-lights-camera-octolapse/"&gt;tutorial&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;&lt;sup&gt;&lt;em&gt;cover photo credit: &lt;a href="https://unsplash.com/@powwpic?utm_medium=referral&amp;amp;utm_campaign=photographer-credit&amp;amp;utm_content=creditBadge"&gt;Ines Álvarez Fdez&lt;/a&gt;&lt;/em&gt;&lt;/sup&gt;&lt;/p&gt;

</description>
      <category>3dprinting</category>
      <category>fdm</category>
      <category>octolapse</category>
      <category>octopi</category>
    </item>
    <item>
      <title>Windows, I never missed you.</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Tue, 14 Aug 2018 23:35:39 +0000</pubDate>
      <link>https://dev.to/bdmorin/windows-i-never-missed-you-1f3b</link>
      <guid>https://dev.to/bdmorin/windows-i-never-missed-you-1f3b</guid>
      <description>&lt;p&gt;Thank you to &lt;a href="https://www.joshbeam.com/2017/11/23/making-a-bootable-windows-10-usb-drive-on-macos-high-sierra/"&gt;Josh Beam&lt;/a&gt; for a very easy method of making a bootable Windows 10 USB drive.&lt;/p&gt;

&lt;p&gt;It'd been almost a decade since I'd used Windows in any real way (aside from a Steam gaming PC, which is still Windows 7). I was recently asked to fix an HP Stream PC that'd been completely demolished by malware. I had to wipe the whole thing and start over. I downloaded a fresh Windows 10 image and found I couldn't write it to a USB drive. I didn't have any DVD-Rs anymore, because... WHY?! Apple says to use Boot Camp Assistant to create an image, but on High Sierra, Boot Camp Assistant will only write to the local hard drive. I couldn't get my Kali VM to write the ISO to a USB drive either.&lt;/p&gt;

&lt;p&gt;Turns out it's ridiculously simple: format the USB drive as something horrifically Windows, then literally copy the files onto it with /bin/cp. &lt;em&gt;BOOM&lt;/em&gt;, bootable USB drive. &lt;/p&gt;
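&lt;p&gt;For reference, that method looks roughly like this on macOS. This is a sketch, not gospel: the disk identifier, the ISO filename, and the mounted volume name below are all assumptions, so check &lt;code&gt;diskutil list&lt;/code&gt; and &lt;code&gt;ls /Volumes&lt;/code&gt; on your own machine first.&lt;/p&gt;

```shell
# WARNING: this erases the USB drive. /dev/disk2 is an assumption;
# run `diskutil list` and double-check the identifier first.
diskutil eraseDisk MS-DOS "WIN10" MBR /dev/disk2

# mount the downloaded ISO (filename here is hypothetical)
hdiutil mount ~/Downloads/Win10_English_x64.iso

# copy the installer files onto the freshly formatted drive
# (the ISO's volume name varies by release; check /Volumes after mounting)
cp -rp /Volumes/ISO_VOLUME_NAME/* /Volumes/WIN10/
```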

&lt;p&gt;Thank you Josh.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;cover photo by &lt;a href="https://unsplash.com/photos/Bb_X4JgSqIM"&gt;Ines Álvarez Fdez&lt;/a&gt; on Unsplash&lt;/em&gt;&lt;/p&gt;

</description>
      <category>windows</category>
      <category>windows10</category>
    </item>
    <item>
      <title>Level up your terminal game</title>
      <dc:creator>Brian</dc:creator>
      <pubDate>Wed, 01 Aug 2018 16:56:12 +0000</pubDate>
      <link>https://dev.to/bdmorin/level-up-your-terminal-game-18kh</link>
      <guid>https://dev.to/bdmorin/level-up-your-terminal-game-18kh</guid>
      <description>&lt;p&gt;A co-worker posted a link about &lt;a href="https://github.com/jwilm/alacritty"&gt;alacritty&lt;/a&gt; in our company Slack channel. Alacritty is a GPU-accelerated terminal emulator written in Rust. I'm on a Mac on my desktop, so I had to fumble around with Rust to get it going; thanks to &lt;a href="https://brew.sh/"&gt;homebrew&lt;/a&gt; it was pretty easy. &lt;/p&gt;

&lt;p&gt;I was pretty floored to find out just how amazing its performance is. I've used &lt;a href="https://www.iterm2.com/"&gt;iTerm&lt;/a&gt; since I switched to Mac around 2009 and have a lot of my workflow integrated with iTerm. What I really wasn't prepared for was just how much more I should be expecting out of a terminal application.&lt;/p&gt;

&lt;p&gt;I ran some basic tests using &lt;a href="https://github.com/BurntSushi/ripgrep"&gt;rg&lt;/a&gt; that I knew would pour output to the terminal. For my test, I sat in my projects directory and simply searched for 'perl'. It returns a mixed bag of ASCII and binary text: 176,002 lines, to be exact.&lt;/p&gt;

&lt;p&gt;So I timed each run in my terminal. This is &lt;em&gt;NOT&lt;/em&gt; the best method for a test, I understand that. However, the results were striking regardless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/jwilm/alacritty"&gt;alacritty&lt;/a&gt;: rg perl 7.28s user 28.04s system 21% cpu 2:43.80 total&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.iterm2.com/"&gt;iTerm&lt;/a&gt;: rg perl 7.15s user 26.87s system 0% cpu 58:14.69 total&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/kovidgoyal/kitty"&gt;kitty&lt;/a&gt;: rg perl 6.26s user 19.47s system 11% cpu 3:35.96 total&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hyper.is/"&gt;hyper&lt;/a&gt;: rg perl 6.65s user 16.02s system 25% cpu 1:27.68 total&lt;/li&gt;
&lt;/ul&gt;
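&lt;p&gt;If you want to reproduce this kind of test without my projects directory, a stand-in that floods the terminal with a known number of lines works too. This uses &lt;code&gt;seq&lt;/code&gt; as a substitute for the rg search (176002 matches the line count above); the absolute numbers will differ, but the relative gap between emulators shows up the same way.&lt;/p&gt;

```shell
# flood the terminal with ~176k lines, the same order of magnitude
# as the rg search in the post, and time how long rendering takes
time seq 1 176002
```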

&lt;p&gt;Alacritty was astounding: it scrolled data incredibly fast, and scrollback was as fast as I could spin my mouse wheel.&lt;/p&gt;

&lt;p&gt;iTerm2 was exactly how I expected it to be: fine.&lt;/p&gt;

&lt;p&gt;Kitty felt just like Alacritty, but comes across as a much more mature project.&lt;/p&gt;

&lt;p&gt;Hyper &lt;em&gt;was&lt;/em&gt; the terminal I thought I was going to move to, because I enjoyed its configurability and the wingdings it adds. In everyday feel, though, it performed only marginally better than iTerm2. My test result unfortunately didn't support my feeling either: Hyper's 'time' result came back worse than iTerm2's, but it definitely finished faster. &lt;/p&gt;

&lt;p&gt;So, the differences were intense and mind-blowing. It took Alacritty about 3 minutes to print everything; iTerm2 took nearly an hour; Kitty was about a minute behind Alacritty. The takeaway was clear: Alacritty, Kitty, and GPU-based terminal emulators in general provide unbelievable performance.&lt;/p&gt;

&lt;p&gt;For me, Kitty is a great choice. It feels a little more mature, was available via homebrew (brew install kitty), and works great on my Antergos Linux desktop. Scrolling in vim and tmux scrollback are silky smooth, and the fonts (once you work them out) are amazing. &lt;del&gt;I use powerlevel9k and prezto&lt;/del&gt;, and Kitty handles my prompt no problem.&lt;/p&gt;

&lt;p&gt;Kitty and Alacritty require manual configuration files, and getting your fonts right can be a challenge. If you're on a Mac, you can save yourself a shitload of time with this command: &lt;code&gt;fc-list : family | rg -i powerline&lt;/code&gt;. That will give you the exact font family name to put in the config. &lt;/p&gt;

&lt;p&gt;If you have a better way to test terminal performance I'd love to try it.&lt;/p&gt;

</description>
      <category>terminal</category>
      <category>kitty</category>
      <category>iterm2</category>
      <category>alacritty</category>
    </item>
  </channel>
</rss>
