There are lots of great articles out there on AWS Virtual Private Cloud (VPC) networking components, so rather than repeat them, here is my internet access troubleshooting checklist for workloads running in AWS VPCs that need to reach the internet or be reached from it.
Draw a Diagram
To start, I draw a picture of the network in my head. The detail level is coarse at first: just the primary hops. If a mental picture isn't enough, grab some paper or a tablet and chart that thing out! (No shame in making a diagram.) Make a note of public/private subnets, gateways, and IP address schemes (to the best of your knowledge).
For the purposes of this discussion I define:
- Public Subnet - a subnet containing instances with both private and public IPs assigned to them and a route table with a route directly to an Internet Gateway
- Private Subnet - a subnet containing instances with only private IPs (non-internet-routable)
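If you're not sure which side of that line an address falls on while sketching the diagram, Python's standard `ipaddress` module can tell you. This is a minimal sketch; `describe_address` is a hypothetical helper and the VPC CIDR in the example is made up:

```python
import ipaddress


def describe_address(ip: str, vpc_cidr: str) -> str:
    """Label an address for the diagram: internet-routable, or private in/out of the VPC."""
    addr = ipaddress.ip_address(ip)
    if addr.is_global:
        # Globally routable: an instance public IP, an EIP, or an internet host.
        return "public"
    # Not internet-routable (RFC 1918 space and other reserved ranges).
    if addr in ipaddress.ip_network(vpc_cidr):
        return "private (in VPC)"
    return "private (outside VPC)"


print(describe_address("10.0.1.25", "10.0.0.0/16"))     # private (in VPC)
print(describe_address("192.168.1.10", "10.0.0.0/16"))  # private (outside VPC)
print(describe_address("8.8.8.8", "10.0.0.0/16"))       # public
```

`is_global` is used rather than `is_private` because some ranges, such as the 100.64.0.0/10 shared address space, are neither "private" nor internet-routable.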
Work the Checklist
Once I understand the general source/destination flow of traffic I can go through my checklist:
1 - Do I have a public IP?
Your instance needs either its own public IP or a borrowed one to send and receive traffic to and from the internet.
Dynamic AWS Public IP - can be assigned at EC2 launch time with "Auto-Assign Public IP: Enabled", or through the default option "Auto-Assign Public IP: Use Subnet Setting" if the subnet you are deploying to has auto-assign public IP enabled. Remember these addresses are dynamic and can change when the instance is stopped and started (or replaced).
Elastic IP (static public IP) - can be allocated to your account and then "associated" with one of your instances or Elastic Network Interfaces (ENIs). These IPs do not change unless you release them back to AWS, which makes them useful for certain simple EC2 transitions: re-associate the EIP with a different instance or ENI and clients keep using the same address.
Network Address Translation (NAT) Server or Gateway - a NAT server can be deployed in a VPC to allow "private" instances (no public IPs of their own) to borrow the NAT's public IP through address translation. NAT only carries connections that the private instance initiates; it does not accept unsolicited inbound connections from the internet. At AWS you can build your own NAT instance or use the managed AWS NAT Gateway (it can be expensive, so be careful). If using a NAT-based solution, the route table for your instance's subnet needs its default route (0.0.0.0/0) pointed at the NAT instance/gateway rather than at the Internet Gateway (see below).
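To tell these three cases apart for a given instance, the sketch below inspects one instance record shaped like an entry in `Reservations[].Instances[]` from boto3's `describe_instances()`. It assumes the ENI's `Association.IpOwnerId` reads `"amazon"` for auto-assigned public IPs and an account ID for Elastic IPs; double-check that against your own API output. `public_ip_source` is a hypothetical helper and the addresses are made up:

```python
def public_ip_source(instance: dict) -> str:
    """Describe where an instance's public IP (if any) comes from.

    `instance` is assumed to look like one entry of Reservations[].Instances[]
    from boto3's describe_instances().
    """
    for eni in instance.get("NetworkInterfaces", []):
        assoc = eni.get("Association") or {}
        public_ip = assoc.get("PublicIp")
        if not public_ip:
            continue  # this ENI has no public address
        # Assumption: owner "amazon" = auto-assigned; an account ID = Elastic IP.
        if assoc.get("IpOwnerId") == "amazon":
            return f"dynamic public IP {public_ip}"
        return f"Elastic IP {public_ip}"
    # No public IP on any ENI: internet access has to go through NAT.
    return "no public IP (needs NAT for outbound access)"


auto = {"NetworkInterfaces": [{"Association": {"PublicIp": "203.0.113.10", "IpOwnerId": "amazon"}}]}
eip = {"NetworkInterfaces": [{"Association": {"PublicIp": "198.51.100.7", "IpOwnerId": "123456789012"}}]}
private = {"NetworkInterfaces": [{"PrivateIpAddress": "10.0.2.15"}]}
print(public_ip_source(auto))     # dynamic public IP 203.0.113.10
print(public_ip_source(eip))      # Elastic IP 198.51.100.7
print(public_ip_source(private))  # no public IP (needs NAT for outbound access)
```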
2 - Do I have a gateway to the internet?
The internet gateway for a VPC works like the WWW/Internet port on your home router, it provides a path from the VPC to the internet and back.
Make sure you have an internet gateway (IGW) provisioned.
Make sure the IGW is also attached to your VPC. This last step makes the IGW available as a routing destination target (see routing below).
If NAT-based access is being used for "private" instances, your VPC still needs an IGW: the NAT instance/gateway sits in a public subnet and sends its traffic out through the IGW (see routing below).
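Both IGW checks (provisioned and attached) can be scripted. A minimal sketch, assuming `internet_gateways` is the `InternetGateways` list returned by boto3's `describe_internet_gateways()` and that an attached IGW reports its attachment `State` as `"available"`; `attached_igw` is a hypothetical helper and the IDs are made up:

```python
from typing import Optional


def attached_igw(internet_gateways: list[dict], vpc_id: str) -> Optional[str]:
    """Return the ID of the IGW attached to vpc_id, or None if there isn't one."""
    for igw in internet_gateways:
        for attachment in igw.get("Attachments", []):
            # Assumption: an attached IGW reports State "available";
            # "attached" is accepted as well to be lenient.
            if attachment.get("VpcId") == vpc_id and attachment.get("State") in ("available", "attached"):
                return igw["InternetGatewayId"]
    return None


igws = [
    {"InternetGatewayId": "igw-0aaa", "Attachments": []},  # provisioned, never attached
    {"InternetGatewayId": "igw-0bbb", "Attachments": [{"VpcId": "vpc-0123", "State": "available"}]},
]
print(attached_igw(igws, "vpc-0123"))  # igw-0bbb
print(attached_igw(igws, "vpc-0456"))  # None
```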
3 - Do I have a route to the internet?
Route tables and their routes tell the VPC where to send traffic bound for the internet.
Public Subnets:
- Verify which route table is associated with your subnet (explicitly, or implicitly through the VPC's main route table)
- Verify that route table contains a default (0.0.0.0/0) route
- Verify the default route (0.0.0.0/0) targets a valid, attached IGW (see internet gateway above)
Private Subnets (requires NAT):
- Verify which route table is associated with your subnet (explicitly, or implicitly through the VPC's main route table)
- Verify that route table contains a default (0.0.0.0/0) route
- Verify the default route (0.0.0.0/0) targets your NAT gateway (ID starting with nat-) or your NAT instance (its instance ID or ENI)
- If using a NAT instance, verify that source/destination checking is disabled on it
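The "which route table?" step trips people up because a subnet with no explicit association silently uses the VPC's main route table. Below is a sketch of both checks, assuming `route_tables` is the `RouteTables` list from boto3's `describe_route_tables()` for one VPC; the helper names and IDs are hypothetical. A route whose `State` is `blackhole` has a target that no longer exists or isn't attached, so it can look right at a glance while dropping traffic.

```python
from typing import Optional


def effective_route_table(route_tables: list[dict], subnet_id: str) -> Optional[dict]:
    """Return the route table a subnet actually uses.

    An explicit subnet association wins; otherwise the subnet falls back to
    the VPC's main route table.
    """
    main_table = None
    for table in route_tables:
        for assoc in table.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return table
            if assoc.get("Main"):
                main_table = table
    return main_table


def check_default_route(table: dict) -> str:
    """Say where the default route points, or why it won't work."""
    for route in table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("State") == "blackhole":
            # Target is gone or detached (e.g. a terminated NAT instance).
            return "default route is a blackhole"
        for key in ("GatewayId", "NatGatewayId", "NetworkInterfaceId", "InstanceId"):
            if key in route:
                return f"default route -> {route[key]}"
    return "no default route"


tables = [
    {"RouteTableId": "rtb-main", "Associations": [{"Main": True}],
     "Routes": [{"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"}]},
    {"RouteTableId": "rtb-private", "Associations": [{"Main": False, "SubnetId": "subnet-priv"}],
     "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0abc", "State": "active"}]},
]
for subnet in ("subnet-priv", "subnet-other"):
    table = effective_route_table(tables, subnet)
    print(subnet, table["RouteTableId"], check_default_route(table))
```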
4 - Is anything filtering traffic to/from the internet?
Review any firewalls along the path to and from the internet. Consider the direction of the traffic and the application ports in use, e.g. HTTP/S (80/443), SSH (22), RDP (3389).
Is anything on the instance's OS filtering traffic?
- Windows Firewall
- Linux iptables (or firewalld/nftables)
- 3rd party protection tools (TrendMicro, Avast, CheckPoint, etc.)
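A quick way to narrow down where traffic dies is a plain TCP connection test, run from a client outside the VPC and then from the instance itself against its own service. In general, a refused connection means something answered (nothing listening, or a host firewall REJECT rule), while a timeout usually means packets are being silently dropped somewhere (security group, NACL, routing, or a host firewall DROP rule). A minimal standard-library sketch; `probe_tcp` is a hypothetical helper:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something sent a reset: the path is up, but nothing is listening
        # or a host firewall REJECT rule fired.
        return "refused"
    except socket.timeout:
        # No answer at all: packets are likely being silently dropped.
        return "timeout"
    except OSError as exc:
        # e.g. DNS failure or "network unreachable".
        return f"error: {exc}"


# Local demo: a listener on an ephemeral port, then the same port after closing it.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(probe_tcp("127.0.0.1", port))  # open
listener.close()
print(probe_tcp("127.0.0.1", port))
```

If the test against the instance's own service from the instance itself succeeds but the same test from outside times out, focus on the filtering layers between the two.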
Is a Security Group (SG) filtering traffic?
- Which SGs (there may be more than one) are attached to my instance's ENIs? Take inventory.
- Is there a valid inbound rule configured?
  - Correct source (CIDR block or referenced security group)?
  - Correct protocol and port range?
- SG rules can only allow traffic; there are no deny rules
- SGs are "stateful": return traffic for an allowed connection is automatically allowed. Is the default outbound Allow All rule intact? (It should be in most cases, since the instance needs it to initiate its own connections to the internet.)
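Because SG rules are allow-only and every attached SG contributes rules, the inbound question reduces to "does any rule in any attached SG match?". A sketch of that logic, assuming `security_groups` is the `SecurityGroups` list from boto3's `describe_security_groups()`. It only handles IPv4 CIDR sources (rules referencing other security groups or prefix lists are ignored), and `sg_allows_inbound` is a hypothetical helper with made-up example data:

```python
import ipaddress


def sg_allows_inbound(security_groups: list[dict], source_ip: str, protocol: str, port: int) -> bool:
    """Return True if any inbound rule in any attached SG allows the traffic."""
    src = ipaddress.ip_address(source_ip)
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            if proto not in ("-1", protocol):
                continue
            # "-1" (all traffic) rules carry no port range.
            if proto != "-1" and not (perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)):
                continue
            for ip_range in perm.get("IpRanges", []):
                if src in ipaddress.ip_network(ip_range["CidrIp"]):
                    return True  # one matching allow rule is enough
    return False


web_and_admin = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]},
    {"GroupId": "sg-admin", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "198.51.100.0/24"}]}]},
]
print(sg_allows_inbound(web_and_admin, "203.0.113.5", "tcp", 443))   # True
print(sg_allows_inbound(web_and_admin, "203.0.113.5", "tcp", 22))    # False
print(sg_allows_inbound(web_and_admin, "198.51.100.20", "tcp", 22))  # True
```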
Is a Network Access Control List (NACL) filtering traffic for my instance's subnet?
- Is the default NACL still intact? (It allows all inbound and all outbound traffic.)
- If a custom NACL (a new custom NACL denies all traffic until you add rules):
  - Is there a valid inbound rule configured?
  - Is there a valid outbound rule configured?
  - Is there an ordering problem with the rules? "Rule Number" controls ordering/priority: rules are evaluated from the lowest number up and the first match wins, so lower value = higher priority
  - NACL rules are "stateless". You must define both inbound and outbound rules, including the ephemeral port range (e.g. 1024-65535) that return traffic uses
  - Are there any DENY rules configured in the NACL?
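NACL evaluation order is easy to get backwards, so here is a sketch of it. AWS evaluates the entries for one direction starting from the lowest rule number; the first match decides, and anything unmatched is denied. It assumes `entries` is the `Entries` list of one NACL from boto3's `describe_network_acls()`, where `Protocol` is a protocol number as a string (`"6"` for TCP, `"-1"` for all). `nacl_decision` is a hypothetical helper, only IPv4 entries are evaluated, and the example rules are made up:

```python
import ipaddress


def nacl_decision(entries: list[dict], egress: bool, protocol: str, port: int, ip: str) -> str:
    """Return "allow" or "deny" for one packet direction through a NACL.

    `ip` is the remote address: the source for inbound, the destination for outbound.
    """
    addr = ipaddress.ip_address(ip)
    # Only rules for this direction, lowest rule number first.
    rules = sorted((e for e in entries if e.get("Egress") == egress),
                   key=lambda e: e["RuleNumber"])
    for rule in rules:
        if rule.get("Protocol") not in ("-1", protocol):
            continue
        port_range = rule.get("PortRange")  # absent for "all traffic" rules
        if port_range and not (port_range["From"] <= port <= port_range["To"]):
            continue
        cidr = rule.get("CidrBlock")  # IPv6 entries use Ipv6CidrBlock; skipped here
        if cidr is None or addr not in ipaddress.ip_network(cidr):
            continue
        return rule["RuleAction"]  # first match wins
    return "deny"  # nothing matched: the final "*" rule denies


entries = [
    {"RuleNumber": 90, "Egress": False, "Protocol": "-1", "RuleAction": "deny", "CidrBlock": "203.0.113.0/24"},
    {"RuleNumber": 100, "Egress": False, "Protocol": "6", "RuleAction": "allow",
     "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 443, "To": 443}},
    {"RuleNumber": 100, "Egress": True, "Protocol": "6", "RuleAction": "allow",
     "CidrBlock": "0.0.0.0/0", "PortRange": {"From": 1024, "To": 65535}},
]
print(nacl_decision(entries, False, "6", 443, "198.51.100.7"))   # allow
print(nacl_decision(entries, False, "6", 443, "203.0.113.9"))    # deny
print(nacl_decision(entries, True, "6", 50123, "198.51.100.7"))  # allow
print(nacl_decision(entries, True, "6", 80, "198.51.100.7"))     # deny
```

In the example, the deny at rule 90 overrides the allow at rule 100 for the 203.0.113.0/24 range, and the outbound ephemeral-port rule is what lets HTTPS responses get back to clients.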
Are there any other network devices that could be filtering traffic?
Depending on how your subnet/VPC/routing has been configured, traffic may be passing through some sort of network-attached system that filters it. This would likely show up in the routing check you did previously: look for unexpected targets on the default route (0.0.0.0/0) or unfamiliar gateways. Many networks have VPN tunnels, specialized filtering, or monitoring devices configured. Contact your security teams and escalate through those channels.
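When reviewing the default route, the target's ID prefix is a quick hint about what sits in the path. A sketch that labels the default route target from a `Routes` list (as found inside boto3's `describe_route_tables()` output); the prefix-to-type mapping is an assumption based on common AWS resource ID prefixes and is not exhaustive, and `default_route_note` is a hypothetical helper:

```python
from typing import Optional

# Assumed mapping of resource-ID prefixes to what they usually are.
KNOWN_PREFIXES = {
    "igw-": "Internet Gateway",
    "nat-": "NAT Gateway",
    "eni-": "network interface (NAT instance or appliance)",
    "i-": "EC2 instance (NAT instance or appliance)",
    "vgw-": "Virtual Private Gateway (VPN/Direct Connect)",
    "tgw-": "Transit Gateway",
    "pcx-": "VPC peering connection",
    "vpce-": "VPC endpoint",
}
# Route fields that can hold the target, checked in this order.
TARGET_KEYS = ("GatewayId", "NatGatewayId", "TransitGatewayId",
               "VpcPeeringConnectionId", "NetworkInterfaceId", "InstanceId")


def default_route_note(routes: list[dict]) -> Optional[str]:
    """Describe the default route's target, or return None if there is no default route."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        target = next((route[k] for k in TARGET_KEYS if k in route), "unknown")
        kind = next((name for prefix, name in KNOWN_PREFIXES.items()
                     if target.startswith(prefix)), "unrecognized target")
        if target.startswith(("igw-", "nat-")):
            return f"{target}: {kind}"
        # Anything else sits in the path and may filter or inspect traffic.
        return f"{target}: {kind}; check what filters or inspects traffic there"
    return None


print(default_route_note([{"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"}]))
print(default_route_note([{"DestinationCidrBlock": "0.0.0.0/0", "TransitGatewayId": "tgw-0123"}]))
```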
Conclusion
Checklists like this are helpful even to "seasoned" pros because they force you to follow a process rather than rely on memory, which can introduce human error. Start with a basic diagram of what you think the network path looks like and then work through the list. These steps can help troubleshoot access to just about any network-attached resource.
Heck, just the other day I was doing a demo on an AWS topic and overlooked a simple Security Group issue. Simple steps can add up to simple solutions, and that's a plus in an overly complex technical world.