
michal salanci for AWS Community Builders

Running forward proxy in AWS

Hello friends, let me introduce you to our serverless forward proxy concept in AWS, which runs on AWS Network Firewall and a Squid proxy in an ECS container.

Upcoming articles will dive deeper into the setup of AWS Network Firewall (NFW) and Squid in ECS, CloudWatch logs, DNS setup with Dnsmasq, testing network performance with K9, monitoring with Telegraf, and more.

Now let's see what a basic setup of a forward proxy in AWS may look like.

Introduction to forward proxy

What is a forward proxy and why do we need it

Imagine you are in a corporate datacenter or at home, and you want to connect to a website on the internet. You send an HTTP or HTTPS request to the website. The web server processes the request and responds with the payload.


This is how it should look in an ideal world. However, you can unintentionally access a harmful website, risking exposure to malware or other security threats. To mitigate those risks, organizations often use an outbound filtering system known as a forward proxy.


A forward proxy acts as an intermediary solution between a user's device and the internet. It helps manage and control internet traffic, ensuring security and compliance.

It examines outgoing requests and filters the traffic based on pre-set rules. This could include checking the destination URL, IP address, or type of requested content. By doing so, the proxy ensures that only safe and compliant requests reach the internet, thereby enhancing security and privacy.

For instance, in a corporate environment, a forward proxy might block access to non-work-related websites, ensuring both network security and employee productivity.

When a user sends a request that complies with the rules, the proxy allows it to pass through to the internet. If the request does not comply, the proxy blocks it, effectively preventing access to potentially harmful or non-compliant content.

Forward proxies can also anonymize web requests, hiding the user's IP address from external web servers. This adds a layer of privacy and security, protecting users from potential tracking or hacking.

Some forward proxies cache frequently accessed content. This means that if multiple users request the same resource, the proxy can serve it from its cache, reducing load times and saving bandwidth.


Explicit and Transparent proxy

A proxy can handle traffic in two ways: as an explicit proxy or as a transparent proxy.

Below is a brief comparison of both:

(Image: comparison of explicit and transparent proxy)

The fact that a transparent proxy is invisible to users is actually a great security advantage. An explicit proxy can be bypassed simply by not specifying its address in the request, whereas a user cannot bypass a transparent proxy, because requests are routed to it by default.
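To make the difference concrete, here is a minimal Python sketch of what "specifying the proxy in the request" means on the client side. The proxy host name is a placeholder and not a value from this setup, and no request is actually sent.

```python
import urllib.request

# Hypothetical explicit-proxy endpoint; replace with your own host and port.
PROXY_URL = "http://proxy.example.internal:3128"


def make_opener(use_explicit_proxy: bool) -> urllib.request.OpenerDirector:
    """Return an opener that either names the proxy or names none."""
    if use_explicit_proxy:
        proxies = {"http": PROXY_URL, "https": PROXY_URL}
    else:
        # An empty mapping disables proxy use on the client. With a
        # transparent proxy in place, routing still steers the traffic
        # through the inspection point.
        proxies = {}
    handler = urllib.request.ProxyHandler(proxies)
    return urllib.request.build_opener(handler)


explicit = make_opener(True)
direct = make_opener(False)
```

The explicit variant only works if the client cooperates, which is exactly why it can be bypassed. The transparent variant needs nothing from the client.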

Serverless forward proxy in AWS

Let’s imagine that customers manage their own VPCs and connect to the internet via an Outbound VPC, which serves as a central point of internet access.


The Outbound VPC is the place where egress connections can be secured and controlled, and it is also where the forward proxy operates.

The initial design is modified by introducing an inspection subnet, where all the magic happens.


AWS offers a native solution for a transparent proxy: AWS Network Firewall.

Since there is no native solution for an explicit proxy, a third-party solution such as Squid proxy can be used. It can be placed into a container and managed by AWS Fargate.

Let’s examine the components of the inspection subnet in more detail.

Explicit forward proxy on Squid

As mentioned before, since there is no native AWS solution for an explicit proxy, it is necessary to use a third-party solution. This article uses Squid Proxy.

Squid Proxy is a widely used open source proxy solution. It can terminate TCP connections, which makes it a perfect candidate for an explicit proxy. It can run on an EC2 instance or in an ECS container.


In this architecture, Squid runs in an ECS container, managed by AWS Fargate.

AWS Fargate is a compute engine for Amazon ECS, which allows you to run containers without having to manage servers or clusters. Fargate abstracts the underlying infrastructure management tasks such as provisioning, scaling, and maintaining servers, enabling you to focus on designing and building your applications.

When creating a Docker image for squid proxy, we used 3 main components:

  • urlwhitelist.txt – list of allowed URLs.

  • ipwhitelist.txt – list of allowed IP addresses.

  • squid.conf – configuration file of Squid. This is where all the behavior (what is denied, what is allowed, caching, etc.) is defined.

In this particular scenario, Squid proxy is configured like this:

  • Listens for HTTP and HTTPS traffic on port 3128 and enables SSL bumping for HTTPS traffic.

  • Blocks access to all destinations (URLs and/or IPs), except for what is allowed in the whitelist files.

  • Caches the content.
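As an illustration of the allow-list part of that configuration, the sketch below renders a minimal squid.conf as text. It is not the configuration used in this setup: the file paths and ACL names are made up, it assumes the URL whitelist holds one domain per line, and it leaves out SSL bumping and cache tuning.

```python
def render_squid_conf(url_list: str, ip_list: str, port: int = 3128) -> str:
    """Build a minimal allow-list squid.conf as a string."""
    lines = [
        f"http_port {port}",
        # Assumption: the URL whitelist file holds one domain per line.
        f'acl allowed_domains dstdomain "{url_list}"',
        f'acl allowed_ips dst "{ip_list}"',
        "http_access allow allowed_domains",
        "http_access allow allowed_ips",
        # Anything not matched above is refused.
        "http_access deny all",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical paths inside the container image.
conf = render_squid_conf("/etc/squid/urlwhitelist.txt", "/etc/squid/ipwhitelist.txt")
print(conf)
```

The order of the http_access lines matters: Squid applies the first matching line, so the final deny acts as the default.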

When a user sends an HTTP/HTTPS request via the explicit proxy, this is what happens:

  1. Squid is configured to operate as a proxy and listens on port 3128, where it receives the incoming request.

  2. The request is evaluated against the rules which determine if the requested URL is permitted. This decision is based on whether the URL is listed in the urlwhitelist.txt file.

  3. If the requested URL is not whitelisted in the urlwhitelist.txt file, the request is denied.

  4. If the requested URL is whitelisted, the request is allowed to proceed.

  5. For allowed requests, Squid checks its cache. If a cached version of the requested resource is available, Squid will serve this content directly to the client.

  6. If the requested content is not in the cache, Squid fetches the content from the destination web server and forwards it to the original client.

  7. To the client, it appears as if it received the response directly from the web server, even though it was routed through Squid.
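The steps above can be sketched as a small decision function. This is a simplified model for illustration, not how Squid is implemented: the allow-list, the in-memory cache, and fetch_from_origin are all stand-ins.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.amazon.com"}  # stand-in for urlwhitelist.txt
cache: dict[str, bytes] = {}        # stand-in for Squid's cache


def fetch_from_origin(url: str) -> bytes:
    """Placeholder for the real fetch from the destination web server."""
    return f"body of {url}".encode()


def handle_request(url: str) -> tuple[str, bytes]:
    """Return (outcome, body) for a proxied request."""
    host = urlsplit(url).hostname
    if host not in ALLOWED_HOSTS:
        return "DENIED", b""       # step 3: not whitelisted
    if url in cache:
        return "HIT", cache[url]   # step 5: served from cache
    body = fetch_from_origin(url)  # step 6: fetched from the origin
    cache[url] = body
    return "MISS", body
```

Calling handle_request twice with the same allowed URL yields a MISS followed by a HIT, while any URL whose host is not on the list is denied before the cache is consulted.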

Combo with AWS Network Load Balancer

For users to be able to send HTTP/HTTPS requests to the Squid container, another AWS component is necessary: AWS Network Load Balancer (NLB).

ECS Tasks running Squid as a container are part of the NLB’s target group.

The purpose of the Network Load Balancer is to listen for traffic in front of Squid and then distribute it to its targets, the ECS Tasks running Squid.

This setup has several advantages:

Performance: NLB is designed to handle millions of requests per second while maintaining low latencies. It operates at the Transport Layer (L4) of the OSI model, which allows it to route TCP traffic efficiently. This is particularly beneficial for a proxy server like Squid that handles a significant amount of TCP traffic.

High Availability and Reliability: The use of a Network Load Balancer ensures that traffic is distributed efficiently across available ECS Tasks. If one instance becomes unhealthy or fails, the NLB can redirect traffic to the remaining healthy instances, maintaining service availability. With that setup, we can have as many ECS containers as we need.


Running with sidecar

Putting the Squid container into an ECS Task has another advantage: the possibility of using a sidecar container.

A sidecar container is a design pattern where a secondary container is deployed alongside a primary application container, sharing the same lifecycle and resources, but performing a supporting function that's essential to the operation or management of the primary container.

As it turned out, logs created by Squid are not visible in CloudWatch, so some kind of log processor is needed to parse the logs from Squid and send them to CloudWatch.

There are plenty of log processors available; AWS supports and provides a Docker image of the Fluent Bit log processor. Among other things, it includes plugins and configurations that are optimized for sending logs to CloudWatch.

Because an ECS Task allows us to run multiple containers, Fluent Bit can run as a sidecar container that gathers the logs from the Squid container and sends them to CloudWatch.


But how exactly does Fluent Bit get the logs created by Squid?

Let’s examine the ECS topology in more detail:

The Squid container and Fluent Bit as a sidecar container are both part of the same ECS Task.

ECS Tasks are part of an ECS Service, which is part of an ECS Cluster. The ECS Cluster spans multiple Fargate instances.

For Squid to be able to exchange logs with Fluent Bit, some kind of storage is needed. There are multiple options here, such as using EFS or the instance store. We decided to use the instance store of the particular Fargate instance, as it seems to be the simplest and most cost-effective solution.
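One way to wire this up is a task-level volume mounted into both containers. The sketch below shows a trimmed ECS task definition as a Python dictionary, under the assumption that the shared store is implemented that way. The family name, volume name, mount path, sizes, and the Squid image URI are placeholders; a real definition also needs IAM roles and log configuration.

```python
SHARED_VOLUME = "squid-logs"   # hypothetical volume name
LOG_DIR = "/var/log/squid"     # hypothetical mount path

task_definition = {
    "family": "forward-proxy",  # hypothetical name
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "512",
    "memory": "1024",
    # A volume with only a name lives on the task's ephemeral storage.
    "volumes": [{"name": SHARED_VOLUME}],
    "containerDefinitions": [
        {
            "name": "squid",
            "image": "<account>.dkr.ecr.<region>.amazonaws.com/squid:latest",  # placeholder
            "essential": True,
            "portMappings": [{"containerPort": 3128, "protocol": "tcp"}],
            "mountPoints": [
                {"sourceVolume": SHARED_VOLUME, "containerPath": LOG_DIR}
            ],
        },
        {
            "name": "fluentbit",
            "image": "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable",
            "essential": False,
            # The sidecar only needs to read what Squid writes.
            "mountPoints": [
                {"sourceVolume": SHARED_VOLUME, "containerPath": LOG_DIR, "readOnly": True}
            ],
        },
    ],
}
```

Both containers reference the same sourceVolume, which is what lets the sidecar see the files Squid writes.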

When Squid creates a log entry, it immediately writes it to the instance store of the Fargate instance it runs on. Fluent Bit then reads the logs from the store, parses them into the appropriate format, and forwards them to CloudWatch.
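To give a feel for the parsing step, here is a small Python sketch that turns one access log line into a structured record. Fluent Bit does this with its own parsers; the code only illustrates the idea. It assumes Squid writes its default native access log format with ten whitespace-separated fields, and the sample line is invented.

```python
import json

# Field order of Squid's native access.log format (assumed here).
FIELDS = (
    "timestamp", "elapsed_ms", "client_ip", "result", "bytes",
    "method", "url", "user", "hierarchy", "content_type",
)


def parse_access_line(line: str) -> dict:
    """Split one access.log line into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"unexpected field count: {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["timestamp"] = float(record["timestamp"])
    record["elapsed_ms"] = int(record["elapsed_ms"])
    record["bytes"] = int(record["bytes"])
    return record


# Invented sample line for illustration.
sample = (
    "1700000000.123     42 10.0.1.130 TCP_MISS/200 5120 GET "
    "http://www.amazon.com/ - HIER_DIRECT/203.0.113.10 text/html"
)
print(json.dumps(parse_access_line(sample)))
```

A structured record like this is easier to search and filter in CloudWatch than the raw line.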


Please be aware that the instance store is temporary: once the container dies and is redeployed on a new Fargate instance, you lose all the data in it. However, this should not be a big concern, because once the logs are sent to CloudWatch, they stay there even if the instance store is gone.

Transparent forward proxy on AWS network firewall

A transparent proxy is also necessary for cases where users do not specify any proxy in the request. AWS provides a native solution for that: AWS Network Firewall.

AWS Network Firewall, introduced in 2020, is a managed firewall that primarily provides firewall protection for VPC resources in AWS. It's designed to provide stateful inspection of network traffic, intrusion detection and prevention, and web filtering.

AWS Network Firewall is able to inspect both ingress and egress traffic.

Covering all of its features is beyond the scope of this article, so let’s focus on the ones that are important for transparent proxy capabilities.

Stateful Inspection: AWS Network Firewall tracks the state of active connections and makes decisions based on the context of the traffic (not just the individual packets). It is able to inspect both inbound and outbound traffic.

Web Filtering: It can also block or allow access to specific websites or categories of websites.

Those 2 features are exactly what we need for AWS Network Firewall to act as a transparent proxy.

AWS Network Firewall consists of 3 main components:

Firewall rule

  • Basic building component of network inspection behavior.

  • It defines the criteria to inspect and control the traffic, such as IP addresses, ports, protocols, etc…

  • Rules are grouped in the Rule Group

Firewall rule group

  • Collection of rules, organized into single manageable unit.

  • Can be stateful or stateless. Stateful rule groups can track the state of network connections, while stateless Rule groups treat each packet individually and independently.

  • Rule groups can be applied to Firewall policy.

Firewall Policy

  • Collection of one or more rule groups, organized into single manageable unit.

  • Organizes the order in which the rule groups are being evaluated and defines a default action (what happens if no rule is hit).

More on AWS Network Firewall concepts can be found here:
https://aws.amazon.com/blogs/aws/aws-network-firewall-new-managed-firewall-service-in-vpc/
https://aws.amazon.com/de/blogs/networking-and-content-delivery/deployment-models-for-aws-network-firewall/

Setting up AWS Network Firewall for transparent proxy

In the Firewall policy, the stateful rule evaluation order is Strict and the default action is Drop all + Alert established.


Let’s break it down:

Drop all + Alert established:

  • Drop all: Any traffic that doesn't match any of the rules in the stateful rule group will be dropped. This acts as an implicit deny at the end of the ruleset.

  • Alert established: While Network Firewall drops traffic that does not match the allow rules, it specifically logs (alerts on) traffic that is part of an already established connection. A connection counts as established once the 3-way TCP handshake is done. The firewall does not log the handshake itself; it logs the traffic that follows once the TCP connection is correctly established.

Strict rule ordering: rules are evaluated in the order you define. When the firewall finds a match in a rule of the rule group, no further evaluation is done and the action defined in that rule is taken.
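Expressed as data, these settings could look like the sketch below, which follows the shape of the firewall policy document in the Network Firewall API. Treat it as an illustration: the rule group ARN is a placeholder, and mapping the console label "Drop all" to aws:drop_strict is my reading rather than something stated in this setup.

```python
firewall_policy = {
    # Hand every packet to the stateful engine.
    "StatelessDefaultActions": ["aws:forward_to_sfe"],
    "StatelessFragmentDefaultActions": ["aws:forward_to_sfe"],
    # Strict rule ordering for stateful rule groups.
    "StatefulEngineOptions": {"RuleOrder": "STRICT_ORDER"},
    # Assumed equivalent of "Drop all" + "Alert established".
    "StatefulDefaultActions": ["aws:drop_strict", "aws:alert_established"],
    "StatefulRuleGroupReferences": [
        {
            # Placeholder ARN of a rule group holding the allowed domains.
            "ResourceArn": "arn:aws:network-firewall:<region>:<account>:stateful-rulegroup/allowed-domains",
            "Priority": 1,
        }
    ],
}
```

With strict ordering, each referenced rule group carries a priority, and lower numbers are evaluated first.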

When a user sends an HTTP/HTTPS request via the transparent proxy, this is what happens:

  1. The request is evaluated against the rules in the rule groups. The decision is based on whether it matches any of the rules or not.

  2. If the request matches any of the rules, the action defined in that rule is taken.

  3. If the request does not match any of the rules, the default action (Drop all) is taken and the request is dropped.

  4. There are no caching possibilities in network firewall.
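The evaluation described above can be modelled in a few lines of Python. This is a deliberately simplified sketch: the real engine is Suricata compatible and inspects flows rather than host names, and the rules shown are invented examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int
    domain_suffix: str  # written with a leading dot, e.g. ".amazon.com"
    action: str         # "pass" or "drop"


def matches(host: str, suffix: str) -> bool:
    """True for the bare domain and for any of its subdomains."""
    return host == suffix[1:] or host.endswith(suffix)


def evaluate(host: str, rules: list[Rule], default_action: str = "drop") -> str:
    """Walk the rules in priority order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(host, rule.domain_suffix):
            return rule.action
    return default_action  # nothing matched, so "Drop all" applies


# Invented example rules.
RULES = [
    Rule(1, ".internal.example.com", "drop"),
    Rule(2, ".example.com", "pass"),
    Rule(3, ".amazon.com", "pass"),
]
```

Because evaluation stops at the first match, db.internal.example.com is dropped by the priority 1 rule even though the broader priority 2 rule would have passed it.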

Routing and network flow

Once everything is set up, let’s check the routing and network flow of the explicit and transparent proxy.

Explicit proxy network flow

When a user wants to reach www.amazon.com and use of the explicit proxy is required, the proxy address must be specified in the request. In this case, the DNS name of the Network Load Balancer acts as the proxy address.

  1. The user creates a request to www.amazon.com from EC2 10.0.1.130, specifying the Network Load Balancer DNS name internal-fwdproxynlb-1234567890-eu-central-1.elb.amazonaws.com and port 3128 as the proxy.

  2. The DNS name of the load balancer is translated to its IP address 192.168.3.10, which is now the destination IP address of the packet.

  3. Based on the default route in the user’s VPC, traffic is sent to AWS transit gateway.

  4. In transit gateway, there is a route to 192.168.0.0/16, towards transit gateway attachment in private subnet of Outbound VPC.

  5. From Outbound VPC private subnet, the traffic gets to network loadbalancer, based on a local route.

  6. The Network Load Balancer makes a load balancing decision and picks one of the members of its target group to send the packets to. This is an ECS Task. NLB preserves the client's source IP, so the Squid inside the ECS Task sees the original source IP, 10.0.1.130.

  7. In the ECS Task, the request is evaluated against urlwhitelist.txt and, if allowed, Squid terminates the initial request and creates a new one. Now the source IP address is the ECS Task IP, 192.168.2.28, and the destination is www.amazon.com. There is a default route towards the NAT gateway, so the packet is sent there.

  8. NAT gateway performs source NAT from 192.168.2.28 to its own public IP 3.48.29.55 and sends it to the internet gateway.

  9. Internet gateway sends it to the destination.

  10. When destination responds, and packet gets back to the internet gateway, it is sent back to NAT Gateway.

  11. In NAT gateway the destination IP is changed back to 192.168.2.28 and on a local route the packet gets back to ECS Task and the Squid inside. Squid forwards the response back to network loadbalancer.

  12. Network loadbalancer knows the client IP and based on the route 10.0.0.0/16 in the routing table, the packet is sent to transit gateway.

  13. Transit gateway checks its routing tables and finds a route to 10.0.0.0/16 towards its attachment in private subnet of client VPC.

  14. Once packet reaches private subnet of client VPC, by local route it gets back to client’s EC2.
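Most hops in this flow come down to a route table lookup where the most specific matching prefix wins. The sketch below models that with Python's ipaddress module. The CIDR ranges come from the steps above; the target labels and the local route in the client VPC are assumptions made for the example.

```python
import ipaddress


def lookup(route_table: dict[str, str], destination: str) -> str:
    """Return the target of the longest prefix that contains destination."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    _network, target = max(candidates, key=lambda item: item[0].prefixlen)
    return target


# Target names are illustrative labels, not real resource IDs.
CLIENT_VPC = {"10.0.0.0/16": "local", "0.0.0.0/0": "transit-gateway"}
TRANSIT_GATEWAY = {
    "192.168.0.0/16": "outbound-vpc-attachment",
    "10.0.0.0/16": "client-vpc-attachment",
}
```

Looking up the load balancer address 192.168.3.10 in CLIENT_VPC falls through to the default route (step 3), and the same address in TRANSIT_GATEWAY resolves to the Outbound VPC attachment (step 4).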


Transparent proxy network flow

When a user wants to reach www.amazon.com and no proxy is specified, the traffic automatically goes via the transparent proxy.

  1. User creates request to www.amazon.com, from EC2 10.0.1.130.

  2. Based on the default route in the user’s VPC, traffic is sent to AWS Transit Gateway.

  3. From the transit gateway, the packet is sent to the transit gateway attachment in the private subnet of the Outbound VPC.

  4. From there, based on the default route it gets to AWS network firewall.

  5. Traffic is inspected against the firewall rules, and if allowed, based on the default route it gets to NAT gateway.

  6. NAT gateway performs source NAT from 10.0.1.130 to its own public IP 3.48.29.55 and sends it to the internet gateway.

  7. Internet gateway sends it to the destination.

  8. When destination responds, and packet gets back to the internet gateway, it is sent back to NAT Gateway.

  9. In the NAT gateway the destination IP is changed back to 10.0.1.130. NAT gateway knows the route for 10.0.0.0/16, so response packet is sent to network firewall.

  10. In network firewall the response packet is evaluated against the rules and if allowed, based on the routing it is sent to transit gateway.

  11. Transit gateway checks its routing tables and finds a route to 10.0.0.0/16 towards its attachment in private subnet of client VPC.

  12. Once packet reaches private subnet of client VPC, by local route it gets back to client’s EC2.
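Steps 6 and 9 rely on the NAT gateway remembering which private address a connection came from. The toy model below shows that bookkeeping. The IP addresses are the ones used in the flow above, while the port numbers and the class itself are invented for illustration and say nothing about how the managed NAT gateway works internally.

```python
NAT_PUBLIC_IP = "3.48.29.55"


class NatGateway:
    """Toy source NAT: rewrite the source on the way out, restore it on the way back."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self._next_port = 1024  # illustrative starting port
        self._table: dict[int, tuple[str, int]] = {}

    def outbound(self, src_ip: str, src_port: int, dst: str) -> tuple[str, int, str]:
        """Replace the private source with the public IP and a fresh port."""
        public_port = self._next_port
        self._next_port += 1
        self._table[public_port] = (src_ip, src_port)
        return (self.public_ip, public_port, dst)

    def inbound(self, public_port: int) -> tuple[str, int]:
        """Map a response back to the original private source."""
        return self._table[public_port]


nat = NatGateway(NAT_PUBLIC_IP)
```

An outbound call for 10.0.1.130 returns a packet sourced from 3.48.29.55, and the matching inbound call restores 10.0.1.130 as the destination of the response.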


Conclusion

As we conclude this comprehensive exploration of forward proxies, it's clear that these tools are very important.

Forward proxies play a critical role in enhancing network security, regulating internet traffic, and ensuring compliance with organizational policies. Their ability to filter, monitor, and control access to web resources is vital in protecting against cyber threats.

Whether it's an explicit proxy running in a container or a transparent proxy in AWS Network Firewall, these solutions are tailored to address a broad spectrum of security and compliance requirements.

We've seen that explicit proxies offer more control and detailed traffic inspection, making them ideal for environments requiring stringent security measures.

Transparent proxies, on the other hand, provide ease of use and maintenance, making them suitable for basic filtering and routing without needing end-user configuration.

The integration of forward proxies within the AWS VPC, whether by using Squid inside an ECS container managed by AWS Fargate for the explicit forward proxy or by leveraging AWS Network Firewall for the transparent forward proxy, showcases the versatility and scalability of the AWS ecosystem.
