Protect Your On-Premises Website with AWS WAF and Amazon CloudFront
Protecting web applications—regardless of where they run—with a scalable, highly available, and cost-effective edge layer is a cornerstone of modern architecture. This article walks through a hands-on proof of concept (PoC) that leverages AWS WAF for security, CloudFront for global caching and performance, and a serverless pipeline for on-the-fly HTTP context modification.
You’ll learn how to:
- Enhance availability and reduce origin load by caching static and dynamic content at the edge.
- Filter traffic with AWS WAF rules at the edge, before it reaches your origin.
- Offload requests from the origin to reduce origin connections and costs.
- Modify any part of the HTTP request or response—including headers, query strings, and body—using a Lambda-powered proxy.
Sensitive data and IP addresses in screenshots have been blurred.
1. Introduction
Disclaimer: I understand this guide—demonstrating how to proxy and modify HTTP content—might resemble instructions used by phishing site creators. That was never my intention; these techniques are meant solely for legitimate security, performance, and reliability purposes.
Traditional WAF appliances and CDNs can be expensive, rigid, and complex to manage—particularly when you need to retrofit existing on-premises applications. By combining AWS WAF and Amazon CloudFront, you get:
- Edge Security: AWS WAF blocks malicious requests at AWS’s global PoPs, shielding your origin.
- Global Caching: CloudFront serves cached content from locations closest to your users, reducing latency and origin hits.
- Origin Offload: Less network traffic and compute on your servers, lowering TCO.
- On-the-Fly HTTP Manipulation: Inject, strip, or rewrite headers and body content without touching origin code.
We’ll demonstrate these capabilities using a simple IP-lookup service (noc.co.il) to keep the PoC lean and focused on edge mechanics.
2. Evaluating Origin Compatibility
Before building anything, we checked that the origin (noc.co.il) would:
- Respond over HTTPS without client interruptions.
- Trust and reflect standard proxy headers (X-Forwarded-For).
- Parse header values as plain text (no validation), allowing us to spoof or inject values.
2.1 Initial Compatibility Test
We first obtained the curl command via the browser’s Copy as cURL option:
curl 'https://noc.co.il/' \
-H 'Accept: text/html' \
--compressed
Running a trimmed version (with -s -D - to print the response headers) returned the expected HTML of the IP lookup page:
$ curl -s -D - --compressed https://noc.co.il/
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
...
<html>My IP: 203.0.113.10</html>
2.2 Testing X-Forwarded-For
Next, we injected a fake IP via the X-Forwarded-For header to see if the service honors it:
curl -s -D - \
-H "X-Forwarded-For: 198.51.100.24" \
--compressed \
https://noc.co.il/
Findings:
- The response body included both our real IP and 198.51.100.24, confirming the header was trusted.
- No header validation errors occurred.
2.3 Spoofing Non-existent Addresses
To confirm that the origin performs no validation, we appended a syntactically invalid address:
curl -s \
-H "X-Forwarded-For: 198.51.100.24, 300.300.300.300" \
https://noc.co.il/ | tee response.html
Inspecting response.html in a browser showed the exact header text echoed in the page—demonstrating the origin parses the header as raw text.
Sidebar: Because CloudFront automatically appends the viewer's IP to X-Forwarded-For, an origin that honors this header can still see the true client IP behind the CDN.
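A small simulation makes the header's growth concrete. It assumes each proxy appends the address of the peer that connected to it, which is the usual X-Forwarded-For convention. The function names and the 192.0.2.50 hop address are hypothetical, not values captured from the PoC. The simulation also shows why only the rightmost entries, added by proxies you trust, are reliable.

```python
def append_hop(xff, peer_ip):
    """Return the X-Forwarded-For value a proxy sends upstream after
    appending the address of the peer that connected to it."""
    return f"{xff}, {peer_ip}" if xff else peer_ip


def split_xff(xff):
    """Split an X-Forwarded-For value into its individual entries."""
    return [part.strip() for part in xff.split(",") if part.strip()]


def client_ip(xff, trusted_hops):
    """Return the entry added by the outermost trusted proxy.

    Entries to its left were supplied by the client and may be spoofed."""
    parts = split_xff(xff)
    return parts[-trusted_hops] if len(parts) >= trusted_hops else None


if __name__ == "__main__":
    xff = "198.51.100.24"                  # spoofed by the client (untrusted)
    xff = append_hop(xff, "203.0.113.10")  # CloudFront appends the viewer's IP
    xff = append_hop(xff, "192.0.2.50")    # a second proxy appends CloudFront's IP (hypothetical)
    print(xff)                 # 198.51.100.24, 203.0.113.10, 192.0.2.50
    print(client_ip(xff, 2))   # 203.0.113.10
```

With two trusted hops, client_ip picks the second entry from the right. Everything to its left, such as the spoofed 198.51.100.24, came from the client.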
3. Solution Architecture Overview
3.1 Original Architecture
In the initial approach, I fronted the origin directly with AWS WAF and CloudFront, without any additional proxy layer.
Clients -> AWS WAF -> CloudFront -> Original Site (noc.co.il)
- AWS WAF: Enforces IP allow-lists and managed security rules at AWS edge locations, evaluating each request before CloudFront serves it from cache or forwards it to the origin.
- CloudFront: Distributes and caches content globally, forwarding only necessary headers to the origin.
- Original Site: The unmodified noc.co.il endpoint, responding directly to CloudFront requests.
3.2 Final Architecture (With API Gateway Proxy)
Because CloudFront's edge functions cannot rewrite origin response bodies, I introduced an API Gateway/Lambda proxy between CloudFront and the origin. The Lambda function serves as a reverse proxy for dynamic HTML, while static assets like favicon.ico are served directly from S3 via a separate cache behavior.
- AWS WAF: Applies IP filters and managed rules at the AWS edge.
- CloudFront: Uses two cache behaviors:
  - /favicon.ico -> S3 origin for static assets with optimized TTLs.
  - /* -> API Gateway for dynamic HTML via the reverse-proxy Lambda.
- API Gateway (REST API Reverse Proxy Mode): Receives viewer requests for dynamic content and invokes the Lambda reverse-proxy function.
- Lambda Function (Reverse Proxy):
  - Fetch: Makes an HTTP request to https://noc.co.il including path and query parameters.
  - Modify: Reads and parses the HTML response body.
  - Return: Sends the modified HTML back through API Gateway and CloudFront while preserving the original status code and headers.
- S3 Origin: Serves the custom favicon.ico directly from edge caches, bypassing API Gateway.
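CloudFront evaluates cache behaviors in the order they are listed, uses the first path pattern that matches, and falls back to the Default (*) behavior. The sketch below approximates that routing with fnmatch.fnmatchcase. CloudFront patterns are case-sensitive, * matches zero or more characters, and ? matches exactly one. Unlike CloudFront, fnmatch also treats [...] as a character class. The origin labels are placeholders.

```python
from fnmatch import fnmatchcase

# Ordered cache behaviors: (path pattern, origin). Origin names are placeholders.
BEHAVIORS = [
    ("/favicon.ico", "s3-static"),
    ("/*", "api-gateway-proxy"),
]
DEFAULT_ORIGIN = "api-gateway-proxy"  # stands in for the Default (*) behavior


def route(path):
    """Return the origin of the first behavior whose pattern matches the path.

    fnmatchcase is case-sensitive like CloudFront; unlike CloudFront it also
    treats [...] as a character class."""
    for pattern, origin in BEHAVIORS:
        if fnmatchcase(path, pattern):
            return origin
    return DEFAULT_ORIGIN


if __name__ == "__main__":
    for p in ["/favicon.ico", "/", "/lookup/details", "/FAVICON.ICO"]:
        print(p, "->", route(p))
```

Because the list is checked in order, the more specific /favicon.ico pattern must precede /*, mirroring CloudFront's behavior precedence.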
4. Detailed Implementation
4.1 Certificate Management
- Generate a Let’s Encrypt certificate for your custom domain (noc.ittools.net).
- Import it into ACM in us-east-1 (N. Virginia); CloudFront only uses ACM certificates from that Region.
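Here is a minimal sketch of the import step with boto3. It assumes certbot's default output files (cert.pem, privkey.pem, chain.pem under /etc/letsencrypt/live/<domain>/), and both helper names are hypothetical. ACM's ImportCertificate takes the certificate, private key, and chain as PEM bytes, and the client is pinned to us-east-1.

```python
from pathlib import Path


def build_import_request(cert_path, key_path, chain_path=None):
    """Read PEM files into keyword arguments for ACM ImportCertificate."""
    request = {
        "Certificate": Path(cert_path).read_bytes(),
        "PrivateKey": Path(key_path).read_bytes(),
    }
    if chain_path:
        request["CertificateChain"] = Path(chain_path).read_bytes()
    return request


def import_to_acm(request):
    """Import into us-east-1, the Region CloudFront reads ACM certificates from."""
    import boto3  # imported lazily so build_import_request works without boto3

    acm = boto3.client("acm", region_name="us-east-1")
    return acm.import_certificate(**request)["CertificateArn"]
```

For example, import_to_acm(build_import_request("/etc/letsencrypt/live/noc.ittools.net/cert.pem", "/etc/letsencrypt/live/noc.ittools.net/privkey.pem", "/etc/letsencrypt/live/noc.ittools.net/chain.pem")). Let's Encrypt certificates expire after 90 days, so the import has to be repeated on renewal. Adding CertificateArn to the request re-imports into the existing ARN.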
4.2 AWS WAF Setup
- IP Sets: Define trusted IPv4/IPv6 addresses such as office and datacenter ranges.
- Web ACL: Allow traffic from the IP sets and set the default action to Block.
- Rate-based rules (optional): Block clients whose request rate exceeds a threshold, absorbing anomalous spikes.
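The three WAF settings above map onto the arguments of the WAFv2 CreateWebACL call. This sketch only builds those arguments. The web ACL name is hypothetical, and the IP set ARNs are assumed to come from IP sets you created earlier (one for IPv4, one for IPv6, since each IP set holds a single address family).

```python
def visibility(metric_name):
    """Sampling/metrics settings that WAFv2 requires on each rule and web ACL."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def build_web_acl(ip_set_arns, rate_limit=None):
    """Return CreateWebACL arguments: allow the IP sets, block everything else."""
    rules = []
    if rate_limit:
        # Lower priority numbers run first, so this also limits allow-listed IPs.
        rules.append({
            "Name": "rate-limit",
            "Priority": 0,
            "Statement": {"RateBasedStatement": {"Limit": rate_limit, "AggregateKeyType": "IP"}},
            "Action": {"Block": {}},
            "VisibilityConfig": visibility("rate-limit"),
        })
    # IPv4 and IPv6 addresses live in separate IP sets; OR them together.
    refs = [{"IPSetReferenceStatement": {"ARN": arn}} for arn in ip_set_arns]
    statement = refs[0] if len(refs) == 1 else {"OrStatement": {"Statements": refs}}
    rules.append({
        "Name": "allow-trusted-ips",
        "Priority": len(rules),
        "Statement": statement,
        "Action": {"Allow": {}},
        "VisibilityConfig": visibility("allow-trusted-ips"),
    })
    return {
        "Name": "onprem-site-acl",       # hypothetical name
        "Scope": "CLOUDFRONT",           # CloudFront web ACLs are created in us-east-1
        "DefaultAction": {"Block": {}},  # anything not allowed above is blocked
        "Rules": rules,
        "VisibilityConfig": visibility("onprem-site-acl"),
    }
```

The result would be passed as boto3.client("wafv2", region_name="us-east-1").create_web_acl(**build_web_acl([ipv4_set_arn, ipv6_set_arn], rate_limit=1000)). The rate limit counts requests per client IP over the rule's evaluation window, which is five minutes by default.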
4.3 CloudFront Distribution
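Configure the distribution with the two cache behaviors described above (/favicon.ico to S3, /* to API Gateway) and an origin request policy that forwards the query strings and headers the proxy needs. Do not forward the viewer's Host header to the API Gateway origin (the managed AllViewerExceptHostHeader policy avoids this), because API Gateway routes requests by its own hostname.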
- Attach your AWS WAF Web ACL to this distribution.
- Enable response headers policies to add HSTS, CSP, and custom headers at the edge.
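A response headers policy is a small configuration object. The sketch below builds a ResponseHeadersPolicyConfig with HSTS, CSP, and custom headers. The policy name, the CSP string, and any custom header names are hypothetical and should be adapted to what the site actually loads.

```python
def build_response_headers_policy(csp, custom_headers=None, hsts_max_age=31536000):
    """Return a ResponseHeadersPolicyConfig adding HSTS, CSP and custom headers."""
    items = [
        {"Header": name, "Value": value, "Override": True}
        for name, value in (custom_headers or {}).items()
    ]
    return {
        "Name": "onprem-site-security-headers",  # hypothetical name
        "SecurityHeadersConfig": {
            "StrictTransportSecurity": {
                "Override": True,
                "IncludeSubdomains": True,
                "AccessControlMaxAgeSec": hsts_max_age,  # one year by default
            },
            "ContentSecurityPolicy": {"Override": True, "ContentSecurityPolicy": csp},
        },
        "CustomHeadersConfig": {"Quantity": len(items), "Items": items},
    }
```

The config would be passed to boto3.client("cloudfront").create_response_headers_policy(ResponseHeadersPolicyConfig=...). The returned policy ID goes into each cache behavior's ResponseHeadersPolicyId. The web ACL is attached by setting the distribution's WebACLId to the web ACL's ARN.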
4.4 Reducing Origin Connectivity
Caching at CloudFront dramatically cuts down on requests to your origin:
- Static assets (images, CSS, JS) are served from edge caches for configurable TTLs.
- Dynamic content hits origin only on cache misses—adjustable via cache keys (headers, cookies, query strings).
This offload not only improves performance but reduces bandwidth and compute costs on your servers.
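To see how the cache key drives origin offload, here is a toy model of a single edge location. The class and its parameters are hypothetical. Real CloudFront also has regional edge caches, evicts objects, and honors Cache-Control, none of which is modeled. Query parameters that are not part of the cache key (such as a tracking parameter) do not cause extra origin fetches.

```python
from urllib.parse import parse_qsl, urlsplit


class EdgeCache:
    """Toy edge cache: objects keyed by path plus selected query parameters,
    each kept for a fixed TTL (in seconds)."""

    def __init__(self, ttl, key_params=()):
        self.ttl = ttl
        self.key_params = set(key_params)
        self.store = {}          # cache key -> expiry time
        self.origin_fetches = 0

    def cache_key(self, url):
        parts = urlsplit(url)
        kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in self.key_params)
        return (parts.path, tuple(kept))

    def get(self, url, now):
        key = self.cache_key(url)
        if self.store.get(key, -1) > now:
            return "hit"
        self.origin_fetches += 1  # cache miss: the request goes to the origin
        self.store[key] = now + self.ttl
        return "miss"


if __name__ == "__main__":
    cache = EdgeCache(ttl=60, key_params=["lang"])
    urls = ["/?utm=a", "/?utm=b", "/?lang=he", "/?utm=c&lang=he"]
    print([cache.get(u, now=t) for t, u in enumerate(urls)])  # ['miss', 'hit', 'miss', 'hit']
    print(cache.origin_fetches)                              # 2
```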
4.5 Full HTTP Context Access and Modification
CloudFront Functions cannot access response bodies, and Lambda@Edge cannot read the body returned by the origin, so neither can rewrite the HTML in place. Additionally, since both CloudFront and API Gateway automatically append their own X-Forwarded-For entries, we need a reverse proxy to:
- Fetch the original HTML from https://noc.co.il while preserving query strings and path.
- Inject a custom <link rel="icon" href="/favicon.ico" /> into the <head>.
- Strip the two trailing IP addresses added by CloudFront and API Gateway from any comma-separated IP lists embedded in the HTML body.
- Return the modified HTML through API Gateway and CloudFront while preserving the origin’s headers and status codes.
To achieve this, configure an API Gateway REST API with a Lambda proxy integration (reverse-proxy mode) that invokes a function like this:
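import base64
import http.client
import os
import re
from urllib.parse import urlencode

# Configuration (overridable through Lambda environment variables)
REMOVE_IP_COUNT = int(os.environ.get("REMOVE_IP_COUNT", "2"))
FAVICON_URL = os.environ.get("FAVICON_URL", "/favicon.ico")
ORIGIN_HOST = os.environ.get("ORIGIN_HOST", "noc.co.il")
ORIGIN_PATH = os.environ.get("ORIGIN_PATH", "")

# Request headers not copied upstream: http.client sets Host and Content-Length
# itself, and dropping Accept-Encoding keeps the origin from sending a compressed
# body that the text rewrites below could not decode.
SKIP_REQUEST_HEADERS = {"host", "content-length", "accept-encoding", "connection"}
# Response headers that are hop-by-hop or no longer match the rewritten body.
SKIP_RESPONSE_HEADERS = {"content-length", "content-encoding", "transfer-encoding",
                         "connection", "keep-alive"}

IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
IP_LIST_RE = re.compile(rf"\b{IPV4}(?:\s*,\s*{IPV4})+")
HEAD_RE = re.compile(r"(<head\b[^>]*>)", re.IGNORECASE)


def strip_trailing_ips(match):
    """Remove the last REMOVE_IP_COUNT addresses from a comma-separated IP list."""
    parts = re.split(r"\s*,\s*", match.group(0))
    if len(parts) > REMOVE_IP_COUNT:
        parts = parts[:-REMOVE_IP_COUNT]
    return ", ".join(parts)


def lambda_handler(event, context):
    # 1) Parse client request info (the multi-value form keeps repeated keys)
    method = event.get("httpMethod", "GET")
    client_path = event.get("path", "/")
    qs_params = event.get("multiValueQueryStringParameters") or {}
    query_str = f"?{urlencode(qs_params, doseq=True)}" if qs_params else ""

    # 2) Build the upstream path
    origin_path = (ORIGIN_PATH.rstrip("/") + client_path) or "/"
    origin_path += query_str

    # 3) Copy viewer headers, including X-Forwarded-For, minus the skipped ones,
    #    and forward any request body
    headers = {
        k: v
        for k, v in (event.get("headers") or {}).items()
        if k.lower() not in SKIP_REQUEST_HEADERS
    }
    body_in = event.get("body")
    if body_in is not None:
        body_in = (base64.b64decode(body_in) if event.get("isBase64Encoded")
                   else body_in.encode("utf-8"))

    # 4) Fetch from the real origin
    conn = http.client.HTTPSConnection(ORIGIN_HOST, 443, timeout=30)
    try:
        conn.request(method, origin_path, body=body_in, headers=headers)
        resp = conn.getresponse()
        body_bytes = resp.read()
    finally:
        conn.close()

    # 5) Pass origin headers through (multi-value keeps repeated Set-Cookie)
    out_headers = {}
    for name, value in resp.getheaders():
        if name.lower() not in SKIP_RESPONSE_HEADERS:
            out_headers.setdefault(name, []).append(value)

    content_type = resp.getheader("Content-Type", "")
    if "text/html" not in content_type.lower():
        # Non-HTML passes through unchanged. API Gateway only decodes this
        # base64 body if the REST API's binary media types (e.g. */*) are set.
        return {
            "statusCode": resp.status,
            "multiValueHeaders": out_headers,
            "body": base64.b64encode(body_bytes).decode("ascii"),
            "isBase64Encoded": True,
        }

    # 6) Decode HTML and inject the favicon link right after <head>
    #    (a callable replacement avoids re.sub backslash-escape handling)
    body = body_bytes.decode("utf-8", errors="replace")
    body = HEAD_RE.sub(
        lambda m: f'{m.group(1)}<link rel="icon" href="{FAVICON_URL}">',
        body,
        count=1,
    )

    # 7) Remove the last N IPv4s from every comma-separated list
    body = IP_LIST_RE.sub(strip_trailing_ips, body)

    # 8) Build the API Gateway Lambda-proxy response (API Gateway sets Content-Length)
    return {
        "statusCode": resp.status,
        "multiValueHeaders": out_headers,
        "body": body,
        "isBase64Encoded": False,
    }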
This pattern gives you full control over outgoing responses without touching the origin code.
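The two HTML rewrites (favicon injection and IP-list stripping) can be checked locally without calling the origin. Here they are extracted into a pure function. rewrite_html and the sample page are hypothetical, and the addresses are documentation ranges rather than PoC output.

```python
import re

FAVICON_URL = "/favicon.ico"
REMOVE_IP_COUNT = 2
IPV4 = r"(?:\d{1,3}\.){3}\d{1,3}"
IP_LIST_RE = re.compile(rf"\b{IPV4}(?:\s*,\s*{IPV4})+")  # two or more IPs
HEAD_RE = re.compile(r"(<head\b[^>]*>)", re.IGNORECASE)  # <head>, not <header>


def strip_trailing_ips(match):
    """Drop the last REMOVE_IP_COUNT entries of a comma-separated IP list."""
    parts = re.split(r"\s*,\s*", match.group(0))
    if len(parts) > REMOVE_IP_COUNT:
        parts = parts[:-REMOVE_IP_COUNT]
    return ", ".join(parts)


def rewrite_html(body):
    """Inject the favicon link after <head>, then trim proxy-added IPs."""
    body = HEAD_RE.sub(lambda m: f'{m.group(1)}<link rel="icon" href="{FAVICON_URL}">',
                       body, count=1)
    return IP_LIST_RE.sub(strip_trailing_ips, body)


if __name__ == "__main__":
    sample = ("<html><head><title>IP lookup</title></head>"
              "<body>My IP: 203.0.113.10, 192.0.2.50, 192.0.2.51</body></html>")
    print(rewrite_html(sample))
    # <html><head><link rel="icon" href="/favicon.ico"><title>IP lookup</title></head><body>My IP: 203.0.113.10</body></html>
```

Single addresses and lists no longer than REMOVE_IP_COUNT are left untouched, so a page without proxy-added entries passes through unchanged.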
5. Final Results
Below is a comparison of the original IP lookup page and the modified version, showing the injected favicon and response changes.
Original Site
Modified Site
6. High Availability, Security, and CDN Benefits
- DDoS Mitigation: CloudFront distributions are covered by AWS Shield Standard at no extra cost, and AWS WAF adds application-layer filtering such as rate-based rules.
- Cache Resilience: Cached objects can continue to be served while the origin is briefly unreachable, as long as they are still in the edge cache.
- Cost Efficiency: Pay as you go for WAF web ACLs, rules, and requests, Lambda invocations, and data transfer; no upfront hardware.
6.1 Amazon CloudFront as a CDN
Amazon CloudFront is AWS’s global content delivery network, designed to accelerate both static and dynamic content by caching it at edge locations worldwide. Key benefits include:
- Reduced latency: Delivers content from the closest edge location to end users.
- Customizable cache behaviors: Control TTLs, forwarded headers, query strings, and cookies per path pattern.
- Dynamic content support: Forward dynamic requests to any HTTP origin and selectively cache or bypass content.
- Security integrations: AWS WAF, TLS termination at the edge, and Origin Access Control for private S3 buckets.
- Invalidation and versioning: Invalidate cache on demand or use versioned URLs for instant updates.
- Pay-as-you-go pricing: Based on data transfer, request counts, and invalidation usage.
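For on-demand invalidation, CloudFront's CreateInvalidation call takes an InvalidationBatch. The batch holds the paths (each starting with /, wildcards allowed) and a CallerReference that must be unique per request. The minimal builder below assumes boto3 for the call itself, and its helper name is hypothetical.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Return the InvalidationBatch argument for CloudFront CreateInvalidation.

    Paths are normalized to start with "/" and de-duplicated; CallerReference
    defaults to a nanosecond timestamp so repeated calls stay unique."""
    normalized = sorted({p if p.startswith("/") else "/" + p for p in paths})
    return {
        "Paths": {"Quantity": len(normalized), "Items": normalized},
        "CallerReference": caller_reference or f"invalidate-{time.time_ns()}",
    }
```

Usage would look like boto3.client("cloudfront").create_invalidation(DistributionId="E123EXAMPLE", InvalidationBatch=build_invalidation_batch(["/favicon.ico"])), where the distribution ID is a placeholder. Versioned URLs avoid the call entirely, because a new file name is a new cache key.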
6.2 Origin Failover with Multiple Origins
To maximize availability, CloudFront supports Origin Groups, a primary/secondary origin configuration that fails over on a per-request basis when the primary origin returns an error.
- Primary origin: Your default origin, such as API Gateway or a custom origin.
- Secondary origin: A fallback endpoint, such as another region, an on-premises endpoint, or a static S3 bucket.
- Failover criteria: CloudFront does not run separate health checks. You choose which HTTP status codes from the primary (for example 500, 502, 503, 504) trigger failover; connection failures and timeouts also trigger it.
- Failover logic: When a request meets the criteria, CloudFront retries that request against the secondary origin. Failover applies only to GET, HEAD, and OPTIONS requests, and the failed attempt adds latency to that request.
Configuration steps:
- In the CloudFront distribution, create two origins.
- Define an origin group linking both origins and specify the failover status codes.
- Update cache behaviors to use the origin group as the target.
This failover capability lets CloudFront serve content from an alternate source even if your primary backend experiences downtime.
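The origin group from the steps above is one section of the DistributionConfig. This sketch builds it. The group and origin IDs are hypothetical and must match Id values in the distribution's Origins list. The allowed status codes follow CloudFront's documented failover options.

```python
# Status codes CloudFront accepts as origin-group failover criteria.
ALLOWED_FAILOVER_CODES = {400, 403, 404, 416, 500, 502, 503, 504}


def build_origin_groups(group_id, primary_id, secondary_id,
                        status_codes=(500, 502, 503, 504)):
    """Return the OriginGroups section of a CloudFront DistributionConfig."""
    codes = sorted(set(status_codes))
    if not set(codes) <= ALLOWED_FAILOVER_CODES:
        raise ValueError(f"unsupported failover status codes: {codes}")
    return {
        "Quantity": 1,
        "Items": [{
            "Id": group_id,
            "FailoverCriteria": {"StatusCodes": {"Quantity": len(codes), "Items": codes}},
            "Members": {
                "Quantity": 2,  # order matters: the first member is the primary
                "Items": [{"OriginId": primary_id}, {"OriginId": secondary_id}],
            },
        }],
    }
```

Cache behaviors then set TargetOriginId to the group ID instead of a single origin ID.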
7. Conclusion
By fronting any website—on-premises or in the cloud—with AWS WAF and CloudFront, you achieve:
- Enhanced availability through aggressive edge caching and failover.
- Robust security with customizable WAF rules at every PoP.
- Minimal origin load via cache offload and selective forwarding.
- Complete HTTP context manipulation using CloudFront Functions, Lambda@Edge, or, in very specific cases, API Gateway and Lambda for dynamic transformations.
Note: In most deployments the API Gateway + Lambda layer is not required, because HTML body changes like these can be made directly on the origin web server.
This architecture transforms legacy and modern applications alike into resilient, secure, and performant cloud-edge services.
Appendix A: Edge Compute Options — CloudFront Functions vs Lambda@Edge
Amazon CloudFront provides two serverless compute options at the edge, CloudFront Functions and Lambda@Edge, each tailored to different use cases.
Parallels to F5 iRules:
- F5 iRules allow TCP, UDP, and HTTP inspection and manipulation on traditional load balancers.
- CloudFront Functions provide similar lightweight header- and URL-based logic at the edge, executing JavaScript within milliseconds.
- Lambda@Edge extends these capabilities with richer runtimes and more advanced traffic steering and content transformation.
- Both CloudFront serverless options and F5 iRules enable granular control over HTTP flows, but CloudFront offloads compute to a global CDN with pay-as-you-go billing.
Limitations:
- CloudFront Functions cannot perform network calls or access request bodies.
- Lambda@Edge offers more runtime and compute but incurs higher latency and cost for short-lived tasks.
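For comparison with an iRule, here is a minimal Lambda@Edge viewer-request handler in Python. CloudFront Functions are JavaScript-only, so they are not shown in this document's language. The handler either short-circuits with a redirect or tags the request before it continues. The legacy path and the X-Edge-Processed header are hypothetical. The event and response shapes follow the Lambda@Edge format: lowercase header keys, each mapped to a list of key/value pairs, and the status as a string.

```python
REDIRECTS = {"/old-lookup": "/"}  # hypothetical legacy path -> new path


def lambda_handler(event, context):
    """Viewer-request sketch: redirect legacy paths, tag everything else."""
    request = event["Records"][0]["cf"]["request"]
    target = REDIRECTS.get(request["uri"])
    if target:
        # Returning a response object answers the viewer at the edge.
        return {
            "status": "301",
            "statusDescription": "Moved Permanently",
            "headers": {"location": [{"key": "Location", "value": target}]},
        }
    # Otherwise add a header and let CloudFront continue processing the request.
    request["headers"]["x-edge-processed"] = [{"key": "X-Edge-Processed", "value": "true"}]
    return request
```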
This appendix clarifies compute options independently of the PoC’s specific Lambda reverse-proxy implementation.
Appendix B: Pricing Notes
The original LinkedIn article includes AWS WAF and CloudFront pricing examples, with monthly price points for Web ACLs, rules, WAF requests, data transfer, CloudFront requests, invalidations, and certificates. Verify current pricing on the official AWS pricing pages before relying on any figures, because service prices change over time. (linkedin.com)