
Nginx: Everything about proxy_pass

Daniel Albuschat ・7 min read

With the advent of Microservices™, ingress routing and routing between services have been in ever-increasing demand. I currently default to nginx for this - with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently.

However, the often needed proxy_pass directive has driven me crazy because of its - to me unintuitive - behavior. So I decided to take notes on how it works, what is possible with it, and how to circumvent some of its quirks.

First, a note on https

By default, proxy_pass does not verify the certificate of the endpoint if it is https (how can this be the default behavior, really?!). This can be useful internally, but usually you want to decide this very explicitly. If you use publicly routed endpoints, which I have done in the past, make sure to set proxy_ssl_verify to on. You can also authenticate against the upstream server that you proxy_pass to using client certificates and more; have a look at the available options at https://docs.nginx.com/nginx/admin-guide/security-controls/securing-http-traffic-upstream/.
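A minimal sketch of what explicit verification could look like. The hostname upstream.example.com and the CA bundle path are assumptions for illustration only; adjust both to your environment:

```nginx
location /webapp/ {
    # Hypothetical https upstream, not part of the examples in this article:
    proxy_pass https://upstream.example.com/api/;
    # Verify the upstream certificate (off by default):
    proxy_ssl_verify on;
    # CA bundle used for the verification; this path is an assumption (typical for Debian/Ubuntu):
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    proxy_ssl_verify_depth 2;
    # Send the server name via SNI, which many https endpoints require:
    proxy_ssl_server_name on;
}
```

Without proxy_ssl_trusted_certificate, nginx has nothing to verify the upstream certificate against, so I would expect the two directives to always appear together.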

A simple example

A proxy_pass is usually used when there is an nginx instance that handles many things and delegates some of those requests to other servers. One example is an ingress in a Kubernetes cluster that spreads requests among the microservices responsible for the specific locations. Another is using nginx to directly deliver static files for a frontend, while some server-side rendered content or API is delivered by a WebApp such as ASP.NET Core or flask.

Let's imagine we have a WebApp running on http://localhost:5000 and want it to be available on http://localhost:8080/webapp/, here's how we would do it in a minimal nginx.conf:

daemon off;
events {
}
http {
    server {
        listen 8080;
        location /webapp/ {
            proxy_pass http://127.0.0.1:5000/api/;
        }
    }
}

You can save this to a file, e.g. nginx.conf, and run it with

nginx -c $(pwd)/nginx.conf

Now, you can access http://localhost:8080/webapp/ and all requests will be forwarded to http://localhost:5000/api/.
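Note how the /webapp/ prefix is "cut away" by nginx. That's how a location works together with proxy_pass: when the proxy_pass URL contains a path (here /api/), nginx replaces the part of the request URI that matched the location with that path, and passes the result on to the "upstream". "Upstream" is simply what nginx calls whatever server sits behind it.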

To slash or not to slash

Except for when you use variables in the proxy_pass upstream definition, as we will learn below, the location and upstream definition are very simply tied together. That's why you need to be aware of the slashes, because some strange things can happen when you don't get it right.

Here is a handy table that shows you how the request will be received by your WebApp, depending on how you write the location and proxy_pass declarations. Assume all requests go to http://localhost:8080:

| location | proxy_pass | Request | Received by upstream |
|----------|------------|---------|----------------------|
| /webapp/ | http://localhost:5000/api/ | /webapp/foo?bar=baz | /api/foo?bar=baz |
| /webapp/ | http://localhost:5000/api | /webapp/foo?bar=baz | /apifoo?bar=baz |
| /webapp | http://localhost:5000/api/ | /webapp/foo?bar=baz | /api//foo?bar=baz |
| /webapp | http://localhost:5000/api | /webapp/foo?bar=baz | /api/foo?bar=baz |
| /webapp | http://localhost:5000/api | /webappfoo?bar=baz | /apifoo?bar=baz |

In other words: you almost always want a trailing slash on both the location and the proxy_pass, you never want to mix with and without trailing slash, and you only want to omit the trailing slash on both when you want to concatenate a path component (which I guess is quite rarely the case). Note how query parameters are preserved!
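If you want to check the rows of the table yourself, here is a minimal sketch. The server block on port 5000 is a hypothetical stand-in for the WebApp (it is not part of the setup above); it simply replies with the URI it received:

```nginx
daemon off;
events {
}
http {
    # Hypothetical stand-in for the WebApp: answers every request with the URI it received.
    server {
        listen 5000;
        location / {
            default_type text/plain;
            return 200 "$request_uri\n";
        }
    }
    # The proxy under test; change location and proxy_pass to try the other rows.
    server {
        listen 8080;
        location /webapp/ {
            proxy_pass http://127.0.0.1:5000/api/;
        }
    }
}
```

With this running, a request to http://localhost:8080/webapp/foo?bar=baz should be answered with /api/foo?bar=baz, which corresponds to the first row of the table.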

$uri and $request_uri

You have two ways to prevent the location from being cut off. First, you can simply repeat the location in the proxy_pass definition, which is quite easy:

location /webapp/ {
    proxy_pass http://127.0.0.1:5000/api/webapp/;
}

That way, your upstream WebApp will receive /api/webapp/foo?bar=baz in the above examples.

Another way to repeat the location is to use $uri or $request_uri. The difference is that $request_uri preserves the query parameters, while $uri discards them:

| location | proxy_pass | Request | Received by upstream |
|----------|------------|---------|----------------------|
| /webapp/ | http://localhost:5000/api$request_uri | /webapp/foo?bar=baz | /api/webapp/foo?bar=baz |
| /webapp/ | http://localhost:5000/api$uri | /webapp/foo?bar=baz | /api/webapp/foo |

Note how in the proxy_pass definition, there is no slash between "api" and $request_uri or $uri. This is because a full URI will always include a leading slash, which would lead to a double-slash if you wrote "api/$uri".
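As a location block, the $request_uri variant could look like the following sketch. I use 127.0.0.1 instead of localhost here on the assumption that no resolver is configured, because nginx needs a resolver for hostnames once a proxy_pass contains variables:

```nginx
location /webapp/ {
    # No slash between "api" and $request_uri: the variable already starts with "/".
    proxy_pass http://127.0.0.1:5000/api$request_uri;
}
```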

Capture regexes

While this is not exclusive to proxy_pass, I find it generally handy to be able to use regexes to forward parts of a request to an upstream WebApp, or to reformat it. Example: Your public URI should be http://localhost:8080/api/cart/items/123, and your upstream API handles it in the form of http://localhost:5000/cart_api?items=123. In this case, or in more complicated ones, you can use a regex to capture parts of the request URI and transform them into the desired format.

location ~ ^/api/cart/([a-z]*)/(.*)$ {
   proxy_pass http://127.0.0.1:5000/cart_api?$1=$2;
}
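The same idea can be written with named captures, which I find easier to read once there are more than two groups. This is only a sketch; the names key and value are my own choice and not anything nginx prescribes:

```nginx
location ~ ^/api/cart/(?<key>[a-z]+)/(?<value>.+)$ {
    # $key and $value are filled from the named groups of the regex above.
    proxy_pass http://127.0.0.1:5000/cart_api?$key=$value;
}
```

Note that in both variants the query string of the original request is not forwarded, because the proxy_pass URI is built entirely from the variables.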

Use try_files with a WebApp as fallback

A use-case I came across was that I wanted nginx to handle all static files in a folder, and if the file is not available, forward the request to a backend. For example, this was the case for a Vue single-page-application (SPA) that is delivered through flask - because the master HTML needs some server-side tuning - and I wanted nginx to handle the static files instead of flask. (This is recommended by the official gunicorn docs.)

You might have everything for your SPA except for your index.html available at /app/wwwroot/, and http://localhost:5000/ will deliver your server-tuned index.html.

Here's how you can do this:

location /spa/ {
   root /app/wwwroot/;
   try_files $uri @backend;
}
location @backend {
   proxy_pass http://127.0.0.1:5000;
}

Note that you cannot specify any path in the proxy_pass directive inside the @backend location. If you try, nginx will tell you:

nginx: [emerg] "proxy_pass" cannot have URI part in location given by regular expression, or inside named location, or inside "if" statement, or inside "limit_except" block in /home/daniel/projects/nginx_blog/nginx.conf:28

That's why your backend should receive any request and return the index.html for it, or at least for the routes that are handled by the frontend's router.
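If the backend expects a different path than the one the client requested, one possible workaround is to change the URI with a rewrite before the proxy_pass. The following is a sketch under the assumption that the backend wants the path without the /spa/ prefix; I have not used this in the setup described above:

```nginx
location @backend {
    # "break" keeps processing in this location; proxy_pass without a URI part
    # then forwards the rewritten URI (including the query string).
    rewrite ^/spa/(.*)$ /$1 break;
    proxy_pass http://127.0.0.1:5000;
}
```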

Let nginx start even when not all upstream hosts are available

One reason that I used 127.0.0.1 instead of localhost so far is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx will try to resolve all hosts defined in proxy_pass directives on startup, and fail to start when they cannot be resolved. However, especially in microservice environments, it is very fragile to require all upstream services to be available at the time the ingress, load balancer or some intermediate router starts.

You can circumvent nginx's requirement for all hosts to be resolvable at startup by using variables inside the proxy_pass directives. HOWEVER, for some unfathomable reason, if you do so, you require a dedicated resolver directive to resolve these hostnames. For Kubernetes, you can use kube-dns.kube-system here. For other environments, you can use your internal DNS, or for publicly routed upstream services you can even use a public DNS such as 1.1.1.1 or 8.8.8.8.
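A resolver definition for the public DNS case could look like this sketch. The valid and timeout values are arbitrary assumptions, not recommendations:

```nginx
# Query two public DNS servers; cache answers for 30 seconds instead of the record's TTL.
resolver 1.1.1.1 8.8.8.8 valid=30s;
# Give up on a lookup after 5 seconds.
resolver_timeout 5s;
```

The resolver directive can be placed in the http, server or location context, so it can be set once for all locations or overridden where needed.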

Additionally, using variables in proxy_pass changes completely how URIs are passed on to the upstream. When just changing

proxy_pass https://localhost:5000/api/;

to

set $upstream https://localhost:5000;
proxy_pass $upstream/api/;

... which you might think should result in exactly the same, you might be surprised. The former will hit your upstream server with /api/foo?bar=baz with our example request to /webapp/foo?bar=baz. The latter, however, will hit your upstream server with /api/. No foo. No bar. And no baz. :-(

We need to fix this by putting the request together from two parts: First, the path after the location prefix, and second the query parameters. The first part can be captured using the regex we learned above, and the second (query parameters) can be forwarded using the built-in variables $is_args and $args. If we put it all together, we will end up with a config like this:

daemon off;
events {
}
http {
    server {
        access_log /dev/stdout;
        error_log /dev/stdout;
        listen 8080;
        # My home router in this case:
        resolver 192.168.178.1;
        location ~ ^/webapp/(.*)$ {
            # Use a variable so that localhost:5000 might be down while nginx starts:
            set $upstream http://localhost:5000;
            # Put together the upstream request path using the captured component after the location path, and the query parameters:
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}

While localhost is not a great example here, it works with your service's arbitrary DNS names, too. I find this very valuable in production, because having an nginx refuse to start because of a probably very unimportant service can be quite a hassle while wrangling a production issue. However, it makes the location directive much more complex. From a simple location /webapp/ with a proxy_pass http://localhost:5000/api/ it has become this behemoth. I think it's worth it, though.
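An alternative that keeps the plain prefix location is to let a rewrite build the upstream path and to leave the proxy_pass variable without any URI part. This is a sketch that assumes the same resolver directive as in the full config above is in effect:

```nginx
location /webapp/ {
    # Still a variable, so localhost:5000 may be down while nginx starts:
    set $upstream http://localhost:5000;
    # Replace the /webapp/ prefix with /api/; "break" stays in this location.
    rewrite ^/webapp/(.*)$ /api/$1 break;
    # No URI part here, so the rewritten URI and the query string are forwarded.
    proxy_pass $upstream;
}
```

Whether this reads better than the regex location is a matter of taste; it mainly avoids having to assemble $is_args$args by hand.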

Better logging format for proxy_pass

To debug issues, or simply to have enough information at hand when investigating issues in the future, you can maximize the information about what is going on in your location that uses proxy_pass.

I found this handy log_format, which I enhanced with a custom variable $upstream, as we have defined above. If you always call your variable $upstream in all your locations that use proxy_pass, you can use this log_format and have often much-needed information in your log:

log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: $upstream: $request upstream_response_time $upstream_response_time msec $msec request_time $request_time';

Here is a full example:

daemon off;
events {
}
http {
    log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: "$upstream": "$request" upstream_response_time $upstream_response_time msec $msec request_time $request_time';
    server {
        listen 8080;

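        # Regex location with captured path, as in the previous section,
        # so that the path and the query parameters are actually forwarded:
        location ~ ^/webapp/(.*)$ {
            access_log /dev/stdout upstream_logging;
            set $upstream http://127.0.0.1:5000;
            proxy_pass $upstream/api/$1$is_args$args;
        }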
    }
}

However, I have not found a way to log the actual URI that is forwarded to $upstream, which would be one of the most important things to know when debugging proxy_pass issues.
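This does not solve the missing forwarded URI either, but nginx's built-in upstream variables at least show which server answered and with which status, without relying on a custom variable. A sketch; the format name upstream_builtin is my own:

```nginx
log_format upstream_builtin '[$time_local] $remote_addr "$request" '
                            'upstream_addr $upstream_addr upstream_status $upstream_status '
                            'upstream_response_time $upstream_response_time request_time $request_time';
```

It is used the same way as the format above, e.g. access_log /dev/stdout upstream_builtin; inside the location.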

Conclusion

I hope that you have found helpful information in this article that you can put to good use in your development and production nginx configurations.

Daniel Albuschat

@danielkun

Have had many hats on in my life: Developer, Team Lead, Scrum Master, Architect and Product Owner. Now back to developer \o/ Interested in product discovery, quality assurance and language design.

Discussion


"with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently" - we all do, mate. Props for being honest though!

Really useful article, I wish I'd had this when I started my first job.
Back then I just assumed nginx was magic.

 

"is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx will try to resolve all hosts defined in proxy_pass directives on startup"

In some particular use-cases - especially when using static configuration - this is THE main reason why nginx is not production ready. One temporarily unresolved backend host can make a whole nginx farm impossible to restart. This is the exact opposite of the behavior of the nowadays much disliked Apache httpd.

On the other hand, for dynamically generated configuration this is a good tool, but it has many good successors too: traefik, istio, linkerd, fabio...

There is another piece: a global equivalent of ProxyPreserveHost Off is missing - you cannot do that globally in nginx. You need to specify it for each proxy_pass using proxy_set_header Host foo.bar. That makes the config very unreadable.

 

I was completely clueless why the proxy_pass wasn't working with the "set directive". Thanks for sharing. :)

In fact, the "set directive" is also the only way to force nginx to resolve the domain name again (using the parameters defined in the resolver). Otherwise, it will be cached forever...

 

Thanks for sharing the above, Daniel. I am trying to achieve something along similar lines but somehow have not been able to. Maybe you are able to point out what is wrong here.

Have an application running at website.com. When I load this website there are various internal calls made to load css, js like: website.com/static/css/main.c87626...

I want to proxy a call from my localhost to website.com which works well but it fails when it tries to load these static files during proxy_pass as it tries to look at it locally in the nginx path.

snippet of my configuration file:

    location /test/website {
        resolver 8.8.8.8;
        proxy_set_header Host website.com;
        proxy_pass http://website.com;         
    }

Error : CreateFile() "\nginx-1.16.1/html/static/css/main.c8762633.chunk.css" failed (3: The system cannot find the path specified), client: 127.0.0.1, server: localhost, request: "GET /static/css/main.c8762633.chunk.css HTTP/1.1", host: "localhost", referrer: "localhost/test/website"

How can I force nginx to look at those assets, js on the remote server itself ? Is there any header / properties that needs to be set ?

 

Thanks for sharing <3 I agree, that unintuitive config can drive someone crazy. Especially the relevance of the slashes.

 

Thank you, Daniel, for sharing your experience with proxy_pass issues, very useful