Daniel Albuschat

Posted on Aug 20, 2019

Nginx: Everything about proxy_pass

#nginx #microservices #http

With the advent of Microservices™, ingress routing and routing between services have been in ever-increasing demand. I currently default to nginx for this - with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently.

However, the often needed proxy_pass directive has driven me crazy because of its - to me unintuitive - behavior. So I decided to take notes on how it works, what is possible with it, and how to circumvent some of its quirks.

First, a note on https

By default, proxy_pass does not verify the certificate of the endpoint if it is https (how can this be the default behavior, really?!). This can be useful internally, but skipping verification is something you usually want to choose very explicitly. If you use publicly routed endpoints, which I have done in the past, make sure to set proxy_ssl_verify to on. You can also authenticate against the upstream server that you proxy_pass to using client certificates and more; have a look at the available options at https://docs.nginx.com/nginx/admin-guide/security-controls/securing-http-traffic-upstream/.
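
As a rough sketch of what explicit verification can look like, the fragment below turns on certificate checks for an https upstream. The hostname api.example.com and the CA bundle path are assumptions for illustration (the bundle location differs between distributions); the directives themselves come from nginx's standard proxy module.

```nginx
location /webapp/ {
    # Hypothetical https upstream, not part of the examples in this article
    proxy_pass https://api.example.com/api/;

    # Reject the upstream if its certificate does not validate
    proxy_ssl_verify on;
    # CA bundle used for validation (assumed path, adjust for your system)
    proxy_ssl_trusted_certificate /etc/ssl/certs/ca-certificates.crt;
    # Allow one intermediate CA between the server certificate and the root
    proxy_ssl_verify_depth 2;
    # Send the upstream hostname via SNI during the TLS handshake
    proxy_ssl_server_name on;
}
```

Without proxy_ssl_trusted_certificate, nginx has nothing to validate against, so proxy_ssl_verify on would not be enough by itself.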

A simple example

A proxy_pass is usually used when there is an nginx instance that handles many things and delegates some of those requests to other servers. One example is an ingress in a Kubernetes cluster that spreads requests among the different microservices that are responsible for the specific locations. Or you can use nginx to directly deliver static files for a frontend, while some server-side rendered content or API is delivered by a WebApp such as ASP.NET Core or flask.

Let's imagine we have a WebApp whose API is served at http://localhost:5000/api/ and want it to be available on http://localhost:8080/webapp/. Here's how we would do it in a minimal nginx.conf:

daemon off;
events {
}
http {
    server {
        listen 8080;
        location /webapp/ {
            proxy_pass http://127.0.0.1:5000/api/;
        }
    }
}

You can save this to a file, e.g. nginx.conf, and run it with:

nginx -c $(pwd)/nginx.conf

Now, you can access http://localhost:8080/webapp/ and all requests will be forwarded to http://localhost:5000/api/.
Note how the /webapp/ prefix is "cut away" by nginx. That's how proxy_pass works when its URL contains a path: the part of the request URI that matched the location is replaced by that path, and the rest is passed on to the "upstream". "Upstream" is the name for whatever sits behind nginx.
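
For contrast, here is a minimal sketch of the other case: when proxy_pass has no path at all after host and port, nothing is cut or replaced, and the request URI is forwarded as the client sent it.

```nginx
location /webapp/ {
    # No path after the port: a request for /webapp/foo?bar=baz
    # reaches the upstream as /webapp/foo?bar=baz
    proxy_pass http://127.0.0.1:5000;
}
```

Even a single trailing slash (http://127.0.0.1:5000/) counts as a path and switches nginx back to the replacing behavior.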

To slash or not to slash

Except for when you use variables in the proxy_pass upstream definition, as we will learn below, the location and upstream definition are very simply tied together. That's why you need to be aware of the slashes, because some strange things can happen when you don't get it right.

Here is a handy table that shows you how the request will be received by your WebApp, depending on how you write the location and proxy_pass declarations. Assume all requests go to http://localhost:8080:

location	proxy_pass	Request	Received by upstream
/webapp/	http://localhost:5000/api/	/webapp/foo?bar=baz	/api/foo?bar=baz
/webapp/	http://localhost:5000/api	/webapp/foo?bar=baz	/apifoo?bar=baz
/webapp	http://localhost:5000/api/	/webapp/foo?bar=baz	/api//foo?bar=baz
/webapp	http://localhost:5000/api	/webapp/foo?bar=baz	/api/foo?bar=baz
/webapp	http://localhost:5000/api	/webappfoo?bar=baz	/apifoo?bar=baz

In other words: You usually want a trailing slash on both, you never want to mix with and without trailing slash, and you only want no trailing slash when you want to concatenate a certain path component (which I guess is quite rarely the case). Note how query parameters are preserved!

$uri and $request_uri

You have two ways to keep the location from being cut off: First, you can simply repeat the location in the proxy_pass definition, which is quite easy:

location /webapp/ {
    proxy_pass http://127.0.0.1:5000/api/webapp/;
}

That way, your upstream WebApp will receive /api/webapp/foo?bar=baz in the above examples.

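Another way to repeat the location is to use $uri or $request_uri. The difference is that $request_uri preserves the query parameters, while $uri discards them. ($uri is also the normalized, decoded path, whereas $request_uri is the original URI as the client sent it.)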

location	proxy_pass	Request	Received by upstream
/webapp/	http://localhost:5000/api$request_uri	/webapp/foo?bar=baz	/api/webapp/foo?bar=baz
/webapp/	http://localhost:5000/api$uri	/webapp/foo?bar=baz	/api/webapp/foo

Note how in the proxy_pass definition, there is no slash between "api" and $request_uri or $uri. This is because a full URI will always include a leading slash, which would lead to a double-slash if you wrote "api/$uri".
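
Written out as a location block, the first row of the table might look like the following sketch. It uses the IP address 127.0.0.1 instead of localhost on purpose: as soon as proxy_pass contains a variable, a hostname would have to be resolved at request time.

```nginx
location /webapp/ {
    # /webapp/foo?bar=baz is forwarded as /api/webapp/foo?bar=baz
    # (no slash between "api" and the variable, see the note above)
    proxy_pass http://127.0.0.1:5000/api$request_uri;
}
```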

Capture regexes

While this is not exclusive to proxy_pass, I find it generally handy to be able to use regexes to forward parts of a request to an upstream WebApp, or to reformat it. Example: Your public URI should be http://localhost:8080/api/cart/items/123, and your upstream API handles it in the form of http://localhost:5000/cart_api?items=123. In this case, or more complicated ones, you can use a regex to capture parts of the request URI and transform it into the desired format.

location ~ ^/api/cart/([a-z]*)/(.*)$ {
   proxy_pass http://127.0.0.1:5000/cart_api?$1=$2;
}

Use try_files with a WebApp as fallback

A use-case I came across was that I wanted nginx to handle all static files in a folder, and if the file is not available, forward the request to a backend. For example, this was the case for a Vue single-page-application (SPA) that is delivered through flask - because the master HTML needs some server-side tuning - and I wanted nginx to serve the static files instead of flask. (This is recommended by the official gunicorn docs.)

You might have everything for your SPA except for your index.html available at /app/wwwroot/, and http://localhost:5000/ will deliver your server-tuned index.html.

Here's how you can do this:

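location /spa/ {
   # With root, /spa/main.js is looked up at /app/wwwroot/spa/main.js
   root /app/wwwroot/;
   # Serve the file if it exists, otherwise hand the request to the backend
   try_files $uri @backend;
}
location @backend {
   # No path here: the original request URI is passed on unchanged
   proxy_pass http://127.0.0.1:5000;
}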

Note that you cannot specify a path in the proxy_pass directive inside @backend. A named location has no prefix that nginx could replace with that path, so nginx will tell you:

nginx: [emerg] "proxy_pass" cannot have URI part in location given by regular expression, or inside named location, or inside "if" statement, or inside "limit_except" block in /home/daniel/projects/nginx_blog/nginx.conf:28

That's why your backend should receive any request and return the index.html for it, or at least for the routes that are handled by the frontend's router.
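
If the backend cannot be changed to answer every path, one possible workaround is to rewrite the URI inside the named location before proxying. This is only a sketch and not part of the setup described above; it assumes that the backend serves the tuned index.html at /.

```nginx
location @backend {
    # Replace whatever path was requested with "/" (the query string is kept)
    # and stop rewrite processing in this location.
    rewrite ^ / break;
    # Still no path on proxy_pass, so nginx accepts it in a named location
    # and forwards the rewritten URI.
    proxy_pass http://127.0.0.1:5000;
}
```

The trade-off is that the backend no longer sees which frontend route was requested, which may or may not matter for your server-side tuning.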

Let nginx start even when not all upstream hosts are available

One reason that I have used 127.0.0.1 instead of localhost so far is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx will try to resolve all hosts defined in proxy_pass directives on startup, and fail to start when one of them cannot be resolved. However, especially in microservice environments, it is very fragile to require all upstream services to be resolvable at the time the ingress, load balancer or some intermediate router starts.

You can circumvent nginx's requirement for all hosts to be resolvable at startup by using variables inside the proxy_pass directives. HOWEVER, for some unfathomable reason, if you do so, you need a dedicated resolver directive to resolve these hostnames. For Kubernetes, you can use kube-dns.kube-system here. For other environments, you can use your internal DNS, or for publicly routed upstream services you can even use a public DNS such as 1.1.1.1 or 8.8.8.8.
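
A minimal sketch of that combination is shown below. The hostname my-service.example.com is hypothetical, and the resolver address and timings are just example values; valid and resolver_timeout are standard nginx settings.

```nginx
server {
    listen 8080;

    # DNS server used for names that appear in variables.
    # valid=30s caches answers for 30 seconds instead of using the record's TTL.
    resolver 1.1.1.1 valid=30s;
    resolver_timeout 5s;

    location / {
        # Hypothetical upstream; it is resolved per request, not at startup
        set $upstream http://my-service.example.com:5000;
        # The variable holds no path, so the request URI is forwarded unchanged
        proxy_pass $upstream;
    }
}
```

If the name cannot be resolved when a request comes in, nginx answers that request with an error (typically a 502) instead of refusing to start.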

Additionally, using variables in proxy_pass completely changes how URIs are passed on to the upstream. When just changing

proxy_pass http://localhost:5000/api/;

to

set $upstream http://localhost:5000;
proxy_pass $upstream/api/;

... which you might think should result in exactly the same behavior, you might be surprised. The former will hit your upstream server with /api/foo?bar=baz with our example request to /webapp/foo?bar=baz. The latter, however, will hit your upstream server with /api/. No foo. No bar. And no baz. :-( With variables, nginx passes the URI exactly as written in proxy_pass and no longer appends the rest of the request path or the query string.

We need to fix this by putting the request together from two parts: First, the path after the location prefix, and second the query parameters. The first part can be captured using the regex we learned above, and the second (query parameters) can be forwarded using the built-in variables $is_args and $args. If we put it all together, we will end up with a config like this:

daemon off;
events {
}
http {
    server {
        access_log /dev/stdout;
        error_log /dev/stdout;
        listen 8080;
        # My home router in this case:
        resolver 192.168.178.1;
        location ~ ^/webapp/(.*)$ {
            # Use a variable so that localhost:5000 might be down while nginx starts:
            set $upstream http://localhost:5000;
            # Put together the upstream request path using the captured component after the location path, and the query parameters:
            proxy_pass $upstream/api/$1$is_args$args;
        }
    }
}

While localhost is not a great example here, it works with your service's arbitrary DNS names, too. I find this very valuable in production, because having an nginx refuse to start because of a probably very unimportant service can be quite a hassle while wrangling a production issue. However, it makes the location directive much more complex. From a simple location /webapp/ with a proxy_pass http://localhost:5000/api/ it has become this behemoth. I think it's worth it, though.

Better logging format for proxy_pass

To debug issues, or simply to have enough information at hand when investigating issues in the future, you can maximize the information about what is going on in your location that uses proxy_pass.

I found this handy log_format, which I enhanced with a custom variable $upstream, as we have defined above. If you always call your variables $upstream in all your locations that use proxy_pass, you can use this log_format and have often much-needed information in your log. Note that $upstream is not built in: it only exists because a set $upstream ... appears somewhere in the configuration.

log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: $upstream: $request upstream_response_time $upstream_response_time msec $msec request_time $request_time';

Here is a full example:

daemon off;
events {
}
http {
    log_format upstream_logging '[$time_local] $remote_addr - $remote_user - $server_name to: "$upstream": "$request" upstream_response_time $upstream_response_time msec $msec request_time $request_time';
    server {
        listen 8080;

        location /webapp/ {
            access_log /dev/stdout upstream_logging;
            set $upstream http://127.0.0.1:5000/api/;
            proxy_pass $upstream;
        }
    }
}

However, I have not found a way to log the actual URI that is forwarded to $upstream, which would be one of the most important things to know when debugging proxy_pass issues.
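
What nginx does offer out of the box are the $upstream_* variables of its upstream module. The following sketch is a variant of the format above that relies only on those built-in variables, so it also works in locations that do not set a custom $upstream; the format name upstream_builtin is made up for this example. It still does not show the forwarded URI, only which server was contacted and how it answered.

```nginx
# Goes into the http block, next to the log_format shown above
log_format upstream_builtin '[$time_local] $remote_addr "$request" '
                            'upstream_addr=$upstream_addr '
                            'upstream_status=$upstream_status '
                            'upstream_connect_time=$upstream_connect_time '
                            'upstream_response_time=$upstream_response_time '
                            'request_time=$request_time';
```

It is enabled per location in the same way as before, with access_log /dev/stdout upstream_builtin;.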

Conclusion

I hope that you have found helpful information in this article that you can put to good use in your development and production nginx configurations.

Top comments (30)

Peters Chikezie • Mar 6 '21 • Edited

hello, Daniel thanks for this post.

However, I am having trouble passing all paths from a request to the server.

example:

servername = 127.0.0.1;

location / {
proxy_pass 192.168.4.22:3000/;
}

I want all requests with dynamic paths coming from e.g 127.0.0.1/ be matched to the above location and passed to the proxy.

I get a 404 from the server when my URL looks like 127.0.0.1/foo/bar but when its just 127.0.0.1, it works fine.

is there something i am not doing right?

Daniel Albuschat • Mar 7 '21 • Edited

Hey Peters,

was it just a formatting issue in the comment, or are you missing the // in http://127.0.0.1 in the proxy_pass statement?

Peters Chikezie • Mar 7 '21

Hi Daniel,

It's a formatting issue.

Daniel Albuschat • Mar 7 '21

Hm... looks correct, though. Maybe the error lies elsewhere?

Peters Chikezie • Mar 7 '21

Any idea where to look?

I have two sites enabled. The default nginx site running on port 80 and my API site running on a different port.

Daniel Albuschat • Mar 7 '21

Oh, then you forgot to provide the correct port in the URL. What you posted uses the default port 80, so it uses the default nginx site. Use http://127.0.0.1:<your-port>/foo/bar

Peters Chikezie • Mar 7 '21

I actually have the correct port in the file. The one I posted here is just an example. The IP is different

Peters Chikezie • Mar 8 '21 • Edited

Hi Daniel,

Here is how the site looks

upstream backend {
 server 192.168.34.23:3000;
}
server {
        listen 5000 default_server;
        listen [::]:5000 default_server;

        root /var/www/html;
        index index.html index.htm index.nginx-debian.html;

        server_name http://example.com;

         location / {
                proxy_pass http://backend/;
                try_files $uri $uri/ =404;
        }
}

Peters Chikezie • Mar 8 '21

i am thinking, could the root there be the problem?

Peters Chikezie • Mar 8 '21

I have removed it and it's still the same thing.

Peters Chikezie • Mar 8 '21 • Edited

Hi again Daniel,

I have been able to solve it.

I had to comment this line in the location

location / {
                proxy_pass http://backend/;
                #try_files $uri $uri/ =404;
        }

Thanks for your time. It's really appreciated

Aditya Todkar • Oct 20 '22 • Edited

In my case I have react + node app + nginx setup and I want to run webflow website only on /blog/* routes. I have added location blog like this:

location /blog {
      proxy_pass https://myblog-blog.webflow.io;
}

But in that case react + node routes and /blog/* routes work fine. Only issue is /blog route gives 404. Any suggestions on what can be done to fix it?

Daniel Albuschat • Oct 20 '22

Heya Aditya!

Could you have a closer look at which server gives you the 404? i.e. do the server logs of myblog-blog.webflow.io show an incoming request that is answered with 404, or is nginx already giving the 404? Maybe you can tell from what the 404 page looks like.

Aditya Todkar • Oct 20 '22

404 page of webflow is displayed. Do you think this university.webflow.com/lesson/href... can help in my case? I am currently not using paid plan so cannot use href-prefix. So I wanted to know if there is something which I can do in nginx config file.

Gustavo Pinsard • Dec 12 '22

Daniel,
I ended up here by chance. However, I was so impressed by the good work you've done in organizing this article that I decided to join dev.io - so I could post this and hopefully contribute back to the community.
This article is both useful and inspiring. Thanks!

Carlos Trapet • Aug 21 '19

"with no plausible reason or experience to back this decision, just because it seems to be the most used tool currently" - we all do, mate. Props for being honest though!

Really useful article, I wish I'd had this when I started my first job.
Back then I just assumed nginx was magic.

sansnom • Nov 12 '19

I was completely clueless why the proxy_pass wasn't working with the "set directive". Thanks for sharing. :)

In fact, the "set directive" is also the only way to force nginx to resolve the domain name again (using the parameters defined in the resolver). Otherwise, it will be cached forever...

Wojciech Sielski • Aug 22 '19

"is that nginx is very picky about hostname resolution. For some unexplainable reason, nginx will try to resolve all hosts defined in proxy_pass directives on startup"

In some particular use-cases - especially when using static configuration - this is THE main reason why nginx is not production ready. One temporarily unresolved backend host can make a whole nginx farm impossible to restart. This is the exact opposite of the behavior of the nowadays very disliked Apache httpd.

On the other hand, for dynamically generated configuration this is a good tool, but it has many good successors too: traefik, istio, linkerd, fabio...

There is another piece: a global configuration like ProxyPreserveHost Off is missing - you cannot do that globally in nginx. You need to specify it for each proxy_pass using proxy_set_header Host foo.bar. That makes the config very unreadable.

coxse • Sep 2 '21

Hi Daniel,

With regard to the actual URI that is forwarded, is it not $host that you need?

In the attached picture the yellow highlight is $host and the blue is the upstream $upstream.

Anyway, thanks for the post it really helped setting up our nginx proxy with decent logging.

Adrian

Mike Green • Mar 7 '24

Hi - found this very useful, and have a question around the $upstream variable in the log format. Using nginx 1.24.0.
I couldn't get $server_name to: "$upstream": to work - it barked with nginx: [emerg] unknown "upstream" variable

Based on the variables here - nginx.org/en/docs/stream/ngx_strea..., I changed it to $upstream_addr and it worked.

Entire log:

log_format upstream_logging '[$time_local] $remote_addr - $remote_user - '
      '$server_name to: "$upstream_addr": "$request" upstream_response_time '
      '$upstream_response_time msec $msec request_time $request_time';

Anyone know if this is a variable change or I misunderstood what $upstream in the log format was?

Xavier • Feb 14 '24

Hi Daniel,

First of all, thanks for the post.

I am facing the same issue: nginx does not start when not all upstream hosts are available.
I was trying to reverse proxy the

public.domain.com/mydomain/dev06/api ===>

aws-mydomain-dev06.aws.org/api/

so I changed it to following,

location ~ ^/mydomain/dev06/(.*)$ {
    resolver 10.43.83.2;

    set $upstream https://aws-mydomain-dev06.aws.org/; 

    access_log  /var/log/nginx/access.dev06.log upstream_logging;
    error_log  /var/log/nginx/error.dev06.log debug;

    proxy_pass $upstream/$1$is_args$args;
 }

it returns 404, not found

using string proxy_pass aws-mydomain-dev06.aws.org/; works fine

Is there something I am not doing right? Or where can I have a look to find out more?

Regards