NGINX reverse proxy metrics to monitor

NGINX is one of the most popular web servers around, especially on Linux. According to nginx.com, it powers over 400 million websites. It's probably even more commonly used as a reverse proxy, though. Since it acts as the go-between for your application and your users, it's important to monitor NGINX properly. NGINX is free and available in the package repositories of most Linux distributions. Still, you may sometimes find yourself chasing performance degradation in the application when the problem actually comes from NGINX itself. In this post, we'll discuss the most important NGINX metrics to monitor when it's used as a reverse proxy.

Monitoring can be overwhelming

A common approach to monitoring is to set up a tool with some built-in dashboards and then add even more customized information to it. However, this easily leads to too much irrelevant information. Having two big screens full of metrics doesn't give you a better overview. It gives you a headache, and it makes it harder to trace the cause of issues.

On the other hand, if you choose only the metrics that really tell you something important, you benefit on several fronts. You save time, because it's easier to make quick connections between metrics A and B. You save money, because fewer metrics means less storage and network traffic. And the easier it is for engineers to understand what they're looking at, the quicker they'll figure out what's causing a problem and, ultimately, fix it.

But which metrics are “important”?

It all depends on who you ask. Application developers, operations teams, and product owners/technical leads will each find different metrics useful.

Application Developers

First of all, yes, developers should definitely have access to NGINX metrics. A common misconception is that they're only relevant to the ops team. So, what can be important to developers?

  • $upstream_response_time: The first, and perhaps most important, metric describes how long the application server took to respond to the request. Why is this important for developers? Because the response time is directly related to the quality of the code and the libraries used. With easy access to this metric, developers can work on improving application performance.
  • $request_uri, $status: These two metrics give a developer early insight into problems caused by incorrect routing or authentication. They help, for example, to spot a product missing from the website (the product URL plus a 404 code). And when buggy code ships, they help identify which request causes an Internal Server Error (500).
  • $request_uri, $upstream_cache_status: If NGINX is also used as a cache, developers benefit from watching these two metrics together. They tell you whether a particular URL is a HIT or a MISS in the cache. It's important to make the most of the cache: an improperly configured cache header can lead to far too many MISSes. With access to these metrics, developers can tune the headers their applications send to increase the number of cache HITs (see the log format sketch after this list).
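
One way to get easy access to these variables is to write them to a dedicated access log. Below is a minimal sketch of an nginx.conf fragment, assuming it sits inside the http block; the format name dev_metrics and the log path are placeholders, while the variables themselves are the standard NGINX ones discussed above.

    # Custom access log exposing the developer-facing variables
    # (place inside the http { } block; names and paths are examples)
    log_format dev_metrics '$remote_addr [$time_local] "$request_uri" '
                           'status=$status '
                           'upstream_time=$upstream_response_time '
                           'cache=$upstream_cache_status';

    access_log /var/log/nginx/dev_metrics.log dev_metrics;

Every line in the resulting log then carries the URL, the status code, how long the upstream took, and whether the cache answered, which is exactly the data the three points above rely on.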

SRE/Operations Team

Ops teams will benefit most from a different set of metrics than developers. They need to focus more on NGINX itself, so NGINX performance- and security-related metrics that aren't relevant to developers matter here. Let's discuss a few:

  • $connections_active: First, this simple metric provides the total number of active connections. It's important for performance assessment. Based on the average and peak number of active connections, ops teams can estimate the resources NGINX requires. The same metric also reveals sudden traffic spikes, so they can scale the number of NGINX instances accordingly.
  • $request_length: This provides the full request size (in bytes). It's important for calculating overall bandwidth and essential for sizing the network properly.
  • $request_time in relation to $upstream_response_time: Together, these two quickly tell you whether NGINX or the upstream application is causing poor performance. If there's a big difference between them, NGINX itself is adding the delay, which points to NGINX trouble or a misconfiguration.
  • $upstream_connect_time: This indicates how much time NGINX spent establishing a connection with the upstream server, which tells you how stable the connection between NGINX and the proxied application is (see the sketch after this list).
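
As with the developer metrics, these values can be written to their own access log. The fragment below is a sketch along the same lines, again assuming it lives in the http block; ops_metrics and the log path are placeholders, and $connections_active is only available when NGINX is built with the stub_status module (most distribution packages include it).

    # Ops-focused access log (place inside the http { } block)
    # $connections_active needs the stub_status module to be compiled in
    log_format ops_metrics 'req_time=$request_time '
                           'upstream_time=$upstream_response_time '
                           'upstream_connect=$upstream_connect_time '
                           'req_len=$request_length '
                           'active_conn=$connections_active';

    access_log /var/log/nginx/ops_metrics.log ops_metrics;

Comparing req_time with upstream_time per request is the quickest way to see whether time is being lost in NGINX itself or in the application behind it.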

Tech Lead/Product Owner

Tech leads/product owners don't need to know everything the ops team knows. They need to focus on what's important to the business, and the business depends heavily on customers. Therefore, for product owners, metrics that say something about customers are the most valuable:

  • $http_user_agent: This identifies the browser/device behind a request, which is important for understanding what your users are actually running. If 90% of the traffic comes from mobile devices, then as a product owner you should focus on mobile features and make the website responsive. If almost no one uses, say, the Opera browser, you know you can give low priority to fixing Opera-related issues. (See the sketch after this list for one rough way to bucket user agents.)
  • $status: A quick look at HTTP status metrics gives a technical lead a general overview of the application's health. If requests start returning 500 instead of 200, you know something is wrong.
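
If you want a rough, product-owner-friendly view straight from the access logs, you can bucket user agents into device classes with a map. This is only a sketch: $device_type, the regexes, and the log name are illustrative, and real user-agent classification is usually better done in the monitoring tool than in NGINX.

    # Coarse device classification based on the User-Agent header
    # (place inside the http { } block; the buckets are intentionally simplistic)
    map $http_user_agent $device_type {
        default         "desktop";
        ~*mobile        "mobile";
        ~*android       "mobile";
        ~*iphone|ipad   "mobile";
    }

    log_format audience '$device_type status=$status "$http_user_agent"';
    access_log /var/log/nginx/audience.log audience;

Counting lines per $device_type and per status code is then enough to answer the kind of questions a product owner actually asks.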

Different requirements for different environments

What about different environments? Should you use the same metrics everywhere? Well, no, you shouldn't. You can monitor more metrics in the dev environment, which lets you tune the configuration and understand what's actually relevant. If you thought metrics X and Y were important but you never end up looking at them, you know you can leave them out in production. Some metrics can also be omitted in the development environment. For example, if you don't use the NGINX caching mechanism in development (which is common practice), you don't need cache-related metrics there.
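
In practice, this often just means logging a more verbose format on the development vhost than in production. The sketch below reuses the hypothetical dev_metrics and ops_metrics formats from the earlier sections; the server names, ports, and upstream address are placeholders.

    # Verbose logging in dev, slimmer logging in production
    server {
        listen 80;
        server_name dev.example.com;
        access_log /var/log/nginx/dev.log dev_metrics;    # broad set while you learn what matters
        location / {
            proxy_pass http://127.0.0.1:3000;
        }
    }

    server {
        listen 80;
        server_name www.example.com;
        access_log /var/log/nginx/prod.log ops_metrics;   # trimmed to what you actually watch
        location / {
            proxy_pass http://127.0.0.1:3000;
        }
    }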

How to Monitor NGINX on Linux

Now that you know what to monitor, let’s discuss how to monitor it.

nginx_status

This is the most basic solution. It's built into NGINX, so you don't need to install anything extra; you only need to enable it in the configuration. Out of the box, nginx_status provides a very simple page showing only basic information about the number of connections. Why mention it, then? Because most tools that monitor NGINX rely on nginx_status being available, and sometimes it's useful to quickly curl your NGINX instance and get the basics.
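
Enabling it takes only a few lines of configuration. The sketch below assumes the stub_status module is compiled in (it is in most distribution packages); the port is arbitrary and /nginx_status is just a conventional path.

    # Built-in status page, reachable only from the machine itself
    server {
        listen 127.0.0.1:8080;

        location /nginx_status {
            stub_status;
            allow 127.0.0.1;   # keep the status page private
            deny all;
        }
    }

A quick curl http://127.0.0.1:8080/nginx_status then returns the active connection count, the accepted/handled/requests totals, and the reading/writing/waiting breakdown.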

NGINX Prometheus Exporter

NGINX Prometheus Exporter is a more advanced solution, but also much more involved to set up. It's valuable when you're already using Prometheus to monitor the rest of your infrastructure. It scrapes the nginx_status page described above and exposes NGINX-specific metrics in a format Prometheus can collect.

AppOptics

If you don't feel like installing the entire Prometheus stack yourself, and nginx_status just isn't enough for you, a tool like SolarWinds AppOptics can provide a ready-to-use solution. AppOptics isn't limited to NGINX. It can monitor your entire infrastructure, including Linux and Windows environments, but it's built for seamless integration with NGINX, so you won't need to spend hours getting started. But don't take my word for it: click here to start a free trial.

OK, I have all the metrics I need. Now what?

So, you've chosen the appropriate metrics, and you've picked a tool to monitor your NGINX on Linux. The next step is to really understand your data. Choosing only relevant metrics helps you stay focused, but it won't solve all your problems on its own. You still need to correlate them yourself. For example, if you see a sudden increase in 500 HTTP errors, it doesn't necessarily mean something is wrong. Check your overall traffic: if there's a spike in traffic, there will be a spike in 500s as well.

Correlating NGINX metrics with other data

The metrics described here help you get the most important information about your NGINX, but since NGINX doesn't do much on its own in reverse proxy mode, you'll need to correlate its metrics with the rest of the system to get a complete picture. In other words, you should compare NGINX metrics with application and infrastructure metrics. For example, if you see long response times from upstream servers, look at application-specific metrics to find out why the upstream responds slowly. If you see performance degradation in NGINX itself, look at the underlying infrastructure metrics.

Summary

Choosing only relevant metrics to watch helps you focus on what's really important. If you suddenly see a spike on 20 different metrics, you won't quickly understand where it's coming from. As a developer, you don't want to wade through NGINX's internal details and performance counters yourself; you want to focus on application-related metrics. Similarly, as a team lead, you'll get more value from user-related information than from infrastructure details. Choosing metrics that help you better understand what's going on ultimately saves you time and money.

As far as monitoring solutions go, there aren't many NGINX-specific tools, but many generic monitoring tools can also monitor NGINX. You should look for one that gives you the important information without drowning you in irrelevant numbers and graphs. SolarWinds AppOptics is well suited for monitoring NGINX on Linux or Windows and the entire application stack at the same time. The more data you feed into it, the more correlations it can provide.

This post was written by David Ziolkowski. David has 10 years of experience, starting out as a network/systems engineer and more recently working as a DevOps/cloud-native engineer. He has worked for an IT outsourcing company, a research institute, a telco, a hosting company, and a consulting company, so he has gathered knowledge from many different perspectives. Nowadays he's helping companies move to the cloud and/or redesign their infrastructure for a more cloud-native approach.