If your Linux server is slow, your first step is often to run the top command in a terminal and check the load average. However, there are times when top shows a very high load average even though the CPU 'us' (user) percentage is low and the CPU 'id' (idle) percentage is high. That is the case in the video below: on a server with 24 cores, the load average is above 30, yet the CPU shows about 70 percent idle. One of the most common causes of this situation is a disk I/O bottleneck.
What is I/O wait?
Storage I/O is the input/output (read/write) operations performed against a physical storage device (for example, a hard disk or SSD). When the CPU has to wait on the disk before it can read or write data, requests that involve disk I/O can be slowed down dramatically. I/O wait is the percentage of time the CPU spends waiting on storage.
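The same counter that top reports as 'wa' can be read straight from /proc/stat. A minimal sketch, assuming a Linux system where field 6 of the aggregate cpu line is iowait time (irq/softirq/steal time is ignored here for simplicity):

```shell
# Read the aggregate "cpu" line from /proc/stat twice, one second apart,
# and compute what share of the interval the CPU spent in iowait
# (the 6th field after the "cpu" label).
read -r _ u1 n1 s1 i1 w1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 w2 _ < /proc/stat
total=$(( (u2 + n2 + s2 + i2 + w2) - (u1 + n1 + s1 + i1 + w1) ))
wait_ticks=$(( w2 - w1 ))
echo "iowait: $(( 100 * wait_ticks / total ))% of the last second"
```

On a healthy server this prints 0%; sustained values above a few percent point at storage.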
Let’s see how we can confirm that disk I/O is slowing down application performance, using some command-line tools (top, atop and iotop) on a web server running a LAMP stack.
Using the top command – load average and wa (wait time)
As in the video above, when you run top, first look at the top right to check the load average. In this case it is far too high, indicating a pile-up of requests. Next, look at the %Cpu(s) and Mem summary lines, followed by the %CPU and %MEM columns, to see which processes are using the most resources.
While in top, you should also watch 'wa' (see the video above); it should be close to 0.0% almost all the time. Values continuously above 1% may indicate that your storage device is too slow to keep up with incoming requests. Notice in the video that the initial 'wa' figure is about 6% wait time. However, that figure is averaged across all 24 cores, some of which are not busy because the CPU cores themselves are nowhere near capacity. So expand the view by pressing '1' on your keyboard to see the 'wa' time for each CPU core. As per the screenshot above, the expanded view lists the 24 cores, numbered 0 to 23. Once this is done, we see that the '%wa' time is as high as 60% on some CPU cores! So we know there is a bottleneck, and a serious one. Next, let's confirm that it is the disk.
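The per-core 'wa' values that top reveals after pressing '1' come from the same per-CPU counters in /proc/stat. As a rough sketch (these are since-boot averages, unlike top's interval sampling), they can be printed with awk:

```shell
# Print each core's iowait share since boot. In /proc/stat the per-CPU
# lines are "cpuN user nice system idle iowait irq softirq steal ...",
# so $6 is iowait time and $2..$8 cover the bulk of the total.
awk '/^cpu[0-9]/ {
    total = $2 + $3 + $4 + $5 + $6 + $7 + $8
    printf "%s: %.1f%% iowait since boot\n", $1, 100 * $6 / total
}' /proc/stat
```

If one or two cores show a much higher share than the rest, single-threaded processes blocked on disk are a likely cause.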
Using the atop command to monitor DSK (storage) I/O statistics
After launching atop, we see that the storage device is 90 to 100 percent busy. This is a severe bottleneck; the effect is that requests block until disk I/O can catch up.
While in atop, press 'd' to see the processes and PIDs that are generating disk I/O. Here we see MySQL, Nginx and PHP-FPM, which are essential processes; I need to write a separate article about reducing disk I/O on high-traffic L*MP servers. In short, make sure that the access and error logs of Nginx (or Apache), MySQL and PHP-FPM are not being written to disk too frequently, and avoid storing caches (e.g., the Nginx cache) on disk in high-concurrency traffic environments. In addition to the LEMP services, also notice 'flush-8:0' (traced to a PHP cache issue) and 'jbd2/sda5-8' (journaling, traced to access/kernel logs), along with their PIDs.
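atop's DSK busy percentage is derived from the kernel's per-device I/O-time counter. A standalone sketch of the same idea, assuming bash and a Linux /proc/diskstats where field 13 is milliseconds spent doing I/O:

```shell
#!/usr/bin/env bash
# Sample /proc/diskstats twice, one second apart; the growth of
# field 13 (ms spent doing I/O) over the 1000 ms interval gives an
# approximate busy% per device, similar to atop's DSK line.
declare -A before
while read -r _ _ dev _ _ _ _ _ _ _ _ _ io_ms _; do
    before[$dev]=$io_ms
done < /proc/diskstats
sleep 1
while read -r _ _ dev _ _ _ _ _ _ _ _ _ io_ms _; do
    pct=$(( (io_ms - before[$dev]) / 10 ))   # ms busy out of 1000 ms
    if [ "$pct" -gt 0 ]; then
        echo "$dev: ${pct}% busy"
    fi
done < /proc/diskstats
```

A device that stays at 90–100% busy here is saturated, matching what atop showed on this server.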
On this server, I performed a quick SSD benchmark after stopping the services and noticed that disk performance was abysmal. Result: 1073741824 bytes (1.1 GB) copied, 46.0156 s, 23.3 MB/s. So although reads/writes could be reduced, the real problem here was extremely slow disk I/O. This client’s web host denied this and claimed instead that MySQL was the problem, since it often grew in memory usage and suffered OOM kills. In reality, MySQL’s growing memory usage was a symptom of disk I/O blocking the timely return of MySQL queries. With MySQL’s max_connections setting in my.cnf being far too high (2000) for this server, connections and queries to MySQL would pile up, and the memory required by all services would outgrow the available server RAM, to the point where the Linux kernel’s OOM killer would kill MySQL. Considering that MySQL’s maximum possible memory allocation is roughly its global buffers plus the per-thread buffers multiplied by that ‘max_connections=2000’ setting, this left PHP-FPM with little free memory while it waited on MySQL, which in turn was waiting on disk, stacking up even more connections. And with MySQL being the largest process, the Linux kernel chose to kill MySQL first.
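That worst-case arithmetic can be sketched in a few lines of shell. The buffer sizes below are illustrative assumptions, not the values from this client's my.cnf; only max_connections=2000 comes from the article:

```shell
# Back-of-envelope MySQL worst-case memory, in MB:
# global buffers + (per-thread buffers x max_connections).
innodb_buffer_pool=1024   # MB, assumed global buffer size
per_thread=6              # MB, assumed sort/join/read buffers per connection
max_connections=2000      # the setting found in this server's my.cnf
worst_case=$(( innodb_buffer_pool + per_thread * max_connections ))
echo "worst case: ${worst_case} MB"   # prints "worst case: 13024 MB"
```

Even with modest per-thread buffers, 2000 connections put the theoretical ceiling far above the RAM of a typical VPS, which is why the OOM killer stepped in.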
Using the iotop command for real-time insights on disk reads/writes
iotop reads I/O usage information output by the Linux kernel and displays a table of current I/O usage by processes or threads on the system. I used the command: iotop -oPa. Here’s an explanation of those options:
-o, --only Instead of showing all processes or threads, show only those that are actually doing I/O. This can be toggled dynamically by pressing ‘o’.
-P, --processes Show only processes. Normally iotop shows all threads.
-a, --accumulated Show accumulated I/O instead of bandwidth. In this mode, iotop shows the amount of I/O each process has done since iotop was started.
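Under the hood, iotop aggregates the per-process counters in /proc/&lt;pid&gt;/io. A minimal sketch of the same accumulated view (it needs permission to read other processes' io files, so run as root to see more than your own processes):

```shell
# List the five processes with the most bytes written since they
# started, read from /proc/<pid>/io -- the counters behind iotop -a.
for p in /proc/[0-9]*; do
    [ -r "$p/io" ] || continue
    wb=$(awk '/^write_bytes/ {print $2}' "$p/io" 2>/dev/null)
    if [ "${wb:-0}" -gt 0 ]; then
        echo "$wb bytes  $(cat "$p/comm" 2>/dev/null) (pid ${p#/proc/})"
    fi
done | sort -rn | head -5
```

Unlike the accumulated totals here, iotop also shows the current bandwidth per process, which is what the next section looks at.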
Look at the ‘DISK WRITE’ column. These are not huge figures; at the rate they increase, a storage device of even fairly average speed would not be kept busy by some kernel logging and disk caching. But at under 25 MB/s write speeds (and with memory already exhausted), disk I/O was maxed out by ordinary disk usage from the Nginx cache, kernel logs, access logs, and so on. The fix here was to replace the storage device with a better-performing one; something faster than SD-card-class speeds.
Of course, MySQL should not be allowed to accept more connections than the server is capable of serving. Likewise, throttling incoming traffic by lowering PHP-FPM’s pm.max_children should be avoided, or used only temporarily, since it means refusing web traffic (basically moving the bottleneck elsewhere). Thankfully, a storage device as slow as in the case above is not common with most hosting providers. If you have a disk with average I/O, you can also use Varnish Cache or other caching methods, but these will only act as a shield once they are fully primed. If you have enough server memory, always opt to store everything there first.
I hope this short article was useful. Feel free to leave suggestions, feedback and/or tools below or contact me directly. Also, take a look at this list of the top 50 APM tools.
Results from a quick dd disk write benchmark on a small StackLinux SSD VPS:
[root@host ~]# dd if=/dev/zero of=diskbench bs=1M count=1024 conv=fdatasync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.751188 s, 1.4 GB/s
See also: Your Web Host Doesn’t Want You to Read This: Benchmark Your VPS.
Published: July 9, 2017
Last Updated: June 23, 2021