This was a tricky problem. A retail ecommerce site was having performance problems during the holiday season on a Linux instance at AWS. The server itself was beefy enough: 8 vCPUs and 32 GB of memory, plus plenty of storage, a 300 GB SSD volume with provisioned IOPS scaled appropriately to its size. Yet the server kept throwing errors as if it were running out of disk space even though plenty of space was left, and new httpd processes wouldn't spawn as the load increased.
After much head scratching and research (thanks, StackOverflow), I traced the problem serving HTTP requests to the Apache config file. Even though the server had been scaled up over the years to meet increasing demand, the Apache configuration had never been touched. As configured, Apache would only allow 255 active connections at any given time; once it hit that limit, it queued subsequent requests. Raising the limit to allow up to 1000 worker threads solved that problem.
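For reference, the change amounts to raising the MPM limits. A minimal sketch, assuming Apache 2.4 with the event MPM on a Debian-style layout (the file path, directive values, and even some directive names will differ on other setups):

    # /etc/apache2/mods-available/mpm_event.conf (path assumed; adjust for your distro)
    <IfModule mpm_event_module>
        ServerLimit               40    # max child processes
        ThreadsPerChild           25    # threads each child can spawn
        MaxRequestWorkers       1000    # total simultaneous connections = ServerLimit x ThreadsPerChild
        MaxConnectionsPerChild     0    # never recycle children based on request count
    </IfModule>

MaxRequestWorkers replaced the older MaxClients directive in 2.4; on 2.2 or the prefork MPM the same idea applies, just with different directive names.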
The next problem, the disk throwing out-of-space errors, was new to me, and it took a while to sort out. I traced it to the number of inodes in use: the filesystem had simply run out of free inodes to hand to the application layer, so no new files could be created no matter how much free space the disk reported. That only happens when millions of small files have accumulated without ever being purged.
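If you ever suspect the same thing, df tells the story in one line; comparing block usage against inode usage makes the mismatch obvious, and a quick find can point at the directory hoarding all the files (paths here are just examples):

    df -h /            # plenty of free space...
    df -i /            # ...but IUse% at or near 100%
    # count files per top-level directory to find the culprit
    sudo find / -xdev -type f | cut -d/ -f2 | sort | uniq -c | sort -rn | head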
Again, the culprit was Apache. The mod_cache module was configured to cache every file being served for scores of different product websites, and no cleanup had ever been scheduled, so over the years those sites had generated millions of cached files of a few kilobytes each. Disabling mod_cache and deleting the old cached files did the trick.
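On a stock Debian/Ubuntu layout the cleanup looks roughly like this; the cache path is the distro default and may be different on your box, and if you want to keep mod_cache, the saner fix is running htcacheclean on a schedule instead of disabling caching outright:

    sudo a2dismod cache_disk cache          # turn off disk caching
    sudo systemctl reload apache2
    # delete the accumulated cache files (this can take a long time with millions of inodes)
    sudo find /var/cache/apache2/mod_cache_disk -type f -delete

    # alternative: keep caching but cap it by running htcacheclean as a daemon
    sudo htcacheclean -d60 -n -l 1024M -p /var/cache/apache2/mod_cache_disk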
https://stackoverflow.com/questions/3389496/how-do-you-increase-the-max-number-of-concurrent-connections-in-apache
https://stackoverflow.com/questions/653096/how-to-free-inode-usage