On March 16, new versions of php, namely v7.0.17 and v7.1.3, were released. This is good news for many php developers, but for us it has an extra meaning: our patch, which fixes ‘keep-alive’ connections in the php-fpm sapi, was accepted into them.

Php-fpm first appeared as a separate patch for php 5.2. It added a manager for fastcgi processes that lets you organize individual pools, monitors the execution time of worker processes, and much more. In the php 5.4 branch it was accepted as an official sapi, which meant we no longer needed to apply the patch every time a new bugfix version of php arrived.

Php-fpm has a pretty nice feature: it can use one tcp connection for several consecutive fastcgi requests, the so-called ‘keepalive’. The same feature is present in nginx, which we use as our frontend http server and, accordingly, as our fastcgi client. Using keepalive saves the time spent establishing new tcp connections and eliminates a bunch of TIME-WAIT entries in the tcp table.

In 2015 we decided to use this feature, so we set it up in a test environment and tested its functionality. Once we were satisfied, we put it into production. However, we quickly discovered strange entries in php-fpm-slow.log that logically should not have been there, along with corresponding entries in php-fpm.log saying that worker processes were being killed due to long execution time. At first we thought everything was fine: we have a backup server in the nginx upstream, so no http request would be left without a response anyway. However, we soon found out that the response header had already been sent by then, so the process was killed in the middle of the response and users received only half of the page.

After half a day of researching the problem, reading the code and observing what was happening in gdb, we found a funny thing. During keepalive, the counter of time spent was not reset to zero at the beginning of a new fastcgi request. I submitted a bug report, offered a patch for it and sent a pull request to the github repo. The pull request targeted the 5.5 and 5.6 branches, which we used at the time.
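To make the failure mode concrete, here is a minimal toy model in Python. It is not php-fpm's code: the `Worker` class, the `should_kill` watchdog check and the 30-second limit are all hypothetical names and values chosen for illustration. It only shows how a start time that is not refreshed on a reused connection makes a short request look like a long-running one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    """Toy model of one worker process (hypothetical, not php-fpm's struct)."""
    reset_on_accept: bool                      # True models the fixed behaviour
    request_started_at: Optional[float] = None

    def accept_request(self, now: float) -> None:
        # The first request on a connection always records its start time.
        # On a reused (keepalive) connection the buggy worker keeps the old one.
        if self.request_started_at is None or self.reset_on_accept:
            self.request_started_at = now


def should_kill(worker: Worker, now: float, terminate_timeout: float) -> bool:
    """Watchdog check: does the current request appear to exceed the limit?"""
    if worker.request_started_at is None:
        return False
    return now - worker.request_started_at > terminate_timeout


def simulate(reset_on_accept: bool) -> List[bool]:
    """Three short requests, 20 s apart, on a single keepalive connection."""
    worker = Worker(reset_on_accept=reset_on_accept)
    verdicts = []
    for start in (0.0, 20.0, 40.0):
        worker.accept_request(now=start)
        # The watchdog inspects the worker half a second into each request.
        verdicts.append(should_kill(worker, now=start + 0.5, terminate_timeout=30.0))
    return verdicts


if __name__ == "__main__":
    print("buggy:", simulate(reset_on_accept=False))
    print("fixed:", simulate(reset_on_accept=True))
```

In this model the buggy worker is flagged for killing on the third request even though that request has only been running for half a second, which matches the symptom of pages being cut off in the middle.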

After a little while, the maintainers rejected my pull request because those branches were already frozen: only security changes were being accepted. However, 2 years later the maintainers came back to it and proposed making the change in the 7.0 and 7.1 branches. At first I thought that everything had already been “fixed ahead of us”, but other users confirmed that the problem still existed. Luckily the fix, as before, was just a few lines.

Now our nginx configuration looks something like this:

http {
  ...
  upstream main {
    server 10.20.30.10:9001 max_fails=10 fail_timeout=10s;
    server 10.20.30.20:9001 max_fails=10 fail_timeout=10s;
    server 10.20.30.30:9001 max_fails=10 fail_timeout=10s backup;
    keepalive 32;
  }

  server {
    ...
    location ~ (\.php)$ {
      fastcgi_pass   main;
      fastcgi_keep_conn on;
      ...
    }
  }
}

TL;DR: if you used keep-alive with fastcgi and experienced discomfort, or if you were afraid to use it before, now, with the release of the new php versions, is the time to start. It will save you Excedrin and server resources.