On 7/14/20 3:54 AM, Brian Carpenter wrote:
> On 7/14/20 6:49 AM, Gilles Filippini wrote:
>> Mailman Core logs:
>> [2020-07-14 10:41:21 +0000] [36] [CRITICAL] WORKER TIMEOUT (pid:49)
>> [2020-07-14 10:41:21 +0000] [49] [INFO] Worker exiting (pid: 49)
>> [2020-07-14 10:41:22 +0000] [50] [INFO] Booting worker with pid: 50
>
> I am pretty sure that timeout is related to gunicorn. If it is, then increasing the worker timeout setting for gunicorn should work. You tend to get such errors when doing an export of a large membership roster. This is the first time I have seen it related to a held message.
It seems weird that a gunicorn worker would run for more than the default 30 seconds because of a 23MB message, but I think Brian is correct.
> I have this in my setup:
>
> mailman.cfg:
>
> [webservice]
> configuration: /opt/mailman/mm/gunicorn.cfg
>
> gunicorn.cfg:
>
> [gunicorn]
> workers = 4
> timeout = 900
>
> You will need to restart gunicorn after adjusting the timeout setting.
It's confusing, but after adjusting that setting, you need to restart Mailman core, not gunicorn.
If you are running a gunicorn service, that is what is providing WSGI support for Django (Postorius and/or HyperKitty). You may be using gunicorn for this, or uWSGI or Apache mod_wsgi, but in any case, that is not the gunicorn we are considering here.
Regardless of what you use to provide WSGI support for Django, Mailman core uses gunicorn to support the REST API, and that is the gunicorn affected by the settings in the gunicorn.cfg pointed to by
[webservice]
configuration: /opt/mailman/mm/gunicorn.cfg
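If you want to double-check which timeout the REST runner's gunicorn should pick up, here is a minimal sketch in Python. It assumes both files can be read by the standard configparser module, which holds for the snippets shown in this thread but may not hold for every mailman.cfg. The function names are mine, not anything in Mailman, and the 30 second fallback is the gunicorn default mentioned above.

```python
import configparser

# Worker timeout in seconds when gunicorn.cfg does not set one
# (the gunicorn default referred to in this thread).
GUNICORN_DEFAULT_TIMEOUT = 30


def _parse(text):
    # interpolation=None leaves any literal '%' in a value untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def webservice_config_path(mailman_cfg_text):
    """Return the gunicorn config path named in [webservice], or None."""
    return _parse(mailman_cfg_text).get(
        "webservice", "configuration", fallback=None)


def gunicorn_timeout(gunicorn_cfg_text):
    """Return the [gunicorn] timeout in seconds, or the default if unset."""
    return _parse(gunicorn_cfg_text).getint(
        "gunicorn", "timeout", fallback=GUNICORN_DEFAULT_TIMEOUT)


if __name__ == "__main__":
    # The two snippets quoted in this thread, inlined so the sketch runs
    # without touching the filesystem.
    mailman_cfg = "[webservice]\nconfiguration: /opt/mailman/mm/gunicorn.cfg\n"
    gunicorn_cfg = "[gunicorn]\nworkers = 4\ntimeout = 900\n"
    print(webservice_config_path(mailman_cfg))  # /opt/mailman/mm/gunicorn.cfg
    print(gunicorn_timeout(gunicorn_cfg))       # 900
```

This only tells you what is written in the files. The running workers keep the old value until Mailman core is restarted.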
If, with the above configuration, you do

    ps -fwwA|grep runner=rest

you will see something like:
mailman 20582 20561 0 01:21 ? 00:00:16 /opt/mailman/mm/venv/bin/python /opt/mailman/mm/venv/bin/runner -C /opt/mailman/mm/deployment/mailman.cfg --runner=rest:0:1
mailman 20722 20582 0 01:21 ? 00:01:19 /opt/mailman/mm/venv/bin/python /opt/mailman/mm/venv/bin/runner -C /opt/mailman/mm/deployment/mailman.cfg --runner=rest:0:1
mailman 20725 20582 0 01:21 ? 00:01:18 /opt/mailman/mm/venv/bin/python /opt/mailman/mm/venv/bin/runner -C /opt/mailman/mm/deployment/mailman.cfg --runner=rest:0:1
mailman 20726 20582 0 01:21 ? 00:01:18 /opt/mailman/mm/venv/bin/python /opt/mailman/mm/venv/bin/runner -C /opt/mailman/mm/deployment/mailman.cfg --runner=rest:0:1
mailman 20727 20582 0 01:21 ? 00:01:50 /opt/mailman/mm/venv/bin/python /opt/mailman/mm/venv/bin/runner -C /opt/mailman/mm/deployment/mailman.cfg --runner=rest:0:1
The first of these, pid 20582, is Mailman's actual REST runner. The other 4 with parent pid 20582 are the 4 gunicorn worker processes forked from the REST runner.
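If you would rather not read the pid and parent pid columns by eye, the same distinction can be sketched in Python. This assumes the `ps -f` column layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD). The function name and the sample constant are only for illustration.

```python
# Command line shared by every process in the listing above.
CMD = ("/opt/mailman/mm/venv/bin/python /opt/mailman/mm/venv/bin/runner "
       "-C /opt/mailman/mm/deployment/mailman.cfg --runner=rest:0:1")

# The ps output from this message, rebuilt as (pid, ppid, cpu time) rows.
SAMPLE_PS = "\n".join(
    f"mailman {pid} {ppid} 0 01:21 ? {cpu} {CMD}"
    for pid, ppid, cpu in [
        (20582, 20561, "00:00:16"),
        (20722, 20582, "00:01:19"),
        (20725, 20582, "00:01:18"),
        (20726, 20582, "00:01:18"),
        (20727, 20582, "00:01:50"),
    ]
)


def split_rest_processes(ps_output):
    """Separate REST runner pids from the gunicorn workers they forked.

    Returns (runners, workers): runners is a list of pids whose parent
    is not itself a REST process; workers maps a runner pid to the pids
    of its children.
    """
    procs = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # 8th field is the whole command
        if len(fields) < 8 or "--runner=rest" not in fields[7]:
            continue  # header line, the grep itself, or unrelated process
        procs.append((int(fields[1]), int(fields[2])))

    rest_pids = {pid for pid, _ in procs}
    runners = [pid for pid, ppid in procs if ppid not in rest_pids]
    workers = {}
    for pid, ppid in procs:
        if ppid in rest_pids:
            workers.setdefault(ppid, []).append(pid)
    return runners, workers


if __name__ == "__main__":
    runners, workers = split_rest_processes(SAMPLE_PS)
    print(runners)  # [20582]
    print(workers)  # {20582: [20722, 20725, 20726, 20727]}
```

With `workers = 4` in gunicorn.cfg you would expect exactly four children under the one runner, as here.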
It's the REST runner or Mailman core that needs to be restarted to pick up that change.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan