On 2/24/22 14:01, dancab@caltech.edu wrote:
> Only mail delivery was hung overnight. I had to restart the Gunicorn process followed by a Mailman core restart for mail to flow again.
Because Mailman's "out" runner (/opt/mailmanve/lib/python3.9/site-packages/mailman/runners/outgoing.py from the above traceback) is what hit the exception and died. The master watcher should restart it, but see https://gitlab.com/mailman/mailman/-/issues/898
> Any insight into how we could monitor this in the future? For example, which specific "runner" reported the error above?
Something like
    ps --no-headers --ppid="$(cat /opt/mailman/mm/var/master.pid)" | wc -l
will report the number of child processes of the master. You could test that against the expected number and do something.
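
As a rough sketch of "test that against the expected number and do something", the check could be wrapped in a small script along these lines. The function names and the idea of passing the expected count as an argument are my own illustration, not anything Mailman ships; the expected number depends on which runners your configuration starts, so count them on a healthy system first. It assumes the procps version of ps found on Linux.

```shell
#!/bin/sh
# Sketch only: count_children and check_runners are hypothetical helper names.

# Print the number of direct child processes of the PID given as $1.
# --no-headers drops the ps header line so only the children are counted.
count_children() {
    ps --no-headers --ppid="$1" | wc -l | tr -d ' '
}

# Compare the actual count ($1) with the expected count ($2).
# Prints a one-line status and returns non-zero if runners are missing.
check_runners() {
    if [ "$1" -lt "$2" ]; then
        echo "WARNING: only $1 of $2 runners alive"
        return 1
    fi
    echo "OK: $1 runners alive"
    return 0
}
```

From cron you might then call something like
check_runners "$(count_children "$(cat /opt/mailman/mm/var/master.pid)")" N
with N replaced by your own expected count, and send yourself mail when it returns non-zero.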
Note that
    mailman restart
probably won't start a runner that died. You need to do
    mailman stop; mailman start
to be sure.
-- 
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan