So our mailman got stuck again, but I did follow your advice. Here's what I found:
On 3/29/19 2:58 PM, Mark Sapiro wrote:
On 3/29/19 2:28 PM, Dmitry Makovey wrote:
I've now encountered this situation twice: mail gets stuck in the "out" queue and nothing in the logs indicates why. Is there a way to politely ask mailman-core to attempt delivery (perhaps with increased debugging in the logs) *without* restarting the core (which does solve *this* problem but creates others, like the availability of Mailman as a whole)?
Is the "out" runner running?
yes:
# docker top mailman-core | grep -F out:
100 21681 13215 0 Mar28 ? 01:14:16 /usr/local/bin/python /usr/local/bin/runner --runner=out:0:1 -C /etc/mailman.cfg
ps -fwwA |grep runner=out
If not, you should be able to start it as the Mailman user with a command like
/path/to/python3 /path/to/mailman/bin/runner -C /path/to/mailman.cfg --runner=out:0:1
See 'ps -fwwA|grep runner' for examples
If the out runner is running, is there a file named *.bak along with all the *.pck files in queue/out/?
there is one, yes:
# ls -l /srv/mail/mailman/core/var/queue/out/*.bak
-rw-rw---- 1 100 65533 21512 Apr 14 19:45 /srv/mail/mailman/core/var/queue/out/1555296331.8386078+2c905825b82708e5a682b872b0aff320c9dd0b2e.bak
What does it indicate, and what steps should I take to resolve this (other than restarting the queues)?
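To catch this earlier next time, here is a minimal sketch that lists *.bak files in a queue directory that have not been touched for a while. It rests on an assumption I have not verified against this Mailman version: that a .bak file is an entry the runner has picked up but not finished, so one that lingers suggests the runner is stuck on that message. The function name and the 5-minute threshold are my own choices, not anything from Mailman.

```python
import time
from pathlib import Path


def find_stale_backups(queue_dir, min_age_seconds=300.0, now=None):
    """Return (path, age_in_seconds) for each *.bak file at least min_age_seconds old.

    Hypothetical helper, not part of Mailman. The threshold is an assumption:
    a .bak that is only a few seconds old may simply be in flight.
    """
    now = time.time() if now is None else now
    stale = []
    # sorted() keeps the output stable; queue file names start with a timestamp.
    for path in sorted(Path(queue_dir).glob("*.bak")):
        age = now - path.stat().st_mtime
        if age >= min_age_seconds:
            stale.append((path, age))
    return stale
```

Called as find_stale_backups("/srv/mail/mailman/core/var/queue/out"), it would have reported the single file from the listing above once it was more than five minutes old.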
And if it's not, is there anything in Mailman's logs to indicate why it died?
Also, 'mailman restart' should only take a matter of seconds, but if the out runner is running but not processing its queue, I think sending it a SIGINT should restart it.
SIGINT killed the process all right, but it didn't auto-respawn. I ended up spawning it with:
/usr/local/bin/python /usr/local/bin/runner --runner=out:0:1 -C /etc/mailman.cfg
but soon realized it doesn't go into the background nicely, so I nohup'ed it in the end. That did solve my problem.
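For the record, a rough sketch of the same manual start done from Python instead of nohup. It is not identical to nohup: rather than ignoring SIGHUP, it starts the runner in its own session so that closing the terminal does not take it down. The paths are the ones from my container; the function names and the log file location are made up for illustration. I would also expect a runner started this way to sit outside the master's supervision, so treat it as a stopgap.

```python
import subprocess


def runner_command(name="out:0:1",
                   python="/usr/local/bin/python",
                   runner="/usr/local/bin/runner",
                   config="/etc/mailman.cfg"):
    """Build the argv for a single runner (defaults taken from the command above)."""
    return [python, runner, f"--runner={name}", "-C", config]


def spawn_detached(argv, log_path):
    """Start argv in a new session, appending stdout and stderr to log_path.

    Hypothetical helper. start_new_session=True makes the child call setsid(),
    which detaches it from the controlling terminal (POSIX only).
    """
    with open(log_path, "ab") as log:
        # The child inherits its own copy of the log descriptor, so closing
        # ours when the with-block ends does not affect its output.
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
```

Usage would be something like spawn_detached(runner_command(), "/tmp/out-runner.log"), run as the Mailman user.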
Sr System and DevOps Engineer SoM IRT