Strange issue with bounce processing
Dear List,
First of all Merry Christmas to those of you that are observing it.
I have an issue on one of my Mailman 3 servers. The bounce runner is not running, even after restarting Mailman core. All the other runners are working fine. Here is the problem: after restarting it, a number of messages show up in queue/bounces from a single list. However, the bounce runner is still not running. So I remove these messages from the queue/bounces folder and restart Mailman core. The same messages from the same single list show up again in queue/bounces. In the var/logs/bounce.log file, addresses from this list are logged as being removed after Mailman core starts. They correlate with the messages showing up in queue/bounces.
I assume these "cached" bounces are what is interfering with the running of the bounce runner. I don't see anything in the mailman or mailmansuite log that would explain this behavior. How can I fix this?
Thank you for any assistance.
-- Brian Carpenter Harmonylists.com Emwd.com
On 12/25/20 6:36 PM, Brian Carpenter wrote:
I have an issue on one of my Mailman 3 servers. The bounce runner is not running, even after restarting Mailman core. All the other runners are working fine. Here is the problem: after restarting it, a number of messages show up in queue/bounces from a single list.
What happens if, instead of restarting core, you stop and then start it?
However, the bounce runner is still not running. So I remove these messages from the queue/bounces folder and restart Mailman core. The same messages from the same single list show up again in queue/bounces. In the var/logs/bounce.log file, addresses from this list are logged as being removed after Mailman core starts. They correlate with the messages showing up in queue/bounces.
The bounce runner doesn't actually do much with received/queued bounces. It just gets the bounce and stores a bounce event in the bounceevent table in the database. It then periodically processes the unprocessed events in the bounceevent table. See <https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/model/docs/bounce.html>.
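The two phases described above (record the event first, process it later) can be illustrated with a minimal in-memory sketch. This is not Mailman's code: `BounceEvent`, `BounceStore`, `register` and `process_pending` are hypothetical names invented for illustration, and the real bounceevent table and processing logic are more involved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BounceEvent:
    # Hypothetical record; the real bounceevent table has more columns.
    list_id: str
    email: str
    processed: bool = False


@dataclass
class BounceStore:
    # Stand-in for the database table that holds bounce events.
    events: List[BounceEvent] = field(default_factory=list)

    def register(self, list_id: str, email: str) -> BounceEvent:
        """Phase 1: record the bounce as an unprocessed event and return."""
        event = BounceEvent(list_id, email)
        self.events.append(event)
        return event

    def process_pending(self) -> int:
        """Phase 2: handle every event that has not been processed yet."""
        handled = 0
        for event in self.events:
            if not event.processed:
                # Scoring or disabling the member would happen here.
                event.processed = True
                handled += 1
        return handled


store = BounceStore()
store.register("list.example.com", "a@example.org")
store.register("list.example.com", "b@example.org")
first_pass = store.process_pending()   # handles both new events
second_pass = store.process_pending()  # nothing left to handle
print(first_pass, second_pass)
```

The point of the split is that a message in queue/bounces only needs to be turned into a stored event; acting on that event happens separately, on the periodic pass.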
I assume these "cached" bounces are what is interfering with the running of the bounce runner. I don't see anything in the mailman or mailmansuite log that would explain this behavior. How can I fix this?
I don't really understand what's going on. The only messages which get queued in queue/bounces/ are messages to the LIST-bounces address delivered from the MTA, unless possibly they are somehow in queue/retry/ and being retried. What do you see in the MTA logs?
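One way to answer the MTA log question is to count deliveries addressed to each list's -bounces address. The sketch below assumes log lines that contain a recipient in the form `to=<address>`; that format, the function name `count_bounce_deliveries`, and the sample lines are all assumptions made for illustration, so adjust the pattern to whatever your MTA actually writes.

```python
import re
from collections import Counter

# Matches LIST-bounces@domain and the VERP form LIST-bounces+user=host@domain.
# The "to=<...>" wrapper is an assumed log format, not a guaranteed one.
BOUNCE_RCPT = re.compile(
    r"to=<([^@>+\s]+)-bounces(?:\+[^@>\s]*)?@([^>\s]+)>"
)


def count_bounce_deliveries(lines):
    """Count log lines delivered to each list's -bounces address."""
    counts = Counter()
    for line in lines:
        match = BOUNCE_RCPT.search(line)
        if match:
            counts[f"{match.group(1)}@{match.group(2)}"] += 1
    return counts


# Made-up, simplified sample lines (not real log output).
sample = [
    "Dec 25 18:01:02 mx mta[101]: A1: to=<announce-bounces+jane=example.org@lists.example.com>, status=sent",
    "Dec 25 18:01:05 mx mta[102]: A2: to=<announce-bounces@lists.example.com>, status=sent",
    "Dec 25 18:01:09 mx mta[103]: A3: to=<announce@lists.example.com>, status=sent",
]
counts = count_bounce_deliveries(sample)
print(dict(counts))
```

If the counts for the affected list do not line up with the files appearing in queue/bounces/ after a restart, the messages are probably not arriving from the MTA at that moment.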
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California. "The highway is for gamblers, better use your sense" - B. Dylan
On 12/25/20 10:07 PM, Mark Sapiro wrote:
On 12/25/20 6:36 PM, Brian Carpenter wrote:
I have an issue on one of my Mailman 3 servers. The bounce runner is not running, even after restarting Mailman core. All the other runners are working fine. Here is the problem: after restarting it, a number of messages show up in queue/bounces from a single list.
What happens if, instead of restarting core, you stop and then start it?
Same behavior.
I don't really understand what's going on. The only messages which get queued in queue/bounces/ are messages to the LIST-bounces address delivered from the MTA, unless possibly they are somehow in queue/retry/ and being retried. What do you see in the MTA logs?
The mail logs show bounce notifications being sent out as per the list settings.
--
Brian Carpenter Harmonylists.com Emwd.com
I have a somewhat related question.
I monitor the Mailman runner processes via cron and restart based on a drop in the number of processes. As a result I've noticed that the bounces runner process becomes defunct at varying times. For example, the cron job restarted the Mailman processes early this morning and I can see the bounce runner process is currently defunct (Z state). It appears to have become defunct after about 3 hours of CPU time.
From mailman.log: Sep 12 02:17:50 2023 (23350) bounces runner started.
From ps command: mailman 23350 0.6 0.0 0 0 ? Z 02:17 3:14 [python3] <defunct>
I don't see anything else in the logs that would indicate why the process is dying. Any ideas?
Thanks in advance.
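For a cron check like the one described above, testing the process state directly may be more precise than counting processes. Here is a minimal sketch, assuming `ps aux`-style columns (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND) as in the line quoted above; `find_defunct` is a hypothetical helper name, not part of Mailman.

```python
def find_defunct(ps_lines, user="mailman"):
    """Return PIDs owned by `user` whose STAT column starts with Z (zombie)."""
    pids = []
    for line in ps_lines:
        # Split into the 10 fixed columns plus the free-form COMMAND column.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        owner, pid, stat = fields[0], fields[1], fields[7]
        if owner == user and stat.startswith("Z") and pid.isdigit():
            pids.append(int(pid))
    return pids


# The ps line quoted in the message above.
sample = ["mailman 23350 0.6 0.0 0 0 ? Z 02:17 3:14 [python3] <defunct>"]
defunct = find_defunct(sample)
print(defunct)
```

In practice the input could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout.splitlines()`, with a restart triggered only when the returned list is non-empty.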
On 9/12/23 10:16 AM, Dan Caballero wrote:
I have a somewhat related question.
I monitor the Mailman runner processes via cron and restart based on a drop in the number of processes. As a result I've noticed that the bounces runner process becomes defunct at varying times. For example, the cron job restarted the Mailman processes early this morning and I can see the bounce runner process is currently defunct (Z state). It appears to have become defunct after about 3 hours of CPU time.
There are some relevant issues on this, some of which are fixed. See https://gitlab.com/mailman/mailman/-/issues/?state=all&search=bounce%20runner&first_page_size=100
When the runner dies, the master watcher should restart it. See https://gitlab.com/mailman/mailman/-/issues/898, fixed (maybe) by https://gitlab.com/mailman/mailman/-/merge_requests/1094 which will be in Mailman 3.3.9. However, this doesn't address why it dies in the first place.
You might find some info about the death in syslog.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California. "The highway is for gamblers, better use your sense" - B. Dylan
Thanks for the quick reply. Is there an ETA for the Mailman 3.3.9 release?
We run Mailman in a container so I'll see about enabling some more logging.
On 9/12/23 2:13 PM, Dan Caballero wrote:
Thanks for the quick reply. Is there an ETA for the Mailman 3.3.9 release?
Abhilash has been working on a release, so it should be soon.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California. "The highway is for gamblers, better use your sense" - B. Dylan
participants (3): Brian Carpenter, Dan Caballero, Mark Sapiro