Held messages not delivered after approval

Hi all,
I'm running into an issue with Mailman 3. Mails sent to a list are being held due to moderation settings — which is expected — but after approving them, they are not delivered to list members.
The message does not end up in the shunt queue, and nothing is delivered.
Here's what I see in the Mailman log:
Jul 30 16:12:23 2025 (213300) HOLD: mm@lists.example.com post from sender@sub.example.com held, message-id=<20250730141221.5VVrn%sender@sub.example.com>: The message is not from a list member
Jul 30 16:12:39 2025 (213338) held message approved, message-id: <20250730141221.5VVrn%sender@sub.example.com>
In the mail log, I only see the message arriving — there's no sign of it being sent out.
Any ideas on what could be going wrong or where else I should look?
Mailman Core version: GNU Mailman 3.3.10 (Tom Sawyer)
Mailman Core API version: 3.1
Mailman Core Python version: 3.11.11 (main, Jun 20 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-5)]
OS: Rocky 9 with Postfix MTA, installed as venv
Thanks in advance, Stephan

On Wed, Jul 30, 2025 at 5:28 PM Stephan Krinetzki <krinetzki@itc.rwth-aachen.de> wrote:
Hi all,
I'm running into an issue with Mailman 3. Mails sent to a list are being held due to moderation settings — which is expected — but after approving them, they are not delivered to list members.
The message does not end up in the shunt queue, and nothing is delivered.
Here's what I see in the Mailman log:
Jul 30 16:12:23 2025 (213300) HOLD: mm@lists.example.com post from sender@sub.example.com held, message-id=<20250730141221.5VVrn%sender@sub.example.com>: The message is not from a list member
Jul 30 16:12:39 2025 (213338) held message approved, message-id: <20250730141221.5VVrn%sender@sub.example.com>
In the mail log, I only see the message arriving — there's no sign of it being sent out.
This here does not make sense:
- You approve/UNHOLD the message - what then do you see in /opt/mailman/mm/var/logs/smtp.log?
- The mail is handed over to your MTA, and you see evidence of that in mail.log, but it is not sent out. Doesn't that mean you have the mails in the queue?
Any ideas on what could be going wrong or where else I should look?
What is the output from running the mailq command?
If the post has not reached the subscribers, then I suppose the emails are stuck in the Postfix queue for some reason.
Mailman Core version: GNU Mailman 3.3.10 (Tom Sawyer)
Mailman Core API version: 3.1
Mailman Core Python version: 3.11.11 (main, Jun 20 2025, 00:00:00) [GCC 11.5.0 20240719 (Red Hat 11.5.0-5)]
OS: Rocky 9 with Postfix MTA, installed as venv
Do you know if the list has been working, or is this a new install?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 In an Internet failure case, the #1 suspect is a constant: DNS. "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-) [How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]

On 7/30/25 07:27, Stephan Krinetzki wrote:
Hi all,
I'm running into an issue with Mailman 3. Mails sent to a list are being held due to moderation settings — which is expected — but after approving them, they are not delivered to list members.
Are there any regular members with delivery enabled?
mailman members --regular --nomail enabled LISTSPEC
The message does not end up in the shunt queue, and nothing is delivered.
Here's what I see in the Mailman log:
Jul 30 16:12:23 2025 (213300) HOLD: mm@lists.example.com post from sender@sub.example.com held, message-id=<20250730141221.5VVrn%sender@sub.example.com>: The message is not from a list member
Jul 30 16:12:39 2025 (213338) held message approved, message-id: <20250730141221.5VVrn%sender@sub.example.com>
This is expected. Is there anything relevant in Mailman's smtp.log?
Do any posts to this list get delivered, e.g., posts which aren't held?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Hi,
After accepting the message, the smtp.log doesn't show any sign of further processing. Other mailing lists on the same host are doing fine. Even a 1:1 copy of the list works - only the original list does not work. Even after deleting and recreating the list, the error persists. The mail log of the host only shows the incoming mail, but not the outgoing. The mail seems to vanish after accepting it.
mailman members --regular --nomail enabled LISTSPEC
Shows a lot of addresses - so none of the list's members receive the mail, which explains the vanished mail. Next step: enable delivery for them.

Well... after checking the API, there are no users with nomail. So the mail should be sent.

Stephan Krinetzki writes:
Well... after checking the API, there are no users with nomail. So the mail should be sent.
- You mentioned checking the shunt queue. Have you checked the bad queue, and all the other queues for good measure?
- If archiving is enabled, are the messages being archived?
- Even though there's nothing in smtp.log, you should still check mailq for held mail (there may be related issues).
Mailman operates on the traditional "store and forward" principle. That is, unless an explicit decision is made to discard a message, Mailman will not discard the qfile until all destinations (usually the local MTA, archiver, and possibly news gateway) have accepted responsibility for further delivery. There's a pretty good chance that evidence of what happened is in either the Mailman or MTA queues.
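Along those lines, here is a minimal sketch for surveying all of Mailman's queue directories at once, assuming the /opt/mailman/var layout of this venv install (the path appears later in this thread); anything other than transient .pck files, e.g. old .bak or .tmp entries, is worth a closer look:

    # Count what is sitting in each Mailman queue directory.
    # QUEUES is an assumption based on this install's var directory.
    import os

    QUEUES = '/opt/mailman/var/queue'

    for qname in sorted(os.listdir(QUEUES)):
        qdir = os.path.join(QUEUES, qname)
        if os.path.isdir(qdir):
            entries = sorted(os.listdir(qdir))
            print(f'{qname}: {len(entries)} file(s)', entries[:3])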
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

My queues:
ls -al /opt/mailman/var/queue/*

/opt/mailman/var/queue/archive:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 13:20 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/bad:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 10:18 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/bounces:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 13:20 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/command:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 13:17 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/digest:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 13:15 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/in:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 13:20 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/nntp:
total 0
drwxrwx--- 2 mailman mailman 6 Jun 27 2024 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/out:
total 2868
drwxrwx--- 2 mailman mailman 4096 Jul 31 13:26 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 221708 Jul 30 13:16 1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp
-rw-rw---- 1 mailman mailman 733425 Jul 31 00:00 1753912841.8847518+da55f8789ae41b75d19a02b595e3fd6d45983ade.pck.tmp
-rw-rw---- 1 mailman mailman 17758 Jul 31 13:26 1753961185.0481033+6adf966467266567275f1146f5054c95e4365c13.pck
-rw-rw---- 1 mailman mailman 31122 Jul 31 13:26 1753961185.1064982+b1f502e2af56b9b11680135c6de5fcc5285d967e.bak
-rw-rw---- 1 mailman mailman 12945 Jul 31 13:26 1753961185.1562705+a88534cfa9b03012c248ac00cbb54f4fc08c8dcc.pck
-rw-rw---- 1 mailman mailman 1407634 Jul 31 13:26 1753961185.2598634+96e501d7aa8e72fbb9f9a5c953d48fc2c7db2bc7.pck
-rw-rw---- 1 mailman mailman 17570 Jul 31 13:26 1753961185.3654013+d693065f2b9a2338f887d3ac9c0669c957131100.pck
-rw-rw---- 1 mailman mailman 60624 Jul 31 13:26 1753961185.4132354+8998ee9eb9a8a99faf1fcf045d604e669f5d0c9e.pck
-rw-rw---- 1 mailman mailman 7361 Jul 31 13:26 1753961185.413462+50dcef9bb178f54ecb2d4707ccfe02773f92336c.pck
-rw-rw---- 1 mailman mailman 84848 Jul 31 13:26 1753961185.4464717+17836b8875aa6643f7e5fb0744e72940ce5603c6.pck
-rw-rw---- 1 mailman mailman 18003 Jul 31 13:26 1753961185.4929028+17e9c78a2721cbc344124c6e04a129217ea6c794.pck
-rw-rw---- 1 mailman mailman 66982 Jul 31 13:26 1753961185.5228019+4ef59e61c6a79dd4cdee6d1110fb0582fcfafb6c.pck
-rw-rw---- 1 mailman mailman 221723 Jul 31 13:26 1753961185.5756643+6de90ae99b3d528ce5234b78a287e19f7b6686fc.pck

/opt/mailman/var/queue/pipeline:
total 0
drwxrwx--- 2 mailman mailman 6 Jul 31 13:20 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/retry:
total 0
drwxrwx--- 2 mailman mailman 6 Feb 18 08:57 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/shunt:
total 3304
drwxrwx--- 2 mailman mailman 4096 Jul 31 11:44 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 451 Jul 31 00:00 1753912822.2651796+28eceef7e18eb70393377b88dc7117af8f9362a0.pck
-rw-rw---- 1 mailman mailman 490 Jul 31 00:00 1753912838.4197352+ea531cf0262c1faa58b1679b907fee92bc16822c.pck
-rw-rw---- 1 mailman mailman 1407870 Jul 31 00:00 1753912841.9177196+ccea15bdefce3a54301281c8eddf86e8230244a6.pck
-rw-rw---- 1 mailman mailman 86108 Jul 31 00:00 1753912841.9197443+7dcef4febc71e44c6d9309a24a08b08753e1ff42.pck
-rw-rw---- 1 mailman mailman 1407668 Jul 31 00:00 1753912841.9849963+a3f1869b750060c97262ece38737480d91652828.pck
-rw-rw---- 1 mailman mailman 38992 Jul 31 00:01 1753912860.5167956+940584c4f361cbd8c29e390b2f60590558effe40.pck
-rw-rw---- 1 mailman mailman 440 Jul 31 00:01 1753912860.6972685+635c065bac8dff5f9d562275d707001d773b84c1.pck
-rw-rw---- 1 mailman mailman 445 Jul 31 00:01 1753912868.7903054+befa066f254d7a3529a8555a6c942a554715d837.pck
-rw-rw---- 1 mailman mailman 33494 Jul 31 00:01 1753912878.7562895+82b724fd93260ab9a2bb49709d3a42a2f32f2c80.pck
-rw-rw---- 1 mailman mailman 217073 Jul 31 00:01 1753912878.9337828+de73a65d9c6febfa80275853921f4b53fd1d9e2a.pck
-rw-rw---- 1 mailman mailman 85888 Jul 31 00:02 1753912950.303359+8a87bce0be63ac1df8493c6b1ad6ae154fcedba7.pck
-rw-rw---- 1 mailman mailman 50244 Jul 31 00:02 1753912950.4970112+d44d493912bb3024547b8a5112f86f035dcb352f.pck
-rw-rw---- 1 mailman mailman 12887 Jul 31 00:02 1753912970.038427+31bcbd7fb2ebdf81f6de24b7283b50bcda6ded21.pck.tmp
-rw-rw---- 1 mailman mailman 443 Jul 31 11:44 1753955094.900898+67bc76525412da66a7c76363f65f583989716305.pck

/opt/mailman/var/queue/virgin:
total 32
drwxrwx--- 2 mailman mailman 81 Jul 31 13:17 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 32013 Jan 11 2025 1736550035.6163204+472f81ece5e45a2651a4499bef418f611b43c619.pck.tmp
Nothing special there (shunt should be checked, but not in correlation with my mail).
I have one list which archives the mail. In the archive there is the mail, but it doesn't get delivered.
mailq is empty, so my postfix works as expected.
Any other ideas?

On Thu, Jul 31, 2025 at 2:30 PM Stephan Krinetzki <krinetzki@itc.rwth-aachen.de> wrote:
I have one list which archives the mail. In the archive there is the mail, but it doesn't get delivered.
How many subscribers do you have on this list? And you said their "Delivery Status" is set to "Enabled"??
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 In an Internet failure case, the #1 suspect is a constant: DNS. "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-) [How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]

On the current list there were about 43,000 subscribers.
And the "Delivery Status" of the members is enabled. Via REST API:
{"address": "http://localhost:8001/3.1/addresses/user@example.com", "bounce_score": 0, "last_warning_sent": "0001-01-01T00:00:00", "total_warnings_sent": 0, "delivery_mode": "regular", "email": "user@example.com", "list_id": "users.lists.example.com", "subscription_mode": "as_address", "role": "member", "user": "http://localhost:8001/3.1/users/c108f28ca58c450ea2c2a91e56b79a3a", "display_name": "", "self_link": "http://localhost:8001/3.1/members/4529b34a6977481b8f9f33efe6e6aa35", "member_id": "4529b34a6977481b8f9f33efe6e6aa35", "http_etag": "\"d36844a7a58a87880efa4f84c90ae3c90df38e29\""}
Another list with the same problem has 63 subscribers.

On 7/31/25 04:29, Stephan Krinetzki wrote:
/opt/mailman/var/queue/out:
total 2868
drwxrwx--- 2 mailman mailman 4096 Jul 31 13:26 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 221708 Jul 30 13:16 1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp
-rw-rw---- 1 mailman mailman 733425 Jul 31 00:00 1753912841.8847518+da55f8789ae41b75d19a02b595e3fd6d45983ade.pck.tmp
These indicate a problem of some sort. .pck.tmp files should only exist for a fraction of a second. See https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/core/switchboar...
What does `mailman qfile` report for these? Are they the missing messages?
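`mailman qfile` essentially just unpickles the file: a queue entry is the pickled message followed by the pickled metadata dict. A standalone sketch of the same idea, to be run inside the venv so Mailman's message classes can be unpickled (the path is one of the files above):

    import pickle

    def dump_qfile(path):
        # A queue file holds one or more pickled objects back to back.
        objects = []
        with open(path, 'rb') as fp:
            while True:
                try:
                    objects.append(pickle.load(fp))
                except EOFError:
                    break
        for i, obj in enumerate(objects, 1):
            print(f'<----- object {i} ----->')
            print(obj)

    dump_qfile('/opt/mailman/var/queue/out/'
               '1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp')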
-rw-rw---- 1 mailman mailman 17758 Jul 31 13:26 1753961185.0481033+6adf966467266567275f1146f5054c95e4365c13.pck
-rw-rw---- 1 mailman mailman 31122 Jul 31 13:26 1753961185.1064982+b1f502e2af56b9b11680135c6de5fcc5285d967e.bak
-rw-rw---- 1 mailman mailman 12945 Jul 31 13:26 1753961185.1562705+a88534cfa9b03012c248ac00cbb54f4fc08c8dcc.pck
-rw-rw---- 1 mailman mailman 1407634 Jul 31 13:26 1753961185.2598634+96e501d7aa8e72fbb9f9a5c953d48fc2c7db2bc7.pck
-rw-rw---- 1 mailman mailman 17570 Jul 31 13:26 1753961185.3654013+d693065f2b9a2338f887d3ac9c0669c957131100.pck
-rw-rw---- 1 mailman mailman 60624 Jul 31 13:26 1753961185.4132354+8998ee9eb9a8a99faf1fcf045d604e669f5d0c9e.pck
-rw-rw---- 1 mailman mailman 7361 Jul 31 13:26 1753961185.413462+50dcef9bb178f54ecb2d4707ccfe02773f92336c.pck
-rw-rw---- 1 mailman mailman 84848 Jul 31 13:26 1753961185.4464717+17836b8875aa6643f7e5fb0744e72940ce5603c6.pck
-rw-rw---- 1 mailman mailman 18003 Jul 31 13:26 1753961185.4929028+17e9c78a2721cbc344124c6e04a129217ea6c794.pck
-rw-rw---- 1 mailman mailman 66982 Jul 31 13:26 1753961185.5228019+4ef59e61c6a79dd4cdee6d1110fb0582fcfafb6c.pck
-rw-rw---- 1 mailman mailman 221723 Jul 31 13:26 1753961185.5756643+6de90ae99b3d528ce5234b78a287e19f7b6686fc.pck
Also, this is a lot of outgoing messages to be queued. Are you sending periodic digests at this time? If you normally see a lot of messages in the out queue, you might consider setting its instances to 2 or 4. E.g.,
[runner.out]
instances: 4
in mailman.cfg
/opt/mailman/var/queue/shunt:
-rw-rw---- 1 mailman mailman 12887 Jul 31 00:02 1753912970.038427+31bcbd7fb2ebdf81f6de24b7283b50bcda6ded21.pck.tmp

/opt/mailman/var/queue/virgin:
-rw-rw---- 1 mailman mailman 32013 Jan 11 2025 1736550035.6163204+472f81ece5e45a2651a4499bef418f611b43c619.pck.tmp
Nothing special there (shunt should be checked, but not in correlation with my mail).
Actually those .pck.tmp files indicate some issue.
I have one list which archives the mail. In the archive there is the mail, but it doesn't get delivered.
Which says the pipeline to_archive handler is adding the message to the archive queue and the archive runner is handling it, so presumably the pipeline to_outgoing handler is adding the message to the out queue, and the issue is in the outgoing runner.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On Thu, Jul 31, 2025 at 8:21 PM Mark Sapiro <mark@msapiro.net> wrote:
Which says the pipeline to_archive handler is adding the message to the archive queue and the archive runner is handling it, so presumably the pipeline to_outgoing handler is adding the message to the out queue, and the issue is in the outgoing runner.
In one response, he mentioned that "Other mailinglists on the same host are doing fine. Even a 1:1 copy of the list works - only the original list does not work". Does it mean that "the issue is in the outgoing runner" only for a single list?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 In an Internet failure case, the #1 suspect is a constant: DNS. "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-) [How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]

On 7/31/25 10:38, Odhiambo Washington via Mailman-users wrote:
In one response, he mentioned that "Other mailinglists on the same host are doing fine. Even a 1:1 copy of the list works - only the original list does not work". Does it mean that "the issue is in the outgoing runner" only for a single list?
So far we have not determined what the issue is. Until we do, we don't know in which module it is. It could be in outgoing runner involving something in the list's configuration or membership that isn't in the "1:1 copy" or it could be something to do with the message itself. It could even be an OS error such as "out of memory" due to the large number of list members.
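One cheap way to rule the OOM scenario in or out is to look for OOM-killer activity in the kernel journal around the delivery attempts. A minimal sketch (needs journal read permission; the --since window is an assumption to adjust to the incident time):

    import subprocess

    # Scan the kernel journal for OOM-killer traces.
    out = subprocess.run(
        ['journalctl', '-k', '--since', '2025-07-31', '--no-pager'],
        capture_output=True, text=True).stdout
    for line in out.splitlines():
        lowered = line.lower()
        if 'oom' in lowered or 'killed process' in lowered:
            print(line)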
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Mark Sapiro wrote:
On 7/31/25 04:29, Stephan Krinetzki wrote:
I have one list which archives the mail. In the archive there is the mail, but it doesn't get deliverd
Which says the pipeline to_archive handler is adding the message to the archive queue and the archive runner is handling it, so presumably the pipeline to_outgoing handler is adding the message to the out queue, and the issue is in the outgoing runner.
Actually, there's more involved in the pipeline between the to_archive and to_outgoing handlers. The to-digest, to-usenet, after-delivery, acknowledge and dmarc handlers are all invoked between to_archive and to_outgoing.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Mark Sapiro writes:
Actually, there's more involved in the pipeline between the to_archive and to_outgoing handlers. The to-digest, to-usenet, after-delivery, acknowledge and dmarc handlers are all invoked between to_archive and to_outgoing
IIRC all of the shunted messages that Stephan looked at with qfile were those special digest messages (i.e., message component empty, pointer to lists/$LIST/something.mmdf in the msg_data component). So something is going wrong in the to-digest handler.
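To confirm that pattern in bulk, a small sketch along the lines of the qfile one earlier in the thread, classifying everything in the shunt queue; the digest_path key is the one visible in the qfile dumps later in this thread, and the path is this install's:

    import os
    import pickle

    QDIR = '/opt/mailman/var/queue/shunt'

    # Flag digest placeholders (empty message, msg_data pointing at a .mmdf)
    # versus regular shunted messages.
    for name in sorted(os.listdir(QDIR)):
        if not name.endswith('.pck'):
            continue
        with open(os.path.join(QDIR, name), 'rb') as fp:
            msg = pickle.load(fp)    # message component (may be empty)
            data = pickle.load(fp)   # msg_data component
        kind = 'digest' if 'digest_path' in data else 'regular'
        print(f'{name}: {kind} (listid={data.get("listid")})')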
I don't understand why the original .pck would disappear without either leaving an error log or being delivered. (Hmm, @Stephan, have you looked at your "vette" log? It is a separate log in your mailman.cfg, I think.)
The number of .tmp files lying around bothers me. As far as grep can see, the only place they can be created is switchboard.py:136, in .enqueue:
    with open(tmpfile, 'wb') as fp:
        fp.write(msgsave)                # the already-pickled message
        pickle.dump(data, fp, protocol)  # the metadata dict
        fp.flush()
        os.fsync(fp.fileno())            # force it to disk
    os.rename(tmpfile, filename)         # atomically publish the .pck
where msgsave is already a pickled object. So either pickle.dump is choking on something in data (the metadata, which I believe is all primitive Python data types), or something (OOM kill?) is happening at the OS level. A crash in pickle.dump should leave an exception log and backtrace in the logs.
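If it were pickle.dump choking, a sketch like this, run over a problem message's metadata dict (e.g. inside mailman shell; the find_unpicklable name is hypothetical), would point at the offending key:

    import pickle

    def find_unpicklable(data):
        # Try each metadata value on its own; print whatever won't pickle.
        for key, value in data.items():
            try:
                pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
            except Exception as exc:
                print(f'{key!r}: {type(value).__name__} -> {exc}')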
AFAIK, Mailman does not clean up .tmp files at startup, right?
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

On 8/2/25 01:51, Stephen J. Turnbull wrote:
IIRC all of the shunted messages that Stephan looked at with qfile were those special digest messages (i.e., message component empty, pointer to lists/$LIST/something.mmdf in the msg_data component). So something is going wrong in the to-digest handler.
And for every one of those shunted messages there should be an exception with traceback logged in mailman.log. Those tracebacks should be helpful.
The number of .tmp files lying around bothers me. As far as grep can see, the only place they can be created is switchboard.py:136, in .enqueue:

    with open(tmpfile, 'wb') as fp:
        fp.write(msgsave)
        pickle.dump(data, fp, protocol)
        fp.flush()
        os.fsync(fp.fileno())
    os.rename(tmpfile, filename)

where msgsave is already a pickled object. So either pickle.dump is choking on something in data (the metadata, which I believe is all primitive Python data types), or something (OOM kill?) is happening at the OS level. A crash in pickle.dump should leave an exception log and backtrace in the logs. AFAIK, Mailman does not clean up .tmp files at startup, right?
That is correct. The *.pck.tmp file is created by the above code and immediately after writing is renamed to *.pck. It is done this way to prevent another process picking up a partially written *.pck.
If a *.pck.tmp file is somehow left behind, it is never looked at or deleted by any Mailman code.
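Since nothing in Mailman will ever reap them, leftovers are easy to inventory by hand. A minimal sketch (the one-hour threshold is arbitrary; the queue path is this install's):

    import os
    import time

    # List *.pck.tmp files older than an hour across all queue directories.
    cutoff = time.time() - 3600

    for root, _dirs, files in os.walk('/opt/mailman/var/queue'):
        for name in files:
            if name.endswith('.pck.tmp'):
                path = os.path.join(root, name)
                mtime = os.path.getmtime(path)
                if mtime < cutoff:
                    print(path, time.ctime(mtime))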
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Hi all,
after the Weekend, this is the status of the queues:
/opt/mailman/var/queue/archive:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 08:47 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/bad:
total 3016
drwxrwx--- 2 mailman mailman 4096 Aug 2 00:02 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 221723 Aug 1 17:26 1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv
-rw-rw---- 1 mailman mailman 32912 Aug 2 00:00 1754085602.191851+3576cf33232db110fa7761233f67245564553652.psv
-rw-rw---- 1 mailman mailman 416 Aug 2 00:00 1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv
-rw-rw---- 1 mailman mailman 1407649 Aug 2 00:00 1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv
-rw-rw---- 1 mailman mailman 1407634 Aug 2 00:02 1754085729.3529432+1643f907bac39a22a7d71e50b031c4f8a574082c.psv

/opt/mailman/var/queue/bounces:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 05:22 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/command:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 08:14 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/digest:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 08:21 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/in:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 08:47 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/nntp:
total 0
drwxrwx--- 2 mailman mailman 6 Jun 27 2024 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/out:
total 1772
drwxrwx--- 2 mailman mailman 4096 Aug 4 08:49 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 1407649 Aug 2 00:00 1754085626.995262+ebf03275f7441b1bc7bbaf063cb6238bec30ff9f.pck.tmp
-rw-rw---- 1 mailman mailman 50244 Aug 3 00:00 1754172045.4400518+1da9c6fd82ee0dbc0e893eeca713d498a7273150.pck.tmp
-rw-rw---- 1 mailman mailman 31122 Aug 4 08:49 1754290173.7476344+7237c062741059807b66024091571ce3399d8eda.pck
-rw-rw---- 1 mailman mailman 18091 Aug 4 08:49 1754290173.8104873+cb4c78dffcd7147a5deca51b2cfd92829c06af9d.pck
-rw-rw---- 1 mailman mailman 18243 Aug 4 08:49 1754290173.8333225+d77876f47de73141dd5f6f9ac8d33d937bf6a727.pck
-rw-rw---- 1 mailman mailman 17585 Aug 4 08:49 1754290173.8802657+cb8103f6db5193a75706a7c1bc0d4f90126e587a.pck
-rw-rw---- 1 mailman mailman 217073 Aug 4 08:49 1754290173.9709926+90b2bfd3adf5faa7d0ec336f07dbe2267e666c75.pck
-rw-rw---- 1 mailman mailman 33494 Aug 4 08:49 1754290174.019171+553fc3546ae6c5f8e1b0f27ffc60e154629b9f1d.pck

/opt/mailman/var/queue/pipeline:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 08:47 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/retry:
total 0
drwxrwx--- 2 mailman mailman 6 Feb 18 08:57 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..

/opt/mailman/var/queue/shunt:
total 9012
drwxrwx--- 2 mailman mailman 8192 Aug 4 00:00 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 490 Aug 1 10:26 1754036797.3635633+4a7750d6b8765f9f982dbfcbf9e972d8055bb4c5.pck
-rw-rw---- 1 mailman mailman 445 Aug 1 10:26 1754036797.618993+a6e6aefca6a76ed55c5bb4fae448f59c42a6f246.pck
-rw-rw---- 1 mailman mailman 443 Aug 1 10:26 1754036797.647361+67d36f287c76ea09b0524dcf03e080d68604d063.pck
-rw-rw---- 1 mailman mailman 443 Aug 1 10:26 1754036797.67375+0a78febb8e89d073607be70ccf90196b3e7fac17.pck
-rw-rw---- 1 mailman mailman 14233 Aug 1 10:50 1754038238.585611+3814bb1ce4232c97991a9cbcaed482966788e7c6.pck
-rw-rw---- 1 mailman mailman 14206 Aug 1 10:50 1754038251.6686325+2cacc5d5709cf5c9d8ce571947c8395eb0da37c9.pck
-rw-rw---- 1 mailman mailman 14466 Aug 1 10:51 1754038263.7415857+7f8260ca4bdf6f109be00206e07d03e717bfa6e1.pck
-rw-rw---- 1 mailman mailman 14297 Aug 1 10:51 1754038273.8673453+9ccb00ea1dc05fcbfb2bc0ef071e1f3083b0e73f.pck
-rw-rw---- 1 mailman mailman 443 Aug 1 13:30 1754047820.8963482+a122a01a47aa5c6dd8240d8ea7bc35fb3960a46d.pck
-rw-rw---- 1 mailman mailman 10870 Aug 1 15:21 1754054475.6638494+f8156ebc84effc7680b64ec03edc7d86b8f6eb65.pck
-rw-rw---- 1 mailman mailman 14782 Aug 1 16:02 1754056965.1972625+8273637db3056c2325a0903b7171725fb6f4d5e8.pck
-rw-rw---- 1 mailman mailman 221723 Aug 1 17:26 1754061973.4941757+4dc6368d88536bc195afdbce9432375166817413.pck
-rw-rw---- 1 mailman mailman 32912 Aug 2 00:00 1754085603.42874+d0b308c26d31e682d256e6cda3786797cad7b062.pck.tmp
-rw-rw---- 1 mailman mailman 17773 Aug 2 00:00 1754085627.0487497+cf822cd6608fcca545f5350124eae405df598f59.pck
-rw-rw---- 1 mailman mailman 478 Aug 2 00:00 1754085627.2411385+cbc654e510b8d618a2999274476b000542feff0e.pck
-rw-rw---- 1 mailman mailman 733213 Aug 2 00:00 1754085627.264005+5a16138d0e33252f2606ceb69577bb58d88bb2e5.pck.tmp
-rw-rw---- 1 mailman mailman 86108 Aug 2 00:00 1754085646.590158+624db84af49b742971e10004557f4289a2c5bccf.pck
-rw-rw---- 1 mailman mailman 446 Aug 2 00:00 1754085646.7170782+9ee0dedf32447f0e860dbe589fed9a272ceeadf8.pck.tmp
-rw-rw---- 1 mailman mailman 18750 Aug 2 00:01 1754085665.8180223+2d11df5f3b4b96ba710fc1bd24551cf67598be79.pck
-rw-rw---- 1 mailman mailman 21926 Aug 2 00:01 1754085665.823551+63198e22da918bc7193d93ec2e3cc01f8aaf5e44.pck
-rw-rw---- 1 mailman mailman 60639 Aug 2 00:01 1754085665.901899+4e47db603c553fa357350865de35092a4ebef5fe.pck
-rw-rw---- 1 mailman mailman 85888 Aug 2 00:01 1754085665.9268048+8ec4679a850410c2e574d4e63f0a4f3ab602af3b.pck
-rw-rw---- 1 mailman mailman 449 Aug 2 00:01 1754085666.0164568+fee1f138066b27b2752812f59e8154f47a534df7.pck
-rw-rw---- 1 mailman mailman 1407870 Aug 2 00:01 1754085696.4017332+94ab6c71d0e984764abd160f14775bb7f73b1222.pck
-rw-rw---- 1 mailman mailman 475 Aug 2 00:01 1754085696.6109+ced325862a46537b64b26257214a9190c3dde0f2.pck
-rw-rw---- 1 mailman mailman 311196 Aug 2 00:01 1754085715.0974422+7b9989730c03cff548e08961591887e17cc2324c.pck
-rw-rw---- 1 mailman mailman 437 Aug 2 00:01 1754085715.1049805+27e5b8943991e847faf240e5e29f8d0c08af4503.pck
-rw-rw---- 1 mailman mailman 29278 Aug 2 00:01 1754085715.1054037+b2fa0b025564ef9ba8d6917b8390cbe8fadf7b94.pck
-rw-rw---- 1 mailman mailman 20878 Aug 2 00:01 1754085715.1229646+805071d12ddfe493768a1e83f05db70aa92fb3bb.pck
-rw-rw---- 1 mailman mailman 7376 Aug 2 00:01 1754085715.2501519+16602d78716c922920e770b6d9e17609bdbc4adb.pck
-rw-rw---- 1 mailman mailman 1407649 Aug 2 00:02 1754085734.1192129+c43923f712a65cc4a111ad09a8f610df855a8692.pck.tmp
-rw-rw---- 1 mailman mailman 458 Aug 2 00:02 1754085734.1719563+31c0d3d15642b9c7c9f8fbfb94cc0ad8bfccd912.pck
-rw-rw---- 1 mailman mailman 12960 Aug 2 00:02 1754085734.21608+0265fcd448c9fb539c801f3a19ddb18c4956aa0e.pck
-rw-rw---- 1 mailman mailman 17254 Aug 2 15:26 1754141215.9892564+3b3dc4f6023516ad41b8a0430887f10701fccb3c.pck
-rw-rw---- 1 mailman mailman 38992 Aug 3 00:00 1754172001.3344436+8a1c1a9b3703f0421263c2b24fc27c9a9bb9116d.pck
-rw-rw---- 1 mailman mailman 1332842 Aug 3 00:00 1754172001.3540711+13acca94280dbcbb1a1cb8e730fbf87971b31aa9.pck
-rw-rw---- 1 mailman mailman 752482 Aug 3 00:00 1754172001.3596764+d2d772c50ff9812632bd6b054dfb9973758ed5c4.pck
-rw-rw---- 1 mailman mailman 18267 Aug 3 00:00 1754172026.4244561+b2e097258a1123092a199858a50f7d4cb4d5ca65.pck
-rw-rw---- 1 mailman mailman 17241 Aug 3 00:00 1754172026.4761071+9818446a413647ef57fc5f60d8fa51da43d85da8.pck
-rw-rw---- 1 mailman mailman 1407668 Aug 3 00:00 1754172026.4913034+c3055e1f383c0b3226169006e1cb79322e09846d.pck
-rw-rw---- 1 mailman mailman 50244 Aug 3 00:00 1754172045.543335+521b1a4d393691e85c86fd3a313efc6d90008b30.pck
-rw-rw---- 1 mailman mailman 32173 Aug 3 00:00 1754172045.5646617+ea2a4f2a232dfa9750e58c5b8cf9150c933e8ae7.pck
-rw-rw---- 1 mailman mailman 37411 Aug 3 00:01 1754172102.5104077+c9cabab5571fb2db62cbc688b6cc0e05f2fa2bfb.pck
-rw-rw---- 1 mailman mailman 25419 Aug 3 00:01 1754172102.5282032+20946f26d1adf7c481e504de4d589cdb1cc21edb.pck
-rw-rw---- 1 mailman mailman 276542 Aug 3 00:02 1754172120.9135556+ddbc3a3e9ce4afc78b95d03d73b0c8737f3f6606.pck
-rw-rw---- 1 mailman mailman 18018 Aug 3 00:02 1754172140.6559258+621dad8dc6a84e2ddc5413cedd7f8860b267c27c.pck
-rw-rw---- 1 mailman mailman 12887 Aug 4 00:00 1754258401.069818+189eeede43be7771f2932f2fb0954f1b09cb9fb9.pck
-rw-rw---- 1 mailman mailman 311196 Aug 4 00:00 1754258421.5529852+ca4e152d308b15023f5203309cc583929177cda2.pck
-rw-rw---- 1 mailman mailman 66982 Aug 4 00:00 1754258441.022148+91e14500fe3abe2bdcf69e56f5f200ca851e3a7e.pck
-rw-rw---- 1 mailman mailman 84863 Aug 4 00:00 1754258441.0448039+48b144d7abc1fe88b2fc6aef25398993c6bdc2e5.pck
-rw-rw---- 1 mailman mailman 20951 Aug 4 00:00 1754258459.2980123+485ba78453a56be2211e9a02277087ad5cf12b22.pck

/opt/mailman/var/queue/virgin:
total 0
drwxrwx--- 2 mailman mailman 6 Aug 4 08:47 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
So let's start with the bad queue:
mailman qfile /opt/mailman/var/queue/bad/1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv
Traceback (most recent call last):
  File "/opt/mailman/mailman-venv/bin/mailman", line 8, in <module>
    sys.exit(main())
    ^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
    ^^^^^^^^^^^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/mailman/bin/mailman.py", line 69, in invoke
    return super().invoke(ctx)
    ^^^^^^^^^^^^^^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/mailman/mailman-venv/lib64/python3.11/site-packages/mailman/commands/cli_qfile.py", line 63, in qfile
    m.append(pickle.load(fp))
    ^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x95 in position 25: invalid start byte
Seems to be a decoding error.
mailman qfile /opt/mailman/var/queue/bad/1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv
The first object is the e-mail with a lot of HTML (Outlook seems to be the client, so...).
The second:
{ '_parsemsg': False,
  'approved': True,
  'envsender': 'noreply@lists.example.com',
  'lang': 'de',
  'listid': 'kennziffern.lists.example.com',
  'member_moderation_action': 'hold',
  'moderation_reasons': ['The message comes from a moderated member'],
  'moderation_sender': '<Sender Address>',
  'moderator_approved': True,
  'original_sender': '<Sender Address>',
  'original_size': 28614,
  'original_subject': '=?iso-8859-1?Q?=C4nderungen_im_Orgaverzeichnis?=',
  'received_time': datetime.datetime(2025, 7, 31, 11, 16, 46, 475199),
  'recipients': {<some recipients>},
  'rule_hits': ['member-moderation'],
  'rule_misses': ['dmarc-mitigation', 'no-senders', 'approved', 'loop', 'banned-address', 'header-match-config-1', 'emergency'],
  'stripped_subject': 'Änderungen im Orgaverzeichnis',
  'to_list': True,
  'type': 'data',
  'verp': False,
  'version': 3,
  'whichq': 'out'}
I don't see a problem here. But the timestamp seems to be related to the restart of mailman. Can I skip this in the logrotate?
mailman qfile /opt/mailman/var/queue/bad/1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv
It's a digest:
[----- start pickle -----]
<----- start object 1 ----->
<----- start object 2 ----->
{ '_parsemsg': False,
  'digest_number': 7,
  'digest_path': '/opt/mailman/var/lists/doc-infos.lists.example.com/digest.134.7.mmdf',
  'listid': 'doc-infos.lists.example.com',
  'version': 3,
  'volume': 134}
[----- end pickle -----]
Btw: The crontab is the following:
#####0 */2 * * * apache /opt/mailman/mailman-venv/bin/django-admin runjobs minutely --pythonpath /opt/mailman/mailman-suite/mailman-suite_project --settings settings
#*/30 * * * * mailman /opt/mailman/mailman-venv/bin/django-admin runjobs minutely --pythonpath /etc/mailman3/ --settings settings
@hourly mailman /opt/mailman/mailman-venv/bin/django-admin runjobs hourly --pythonpath /etc/mailman3/ --settings settings
#####@daily apache /opt/mailman/mailman-venv/bin/django-admin runjobs daily --pythonpath /etc/mailman3/ --settings settings
@monthly mailman /opt/mailman/mailman-venv/bin/django-admin runjobs monthly --pythonpath /etc/mailman3/ --settings settings
@yearly mailman /opt/mailman/mailman-venv/bin/django-admin runjobs yearly --pythonpath /etc/mailman3/ --settings settings
@daily mailman cd /opt/mailman; source /opt/mailman/mailman-venv/bin/activate; /opt/mailman/mailman-venv/bin/mailman digests --send > /dev/null 2>&1
mailman qfile /opt/mailman/var/queue/bad/1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv
That's the mail which should be sent to ~43,000 members.
Header says:
Received: from <ext. Server> (<IP>) by lists.example.com (Postfix) with ESMTPS id F16D88016D1C for <mm@lists.example.com>; Thu, 31 Jul 2025 09:58:38 +0200 (CEST)
The date of the file is
-rw-rw---- 1 mailman mailman 1407649 Aug 2 00:00 1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv
So I checked the mailman.log:
[2025-08-01 00:00:02 +0200] [324558] [INFO] Handling signal: term
[2025-08-01 00:00:02 +0200] [324568] [INFO] Worker exiting (pid: 324568)
[2025-08-01 00:00:02 +0200] [324571] [INFO] Worker exiting (pid: 324571)
[2025-08-01 00:00:02 +0200] [324572] [INFO] Worker exiting (pid: 324572)
[2025-08-01 00:00:02 +0200] [324574] [INFO] Worker exiting (pid: 324574)
[2025-08-01 00:00:02 +0200] [324558] [ERROR] Worker (pid:324571) was sent SIGTERM!
[2025-08-01 00:00:02 +0200] [324558] [ERROR] Worker (pid:324572) was sent SIGTERM!
[2025-08-01 00:00:02 +0200] [324558] [ERROR] Worker (pid:324568) was sent SIGTERM!
[2025-08-01 00:00:02 +0200] [324558] [ERROR] Worker (pid:324574) was sent SIGTERM!
[2025-08-01 00:00:02 +0200] [324558] [INFO] Shutting down: Master
Aug 01 00:00:11 2025 (567061) Task runner evicted 0 expired pendings
[2025-08-01 00:00:12 +0200] [567059] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:00:12 +0200] [567059] [INFO] Listening at: http://127.0.0.1:8001 (567059)
[2025-08-01 00:00:12 +0200] [567059] [INFO] Using worker: sync
[2025-08-01 00:00:12 +0200] [567069] [INFO] Booting worker with pid: 567069
[2025-08-01 00:00:12 +0200] [567070] [INFO] Booting worker with pid: 567070
[2025-08-01 00:00:12 +0200] [567071] [INFO] Booting worker with pid: 567071
[2025-08-01 00:00:12 +0200] [567073] [INFO] Booting worker with pid: 567073
Aug 01 00:00:13 2025 (567061) Task runner deleted 0 orphaned workflows
<Omitted GET Request>
Aug 01 00:00:21 2025 (567061) Task runner deleted 0 orphaned requests
[2025-08-01 00:00:23 +0200] [567059] [INFO] Handling signal: term
[2025-08-01 00:00:23 +0200] [567073] [INFO] Worker exiting (pid: 567073)
[2025-08-01 00:00:23 +0200] [567069] [INFO] Worker exiting (pid: 567069)
[2025-08-01 00:00:23 +0200] [567070] [INFO] Worker exiting (pid: 567070)
[2025-08-01 00:00:23 +0200] [567071] [INFO] Worker exiting (pid: 567071)
[2025-08-01 00:00:23 +0200] [567059] [ERROR] Worker (pid:567073) was sent SIGTERM!
[2025-08-01 00:00:23 +0200] [567059] [ERROR] Worker (pid:567070) was sent SIGTERM!
[2025-08-01 00:00:23 +0200] [567059] [ERROR] Worker (pid:567071) was sent SIGTERM!
[2025-08-01 00:00:23 +0200] [567059] [ERROR] Worker (pid:567069) was sent SIGTERM!
[2025-08-01 00:00:23 +0200] [567059] [INFO] Shutting down: Master
[2025-08-01 00:00:35 +0200] [567206] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:00:35 +0200] [567206] [INFO] Listening at: http://127.0.0.1:8001 (567206)
[2025-08-01 00:00:35 +0200] [567206] [INFO] Using worker: sync
[2025-08-01 00:00:35 +0200] [567246] [INFO] Booting worker with pid: 567246
[2025-08-01 00:00:35 +0200] [567250] [INFO] Booting worker with pid: 567250
[2025-08-01 00:00:35 +0200] [567252] [INFO] Booting worker with pid: 567252
[2025-08-01 00:00:35 +0200] [567253] [INFO] Booting worker with pid: 567253
<Omitted GET Requests>
Aug 01 00:00:36 2025 (567208) Task runner evicted 0 expired pendings
<Omitted GET Requests>
Aug 01 00:00:38 2025 (567208) Task runner deleted 0 orphaned workflows
[2025-08-01 00:00:42 +0200] [567206] [INFO] Handling signal: term
[2025-08-01 00:00:42 +0200] [567246] [INFO] Worker exiting (pid: 567246)
[2025-08-01 00:00:42 +0200] [567253] [INFO] Worker exiting (pid: 567253)
[2025-08-01 00:00:42 +0200] [567250] [INFO] Worker exiting (pid: 567250)
[2025-08-01 00:00:42 +0200] [567252] [INFO] Worker exiting (pid: 567252)
[2025-08-01 00:00:42 +0200] [567206] [ERROR] Worker (pid:567250) was sent SIGTERM!
[2025-08-01 00:00:42 +0200] [567206] [ERROR] Worker (pid:567252) was sent SIGTERM!
[2025-08-01 00:00:42 +0200] [567206] [ERROR] Worker (pid:567246) was sent SIGTERM!
[2025-08-01 00:00:42 +0200] [567206] [ERROR] Worker (pid:567253) was sent SIGTERM!
[2025-08-01 00:00:42 +0200] [567206] [INFO] Shutting down: Master
Aug 01 00:00:54 2025 (567280) Task runner evicted 2 expired pendings
Aug 01 00:00:56 2025 (567280) Task runner deleted 0 orphaned workflows
[2025-08-01 00:00:58 +0200] [567278] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:00:58 +0200] [567278] [INFO] Listening at: http://127.0.0.1:8001 (567278)
[2025-08-01 00:00:58 +0200] [567278] [INFO] Using worker: sync
[2025-08-01 00:00:58 +0200] [567327] [INFO] Booting worker with pid: 567327
[2025-08-01 00:00:58 +0200] [567328] [INFO] Booting worker with pid: 567328
[2025-08-01 00:00:58 +0200] [567329] [INFO] Booting worker with pid: 567329
[2025-08-01 00:00:58 +0200] [567330] [INFO] Booting worker with pid: 567330
<Omitted GET Requests>
[2025-08-01 00:01:00 +0200] [567278] [INFO] Handling signal: term
[2025-08-01 00:01:00 +0200] [567327] [INFO] Worker exiting (pid: 567327)
[2025-08-01 00:01:00 +0200] [567328] [INFO] Worker exiting (pid: 567328)
[2025-08-01 00:01:00 +0200] [567329] [INFO] Worker exiting (pid: 567329)
[2025-08-01 00:01:00 +0200] [567330] [INFO] Worker exiting (pid: 567330)
[2025-08-01 00:01:01 +0200] [567278] [ERROR] Worker (pid:567328) was sent SIGTERM!
[2025-08-01 00:01:01 +0200] [567278] [ERROR] Worker (pid:567329) was sent SIGTERM!
[2025-08-01 00:01:01 +0200] [567278] [ERROR] Worker (pid:567330) was sent SIGTERM!
[2025-08-01 00:01:01 +0200] [567278] [ERROR] Worker (pid:567327) was sent SIGTERM!
[2025-08-01 00:01:01 +0200] [567278] [INFO] Shutting down: Master
Aug 01 00:01:12 2025 (567381) Task runner evicted 2 expired pendings
[2025-08-01 00:01:13 +0200] [567379] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:01:13 +0200] [567379] [INFO] Listening at: http://127.0.0.1:8001 (567379)
[2025-08-01 00:01:13 +0200] [567379] [INFO] Using worker: sync
[2025-08-01 00:01:13 +0200] [567397] [INFO] Booting worker with pid: 567397
[2025-08-01 00:01:13 +0200] [567398] [INFO] Booting worker with pid: 567398
[2025-08-01 00:01:13 +0200] [567399] [INFO] Booting worker with pid: 567399
[2025-08-01 00:01:13 +0200] [567400] [INFO] Booting worker with pid: 567400
<Omitted GET Request>
Aug 01 00:01:13 2025 (567381) Task runner deleted 0 orphaned workflows
<Omitted GET Requests>
Aug 01 00:01:19 2025 (567381) Task runner deleted 0 orphaned requests
<Omitted GET Requests>
[2025-08-01 00:01:34 +0200] [567379] [INFO] Handling signal: term
[2025-08-01 00:01:34 +0200] [567397] [INFO] Worker exiting (pid: 567397)
[2025-08-01 00:01:34 +0200] [567399] [INFO] Worker exiting (pid: 567399)
[2025-08-01 00:01:34 +0200] [567398] [INFO] Worker exiting (pid: 567398)
[2025-08-01 00:01:34 +0200] [567400] [INFO] Worker exiting (pid: 567400)
[2025-08-01 00:01:34 +0200] [567379] [ERROR] Worker (pid:567399) was sent SIGTERM!
[2025-08-01 00:01:34 +0200] [567379] [ERROR] Worker (pid:567398) was sent SIGTERM!
[2025-08-01 00:01:34 +0200] [567379] [ERROR] Worker (pid:567397) was sent SIGTERM!
[2025-08-01 00:01:34 +0200] [567379] [ERROR] Worker (pid:567400) was sent SIGTERM!
[2025-08-01 00:01:34 +0200] [567379] [INFO] Shutting down: Master
[2025-08-01 00:01:46 +0200] [567516] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:01:46 +0200] [567516] [INFO] Listening at: http://127.0.0.1:8001 (567516)
[2025-08-01 00:01:46 +0200] [567516] [INFO] Using worker: sync
[2025-08-01 00:01:46 +0200] [567525] [INFO] Booting worker with pid: 567525
[2025-08-01 00:01:46 +0200] [567526] [INFO] Booting worker with pid: 567526
[2025-08-01 00:01:46 +0200] [567527] [INFO] Booting worker with pid: 567527
[2025-08-01 00:01:46 +0200] [567528] [INFO] Booting worker with pid: 567528
Aug 01 00:01:47 2025 (567518) Task runner evicted 2 expired pendings
Aug 01 00:01:48 2025 (567518) Task runner deleted 0 orphaned workflows
<Omitted GET Request>
[2025-08-01 00:01:52 +0200] [567516] [INFO] Handling signal: term
[2025-08-01 00:01:52 +0200] [567526] [INFO] Worker exiting (pid: 567526)
[2025-08-01 00:01:52 +0200] [567525] [INFO] Worker exiting (pid: 567525)
[2025-08-01 00:01:52 +0200] [567527] [INFO] Worker exiting (pid: 567527)
[2025-08-01 00:01:52 +0200] [567528] [INFO] Worker exiting (pid: 567528)
[2025-08-01 00:01:52 +0200] [567516] [ERROR] Worker (pid:567526) was sent SIGTERM!
[2025-08-01 00:01:52 +0200] [567516] [ERROR] Worker (pid:567525) was sent SIGTERM!
[2025-08-01 00:01:52 +0200] [567516] [ERROR] Worker (pid:567527) was sent SIGTERM!
[2025-08-01 00:01:52 +0200] [567516] [ERROR] Worker (pid:567528) was sent SIGTERM!
[2025-08-01 00:01:52 +0200] [567516] [INFO] Shutting down: Master
Aug 01 00:02:06 2025 (567648) Task runner evicted 2 expired pendings
[2025-08-01 00:02:06 +0200] [567646] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:02:06 +0200] [567646] [INFO] Listening at: http://127.0.0.1:8001 (567646)
[2025-08-01 00:02:06 +0200] [567646] [INFO] Using worker: sync
[2025-08-01 00:02:06 +0200] [567688] [INFO] Booting worker with pid: 567688
[2025-08-01 00:02:06 +0200] [567689] [INFO] Booting worker with pid: 567689
[2025-08-01 00:02:06 +0200] [567690] [INFO] Booting worker with pid: 567690
[2025-08-01 00:02:06 +0200] [567691] [INFO] Booting worker with pid: 567691
Aug 01 00:02:07 2025 (567648) Task runner deleted 0 orphaned workflows
[2025-08-01 00:02:11 +0200] [567646] [INFO] Handling signal: term
[2025-08-01 00:02:11 +0200] [567689] [INFO] Worker exiting (pid: 567689)
[2025-08-01 00:02:11 +0200] [567688] [INFO] Worker exiting (pid: 567688)
[2025-08-01 00:02:11 +0200] [567690] [INFO] Worker exiting (pid: 567690)
[2025-08-01 00:02:11 +0200] [567691] [INFO] Worker exiting (pid: 567691)
[2025-08-01 00:02:11 +0200] [567646] [ERROR] Worker (pid:567688) was sent SIGTERM!
[2025-08-01 00:02:11 +0200] [567646] [ERROR] Worker (pid:567689) was sent SIGTERM!
[2025-08-01 00:02:11 +0200] [567646] [ERROR] Worker (pid:567690) was sent SIGTERM!
[2025-08-01 00:02:11 +0200] [567646] [ERROR] Worker (pid:567691) was sent SIGTERM!
[2025-08-01 00:02:11 +0200] [567646] [INFO] Shutting down: Master
[2025-08-01 00:02:24 +0200] [567717] [INFO] Starting gunicorn 22.0.0
[2025-08-01 00:02:24 +0200] [567717] [INFO] Listening at: http://127.0.0.1:8001 (567717)
[2025-08-01 00:02:24 +0200] [567717] [INFO] Using worker: sync
[2025-08-01 00:02:24 +0200] [567786] [INFO] Booting worker with pid: 567786
[2025-08-01 00:02:24 +0200] [567789] [INFO] Booting worker with pid: 567789
[2025-08-01 00:02:24 +0200] [567792] [INFO] Booting worker with pid: 567792
[2025-08-01 00:02:24 +0200] [567794] [INFO] Booting worker with pid: 567794
<Omitted GET Requests>
Aug 01 00:02:25 2025 (567719) Task runner evicted 2 expired pendings
Aug 01 00:02:26 2025 (567719) Task runner deleted 0 orphaned workflows
<Omitted GET Requests>
Aug 01 00:02:33 2025 (567719) Task runner deleted 0 orphaned requests
[01/Aug/2025:00:02:35 +0200] "GET /3.1/lists/ifip-tc6@lists.rwth-aachen.de HTTP/1.1" 200 423 "-" "GNU Mailman REST client v3.3.5"
[01/Aug/2025:00:02:35 +0200] "GET /3.1/lists/smartlist@lists.rwth-aachen.de HTTP/1.1" 200 438 "-" "GNU Mailman REST client v3.3.5"
Aug 01 00:02:42 2025 (567719) Task runner deleted 2 orphaned messages
Aug 01 00:02:42 2025 (567719) Task runner deleted 0 orphaned message files
Aug 01 00:02:42 2025 (567719) Task runner evicted 2 expired bounce events
Aug 01 00:02:42 2025 (567719) Task runner evicted expired cache entries
Well... I will stop the restart after the log rotate today.
IIRC all of the shunted messages that Stephan looked at with qfile were those special digest messages (i.e., message component empty, pointer to lists/$LIST/something.mmdf in the msg_data component). So something is going wrong in the to-digest handler.
And for every one of those shunted messages there should be an exception with traceback logged in mailman.log. Those tracebacks should be helpful.
If there were any. Maybe the log level should be set to "debug" instead of "info". But for which logs?
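For what it's worth, log levels can be raised per component in mailman.cfg; a sketch, assuming the error and runner components are the interesting ones here (check mailman.cfg's schema for the full list of logging sections):

    [logging.error]
    level: debug

    [logging.runner]
    level: debug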
Maybe the nightly restart after the logrotate is the issue.
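If the nightly restart is indeed the trigger, one workaround is to rotate Mailman's logs without restarting the daemon at all. A hypothetical logrotate stanza (paths assumed from this install; `mailman reopen` asks the running master to reopen its log files, so no restart is needed):

    /opt/mailman/var/logs/*.log {
        daily
        rotate 14
        missingok
        compress
        postrotate
            /opt/mailman/mailman-venv/bin/mailman reopen
        endscript
    }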
-- Stephan Krinetzki
IT Center Gruppe: Anwendungsbetrieb und Cloud Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24866 Fax: +49 241 80-22134 krinetzki@itc.rwth-aachen.de www.itc.rwth-aachen.de
Social Media Kanäle des IT Centers: https://blog.rwth-aachen.de/itc/ https://www.facebook.com/itcenterrwth https://www.linkedin.com/company/itcenterrwth https://twitter.com/ITCenterRWTH https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ
-----Original Message----- From: Mark Sapiro <mark@msapiro.net> Sent: Saturday, August 2, 2025 6:11 PM To: Stephen J. Turnbull <steve@turnbull.jp> Cc: mailman-users@mailman3.org Subject: [MM3-users] Re: Held messages not delivered after approval
On 8/2/25 01:51, Stephen J. Turnbull wrote:
IIRC all of the shunted messages that Stephen looked at with qfiles were those special digest messages (ie, message component empty, pointer to lists/$LIST/something.mmdf in the msg_data component). So something is going wrong in the to-digest handler.
And for every one of those shunted messages there should be an exception with traceback logged in mailman.log. Those tracebacks should be helpful.
The number of .tmp files lying around bothers me. AFA grep CS, the only place that can happen is in switchboard.py:136 in .enqueue:
with open(tmpfile, 'wb') as fp:
    fp.write(msgsave)
    pickle.dump(data, fp, protocol)
    fp.flush()
    os.fsync(fp.fileno())
os.rename(tmpfile, filename)
where msgsave is already a pickled object. So either pickle.dump is choking on something in data (the metadata, which I believe is all primitive Python data types), or something (OOM kill?) is happening at the OS level. A crash in pickle.dump should leave an exception log and backtrace in the logs. AFAIK, Mailman does not clean up .tmp files at startup, right?
That is correct. The *.pck.tmp file is created by the above code and immediately after writing is renamed to *.pck. It is done this way to prevent another process picking up a partially written *.pck.
If a *.pck.tmp file is somehow left behind, it is never looked at or deleted by any Mailman code.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Krinetzki, Stephan writes:
/opt/mailman/var/queue/bad: -rw-rw---- 1 mailman mailman 221723 Aug 1 17:26 1754061973.4885209+46b1ae3716439bf3ef98090296dfce0320fc3017.psv
This one might be spam, but it's weird that it managed to get pickled but can't be read.
-rw-rw---- 1 mailman mailman 32912 Aug 2 00:00 1754085602.191851+3576cf33232db110fa7761233f67245564553652.psv -rw-rw---- 1 mailman mailman 416 Aug 2 00:00 1754085604.0204346+ad485da0c45cb0ad17a5dc42613c3eb3f313c20e.psv -rw-rw---- 1 mailman mailman 1407649 Aug 2 00:00 1754085623.275817+f23139c8127c454b4fe65453af3db18e558b0e87.psv -rw-rw---- 1 mailman mailman 1407634 Aug 2 00:02 1754085729.3529432+1643f907bac39a22a7d71e50b031c4f8a574082c.psv
I have no clue about these four (see below for comments on cron).
/opt/mailman/var/queue/out:
Looks normal for your configuration.
/opt/mailman/var/queue/shunt:
I don't understand why on August 1st you see shunts at intervals throughout the working day, then suddenly on the 2nd they all happen at midnight.
Have you tried "mailman unshunt"? If not what happens when you do? If the shunts are happening because of the restart, then they should go through on unshunt. If they don't, there's some other problem.
You can also try renaming the .psvs to .pck, and check the metadata in the pickle for which queue to move it to. That's more risky, and you shouldn't try it if the output of "mailman qfile" isn't as expected.
I don't see here a problem. But the timestamp seems to be related to the restart of mailman. Can I skip this in the logrotate?
As I mentioned before, there was (and may still be) a bug in Mailman's logging such that Mailman fails to reopen the logs, and typically after a couple of days you end up with a nameless open file collecting the logs and uselessly consuming more and more disk space. The restart is intended to work around this problem.
Btw: The crontab is the following:
@daily mailman cd /opt/mailman; source /opt/mailman/mailman-venv/bin/activate; /opt/mailman/mailman-venv/bin/mailman digests --send > /dev/null 2>&1
The django-admin commands aren't directly related, I'm going to ignore them for now. The only thing I know for *sure* runs at midnight daily is "mailman digests --send". On my Debian Linode, the default (which I left alone) is for logrotate's cron job to live in /etc/cron.daily, which is run at 06:25 daily using "run-parts". (This is quite a common setup on Linux.) So we need to know where the logrotate job is specified (crontab, cron.d, or cron.daily) and at what time (@daily = midnight) to be sure that the mailman restart is related to the bad and shunt queue files.
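For example (a sketch; these are the usual places on mainstream Linux distributions, adjust to your system):

# systemd-based schedule, if any
systemctl list-timers | grep -i logrotate
# classic cron locations
grep -rl logrotate /etc/crontab /etc/cron.d /etc/cron.daily 2>/dev/null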
So I checked the mailman.log:
[2025-08-01 00:00:02 +0200] [324558] [INFO] Shutting down: Master
[2025-08-01 00:00:23 +0200] [567059] [INFO] Shutting down: Master
[2025-08-01 00:00:42 +0200] [567206] [INFO] Shutting down: Master
[2025-08-01 00:01:01 +0200] [567278] [INFO] Shutting down: Master
[2025-08-01 00:01:34 +0200] [567379] [INFO] Shutting down: Master
[2025-08-01 00:01:52 +0200] [567516] [INFO] Shutting down: Master
[2025-08-01 00:02:11 +0200] [567646] [INFO] Shutting down: Master
That is not normal. Your control process is crashing every 15-20 seconds. I think it probably is a problem with the digests, not with the restart. What appears to be happening is that the digest process gets triggered, it creates a message and queues it, then fails to send it so nastily that Mailman restarts (or stops and something like systemd restarts it). On restart, Mailman finds the digest message (probably in the out queue), tries to send it again, crashes again, and eventually decides that isn't going to work, sends it to bad, and stops crashing.
There's normally a lot more chatter at startup and shutdown, for example about runners being started. That's probably because you have that redirected to a separate log file, or maybe that information doesn't get output with a log level of "warn". Maybe the crash information is in the runner.log.
According to the config you posted earlier, you're sending most channels to separate log files. Have you checked any of them other than mailman.log and smtp.log? Also, note that httpd.log and error.log are normally used by Mailman core's gunicorn (ie, the REST API). I'm not sure what effect directing Mailman's error channel to error.log will have, but I suspect you could end up losing logs or having text from different sources mixed.
I haven't thought about it carefully, but I would have separate logs for bounces, subscriptions, smtp, and nntp because they are quite separate. Everything else would go into mailman.log, because that makes it easier to trace a single message through the whole process. Until you know that you don't need it, I would have most channels at the info level. The debug level is almost never useful unless you're a developer trying to fix something (vs a troubleshooter trying to diagnose the problem). The logs compress very well (often 70% reduction), so it's generally a good idea to include the extra information at info level. Remember, the real explosion in logging is that outgoing mail gets logged up to 43k times per incoming post. Of course you can do quite a bit better if you can sacrifice the personalized footers, but most sites don't anymore because there are strict rules about convenience of unsubscription.
Well... I will stop the restart after the log rotate today.
You can do that if you want, but it's likely that you'll end up losing logs.
And for every one of those shunted messages there should be an exception with traceback logged in mailman.log. Those tracebacks should be helpful.
If there were any. Maybe the "debug" level should be "info". But for which logs?
Setting the channel to "debug" gives maximum verbosity, and unhandled exceptions are logged at "warn" or "error" level (maximum severity).
Maybe the restart at night after the logrotate is the issue.
Not with Mailman bouncing up and down pretty much as fast as it can. The restart can only account for one restart, the other 6 were caused by something else.
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

The django-admin commands aren't directly related, I'm going to ignore them for now. The only thing I know for *sure* runs at midnight daily is "mailman digests --send". On my Debian Linode, the default (which I left alone) is for logrotate's cron job to live in /etc/cron.daily, which is run at 06:25 daily using "run-parts". (This is quite a common setup on Linux.) So we need to know where the logrotate job is specified (crontab, cron.d, or cron.daily) and at what time (@daily = midnight) to be sure that the mailman restart is related to the bad and shunt queue files.
The logrotate is executed by a system Timer (Rocky 9 OS btw) and is planned for:
Tue 2025-08-05 00:00:00 CEST 7h left Mon 2025-08-04 00:00:00 CEST 16h ago logrotate.timer logrotate.service
So every day at midnight.
That is not normal. Your control process is crashing every 15-20 seconds. I think it probably is a problem with the digests, not with the restart. What appears to be happening is that the digest process gets triggered, it creates a message and queues it, then fails to send it so nastily that Mailman restarts (or stops and something like systemd restarts it). On restart, Mailman finds the digest message (probably in the out queue), tries to send it again, crashes again, and eventually decides that isn't going to work, sends it to bad, and stops crashing.
I saw this but I don't have any idea how this happens. Currently there are ~42 mails after 'mailman unshunt' and I think Mailman loops over them (the queue doesn't get shorter). But mails are delivered for a lot of lists.
According to the config you posted earlier, you're sending most channels to separate log files. Have you checked any of them other than mailman.log and smtp.log? Also, note that httpd.log and error.log are normally used by Mailman core's gunicorn (ie, the REST API). I'm not sure what effect directing Mailman's error channel to error.log will have, but I suspect you could end up losing logs or having text from different sources mixed.
So I should update my logging config. Do you have a good example or maybe even the dist?
-- Stephan Krinetzki
IT Center Gruppe: Anwendungsbetrieb und Cloud Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24866 Fax: +49 241 80-22134 krinetzki@itc.rwth-aachen.de www.itc.rwth-aachen.de
Social Media Kanäle des IT Centers: https://blog.rwth-aachen.de/itc/ https://www.facebook.com/itcenterrwth https://www.linkedin.com/company/itcenterrwth https://twitter.com/ITCenterRWTH https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

Krinetzki, Stephan writes:
The logrotate is executed by a system Timer (Rocky 9 OS btw) and is planned for: So every day at midnight.
OK, that's confirmed then.
That is not normal. Your control process is crashing every 15-20 seconds.
I saw this but I don't have any idea how this happens.
Well, neither do we. As I say it's not normal.
To investigate, first, you can move either the logrotate job or the digest-sending job to a different time. That would make it clear which event (or both) is connected to the problem.
Second, find those exception tracebacks. They may be in the logs somewhere. Reenable email reports on the send-digest cron job, that's now a high-probability source of problem information. There is no top-level logging to files in that job, although there may be some internal modules that do logging. Most likely it just dumps exception information to stdout or stderr, which is currently going to /dev/null.
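A minimal way to do that, assuming the crontab entry quoted earlier (the MAILTO address is a placeholder): drop the redirection so cron mails the output to you:

MAILTO=listmaster@lists.example.com
@daily mailman cd /opt/mailman; source /opt/mailman/mailman-venv/bin/activate; /opt/mailman/mailman-venv/bin/mailman digests --send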
Third, you can try moving the $mailman_home/lists/$list/digest.mmdf files for the problem lists. Don't delete them, just get them out of the way. If mail starts flowing to the lists we can be pretty sure content in the digest files is at least part of the issue.
Currently there are ~42 mails after 'mailman unshunt' and I think Mailman loops over them (the queue doesn't get shorter).
That's what I expected. Pretty sure there's an issue with the content of some messages and/or the digest files. Your situation may be different, but in the past that's almost always been the cause of inconsistency across time or across lists.
But mails are delivered for a lot of lists.
That would be the case if the problem lists in question have invalid content and the ones being delivered don't.
So I should update my logging config. Do you have a good example or maybe even the dist?
You quoted it:
I haven't thought about it carefully, but I would have separate logs for bounces, subscriptions, smtp, and nntp because they are quite separate. Everything else would go into mailman.log, because that makes it easier to trace a single message through the whole process. Until you know that you don't need it, I would have most channels at the info level. The debug level is almost never useful unless you're a developer trying to fix something (vs a troubleshooter trying to diagnose the problem). The logs compress very well (often 70% reduction), so it's generally a good idea to include the extra information at info level. Remember, the real explosion in logging is that outgoing mail gets logged up to 43k times per incoming post. Of course you can do quite a bit better if you can sacrifice the personalized footers, but most sites don't anymore because there are strict rules about convenience of unsubscription.
Well... I will stop the restart after the log rotate today.
You can do that if you want, but it's likely that you'll end up losing logs.
By "losing logs", I mean whole files.
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

On 8/4/25 00:21, Krinetzki, Stephan wrote:
And for every one of those shunted messages there should be an exception with traceback logged in mailman.log. Those tracebacks should be helpful.
If there were any. Maybe the "debug" level should be "info". But for which logs?
The standard logging levels from lowest to highest are
debug info warning error critical
Whatever level is set for a log results in all messages of that level or higher being logged. I.e. if the log's level is debug, all messages for that log of any level should be logged.
For every shunted message, a message like
SHUNTING: <file name without the .pck extension>
preceded by the exception and traceback is logged to error.log with
level error. See
https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/core/runner.py?...
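So, concretely, something like this should surface those tracebacks (the path follows the log_dir and logrotate config quoted elsewhere in this thread; adjust if yours differs):

grep -B 30 'SHUNTING' /var/log/mailman/mailman-logs/error.log | less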
Maybe the restart at night after the logrotate is the issue.
As I said before, blindly restarting Mailman is a bad idea. On servers that I maintain, I always verify that all queues are empty before stopping or restarting Mailman. If necessary, I'll kill the incoming runner and wait for the out queue to empty and then stop mailman. If you want to do this daily, you could automate that, e.g.
if queues empty:
    restart Mailman
else:
    when in queue is empty, sigterm incoming runner
    when out queue is empty, stop Mailman
    when Mailman stopped, start Mailman
The stop/start is needed because a simple restart at that point won't start the sigtermed incoming runner.
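A rough shell rendering of that outline (only a sketch: it assumes the /opt/mailman/var/queue layout and venv path from this thread, and the pkill pattern for the incoming runner is a guess that must be verified against your own process list):

#!/bin/bash
# Restart Mailman only when it is safe to do so (per the outline above).
QUEUE=/opt/mailman/var/queue
MAILMAN=/opt/mailman/mailman-venv/bin/mailman

queues_empty() {
    # No .pck/.bak files anywhere under the queue tree means nothing in flight.
    [ -z "$(find "$QUEUE" -type f \( -name '*.pck' -o -name '*.bak' \) 2>/dev/null)" ]
}

wait_empty() {
    # Wait up to ~5 minutes for a single queue directory to drain.
    for _ in $(seq 60); do
        [ -z "$(ls -A "$QUEUE/$1" 2>/dev/null)" ] && return 0
        sleep 5
    done
    return 1
}

if queues_empty; then
    "$MAILMAN" restart
else
    # Once the in queue is empty, SIGTERM only the incoming runner; the
    # pattern below is an assumption -- check "ps aux | grep runner" first.
    wait_empty in && pkill -TERM -f 'runner=in:'
    wait_empty out && "$MAILMAN" stop
    "$MAILMAN" start   # stop/start, not restart: the sigtermed runner must come back
fi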
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Hi Mark, Hi Stephen,
So I updated my configuration a bit. The logging is now the default:
[logging.archiver] datefmt: %b %d %H:%M:%S %Y
[logging.archiver] format: %(asctime)s (%(process)d) %(message)s
[logging.archiver] level: info
[logging.archiver] path: mailman.log
[logging.archiver] propagate: no
[logging.bounce] datefmt: %b %d %H:%M:%S %Y
[logging.bounce] format: %(asctime)s (%(process)d) %(message)s
[logging.bounce] level: info
[logging.bounce] path: bounce.log
[logging.bounce] propagate: no
[logging.config] datefmt: %b %d %H:%M:%S %Y
[logging.config] format: %(asctime)s (%(process)d) %(message)s
[logging.config] level: info
[logging.config] path: mailman.log
[logging.config] propagate: no
[logging.database] datefmt: %b %d %H:%M:%S %Y
[logging.database] format: %(asctime)s (%(process)d) %(message)s
[logging.database] level: warn
[logging.database] path: mailman.log
[logging.database] propagate: no
[logging.debug] datefmt: %b %d %H:%M:%S %Y
[logging.debug] format: %(asctime)s (%(process)d) %(message)s
[logging.debug] level: info
[logging.debug] path: debug.log
[logging.debug] propagate: no
[logging.error] datefmt: %b %d %H:%M:%S %Y
[logging.error] format: %(asctime)s (%(process)d) %(message)s
[logging.error] level: info
[logging.error] path: mailman.log
[logging.error] propagate: no
[logging.fromusenet] datefmt: %b %d %H:%M:%S %Y
[logging.fromusenet] format: %(asctime)s (%(process)d) %(message)s
[logging.fromusenet] level: info
[logging.fromusenet] path: mailman.log
[logging.fromusenet] propagate: no
[logging.gunicorn] datefmt: %b %d %H:%M:%S %Y
[logging.gunicorn] format: %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
[logging.gunicorn] level: info
[logging.gunicorn] path: mailman.log
[logging.gunicorn] propagate: no
[logging.http] datefmt: %b %d %H:%M:%S %Y
[logging.http] format: %(asctime)s (%(process)d) %(message)s
[logging.http] level: info
[logging.http] path: mailman.log
[logging.http] propagate: no
[logging.locks] datefmt: %b %d %H:%M:%S %Y
[logging.locks] format: %(asctime)s (%(process)d) %(message)s
[logging.locks] level: info
[logging.locks] path: mailman.log
[logging.locks] propagate: no
[logging.mischief] datefmt: %b %d %H:%M:%S %Y
[logging.mischief] format: %(asctime)s (%(process)d) %(message)s
[logging.mischief] level: info
[logging.mischief] path: mailman.log
[logging.mischief] propagate: no
[logging.plugins] datefmt: %b %d %H:%M:%S %Y
[logging.plugins] format: %(asctime)s (%(process)d) %(message)s
[logging.plugins] level: info
[logging.plugins] path: plugins.log
[logging.plugins] propagate: no
[logging.root] datefmt: %b %d %H:%M:%S %Y
[logging.root] format: %(asctime)s (%(process)d) %(message)s
[logging.root] level: info
[logging.root] path: mailman.log
[logging.root] propagate: no
[logging.runner] datefmt: %b %d %H:%M:%S %Y
[logging.runner] format: %(asctime)s (%(process)d) %(message)s
[logging.runner] level: info
[logging.runner] path: mailman.log
[logging.runner] propagate: no
[logging.smtp] datefmt: %b %d %H:%M:%S %Y
[logging.smtp] every: $msgid smtp to $listname for $recip recips, completed in $time seconds
[logging.smtp] failure: $msgid delivery to $recip failed with code $smtpcode, $smtpmsg
[logging.smtp] format: %(asctime)s (%(process)d) %(message)s
[logging.smtp] level: info
[logging.smtp] path: smtp.log
[logging.smtp] propagate: no
[logging.smtp] refused: $msgid post to $listname from $sender, $size bytes, $refused failures
[logging.smtp] success: $msgid post to $listname from $sender, $size bytes
[logging.subscribe] datefmt: %b %d %H:%M:%S %Y
[logging.subscribe] format: %(asctime)s (%(process)d) %(message)s
[logging.subscribe] level: info
[logging.subscribe] path: mailman.log
[logging.subscribe] propagate: no
[logging.task] datefmt: %b %d %H:%M:%S %Y
[logging.task] format: %(asctime)s (%(process)d) %(message)s
[logging.task] level: info
[logging.task] path: mailman.log
[logging.task] propagate: no
[logging.vette] datefmt: %b %d %H:%M:%S %Y
[logging.vette] format: %(asctime)s (%(process)d) %(message)s
[logging.vette] level: info
[logging.vette] path: mailman.log
[logging.vette] propagate: no
Further, I edited the logrotate:
/var/log/mailman/mailman-logs/*.log {
missingok
daily
compress
delaycompress
nomail
notifempty
rotate 14
dateext
su mailman mailman
olddir /var/log/mailman/mailman-logs/oldlogs
postrotate
/bin/kill -HUP `cat /opt/mailman/var/master.pid 2>/dev/null` 2>/dev/null || true
# Don't run "mailman3 reopen" with SELinux on here in the logrotate
# context, it will be blocked
/opt/mailman/mailman-venv/bin/mailman reopen >/dev/null 2>&1 || true
endscript
}
It is now more like the Fedora one and seems to be better.
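One way to sanity-check the new stanza without waiting for midnight (a sketch assuming it lives in /etc/logrotate.d/mailman; -d is logrotate's dry-run/debug mode and changes nothing):

logrotate -d /etc/logrotate.d/mailman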
Now in the mailman.log I see the HOLD messages and the approved messages. The smtp.log logs just the incoming mail, which seems to be fine.
So, here is a complete trace:
smtp.log:
Aug 05 10:18:40 2025 (217984) Available AUTH mechanisms: LOGIN(builtin) PLAIN(builtin)
Aug 05 10:18:40 2025 (217984) Peer: ('127.0.0.1', 57074)
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) handling connection
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) >> b'LHLO lists.rwth-aachen.de'
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) >> b'MAIL FROM:<SENDER> SIZE=15187'
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) sender: SENDER
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) >> b'RCPT TO:<stephansmodliste@lists.example.com>'
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) recip: stephansmodliste@lists.example.com
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) >> b'DATA'
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) >> b'QUIT'
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) connection lost
Aug 05 10:18:40 2025 (217984) ('127.0.0.1', 57074) Connection lost during _handle_client()
The mailman.log:
Aug 05 10:18:40 2025 (217983) HOLD: stephansmodliste@lists.example.com post from SENDER held, message-id=<b93e46d8342f42039c99e1d2d036c711@SENDERDOMAIN>: The message is not from a list member
Aug 05 10:21:04 2025 (218015) held message approved, message-id: <b93e46d8342f42039c99e1d2d036c711@SENDERDOMAIN>
[05/Aug/2025:10:21:04 +0200] "POST /3.1/lists/stephansmodliste@lists.example.com/held/211056 HTTP/1.1" 204 0 "-" "GNU Mailman REST client v3.3.5"
[05/Aug/2025:10:21:04 +0200] "GET /3.1/lists/stephansmodliste@lists.example.com/held?count=0&page=1 HTTP/1.1" 200 90 "-" "GNU Mailman REST client v3.3.5"
[05/Aug/2025:10:21:04 +0200] "GET /3.1/lists/stephansmodliste@lists.example.com/held?count=10&page=1 HTTP/1.1" 200 90 "-" "GNU Mailman REST client v3.3.5"
[05/Aug/2025:10:21:04 +0200] "GET /3.1/lists/stephansmodliste@lists.example.com/requests/count?token_owner=moderator HTTP/1.1" 200 73 "-" "GNU Mailman REST client v3.3.5"
[05/Aug/2025:10:21:04 +0200] "GET /3.1/lists/stephansmodliste@lists.example.com/held/count HTTP/1.1" 200 73 "-" "GNU Mailman REST client v3.3.5"
Aug 05 10:21:05 2025 (217986) Cannot connect to SMTP server localhost on port 25
This last error appears very often in mailman.log:

Aug 05 09:46:12 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 09:46:13 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 09:46:13 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 09:46:14 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 09:46:24 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 09:50:22 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 09:50:24 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 09:53:30 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 09:55:01 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 09:55:01 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 09:55:09 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 09:57:44 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 09:57:46 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 09:58:50 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 09:58:52 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 09:58:52 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 09:58:53 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 09:58:53 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 09:58:53 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:01:18 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:02:02 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 10:07:11 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 10:10:36 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:11:35 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:16:04 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:16:05 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:16:07 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:16:58 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:17:42 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:18:30 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:20:13 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:21:05 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:08 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:08 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:31 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:37 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:44 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:47 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:50 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:28:57 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:29:22 2025 (217988) Cannot connect to SMTP server localhost on port 25
Aug 05 10:29:40 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:29:41 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:30:37 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:30:37 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:30:50 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:31:15 2025 (217987) Cannot connect to SMTP server localhost on port 25
Aug 05 10:33:42 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:33:53 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:34:27 2025 (217985) Cannot connect to SMTP server localhost on port 25
Aug 05 10:34:28 2025 (217986) Cannot connect to SMTP server localhost on port 25
Aug 05 10:38:25 2025 (230250) Cannot connect to SMTP server lists.example.com on port 25
Aug 05 10:38:25 2025 (230247) Cannot connect to SMTP server lists.example.com on port 25
Even after changing the smtp_host in mailman.cfg this error appears randomly.
Sadly the mail does not get delivered.
-- Stephan Krinetzki
IT Center Gruppe: Anwendungsbetrieb und Cloud Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24866 Fax: +49 241 80-22134 krinetzki@itc.rwth-aachen.de www.itc.rwth-aachen.de
Social Media Kanäle des IT Centers: https://blog.rwth-aachen.de/itc/ https://www.facebook.com/itcenterrwth https://www.linkedin.com/company/itcenterrwth https://twitter.com/ITCenterRWTH https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

Krinetzki, Stephan writes:
Now in the mailman.log I see the HOLD messages and the approved messages. The smtp.log logs just the incoming Mail, which seems to be fine.
By that you mean the incoming mail is fine? Or smtp.log is fine? smtp.log should include two (or more) lines for each successful post, like
Jan 02 16:09:17 2025 (42985) <43007@turnbull> smtp to list-1@turnbull.jp for 1 recips, completed in 0.6027507781982422 seconds
Jan 02 16:09:17 2025 (42985) <43007@turnbull> post to list-1@turnbull.jp from list-1-owner@turnbull.jp, 992 bytes, 1 failures
Jan 02 16:09:17 2025 (42985) <43007@turnbull> delivery to list-1-owner@turnbull.jp failed with code 550, b'5.1.1 <list-1-owner@turnbull.jp>: Recipient address rejected: User unknown in local recipient table'
(That last line shows an example of a failure report.)
Aug 05 10:21:05 2025 (217986) Cannot connect to SMTP server localhost on port 25
Postfix for some reason doesn't know about the alias "localhost" (it has something to do with trusting the DNS and not reading the /etc/hosts file). 127.0.0.1 works.
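If name resolution is the culprit, pointing Mailman core at the loopback address directly should sidestep it; a sketch of the relevant mailman.cfg stanza (smtp_host and smtp_port are the standard [mta] keys):

[mta]
smtp_host: 127.0.0.1
smtp_port: 25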
I don't understand how any lists' posts get delivered at all with that. If it's postfix it's possible that the connection pool is exhausted intermittently (but that wouldn't explain why it's consistently impossible to post to a subset of lists). You can specify that connections from the loopback interfaces are less restricted. That might help.
Even after changing the smtp_host in mailman.cfg this error appears randomly.
Note that
Sadly the mail does not get delivered.
:-(
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

On 8/5/25 02:06, Krinetzki, Stephan wrote:
Aug 05 10:21:05 2025 (217986) Cannot connect to SMTP server localhost on port 25
The last error is a very often in the mailman.log:
That's a misleading message. It's been improved by https://gitlab.com/mailman/mailman/-/merge_requests/1362, however usually the underlying issue is inability to access a header or footer template when decorating the message. This in turn is most likely caused by some template(s) in Mailman's var/templates/ directory being owned by root or some other non-mailman user and not readable by the Mailman user, which in turn most likely results from running mailman import21 as root.
Bottom line - check that all sub-directories and files in Mailman's var/templates/ directory are searchable and readable by the Mailman user.
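A quick way to check and, if needed, fix that (a sketch assuming Mailman runs as user and group "mailman" and the var directory from this thread):

# anything under var/templates not owned by mailman (ideally prints nothing)
find /opt/mailman/var/templates ! -user mailman -ls
# if something shows up, hand it back to the mailman user
chown -R mailman:mailman /opt/mailman/var/templates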
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Hi Mark,
That's a misleading message. It's been improved by https://gitlab.com/mailman/mailman/-/merge_requests/1362, however usually the underlying issue is inability to access a header or footer template when decorating the message. This in turn is most likely caused by some template(s) in Mailman's var/templates/ directory being owned by root or some other non-mailman user and not readable by the Mailman user, which in turn most likely results from running mailman import21 as root.
You are my hero! After "patching" the outgoing.py file with the error messages I got
Aug 06 09:52:51 2025 (443278) Cannot connect to SMTP server 127.0.0.1 on port 25: HTTPSConnectionPool(host='lists.example.com', port=443): Max retries exceeded with url: /postorius/api/templates/list/hpc-admintreffen-info.lists.example.com/list:member:regular:footer (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))
And was shocked. Why is there an SSL Certificate Error? Why can't python requests verify the certificate?
So I googled a bit and found that we can fix this by editing the cacert.pem of certifi. I did, but the error was still there. So I checked my https VHost (Apache btw) and there was a wrong chain in it. Oof. An error that a browser does not notice, but any SSL client does. I fixed it and boom - everything works again.
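For anyone hitting the same thing: the chain the server actually presents can be checked from the shell (substitute your own host name); with a broken chain, openssl reports the same "unable to get local issuer certificate" error that Python did above:

openssl s_client -connect lists.example.com:443 -servername lists.example.com -showcerts </dev/null
# look for "Verify return code: 0 (ok)" near the end of the output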
Thank you again and thank you Stephen for trying to resolve the problem 😊
-- Stephan Krinetzki
IT Center Gruppe: Anwendungsbetrieb und Cloud Abteilung: Systeme und Betrieb RWTH Aachen University Seffenter Weg 23 52074 Aachen Tel: +49 241 80-24866 Fax: +49 241 80-22134 krinetzki@itc.rwth-aachen.de www.itc.rwth-aachen.de
Social Media Kanäle des IT Centers: https://blog.rwth-aachen.de/itc/ https://www.facebook.com/itcenterrwth https://www.linkedin.com/company/itcenterrwth https://twitter.com/ITCenterRWTH https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ

I see Mark answered some of the same questions already, but it would be really painstaking to avoid duplication (it took more than an hour to write this :-), so I'm just gonna scan it quickly and then send.
Stephan Krinetzki writes:
My queues:
(Omitting the empty ones.)
/opt/mailman/var/queue/out:
total 2868
drwxrwx--- 2 mailman mailman 4096 Jul 31 13:26 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 221708 Jul 30 13:16 1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp
-rw-rw---- 1 mailman mailman 733425 Jul 31 00:00 1753912841.8847518+da55f8789ae41b75d19a02b595e3fd6d45983ade.pck.tmp
-rw-rw---- 1 mailman mailman 17758 Jul 31 13:26 1753961185.0481033+6adf966467266567275f1146f5054c95e4365c13.pck
Those two .pck.tmp files are bad news. They indicate that Mailman was trying to do something with those messages, and the process was interrupted. You should check whether there was a Mailman restart at those times (although it should shutdown gracefully and not leave .tmp files behind), or if a runner crashed.
The .tmp files *may* be deliverable, but you'd have to look at them to be sure that they are complete. It's possible that they have been delivered already and the .tmp files just need to be removed. You can look at them with "mailman qfile" same as always, "qfile" doesn't check the filename extension. If they haven't been delivered and a careful check shows they're intact, just renaming without the .tmp will cause them to go to the head of the queue.
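Concretely, something like this (using the first stranded file from the listing above):

# inspect the stranded entry; qfile does not care about the extension
mailman qfile /opt/mailman/var/queue/out/1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp
# if it is intact and was never delivered, renaming it re-queues it
mv /opt/mailman/var/queue/out/1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp \
   /opt/mailman/var/queue/out/1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck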
It's also odd that the .pck above precedes the .bak below (unless you have multiple slices for the out queue?)
The rest of the queue looks normal, except that it seems rather long. I only see queues that long when the outgoing MTA is borked. (Although my experience with high-traffic systems is restricted to helping a couple of folks for whom there was zero cost to adding CPUs and memory to their VMs, I feel better that Mark picked up on this too.) You might want to reconfigure Mailman to use more out slices, but that depends on what else your MTA should be using its bandwidth for. Because of the way the slicing algorithm works, the number of slices needs to be a power of 2, so the number of simultaneous connections Mailman makes to the MTA will double. (I don't think there's any point to more than 4 slices unless you're doing more than one incoming post/second.)
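For reference, the knob in question is the instances setting of the out runner (Stephan's mailman.cfg, quoted later in this thread, already sets it to 4):

[runner.out]
instances: 4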
-rw-rw---- 1 mailman mailman 31122 Jul 31 13:26 1753961185.1064982+b1f502e2af56b9b11680135c6de5fcc5285d967e.bak
This .bak file is currently being processed by Mailman, it's normal. The rest of the .pck files are also normal, just waiting. (Omitting the rest of the out queue listing.)
/opt/mailman/var/queue/shunt:
total 3304
drwxrwx--- 2 mailman mailman 4096 Jul 31 11:44 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 451 Jul 31 00:00 1753912822.2651796+28eceef7e18eb70393377b88dc7117af8f9362a0.pck
-rw-rw---- 1 mailman mailman 490 Jul 31 00:00 1753912838.4197352+ea531cf0262c1faa58b1679b907fee92bc16822c.pck
-rw-rw---- 1 mailman mailman 1407870 Jul 31 00:00 1753912841.9177196+ccea15bdefce3a54301281c8eddf86e8230244a6.pck
-rw-rw---- 1 mailman mailman 86108 Jul 31 00:00 1753912841.9197443+7dcef4febc71e44c6d9309a24a08b08753e1ff42.pck
-rw-rw---- 1 mailman mailman 1407668 Jul 31 00:00 1753912841.9849963+a3f1869b750060c97262ece38737480d91652828.pck
-rw-rw---- 1 mailman mailman 38992 Jul 31 00:01 1753912860.5167956+940584c4f361cbd8c29e390b2f60590558effe40.pck
-rw-rw---- 1 mailman mailman 440 Jul 31 00:01 1753912860.6972685+635c065bac8dff5f9d562275d707001d773b84c1.pck
-rw-rw---- 1 mailman mailman 445 Jul 31 00:01 1753912868.7903054+befa066f254d7a3529a8555a6c942a554715d837.pck
-rw-rw---- 1 mailman mailman 33494 Jul 31 00:01 1753912878.7562895+82b724fd93260ab9a2bb49709d3a42a2f32f2c80.pck
-rw-rw---- 1 mailman mailman 217073 Jul 31 00:01 1753912878.9337828+de73a65d9c6febfa80275853921f4b53fd1d9e2a.pck
-rw-rw---- 1 mailman mailman 85888 Jul 31 00:02 1753912950.303359+8a87bce0be63ac1df8493c6b1ad6ae154fcedba7.pck
-rw-rw---- 1 mailman mailman 50244 Jul 31 00:02 1753912950.4970112+d44d493912bb3024547b8a5112f86f035dcb352f.pck
-rw-rw---- 1 mailman mailman 12887 Jul 31 00:02 1753912970.038427+31bcbd7fb2ebdf81f6de24b7283b50bcda6ded21.pck.tmp
-rw-rw---- 1 mailman mailman 443 Jul 31 11:44 1753955094.900898+67bc76525412da66a7c76363f65f583989716305.pck
/opt/mailman/var/queue/virgin:
total 32
drwxrwx--- 2 mailman mailman 81 Jul 31 13:17 .
drwxr-xr-x 14 mailman mailman 165 Jun 27 2024 ..
-rw-rw---- 1 mailman mailman 32013 Jan 11 2025 1736550035.6163204+472f81ece5e45a2651a4499bef418f611b43c619.pck.tmp
Nothing special there (shunt should be checked, but not in correlation with my mail).
I tend to disagree, as the first series of shunt files ends with a .tmp. There's another one of those .tmp files in virgin, and it's 6 months old. Hmmm, that one is *also* on the hour. You got lots of cron jobs that run on the hour, maybe?
You're probably right that there's no correlation, but you can't trust the dates from ls -l or stat because when running "mailman unshunt" all of the queue files in shunt will get "touched" if they're not sent. (If I recall correctly.) The fact that a spate of timestamps occur right at 00:00 means either there's a cron job running unshunt then, or you have a spammer or similar sending a bunch of broken mail to you on the hour. (I say you're probably right because the time stamp in the name decodes to the same time, and I don't think that changes when unshunt is run.) And again, you have a stale .tmp file there, which means something bad happened, most likely not under Mailman's control.
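(The leading float in a queue file name is a Unix epoch timestamp, so it can be decoded directly; a GNU date example, where the exact output format depends on locale and timezone:)

date -d @1753912822.2651796
# -> Thu Jul 31 00:00:22 CEST 2025, matching the first shunt file's mtime above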
There is (or maybe was, by now?) a bug in the logging such that logs did not get properly rotated. Many sites dealt with this by restarting Mailman with the same period as the log rotation. Do you do that? (I'm just fishing, I don't know how it could cause the main issue you are seeing.)
mailq is empty, so my postfix works as expected.
Hm. Those "high traffic" sites I mentioned, it was the other way around: with 4 (or 8) "out" slices, the out queue would be clear >80% of the time (according to "while 1; do ls -l $OUTQUEUE; sleep 5; done", nothing sophisticated). But the MTA's mail queue would typically backlog many minutes. As I said, the Mailman hosts at those sites were insanely overpowered, so your mileage will vary.
I have to think that there is a problem in the handoff between Mailman and the MTA. Why Mailman is not preserving the queuefile or alternatively logging a successful delivery to the MTA I don't have any idea off hand. I have to think the queue runner is crashing, but that doesn't explain why this happens only to certain lists.
Is Postfix delivering to the final destination itself, or does it pass on the messages to a smarthost? Is Mailman talking to the local MTA, or is it possibly talking to an MTA on a different node? I did have to diagnose a problem once where a system was misconfigured, and Mailman was talking not to the local Postfix but to a Postfix in a datacenter a megameter or so away! (That didn't lose any mail, but the connection would occasionally freeze and not time out, leading to a huge build up in the out queue.) Anyway, if Mailman isn't talking to an MTA with a <50ms ping time, you could try changing the configuration so it does.
I don't put much stock in any of the above ideas. I hope that you or somebody come up with better ones!
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

First: Thank you all for your input! I hope that we will get down to the root of the problem.
I've checked all cronjobs and there is no cronjob that starts at midnight. All jobs are running later. So I looked into /var/log/messages. The earliest entry there is
Jul 31 23:56:16 iris-s01 bash[324554]: Jul 31 23:56:16 2025 (324554) Starting new HTTPS connection (1): lists.example.com:443
Jul 31 23:56:16 iris-s01 bash[324553]: Jul 31 23:56:16 2025 (324553) Starting new HTTPS connection (1): lists.example.com:443
Jul 31 23:56:16 iris-s01 bash[324556]: Jul 31 23:56:16 2025 (324556) Starting new HTTPS connection (1): lists.example.com:443
Jul 31 23:56:16 iris-s01 bash[324554]: Jul 31 23:56:16 2025 (324554) Starting new HTTP connection (1): localhost:8000
Jul 31 23:56:16 iris-s01 bash[324553]: Jul 31 23:56:16 2025 (324553) Starting new HTTP connection (1): localhost:8000
Jul 31 23:56:16 iris-s01 bash[324556]: Jul 31 23:56:16 2025 (324556) Starting new HTTP connection (1): localhost:8000
Jul 31 23:56:16 iris-s01 bash[324554]: Jul 31 23:56:16 2025 (324554) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=rzcluster%40lists.example.com&msgid=c6140b6e649f46ee8a97aec5c5b80584%40sub.example.com HTTP/11" 200 126
Jul 31 23:56:16 iris-s01 bash[324553]: Jul 31 23:56:16 2025 (324553) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=ww10%40lists.example.com&msgid=1f86b244cce9479490b4bffe184f6c03%40sub.example.com HTTP/11" 200 121
Jul 31 23:56:16 iris-s01 bash[324554]: Jul 31 23:56:16 2025 (324554) Starting new HTTPS connection (1): lists.example.com:443
Jul 31 23:56:16 iris-s01 bash[324556]: Jul 31 23:56:16 2025 (324556) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=fg-informatik-ma-wiss%40lists.example.com&msgid=914dbace411247d7983b5dff8fa34bda%40i3.informatik.rwth-aachen.de HTTP/11" 200 138
Jul 31 23:56:16 iris-s01 bash[324553]: Jul 31 23:56:16 2025 (324553) Starting new HTTPS connection (1): lists.example.com:443
Jul 31 23:56:16 iris-s01 bash[324556]: Jul 31 23:56:16 2025 (324556) Starting new HTTPS connection (1): lists.example.com:443
Jul 31 23:56:16 iris-s01 bash[324553]: Jul 31 23:56:16 2025 (324553) Starting new HTTP connection (1): localhost:8000
Jul 31 23:56:16 iris-s01 bash[324553]: Jul 31 23:56:16 2025 (324553) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=ww10%40lists.example.com&msgid=bd99715d619744699ddf806fd3bd67b1%40sub.example.com HTTP/11" 200 121
Aug 1 00:02:48 iris-s01 bash[567713]: Aug 01 00:02:48 2025 (567713) Starting new HTTPS connection (1): lists.example.com:443
Aug 1 00:02:48 iris-s01 bash[567713]: Aug 01 00:02:48 2025 (567713) Starting new HTTP connection (1): localhost:8000
Aug 1 00:02:48 iris-s01 bash[567713]: Aug 01 00:02:48 2025 (567713) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=ww10%40lists.example.com&msgid=1f86b244cce9479490b4bffe184f6c03%40sub.example.com HTTP/11" 200 121
Aug 1 00:02:48 iris-s01 bash[567713]: Aug 01 00:02:48 2025 (567713) Starting new HTTPS connection (1): lists.example.com:443
Aug 1 00:02:48 iris-s01 bash[567715]: Aug 01 00:02:48 2025 (567715) Starting new HTTP connection (1): localhost:8000
Aug 1 00:02:48 iris-s01 bash[567715]: Aug 01 00:02:48 2025 (567715) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=ww10%40lists.example.com&msgid=1f86b244cce9479490b4bffe184f6c03%40sub.example.com HTTP/11" 200 121
Aug 1 00:02:48 iris-s01 bash[567715]: Aug 01 00:02:48 2025 (567715) Starting new HTTPS connection (1): lists.example.com:443
Aug 1 00:02:48 iris-s01 bash[567712]: Aug 01 00:02:48 2025 (567712) Starting new HTTPS connection (1): lists.example.com:443
Aug 1 00:02:48 iris-s01 bash[567712]: Aug 01 00:02:48 2025 (567712) Starting new HTTP connection (1): localhost:8000
Aug 1 00:02:48 iris-s01 bash[567712]: Aug 01 00:02:48 2025 (567712) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=ww10%40lists.example.com&msgid=1f86b244cce9479490b4bffe184f6c03%40sub.example.com HTTP/11" 200 121
Aug 1 00:02:48 iris-s01 bash[567712]: Aug 01 00:02:48 2025 (567712) Starting new HTTPS connection (1): lists.example.com:443
Aug 1 00:02:49 iris-s01 bash[567712]: Aug 01 00:02:49 2025 (567712) Starting new HTTP connection (1): localhost:8000
Aug 1 00:02:49 iris-s01 bash[567712]: Aug 01 00:02:49 2025 (567712) http://localhost:8000 "GET /archives/api/mailman/urls?mlist=ww10%40lists.example.com&msgid=1f86b244cce9479490b4bffe184f6c03%40sub.example.com HTTP/11" 200 121
And then there is a logrotate:
/var/log/mailman/mailman-logs/* {
    missingok
    daily
    compress
    delaycompress
    nomail
    notifempty
    rotate 14
    dateext
    su mailman mailman
    olddir /var/log/mailman/mailman-logs/oldlogs
    create 664 mailman mailman
    postrotate
        if [ -f /opt/mailman/var/master.pid ]; then /bin/systemctl restart mailman.service > /dev/null; fi
    endscript
}
/var/log/mailman/* {
    missingok
    daily
    compress
    delaycompress
    nomail
    notifempty
    rotate 14
    dateext
    su mailman mailman
    olddir /var/log/mailman/oldlogs
    create 664 mailman mailman
}
And yes, this restarts mailman at midnight. Maybe I should optimize the logrotate.
Now to the files:
mailman qfile /opt/mailman/var/queue/virgin/1736550035.6163204+472f81ece5e45a2651a4499bef418f611b43c619.pck.tmp
In object 1 there is a complete mail from 11 January 2025; it's a digest.
In object 2:
<----- start object 2 ----->
{ '_parsemsg': False,
  'isdigest': True,
  'listid': '<Name of the List>',
  'recipients': {'<Rec1>', '<Rec2>', '<Rec3>', '<Rec4>'},
  'version': 3}
So I would say there is no error in the mail. But I will delete this file now, because it's older than 6 months.
Now let's move on to the shunt queue. Here is the file from 31 July 2025:
mailman qfile /opt/mailman/var/queue/shunt/1753912822.2651796+28eceef7e18eb70393377b88dc7117af8f9362a0.pck
Object 1 is empty; object 2:
{ '_parsemsg': False, 'digest_number': 1, 'digest_path': '/opt/mailman/var/lists/it-besteller.lists.example.com/digest.11.1.mmdf', 'lang': 'de', 'listid': 'it-besteller.lists.example.com', 'version': 3, 'volume': 11, 'whichq': 'digest'}
Well - this is a digest too. Maybe the restart through the logrotate kills the digest generation?
After looking at all the midnight mails: it's the same, no mail body and a digest.
Then we have a special mail from 31 July 2025 at 11:44:
mailman qfile /opt/mailman/var/queue/shunt/1753955094.900898+67bc76525412da66a7c76363f65f583989716305.pck
<Database line omitted>
[----- start pickle -----]
<----- start object 1 ----->
<----- start object 2 ----->
{ '_parsemsg': False,
  'digest_number': 6,
  'digest_path': '/opt/mailman/var/lists/slcm_fak.lists.example.com/digest.47.6.mmdf',
  'lang': 'en',
  'listid': 'slcm_fak.lists.example.com',
  'version': 3,
  'volume': 47,
  'whichq': 'digest'}
[----- end pickle -----]
Again - a digest. Seems to be a problem with digests. But that does not explain my problem. It's 'just another problem to be fixed' :D
Questions about my setup:
- My local Postfix is connected to a smarthost. In the mail log I can see mails successfully delivered to a lot of people. If there were a problem there, I would see it (quick checks below).
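To rule Postfix out completely, the standard queue checks should come back empty; the message-id in the grep is whichever post you are tracing:

    mailq                                  # anything stuck in the Postfix queue?
    postqueue -p                           # same information, Postfix-native command
    grep '<message-id>' /var/log/maillog   # follow one post through Postfix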
Maybe there is something problematic with my mailman.cfg?
# For example, uncomment the following lines to run Mailman in developer mode.
#
# [devmode]
# enabled: yes
# recipient: your.address@your.domain

[mailman]
site_owner: mailman@lists.example.com
layout: here
pending_request_life: 3d
default_language: de

[paths.here]
var_dir: /opt/mailman/var
log_dir: /var/log/mailman/mailman-logs

[database]
class: mailman.database.postgresql.PostgreSQLDatabase
url: postgresql://mailman:<Pass>@db01.example.com/mailman
debug: no

[runner.out]
instances: 4

[archiver.hyperkitty]
class: mailman_hyperkitty.Archiver
enable: yes
configuration: /opt/mailman/mailman-hyperkitty.cfg

[archiver.prototype]
enable: yes

[mta]
verp_probes: yes
verp_confirmations: yes
verp_personalized_deliveries: yes
verp_delivery_interval: 0
remove_dkim_headers: yes

[webservice]
hostname: localhost
port: 8001
use_https: no
admin_user: mailmanapiuser
admin_pass: <PASS>
api_version: 3.1
workers: 4
configuration: /opt/mailman/gunicorn.conf

#log_dir: $var_dir/logs
[logging.root]
level: debug
path: mailman.log

[logging.archiver]
level: debug
path: archiver.log

[logging.bounce]
path: bounce.log
level: warn

[logging.config]
level: debug
path: mailman.log

[logging.database]
level: warn
path: database.log

[logging.debug]
path: debug.log
level: warn

[logging.error]
level: debug
path: error.log

[logging.fromusenet]
level: warn
path: mailman.log

[logging.http]
level: warn
path: httpd.log

[logging.locks]
level: warn
path: locks.log

[logging.mischief]
level: warn
path: mailman.log

[logging.plugins]
path: mailman.log
level: warn

[logging.runner]
level: warn
path: runner.log

[logging.smtp]
path: smtp.log
level: debug

[logging.subscribe]
path: subscribe.log
level: debug

[logging.vette]
path: vette.log
level: debug

# Some list posts and mail to the -owner address may contain DomainKey or
# DomainKeys Identified Mail (DKIM) signature headers <http://www.dkim.org/>.
# Various list transformations to the message such as adding a list header or
# footer or scrubbing attachments or even reply-to munging can break these
# signatures. It is generally felt that these signatures have value, even if
# broken and even if the outgoing message is resigned. However, some sites
# may wish to remove these headers by setting this to 'yes'.

[bounces]
# How often should the bounce runner process queued detected bounces?
register_bounces_every: 15m

[antispam]
header_checks:
    X-Spam-Flag: YES
jump_chain: discard
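To double-check what Core actually parsed from this file, the mailman conf command can dump the effective settings, for example:

    mailman conf -s mta          # show the effective [mta] section
    mailman conf -k site_owner   # look up a single key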
There are no memory peaks on the OS side, as far as my monitoring tells me.
So, any other ideas? Hopefully I didn't miss an important detail in your questions.

Stephan Krinetzki writes:
And then there is a logrotate:
...
And yes, this restarts Mailman at midnight. Maybe I should optimize the logrotate.
Not sure what you mean by "optimize". Because of the bug I mentioned earlier, in older versions of Mailman 3 the master process would fail to close some of its logfiles. After rotation, the master process would continue writing to the still-open file handle, defeating the purpose of logrotate. Unless that bug has been fixed in your version, you should leave the restart stanza in the logrotate configuration for Mailman.
That the shunt queue is collecting digests seems weird. It's not surprising that the message object in the queue file is empty; that's by design. It seems like the digest process is happening at the same time as the restart. This shouldn't be a problem, but it might clarify things if you make sure the periodic digest delivery is offset from the midnight restart in the /etc/cron.d/mailman3 file, e.g.:
# cron time is UTC, put stuff on desks at start of day JST
# core sends mail
22 5 * * * mailman /opt/mailman3/.v/bin/mailman notify
22 6 * * * mailman /opt/mailman3/.v/bin/mailman digests --periodic
(note 22:00 UTC is 07:00 JST).
My only other comment on your mail is that I see that you do have multiple out slices. So that explains the case of the .pck.bak (in process) file being younger than a .pck file (waiting in queue). It's not unusual in your configuration.
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

On 8/1/25 01:22, Stephan Krinetzki wrote:
Now to the files:
mailman qfile /opt/mailman/var/queue/virgin/1736550035.6163204+472f81ece5e45a2651a4499bef418f611b43c619.pck.tmp
...
mailman qfile /opt/mailman/var/queue/shunt/1753912822.2651796+28eceef7e18eb70393377b88dc7117af8f9362a0.pck ... mailman qfile /opt/mailman/var/queue/shunt/1753955094.900898+67bc76525412da66a7c76363f65f583989716305.pck
What about
-rw-rw---- 1 mailman mailman 221708 Jul 30 13:16 1753874180.0845337+0cc7849043859a79dc3678a0d8b63c1c66df0c66.pck.tmp
and
-rw-rw---- 1 mailman mailman 733425 Jul 31 00:00 1753912841.8847518+da55f8789ae41b75d19a02b595e3fd6d45983ade.pck.tmp
from the out queue? And is the out queue being processed, i.e. are messages in .pck files being delivered and the files being removed?
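A quick way to observe that is to watch the queue directory; healthy .pck files should appear and vanish within seconds, while files that linger suggest the out runner is stuck:

    watch -n 5 'ls -l /opt/mailman/var/queue/out/'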
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On August 1, 2025, 1:22 a.m., Stephan Krinetzki wrote:
And then there is a logrotate:
...
postrotate
    if [ -f /opt/mailman/var/master.pid ]; then /bin/systemctl restart mailman.service > /dev/null; fi
You should be aware that blindly restarting Mailman can result in duplicate messages to list members and possibly other issues. In particular, if the outgoing runner is processing a message and has already delivered one or more chunks to the MTA, a restart will cause the outgoing runner to stop and, upon restart, deliver to the entire recipient list, resulting in duplicates for those recipients already sent to the MTA.
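If the restart must stay, a sketch of a narrower postrotate guard; this shrinks the window but cannot fully remove the race:

    # hypothetical guard: only restart while the out queue is idle
    if [ -z "$(ls -A /opt/mailman/var/queue/out 2>/dev/null)" ]; then
        /bin/systemctl restart mailman.service > /dev/null
    fi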
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

On 7/31/25 00:18, Stephan Krinetzki wrote:
mailman members --regular --nomail enabled LISTSPEC
Shows a lot of addresses - so all the members of the list don't receive any mail, which explains the vanished mail. Next step: enable mail delivery for them.
--nomail enabled lists members with delivery enabled, not those with delivery disabled. See mailman members --help.
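To list the members whose delivery actually is disabled, the --nomail selector takes a reason instead; if I recall the option values correctly:

    mailman members --regular --nomail any LISTSPEC       # disabled for any reason
    mailman members --regular --nomail byadmin LISTSPEC   # disabled by a list admin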
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (5)
- Krinetzki, Stephan
- Mark Sapiro
- Odhiambo Washington
- Stephan Krinetzki
- Stephen J. Turnbull