Can't deliver mails in queue
Hello,
I have stumbled upon some weird issues with my migration to mailman3.. I've followed the step by step instructions for installing mm3 manually in a venv with postfix and virtual aliases (if that is of any significance).
Now the Problem is, that some lists don't deliver EMails at all. The list was one which I migrated with the mailman21 command. But the Mail I sent did not get through to no one and the mailman logs did not show anything.
I then created a completely new List which did not exists in mm2.1, subscribed two addresses and sent myself an EMail. That one worked as intended. I then deleted the whole broken list and created it cleanly, added the same two people, sent an email and nothing. Did not show up in the Moderation Queue or anywhere. it also doesn't get shunted. It just never arrives. I then turned the log level to debug on the smtp module. And I only get the following statement, ever repeating, never ending:
[...] May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb49b143780> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb4974541d0> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb4978db438> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb497f05f28> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb4978adb38> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net [...]
I have no clue what to do with this. I then turned the debug module loglevel to debug. Also nothing useful. Just an endless list of
[...] May 19 18:17:29 2021 (7209) [outgoing] <function deliver at 0x7f2faf384c80>: <B9531076-2480-4925-9A2B-57802B53D5ED@gmx.de> May 19 18:17:29 2021 (7203) [ArchiveRunner] starting oneloop May 19 18:17:29 2021 (7203) [ArchiveRunner] ending oneloop: 0 May 19 18:17:29 2021 (7209) [OutgoingRunner] finishing filebase: 1621441049.4089153+6dff56673dc64b890dd352941267782c264586dd May 19 18:17:29 2021 (7209) [OutgoingRunner] doing periodic May 19 18:17:29 2021 (7209) [OutgoingRunner] committing transaction May 19 18:17:29 2021 (7209) [OutgoingRunner] checking short circuit May 19 18:17:29 2021 (7209) [OutgoingRunner] ending oneloop: 1 May 19 18:17:29 2021 (7209) [OutgoingRunner] starting oneloop May 19 18:17:29 2021 (7209) [OutgoingRunner] processing filebase: 1621441049.4628382+6a0437ea387a2508ad947c0a51712075cbdb6d2e May 19 18:17:29 2021 (7209) [OutgoingRunner] processing onefile May 19 18:17:29 2021 (7209) [outgoing] <function deliver at 0x7f2faf384c80>: <B9531076-2480-4925-9A2B-57802B53D5ED@gmx.de> [...]
Which also tells me nothing I can understand. I have no clue what's going wrong. (I've turned on debugging both on the original member list and the reduced member list. Both hang on the first delivery they do). If I restart mailman, the message gets correctly shunted as it is interrupted mid-delivery. My maillogs don't register any activity beside the listpost. Any ideas what I need to do to figure out what's not working here or where the culprit could be? Not sure if it is related to the mailinglist being in mailman2.1 before.
-- Nathanael Schneider
On 5/19/21 10:22 AM, Nathanael Schneider wrote:
I then created a completely new List which did not exists in mm2.1, subscribed two addresses and sent myself an EMail. That one worked as intended. I then deleted the whole broken list and created it cleanly, added the same two people, sent an email and nothing.
This seems strange. If I understand, you are essentially saying the two lists are the same. I.e., you created a completely new List which did not exists in mm2.1, and that one worked. Then you deleted the whole broken list and created it cleanly ... What was different between this cleanly created list and the completely new List?
May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb49b143780> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb4974541d0> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net
The fact that these are the only messages says that the attempt to send that message via smtp raised no exception as exceptions would have been logged, yet you say there's nothing in the maillog so there was no connect to Postfix. What are your settings for smtp_host: and smtp_port: in the [mta] section of mailman.cfg?
...
I have no clue what to do with this. I then turned the debug module loglevel to debug. Also nothing useful. Just an endless list of
[...] May 19 18:17:29 2021 (7209) [outgoing] <function deliver at 0x7f2faf384c80>: <B9531076-2480-4925-9A2B-57802B53D5ED@gmx.de> May 19 18:17:29 2021 (7203) [ArchiveRunner] starting oneloop May 19 18:17:29 2021 (7203) [ArchiveRunner] ending oneloop: 0 May 19 18:17:29 2021 (7209) [OutgoingRunner] finishing filebase: 1621441049.4089153+6dff56673dc64b890dd352941267782c264586dd May 19 18:17:29 2021 (7209) [OutgoingRunner] doing periodic May 19 18:17:29 2021 (7209) [OutgoingRunner] committing transaction May 19 18:17:29 2021 (7209) [OutgoingRunner] checking short circuit May 19 18:17:29 2021 (7209) [OutgoingRunner] ending oneloop: 1 May 19 18:17:29 2021 (7209) [OutgoingRunner] starting oneloop May 19 18:17:29 2021 (7209) [OutgoingRunner] processing filebase: 1621441049.4628382+6a0437ea387a2508ad947c0a51712075cbdb6d2e May 19 18:17:29 2021 (7209) [OutgoingRunner] processing onefile May 19 18:17:29 2021 (7209) [outgoing] <function deliver at 0x7f2faf384c80>: <B9531076-2480-4925-9A2B-57802B53D5ED@gmx.de> [...]
Which also tells me nothing I can understand.
It says the message in queue entry 1621441049.4089153+6dff56673dc64b890dd352941267782c264586dd is handled but gets requeued as 1621441049.4628382+6a0437ea387a2508ad947c0a51712075cbdb6d2e which presumably is handled and requeued again ...
I have no clue what's going wrong. (I've turned on debugging both on the original member list and the reduced member list. Both hang on the first delivery they do). If I restart mailman, the message gets correctly shunted as it is interrupted mid-delivery. My maillogs don't register any activity beside the listpost. Any ideas what I need to do to figure out what's not working here or where the culprit could be? Not sure if it is related to the mailinglist being in mailman2.1 before.
I have no real understanding of why this isn't working. I don't think it's likely that it has to do with the mailing list being imported from MM 2.1, but I suppose it could be.
How did you create and import the list? When you say you deleted the whole broken list and created it cleanly, what did you actually do?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
I then created a completely new List which did not exists in mm2.1, subscribed two addresses and sent myself an EMail. That one worked as intended. I then deleted the whole broken list and created it cleanly, added the same two people, sent an email and nothing.
This seems strange. If I understand, you are essentially saying the two lists are the same. I.e., you created a completely new List which did not exists in mm2.1, and that one worked. Then you deleted the whole broken list and created it cleanly ... What was different between this cleanly created list and the completely new List?
They were in fact not identical, as I changed some stuff in hopes to get it up and running again quickly. I now changed the completely new List to be identical with the cleanly created one (as far as postorius shows me) and it still works just fine.
May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb49b143780> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb4974541d0> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net
The fact that these are the only messages says that the attempt to send that message via smtp raised no exception as exceptions would have been logged, yet you say there's nothing in the maillog so there was no connect to Postfix. What are your settings for smtp_host: and smtp_port: in the [mta] section of mailman.cfg?
I set nothing explicitly as the default options fit my needs, so localhost:25 it is to my knowledge. Postfix itself was configured by Plesk and the config for mailman3 was manually added as prescribed. A manual telnet on localhost 25 gives me the expected 220 message from postfix.
I have no clue what to do with this. I then turned the debug module loglevel to debug. Also nothing useful. Just an endless list of
[...] May 19 18:17:29 2021 (7209) [outgoing] <function deliver at 0x7f2faf384c80>: <B9531076-2480-4925-9A2B-57802B53D5ED@gmx.de> May 19 18:17:29 2021 (7203) [ArchiveRunner] starting oneloop May 19 18:17:29 2021 (7203) [ArchiveRunner] ending oneloop: 0 May 19 18:17:29 2021 (7209) [OutgoingRunner] finishing filebase: 1621441049.4089153+6dff56673dc64b890dd352941267782c264586dd May 19 18:17:29 2021 (7209) [OutgoingRunner] doing periodic May 19 18:17:29 2021 (7209) [OutgoingRunner] committing transaction May 19 18:17:29 2021 (7209) [OutgoingRunner] checking short circuit May 19 18:17:29 2021 (7209) [OutgoingRunner] ending oneloop: 1 May 19 18:17:29 2021 (7209) [OutgoingRunner] starting oneloop May 19 18:17:29 2021 (7209) [OutgoingRunner] processing filebase: 1621441049.4628382+6a0437ea387a2508ad947c0a51712075cbdb6d2e May 19 18:17:29 2021 (7209) [OutgoingRunner] processing onefile May 19 18:17:29 2021 (7209) [outgoing] <function deliver at 0x7f2faf384c80>: <B9531076-2480-4925-9A2B-57802B53D5ED@gmx.de> [...]
Which also tells me nothing I can understand. It says the message in queue entry 1621441049.4089153+6dff56673dc64b890dd352941267782c264586dd is handled but gets requeued as 1621441049.4628382+6a0437ea387a2508ad947c0a51712075cbdb6d2e which presumably is handled and requeued again ...
Ok. Then I just need to figure out why it gets requeued. That also explains why the out-queue folder keeps creating a single message with different filenames over and over. But the logs keep silent about the reason..
I have no clue what's going wrong. (I've turned on debugging both on the original member list and the reduced member list. Both hang on the first delivery they do). If I restart mailman, the message gets correctly shunted as it is interrupted mid-delivery. My maillogs don't register any activity beside the listpost. Any ideas what I need to do to figure out what's not working here or where the culprit could be? Not sure if it is related to the mailinglist being in mailman2.1 before.
I have no real understanding of why this isn't working. I don't think it's likely that it has to do with the mailing list being imported from MM 2.1, but I suppose it could be.
How did you create and import the list? When you say you deleted the whole broken list and created it cleanly, what did you actually do?
My steps where basically what the migration guide told me to do. First, I created the new list in postorius, I then copied the config.pck over to the mailman directory, chowned it to mailman and imported it in the venv using
mailman import21 list@example.com config.pck
Does import21 expect anything besides the config.pck? Maybe it couldn't find the required files and botched something? I recreated the list by deleting it in postorius and creating it anew. It could be that there is some configuration that does not get cleaned properly? If it helps, I've also made a document containing all the specific steps I used for my server as I've tested the setup process first before rolling it out (not the migration though) on a different server to get it done without a hitch.
-- Nathanael Schneider
On 5/19/21 2:45 PM, Nathanael Schneider wrote:
They were in fact not identical, as I changed some stuff in hopes to get it up and running again quickly. I now changed the completely new List to be identical with the cleanly created one (as far as postorius shows me) and it still works just fine.
Do I understand correctly that you deleted the MM 2.1 imported list and created a new list in Postorius with just two members and it doesn't work.
If that understanding is not correct, please outline all the steps you took to arrive at the current list that doesn't work.
Also, please capture one of the .pck or .bak files from the out queue and send it to me off list. I will treat it confidentially, but there might be clues in it.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb49b143780> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net May 19 18:54:40 2021 (8280) Using agent: <mailman.mta.deliver.Deliver object at 0x7fb4974541d0> May 19 18:54:40 2021 (8280) IndividualDelivery to: nathanael@lumenliquid.net
The fact that these are the only messages says that the attempt to send that message via smtp raised no exception as exceptions would have been logged, yet you say there's nothing in the maillog so there was no connect to Postfix. What are your settings for smtp_host: and smtp_port: in the [mta] section of mailman.cfg?
I set nothing explicitly as the default options fit my needs, so localhost:25 it is to my knowledge. Postfix itself was configured by Plesk and the config for mailman3 was manually added as prescribed. A manual telnet on localhost 25 gives me the expected 220 message from postfix.
It says the message in queue entry 1621441049.4089153+6dff56673dc64b890dd352941267782c264586dd is handled but gets requeued as 1621441049.4628382+6a0437ea387a2508ad947c0a51712075cbdb6d2e which presumably is handled and requeued again ...
Ok. Then I just need to figure out why it gets requeued. That also explains why the out-queue folder keeps creating a single message with different filenames over and over. But the logs keep silent about the reason..
Ok I figured out what happened.. First I tried to get more infos by running the runner manually once. And indeed, now I see that the runner "can't connect to SMTP":
(venv) mailman@h2831057:~$ runner -C /etc/mailman3/mailman.cfg -o -v --runner=out:0:1 May 20 12:17:24 2021 (20303) out runner started. May 20 12:17:24 2021 (20303) out runner started. May 20 12:17:24 2021 (20303) Using agent: <mailman.mta.deliver.Deliver object at 0x7f8a51649be0> May 20 12:17:24 2021 (20303) Using agent: <mailman.mta.deliver.Deliver object at 0x7f8a51649be0> May 20 12:17:24 2021 (20303) IndividualDelivery to: business@lumenliquid.net May 20 12:17:24 2021 (20303) IndividualDelivery to: business@lumenliquid.net May 20 12:17:24 2021 (20303) Starting new HTTP connection (1): localhost:8000 May 20 12:17:24 2021 (20303) Starting new HTTP connection (1): localhost:8000 May 20 12:17:24 2021 (20303) Cannot connect to SMTP server localhost on port 25 May 20 12:17:24 2021 (20303) Cannot connect to SMTP server localhost on port 25 May 20 12:17:24 2021 (20303) out runner exiting. May 20 12:17:24 2021 (20303) out runner exiting.
Which turns out is the completely wrong error message. SMTP works fine, it is the connection before (localhost:8000) that fails. After some more digging on that particular error I found a thread that linked this error with the use of templates. And indeed, if I add a footer template to my working list, it stops working. If I remove the footer from the broken list, it starts working. So there is something wrong with the templates.
After some more digging I looked into the database how my footers were saved and indeed, if I did a wget on the url it timed out because it didn't work. I then dug a little bit more and found the setting "POSTORIUS_TEMPLATE_BASE_URL" which apparently you also have to set. I didn't know because it was mentioned nowhere in the starting guide. So I set it to the url of my frontend and bang, it works (after fixing the urls in the database aswell).
It would have been nice though to have a more refined error to figure this out. But its solved now and hopefully this will help someone else as well.
-- Nathanael Schneider
On 5/20/21 4:01 AM, Nathanael Schneider wrote:
It would have been nice though to have a more refined error to figure this out. But its solved now and hopefully this will help someone else as well.
I'm glad you resolved it, and I'm sorry I didn't consider the template
issue myself. You actually had two separate template issues. If you
created templates in Postorius, they were inaccessible because of the
POSTORIUS_TEMPLATE_BASE_URL issue, but your initial problem with
imported lists is different. It probably resulted from running the
mailman import21
command as root which you should never do. You should
always run mailman
commands as the Mailman user. This is mentioned at
https://docs.mailman3.org/en/latest/migration.html#id1
The mailman import21
command can create list specific templates in
Mailman's var/templates/lists/LIST_ID/LC/ directory and if run as root
these will be user:group root:root and not world readable and therefore
can't be read by Mailman. Also, deleting and recreating the list with
the same LIST_ID doesn't remove those templates, so the list still
doesn't work.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (2)
-
Mark Sapiro
-
Nathanael Schneider