Hello, all
I have migrated a multi-list server from MM2 to MM3, on Debian 11, and migrated about half a dozen lists with fairly large archives. Absolutely everything has been working for a couple of months, but I am still unable to get one specific list to send mail to its subscribers. Of course, the first place I started was the logs, but that's the problem -- even with logging turned up, it appears to be failing silently as it connects to the MTA.
Here's a summary of the environment:
OS: Debian Linux 11, 64-bit, on a virtual machine MTA: Exim 4.94.2 List server: Mailman 3.3.3 (installed from Debian repo) Database: MariaDB 10.5.15 All daemons are running on the same host, and communication is either UNIX socket or loopback network interface, for security. Web server: Apache 2.4.53, with proxy to WSGI per the MM3 setup guide
Symptom: There are six lists on the server. All six can accept inbound traffic and will post the messages to the (always private, in this environment) archives. The web interface is working perfectly. Five of the lists also can send out messages to subscribers, but the sixth list cannot send traffic to the MTA, or is not attempting to do so.
Tests run so far (all resulting in successful posting to archives, but no outbound mail):
- Originate mail from inbound SMTP to the list address.
- Originate mail from the web UI as a member/moderator.
- Originate mail from the web UI as the site admin (also the list owner).
- Originate mail from both inbound SMTP and web UI by multiple users.
Log data for the inbound traffic (typical test message, some data redacted):
2022-08-27 15:11:57 1oS1Dn-003AWH-Hz <= REDACTED H=REDACTED:25 P=esmtps X=TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_128_GCM:128 CV=no K S=2552 DKIM=REDACTED id=REDACTED T="TEST 8.27.001" 2022-08-27 15:11:57 1oS1Dn-003AWH-Hz => MY_LIST@lists.MY_DOMAIN R=mailman_main_director T=mailman3_transport H=127.0.0.1 [127.0.0.1] I=[127.0.0.1] C="250 Ok" 2022-08-27 15:11:57 1oS1Dn-003AWH-Hz Completed
Log data for MM3 smtp.log (again, partially redacted):
Aug 27 15:11:57 2022 (690182) Available AUTH mechanisms: LOGIN(builtin) PLAIN(builtin) Aug 27 15:11:57 2022 (690182) Peer: ('127.0.0.1', 48170) Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) handling connection Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) Data: b'LHLO REDACTED' Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) Data: b'MAIL FROM:<REDACTED>' Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) sender: REDACTED Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) Data: b'RCPT TO:<erevnite@lists.4th.com>' Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) recip: MY_LIST@lists.MY_DOMAIN Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) Data: b'DATA' Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) Data: b'QUIT' Aug 27 15:11:57 2022 (690182) ('127.0.0.1', 48170) connection lost Aug 27 15:11:57 2022 (690182) Connection lost during _handle_client()
Originally I thought the "connection lost" indicated a failure, but that is happening for all the lists, and also on successful traffic. Since it occurs after the QUIT command in the SMTP session, and based on some online searching, my current understanding is that this is harmless and simply indicates the MTA closed the socket after the QUIT. (Please correct me if that is wrong.)
Log data for mailman.log, with partial redaction:
Aug 27 15:11:57 2022 (690181) ACCEPT: <REDACTED> Aug 27 15:11:59 2022 (690178) HyperKitty archived message <REDACTED> to https://lists.MY_DOMAIN/hyperkitty/list/MY_LIST@lists.MY_DOMAIN/message/G3DU... 19:11:59 [Q] INFO Enqueued 518 19:11:59 [Q] INFO Enqueued 519 19:11:59 [Q] INFO Enqueued 520 19:11:59 [Q] INFO Enqueued 521 19:11:59 [Q] INFO Enqueued 522 19:11:59 [Q] INFO Enqueued 523 19:11:59 [Q] INFO Enqueued 524 INFO 2022-08-27 19:11:59,648 690164 hyperkitty.views.mailman Archived message <REDACTED> to https://lists.MY_DOMAIN/hyperkitty/list/MY_LIST@lists.MY_DOMAIN/message/G3DU... [pid: 690164|app: 0|req: 4688541/4688541] 2600:3c04::f03c:91ff:fe02:9cc1 () {64 vars in 1255 bytes} [Sat Aug 27 19:11:59 2022] POST /hyperkitty/api/mailman/archive?key=xombsjgLDqe6d1oxvBaQWH4ZT81uCEjH => generated 113 bytes in 110 msecs (HTTP/1.1 200) 5 headers in 160 bytes (1 switches on core 0) 19:12:00 [Q] INFO Process-1:4 processing [update_from_mailman] 19:12:00 [Q] INFO Process-1:3 processing [sender_mailman_id] 19:12:00 [Q] INFO Process-1:2 processing [rebuild_thread_cache_new_email] 19:12:00 [Q] INFO Process-1:1 processing [compute_thread_positions] 19:12:00 [Q] INFO Processed [rebuild_thread_cache_new_email] 19:12:00 [Q] INFO Process-1:2 processing [rebuild_mailinglist_cache_recent] 19:12:00 [Q] INFO Processed [sender_mailman_id] 19:12:00 [Q] INFO Process-1:3 processing [rebuild_mailinglist_cache_for_month] 19:12:00 [Q] INFO Process-1:1 processing [check_orphans] 19:12:00 [Q] INFO Processed [compute_thread_positions] 19:12:00 [Q] INFO Processed [rebuild_mailinglist_cache_for_month] 19:12:00 [Q] INFO Processed [check_orphans] 19:12:00 [Q] INFO Processed [rebuild_mailinglist_cache_recent] 19:12:00 [Q] INFO Processed [update_from_mailman] [pid: 690164|app: 0|req: 4693691/4693691] 184.94.240.88 () {60 vars in 929 bytes} [Sat Aug 27 19:21:50 2022] HEAD / => generated 0 bytes in 2 msecs (HTTP/1.1 301) 6 headers in 204 bytes (1 switches on core 0) [pid: 690164|app: 0|req: 4693694/4693694] 184.94.240.88 () {60 vars in 1009 bytes} [Sat Aug 27 19:21:50 2022] HEAD /postorius/lists/ => generated 7136 bytes in 142 msecs (HTTP/1.1 200) 5 headers in 163 bytes (1 switches on core 1)
There is nothing at all in the Exim log indicating an outbound transaction was attempted by Mailman. I know Exim and Mailman are communicating because all the lists use the same SMTP configuration for Exim and Mailman (i.e., the non-working list is not in any way exceptional).
I've done a lot of searching online to find a solution, and I did see that a couple of others had asked about this type of problem, but the only one I found that got a response had an issue that didn't apply in my small environment.
Absent any other ideas, I'm going to try inserting some trace logging into the Python code, but I can't figure out what's wrong here. I've even gone to the point of looking at the SQL rows to verify this list is exactly parallel to another working list, except its name and integer ID numbers for various unique auto-increment keys.
Any suggestions are appreciated. Thanks!