Hello-
We’re seeing a seemingly random issue with message acceptance on our Mailman instance: some messages sent from Postfix to the LMTP listener time out. It isn’t tied to any specific list or time of day, and the messages are accepted on Postfix’s next attempt to deliver them to the LMTP listener.
Additionally, there seems to be a similarly random delay in accepting and processing messages before they are sent to the list moderators or list members.
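One way to check whether the delay comes from Mailman's LMTP runner rather than from Postfix is to replay the LMTP dialogue by hand and time each step. Below is a minimal sketch using Python's standard smtplib.LMTP class. It assumes the runner listens on 127.0.0.1:8024 (the usual Mailman 3 default for [mta] lmtp_port; check your mailman.cfg), and the probe sender address is a hypothetical placeholder. It stops after RCPT TO and sends RSET, so no message should be enqueued.

```python
import smtplib
import time


def probe_lmtp(host="127.0.0.1", port=8024, sender="probe@example.org",
               recipient="adcircaws@lists.renci.org", timeout=1200.0):
    """Time each LMTP step up to RCPT TO, then RSET and QUIT without sending DATA.

    Returns {step: (reply_code, seconds)}. Needs Python 3.9+ for LMTP(timeout=...).
    Assumed defaults: port 8024 and the placeholder sender are not confirmed values.
    """
    results = {}
    conn = smtplib.LMTP(timeout=timeout)  # no host given, so nothing connects yet
    start = time.monotonic()
    code, _ = conn.connect(host, port)  # reads the 220 greeting
    results["connect"] = (code, time.monotonic() - start)
    try:
        steps = (
            ("LHLO", conn.ehlo),  # smtplib.LMTP sends LHLO in place of EHLO
            ("MAIL FROM", lambda: conn.mail(sender)),
            ("RCPT TO", lambda: conn.rcpt(recipient)),  # step where the logged stall occurred
        )
        for name, call in steps:
            start = time.monotonic()
            code, _ = call()
            results[name] = (code, time.monotonic() - start)
        conn.rset()  # abandon the transaction before DATA
    finally:
        try:
            conn.quit()
        except smtplib.SMTPException:
            conn.close()  # connection already dropped, e.g. after a socket timeout
    return results
```

Calling print(probe_lmtp()) periodically on the list server and noting when RCPT TO takes more than a few seconds would show whether the stalls line up with other load on the host. The 1200-second timeout is chosen only so the probe outlasts a stall of the length seen in the log.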
The Mailman smtp log snippet shows the session stalling after RCPT TO, before the DATA portion of the message, until a timeout roughly 15 minutes later; Postfix's next attempt then goes straight through:
Jan 02 07:56:36 2025 (3601529) Available AUTH mechanisms: LOGIN(builtin) PLAIN(builtin)
Jan 02 07:56:36 2025 (3601529) Peer: ('127.0.0.1', 45574)
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) handling connection
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) << b'220 lists.renci.org GNU Mailman LMTP runner 2.0'
Jan 02 07:56:36 2025 (3601529) _handle_client readline: b'LHLO lists.renci.org\r\n'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) >> b'LHLO lists.renci.org'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) << b'250-lists.renci.org'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) << b'250-SIZE 33554432'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) << b'250-8BITMIME'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) << b'250 HELP'
Jan 02 07:56:36 2025 (3601529) _handle_client readline: b'MAIL FROM:<010001942716aa05-637d21f8-9794-473f-9e8b-06ca08025ef1-000000@mail.verify.signin.aws> SIZE=20254\r\n'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) >> b'MAIL FROM:<010001942716aa05-637d21f8-9794-473f-9e8b-06ca08025ef1-000000@mail.verify.signin.aws> SIZE=20254'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) sender: 010001942716aa05-637d21f8-9794-473f-9e8b-06ca08025ef1-000000@mail.verify.signin.aws
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) << b'250 OK'
Jan 02 07:56:36 2025 (3601529) _handle_client readline: b'RCPT TO:<adcircaws@lists.RENCI.org>\r\n'
Jan 02 07:56:36 2025 (3601529) ('127.0.0.1', 45574) >> b'RCPT TO:<adcircaws@lists.RENCI.org>'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 45574) recip: adcircaws@lists.RENCI.org
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 45574) << b'250 Ok'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 45574) EOF received
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 45574) connection timeout
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 45574) Connection lost during _handle_client()
Jan 02 08:12:26 2025 (3601529) Available AUTH mechanisms: LOGIN(builtin) PLAIN(builtin)
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 45574) connection lost
Jan 02 08:12:26 2025 (3601529) Peer: ('127.0.0.1', 33138)
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) handling connection
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'220 lists.renci.org GNU Mailman LMTP runner 2.0'
Jan 02 08:12:26 2025 (3601529) _handle_client readline: b'LHLO lists.renci.org\r\n'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) >> b'LHLO lists.renci.org'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'250-lists.renci.org'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'250-SIZE 33554432'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'250-8BITMIME'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'250 HELP'
Jan 02 08:12:26 2025 (3601529) _handle_client readline: b'MAIL FROM:<010001942716aa05-637d21f8-9794-473f-9e8b-06ca08025ef1-000000@mail.verify.signin.aws> SIZE=20254\r\n'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) >> b'MAIL FROM:<010001942716aa05-637d21f8-9794-473f-9e8b-06ca08025ef1-000000@mail.verify.signin.aws> SIZE=20254'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) sender: 010001942716aa05-637d21f8-9794-473f-9e8b-06ca08025ef1-000000@mail.verify.signin.aws
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'250 OK'
Jan 02 08:12:26 2025 (3601529) _handle_client readline: b'RCPT TO:<adcircaws@lists.RENCI.org>\r\n'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) >> b'RCPT TO:<adcircaws@lists.RENCI.org>'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) recip: adcircaws@lists.RENCI.org
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'250 Ok'
Jan 02 08:12:26 2025 (3601529) _handle_client readline: b'DATA\r\n'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) >> b'DATA'
Jan 02 08:12:26 2025 (3601529) ('127.0.0.1', 33138) << b'354 End data with <CR><LF>.<CR><LF>'
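In the log, the stall appears as a long gap between a connection's ">> b'RCPT TO:..." line and its "recip:" line: 950 seconds for port 45574 in the excerpt. A rough sketch for finding every such gap in a larger smtp log follows. It assumes all lines use the "Mon DD HH:MM:SS YYYY (pid) ..." layout shown here; rcpt_stalls and the 60-second threshold are illustrative choices, not part of Mailman.

```python
import re
import sys
from datetime import datetime

# Assumed Mailman log layout: "Jan 02 07:56:36 2025 (3601529) <message>"
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \((\d+)\) (.*)$")
# Per-connection messages start with the peer tuple: "('127.0.0.1', 45574) ..."
PEER_RE = re.compile(r"^\('([^']+)', (\d+)\) (.*)$")
TS_FORMAT = "%b %d %H:%M:%S %Y"


def rcpt_stalls(lines, threshold_s=60.0):
    """Yield (peer, rcpt_time, recip_time, seconds) where RCPT TO took >= threshold_s.

    A RCPT TO that never gets a later "recip:" line is not reported.
    """
    pending = {}  # peer -> timestamp of a RCPT TO line still awaiting "recip:"
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        p = PEER_RE.match(m.group(3))
        if not p:
            continue  # e.g. "_handle_client readline: ..." lines carry no peer tuple
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        peer = (p.group(1), int(p.group(2)))
        event = p.group(3)
        if event.startswith(">> b'RCPT TO:"):
            pending[peer] = ts
        elif event.startswith("recip:") and peer in pending:
            start = pending.pop(peer)
            delay = (ts - start).total_seconds()
            if delay >= threshold_s:
                yield peer, start, ts, delay


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rcpt_stalls.py /path/to/smtp.log  (the log path varies by install)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for peer, start, end, secs in rcpt_stalls(fh):
            print(f"{peer[0]}:{peer[1]} RCPT TO at {start}, accepted at {end} ({secs:.0f} s)")
```

Comparing the reported stall times against the timestamps of other activity on the host (for example the archiver or web traffic) may narrow down what the runner is waiting on.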
Does anyone have thoughts on next troubleshooting steps for this one?
Thanks
Hey all,
Last week we caught some AI bots causing high load on our Mailman 3
server's archive, which we blocked with nginx rules.
I'm not sure whether it is related to this incident, but we noticed the
following while analyzing the log files at the time. In mailman.log,
HyperKitty is listed more than 10,000 times as archiving a single mail
over a period of 2 hours, seemingly all with the same process ID:
> (1186) HyperKitty archived message [...]
The mailman3-web.log has matching entries, one every 0.5 seconds,
stating that the mail was archived:
> hyperkitty.views.mailman Archived message
In the archive's web interface, however, the mail is listed only once.
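To put numbers on the repeated archiving, the mailman.log entries can be grouped by process ID and by whatever follows "archived message" (elided as [...] above, so it is treated as an opaque key here). A minimal sketch, assuming mailman.log uses the same "Mon DD HH:MM:SS YYYY (pid) ..." layout as the smtp log earlier in this thread; archive_repeats is a hypothetical helper name.

```python
import re
import sys
from collections import defaultdict
from datetime import datetime

# Assumed layout: "Jan 02 07:56:36 2025 (1186) HyperKitty archived message ..."
ARCHIVE_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2} \d{4}) \((\d+)\) HyperKitty archived message (.*)$"
)
TS_FORMAT = "%b %d %H:%M:%S %Y"


def archive_repeats(lines, min_count=2):
    """Group archive entries by (pid, rest of line); report keys seen at least min_count times."""
    seen = defaultdict(list)
    for line in lines:
        m = ARCHIVE_RE.match(line.rstrip("\n"))
        if m:
            ts = datetime.strptime(m.group(1), TS_FORMAT)
            seen[(int(m.group(2)), m.group(3))].append(ts)
    report = []
    for (pid, rest), stamps in seen.items():
        if len(stamps) < max(min_count, 2):  # two entries are needed to measure an interval
            continue
        span = (max(stamps) - min(stamps)).total_seconds()
        report.append({
            "pid": pid,
            "message": rest,  # opaque: whatever follows "archived message"
            "count": len(stamps),
            "span_s": span,
            "mean_interval_s": span / (len(stamps) - 1),
        })
    return sorted(report, key=lambda r: r["count"], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python archive_repeats.py /path/to/mailman.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for r in archive_repeats(fh)[:20]:
            print(f"pid {r['pid']}: {r['count']} entries over {r['span_s']:.0f} s "
                  f"(~{r['mean_interval_s']:.2f} s apart) for {r['message'][:80]}")
```

Since the log timestamps have one-second resolution, the mean interval is only an estimate; a value near 0.5 s would match the rate seen in mailman3-web.log.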
My suspicion is that these archiving processes are now overloading the
system and preventing it from catching up on the queued messages in a
timely manner. I could not find anything in the Mailman 3 or HyperKitty
documentation on how to give the Mailman runners a larger share of the
system resources, or how to prevent HyperKitty from re-archiving a mail
over and over again.
Any hint about where to look for the cause of this would be highly appreciated.
Best,
Tobias
--
Tobias Diekershoff >>> System Hacker
Free Software Foundation Europe
Schönhauser Allee 6/7, 10119 Berlin, Germany | t +49-30-27595290
Registered at Amtsgericht Hamburg, VR 17030 | fsfe.org/support
OpenPGP-Key ID ... 0x25FE376FF17694A1
Fingerprint ...... 23EE F484 FDF8 291C BA09
A406 25FE 376F F176 94A1