Detectors and Bounces
Hello All,
I've recently upgraded a Mailman 3 server from Debian 10.12 to Debian 11.4 [1].
In that process, I've also upgraded Mailman 3 from version 3.3.4 to version 3.3.5. I think the following guide was originally used when setting up Mailman 3 on the present server: https://wiki.list.org/DOC/Howto_Install_Mailman3_On_Debian10 .
After that upgrade process, I noticed that at least one mailman process and one postgres process were at something like 40 %CPU or higher for extended periods of time (as seen using the top command). Upon further inspection, and through some research online, it seemed like the issue might have to do with bounces (perhaps bounces not being processed) [2][3][4][5][6][7][8].
In my logcheck logs, I was seeing this message multiple times a second:
Running detector: <class 'flufl.bounce._detectors.dsn.DSN'>
I ended up trying to clear out the /opt/mailman/mm/var/queue/bounces and /opt/mailman/mm/var/locks directories to see if that would help.
There seemed to be some sort of positive change, because the log message "spamming" seemed to decrease. The high %CPU for the mailman and postgres processes in question also seemed to go down over time. But the related mailman process would still run at a high %CPU from time to time. I tried checking the mailman process using strace at some point, but I didn't see an easily evident cause for what was going on. From what I remember, there was something about a "POLLIN" that kept looping. When I check the same mailman process with strace today, I just see occasional messages like this:
select(0, NULL, NULL, NULL, {tv_sec=45, tv_usec=43171}) = 0 (Timeout)
My Questions:
Is it normal to see large amounts of Running detector: <class 'flufl.bounce._detectors.dsn.DSN'> lines in logs?
Should there normally be other detectors listed in the logs beyond the DSN one?
If so, is there a setting I can change somewhere to make this happen?
For reference, I can only find one other logcheck log that shows a Running detector: <class 'flufl.bounce._detectors.dsn.DSN'> line from before the recent Mailman 3 upgrade. In that past logcheck log, there are other detectors listed as well, which makes me wonder if something is still stuck with the new Mailman 3 install (or if maybe something needs to be changed in a setting file somewhere).
Here is an example of previous log lines:
Running detector: <class 'flufl.bounce._detectors.sina.Sina'>
Running detector: <class 'flufl.bounce._detectors.llnl.LLNL'>
Running detector: <class 'flufl.bounce._detectors.exim.Exim'>
Running detector: <class 'flufl.bounce._detectors.qmail.Qmail'>
Running detector: <class 'flufl.bounce._detectors.exchange.Exchange'>
Running detector: <class 'flufl.bounce._detectors.netscape.Netscape'>
Running detector: <class 'flufl.bounce._detectors.dsn.DSN'>
Running detector: <class 'flufl.bounce._detectors.caiwireless.Caiwireless'>
Running detector: <class 'flufl.bounce._detectors.aol.AOL'>
Running detector: <class 'flufl.bounce._detectors.yale.Yale'>
Running detector: <class 'flufl.bounce._detectors.smtp32.SMTP32'>
Running detector: <class 'flufl.bounce._detectors.groupwise.GroupWise'>
Running detector: <class 'flufl.bounce._detectors.simplewarning.SimpleWarning'>
Running detector: <class 'flufl.bounce._detectors.microsoft.Microsoft'>
Running detector: <class 'flufl.bounce._detectors.postfix.Postfix'>
Running detector: <class 'flufl.bounce._detectors.simplematch.SimpleMatch'>
Running detector: <class 'flufl.bounce._detectors.yahoo.Yahoo'>
Thank you for your time, Andy
Web References:
[1] https://www.debian.org/News/2022/2022032602
[2] https://mail.python.org/archives/list/mailman-developers@python.org/thread/3...
[3] https://phabricator.wikimedia.org/T282348
[4] https://mail.python.org/archives/list/mailman-users@python.org/thread/MISUMX...
[5] https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/M...
[6] https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
[7] https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/U...
[8] https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/L...
On 8/5/22 09:17, summersan@nclack.k12.or.us wrote:
My Questions:
Is it normal to see large amounts of Running detector: <class 'flufl.bounce._detectors.dsn.DSN'> lines in logs?
Should there normally be other detectors listed in the logs beyond the DSN one?
If so, is there a setting I can change somewhere to make this happen?
Prior to flufl-bounce 4.0, it would run all detectors on every message
to the -bounces address. Beginning with 4.0, it runs the detectors in a
defined order from most specific/authoritative to least. If you are
seeing only or mostly the DSN one, you have 4.0 and the DSN detector
recognizes an RFC 3464 compliant bounce.
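To illustrate the ordering described above, here is a minimal sketch of running detectors from most to least specific and stopping at the first one that finds a failed address. This is not flufl.bounce's code: the function names (detect_dsn, detect_plain_text, scan) and the sample message are made up for illustration, and "stop at the first hit" is my reading of why only the DSN line shows up in the log.

```python
import re
from email import message_from_string

# Hypothetical fallback pattern: "<user@example.com>: reason" at line start.
ADDRESS_LINE = re.compile(r"^<([^<>\s]+@[^<>\s]+)>:", re.MULTILINE)


def detect_dsn(msg):
    """Most specific: an RFC 3464 multipart/report delivery-status report."""
    if msg.get_content_type() != "multipart/report":
        return set()
    if msg.get_param("report-type", "").lower() != "delivery-status":
        return set()
    failed = set()
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # The stdlib parser exposes each header block as its own Message.
        for block in part.get_payload():
            recipient = block.get("Final-Recipient", "")
            if recipient and block.get("Action", "").strip().lower() == "failed":
                # "rfc822; user@example.com" -> "user@example.com"
                failed.add(recipient.split(";", 1)[-1].strip())
    return failed


def detect_plain_text(msg):
    """Least specific: guess from the human-readable text."""
    found = set()
    for part in msg.walk():
        payload = part.get_payload()
        if part.get_content_type() == "text/plain" and isinstance(payload, str):
            found.update(ADDRESS_LINE.findall(payload))
    return found


# Ordered from most specific/authoritative to least.
DETECTORS = [detect_dsn, detect_plain_text]


def scan(msg, log=print):
    """Run detectors in order and return the first non-empty result."""
    for detector in DETECTORS:
        log(f"Running detector: {detector.__name__}")
        found = detector(msg)
        if found:
            return found
    return set()


SAMPLE_DSN = """\
From: MAILER-DAEMON@example.com
To: list-bounces@example.com
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

The account is disabled.

--BOUND
Content-Type: message/delivery-status

Reporting-MTA: dns; mail.example.com

Final-Recipient: rfc822; disabled_user@example.com
Action: failed
Status: 5.2.1

--BOUND--
"""

if __name__ == "__main__":
    print(scan(message_from_string(SAMPLE_DSN)))
```

With a compliant DSN like the sample, only the first detector is logged, which matches the log pattern in the question. A message the DSN detector does not recognize would fall through to the later detectors and log them too.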
For reference, I can only find one other logcheck log that shows a Running detector: <class 'flufl.bounce._detectors.dsn.DSN'> line from before the recent Mailman 3 upgrade. In that past logcheck log, there are other detectors listed as well, which makes me wonder if something is still stuck with the new Mailman 3 install (or if maybe something needs to be changed in a setting file somewhere):
Probably the upgrade also upgraded flufl.bounce to 4.0.
So the real question is why you are seeing large numbers of bounces.
Since it appears they are recognized by DSN, they probably aren't spam
to the -bounces address. Do you have verp probes enabled (verp_probes =
yes in the mta section of mailman.cfg)? If so, it's possible that you
are getting bounces for spam and the like, but the probes don't bounce
so the user's delivery is never disabled.
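A quick way to check the verp_probes question is sketched below. It reads an [mta] section with Python's standard configparser, which is only an approximation of how Mailman itself loads mailman.cfg. The helper name verp_probes_enabled is made up, and treating a missing key as "no" is an assumption about Mailman's default.

```python
import configparser

# The [mta] section as posted in this thread.
SAMPLE_CFG = """\
[mta]
verp_confirmations: yes
verp_personalized_deliveries: yes
verp_delivery_interval: 1
"""


def verp_probes_enabled(cfg_text, default=False):
    """Return the verp_probes setting from the [mta] section.

    `default` is used when the key or section is absent; False here is an
    assumption about Mailman's built-in default, not something read from it.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(cfg_text)
    return parser.getboolean("mta", "verp_probes", fallback=default)


if __name__ == "__main__":
    print("verp_probes enabled:", verp_probes_enabled(SAMPLE_CFG))
```

To check a real file, pass the contents of mailman.cfg instead of SAMPLE_CFG.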
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi Mark,
Thank you for your reply!
As you brought up, running pip list installed shows this:
flufl.bounce 4.0
Thank you for refocusing the issue in terms of the "large numbers of bounces", too.
Here is what I have in mailman.cfg for the [mta] section:
[mta]
verp_confirmations: yes
verp_personalized_deliveries: yes
verp_delivery_interval: 1
Does this help narrow down the problem any further?
Thank you, Andy
On 8/5/22 11:12, summersan@nclack.k12.or.us wrote:
Thank you for refocusing the issue in terms of the "large numbers of bounces", too.
Here is what I have in mailman.cfg for the [mta] section:
[mta]
verp_confirmations: yes
verp_personalized_deliveries: yes
verp_delivery_interval: 1
Does this help narrow down the problem any further?
Not really. You need to see why you are getting these bounces. What's in Mailman's bounce.log? Are bouncing users getting their delivery disabled?
What are your list(s) Bounce Processing settings?
In particular, if you set all the Notify owner ... settings to Yes and Forward unrecognized bounces to either List Admins or Site Admin, what do you see?
Also, if there are queued bounces in /opt/mailman/mm/var/queue/bounces, you can examine them with /opt/mailman/mm/bin/mailman qfile
-- Mark Sapiro <mark@msapiro.net>
Okay, thank you for the feedback!
What's in Mailman's bounce.log?
There are lots of log lines in this file, but here is an edited example line for the purpose of debugging:
Aug 04 16:59:56 2022 (70838) Member disabled_user@example.local.domain on list example_list@mailman_server.example.local.domain, bounce score = 1.
Are bouncing users getting their delivery disabled?
From looking at the settings for the example user "aliased" above, it appears that bouncing users **may not** be getting their delivery disabled. The Delivery status is still set to Enabled.
What are your list(s) Bounce Processing settings.
Here are the Bounce Processing settings (at the time of this writing) from the example list in question (the list is referred to with a fake name for anonymity: example_list@mailman_server.example.local.domain):
Notify owner on bounce increment: No
Notify owner on disable: Yes
Notify owner on removal: Yes
Forward unrecognized bounces: List Admins
I might be able to try turning the Notify owner on bounce increment setting to Yes on a different example mailing list. In the meantime, I have not seen any new bounce files in the /opt/mailman/mm/var/queue/bounces directory. I'm not sure if that's a sign that things are working like usual now, or if something is wrong.
Previously, there were 34775 files in the /opt/mailman/mm/var/queue/bounces directory, ranging from June 7 to August 4. For background, there's typically a time in the summer where many user accounts are purposely disabled (generally near July-August, I think). From the bounce file dates, I wonder if this bounce-related issue has been happening for longer than I first realized.
Thank you for letting me know about the bin/mailman qfile feature! I've tried examining one of the latest past bounce files from a previous day.
Here is an edited version of that bounce file (this file relates to the example "disabled_user" from the bounce.log), below, if it helps.
Does the example bounce file below give any clues as to what might be happening in this situation?
[----- start pickle -----]
<----- start object 1 ----->
Received: by example_list.example.local.domain (Postfix)
id F41722406EC; Thu, 4 Aug 2022 11:34:08 -0700 (PDT)
Date: Thu, 4 Aug 2022 11:34:08 -0700 (PDT)
From: MAILER-DAEMON@example_list.example.local.domain (Mail Delivery System)
Subject: Undelivered Mail Returned to Sender
To: example_list-bounces@mailman_server.example.local.domain
Auto-Submitted: auto-replied
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
boundary="CF79C2404AD.1659638048/example_list.example.local.domain"
Content-Transfer-Encoding: 8bit
Message-Id: <20220804183408.F41722406EC@example_list.example.local.domain>
Message-ID-Hash: U47YLWMSX2KGN2H327NOLPXJIBK2BEO6
X-Message-ID-Hash: U47YLWMSX2KGN2H327NOLPXJIBK2BEO6
X-MailFrom: <>
This is a MIME-encapsulated message.
--CF79C2404AD.1659638048/example_list.example.local.domain
Content-Description: Notification
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
This is the mail system at host example_list.example.local.domain.
I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.
For further assistance, please send mail to postmaster.
If you do so, please include this problem report. You can
delete your own text from the attached returned message.
The mail system
<disabled_user@example.local.domain>: host aspmx.l.google.com[74.125.195.27] said:
550-5.2.1 The email account that you tried to reach is disabled. Learn more
at 550 5.2.1 https://support.google.com/mail/?p=DisabledUser
q14-20020a17090311ce00b0016f145ca846si1920603plh.98 - gsmtp (in reply to
RCPT TO command)
--CF79C2404AD.1659638048/example_list.example.local.domain
Content-Description: Delivery report
Content-Type: message/delivery-status
Reporting-MTA: dns; example_list.example.local.domain
X-Postfix-Queue-ID: CF79C2404AD
X-Postfix-Sender: rfc822; example_list-bounces@mailman_server.example.local.domain
Arrival-Date: Thu, 4 Aug 2022 11:34:06 -0700 (PDT)
Final-Recipient: rfc822; disabled_user@example.local.domain
Original-Recipient: rfc822;disabled_user@example.local.domain
Action: failed
Status: 5.2.1
Remote-MTA: dns; aspmx.l.google.com
Diagnostic-Code: smtp; 550-5.2.1 The email account that you tried to reach is
disabled. Learn more at 550 5.2.1
https://support.google.com/mail/?p=DisabledUser
q14-20020a17090311ce00b0016f145ca846si1920603plh.98 - gsmtp
Final-Recipient: rfc822; disabled_user@example.local.domain
Original-Recipient: rfc822;disabled_user@example.local.domain
Action: failed
Status: 5.2.1
Remote-MTA: dns; aspmx.l.google.com
Diagnostic-Code: smtp; 550-5.2.1 The email account that you tried to reach is
disabled. Learn more at 550 5.2.1
https://support.google.com/mail/?p=DisabledUser
q14-20020a17090311ce00b0016f145ca846si1920603plh.98 - gsmtp
Final-Recipient: rfc822; disabled_user@example.local.domain
Original-Recipient: rfc822;disabled_user@example.local.domain
Action: failed
Status: 5.2.1
Remote-MTA: dns; aspmx.l.google.com
Diagnostic-Code: smtp; 550-5.2.1 The email account that you tried to reach is
disabled. Learn more at 550 5.2.1
https://support.google.com/mail/?p=DisabledUser
q14-20020a17090311ce00b0016f145ca846si1920603plh.98 - gsmtp
Final-Recipient: rfc822; disabled_user@example.local.domain
Original-Recipient: rfc822;disabled_user@example.local.domain
Action: failed
Status: 5.2.1
Remote-MTA: dns; aspmx.l.google.com
Diagnostic-Code: smtp; 550-5.2.1 The email account that you tried to reach is
disabled. Learn more at 550 5.2.1
https://support.google.com/mail/?p=DisabledUser
q14-20020a17090311ce00b0016f145ca846si1920603plh.98 - gsmtp
--CF79C2404AD.1659638048/example_list.example.local.domain
Content-Description: Undelivered Message
Content-Type: message/rfc822
Content-Transfer-Encoding: 8bit
Return-Path: <example_list-bounces@mailman_server.example.local.domain>
Received: from example_list.example.local.domain (localhost [127.0.0.1])
by example_list.example.local.domain (Postfix) with ESMTP id CF79C2404AD;
Thu, 4 Aug 2022 11:34:06 -0700 (PDT)
Received: from example_list.example.local.domain (localhost [127.0.0.1])
by example_list.example.local.domain (Postfix) with ESMTP id 3C56324021E
for <example_list@mailman_server.example.local.domain>; Thu, 4 Aug 2022 11:30:01 -0700 (PDT)
Received: from example_list.example.local.domain (localhost [127.0.0.1])
by example_list.example.local.domain (Postfix) with ESMTP id 2DBBD2406DA
for <elem_cert@mailman_server.example.local.domain>; Thu, 4 Aug 2022 11:21:16 -0700 (PDT)
Received: from mail.applitrack.com (mail.applitrack.com [65.79.190.186])
by example_list.example.local.domain (Postfix) with SMTP id 314042404AD
for <all_cert@mailman_server.example.local.domain>; Thu, 4 Aug 2022 11:17:54 -0700 (PDT)
dkim-signature: v=1; a=rsa-sha256; d=external_location.somewhere; s=frontline5819;
c=relaxed/relaxed; q=dns/txt; h=From:Reply-To:Subject:Date:Message-ID:To:MIME-Version:Content-Type:Content-Transfer-Encoding;
bh=/BQy8uOjo6VP5i27AYv2tM47vx2ChfBLnBPKOKwXD4c=;
b=kGRiFywmQoL9T76olCFdOyvReOdBLSbBtgKFwaje1w6qh4y2UJD9xLyqPQ/FrZtKHmyB22EKofBinaDrgalvjcN2g8Otj88/OppGRVWEjsATTsf7QCgZ6eqEuduVswIfdOOzGEPXDn8GNHb95oGT2jCQpimuwTsog6M9OrJp/1k=
Received: from PHLAPTWEB21 (Unknown [172.25.51.49])
by mail.external_location.somewhere
; Thu, 4 Aug 2022 13:17:53 -0500
Message-ID: <AEF0FB74-B893-4AB1-BA9A-203F1B941B74@mail.applitrack.com>
MIME-Version: 1.0
From: "example_sender@example.local.domain" <MailBot@external_location.somewhere>
To: example_list@mailman_server.example.local.domain
Date: 4 Aug 2022 13:17:52 -0500
X-MailFrom: example@external_location.somewhere
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
X-Mailman-Version: 3.3.5
Precedence: list
Content-Type: multipart/mixed; boundary="===============6481302630449704969=="
X-MailFrom: all_cert-bounces+elem_cert=mailman_server.example.local.domain@mailman_server.example.local.domain
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
Message-ID-Hash: Z3TDTO5IXUSVTGVAYF4TVT4NGUW5IHAN
X-Message-ID-Hash: Z3TDTO5IXUSVTGVAYF4TVT4NGUW5IHAN
X-MailFrom: elem_cert-bounces+example_list=mailman_server.example.local.domain@mailman_server.example.local.domain
X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address; member-moderation; nonmember-moderation; administrivia; implicit-dest; max-recipients; max-size; news-moderation; no-subject; digests; suspicious-header
Reply-To: example_sender@example.local.domain
Subject: [example_list] Example Subject
List-Id: Example List <example_list.mailman_server.example.local.domain>
List-Help: <mailto:example_list-request@mailman_server.example.local.domain?subject=help>
List-Owner: <mailto:example_list-owner@mailman_server.example.local.domain>
List-Post: <mailto:example_list@mailman_server.example.local.domain>
List-Subscribe: <mailto:example_list-join@mailman_server.example.local.domain>
List-Unsubscribe: <mailto:example_list-leave@mailman_server.example.local.domain>
--===============6481302630449704969==
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
Example Body Text
_______________________________________________
example_list mailing list -- example_list@mailman_server.example.local.domain
To unsubscribe send an email to example_list-leave@mailman_server.example.local.domain
--===============6481302630449704969==--
--CF79C2404AD.1659638048/example_list.example.local.domain--
<----- start object 2 ----->
{ '_parsemsg': False,
'listid': 'example_list.mailman_server.example.local.domain',
'original_size': 16103,
'received_time': datetime.datetime(2022, 8, 4, 18, 44, 7, 891417),
'subaddress': 'bounces',
'version': 3}
[----- end pickle -----]
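The dump above repeats the same per-recipient block several times. As a rough way to see what a DSN like this boils down to, the sketch below counts the per-recipient blocks by recipient, action and status using only Python's standard email package. The function name summarize_dsn and the shortened sample message are made up for illustration, and the sketch works on the raw message text rather than on a Mailman queue file.

```python
from collections import Counter
from email import message_from_string


def summarize_dsn(raw_message):
    """Count per-recipient DSN blocks keyed by (recipient, action, status)."""
    msg = message_from_string(raw_message)
    counts = Counter()
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        # Each header block of the report is parsed as its own Message.
        for block in part.get_payload():
            recipient = block.get("Final-Recipient")
            if recipient is None:
                continue  # per-message block (Reporting-MTA and so on)
            address = recipient.split(";", 1)[-1].strip()
            action = block.get("Action", "").strip()
            status = block.get("Status", "").strip()
            counts[(address, action, status)] += 1
    return counts


# Shortened stand-in for the bounce shown in this thread.
SAMPLE = """\
From: MAILER-DAEMON@example.local.domain
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="B1"

--B1
Content-Type: message/delivery-status

Reporting-MTA: dns; example_list.example.local.domain

Final-Recipient: rfc822; disabled_user@example.local.domain
Action: failed
Status: 5.2.1

Final-Recipient: rfc822; disabled_user@example.local.domain
Action: failed
Status: 5.2.1

--B1--
"""

if __name__ == "__main__":
    for key, seen in summarize_dsn(SAMPLE).items():
        print(key, "blocks:", seen)
```

However many times the block is repeated, it still describes one permanently failing address (Status 5.2.1, Action failed).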
Thank you! Andy
On 8/5/22 14:03, summersan@nclack.k12.or.us wrote:
Previously, there were 34775 files in the /opt/mailman/mm/var/queue/bounces directory, ranging from June 7 to August 4. For background, there's typically a time in the summer where many user accounts are purposely disabled (generally near July-August, I think). From the bounce file dates, I wonder if this bounce-related issue has been happening for longer than I first realized.
The issue is that sometimes the bounce runner dies. This can be from an OOM error or some other reason, and I have observed it even on systems I admin. The master SHOULD restart it, but it doesn't. See https://gitlab.com/mailman/mailman/-/issues/898.
Anyway, it appears your bounce runner died sometime on or before June 7 (if you have syslog from then it might have a clue), and you didn't stop and start Mailman core until August 4, when the bounce runner started again and began chugging through its queue.
I.e., I think everything in your installation is OK and the excessive load was just due to processing the 34775 queued bounces that should have been processed gradually over the previous 2 months. 500 bounces per day seems like a lot, but if you have a lot of lists, and bouncing users aren't being disabled because the bounce runner isn't running, this may be normal.
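For a rough check of the per-day figure, the arithmetic from the numbers in this thread can be worked out as below. The year 2022 is taken from the message dates, and the helper name bounces_per_day is made up.

```python
from datetime import date

QUEUED_BOUNCES = 34775          # files reported in the bounces queue
FIRST_FILE = date(2022, 6, 7)   # oldest queued file
LAST_FILE = date(2022, 8, 4)    # newest queued file


def bounces_per_day(total, first, last):
    """Average number of queued bounces per day between two dates."""
    return total / (last - first).days


if __name__ == "__main__":
    days = (LAST_FILE - FIRST_FILE).days
    rate = bounces_per_day(QUEUED_BOUNCES, FIRST_FILE, LAST_FILE)
    print(f"{days} days, about {rate:.0f} bounces per day")
```

That works out to 58 days and roughly 600 bounces per day, the same order of magnitude as the ballpark above.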
-- Mark Sapiro <mark@msapiro.net>
Okay, that makes more sense (if that's the case)!
Thank you for the background about the bounce runner.
Okay, I've tried updating that Notify owner on bounce increment: No setting to Notify owner on bounce increment: Yes on a different, testing mailing list.
When I subscribed two example users to the list, disabled the email account of one of the example users, and sent an email to the testing mailing list, I got the following type of email (edited):
Subject:
example_user_2@local.domain's bounce score incremented on example_list
Body:
example_user_2@local.domain's bounce score on example_list@lists.local.domain has been incremented.
The triggering DSN if available is attached.
---------- Forwarded message ----------
From: Mail Delivery System <MAILER-DAEMON@mailman_server.local.domain>
To: example_list-bounces+example_user_2=local.domain@lists.local.domain
Cc:
Bcc:
Date: Fri, 5 Aug 2022 14:13:42 -0700 (PDT)
Subject: Undelivered Mail Returned to Sender
This is the mail system at host mailman_server.local.domain.
I'm sorry to have to inform you that your message could not
be delivered to one or more recipients. It's attached below.
For further assistance, please send mail to postmaster.
If you do so, please include this problem report. You can
delete your own text from the attached returned message.
The mail system
<example_user_2@local.domain>: host aspmx.l.google.com[173.194.202.27] said:
550-5.2.1 The email account that you tried to reach is disabled. Learn more
at 550 5.2.1 https://support.google.com/mail/?p=DisabledUser
k26-20020a63561aexample_listexample_listb0041296bc96b6si3982888pgb.268 - gsmtp (in reply to
RCPT TO command)
Also, when I checked the /opt/mailman/mm/var/queue/bounces/ directory, I saw a new file there related to this test bounce. When I checked the directory a little while later, the file was no longer there.
This makes it seem like things are processing, but now I see extended > 70 %CPU usage again for the related mailman python3 command in top.
Here's an example of strace output from that high CPU time:
poll([{fd=21, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=21, revents=POLLIN}])
recvfrom(21, "\27\3\3\0R", 5, 0, NULL, NULL) = 5
recvfrom(21, "%u\340\0\337W\375I1\204P\363\237\272\253\22\16\220\260Z\207\267>\350\0\217\24\333Em\345\253"..., 82, 0, NULL, NULL) = 82
sendto(21, "\27\3\3\0\334\24a\334F\16\223\207\317Zp3\25Md\275\237\302\323m\263K\346\353\223W(1"..., 225, MSG_NOSIGNAL, NULL, 0) = 225
But now, a few minutes later, the %CPU is down to a more expected level (< 10 %CPU).
When I check strace again, here is some example output:
lseek(22, 0, SEEK_CUR) = 0
read(22, "\200\5\225\25o\0\0\0\0\0\0\214\25mailman.email.messa"..., 4096) = 4096
read(22, "ChtmRrvIKnCG61VyFUDF+Rg9QFohd5og"..., 20480) = 20480
read(22, "gin-bottom:=\r\n10.0pt;=0D=0A\tmarg"..., 4096) = 3872
close(22) = 0
openat(AT_FDCWD, "/opt/mailman/mm/var/messages/BQ/SK/BQSK2L2B6GAF7W6MXGLTTGDUUXU5WHKT", O_RDONLY|O_CLOEXEC) = 22
fstat(22, {st_mode=S_IFREG|0660, st_size=28485, ...}) = 0
ioctl(22, TCGETS, 0x7ffcb2617920) = -1 ENOTTY (Inappropriate ioctl for device)
I don't know if this info helps, but I wonder if things are working as expected at this point?
On 8/5/22 14:30, summersan@nclack.k12.or.us wrote:
I don't know if this info helps, but I wonder if things are working as expected at this point?
As I said in my prior reply, I think so.
You may consider setting up a cron job that runs, say, hourly to check whether the bounce runner is running and whether the bounce queue has more than, say, 10 entries, and that notifies you if the runner is down or the queue is backing up.
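A minimal sketch of such a check is below, under a few assumptions that are worth verifying on your own system: that queue entries are stored as .pck files, that the bounce runner's command line contains --runner=bounce, and that cron mails any output to the job's owner. The queue path is the one from this thread; the function names are made up.

```python
import os
import subprocess

QUEUE_DIR = "/opt/mailman/mm/var/queue/bounces"  # path used in this thread
MAX_QUEUED = 10


def count_queued(queue_dir):
    """Count queue entries; assumes each entry is a file ending in .pck."""
    try:
        return sum(1 for name in os.listdir(queue_dir) if name.endswith(".pck"))
    except FileNotFoundError:
        return 0


def bounce_runner_alive(ps_output):
    """Assumes the runner's command line contains '--runner=bounce'."""
    return any("--runner=bounce" in line for line in ps_output.splitlines())


def read_process_list():
    """Return full command lines of all processes, or '' if ps fails."""
    try:
        done = subprocess.run(["ps", "-eo", "args"],
                              capture_output=True, text=True, check=False)
    except OSError:
        return ""  # treated as "runner not found", which errs toward alerting
    return done.stdout


def problems(queued, runner_alive, max_queued=MAX_QUEUED):
    """Return a list of human-readable problems; empty means all is well."""
    found = []
    if not runner_alive:
        found.append("bounce runner is not running")
    if queued > max_queued:
        found.append(f"{queued} entries in the bounce queue (limit {max_queued})")
    return found


if __name__ == "__main__":
    alive = bounce_runner_alive(read_process_list())
    # Print only when something is wrong, so a healthy run produces no mail.
    for line in problems(count_queued(QUEUE_DIR), alive):
        print(line)
```

Run hourly from the mailman user's crontab, this stays silent when the runner is up and the queue is short.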
-- Mark Sapiro <mark@msapiro.net>
Thank you for the cron-related ideas for checking the bounce runner and/or the bounce queue!
Thanks again for your help in this thread, Andy