messages stuck in the bad queue
I am working on a mail server that was migrated from mailman V2 to mailman V3 in February. I just noticed that there are over 1000 messages in the bad queue. All of the messages in the bad queue are dated 2/2/23 and all have a timestamp within a one-hour window of each other.
I used mailman qfile to inspect five random emails in the bad queue and so far they are all "successfully subscribed" emails, but I presume there are no guarantees there aren't others mixed in there that might be legitimate emails (ie. I just unshunted 140 emails in the shunt queue and they were all good emails and all got processed).
I figure that the easiest way to inspect these 1000 emails is just to have them re-delivered. I tried moving one from the bad queue to the shunt queue and I ran "mailman unshunt" but nothing happened.
Is there a way to reprocess the bad queue?
Also, what exactly is the bad queue?
Thank you, Ken KA6KEN (happy field day weekend)
On 6/24/23 9:54 PM, Ken Alker wrote:
I am working on a mail server that was migrated from mailman V2 to mailman V3 in February. I just noticed that there are over 1000 messages in the bad queue. All of the messages in the bad queue are dated 2/2/23 and all have a timestamp within a one-hour window of each other.
I used mailman qfile to inspect five random emails in the bad queue and so far they are all "successfully subscribed" emails,
This is a bit strange. See below.
but I presume there are no guarantees there aren't others mixed in there that might be legitimate emails (ie. I just unshunted 140 emails in the shunt queue and they were all good emails and all got processed).
Presumably these were shunted due to some issue that was subsequently fixed.
I figure that the easiest way to inspect these 1000 emails is just to have them re-delivered. I tried moving one from the bad queue to the shunt queue and I ran "mailman unshunt" but nothing happened.
What does nothing happened mean? "mailman unshunt" should move the message to the original queue which was stored in the 'whichq' attribute in the msgdata when the message was shunted. Since this wasn't a shunted message, there's no 'whichq' attribute so it goes to the 'in' queue. I.e., "mailman unshunt" would have moved the message from the 'shunt' queue to the 'in' queue. If the message wound up back in the shunt queue, there should be messages in mailman.log indicating why.
Is there a way to reprocess the bad queue?
You could just move the messages to the 'in' queue.
Also, what exactly is the bad queue?
If Mailman's content filtering is enabled and Filter Action is Preserve, messages which have no remaining content after content filtering are put in the 'bad' queue. These are the only messages put there. When this happens there should be a log message like
<message-id> preserved in file base <queue_file>
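If the logs from that day still exist, a search along these lines would surface those entries. This is only a sketch: the function name `find_preserved` is hypothetical, and the default log directory `var/logs` (relative to Mailman's base directory) is an assumption that may differ on your install.

```shell
# find_preserved [LOG_DIR]
# Hypothetical helper: print every log line that records a message being
# preserved in the 'bad' queue. LOG_DIR defaults to var/logs, which is an
# assumption; pass your real log directory if it differs.
find_preserved() {
    log_dir="${1:-var/logs}"
    # -r: search all files under the directory; -h: omit file names
    grep -rh 'preserved in file base' "$log_dir"
}
```

For example, `find_preserved /opt/mailman/mm/var/logs` would print the matching lines, if any. Rotated logs that have been compressed would need to be decompressed first.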
It is unclear to me how these 1000+ messages wound up in the 'bad' queue. If you have logs from Feb 2, they might help. My guess is there was some MTA misrouting that caused these list welcome messages (from some mass subscribe?) to be rerouted to the list posting address combined with some bad content filtering settings that removed all the content, but that seems pretty far fetched.
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
--On Sunday, June 25, 2023 9:18 AM -0700 Mark Sapiro <mark@msapiro.net> wrote:
On 6/24/23 9:54 PM, Ken Alker wrote:
I am working on a mail server that was migrated from mailman V2 to mailman V3 in February. I just noticed that there are over 1000 messages in the bad queue. All of the messages in the bad queue are dated 2/2/23 and all have a timestamp within a one-hour window of each other.
I used mailman qfile to inspect five random emails in the bad queue and so far they are all "successfully subscribed" emails,
This is a bit strange. See below.
but I presume there are no guarantees there aren't others mixed in there that might be legitimate emails (ie. I just unshunted 140 emails in the shunt queue and they were all good emails and all got processed).
Presumably these were shunted due to some issue that was subsequently fixed.
That is my assumption. I didn't unshunt them until I had V3.3.8 completed, figuring the previous version/install had some bug/mistake. Fortunately, they all got processed (with one problem, which I brought up in another thread).
I figure that the easiest way to inspect these 1000 emails is just to have them re-delivered. I tried moving one from the bad queue to the shunt queue and I ran "mailman unshunt" but nothing happened.
What does nothing happened mean? "mailman unshunt" should move the message to the original queue which was stored in the 'whichq' attribute in the msgdata when the message was shunted. Since this wasn't a shunted message, there's no 'whichq' attribute so it goes to the 'in' queue. I.e., "mailman unshunt" would have moved the message from the 'shunt' queue to the 'in' queue. If the message wound up back in the shunt queue, there should be messages in mailman.log indicating why.
I moved the .psv file from the bad queue into the shunt queue. I then ran "mailman unshunt" (as user 'mailman' while in the virtual environment). I tailed mailman.log during this process and no logs were spit out. The date stamp on the .psv file never changed (maybe it does not when being moved between queues?) and, AFAICT, the file never moved from the shunt queue. I waited maybe five minutes, tops.
Is there a way to reprocess the bad queue?
You could just move the messages to the 'in' queue.
I just now tried moving the same message into the 'in' queue but, again, nothing happened. I left it in there for five minutes. Do I have to run a program to get it to act on the 'in' queue (I presume that there is a "runner" that is always looking and taking care of this already as I presume this is the queue where all 'normal' traffic is handled).
Here are the (obfuscated) results of "mailman qfile /opt/mailman/mm/var/queue/shunt/1675389793.6945386+2aeaf0015558c9d8380c96142c3fa9d03a8142bc.psv" (the message I was experimenting with):
[----- start pickle -----]
<----- start object 1 ----->
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: THE-list subscription notification
From: email@obfuscated.com
To: the-list-owner@lists.obfuscated.com
Message-ID: <167538979369.2390432.521344217673230839@obfuscated.net>
Date: Thu, 02 Feb 2023 18:03:13 -0800
Precedence: bulk

user@domain.tld has been successfully subscribed to THE-list.

<----- start object 2 ----->
{ '_parsemsg': False, 'envsender': 'email@obfuscated.com', 'listid': 'the-list.obfusacted.com', 'nodecorate': True, 'recipients': {'person@domain.tld'}, 'reduced_list_headers': True, 'version': 3}
[----- end pickle -----]
Also, what exactly is the bad queue?
If Mailman's content filtering is enabled and Filter Action is Preserve, messages which have no remaining content after content filtering are put in the 'bad' queue. These are the only messages put there. When this happens there should be a log message like
<message-id> preserved in file base <queue_file>
It is unclear to me how these 1000+ messages wound up in the 'bad' queue. If you have logs from Feb 2, they might help. My guess is there was some MTA misrouting that caused these list welcome messages (from some mass subscribe?) to be rerouted to the list posting address combined with some bad content filtering settings that removed all the content, but that seems pretty far fetched.
Unfortunately, I don't have logs going that far back. I don't think your concept is far fetched. I'm 99% sure this was the day that the migration from V2 to V3 took place so this was certainly the result of some type of mass-subscription-import into V3. The V3 that was installed or the install itself definitely had problems, which is why I just did an upgrade/overhaul to V3.3.8 this past week. Many/most of the strange issues that were occurring seem to be cleared up.
On 6/25/23 1:31 PM, Ken Alker wrote:
I moved the .psv file from the bad queue into the shunt queue. I then ran "mailman unshunt" (as user 'mailman' while in the virtual environment). I tailed mailman.log during this process and no logs were spit out. The date stamp on the .psv file never changed (maybe it does not when being moved between queues?) and, AFAICT, the file never moved from the shunt queue. I waited maybe five minutes, tops.
I forgot that the extension in the bad queue is .psv (for preserve). You have to rename it to .pck.
Is there a way to reprocess the bad queue?
You could just move the messages to the 'in' queue.
Changing .psv to .pck in the process. This can be done, e.g., by
for file in var/queue/bad/*.psv; do mv "$file" "var/queue/in/$(basename "$file" .psv).pck"; done
I just now tried moving the same message into the 'in' queue but, again, nothing happened. I left it in there for five minutes. Do I have to run a program to get it to act on the 'in' queue (I presume that there is a "runner" that is always looking and taking care of this already as I presume this is the queue where all 'normal' traffic is handled).
Again, you need to change the extension to .pck.
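The rename-and-move step can also be wrapped in a small function, which makes it easy to try on a scratch directory before touching the live queues. This is a sketch only: `requeue_bad` is a hypothetical name, and it assumes both directories are on the same filesystem and that it is run as the Mailman user.

```shell
# requeue_bad SRC_DIR DST_DIR
# Hypothetical helper: move every .psv file in SRC_DIR to DST_DIR,
# changing the extension to .pck so a runner will pick it up.
requeue_bad() {
    src="$1"
    dst="$2"
    for file in "$src"/*.psv; do
        # If no file matched, the pattern is left unexpanded; skip it.
        [ -e "$file" ] || continue
        # Strip the directory and the .psv suffix from the name.
        base=$(basename "$file" .psv)
        mv "$file" "$dst/$base.pck"
    done
}
```

From Mailman's base directory this would be called as `requeue_bad var/queue/bad var/queue/in`.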
Here are the (obfuscated) results of "mailman qfile /opt/mailman/mm/var/queue/shunt/1675389793.6945386+2aeaf0015558c9d8380c96142c3fa9d03a8142bc.psv" (the message I was experimenting with):
[----- start pickle -----]
<----- start object 1 ----->
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: THE-list subscription notification
From: email@obfuscated.com
To: the-list-owner@lists.obfuscated.com
Message-ID: <167538979369.2390432.521344217673230839@obfuscated.net>
Date: Thu, 02 Feb 2023 18:03:13 -0800
Precedence: bulk

user@domain.tld has been successfully subscribed to THE-list.

<----- start object 2 ----->
{ '_parsemsg': False, 'envsender': 'email@obfuscated.com', 'listid': 'the-list.obfusacted.com', 'nodecorate': True, 'recipients': {'person@domain.tld'}, 'reduced_list_headers': True, 'version': 3}
[----- end pickle -----]
I was assuming these were list welcome messages, which they aren't. They are owner notifications, so they were sent to the -owner address and processed through the owner-pipeline, but unless the Debian package changed the default-owner-pipeline, it doesn't contain mime-delete, so how they ended up in the bad queue is still a mystery.
However, what would be the point of resending these at this time?
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
--On Sunday, June 25, 2023 2:47 PM -0700 Mark Sapiro <mark@msapiro.net> wrote:
On 6/25/23 1:31 PM, Ken Alker wrote:
I moved the .psv file from the bad queue into the shunt queue. I then ran "mailman unshunt" (as user 'mailman' while in the virtual environment). I tailed mailman.log during this process and no logs were spit out. The date stamp on the .psv file never changed (maybe it does not when being moved between queues?) and, AFAICT, the file never moved from the shunt queue. I waited maybe five minutes, tops.
I forgot that the extension in the bad queue is .psv (for preserve). You have to rename it to .pck.
Ah! I noticed the file extension difference, but only for a fleeting moment, and I convinced myself during that moment that I must have looked at the original one (.pck) incorrectly (even though I knew it stood for "pickle"), so the vaguely-noted discrepancy never made it to the top of the stack. It eventually may have, but I still wouldn't have known how that would affect the queued file. That solves that issue.
Is there a way to reprocess the bad queue?
You could just move the messages to the 'in' queue.
Changing .psv to .pck in the process. This can be done, e.g., by
for file in var/queue/bad/*.psv; do mv "$file" "var/queue/in/$(basename "$file" .psv).pck"; done
I moved one message by hand, and, wow.. the runner just gobbles it up... fast!
I just now tried moving the same message into the 'in' queue but, again, nothing happened. I left it in there for five minutes. Do I have to run a program to get it to act on the 'in' queue (I presume that there is a "runner" that is always looking and taking care of this already as I presume this is the queue where all 'normal' traffic is handled).
Again, you need to change the extension to .pck.
Here are the (obfuscated) results of "mailman qfile /opt/mailman/mm/var/queue/shunt/1675389793.6945386+2aeaf0015558c9d8380c96142c3fa9d03a8142bc.psv" (the message I was experimenting with):
[----- start pickle -----]
<----- start object 1 ----->
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: THE-list subscription notification
From: email@obfuscated.com
To: the-list-owner@lists.obfuscated.com
Message-ID: <167538979369.2390432.521344217673230839@obfuscated.net>
Date: Thu, 02 Feb 2023 18:03:13 -0800
Precedence: bulk

user@domain.tld has been successfully subscribed to THE-list.

<----- start object 2 ----->
{ '_parsemsg': False, 'envsender': 'email@obfuscated.com', 'listid': 'the-list.obfusacted.com', 'nodecorate': True, 'recipients': {'person@domain.tld'}, 'reduced_list_headers': True, 'version': 3}
[----- end pickle -----]
I was assuming these were list welcome messages, which they aren't. They are owner notifications, so they were sent to the -owner address and processed through the owner-pipeline, but unless the Debian package changed the default-owner-pipeline, it doesn't contain mime-delete, so how they ended up in the bad queue is still a mystery.
However, what would be the point of resending these at this time?
Because I didn't know if they were ALL subscribe notices or if there were other people's emails mixed in. I felt, at the time, that it would be easier to requeue them and then be able to manipulate the results in my email client (ie. sort by subject) than to try to do it at the Unix level. That said, since there is more than one queue I'd have to move them into, the sorting process would require manipulating/viewing the messages at the Unix level first anyway and if I have to do that I might as well just inspect them all at the Unix level and skip the re-injection, as you noted. Also, after countless hours of learning over the past two days, I'm more qualified to do that now.
So, I modified your suggested script thusly:
for file in var/queue/bad/*.psv
do
mailman qfile "$file" >>results
done
and then compared the number of times "subscription notification" appeared in the results file (grep subscription\ notification results | wc) to the total numbers of "bad" messages (ls var/queue/bad | wc) and they matched (1142 of each). And just for fun and as a second check, I did a "grep subscription results | sort | uniq" and ended up with three lines of output; one for each mailing list import, and nothing more. It's things like that that really make me appreciate and enjoy the Unix shell.
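The count comparison described above could be scripted roughly as follows. This is a sketch under a few assumptions: `check_all_notifications` is a hypothetical name, `results` is the file produced by the loop above, and each message contributes exactly one line containing "subscription notification" (its Subject header).

```shell
# check_all_notifications RESULTS_FILE BAD_DIR
# Hypothetical helper: succeed only if the number of lines mentioning
# "subscription notification" in RESULTS_FILE equals the number of
# files in BAD_DIR.
check_all_notifications() {
    results="$1"
    bad_dir="$2"
    # grep -c prints the number of matching lines (0 if none).
    matches=$(grep -c 'subscription notification' "$results" || true)
    # Count the directory entries; tr strips padding some wc versions add.
    total=$(ls "$bad_dir" | wc -l | tr -d ' ')
    echo "notifications=$matches files=$total"
    [ "$matches" -eq "$total" ]
}
```

Called as `check_all_notifications results var/queue/bad`, it prints both counts and returns a non-zero status if they differ, which would mean something other than a subscription notice is in the queue.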
Thanks for the wisdom and for a slice of your time; I'm learning a lot and I appreciate the help.
Ken Alker KA6KEN