--On Sunday, June 25, 2023 9:18 AM -0700 Mark Sapiro <mark@msapiro.net> wrote:
On 6/24/23 9:54 PM, Ken Alker wrote:
I am working on a mail server that was migrated from mailman V2 to mailman V3 in February. I just noticed that there are over 1000 messages in the bad queue. All of the messages in the bad queue are dated 2/2/23 and all have a timestamp within a one-hour window of each other.
I used mailman qfile to inspect five random emails in the bad queue and so far they are all "successfully subscribed" emails,
This is a bit strange. See below.
but I presume there are no guarantees there aren't others mixed in there that might be legitimate emails (i.e., I just unshunted 140 emails in the shunt queue and they were all good emails and all got processed).
Presumably these were shunted due to some issue that was subsequently fixed.
That is my assumption. I didn't unshunt them until I had V3.3.8 completed, figuring the previous version/install had some bug/mistake. Fortunately, they all got processed (with one problem, which I brought up in another thread).
I figure that the easiest way to inspect these 1000 emails is just to have them re-delivered. I tried moving one from the bad queue to the shunt queue and I ran "mailman unshunt" but nothing happened.
What does "nothing happened" mean? "mailman unshunt" should move the message to the original queue, which was stored in the 'whichq' attribute in the msgdata when the message was shunted. Since this wasn't a shunted message, there's no 'whichq' attribute, so it goes to the 'in' queue. I.e., "mailman unshunt" would have moved the message from the 'shunt' queue to the 'in' queue. If the message wound up back in the shunt queue, there should be messages in mailman.log indicating why.
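The queue selection described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (use 'whichq' if present, otherwise 'in'); the function name destination_queue is made up for this example and is not Mailman's API.

```python
def destination_queue(msgdata, default='in'):
    """Pick the queue an unshunted message should go to.

    Hypothetical helper: follows the rule described in the thread,
    i.e. use the 'whichq' value saved at shunt time, else fall back
    to the 'in' queue.
    """
    return msgdata.get('whichq', default)


# msgdata as shown by "mailman qfile" for the message in this thread:
# it has no 'whichq' key because the message was never shunted.
msgdata = {
    '_parsemsg': False,
    'nodecorate': True,
    'reduced_list_headers': True,
    'version': 3,
}
print(destination_queue(msgdata))
```

With a msgdata that does carry a 'whichq' entry, the same call returns that saved queue name instead.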
I moved the .psv file from the bad queue into the shunt queue. I then ran "mailman unshunt" (as user 'mailman' while in the virtual environment). I tailed mailman.log during this process and no logs were spit out. The date stamp on the .psv file never changed (maybe it does not when being moved between queues?) and, AFAICT, the file never moved from the shunt queue. I waited maybe five minutes, tops.
Is there a way to reprocess the bad queue?
You could just move the messages to the 'in' queue.
I just now tried moving the same message into the 'in' queue but, again, nothing happened. I left it in there for five minutes. Do I have to run a program to get it to act on the 'in' queue? (I presume that there is a "runner" that is always looking and taking care of this already, as I presume this is the queue where all 'normal' traffic is handled.)
Here are the (obfuscated) results of "mailman qfile /opt/mailman/mm/var/queue/shunt/1675389793.6945386+2aeaf0015558c9d8380c96142c3fa9d03a8142bc.psv" (the message I was experimenting with):
[----- start pickle -----]
<----- start object 1 ----->
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Subject: THE-list subscription notification
From: email@obfuscated.com
To: the-list-owner@lists.obfuscated.com
Message-ID: <167538979369.2390432.521344217673230839@obfuscated.net>
Date: Thu, 02 Feb 2023 18:03:13 -0800
Precedence: bulk

user@domain.tld has been successfully subscribed to THE-list.

<----- start object 2 ----->
{ '_parsemsg': False,
  'envsender': 'email@obfuscated.com',
  'listid': 'the-list.obfusacted.com',
  'nodecorate': True,
  'recipients': {'person@domain.tld'},
  'reduced_list_headers': True,
  'version': 3}
[----- end pickle -----]
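The file name passed to "mailman qfile" above starts with a number that looks like a Unix timestamp, and it lines up with the Date: header of the message. A minimal sketch for checking when a batch of queue files was created, assuming (this is an assumption, not something confirmed in this thread) that the part before the '+' is always epoch seconds; queue_file_time is a made-up helper name:

```python
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath


def queue_file_time(path):
    """Return the UTC time encoded at the start of a queue file name.

    Assumes names of the form <epoch seconds>+<hash>.<ext>, as in the
    file shown in this thread.
    """
    name = PurePosixPath(path).name
    seconds = float(name.split('+', 1)[0])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


path = ('/opt/mailman/mm/var/queue/shunt/'
        '1675389793.6945386+2aeaf0015558c9d8380c96142c3fa9d03a8142bc.psv')

# Convert to the -0800 offset used in the message's Date: header.
pst = timezone(timedelta(hours=-8))
local = queue_file_time(path).astimezone(pst).replace(microsecond=0)
print(local.isoformat())  # 2023-02-02T18:03:13-08:00
```

Running the same function over every file name in a queue directory and taking the minimum and maximum would show whether all the files really fall within a one-hour window.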
Also, what exactly is the bad queue?
If Mailman's content filtering is enabled and Filter Action is Preserve, messages which have no remaining content after content filtering are put in the 'bad' queue. These are the only messages put there. When this happens there should be a log message like
<message-id> preserved in file base <queue_file>
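If logs from the relevant period are available, entries of this shape can be picked out with a short script. This is a rough sketch: it relies only on the phrase "preserved in file base" quoted above, the timestamp and process-id prefix on the sample line are invented for illustration, and the real log layout may differ.

```python
import re

# Matches "<message-id> preserved in file base <queue_file>" anywhere
# in a log line; whatever precedes the message-id is ignored.
PRESERVED = re.compile(
    r'(?P<msgid>\S+) preserved in file base (?P<filebase>\S+)')


def preserved_entries(lines):
    """Yield (message_id, file_base) for each matching log line."""
    for line in lines:
        match = PRESERVED.search(line)
        if match:
            yield match.group('msgid'), match.group('filebase')


# Hypothetical log lines; only the "preserved in file base" wording
# comes from the thread.
sample = [
    'Feb 02 18:03:13 2023 (1234) <abc@example.com> preserved in file base '
    '1675389793.6945386+2aeaf0015558c9d8380c96142c3fa9d03a8142bc',
    'Feb 02 18:03:14 2023 (1234) some unrelated log entry',
]
entries = list(preserved_entries(sample))
print(entries)
```

In practice the lines would come from iterating over the open mailman.log file rather than from a hard-coded list.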
It is unclear to me how these 1000+ messages wound up in the 'bad' queue. If you have logs from Feb 2, they might help. My guess is there was some MTA misrouting that caused these list welcome messages (from some mass subscribe?) to be rerouted to the list posting address, combined with some bad content filtering settings that removed all the content, but that seems pretty far-fetched.
Unfortunately, I don't have logs going that far back. I don't think your concept is far-fetched. I'm 99% sure this was the day that the migration from V2 to V3 took place, so this was certainly the result of some type of mass-subscription import into V3. The V3 that was installed, or the install itself, definitely had problems, which is why I just did an upgrade/overhaul to V3.3.8 this past week. Many/most of the strange issues that were occurring seem to be cleared up.