On 7/26/24 4:01 PM, Mark Sapiro wrote:
Yes, I suspect this bad date is part of the issue.
I would do a few things. First,
SELECT message_id FROM hyperkitty_email WHERE thread_id = 35670;
mailmanweb=> SELECT message_id FROM hyperkitty_email WHERE thread_id = 35670;
 message_id
 200204151506953.SM01412@sunchar.com
Yep, that matches the bad date message ID.
If this returns just the one message_id, I would then
DELETE FROM hyperkitty_thread WHERE id = 35670;
DELETE FROM hyperkitty_email WHERE id = 150555;
Both statements fail with a foreign key constraint error involving another table. I'm not familiar enough with the DB structure to be certain how to clean or fix it.
mailmanweb=> DELETE FROM hyperkitty_thread WHERE id = 35670;
ERROR: update or delete on table "hyperkitty_thread" violates foreign key constraint "hyperkitty_lastview_thread_id_5bd4f0ad_fk_hyperkitty_thread_id" on table "hyperkitty_lastview"
DETAIL: Key (id)=(35670) is still referenced from table "hyperkitty_lastview".
mailmanweb=> DELETE FROM hyperkitty_email WHERE id = 150555;
ERROR: update or delete on table "hyperkitty_email" violates foreign key constraint "hyperkitty_thread_starting_email_id_fa7c55f5_fk_hyperkitt" on table "hyperkitty_thread"
DETAIL: Key (id)=(150555) is still referenced from table "hyperkitty_thread".
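The two errors above describe a circular dependency: hyperkitty_lastview points at the thread, and the thread's starting_email_id points at the email. Below is a minimal sketch of removing such rows together, written in Python against a toy in-memory SQLite schema rather than the real HyperKitty tables. The table names `thread`, `email` and `lastview` and the helpers `make_db` and `delete_thread` are hypothetical stand-ins, the email-to-thread foreign key is an assumption, and the real schema may have further referencing tables that the errors do not show. It also assumes the constraints are deferrable, so they are only checked at commit.

```python
import sqlite3


def make_db():
    """Build a toy schema mirroring only the constraints named in the errors."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE thread (
            id INTEGER PRIMARY KEY,
            starting_email_id INTEGER
                REFERENCES email(id) DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE email (
            id INTEGER PRIMARY KEY,
            thread_id INTEGER
                REFERENCES thread(id) DEFERRABLE INITIALLY DEFERRED
        );
        CREATE TABLE lastview (
            id INTEGER PRIMARY KEY,
            thread_id INTEGER
                REFERENCES thread(id) DEFERRABLE INITIALLY DEFERRED
        );
    """)
    # One thread, its starting email, and one "last viewed" record.
    conn.execute("INSERT INTO thread VALUES (35670, 150555)")
    conn.execute("INSERT INTO email VALUES (150555, 35670)")
    conn.execute("INSERT INTO lastview VALUES (1, 35670)")
    conn.commit()
    return conn


def delete_thread(conn, thread_id):
    """Delete a thread and the rows that reference it in one transaction."""
    try:
        conn.execute("DELETE FROM lastview WHERE thread_id = ?", (thread_id,))
        conn.execute("DELETE FROM email WHERE thread_id = ?", (thread_id,))
        conn.execute("DELETE FROM thread WHERE id = ?", (thread_id,))
        conn.commit()  # deferred FK checks run here
    except sqlite3.IntegrityError:
        conn.rollback()  # leave the data untouched if anything still references it
        raise


conn = make_db()
delete_thread(conn, 35670)
```

In psql the same idea would be to delete the hyperkitty_lastview rows for the thread first and wrap all the deletes in a single BEGIN ... COMMIT, again assuming the constraints are deferrable, and only after taking a backup.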
and see if that fixes the internal server error when accessing the archive. I would also
SELECT * FROM hyperkitty_email WHERE id = 136872;
mailmanweb=> SELECT * FROM hyperkitty_email WHERE id = 136872;
id              | 136872
message_id      | 200001010102.BAA05112@genesis.domino.org
message_id_hash | SGZJVBCSRYSHGHR4BB2632INFGYWOJYH
subject         | UK GMT roll over
content         | +
                | GMT and CET rolled over without any major incidents. The cellular+
                | networks were busy but thats to be expected. +
                | +
                | Regards, +
                | Neil. +
                | -- +
                | Neil J. McRae - Alive and Kicking. +
                | neil@DOMINO.ORG +
                | +
                |
date            | 0100-01-01 01:02:34+00
timezone        | 0
in_reply_to     |
archived_date   | 1999-12-31 20:20:13+00
thread_depth    | 0
thread_order    | 0
mailinglist_id  | 3
parent_id       |
sender_id       | neil@genesis.domino.org
thread_id       | 31067
sender_name     | Neil J. McRae
That message had "Date: Sat, 1 Jan 100 01:02:34 +0000 (GMT)" in the mbox. Judging by the rest of the headers, the year should be 2000.
This message was found by your script too:
Date: changed at line 13785764
Date: Sat, 1 Jan 100 01:02:34 +0000 (GMT)
Date: Fri, 31 Dec 1999 20:20:13 -0000
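A year of 100 in that Date: header looks like the classic Y2K bug where the sending software printed years since 1900 (2000 - 1900 = 100). RFC 5322 describes how to read such obsolete years: two-digit years 00-49 get 2000 added, while two-digit years 50-99 and any three-digit year get 1900 added. A minimal Python sketch of that rule follows; `fix_obsolete_year` is a hypothetical helper for illustration and is not taken from cleanarch3.

```python
import re

# Day, month name, then a 2- or 3-digit year followed by whitespace.
# Four-digit years do not match, so normal headers pass through unchanged.
_SHORT_YEAR = re.compile(
    r"(\d{1,2}\s+(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+)"
    r"(\d{2,3})(?=\s)"
)


def fix_obsolete_year(date_value: str) -> str:
    """Rewrite a 2- or 3-digit year in a Date: value per the RFC 5322 rule."""

    def repl(match):
        digits = match.group(2)
        year = int(digits)
        if len(digits) == 2 and year <= 49:
            year += 2000  # 00-49 are read as 2000-2049
        else:
            year += 1900  # 50-99 and all three-digit years
        return f"{match.group(1)}{year}"

    return _SHORT_YEAR.sub(repl, date_value, count=1)


fixed = fix_obsolete_year("Sat, 1 Jan 100 01:02:34 +0000 (GMT)")
```

With the header from this message, `fixed` would be "Sat, 1 Jan 2000 01:02:34 +0000 (GMT)", which agrees with the 1999-12-31 archived date being a few hours earlier.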
and maybe update the date in that entry.
I'm actually thinking I need to fix the archive mbox, then delete them and re-import them once they are cleaned.
Also, FYI, assuming those messages with very old Date: headers had reasonable unix from dates, the hyperkitty/contrib/cleanarch3 script would fix them. However, you have to get the script from https://gitlab.com/mailman/hyperkitty/-/blob/master/hyperkitty/contrib/clean... or https://www.msapiro.net/scripts/cleanarch3 because the script in the latest release has a bug.
I tried the dry run option on your script and it gave a bunch of output for bad dates. That output was absent from the copy shipping with the source. What's interesting is that pipermail seems to have no issue with this message and detects the date correctly: https://mailman.nanog.org/pipermail/nanog/2000-January/137630.html
One other thing: I did notice a number of archived messages not showing up in sequence due to missing "In-Reply-To:" headers in the source. The first message in this thread is one such case. Pipermail in MM2 seems to handle this by matching on the subject, https://mailman.nanog.org/pipermail/nanog/2002-April/151325.html, whereas in hyperkitty it's an orphaned thread.
Is the archive tool in pipermail's import more robust in this respect? I'd argue it's common to have missing In-Reply-To: headers, in which case the subject and time would need to be used to infer the likely thread. I'll agree that omitting the header is a major violation of the relevant RFCs, but many MUAs (M$) are famous for doing just this.
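To make the subject-and-time idea concrete, here is a rough Python sketch of such a fallback. It is not how pipermail or hyperkitty actually thread messages; `normalize_subject`, `guess_parent`, the dict keys, and the 14-day window are all assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# One or more leading "Re:" / "Fw:" / "Fwd:" markers, in any letter case.
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply prefixes, collapse whitespace, and lowercase."""
    return " ".join(_REPLY_PREFIX.sub("", subject).split()).lower()


def guess_parent(msg, earlier, window=timedelta(days=14)):
    """Return the message_id of the likely parent, or None if there is none."""
    if msg.get("in_reply_to"):
        return msg["in_reply_to"]  # a real header always wins
    if not _REPLY_PREFIX.match(msg["subject"]):
        return None  # does not look like a reply, so start a new thread
    key = normalize_subject(msg["subject"])
    candidates = [
        m for m in earlier
        if normalize_subject(m["subject"]) == key
        and timedelta(0) <= msg["date"] - m["date"] <= window
    ]
    if not candidates:
        return None
    # Attach to the most recent earlier message with the same subject.
    return max(candidates, key=lambda m: m["date"])["message_id"]


first = {"message_id": "a@example.org", "subject": "UK GMT roll over",
         "date": datetime(2000, 1, 1, 1, 2), "in_reply_to": None}
reply = {"message_id": "b@example.org", "subject": "RE: UK GMT roll over",
         "date": datetime(2000, 1, 1, 3, 0), "in_reply_to": None}
parent = guess_parent(reply, [first])
```

Here `parent` comes out as "a@example.org". The time window is the main tuning knob: too wide and unrelated threads that reuse a subject get merged, too narrow and slow replies stay orphaned.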
Thank you,
Bryan Fields
727-409-1194 - Voice http://bryanfields.net