On 7/26/24 18:53, Bryan Fields wrote:
On 7/26/24 4:01 PM, Mark Sapiro wrote:
If this returns just the one message_id, I would then
DELETE FROM hyperkitty_thread WHERE id = 35670;
DELETE FROM hyperkitty_email WHERE id = 150555;
It's giving an error due to an FK constraint on another table. I'm not familiar enough with the DB structure to be certain how to clean or fix it.
Yes, you need to do these first.
DELETE FROM hyperkitty_favorite WHERE thread_id = 35670;
DELETE FROM hyperkitty_tagging WHERE thread_id = 35670;
DELETE FROM hyperkitty_lastview WHERE thread_id = 35670;
DELETE FROM hyperkitty_vote WHERE email_id = 150555;
Not all those will exist, but then you should be able to
DELETE FROM hyperkitty_thread WHERE id = 35670;
DELETE FROM hyperkitty_email WHERE id = 150555;
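For anyone scripting this, the ordering above (dependent rows first, then the thread and the email) can be sketched in Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database with a toy schema that models just the id columns named in the statements above, not the real HyperKitty schema, and the helper names delete_thread_and_email and make_demo_db are hypothetical.

```python
import sqlite3

# Child tables named in the statements above. Only the referenced
# columns are modeled; the real tables have many more columns.
THREAD_CHILDREN = ("hyperkitty_favorite", "hyperkitty_tagging", "hyperkitty_lastview")
EMAIL_CHILDREN = ("hyperkitty_vote",)


def delete_thread_and_email(conn, thread_id, email_id):
    """Delete dependent rows first, then the thread and the email."""
    with conn:  # commits on success, rolls back if any statement fails
        for table in THREAD_CHILDREN:
            conn.execute(f"DELETE FROM {table} WHERE thread_id = ?", (thread_id,))
        for table in EMAIL_CHILDREN:
            conn.execute(f"DELETE FROM {table} WHERE email_id = ?", (email_id,))
        conn.execute("DELETE FROM hyperkitty_thread WHERE id = ?", (thread_id,))
        conn.execute("DELETE FROM hyperkitty_email WHERE id = ?", (email_id,))


def make_demo_db():
    """Build a toy database (hypothetical schema) with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE hyperkitty_thread (id INTEGER PRIMARY KEY);
        CREATE TABLE hyperkitty_email (id INTEGER PRIMARY KEY);
        CREATE TABLE hyperkitty_favorite (thread_id INTEGER REFERENCES hyperkitty_thread(id));
        CREATE TABLE hyperkitty_tagging (thread_id INTEGER REFERENCES hyperkitty_thread(id));
        CREATE TABLE hyperkitty_lastview (thread_id INTEGER REFERENCES hyperkitty_thread(id));
        CREATE TABLE hyperkitty_vote (email_id INTEGER REFERENCES hyperkitty_email(id));
        INSERT INTO hyperkitty_thread VALUES (35670);
        INSERT INTO hyperkitty_email VALUES (150555);
        INSERT INTO hyperkitty_favorite VALUES (35670);
        INSERT INTO hyperkitty_vote VALUES (150555);
        """
    )
    return conn
```

Running all the deletes in one transaction means a failure partway through leaves the archive unchanged rather than half-cleaned. Against a real database, take a backup first.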
I'm actually thinking I need to fix the archive mbox, then delete them and re-import them once they are cleaned.
Actually, that would be safest. Fix the mbox, delete the entire archive and re-run the import.
Also, FYI, assuming those messages with very old Date: headers had reasonable unix from dates, the hyperkitty/contrib/cleanarch3 script would fix them. However, the script in the latest release has a bug, so you have to get it from https://gitlab.com/mailman/hyperkitty/-/blob/master/hyperkitty/contrib/clean... or https://www.msapiro.net/scripts/cleanarch3 instead.
I tried the dry run option on your script and it gave a bunch of output for bad dates. This was absent on the one shipping with the source. What's interesting is that pipermail seems to have no issue with this, detecting the date correctly.
If I recall correctly, pipermail checks date skew and 'fixes' the dates in the archive, but they remain off in the mbox.
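The general idea of checking a Date: header against the unix From line can be sketched as below. This is not the cleanarch3 code and not pipermail's logic; the function names, the 15-day threshold, and the choice to treat the From line time as UTC are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW = timedelta(days=15)  # hypothetical threshold, not taken from any tool


def from_line_date(from_line):
    """Parse the asctime-style date at the end of an mbox 'From ' line."""
    fields = from_line.split()
    # e.g. ['From', 'user@example.com', 'Sat', 'Jan', '1', '12:00:00', '2000']
    stamp = " ".join(fields[-5:])
    parsed = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    # Assumption: treat the From line time as UTC.
    return parsed.replace(tzinfo=timezone.utc)


def date_is_suspect(from_line, date_header):
    """Return True if Date: is unparseable or far from the From line date."""
    try:
        header_date = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return True
    if header_date.tzinfo is None:
        header_date = header_date.replace(tzinfo=timezone.utc)
    return abs(header_date - from_line_date(from_line)) > MAX_SKEW
```

A message flagged this way could then have its Date: header rewritten from the From line date before the mbox is re-imported.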
https://mailman.nanog.org/pipermail/nanog/2000-January/137630.html
One other thing: I noticed a number of messages not showing up in sequence because the source is missing "In-Reply-To:" headers. The first message in this thread is one such case; pipermail in mmm2 seems to handle it by referencing the subject (https://mailman.nanog.org/pipermail/nanog/2002-April/151325.html), whereas in HyperKitty it's an orphaned thread.
Yes, HyperKitty does not do threading by subject matching. It threads only on In-Reply-To: and, when In-Reply-To: is absent, on the last entry in References:, if any.
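The rule just described can be sketched as follows. This is a simplified illustration, not HyperKitty's actual code; the function name parent_message_id and the choice to take the first ID when In-Reply-To: holds several are assumptions.

```python
import re
from email import message_from_string

# Matches a single <message-id> token.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def parent_message_id(msg):
    """Return the Message-ID this message replies to, or None if unknown."""
    in_reply_to = MSGID_RE.findall(msg.get("In-Reply-To", ""))
    if in_reply_to:
        return in_reply_to[0]
    references = MSGID_RE.findall(msg.get("References", ""))
    if references:
        # Without In-Reply-To:, fall back to the last References: entry.
        return references[-1]
    # No usable header: the message starts its own thread, whatever its subject.
    return None


reply = message_from_string("References: <r1@example.com> <r2@example.com>\n\nbody")
print(parent_message_id(reply))
```

A message with neither header returns None here even when its subject starts with "Re:", which is how the orphaned threads arise.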
Is the archive tool in pipermail's import more robust in this manner?
I'd argue it's common to have missing In-Reply-To: headers, where the subject and time would need to be used to infer the likely thread. I'll agree that omitting this header is a major violation of the relevant RFCs, but many MUAs (M$) are famous for doing just that.
Yes, it's arguably a defect in HyperKitty to not implement Jamie Zawinski's threading algorithm <https://www.jwz.org/doc/threading.html>, but it doesn't.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan