Bulk deletion of messages from HyperKitty
Hi, community!
After importing from Mailman 2, thousands of badly compiled messages from the past appeared with empty content on the same date (date of import).
Is there a safe way to delete them all? If we delete them from the database - hyperkitty_email table - do we need to do some clean-up of the hyperkitty_thread table as well?
Any advice?
BR, Danil Smirnov
On 2024-07-10 12:05, Danil Smirnov via Mailman-users wrote:
After importing from Mailman 2, thousands of badly compiled messages from the past appeared with empty content on the same date (date of import).
Is there a safe way to delete them all? If we delete them from the database - hyperkitty_email table - do we need to do some clean-up of hyperkitty_thread table as well?
Per https://docs.mailman3.org/en/latest/migration.html, Mailman 2 includes a tool that can clean messages *prior* to import:
all mailboxes should be checked for defects before importing. Certain defects such as missing Message-ID: headers or missing or unparseable Date: headers will be corrected or ignored by the import process. The one defect that will definitely cause problems is lines beginning with From in message bodies. These will be seen as the start of a new message. There is a Mailman 2 script at $prefix/bin/cleanarch.
I'd try deleting the corrupt list, running cleanarch on the old list, and re-importing.
If the new list has messages that are not in the old list (i.e. it's in live production), then I'm not sure of how to proceed.
Note the linked page also mentions a script called check_hk_import in the hyperkitty/contrib folder.
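To make the defect quoted above concrete, here is a minimal Python sketch that flags body lines beginning with "From " that do not look like mbox separator lines. It is not the cleanarch script and does not repair anything; the separator pattern (address followed by an asctime-style date) and the function name suspicious_from_lines are assumptions made for illustration.

```python
import re

# Assumed separator shape: "From <address> <weekday> <month> <day> <hh:mm:ss> <year>".
# Real archives may contain other variants; adjust the pattern to your data.
SEPARATOR = re.compile(
    r"^From \S+\s+(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ \d]\d "
    r"\d\d:\d\d:\d\d \d{4}"
)


def suspicious_from_lines(lines):
    """Return (line_number, text) for lines that start with 'From '
    but do not match the assumed separator pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("From ") and not SEPARATOR.match(line):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    sample = [
        "From alice@example.com Wed Jul 10 12:05:00 2024\n",
        "Subject: hello\n",
        "\n",
        "From what I can tell, this line would split the message.\n",
    ]
    for number, text in suspicious_from_lines(sample):
        print(number, text)
```

A line flagged this way is one an importer could mistake for the start of a new message. A body line that happens to match the separator pattern would not be flagged, so this is only a rough check.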
Ron,
I'm aware of the tools you mentioned and they were properly used.
Please let us focus on the question I asked - how do we delete messages of a particular date if there are too many of them so using Hyperkitty UI isn't an option?
Thank you.
On 7/10/24 12:27 PM, Danil Smirnov via Mailman-users wrote:
I'm aware of the tools you mentioned and they were properly used.
Your issue is almost certainly caused by unescaped From lines in message bodies, and preprocessing the mbox with the hyperkitty/contrib/cleanarch3 script should fix those.
Please let us focus on the question I asked - how do we delete messages of a particular date if there are too many of them so using Hyperkitty UI isn't an option?
You can do
DELETE FROM hyperkitty_email WHERE archived_date = <your datetime>;
but as you note there are entries in the hyperkitty_thread table that need to be removed too. So, before doing the above, maybe something like
DELETE FROM hyperkitty_thread WHERE id IN (SELECT thread_id FROM
hyperkitty_email WHERE archived_date = <your datetime>);
but there are also possibly entries relating to these messages/threads in the hyperkitty_favorite, hyperkitty_tagging, hyperkitty_lastview and hyperkitty_vote tables. It seems unlikely that there would be such entries in your case, but if you are concerned about them, here, in order, is what you can do in the database.
DELETE FROM hyperkitty_favorite WHERE thread_id IN (SELECT id FROM
hyperkitty_thread WHERE id IN (SELECT thread_id FROM hyperkitty_email
WHERE archived_date = <your datetime>));
DELETE FROM hyperkitty_tagging WHERE thread_id IN (SELECT id FROM
hyperkitty_thread WHERE id IN (SELECT thread_id FROM hyperkitty_email
WHERE archived_date = <your datetime>));
DELETE FROM hyperkitty_lastview WHERE thread_id IN (SELECT id FROM
hyperkitty_thread WHERE id IN (SELECT thread_id FROM hyperkitty_email
WHERE archived_date = <your datetime>));
DELETE FROM hyperkitty_vote WHERE email_id IN (SELECT id FROM
hyperkitty_email WHERE archived_date = <your datetime>);
DELETE FROM hyperkitty_thread WHERE id IN (SELECT thread_id FROM
hyperkitty_email WHERE archived_date = <your datetime>);
DELETE FROM hyperkitty_email WHERE archived_date = <your datetime>;
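The sequence above can be sketched as one transaction, so that a failure part-way through rolls everything back. This is a minimal illustration against a toy in-memory SQLite schema, not the real HyperKitty schema. The table and column names are taken from the statements above. The helper names, the toy columns, and the half-open time window (used in place of the exact `archived_date = <your datetime>` match, in case the import spanned more than one timestamp) are assumptions. Back up the database before trying anything like this on a live system.

```python
import sqlite3

# Toy stand-ins for the tables named above; the real tables have many
# more columns and constraints.
TOY_SCHEMA = """
CREATE TABLE hyperkitty_thread   (id INTEGER PRIMARY KEY);
CREATE TABLE hyperkitty_email    (id INTEGER PRIMARY KEY, thread_id INTEGER,
                                  archived_date TEXT);
CREATE TABLE hyperkitty_favorite (id INTEGER PRIMARY KEY, thread_id INTEGER);
CREATE TABLE hyperkitty_tagging  (id INTEGER PRIMARY KEY, thread_id INTEGER);
CREATE TABLE hyperkitty_lastview (id INTEGER PRIMARY KEY, thread_id INTEGER);
CREATE TABLE hyperkitty_vote     (id INTEGER PRIMARY KEY, email_id INTEGER);
"""

# Assumption: select by a half-open window [start, end) rather than equality.
IN_WINDOW = "archived_date >= :start AND archived_date < :end"
EMAIL_IDS = f"SELECT id FROM hyperkitty_email WHERE {IN_WINDOW}"
THREAD_IDS = f"SELECT thread_id FROM hyperkitty_email WHERE {IN_WINDOW}"

# Same order as in the message: dependent rows, then threads, then emails.
DELETES = [
    ("hyperkitty_favorite", f"thread_id IN ({THREAD_IDS})"),
    ("hyperkitty_tagging", f"thread_id IN ({THREAD_IDS})"),
    ("hyperkitty_lastview", f"thread_id IN ({THREAD_IDS})"),
    ("hyperkitty_vote", f"email_id IN ({EMAIL_IDS})"),
    ("hyperkitty_thread", f"id IN ({THREAD_IDS})"),
    ("hyperkitty_email", IN_WINDOW),
]


def delete_archived_between(conn, start, end):
    """Run all deletes in one transaction; return rows deleted per table."""
    params = {"start": start, "end": end}  # named style as used by sqlite3
    counts = {}
    with conn:  # commits on success, rolls back if any statement raises
        for table, condition in DELETES:
            cursor = conn.execute(f"DELETE FROM {table} WHERE {condition}", params)
            counts[table] = cursor.rowcount
    return counts


def build_toy_db():
    """Create an in-memory database with one good thread and one bad one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_SCHEMA)
    conn.executemany("INSERT INTO hyperkitty_thread (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO hyperkitty_email (id, thread_id, archived_date) VALUES (?, ?, ?)",
        [
            (1, 1, "2024-07-01 10:00:00"),  # legitimate message
            (2, 2, "2024-07-09 12:00:00"),  # archived during the import
            (3, 2, "2024-07-09 12:00:05"),  # archived during the import
        ],
    )
    conn.execute("INSERT INTO hyperkitty_favorite (thread_id) VALUES (2)")
    conn.execute("INSERT INTO hyperkitty_vote (email_id) VALUES (2)")
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_toy_db()
    print(delete_archived_between(db, "2024-07-09 00:00:00", "2024-07-10 00:00:00"))
```

Note that the thread delete removes every thread containing at least one matching email. If such a thread also holds messages archived at other times, those messages would be left pointing at a deleted thread and would need separate handling. Placeholder syntax and datetime comparison also differ between database drivers, so the SQL strings here would need adapting for PostgreSQL or MySQL.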
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi Mark!
Mark Sapiro wrote:
On 7/10/24 12:27 PM, Danil Smirnov via Mailman-users wrote:
I'm aware of the tools you mentioned and they were properly used.
Your issue is almost certainly caused by unescaped From lines in message bodies and preprocessing the mbox with the hyperkitty/contrib/cleanarch3 script should fix those.
Could you take a look at my MR for this script? Right now I have to use a locally modified version of it, as the one in contrib/ doesn't work out of the box.
Please let us focus on the question I asked - how do we delete messages of a particular date if there are too many of them so using Hyperkitty UI isn't an option?
You can do
DELETE FROM hyperkitty_email WHERE archived_date = <your datetime>;
Thanks a lot for this detailed walk-through, it is very helpful and I'm going to use it to clean up my migrated archives.
With best regards, Timur Bakeev.
On 7/12/24 7:17 AM, Timour Bakeev wrote:
Could you take a look at my MR for this script? Right now I have to use a locally modified version of it, as the one in contrib/ doesn't work out of the box.
I have seen https://gitlab.com/mailman/hyperkitty/-/merge_requests/631 and intend to merge it. Currently there is an unrelated CI issue with a failing test with Python >= 3.12.4, and as soon as that is resolved, I will merge !631.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
participants (4)
- Danil Smirnov
- Mark Sapiro
- Ron
- Timour Bakeev