Hi Stephen
On 19/09/2023 19:59, Stephen J. Turnbull wrote:
Did you intend to send this to me only?
Nope, apologies. I'm happy to add the list back in - I thought I hit reply-to-list.
Alex Schuilenburg writes:
Yes, I retested with the git head as well. Same result. It sees the line
From 1st April 2019, our turnover has exceeded the threshold for
as the start of a new message and reports no anomalies although there is clearly one.
If there is an empty line preceding that one, and the "F" is in the first column (no marginal whitespace), that is considered correct behavior for processing an mbox file.
As an email separator, agreed, provided every "From " in the body (preceded by a blank line, with nothing before it on the same line) is suitably escaped.
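For anyone following along, the effect is easy to reproduce with the standard-library mailbox module. This is only a minimal sketch: the envelope line, headers and file name are invented for illustration, and the only real text is the body line quoted above.

```python
import mailbox
import os
import tempfile

# One message whose body contains an unescaped "From " line after a blank line.
# The envelope line, headers and file name are made up for this illustration.
RAW = (
    "From alice@example.com Tue Sep 19 19:59:00 2023\n"
    "From: alice@example.com\n"
    "Subject: demo\n"
    "\n"
    "Dear customer,\n"
    "\n"
    "From 1st April 2019, our turnover has exceeded the threshold for\n"
    "\n"
)

# The same message with the body line From-stuffed.
ESCAPED = RAW.replace("\nFrom 1st April", "\n>From 1st April")


def count_messages(raw_text):
    """Write raw_text to a temporary mbox file and count what mailbox.mbox sees."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mbox")
        with open(path, "w", newline="\n") as fh:
            fh.write(raw_text)
        box = mailbox.mbox(path, create=False)
        try:
            return len(box)
        finally:
            box.close()


if __name__ == "__main__":
    print("unescaped:", count_messages(RAW))
    print("escaped:  ", count_messages(ESCAPED))
```

With the stdlib reader the unescaped text should be counted as two messages and the escaped text as one, which matches the split seen during the import.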
If you don't like it, don't use the mbox format. The maildir message-per-file format is widely supported, and if you want a folder-per-file rather than message-per-file format, MMDF is fairly widely supported, and does not suffer from the ambiguities of mbox. All three (with variations) are supported by recent Python 3.
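As a rough illustration of the stdlib support mentioned above, the sketch below copies an mbox file into a new Maildir using only the mailbox module. The function name and paths are hypothetical, and it assumes the mbox is already correctly From-stuffed; it does nothing to repair a broken one.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a new Maildir; return the count."""
    src = mailbox.mbox(mbox_path, create=False)
    # create=True builds the tmp/new/cur layout only if maildir_path does not exist yet.
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in src:
            dst.add(message)
            copied += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return copied
```

Swapping mailbox.Maildir for mailbox.MMDF should give the folder-per-file variant, although I have not tried that against HyperKitty.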
Understood. My issue is the lists were imported from MM2.1 in 2020 into MM3.2.1 onto hyperkitty+mariadb/mysql.
So the current installation has no mbox files - archives are stored in a mysql database.
The old MM3 installation I am migrating away from is obviously broken in that its mbox archive when downloaded is not escaped,
That's interesting. I presume that's because HyperKitty saves messages in a database rather than as mbox files, and the old mbox exporter just flattened and concatenated the messages. Or perhaps that also uses mailbox.mbox and old enough versions of that didn't From-stuff.
Spot on.
I have to move the lists onto a new Debian 12 server using the native mailman 3.3.8 & mailman-web 0+20200530-2 packages. I tried dropping in the old mailman3 database under the new software but that did not work. Instead I manually imported the old mailman3 data directly into the new mailman3 database as, after inspection, there were no new tables and only a couple of additional fields which have suitable defaults. So I dumped the mailman3 data with
mysqldump --no-create-info --no-create-db --disable-keys --complete-insert --ignore-table=mailman3.alembic_version mailman3
and simply imported the dump into the new schema. OK so far. The lists showed up in postorius, obviously without archives in hyperkitty.
The normal way to upgrade a HyperKitty archive is to do nothing, just upgrade the software. I guess you moved to a new host and deleted the database? The preferred way is to dump the database to SQL, and then load it into the new database directly rather than downloading the mbox files and importing. No ambiguity and much faster.
That's what I thought initially, but that failed as per https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/....
My old installation appears to have a django_migrations table that is inconsistent with the state of the database, and Debian have been unresponsive so far.
Unfortunately the same mailman3-style manual import of mailman3web from the old mailman3web db into the new db was not possible. There were additional tables and fields that needed values rather than defaults. Instead I opted to download the mbox exports from the old installation via the web interface and import them into the new installation for expediency. This is where I was when I posted the message.
However, the old installation's mbox archive export (from the database) was problematic (e.g. 5min timeout) but after getting around those issues I ended up with some broken mbox files (e.g. the "From 1st April ..." line within a message, resulting in hyperkitty_import failures).
although I tested downloading this thread archive and it is clearly escaped. Unfortunately I cannot tell if hyperkitty_import works using the git head since I am restricted to using the version provided with Debian 12.
In July it worked for me on a site with 5300 archived lists. I assume some of them had From-stuffed lines. Can't be sure, most of the posts are machine-generated, but lines beginning with "From " are pretty common in natural English.
I'd be surprised if there were not any. I have 2003 lists and several occurrences.
There are indeed two versions though of mailbox.py on the VM:
/usr/lib/python3/dist-packages/mailman/utilities/mailbox.py from mailman3 (3.3.8-2~deb12u1)
/usr/lib/python3.11/mailbox.py from libpython3.11-stdlib (3.11.2-6)
The first applies one small change to the second (a so-called "monkey-patch") for use by Mailman core. It should only be visible to Python by the name "mailman.utilities.mailbox". The HyperKitty utilities import the name "mailbox". If overwriting the first with the second changes HyperKitty's behavior, something is wrong with sys.path.
Then I guess that is the case in Debian 12.
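One way to check this without overwriting any files is to ask Python which file each module name resolves to. A minimal sketch follows; module_origin is a hypothetical helper name, and note that resolving a dotted name imports its parent packages as a side effect.

```python
import importlib.util


def module_origin(name):
    """Return the file a module name resolves to, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. "mailman") is not installed.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    for name in ("mailbox", "mailman.utilities.mailbox"):
        print(f"{name:28} -> {module_origin(name)}")
```

Run with the same interpreter and environment that hyperkitty_import uses. If "mailbox" resolves to anything other than the stdlib copy, sys.path would be the first thing to look at.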
the latter of which appears far more substantial, so I dropped that module over the former and reran hyperkitty_import over the '>' escaped mbox, and while it did import without error, it did leave the escape in place (i.e. the 'From ...' was quoted when viewed).
This is considered correct behavior. It is not possible to determine whether the escape was in the original, or added by a receiving MTA. Better to leave it.
I thought that ">From " would be escaped to ">>From ", and so on, so the escape could easily be reversed when imported. I tested exports from the mailman-users lists and lines beginning "From " (preceded by a blank line) are escaped to ">From ", so incorrectly figured this would be unescaped by hyperkitty_import. After all, I would expect that an export of the archive to mbox, followed by a delete of the archive, followed by a hyperkitty_import of the archive, should leave you at the same place. Not with ">From " escapes in the new archives. In fact I also had a number of messages with "Message-ID: <>" and worse: all messages with attachments had the text/plain content empty.
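To make the scheme I had in mind concrete, here is a minimal sketch of reversible quoting (the convention usually called "mboxrd"). It is not what hyperkitty_import or the exporter do; the function names are mine and it works on a message body held as a string.

```python
import re

# Reversible quoting: any line matching ">*From " gains one more ">",
# so removing exactly one ">" restores the original line.
_QUOTE = re.compile(r"^(>*From )", re.MULTILINE)
_UNQUOTE = re.compile(r"^>(>*From )", re.MULTILINE)


def escape_body(body):
    """Add one level of '>' quoting to every (possibly already quoted) From line."""
    return _QUOTE.sub(r">\1", body)


def unescape_body(body):
    """Remove one level of '>' quoting from every quoted From line."""
    return _UNQUOTE.sub(r"\1", body)
```

The round trip only holds if both the writer and the reader follow the same convention, which is the point made above: an importer cannot know whether a ">From " it finds was added by an exporter or was already in the original message.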
So mbox exports from MM 3.2.1 on Debian 10 (using hyperkitty+mysql) are broken.
> The unescaped mbox import died in the same way.
As expected.
Anyhow, thanks for your suggestion. For now I can stick with a manual repair and spaced escape of From_.
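For the record, the manual repair can be partly scripted. The sketch below is a heuristic only: it assumes genuine separators look like "From <sender> <asctime date>", which may not match every export, and it prepends ">" rather than a space. The function name and regular expression are mine.

```python
import re

# Assumed shape of a genuine separator, e.g.
# "From alex@example.com Tue Sep 19 19:59:00 2023". Adjust to match your export.
SEPARATOR = re.compile(
    r"^From \S+ +(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def escape_stray_from_lines(lines):
    """Yield lines, prepending '>' to 'From ' lines that do not look like separators."""
    for line in lines:
        if line.startswith("From ") and not SEPARATOR.match(line):
            yield ">" + line
        else:
            yield line
```

It takes any iterable of text lines, so a file opened with encoding="latin-1" and newline="" can be passed straight in and written back out without altering the other bytes. Check a diff of the result before importing, since a body line that happens to match the separator pattern would be left alone.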
If the old database is still available, I recommend dumping that and loading it into a fresh version of the DBMS. ...
Thanks for the pointer. As I had already done the same with mailman3, I repeated the exercise. The following dump and import worked.
oldhost> mysqldump --no-create-info --no-create-db --disable-keys --complete-insert mailman3web > mailman3web.sql
newhost> mysql
MariaDB [(none)]> use mailman3web
MariaDB [mailman3web]> source mailman3web.sql
Thanks again
-- Alex