Speed up importing mailman2 lists

We are planning to migrate from Mailman 2 to Mailman 3, with Mailman 3 running in a container. We have around 1500 lists and would ideally like to complete the migration within a few hours. We would like to switch off Mailman 2 so that the lists are in a defined state, and then import them into Mailman 3. Importing all lists via script takes about 3 days. Parallelizing the imports does not work because of locking in the database, and we cannot accept a downtime of 3 days. Does anyone have tips on how to speed up the procedure, or are there best practices for larger environments?
Thanks and best regards, Thomas

On 2/19/25 06:08, t.maintz@fz-juelich.de wrote:
We are currently planning to migrate from mailman2 to mailman3 as a container, have around 1500 lists and would like to complete the migration ideally in a few hours. We would like to switch off mailman2 to have a defined state of the lists and import them to mailman3. The import of all lists via script takes about 3 days. Parallelizing the imports does not work due to lockings in the database. However, we cannot accept a downtime of 3 days. Does anyone have tips on how we can speed up the procedure or are there best practices for larger environments?
What Mailman 3 version are you importing to? I ask because prior to Mailman core 3.3.1, users' passwords were imported, and encrypting each password was quite time-consuming. If you are in fact importing to Mailman < 3.3.1, see https://gitlab.com/mailman/mailman/-/merge_requests/565 for a patch that will speed this up considerably, and see https://mail.python.org/archives/list/mailman-developers@python.org/thread/4... for discussion of this change.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Mark Sapiro writes:
On 2/19/25 06:08, t.maintz@fz-juelich.de wrote:
We are currently planning to migrate from mailman2 to mailman3 as a container, have around 1500 lists and would like to complete the migration ideally in a few hours.
I've done two massive migrations in the last year. In the first (~20k lists), the lists were down for maybe two hours, but it took 22 hours for HyperKitty to populate, mostly in Xapian indexing we realized in post-mortem analysis. In the second (~1k lists), no perceptible downtime because they have their own bespoke archiver that has a list-manager-agnostic API.
The trick to zero delivery downtime is that you can configure your MTA to route to Mailman 3 if the list exists there, if not route to Mailman 2 if the list exists there, and if not continue to any lower priority routes. It worked as designed (mops sweat off brow ;-). We did take Postorius and the Mailman 2 management CGIs and email command addresses offline for the duration (3 hours in the first case, 30 minutes in the second). This is straightforward if Mailman 2 and Mailman 3 are running on the same host. Life is more complex if you're spinning up Mailman 3 on a separate node, but it should be possible.
I think if you're migrating to HyperKitty you can speed up the migration by shutting off indexing, and doing that later at the cost of confusing people who expect the lists to be indexed. I'm not sure if it's possible to migrate the archives concurrently with accepting new posts, or maybe to migrate archives in advance and backfill any posts that arrive during the list migration. And there's no reason why you can't leave the legacy Mailman 2 archives up for browsing as a backup for as long as needed.
Steve
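To make the routing order above concrete, here is a minimal Python sketch of the decision itself, independent of any particular MTA. The function names, the suffix list, and the example list names are illustrative assumptions, not part of Mailman or Postfix; a real setup would express the same precedence in MTA lookup tables.

```python
from typing import Optional

# Sub-address suffixes commonly used for Mailman list addresses
# (illustrative; adjust to what your installation actually accepts).
SUFFIXES = ("-bounces", "-confirm", "-join", "-leave", "-owner",
            "-request", "-subscribe", "-unsubscribe")


def list_name_of(local_part: str, known: set[str]) -> Optional[str]:
    """Return the list in `known` that this local part belongs to, if any."""
    base = local_part.lower().split("+", 1)[0]  # drop a VERP-style +extension
    if base in known:
        return base
    for suffix in SUFFIXES:
        if base.endswith(suffix) and base[: -len(suffix)] in known:
            return base[: -len(suffix)]
    return None


def route(address: str, mm3_lists: set[str], mm2_lists: set[str]) -> str:
    """Pick a destination: Mailman 3 first, then Mailman 2, then fall through."""
    local_part = address.split("@", 1)[0]
    if list_name_of(local_part, mm3_lists):
        return "mailman3"
    if list_name_of(local_part, mm2_lists):
        return "mailman2"
    return "next-route"


if __name__ == "__main__":
    mm3 = {"announce"}           # lists already imported into Mailman 3
    mm2 = {"announce", "devel"}  # lists still defined in Mailman 2
    for addr in ("announce@example.org", "devel-owner@example.org",
                 "postmaster@example.org"):
        print(addr, "->", route(addr, mm3, mm2))
```

Because Mailman 3 is checked first, a list can stay defined in Mailman 2 while it is being imported; mail switches over as soon as the list exists in Mailman 3.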

We are on the latest Mailman version.
I will test adjusting the mail routing during the migration. Thank you very much for the tips, that helps me a lot. The 3 days refer only to the import of the list configurations. We will migrate the HyperKitty archives later, and only selectively after feedback from the list owners, because importing all archives took significantly longer (weeks) in our test.
Best regards, Thomas
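For readers who want to see where the time goes in a configuration-only import, here is a minimal sketch of a sequential driver that builds the `mailman create` and `mailman import21` commands per list and times each one. The Mailman 2 directory layout (`/var/lib/mailman/lists/<name>/config.pck`), the helper names, and the example list names are assumptions; this is not the script used in this thread.

```python
import shlex
import subprocess
import time
from pathlib import PurePosixPath


def import_commands(list_names, domain, mm2_lists_dir="/var/lib/mailman/lists"):
    """Build the `mailman create` / `mailman import21` command pair for each list."""
    commands = []
    for name in sorted(list_names):
        fqdn = f"{name}@{domain}"
        # Assumed Mailman 2 layout: one config.pck per list directory.
        pck = PurePosixPath(mm2_lists_dir) / name / "config.pck"
        commands.append(["mailman", "create", fqdn])
        commands.append(["mailman", "import21", fqdn, str(pck)])
    return commands


def run_timed(cmd, runner=subprocess.run):
    """Run one command and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    runner(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Dry run: only print the commands that would be executed.
    for cmd in import_commands(["announce", "devel"], "example.org"):
        print(shlex.join(cmd))
```

Logging the value returned by `run_timed` per list would show whether a few large lists dominate the three days or whether the cost is spread evenly.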

On 2025-02-20 07:15:57 +0000 (+0000), Stephen J. Turnbull wrote: [...]
I've done two massive migrations in the last year. In the first (~20k lists), the lists were down for maybe two hours, but it took 22 hours for HyperKitty to populate, mostly in Xapian indexing we realized in post-mortem analysis. In the second (~1k lists), no perceptible downtime because they have their own bespoke archiver that has a list-manager-agnostic API.
Our sites didn't have nearly that many mailing lists, but we were migrating from multiple side-by-side MM2 installs for domains hosted on one server to a multi-domain MM3 deployment on another. In our case we did piecemeal outages one domain at a time rather than having one mass migration for everything all at once, and that kept the impact to each individual domain/site minimal.
The trick to zero delivery downtime is that you can configure your MTA to route to Mailman 3 if the list exists there, if not route to Mailman 2 if the list exists there, and if not continue to any lower priority routes. It worked as designed (mops sweat off brow ;-). We did take Postorius and the Mailman 2 management CGIs and email command addresses offline for the duration (3 hours in the first case, 30 minutes in the second). This is straightforward if Mailman 2 and Mailman 3 are running on the same host. Life is more complex if you're spinning up Mailman 3 on a separate node, but it should be possible.
Yes, our downtime included finalizing a warmed rsync over the Internet and waiting for DNS changes to propagate. We tried to time things so that the import ran in parallel with DNS settling out; even with lowered TTLs, that helped somewhat for larger sites. We didn't have much incentive to avoid downtime entirely in that situation, and instead just warned the various communities in advance about when their particular migration was scheduled and what they should expect. We relied primarily on delivery deferrals between taking a domain offline on one server and bringing up the imported copy on the other. We accepted that the associated Web content would be offline during the maintenance window, but posts sent during that time would still make it to the lists once they were back.
I think if you're migrating to HyperKitty you can speed up the migration by shutting off indexing, and doing that later at the cost of confusing people who expect the lists to be indexed. I'm not sure if it's possible to migrate the archives concurrently with accepting new posts, or maybe to migrate archives in advance and backfill any posts that arrive during the list migration. And there's no reason why you can't leave the legacy Mailman 2 archives up for browsing as a backup for as long as needed.
In our case, "as long as needed" is approximately forever, but we did install some redirects in Apache for things like list info pages and archive roots. We drew the line at trying to develop an automated mapping to redirect individual posts in the archives though, which is the main reason we keep the pre-migration pipermail content around, since we'd rather not break random links elsewhere on the Web, e.g. in news articles that link to list discussions.
Jeremy Stanley
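As an illustration of the kind of redirect mapping described above, here is a small Python sketch that maps Mailman 2 list info pages and archive roots to Mailman 3 URLs and deliberately leaves deep pipermail links alone. The `/mailman3` and `/archives` prefixes are assumptions (they depend on how Postorius and HyperKitty are mounted in your deployment), and the function name is hypothetical; the actual redirects in this thread were done in Apache.

```python
import re
from typing import Optional


def redirect_target(path: str, domain: str,
                    postorius_prefix: str = "/mailman3",
                    archive_prefix: str = "/archives") -> Optional[str]:
    """Map a Mailman 2 URL path to a Mailman 3 one, or None to leave it alone."""
    # List info page -> Postorius list page (list-id is name.domain).
    m = re.fullmatch(r"/mailman/listinfo/([^/]+)/?", path)
    if m:
        return f"{postorius_prefix}/lists/{m.group(1)}.{domain}/"
    # Archive root -> HyperKitty list page.
    m = re.fullmatch(r"/pipermail/([^/]+)/?", path)
    if m:
        return f"{archive_prefix}/list/{m.group(1)}@{domain}/"
    # Individual posts and month indexes stay on the old pipermail content.
    return None


if __name__ == "__main__":
    for p in ("/mailman/listinfo/devel", "/pipermail/devel/",
              "/pipermail/devel/2020-January/000123.html"):
        print(p, "->", redirect_target(p, "example.org"))
```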

Jeremy Stanley writes:
In our case, "as long as needed" is approximately forever, [...] since we'd rather not break random links elsewhere on the Web, e.g. in news articles that link to list discussions.
Yeah, that's a MAJOR consideration for MANY use cases. Be aware, folks!
It's a problem I didn't face in either of the massive migrations. In the bigger one a quick poll of active users suggested they didn't have such links, or if they did they'd do a search anyway. In the latter case the same archiver is still in use.

The trick to zero delivery downtime is that you can configure your MTA to route to Mailman 3 if the list exists there, if not route to Mailman 2 if the list exists there, and if not continue to any lower priority routes. It worked as designed (mops sweat off brow ;-).
Can you tell me how to implement this in Postfix? I was thinking of using a script that queries the Mailman 3 API for each incoming mail to see whether the list exists there, and routes the mail to Mailman 2 or Mailman 3 depending on the answer. However, I don't quite know how to teach this to Postfix yet.
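The existence check described here could look roughly like the sketch below, which asks the Mailman 3 core REST API whether a list is known. The base URL, port, and credentials are the defaults commonly found in a stock `mailman.cfg` and are assumptions for your site; the `opener` parameter is only there so the function can be exercised without a running server. How to hook such a check into Postfix is a separate question that this sketch does not answer.

```python
import base64
import urllib.error
import urllib.request


def list_exists(fqdn_listname,
                base_url="http://localhost:8001/3.1",   # assumed default REST endpoint
                user="restadmin", password="restpass",  # assumed default credentials
                opener=urllib.request.urlopen):
    """Return True if Mailman 3 core knows this list, False on HTTP 404."""
    request = urllib.request.Request(f"{base_url}/lists/{fqdn_listname}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    try:
        with opener(request, timeout=5) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (401, 500, ...) should not be read as "no such list"


if __name__ == "__main__":
    print(list_exists("announce@example.org"))
```

Note that a per-message HTTP request adds latency and a failure mode to mail delivery, so it is worth deciding up front what should happen when the API is unreachable.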

t.maintz@fz-juelich.de writes:
The trick to zero delivery downtime is that you can configure your MTA to route to Mailman 3 if the list exists there, if not route to Mailman 2 if the list exists there, and if not continue to any lower priority routes. It worked as designed (mops sweat off brow ;-).
Can you tell me how to implement this in Postfix?
Postfix has plugins to query PostgreSQL, MySQL, and SQLite databases. Debian, Ubuntu, and SuSE at least have postfix-pgsql packages to install it. I can't speak to other flavors of OS distro or SQL database server. Here's the configuration for the queries:
# Save this in a file named /etc/mailman3/virtual.cf
# hosts: usually localhost
hosts = $YOUR_POSTGRES_HOST:5432
# user: almost always mailman
user = mailman
# password: for the mailman user in Postgres
password = $PASSWORD
# dbname: almost always mailman
dbname = mailman
# If you serve multiple mailman domains where list names may collide,
# I think you should use (WHERE list_name = '%u' AND mail_host = '%d')
# including the parentheses (I think SQL complains without them)
query = SELECT list_id FROM mailinglist WHERE list_name = '%u'
# This returns the input query verbatim
result_format = %S

Copy that file to /etc/mailman3/transport.cf, and change the last line to

result_format = lmtp:127.0.0.1:8024
Here's how I invoked it in Postfix's main.cf:
transport_maps = pgsql:/etc/mailman3/transport.cf
virtual_alias_domains = $THE_LIST_DOMAIN
# I don't recall the second path exactly
virtual_alias_maps = pgsql:/etc/mailman3/virtual.cf
    hash:/var/lib/mailman/data/mailman-aliases
I'm not sure what Postfix does if Postgres is down or the 'mailman' database in PostgreSQL doesn't exist (the mailinglist table can be empty, though).
This worked for me because (1) there are no user local addresses on the list domain, only a few roles like root and postmaster handled as aliases, (2) email commands were disabled. I'm sure there are alternative strategies if virtual_alias_domains won't work for you. <https://www.postfix.org/VIRTUAL_README.html#virtual_alias> was useful for me.
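To show what the two lookup tables above return, here is a small Python sketch that emulates them against an in-memory SQLite stand-in for Mailman's `mailinglist` table. It is only an emulation under assumptions: the table has just the three columns used here, `%u` is taken as the local part of the address, and `%S` in `result_format` is replaced by the lookup key. It does not talk to Postfix or PostgreSQL.

```python
import sqlite3


def demo_db():
    """Build a tiny stand-in for Mailman's mailinglist table (illustrative columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mailinglist (list_id TEXT, list_name TEXT, mail_host TEXT)")
    conn.execute(
        "INSERT INTO mailinglist VALUES ('announce.example.org', 'announce', 'example.org')")
    return conn


def lookup(conn, address, result_format):
    """Emulate one table lookup: run the query, then expand result_format."""
    local_part = address.partition("@")[0]  # what %u stands for in the query
    row = conn.execute(
        "SELECT list_id FROM mailinglist WHERE list_name = ?", (local_part,)
    ).fetchone()
    if row is None:
        return None  # no match: the MTA moves on to the next table or route
    return result_format.replace("%S", address)  # %S is the input key


if __name__ == "__main__":
    conn = demo_db()
    print(lookup(conn, "announce@example.org", "%S"))                   # virtual.cf
    print(lookup(conn, "announce@example.org", "lmtp:127.0.0.1:8024"))  # transport.cf
    print(lookup(conn, "devel@example.org", "%S"))                      # not in Mailman 3
```

The point of the pair is that both tables answer only for lists that exist in Mailman 3, so an address for a list that has not been imported yet falls through to whatever handles Mailman 2.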
Postfix also knows internally how to check if a mailbox exists at a server. I don't know if you can exploit this, but if it's possible, you can use the same strategy that is described for Exim4 in the Mailman documentation.
Steve

On 2/24/25 05:36, t.maintz@fz-juelich.de wrote:
Can you tell me how to implement this in Postfix?
Here's what we do on mail.python.org which supports both Mailman 2.1 and Mailman 3 lists. We also use an alias domain as described at https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/docs/mta.ht... because python.org is a virtual_alias_domain. The relevant settings in main.cf are
alias_maps = ... hash:/srv/mailman/data/aliases
lmtp_destination_recipient_limit = 1
recipient_delimiter = +
relay_domains = hash:/opt/mailman/mm/var/data/postfix_domains
transport_maps = ... hash:/opt/mailman/mm/var/data/postfix_lmtp
virtual_alias_domains = python.org ...
virtual_alias_maps = ... hash:/srv/mailman/data/virtual-mailman
    hash:/opt/mailman/mm/var/data/postfix_vmap
The alias_maps and the hash:/srv/mailman/data/virtual-mailman setting in virtual_alias_maps are MM 2.1 settings. The hash:/opt/mailman/mm/var/data/* settings are MM 3.
Since transport_maps takes priority over aliases, if there is a MM 3 list in hash:/opt/mailman/mm/var/data/postfix_lmtp, mail for that list will be delivered to MM 3 via LMTP. If not, and the destination is a MM 2.1 list, delivery will be via the virtual_alias_maps entry hash:/srv/mailman/data/virtual-mailman.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
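With a setup like this, a migrated list can appear in both the MM 3 transport map and the MM 2.1 virtual alias map, and the MM 3 entry wins. A small Python sketch for spotting such overlapping entries follows; it assumes the map source files use one "key value" pair per line with `#` comments, and the sample contents and function names are made up for illustration.

```python
def parse_map(text):
    """Parse Postfix lookup-table source text into a dict.

    Simplification: one entry per line; continuation lines are not handled.
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def shadowed_entries(mm3_transport_text, mm2_virtual_text):
    """Addresses present in both maps; mail for them goes to MM 3."""
    mm3 = parse_map(mm3_transport_text)
    mm2 = parse_map(mm2_virtual_text)
    return sorted(set(mm3) & set(mm2))


# Made-up sample contents for the two map source files.
MM3_SAMPLE = """\
# MM 3 transport map (sample)
announce@example.org    lmtp:[127.0.0.1]:8024
"""
MM2_SAMPLE = """\
# MM 2.1 virtual alias map (sample)
announce@example.org    announce
devel@example.org       devel
"""

if __name__ == "__main__":
    print(shadowed_entries(MM3_SAMPLE, MM2_SAMPLE))
```

Entries reported this way are candidates for cleanup on the MM 2.1 side once you are satisfied that the MM 3 copy of the list works.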
participants (4): Jeremy Stanley, Mark Sapiro, Stephen J. Turnbull, t.maintz@fz-juelich.de