Stephen J. Turnbull wrote:
> s.dobrau@ucl.ac.uk writes:
>> We are currently working on a migration plan for Mailman -> Mailman 3. We have over 3000 mailing lists with a total of 50K members.
> Medium-size. I guess a 2-CPU/4GB Linode or similar should handle the Mailman load (including Postorius and HyperKitty); you might want more memory and maybe more CPUs if you're running the RDBMS on the same host. In the 23k lists/50k users/100k inbound messages/day migration described below, we ended up reconfiguring the VMs 3 times before declaring the system stable. IMO the 2 x 8-CPU/16GB final setup was way overpowered for the need, but we didn't try to calibrate finer than that.
Yep, I have helped with a migration of around 150 lists, though with much less archived data, and ended up with a 2-CPU/8GB machine. I think we could have gone lower, but we needed the larger VM to import the archives. I used a managed PostgreSQL instance from Amazon for the database.
>> Our plan is to have core+web (instance 1) and HyperKitty (instance 2), with the archives on the FS local to the HyperKitty instance.
> The archives are in the database as BLOBs. There's no reason to have a separate instance for HyperKitty. It's not obvious to me that you need more than that 2x4 Linode for both Mailman and the RDBMS, but if you're going to split, it's Mailman vs. RDBMS, not Mailman vs. HyperKitty.
I would agree. The issue here is that there are several ways of splitting out Mailman, and people get bogged down in whether they should run separate parts on different instances. I have always run Mailman, Postorius and HyperKitty on the same host, and I think it works better that way. For example, if you run Mailman and the web components on different hosts, you need to make sure the Mailman REST interface is secured, whereas if it's on the same host the REST interface can just listen on localhost. The Dockerised instances do use separate containers for Mailman Core and the web components, but that is the only place I have really seen the components split.
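As a concrete illustration of the localhost-only REST setup, a minimal mailman.cfg fragment might look like this; the credentials are placeholders you would replace:

```ini
# mailman.cfg sketch: keep the REST API bound to localhost so it is
# never exposed off-host. admin_user/admin_pass values are placeholders.
[webservice]
hostname: 127.0.0.1
port: 8001
use_https: no
admin_user: restadmin
admin_pass: change-me
```

The web components on the same host then talk to http://127.0.0.1:8001/ and nothing needs to be opened in the firewall.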
>> And all 3 tables (core, web, HyperKitty) pointing to the same database on a remote SQL (mysql-enterprise) instance, using class mailman.database.mysql.MySQLDatabase (and whatever is necessary on the Django/HyperKitty instance).
> Make sure your MySQL database(s) for Mailman are configured with the utf8mb4 option for 4-byte UTF-8 support, or it will choke on emojis and the like. I don't think there are any other gotchas for MySQL.
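A hedged sketch of creating the database with 4-byte UTF-8 from the start (database name and collation choice are placeholders; check your MySQL version's defaults):

```sql
-- Create the Mailman database with 4-byte UTF-8 so emoji etc. survive.
-- The database name and collation here are placeholders.
CREATE DATABASE mailman
  CHARACTER SET utf8mb4
  COLLATE utf8mb4_unicode_ci;
```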
Are there any performance benchmarks for running MySQL vs. PostgreSQL? I tend to fall back to PostgreSQL because it seems to be what is used in a lot of places; I have never recommended going with MySQL for a Mailman instance.
> Note there are a lot more than 3 tables in Mailman's databases. The "traditional" configuration is a "mailman" database for core and a "mailmanweb" database for Django and the archives, but Mark says that as the tables are disjoint across Mailman, Django, and the archives, there's no reason not to use a single "mailman" database for all of them. I believe that's how this list's host is configured.
Yep, the original installation guide created one database for Mailman, with the tables for all components in that same database; this was how my larger install was done. Later revisions of the guide have us split the core and web components into different databases. I don't think there is any performance hit either way.
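Tying this back to the MySQLDatabase class mentioned in the original question, the core side of a single shared database might look like this in mailman.cfg; the host, credentials and database name are placeholders:

```ini
# mailman.cfg sketch: Mailman core pointed at a remote MySQL server.
# Host, credentials and database name are placeholders; the charset
# option enables the 4-byte UTF-8 support discussed above.
[database]
class: mailman.database.mysql.MySQLDatabase
url: mysql+pymysql://mailman:change-me@db.example.org/mailman?charset=utf8mb4&use_unicode=1
```

Django/HyperKitty gets the equivalent connection settings in its own settings file.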
> Importing list configurations, users, and subscriptions into Mailman 3 is pretty fast, as long as you're not using the stock Postfix support.[2] The problem is that generating the Postfix alias files for the lists seems to be noticeably linear in the number of lists, which means it's quadratic for the mass import. IIRC with >5000 lists you'd be looking at >1m/list just to keep regenerating Postfix's alias database, i.e. 5000 minutes. I found two solutions, both of which required patching Mailman. I hope to get both into the next release. I think we got it down to <5s/list; for 5000 lists that's <1hr.
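The quadratic behaviour is easy to see with a toy cost model (this is not Mailman code, just an illustration of why regenerating the alias map after every list blows up):

```python
# Toy cost model (not Mailman code): regenerating the full Postfix
# alias map after every list import rewrites one entry per list that
# exists so far, so a mass import of n lists does O(n^2) work in
# total; deferring a single regeneration to the end is O(n).

def alias_writes(n_lists, regenerate_each_time):
    """Count alias-entry writes for importing n_lists lists."""
    if regenerate_each_time:
        # after importing list k, the regeneration rewrites k entries
        return sum(range(1, n_lists + 1))
    # one full regeneration at the end: one entry per list
    return n_lists

print(alias_writes(5000, True))   # 12502500 entry writes
print(alias_writes(5000, False))  # 5000
```

That 2500x gap is roughly the difference between 5000 minutes and under an hour.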
That is very interesting. I had a similar issue with the Postfix alias generation in my larger setup but didn't have time to identify the root cause. In the end I used Exim, sending all mail for the list domain to Mailman, which is the setup I use elsewhere; it doesn't rely on the alias file generation.
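For anyone wanting to try the Exim route, the shape of it is a router that accepts mail for the list domains plus an LMTP transport into Mailman core. This is a sketch loosely following the Exim 4 configuration in the Mailman 3 docs; the domain is a placeholder and port 8024 (Mailman's LMTP default) should be checked against your install:

```
# Exim 4 sketch; lists.example.org is a placeholder domain and
# 8024 is assumed to be Mailman core's LMTP port.
domainlist mm_domains = lists.example.org

# router: accept anything addressed to the list domains
mailman3_router:
  driver = accept
  domains = +mm_domains
  transport = mailman3_transport

# transport: hand the message to Mailman core over LMTP
mailman3_transport:
  driver = smtp
  protocol = lmtp
  allow_localhost
  hosts = localhost
  port = 8024
```

No alias files are involved; Mailman's LMTP runner rejects unknown addresses itself.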
> Importing archives is ... yeeech. The client decided they didn't want a ton of archives anyway (you know corporations; if it's not required by law, shred it after 6 months). I don't recall the size estimate accurately, but we kept 6 months x 4500 lists, maybe 100GB out of 2.3TB of mboxes. That took 24 hours to import into HyperKitty, without doing the full-text indexing. I do know the original conservative estimate was 20 days to import the whole 2.3TB. The full-text indexing took more than a week, if I remember correctly.
Yes, I had the same problem with the archives. I did import all of them; in our case it took several months to get the archives in, so as not to stress the box. Full-text indexing really caused an issue for me: I ended up disabling it while we were importing archives, then generated a clean index once the import was done. This wasn't a good user experience, and I would plan it out better next time.
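The per-list import and the deferred indexing can be driven with HyperKitty's Django management commands; the list address, paths and settings module below are placeholders for your own setup:

```
# Import one list's mbox into HyperKitty (run per list, throttled as needed)
django-admin hyperkitty_import -l mylist@lists.example.org /backups/mylist.mbox \
    --pythonpath /opt/mailman-web --settings settings

# After all imports are done, build the full-text index in one pass
django-admin update_index --pythonpath /opt/mailman-web --settings settings
```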
> It's also quite possible to migrate incrementally, a few lists at a time. Mailman 2 and Mailman 3 can coexist happily on the same host. Mark can advise on that, I think.
It's worth noting this is more difficult to do now on Debian/Ubuntu installs, as Python 2 has been removed. This was something I ran into myself when trying to run both installs on the same box.
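For the incremental route, each list's configuration and membership can be moved with the stock import tool; the list name and the config.pck path are placeholders matching a typical Mailman 2 layout:

```
# Migrate one Mailman 2 list's configuration and subscribers.
# List address and the config.pck path are placeholders.
mailman create mylist@lists.example.org
mailman import21 mylist@lists.example.org \
    /var/lib/mailman/lists/mylist/config.pck
```

The archive for that list can then be imported into HyperKitty separately, so lists move over one at a time while the rest stay on Mailman 2.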
Andrew.