Hi,
Can someone share some hints on how to deploy a redundant, load-balanced Mailman 3? What are your recommendations and best practices? For example, which components should go on the same VM? Can we have 2 VMs, each running Mailman core/Postorius/HyperKitty/django-mailman3/mailmanclient, connecting to one (the same) database? How do we load balance incoming mail routing?
Thanks! Paweł
On Thu, Sep 17, 2020, at 8:31 AM, Pawel Grzywaczewski wrote:
> Hi,
> Can someone share some hints on how to deploy a redundant, load-balanced Mailman 3? What are your recommendations and best practices? For example, which components should go on the same VM? Can we have 2 VMs, each running Mailman core/Postorius/HyperKitty/django-mailman3/mailmanclient, connecting to one (the same) database? How do we load balance incoming mail routing?
The current implementation of Mailman is good for vertical scaling but not that great for horizontal scaling and redundancy.
The web frontends, Postorius and HyperKitty, are pretty much stateless and can easily be scaled horizontally and deployed redundantly.
I am not going to recommend how to handle redundancy for the database, because I am not the best person for that; I honestly don't know anything beyond using a "cloud managed" database from a cloud provider for HA.
Mailman Core is a bit harder to run redundantly, mostly because of its reliance on the filesystem for some data such as email queues, held messages, and shunted messages. You could potentially run two instances, with one on standby, and fail over to it when the active one goes down. You'd need to synchronize the var directory between them using *some* mechanism: rsync, NFS, or something similar. You also want to make sure that both aren't sending out emails at the same time.
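One way to picture the "only one instance sends at a time" requirement is a lease held in a shared database. The following is a minimal sketch, not something Mailman provides: the `lease` table, the `init_lease` and `try_acquire` helpers, and the 30-second TTL are all hypothetical, and SQLite stands in for whatever shared database both hosts can reach.

```python
import sqlite3


def init_lease(conn):
    """Create the (hypothetical) lease table and seed one expired row."""
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS lease ("
            "name TEXT PRIMARY KEY, holder TEXT NOT NULL, expires REAL NOT NULL)"
        )
        conn.execute("INSERT OR IGNORE INTO lease VALUES ('core', '', 0)")


def try_acquire(conn, holder, now, ttl=30.0):
    """Take or renew the lease; return True if `holder` now owns it.

    A single UPDATE does the check and the write together, so two hosts
    racing for an expired lease cannot both succeed.
    """
    with conn:
        cur = conn.execute(
            "UPDATE lease SET holder = ?, expires = ? "
            "WHERE name = 'core' AND (holder = ? OR expires <= ?)",
            (holder, now + ttl, holder, now),
        )
    return cur.rowcount == 1
```

The active host would call `try_acquire` periodically with the current time and keep its outgoing runners going only while it returns True; the standby would poll the same function and take over once the lease has expired.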
> Thanks! Paweł
> Mailman-users mailing list -- mailman-users@mailman3.org
> To unsubscribe send an email to mailman-users-leave@mailman3.org
> https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
-- thanks, Abhilash Raj (maxking)
Thanks Abhilash,
I'm a bit confused about the dependency on local files. Do you mean files that are created when an e-mail is processed? Where is the problem with redundancy, then? If this host is gone, there will be another Mailman deployment able to deal with the load.
What about having 3 VMs with all components of Mailman installed on each? Those VMs connect to 1 database on a remote host, and we deploy HAProxy in front of the 3 VMs. As a result, each request (HTTP or SMTP) is routed to 1 VM out of the 3. A given VM handles an HTTP or SMTP request, and if there is a problem (or a very high SMTP load) there are still 2 other VMs that can do the job. So perhaps the question is: can we plug 3 Mailman instances into 1 DB? Was Mailman developed with the case in mind where several Mailman instances write to the same database at the same time?
Cheers, Paweł
Bottom line up front:
Mailman was designed in the context of a single host, and most installations do work that way. A single host can support some pretty large installations (e.g., mail.python.org is a single host, I believe). We at Mailman don't know a lot about multihost setups and their performance yet. Maybe Mark will chime in on this point.
You might get better information from postmasters at the largest sites who may have experience with distributed setups. fedora.org might be a good candidate because their people put a ton of work into optimizing HyperKitty for their situation (that's why HyperKitty has all the options for redis caching etc).
Pawel Grzywaczewski writes:
> I'm a bit confused about the dependency on local files. Do you mean files that are created when an e-mail is processed?
Yes.
> So where is the problem with redundancy? If this host is gone there will be another Mailman deployment able to deal with the load.
Where does the file go if the host goes down? If it's local and the host goes down, you lose or delay mail -- that should not happen if your Mailman system is up. That's why Abhilash mentions NFS, which allows another "core" host to take over if a "core" host goes down -- but that's not necessarily a good solution either. (I'm not saying it's *bad*, just that NFS has known problems for *some* applications, and I don't know if Mailman is one.) And of course the NFS server becomes a SPOF (single point of failure).
> What about having 3 VMs with all components of Mailman installed on each? Those VMs connect to 1 database on a remote host.
But they don't connect to 1 database. Each component has its own database, and HyperKitty and Postorius also connect indirectly to core's database via core. They *could* all connect to one database server, but that server becomes a SPOF.
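To make the "separate databases, possibly one server" point concrete, here is a minimal sketch of the Django side, assuming PostgreSQL. The hostnames, database names, and credentials are placeholders, and this is only a fragment of a settings file, not a complete configuration.

```python
# settings.py fragment for the Django side (Postorius and HyperKitty).
# All hostnames, names, and passwords below are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mailmanweb",        # the web UI's own database
        "USER": "mailmanweb",
        "PASSWORD": "change-me",
        "HOST": "db.example.org",    # may be the same server core uses
        "PORT": "5432",
    }
}

# The web UI never opens core's database directly; it talks to core's
# REST API, and core talks to its own database.
MAILMAN_REST_API_URL = "http://core.example.org:8001"

# Core's database is configured separately, in mailman.cfg, e.g.:
#
#   [database]
#   class: mailman.database.postgresql.PostgreSQLDatabase
#   url: postgresql://mailman:change-me@db.example.org/mailman
```

Pointing both at `db.example.org` gives one database server holding two databases, which is the single-server SPOF case described above.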
> A given VM handles a request HTTP or SMTP, and if there is a problem (or very high load of SMTP)
Mailman doesn't handle SMTP requests; I assume you mean LMTP. I don't think that LMTP handling is very strenuous (I could be wrong). The aiosmtpd module can handle quite high loads: it spends most of its time waiting for the payload (during which it handles other connections), then dumps a file to disk, which is also quite fast.
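The "wait for the payload while serving other connections, then dump a file" pattern can be sketched with plain asyncio. This is a toy, not aiosmtpd and not the LMTP protocol: the `handle` and `demo` functions, the spool layout, and the fixed reply are all assumptions made for illustration.

```python
import asyncio
import tempfile
import uuid
from pathlib import Path


async def handle(reader, writer, spool):
    """Receive one payload and store it as a file in `spool`."""
    # While this coroutine awaits the payload, the event loop is free
    # to make progress on other connections.
    payload = await reader.read()  # read until the client sends EOF
    name = uuid.uuid4().hex
    tmp = spool / (name + ".tmp")
    tmp.write_bytes(payload)             # short blocking write
    tmp.rename(spool / (name + ".msg"))  # appears in the spool atomically
    writer.write(b"250 OK\r\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo(messages):
    """Send `messages` concurrently; return the replies and stored payloads."""
    with tempfile.TemporaryDirectory() as directory:
        spool = Path(directory)

        async def on_connect(reader, writer):
            await handle(reader, writer, spool)

        async def send(body, port):
            reader, writer = await asyncio.open_connection("127.0.0.1", port)
            writer.write(body)
            writer.write_eof()
            reply = await reader.read()
            writer.close()
            await writer.wait_closed()
            return reply

        server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
        port = server.sockets[0].getsockname()[1]
        async with server:
            replies = await asyncio.gather(*(send(m, port) for m in messages))
        stored = sorted(p.read_bytes() for p in spool.iterdir())
        return replies, stored
```

Calling `asyncio.run(demo([b"first message", b"second message"]))` exercises two overlapping connections against one process; nearly all the time is spent awaiting the network, which is why this step is unlikely to be the bottleneck.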
As I understand it, the bottleneck is the database stuff. Mailman does a lot of database accesses to handle a post.
> there are still 2 other VMs that can do the job. So perhaps the question is: can we plug 3 Mailman instances into 1 DB? Was Mailman developed with the case in mind where several Mailman instances write to the same database at the same time?
I don't see why that would be an ACID problem -- the database takes care of that. I do think it would make the database bottleneck worse for performance.
Note that we're currently seeing reports of long delays (enough to cause database connection timeouts, so on the order of minutes) in creating lists when there are already a lot of lists. If I had to put money on it, I would bet that there is some procedure iterating over incremental requests in Python that should be doing "one big request" to the database. But I don't know that that's true.
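To show the kind of pattern I have in mind (not the actual Mailman code, which I have not traced), here is a minimal sketch using SQLite. The `mailinglist` table and both helper functions are hypothetical; each helper returns the set of list IDs found and the number of queries it issued.

```python
import sqlite3


def make_db(n):
    """Build an in-memory database with n hypothetical mailing lists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mailinglist (id INTEGER PRIMARY KEY, list_id TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO mailinglist (list_id) VALUES (?)",
        [(f"list{i}.example.org",) for i in range(n)],
    )
    return conn


def existing_one_by_one(conn, list_ids):
    """Incremental requests: one round trip to the database per item."""
    found, queries = set(), 0
    for list_id in list_ids:
        queries += 1
        row = conn.execute(
            "SELECT 1 FROM mailinglist WHERE list_id = ?", (list_id,)
        ).fetchone()
        if row:
            found.add(list_id)
    return found, queries


def existing_in_bulk(conn, list_ids):
    """One big request: a single query covering all items."""
    placeholders = ",".join("?" * len(list_ids))
    rows = conn.execute(
        f"SELECT list_id FROM mailinglist WHERE list_id IN ({placeholders})",
        list(list_ids),
    )
    return {row[0] for row in rows}, 1
```

Both functions return the same set, but the first issues one query per item. Against a remote database server every query pays network latency, so the cost of the loop grows with the number of items.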
I do know that there are some very big sites (such as fedora.org) that use Mailman 3. But they seem to have rather complex setups, with powerful databases and caches to speed up even that.
Finally, Mailman 3 was originally designed and tested on the assumption that a single host does all the work. The most obvious symptom is the lack of authentication on the REST API. (Of course you can simulate that model with an appropriate firewall and subnetting setup.)
participants (3): Abhilash Raj, Pawel Grzywaczewski, Stephen J. Turnbull