Eugenio Jordan (external) writes:
The idea we were considering was more or less stopping all of the components, then starting up only PostgreSQL, and then performing each one of the mentioned tasks.
In that case, your Mailman installation will basically be down (no web presence, no deliveries) for the duration of the database maintenance.
You mention that regardless of the connectivity that Mailman's core processes might have with the database, the emails will be received and queued until the connection is back again and they can be delivered to the distribution lists' members. Could you please confirm whether this task would actually be delegated to Postfix (or any MTA integrated with Mailman)?
If Mailman core is running, it will accept the mail and store it in its own queue. If not, the MTA will store it in its deferred queue until Mailman comes back up.
The big problem with delegating this to the MTA is that after two days of failed attempts, the MTA will have backed off and may go whole days between retries. If you force an MTA queue flush instead, it will probably hold up other mail delivery for quite a while, since the MTA will be occupied with processing the Mailman backlog.
The way Mailman works is that there is a master process, which manages a suite of runners. Each runner is responsible for a particular stage. The LMTP runner does nothing except accept messages from the MTA, and store each one in a file for the next runner in the chain. Each one does various things, then moves the file into the next runner's queue, until the message ends up in the outgoing queue, and Mailman's outgoing runner then picks it up and sends it to the MTA along with the list of RCPT TO addresses. When the MTA accepts the message, Mailman deletes it from its own queue. This is very much like the way that the MTA itself handles mail -- Mailman doesn't accept responsibility until the message is safe on disk, and it doesn't delete its copy until the outgoing MTA accepts responsibility.
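To make the hand-off discipline concrete, here is a minimal sketch in Python. It is not Mailman's actual code: the function names, the ".tmp" suffix, and the send callable are all assumptions made up for illustration. It only shows the principle described above: write to disk before acknowledging, move the file from queue to queue, and delete only after the next party accepts.

```python
import os
import tempfile
from pathlib import Path


def enqueue(queue_dir: Path, payload: bytes, name: str) -> Path:
    """Store a message on disk; the caller acknowledges only after this returns."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name first so a half-written file is never
    # mistaken for a complete queue entry (hypothetical ".tmp" convention).
    fd, tmp = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as fh:
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())
    final = queue_dir / name
    os.replace(tmp, final)  # atomic rename within one filesystem
    return final


def hand_off(entry: Path, next_queue: Path) -> Path:
    """Move a processed message file into the next stage's queue."""
    next_queue.mkdir(parents=True, exist_ok=True)
    target = next_queue / entry.name
    os.replace(entry, target)
    return target


def deliver(entry: Path, send) -> bool:
    """Offer the message to the MTA; delete our copy only if it was accepted."""
    if send(entry.read_bytes()):
        entry.unlink()
        return True
    return False  # keep the file so delivery can be retried later
```

If send reports failure, the file simply stays in the outgoing queue, which is why a crash or an outage at any stage leaves the message somewhere on disk rather than lost.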
At some point in the chain of runners, a runner will need to access the database. While the database is down, that access will fail and the runner will block. Messages will pile up in this runner's incoming queue until the database comes back up, at which point it will continue as if nothing happened.
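The blocking behaviour can be sketched roughly as follows. This is an illustration, not Mailman's implementation: DatabaseDown, run_stage, and the retry_delay parameter are hypothetical names, and an in-memory deque stands in for the on-disk queue.

```python
import time
from collections import deque


class DatabaseDown(Exception):
    """Stand-in for whatever error a real database driver would raise."""


def run_stage(queue: deque, process, retry_delay: float = 1.0, sleep=time.sleep) -> list:
    """Process messages in order, waiting whenever the database is unavailable."""
    done = []
    while queue:
        message = queue[0]  # peek only; the message stays queued until it succeeds
        try:
            result = process(message)
        except DatabaseDown:
            sleep(retry_delay)  # block here; nothing is dropped or reordered
            continue
        queue.popleft()  # remove the message only after successful processing
        done.append(result)
    return done
```

Because the message is removed only after process succeeds, an outage of any length just delays the queue; new arrivals accumulate behind the stuck message and are handled in order once the database is reachable again.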
The advantage to having Mailman accept and store the message is that the MTA doesn't have a backlog of its own that it keeps retrying, which can slow other mail, especially once Mailman comes back up, and the MTA gets tied up sending deferred mail to Mailman. Most MTAs can be configured to limit the rate at which they accept mail from individual users, which balances the load better. Mailman, on the other hand, just passes the deliveries to the MTA as fast as it can, which is normally what you want, and the MTA is in a better position to impose rate limits when needed.
If so, this is actually what we need: we could afford to delay reception by a couple of days, as long as we are sure that no mail will be lost.
Mailman doesn't lose mail, because it operates on the same principle of read, store, acknowledge that MTAs do.
Steve