Answering the previous two messages in one reply.
Dan Caballero writes:
Yesterday we processed just over 60,000 messages to recipients. Some of our lists have 2000+ subscribers. We currently run Mailman as a Docker container in AWS on a t3.xlarge instance. 4 CPU 16GB RAM
OK, so that's the same order of magnitude. Interesting that 4 CPUs seem to be enough.
90% of the time the system is handling incoming messages and the backlog of .pck files in the out queue doesn't last more than 10-15 minutes. Even if a single message may take longer to process others move through via other runners.
Right, but the thing is this is not a single-queue multiserver model. In queueing theory terms, it's a queue-per-slice model. Load balancing is done by pseudorandomizing queue assignment. So if the message at the head of a slice's queue gets stuck, the whole slice is stuck, but the queue manager keeps adding to that slice.
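To illustrate the queue-per-slice point, here is a toy sketch. It is not Mailman's actual code: the function names, the SHA-1-modulo assignment, and the one-message-per-tick delivery model are all assumptions I've made up for illustration. It only shows that a stuck slice keeps its backlog while the other slices drain.

```python
import hashlib
from collections import deque


def assign_slice(message_id: str, num_slices: int) -> int:
    """Pick a slice pseudorandomly from a hash of the message ID.

    Illustrative only; the real assignment scheme may differ.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_slices


def simulate(message_ids, num_slices, stuck_slice, ticks):
    """Return the number of messages left in each slice after `ticks` rounds.

    Each round, every slice except `stuck_slice` delivers the message at
    the head of its queue. The stuck slice never advances.
    """
    queues = [deque() for _ in range(num_slices)]
    # Messages are assigned without regard to whether a slice is stuck.
    for mid in message_ids:
        queues[assign_slice(mid, num_slices)].append(mid)
    for _ in range(ticks):
        for index, queue in enumerate(queues):
            if index != stuck_slice and queue:
                queue.popleft()
    return [len(queue) for queue in queues]


if __name__ == "__main__":
    ids = [f"msg-{n}@example.com" for n in range(400)]
    print(simulate(ids, num_slices=4, stuck_slice=0, ticks=400))
```

In a single-queue multiserver model the other runners would pick up the waiting messages. Here, everything hashed to the stuck slice just waits.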
It does sound like you're not seeing the stalling problem we did. I wonder why not, it looks like you're using Postfix, too.
So is there a 1:1 relationship between the number of runners and the maximum number of messages that may be processed with .bak files in the out queue?
I don't know about the relationship to the .bak files. I don't remember offhand how they get cleaned up. But runners handle only one message at a time. If you have 32 runners, there will be at most 32 messages being processed at a time. However, normally a runner can process several messages per second. So given 2 hours offline or whatever, 60k/12 = 5k messages would show up at once. A single runner should be able to handle that in another two hours, so I think this is indeed explained by outgoing gateway throttling.
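To make that back-of-envelope arithmetic explicit, here is a rough sketch. It assumes messages arrive evenly over the day, which is surely not true in practice, and the function names and the 2 messages/second runner rate are illustrative choices of mine, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def backlog_after_outage(daily_messages: float, outage_hours: float) -> float:
    """Messages that pile up during an outage, assuming even arrivals."""
    return daily_messages * outage_hours / 24


def drain_hours(backlog: float, daily_messages: float,
                runner_rate_per_sec: float) -> float:
    """Hours for one runner to clear the backlog while new mail keeps arriving.

    Returns infinity if the runner cannot keep up with the arrival rate.
    """
    arrival_rate = daily_messages / SECONDS_PER_DAY
    net_rate = runner_rate_per_sec - arrival_rate
    if net_rate <= 0:
        return float("inf")
    return backlog / net_rate / 3600


if __name__ == "__main__":
    backlog = backlog_after_outage(60_000, 2)  # 60k/12 = 5k messages
    print(backlog)
    # Assumed rate of 2 messages per second for a single runner.
    print(drain_hours(backlog, 60_000, 2.0))
```

With those assumptions a single runner clears the 5k backlog in a bit over an hour, so a backlog that lasts much longer than that points at something downstream of the runner.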
Again, Dan Caballero writes:
We relay the list messages through an external SMTP host. The administrator tried increasing the limit on the number of client connections as they had set it fairly low.
smtpd_client_connection_count_limit
Oops, forgot about this. My client on the large site had already tweaked that because they had that setting for the Mailman 2 site we were migrating. He mentioned it but it wasn't top of mind for me because I didn't have access to the SMTP gateway host.
They've increased the threshold and that seems to have helped, as I immediately saw an increase both via lsof -i:25 and in the number of .bak files in the out queue directory.
Yay!
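If anyone wants to watch the out queue the same way, here is a minimal sketch that counts the .pck and .bak files in a queue directory. The function name is made up, and the path in the example is only a guess at a typical layout; substitute wherever your installation keeps its out queue.

```python
from collections import Counter
from pathlib import Path


def count_queue_files(queue_dir):
    """Count .pck and .bak files directly inside queue_dir."""
    suffixes = Counter(
        entry.suffix for entry in Path(queue_dir).iterdir() if entry.is_file()
    )
    return {"pck": suffixes.get(".pck", 0), "bak": suffixes.get(".bak", 0)}


if __name__ == "__main__":
    # Hypothetical path; adjust to your installation.
    print(count_queue_files("/opt/mailman/var/queues/out"))
```

Run it every minute or so during a busy period and you can see whether the backlog is shrinking.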
Regards, Steve