One of the simpler ways of experimenting with Mailman on a restricted system is to use an SBC: something like a Raspberry Pi 2 or 3, a BeagleBone, or similar. These boards have limited memory and CPU, and are relatively slow systems. I would avoid a Pi 4 because it's too powerful in this context!
Logging always changes the behaviour, but you can do a lot by recording things in internal variables and printing them after the event (e.g. keep a counter of the number of times a try-lock operation found the lock held). You could also send the logging to a memory filesystem (Linux: tmpfs), which will have much lower latency. On many Linux distributions /tmp is already a tmpfs, though not on all of them, so it is worth checking.
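To make the counter idea concrete, here is a minimal sketch in plain Python. It uses threading.Lock purely as a stand-in; it is not Mailman's or flufl.lock's code, and the names hold_lock, poll_lock and stats are made up for the illustration. The point is only that the hot path touches a dictionary and the printing happens once everything is finished.

```python
import threading
import time

lock = threading.Lock()
ready = threading.Event()

# In-memory counters: updated on the hot path, printed only afterwards.
stats = {"acquired": 0, "held_by_other": 0}


def hold_lock(seconds):
    """Take the lock, signal that it is held, and keep it for a while."""
    with lock:
        ready.set()
        time.sleep(seconds)


def poll_lock(attempts, pause):
    """Try-lock repeatedly, counting outcomes instead of logging them."""
    for _ in range(attempts):
        if lock.acquire(blocking=False):
            stats["acquired"] += 1
            lock.release()
        else:
            stats["held_by_other"] += 1
        time.sleep(pause)


holder = threading.Thread(target=hold_lock, args=(0.2,))
holder.start()
ready.wait()                       # make sure the holder really has the lock
poll_lock(attempts=20, pause=0.02)
holder.join()

# Only now, after the event, do the (slow) reporting.
print(stats)
```

The exact split between the two counters depends on timing, which is rather the point: the numbers tell you how often the lock was contended without the act of recording them getting in the way.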
A multicore system will behave differently from a single-core one, but the kernel will almost always be preemptive (forced timeslicing). So one option might be to experiment with a fast-tick kernel (e.g. 1000Hz or higher) so that the timeslices are shorter. (Linux used to run at 100Hz. I know most kernels are now "tick-less", meaning there is no persistent timer interrupt marking time, but there will still be a unit of timeslice.)
A final option: if the problem is (as it appears to be) one of starvation, then perhaps the system is better run (on that hardware) with fewer worker processes. That is, cut the 13 workers down to e.g. 8. That said, starvation is something that can normally be designed out of a system, though I know far too little about the architecture to say whether that is the case here.
HTH,
Ruth
On 19/06/2023 18:51, Stephen Daniel wrote:
I'm currently running the entire mailman3 stack, including postgres and postfix, plus my production web server, all on a GCP e2-small instance. e2-small is two virtual CPUs running on a single physical core, restricted to consuming at most 50% of the CPU cycles on that core. The system has 2 GB of memory, 10GB of disk space.
The entire stack runs well, albeit slowly, on this configuration, for which I pay about $13/month. The entire stack is shut down every night so I can take a snapshot while the applications are down. Once the snapshot is complete, the stack is restarted. I have no startup issues in this configuration.
It is hard to imagine a more limited pool of resources, yet everything works.
On Mon, Jun 19, 2023 at 12:38 PM Nelson Strother <justfixit@marilynstrother.com> wrote:
[My apologies for the mangled formatting of the diffs in [2] and [3] above, at least as shown on the web view of this list. Will happily repost or resend by direct email if anyone expresses interest.]
Presumably there's also a chance for the runner to detect and log the expired lock before it dies.
This may require some cleverness, as the delay introduced by recording a log message, e.g. "I'm the nntp runner holding the lock, and it has not expired yet.", is likely to change the results of the lock competition, maybe even allowing all runner processes to continue as intended. It is unclear to me whether a process would ever have an opportunity to record a log message, e.g. "I'm the nntp runner holding the lock which has expired, but I am not dead yet.", before it dies. If such an opportunity exists, the process should extend the lock's lifetime via e.g. Lock.refresh(timedelta(seconds=10)).
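A rough sketch of the "refresh before it expires" idea follows. It deliberately does not use flufl.lock: ExpiringLock is a hypothetical in-memory stand-in, and refresh_if_close, the 3-second margin and the events list are all assumptions made for the illustration. Whether a runner actually gets a chance to do this before it dies is exactly the open question above.

```python
from datetime import datetime, timedelta


class ExpiringLock:
    """Hypothetical stand-in for a lock with a lifetime (not flufl.lock)."""

    def __init__(self, lifetime, now):
        self.expiration = now + lifetime

    def remaining(self, now):
        return self.expiration - now

    def refresh(self, lifetime, now):
        # Restart the lifetime from "now".
        self.expiration = now + lifetime


def refresh_if_close(lock, now, margin, lifetime, events):
    """Extend the lock when it is within `margin` of expiring (or already past).

    The observation is appended to an in-memory list rather than logged,
    so that recording it does not disturb the lock competition.
    """
    left = lock.remaining(now)
    if left <= margin:
        events.append(("refreshed", left))
        lock.refresh(lifetime, now)
        return True
    return False


start = datetime(2023, 6, 19, 12, 0, 0)
lock = ExpiringLock(timedelta(seconds=15), now=start)
events = []
margin = timedelta(seconds=3)
extension = timedelta(seconds=10)

# 5s in: 10s of lifetime left, nothing to do.
early = refresh_if_close(lock, start + timedelta(seconds=5), margin, extension, events)
# 13s in: only 2s left, so extend by another 10s from that moment.
late = refresh_if_close(lock, start + timedelta(seconds=13), margin, extension, events)

print(early, late, lock.expiration - start, events)
```

Note that the sketch also refreshes a lock whose remaining time is already negative. With a real lock file that is only safe if no other process has broken the lock in the meantime, which this stand-in does not model.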
As I attempted to summarize at the end of [3] above, each time I tried shorter sleep intervals (1, 10, 15 seconds) within .../flufl/lock/_lockfile.py the original problem remains, namely one or more locks are broken and runner processes die.
On 6/19/23 03:00, Stephen J. Turnbull wrote:
The best "performance" I have yet obtained on this single core system is [by] [3] [which waits 20s before trying to obtain the lock].
Is 20s necessary? There are a lot of runners, you may be able to cut the startup time by 2-3 minutes if you can cut that to 5s or less. That would still be 100x longer than the naive #cores * seconds estimate, but perhaps significant.
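For a feel of the numbers, here is the back-of-envelope arithmetic as a few lines of Python. It assumes the 13 workers mentioned elsewhere in this thread and, as a simplification, that each runner sits out the full delay one after another; neither assumption comes from the Mailman code itself.

```python
RUNNERS = 13  # worker count mentioned elsewhere in the thread (assumed here)


def total_wait(runners, delay_seconds):
    """Total startup delay if every runner waits in turn (an assumption)."""
    return runners * delay_seconds


current = total_wait(RUNNERS, 20)   # 20s wait per runner
proposed = total_wait(RUNNERS, 5)   # 5s wait per runner
saved = current - proposed

print(f"{current}s -> {proposed}s, saving {saved}s ({saved / 60:.2f} min)")
```

Under those assumptions the saving is a little over three minutes, which is in the same ballpark as the 2-3 minutes suggested above.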
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/ Archived at: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
-- Software Manager & Engineer Tel: 01223 414180 Blog: http://www.ivimey.org/blog LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/