Main daemon process sticks around after forking

Hello,
I'm running a (rather large) mailman3 instance on Debian 13 (trixie) using the distribution packages.
To manage the mailman3 daemon, the package installs this systemd service unit: https://sources.debian.org/src/mailman3/3.3.10-2/debian/mailman3.service/
It had been working fine until recently: after a reboot, we noticed the mailman3 unit consistently failing to start due to a timeout error.
I figured out that the timeout was related to the service unit type (Type=forking): the main/parent process was forking subprocesses (which ran normally) and writing a PID file, but it wasn't exiting, so systemd treated this as a failure and aborted the service.
Since switching to Type=simple, it has been working fine.
I'm wondering if anyone else has seen such behavior and how I could troubleshoot it.
Thanks!
-- Jérôme

On 8/6/25 09:03, Jérôme Charaoui wrote:
> I'm running a (rather large) mailman3 instance on Debian 13 (trixie) using the distribution packages.
> To manage the mailman3 daemon, the package installs this systemd service unit: https://sources.debian.org/src/mailman3/3.3.10-2/debian/mailman3.service/
> It has been working fine until recently: after a reboot, we noticed the mailman3 unit consistently failing to start due to a timeout error.
> I figured out that the timeout was related to the service unit type (Type=forking) and the fact that the main/parent process was forking subprocesses (and running normally), writing a PID file, but it wasn't exiting, so systemd identifies this as an error and aborts the service.
The master is not supposed to exit. It continues to run and monitor the child runner processes and will under some circumstances restart a runner that dies.
> Since switching to Type=simple, it has been working fine.
This is a Debian package issue and should be reported to Debian. See https://wiki.list.org/x/12812344
-- Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

On 8/6/25 18:17, Mark Sapiro wrote:
> On 8/6/25 09:03, Jérôme Charaoui wrote:
>> I figured out that the timeout was related to the service unit type (Type=forking) and the fact that the main/parent process was forking subprocesses (and running normally), writing a PID file, but it wasn't exiting, so systemd identifies this as an error and aborts the service.
> The master is not supposed to exit. It continues to run and monitor the child runner processes and will under some circumstances restart a runner that dies.
Further, our recommended systemd configuration is at
https://docs.mailman3.org/en/latest/install/virtualenv.html#starting-mailman...
and includes Type=forking, and this is the first I've heard of an issue with that.
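For readers unfamiliar with what Type=forking expects, here is a minimal Python sketch of the general pattern: a launcher starts a long-lived master, records the master's PID in a file, and then returns so that the launcher itself can exit. This is not Mailman's actual start-up code; `start_master`, the PID file path and the stand-in master command are hypothetical names used only for illustration.

```python
import subprocess
import sys
from pathlib import Path


def start_master(pidfile, master_cmd):
    """Spawn a long-lived master, record its PID, and return immediately.

    Illustrative only: the launcher that calls this is the process that
    must exit under Type=forking, while the spawned master keeps running.
    """
    proc = subprocess.Popen(
        master_cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach the master from the launcher's session
    )
    # Write the master's PID (not the launcher's) so a supervisor can track it.
    Path(pidfile).write_text(f"{proc.pid}\n")
    return proc.pid
```

A launcher script would call something like `start_master("/tmp/master-sketch.pid", [sys.executable, "-c", "import time; time.sleep(3600)"])` and then exit. If the launcher blocks before exiting, for example while waiting on a lock, a supervisor expecting this pattern can time out even though the master and its children are running.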
-- Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan

Mark Sapiro wrote:
> The master is not supposed to exit. It continues to run and monitor the child runner processes and will under some circumstances restart a runner that dies.
> Further, our recommended systemd configuration is at
> https://docs.mailman3.org/en/latest/install/virtualenv.html#starting-mailman...
> and includes Type=forking and this is the first I've heard of an issue with that.
I ended up figuring out the problem by myself:
I needed to create a list, and via the web interface it would return a 502 error after a short delay. On the command line, the "create" operation hung forever and never completed.
I used strace to check out what the process was doing and saw it was in a loop attempting to get a lock on "/var/lib/mailman3/locks/mta". There were a number of files in that locks directory that also seemed stale, so I stopped the mailman3 daemons, removed all the lock files manually, and started them again. That fixed both creating a new list and the fact that the startup process was not exiting as systemd was expecting. [0]
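Before removing lock files by hand, it can help to list what is in the locks directory and how old each file is. The following is only a sketch: the directory is the one mentioned above, `list_lock_files` is a hypothetical helper name, and treating a modification time in the past as "possibly stale" is an assumption that should be checked against the running processes before deleting anything.

```python
import os
import time
from pathlib import Path

LOCK_DIR = "/var/lib/mailman3/locks"  # path taken from the message above


def list_lock_files(lock_dir, now=None):
    """Return (path, mtime, looks_expired) for each regular file in lock_dir.

    Assumption: a file whose mtime lies in the past may be a stale lock.
    This is a hint for a human to investigate, not proof of staleness.
    """
    now = time.time() if now is None else now
    report = []
    for entry in sorted(Path(lock_dir).iterdir()):
        if entry.is_file():
            mtime = entry.stat().st_mtime
            report.append((str(entry), mtime, mtime < now))
    return report


if __name__ == "__main__":
    if os.path.isdir(LOCK_DIR):
        for path, mtime, looks_expired in list_lock_files(LOCK_DIR):
            flag = "possibly stale" if looks_expired else "not expired"
            print(f"{time.ctime(mtime)}  {flag:15}  {path}")
```

The script only reads metadata and prints a report, so it is safe to run while the daemons are up.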
It seems to me like this all might be caused by some code path that will retry forever to get a lock without ever timing out or logging some error.
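To illustrate the alternative, here is a rough sketch of a lock acquisition that gives up after a deadline and logs an error instead of retrying forever. It uses a plain exclusive-create lock file and is not Mailman's actual locking code; `acquire_with_timeout`, `release` and the default values are hypothetical.

```python
import logging
import os
import time

log = logging.getLogger("lock-sketch")


def acquire_with_timeout(lock_path, timeout=10.0, poll_interval=0.25):
    """Create lock_path exclusively, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if time.monotonic() >= deadline:
                log.error("gave up waiting for lock %s after %.1fs", lock_path, timeout)
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
        else:
            # Record the owner's PID so a stale lock can be traced later.
            with os.fdopen(fd, "w") as fh:
                fh.write(f"{os.getpid()}\n")
            return lock_path


def release(lock_path):
    """Remove the lock file created by acquire_with_timeout."""
    os.unlink(lock_path)
```

With a bound like this, a stale lock would surface as a logged error and a failed command rather than as a silent hang.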
Furthermore, I suspect the stale lock files themselves could have been a side effect of the extremely frequent OOMs we were suffering from before disabling HYPERKITTY_MBOX_EXPORT, due to scrapers continuously hitting /export/ endpoints. [1] [2]
[0] https://gitlab.torproject.org/tpo/tpa/team/-/issues/42255
[1] https://gitlab.torproject.org/tpo/tpa/team/-/issues/41957
[2] https://gitlab.com/mailman/hyperkitty/-/issues/385
Participants (2): Jérôme Charaoui, Mark Sapiro