mrl@psfc.mit.edu writes:
Roland - Thanks, but I don't think that will help, because systemctl doesn't actually check whether a service started properly. See the log below.
Jul 26 04:52:11 psfcmail2 mariadbd[938]: 2023-07-26 4:52:11 0 [Note] InnoDB: Completed initialization of buffer pool
Jul 26 04:52:12 psfcmail2 mariadbd[938]: 2023-07-26 4:52:12 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9811536157,9811536157
Jul 26 04:52:13 psfcmail2 systemd[1]: Starting GNU Mailing List Manager...
Jul 26 04:52:25 psfcmail2 mailman[1313]: sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)")
If that's after making the "After: mariadb" change, that looks like a systemd or mariadb bug to me. If systemd is representing to Mailman that it's OK to start, but connection to the database is refused, it's NOT OK for Mailman to keep trying to start. Something is broken in the system, and Mailman's primary responsibility when it recognizes that something is broken is to *not lose mail*. So it should shut down and let the MTA queue up mail, while the system administrator fixes what's broken.
Mailman3 core needs to keep trying to access mysql, rather than stopping.
You're welcome to create a local vendor branch, but I don't see any reason to make any change in the distributed Mailman, which is not at all broken according to your story.
I could create a cron script that runs every 5 minutes, that checks if mailman3 is not running, and tries to start it. But that's a hack. :)
That's the hack you want Mailman to implement, even though Mailman is doing everything right.
There are a number of packages that do exactly this kind of thing, probably more cleanly than we could implement it. Install one of those and be done with it. In fact, almost surely systemd can do it.
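As a rough illustration of doing the waiting outside Mailman, here is a minimal sketch of a helper that polls until something accepts TCP connections on the database port. The function name wait_for_port and its default timeouts are my own invention for this example, not part of Mailman or MariaDB, and a successful connect only shows that the port is open, not that the database is ready for queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connect to (host, port) succeeds.

    Hypothetical helper for illustration only. Returns True as soon as
    a connection is accepted, or False once `timeout` seconds have
    passed without a successful connect.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. "Connection
            # refused", errno 111) while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Something like wait_for_port("localhost", 3306), with 3306 being MariaDB's default port, could be wrapped in a small script that exits nonzero on failure and run from an ExecStartPre= line in a local drop-in for the Mailman unit. systemd's own Restart=on-failure and RestartSec= settings are the other obvious knobs. Either way it stays local configuration, and the unit names depend on how your distribution packages things.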
Steve