Mailman3 core didn't start after reboot, because mysql wasn't running yet.
Hi - We had a power outage, and after my Ubuntu server rebooted, mailman 3 tried to start before mysql was running. I guess because it wasn't a clean reboot, the mysql server took much longer than normal to start, while checking databases. So mailman3 couldn't start. Normal reboots never had this problem. Should it have kept trying?
In any event, mailman3 core didn't start, but mailman3-web did start. But it wasn't very happy, of course. And any time someone tried to access the mailing list website, it would send me an error message. I had hundreds of such emails in the morning, starting with the one below.
Is there a way to configure my system to make sure this doesn't happen again? Thanks. - Mark
Service Unavailable: /mailman3/
Report at /mailman3/ Service Unavailable: /mailman3/
Request Method: GET
Request URL: https://lists.psfc.mit.edu/mailman3/
Django Version: 4.1.8
Python Executable: /opt/mailman/venv/bin/uwsgi
Python Version: 3.10.6
Python Path: ['.', '', '/riusMiddleware')
Dear Mark,
On 26/07/2023 17:03, Mark London wrote:
> Hi - We had a power outage, and after my Ubuntu server rebooted, mailman 3 tried to start before mysql was running. I guess because it wasn't a clean reboot, the mysql server took much longer than normal to start, while checking databases. So mailman3 couldn't start. Normal reboots never had this problem. Should it have kept trying?
> In any event, mailman3 core didn't start, but mailman3-web did start. But it wasn't very happy, of course. And any time someone tried to access the mailing list website, it would send me an error message. I had hundreds of such emails in the morning, starting with the one below.
> Is there a way to configure my system to make sure this doesn't happen again? Thanks. - Mark
I had a similar problem and added the line
After=mysql.service
to the [Unit] section in /lib/systemd/system/mailman3.service to make sure mailman3 starts after mysql during reboots. Then I said
systemctl daemon-reload
to regenerate systemd dependencies.
But after upgrading to Debian 11, this patch was somehow no longer necessary.
Roland
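The same ordering can also be expressed as a drop-in override rather than by editing the packaged unit file under /lib/systemd/system, which a package upgrade may overwrite. The following is only a minimal sketch: it assumes the unit names mailman3.service (as in the message above) and mariadb.service (an assumption; on some systems the database unit is mysql.service), and the path shown is the conventional location that `systemctl edit mailman3.service` creates.

```ini
# /etc/systemd/system/mailman3.service.d/override.conf
# Unit names here are assumptions; check them with `systemctl list-units`.
[Unit]
# Order mailman3 after the database unit.
After=mariadb.service
# Also pull the database unit in when mailman3 is started.
Wants=mariadb.service
```

After= only controls ordering. How long systemd waits before it considers the database "started" depends on the database unit's own Type= setting. If the file is created by hand, run systemctl daemon-reload afterwards.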
Roland - Thanks, but I don't think that will help. Because systemctl doesn't actually check to see if a service properly started. I.e., see below. Mariadbd (mysql) did start before mailman3. However, it had to do a crash recovery after a non-clean reboot, so it was not ready to accept connections right away. Mailman3 then tried to start but failed. mailmanweb.service then tried to start, even though mailman3 wasn't running. And then every attempt to access the mailman3 web page would generate an email message that was sent to me. So I had hundreds of emails when I woke up. Mailman3 core needs to keep trying to access mysql, rather than stopping. I could create a cron script that runs every 5 minutes, that checks if mailman3 is not running, and tries to start it. But that's a hack. :)
Jul 26 04:52:11 psfcmail2 mariadbd[938]: 2023-07-26 4:52:11 0 [Note] InnoDB: Completed initialization of buffer pool
Jul 26 04:52:12 psfcmail2 mariadbd[938]: 2023-07-26 4:52:12 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9811536157,9811536157
Jul 26 04:52:13 psfcmail2 systemd[1]: Starting GNU Mailing List Manager...
...
Jul 26 04:52:25 psfcmail2 mailman[1313]: File "/opt/mailman/venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py", line 901, in __connect
Jul 26 04:52:25 psfcmail2 mailman[1313]: self.dbapi_connection = connection = pool._invoke_creator(self)
Jul 26 04:52:25 psfcmail2 mailman[1313]: File "/opt/mailman/venv/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 636, in connect
Jul 26 04:52:25 psfcmail2 mailman[1313]: return dialect.connect(*cargs, **cparams)
Jul 26 04:52:25 psfcmail2 mailman[1313]: File "/opt/mailman/venv/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 580, in connect
Jul 26 04:52:25 psfcmail2 mailman[1313]: return self.loaded_dbapi.connect(*cargs, **cparams)
Jul 26 04:52:25 psfcmail2 mailman[1313]: File "/opt/mailman/venv/lib/python3.10/site-packages/pymysql/connections.py", line 352, in __init__
Jul 26 04:52:25 psfcmail2 mailman[1313]: self.connect()
Jul 26 04:52:25 psfcmail2 mailman[1313]: File "/opt/mailman/venv/lib/python3.10/site-packages/pymysql/connections.py", line 668, in connect
Jul 26 04:52:25 psfcmail2 mailman[1313]: raise exc
Jul 26 04:52:25 psfcmail2 mailman[1313]: sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)")
Jul 26 04:52:25 psfcmail2 systemd[1]: mailman3.service: Control process exited, code=exited, status=1/FAILURE
Jul 26 04:52:25 psfcmail2 systemd[1]: mailman3.service: Failed with result 'exit-code'.
Jul 26 04:52:25 psfcmail2 systemd[1]: Failed to start GNU Mailing List Manager.
Jul 26 04:52:25 psfcmail2 systemd[1]: Started GNU Mailman Web UI.
Jul 26 04:52:25 psfcmail2 uwsgi[1600]: [uWSGI] getting INI configuration from /etc/mailman3/uwsgi.ini
Jul 26 04:52:26 psfcmail2 mariadbd[938]: 2023-07-26 4:52:26 0 [Note] InnoDB: Read redo log up to LSN=9844697088
Jul 26 04:52:28 psfcmail2 mariadbd[938]: 2023-07-26 4:52:28 0 [Note] InnoDB: Starting final batch to recover 697 pages from redo log.
etc...
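One way to keep the web UI from starting when core has failed to start is to declare that dependency to systemd. This is a hedged sketch only: it assumes the unit names mailmanweb.service and mailman3.service seen in the message and log above, and the drop-in path is a conventional location, not something taken from this system.

```ini
# /etc/systemd/system/mailmanweb.service.d/override.conf
[Unit]
# With Requires= plus After=, systemd does not start this unit
# if mailman3.service fails to start.
Requires=mailman3.service
After=mailman3.service
```

Requires= also means the web UI is stopped or restarted whenever mailman3 is explicitly stopped or restarted, and the web UI may have to be started separately once core comes up later, so this may or may not be the behaviour you want.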
mrl@psfc.mit.edu writes:
> Roland - Thanks, but I don't think that will help. Because systemctl doesn't actually check to see if a service properly started. I.e., see below.
> Jul 26 04:52:11 psfcmail2 mariadbd[938]: 2023-07-26 4:52:11 0 [Note] InnoDB: Completed initialization of buffer pool
> Jul 26 04:52:12 psfcmail2 mariadbd[938]: 2023-07-26 4:52:12 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9811536157,9811536157
> Jul 26 04:52:13 psfcmail2 systemd[1]: Starting GNU Mailing List Manager...
> Jul 26 04:52:25 psfcmail2 mailman[1313]: sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'localhost' ([Errno 111] Connection refused)")
If that's after making the "After: mariadb" change, that looks like a systemd or mariadb bug to me. If systemd is representing to Mailman that it's OK to start, but connection to the database is refused, it's NOT OK for Mailman to keep trying to start. Something is broken in the system, and Mailman's primary responsibility when it recognizes that something is broken is to *not lose mail*. So it should shut down and let the MTA queue up mail, while the system administrator fixes what's broken.
> Mailman3 core needs to keep trying to access mysql, rather than stopping.
You're welcome to create a local vendor branch, but I don't see any reason to make any change in the distributed Mailman, which is not at all broken according to your story.
> I could create a cron script that runs every 5 minutes, that checks if mailman3 is not running, and tries to start it. But that's a hack. :)
That's the hack you want Mailman to implement, even though Mailman is doing everything right.
There are a number of packages that do exactly this kind of thing, probably more cleanly than we could implement it. Install one of those and be done with it. In fact, almost surely systemd can do it.
Steve
Stephen - I just looked at a Sympa mailing list running on one of our other VMs. It ran into the same problem with MySQL when it was also rebooted. However, it was able to eventually restart on its own, because of these lines in its service files:
Restart=always
RestartSec=15
I think that is the solution? Now I understand why I never had a similar problem with Sympa. Thanks. - Mark
On 7/28/2023 12:51 AM, Stephen J. Turnbull wrote:
> There are a number of packages that do exactly this kind of thing, probably more cleanly than we could implement it. Install one of those and be done with it. In fact, almost surely systemd can do it.
Mark London writes:
> Stephen - I just looked at a Sympa mailing list running on one of our other VMs. It ran into the same problem with MySQL when it was also rebooted. However, it was able to eventually restart on its own, because of these lines in its service files:
> Restart=always
> RestartSec=15
That's a systemd file, no?
Mailman 3 core itself has never distributed a systemd unit file, although there are some suggestions about it in the documentation: https://docs.mailman3.org/en/latest/install/virtualenv.html#starting-mailman...
We could add that to the documentation as advice for MySQL/MariaDB users specifically, and generally for any external RDBMS.
Note that it should not be a problem for PostgreSQL, as PostgreSQL supports and recommends configuration with the systemd "Type=notify" setting.
I have submitted this MR to document the issue and the proposed workaround: https://gitlab.com/mailman/mailman-suite-doc/-/merge_requests/129
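As a sketch of the workaround discussed in this thread, the two settings quoted from the Sympa unit could be applied to Mailman core through a drop-in. This assumes the unit is named mailman3.service, as in the logs earlier in the thread; the file path is a conventional location and not something stated by any of the posters.

```ini
# /etc/systemd/system/mailman3.service.d/restart.conf
[Service]
# Values quoted from the Sympa service file mentioned in this thread.
Restart=always
# Wait 15 seconds between start attempts.
RestartSec=15
```

Restart=on-failure is a narrower alternative that retries only after a failure rather than after every exit. After adding the file, run systemctl daemon-reload so that systemd picks it up.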
participants (4)
- Mark London
- mrl@psfc.mit.edu
- Roland Miyamoto
- Stephen J. Turnbull