Hi Shashikanth,
It appears I may have sent you on a wild goose chase. You have my apologies. There's a very similar report today, so this appears to be a Mailman 3 issue. For now, please see Mark's post for the correct way to increase the timeout: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
I am working with Mark to diagnose the underlying problem.
Steve
Stephen J. Turnbull writes:
Shashikanth Komandoor writes:
I am currently running Mailman 3 with version 3.3.1 and in
built postfix version 2.10.1-6 on RHEL 7.5 with PostgreSQL 11.7 version with default values in production environment from the past 4 months almost. As of now, we are having around 1327 lists created on this.
Aren't you the crew that modified Mailman to interface with SMS or something like that? If so, don't worry, it doesn't invalidate your warranty, but it seems possible that it has something to do with this:
After the above configuration, I found many errors like below in mailman.log file:
*sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column 'pended.id'"*
*sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.*
*sqlalchemy.exc.DatabaseError: (psycopg2.DatabaseError) error with status PGRES_TUPLES_OK and no message from the libpq*
Whether you have that modified Mailman or not, the messages above suggest that there may have been a change in the schema of the database that one side (presumably SQLAlchemy) knows about but that has not been propagated to the other (PostgreSQL). Have you perhaps done an upgrade but not run the database migration script?
About the REST API not available error:
Something went wrong
Mailman REST API not available. Please start Mailman core.
But my mailman core was working fine in the background. At
the same time I was able to do other operations on a few other lists.
By the time I am getting the above error, the mailman.log
recorded the below messages.
*[2020-09-16 12:49:08 +0530] [2600] [CRITICAL] WORKER TIMEOUT (pid:20234)*
*[2020-09-16 12:49:08 +0530] [20234] [INFO] Worker exiting (pid: 20234)*
*[2020-09-16 12:49:09 +0530] [21042] [INFO] Booting worker with pid: 21042*
Something is taking too long, so the worker is timing out. That's probably why you get the REST API not available message. Do the sqlalchemy errors correlate with the timeouts?
I suspect the exit and reboot of the worker is normal but I haven't looked at this part of the code.
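If it helps to check that correlation, here is a rough sketch, not a tested tool. It assumes the SQLAlchemy tracebacks and the WORKER TIMEOUT lines both end up in the same mailman.log. It looks at how close together they are by line position, since the traceback lines you quoted carry no timestamps. The function names and the 50-line window are my own inventions, so adjust to taste.

```python
import re
from typing import Iterable, List, Tuple

# Patterns taken from the log excerpts quoted in this thread.
TIMEOUT_RE = re.compile(r"\[CRITICAL\] WORKER TIMEOUT \(pid:(\d+)\)")
SQLA_RE = re.compile(r"sqlalchemy\.exc\.(\w+)")


def timeouts_with_nearby_errors(
    lines: Iterable[str], window: int = 50
) -> List[Tuple[int, int, List[str]]]:
    """For each WORKER TIMEOUT line, collect the sqlalchemy exception
    names found within `window` lines before or after it.

    Returns (1-based line number, worker pid, exception names) tuples.
    """
    lines = list(lines)

    # Index every line that mentions a sqlalchemy exception.
    error_at = {}
    for i, line in enumerate(lines):
        match = SQLA_RE.search(line)
        if match:
            error_at[i] = match.group(1)

    results = []
    for i, line in enumerate(lines):
        match = TIMEOUT_RE.search(line)
        if match:
            nearby = [
                name
                for j, name in sorted(error_at.items())
                if abs(j - i) <= window
            ]
            results.append((i + 1, int(match.group(1)), nearby))
    return results


def scan_file(path: str, window: int = 50) -> None:
    """Print one summary line per worker timeout found in the log at `path`."""
    with open(path, errors="replace") as fh:
        for lineno, pid, nearby in timeouts_with_nearby_errors(fh, window):
            print(lineno, pid, nearby or "no sqlalchemy errors nearby")
```

Calling scan_file() with the path to your mailman.log prints one line per timeout. If most timeouts show SQLAlchemy errors nearby, the two are probably related; if not, they may be separate problems.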
My mailman.cfg configuration is as below. The timeout
value is the default value. I did not set any customized value.
Thanks for sending it; sometimes it speeds up debugging. But in this case you say that other lists are working, so site configuration doesn't seem to be the problem. Is this particular list special in some way? Is its list configuration different?
The above might be few of the errors. FYI, for the creation of each list, it is taking almost around 15 to 20 minutes. After multiple times of the above error messages, at some time, I am fortunately getting the list created or some other operation getting done.
This is quite strange. I don't see why this would suddenly start working after many minutes.
Does PostgreSQL have any logs with "interesting" (or scary) messages in them?
Maybe somebody else has some ideas, but this is as far as I can go.