Thanks for your patience with our team; I had already received your second "ping" and Mark's response, but due to the way my MUA presents my INBOX I didn't see them (aside to self: *don't* answer Mailman traffic from the inbox, use the "Mailman 3 virtual folder" view! :-). I would have written very differently if I had seen them....
Stephan Krinetzki writes:
Then we should maybe move to gunicorn, if this is the better way (or at least better supported way)
That's definitely a possibility. Among other things, gunicorn is a Python application, so all the Mailman devs have some hope of debugging it, or at least localizing the problem. That's not true of uwsgi. The other possibility, if you don't need high performance, is to use the Apache mod_wsgi plugin, which seems to be quite stable (that's what I do out of laziness -- I really should have uwsgi and gunicorn setups for testing -- my own throughput and admin needs are very modest).
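A minimal sketch of what a gunicorn configuration for this setup might look like. The project directory is the one given in this thread; the port, worker count, and timeout are assumptions chosen for illustration, not values anyone here has tested.

```python
# gunicorn.conf.py -- a sketch, not a tested production configuration.

# Directory holding the Django project (path taken from this thread).
chdir = "/opt/mailman/mailman-venv/mailman-suite/mailman-suite_project"

# Listen on loopback only; Apache would proxy requests to this address.
# The port number is an arbitrary choice.
bind = "127.0.0.1:8000"

# A modest number of worker processes for a small site (assumption).
workers = 2

# Seconds a worker may stay silent before gunicorn restarts it.  The
# gunicorn default is 30; the larger value here is an arbitrary guess.
timeout = 120

# "-" sends the error log to stderr so the service manager captures it.
errorlog = "-"
loglevel = "info"
```

It would be started with something like `gunicorn -c gunicorn.conf.py wsgi:application`, assuming the project directory contains a `wsgi.py` that exposes `application`.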
After we enabled the "slow query log", we got in the postgresql database the following output:
2022-01-25 12:49:41.538 CET [16126] LOG: unexpected EOF on client connection with an open transaction
Ouch. That is a database problem, but I don't think it's a problem with the database. Two database connections are involved, one from Mailman core and the other from Django, but this
the message is delivered to the list immediately,
says that core has finished at least some of its business. I guess it's possible that core drops the transaction on the floor, but that seems weird because I don't think this is a common problem. I'm not going to guess further, maybe Mark or Abhilash has something.
ISTR other delay issues that have to do with very large databases. Do you have huge numbers of users, lists, or subscriptions to particular lists?
Of course it could be Postorius/Django, but I think Postorius delegates all the logic and database work to core for this. Django's databases are user- and authentication-oriented; I don't recall any list-management logic in Django that would induce this:
2022-01-25 15:34:36.047 CET [16864] LOG: duration: 30760.028 ms statement: DELETE FROM pendedkeyvalue WHERE pendedkeyvalue.id = XXX
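To see whether this statement is an outlier or one of many, the slow query log can be summarized with a few lines of Python. This is a sketch that assumes the log lines have the same shape as the ones quoted in this thread (timestamp, time zone, PID in brackets, then `LOG: duration: ... ms statement: ...`); the function and class names are made up for illustration.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches lines shaped like the "duration: ... ms statement: ..." entry
# quoted above.  A different log_line_prefix would need a different pattern.
LOG_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ "
    r"\[(?P<pid>\d+)\] LOG:\s+duration: (?P<ms>\d+(?:\.\d+)?) ms\s+"
    r"statement: (?P<statement>.*)$"
)


class SlowStatement(NamedTuple):
    timestamp: str
    pid: int
    duration_ms: float
    statement: str


def slow_statements(lines: Iterable[str],
                    threshold_ms: float = 1000.0) -> List[SlowStatement]:
    """Return statements at or above threshold_ms, slowest first."""
    found = []
    for line in lines:
        match = LOG_RE.match(line.strip())
        if match is None:
            continue  # not a duration/statement line
        duration = float(match.group("ms"))
        if duration >= threshold_ms:
            found.append(SlowStatement(
                timestamp=match.group("timestamp"),
                pid=int(match.group("pid")),
                duration_ms=duration,
                statement=match.group("statement"),
            ))
    return sorted(found, key=lambda s: s.duration_ms, reverse=True)
```

Passing an open log file to `slow_statements()` and printing the result would show whether the slow statements all touch the same table.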
Next log:
ERROR 2022-01-25 18:55:55,037 2599 django.request Service Unavailable: /postorius/lists/games.lists.rwth-aachen.de/held_messages
ERROR 2022-01-25 18:58:24,325 2588 postorius Mailman REST API not available
Not sure about this. It could indicate a deadlock where each of core and the backend database is waiting for the other, so core isn't able to respond to Postorius.
It looks to me like the rest of the traceback is a natural consequence of unavailability of the REST API.
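One way to look for sessions that are sitting on an open transaction is to query PostgreSQL's `pg_stat_activity` view while the problem is happening. The following is a minimal sketch, assuming a DB-API connection (for example from psycopg2) opened by a role that is allowed to see other sessions; the function name is made up for illustration.

```python
# Sessions that hold a transaction open without running anything.
IDLE_IN_TRANSACTION_SQL = """
SELECT pid, usename, application_name, state,
       now() - xact_start AS transaction_age, query
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start
"""


def idle_in_transaction(conn):
    """Return one row per idle-in-transaction session, oldest first.

    conn is any DB-API connection to the PostgreSQL server.
    """
    cur = conn.cursor()
    try:
        cur.execute(IDLE_IN_TRANSACTION_SQL)
        return cur.fetchall()
    finally:
        cur.close()


# Usage (not run here; connection parameters are placeholders):
#   import psycopg2
#   conn = psycopg2.connect(dbname="...", user="...", host="localhost")
#   for row in idle_in_transaction(conn):
#       print(row)
```

The `query` column shows the last statement each such session ran, which may help tell whether core or Django is the one holding the transaction.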
- The venv is in /opt/mailman/mailman-venv
- The settings are in /opt/mailman/mailman-venv/mailman-suite/mailman-suite_project
I don't see much to flag here. I don't recall whether allauth.socialaccount is supposed to be enabled or disabled if you have no allauth.socialaccount.providers enabled, but it's probably OK (and I don't see how it could be related to this problem):
content of settings.py

# Application definition
INSTALLED_APPS = (
    [...]
    'allauth',
    'allauth.account',
    'allauth.socialaccount',
Database looks fine:
# Database
# https://docs.djangoproject.com/en/1.8/ref/settings/#databases
DATABASES = {
    #'default': {
    # Example for PostgreSQL (recommended for production):
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mailman',
        'USER': 'mailman',
        'PASSWORD': '<PASS>',
        'HOST': 'localhost',
    }
}
This shouldn't have anything to do with slow Postorius (full-text search is relevant only to archives, i.e., HyperKitty), but Whoosh is known to be slower than Xapian:
#
# Full-text search engine
#
HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, "fulltext_index"),
        # You can also use the Xapian engine, it's faster and more accurate,
        # but requires another library.
        # http://django-haystack.readthedocs.io/en/v2.4.1/installing_search_engines.html#xapian
        # Example configuration for Xapian:
        #'ENGINE': 'xapian_backend.XapianEngine'
    },
}
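For reference, switching to Xapian would look roughly like the fragment below. It is a sketch based on the commented-out example in the settings above and assumes the Xapian library and the xapian-haystack package are installed in the venv; the index directory name is made up, and `BASE_DIR` is given a stand-in value only so the fragment stands alone.

```python
import os

# Stand-in so this fragment is self-contained; the real settings.py
# already defines BASE_DIR.
BASE_DIR = os.path.abspath(".")

HAYSTACK_CONNECTIONS = {
    'default': {
        # Engine name taken from the commented-out example in settings.py.
        'ENGINE': 'xapian_backend.XapianEngine',
        # Hypothetical directory name, kept separate from the Whoosh index.
        'PATH': os.path.join(BASE_DIR, "xapian_index"),
    },
}
```

The index would have to be rebuilt after changing the engine (django-haystack provides a `rebuild_index` management command for that). None of this should affect the Postorius delays.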
Nope, the virtual machine is dedicated to Mailman3. Mailman3 as venv, uwsgi as Webservice, Apache as Webserver, Postfix as Mailserver and PostgreSQL 10 as Database, all on the same host
Oh, well, I guess the case against Mailman gets a little stronger....
At the moment I think it's a slow database. Because of the slow database, the Mailman API is slow and so on. We are now trying to solve the database part with postgresqltuner - maybe we will see more details then.
OK, keep us posted. I will say that it looks from the logs discussed above that PostgreSQL is waiting for Mailman, not vice versa (but that doesn't rule out a deadlock where both are waiting for the other). I hope my message trims things down enough that it rings a bell for somebody.
Steve