On 05/24/2018 06:21 PM, Darren Smith wrote:
Hello,
This morning our Mailman 3/Hyperkitty/Postorius installation has become unresponsive. When looking at the machine, postgres is pegging all 4 CPUs on the machine.
We did some digging in postgres, and here is the query that is causing problems:
"SELECT hyperkitty_email.date FROM hyperkitty_email WHERE hyperkitty_email.mailinglist_id = 'fooey@rootsweb.com' ORDER BY hyperkitty_email.date ASC LIMIT 1;"
We have found that it doesn't matter what mailinglist_id is being used.
We have also found that this might be an issue with the LIMIT 1 - our investigation is showing that using LIMIT 1 in postgres makes things execute very differently than if you use a limit 2 or more, and that this is probably what is going on.
Basically, every time this gets called, it pegs an entire CPU for 20-30 seconds. It appears to be called a LOT.
So our users are getting 504 gateway timeouts right now.
What has changed:
We have been in the process of importing our 40 million archived emails from our old mailman 2 archives.
Questions:
- We have tried doing a grep in the codebase - mailman3, hyperkitty, postorius, django - basically anything that might be calling this code, but we can't find it. Does anyone know what generates this query? It looks like the query is to get the date from the last email for a mailinglist. AFAIK Hyperkitty checks this when new emails get it (maybe even during import)
It could also relate to the "active" state of mailinglists...