Urgent help needed...

May 24, 2018

      Hello,
This morning our Mailman 3/Hyperkitty/Postorius installation has become
unresponsive. When looking at the machine, postgres is pegging all 4 CPUs
on the machine.
We did some digging in postgres, and here is the query that is causing
problems:
"SELECT hyperkitty_email.date FROM hyperkitty_email WHERE
hyperkitty_email.mailinglist_id = 'fooey@rootsweb.com' ORDER BY
hyperkitty_email.date ASC LIMIT 1;"
We have found that it doesn't matter what mailinglist_id is being used.
We have also found that this might be an issue with the LIMIT 1 - our
investigation is showing that using LIMIT 1 in postgres makes things
execute very differently than if you use a limit 2 or more, and that this
is probably what is going on.
Basically, every time this gets called, it pegs an entire CPU for 20-30
seconds. It appears to be called a LOT.
So our users are getting 504 gateway timeouts right now.
What has changed:
We have been in the process of importing our 40 million archived emails
from our old mailman 2 archives.
Questions:

We have tried doing a grep in the codebase - mailman3, hyperkitty,
postorius, django - basically anything that might be calling this code, but
we can't find it. Does anyone know what generates this query?

We are positing that it might be our archive imports that are causing
this problem. Does anyone know of any sort of limit/threshold that might
cause this to go haywire?

Any help you can give, any direction you can point us would be useful.
Right now our lists are just down for our users.
Thank you,
Darren

Darren Smith

Simon Hanna

Darren Smith

tags

participants (2)