High Server Load problem
For about a week now, I have been seeing very high server loads, and it seems to have something to do with the following process hanging:
uwsgi --ini /opt/mailman-web/uwsgi.ini
I am using the Docker version of Mailman. When this process hangs, the server load goes sky high and both Postorius and Hyperkitty stop responding. I don't have very many sites on this server and I am running about 30 lists. I really need some assistance here, as this needs to be fixed. I see nothing in the logs that explains why this process is hanging or why the server load goes through the roof.
Running docker-compose restart mailman-web kills the hung process and the server load goes down. Postorius and Hyperkitty start responding again. This seems to happen once a day.
Brian
So the process caused another server load issue tonight. I have some more log information:
Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
  File "/usr/lib/python3.6/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
  File "/usr/lib/python3.6/site-packages/django/core/management/__init__.py", line 357, in execute
  File "/usr/lib/python3.6/site-packages/django/__init__.py", line 16, in setup
    from django.urls import set_script_prefix
  File "/usr/lib/python3.6/site-packages/django/urls/__init__.py", line 1, in <module>
    from .base import (
  File "/usr/lib/python3.6/site-packages/django/urls/base.py", line 8, in <module>
    from .exceptions import NoReverseMatch, Resolver404
  File "/usr/lib/python3.6/site-packages/django/urls/exceptions.py", line 1, in <module>
    from django.http import Http404
  File "/usr/lib/python3.6/site-packages/django/http/__init__.py", line 1, in <module>
    from django.http.cookie import SimpleCookie, parse_cookie
  File "/usr/lib/python3.6/site-packages/django/http/cookie.py", line 1, in <module>
    from http import cookies
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 771, in get_code
  File "<frozen importlib._bootstrap_external>", line 482, in _validate_bytecode_header
MemoryError
DAMN ! worker 1 (pid: 20) died, killed by signal 9 :( trying respawn ...
Then a little while later I see a whole bunch of the following lines:
Sun Oct 27 00:13:02 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:03 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:04 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:05 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:06 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:07 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:08 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:09 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:10 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:11 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Sun Oct 27 00:13:12 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
Would someone help me troubleshoot this? Thank you. Brian
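When reading a long uWSGI log like the one above, it can help to tally the two kinds of events it shows: workers killed by a signal, and "listen queue full" warnings. The following is only a minimal sketch; the function name summarize_uwsgi_log is hypothetical, and the regular expressions are written against the exact line shapes quoted in this message, so other uWSGI versions may word these lines differently. Signal 9 is SIGKILL, which together with the MemoryError is consistent with (but does not prove) the process running out of memory.

```python
import re
from collections import Counter

# Matches the "listen queue ... full" warnings exactly as quoted in this thread.
QUEUE_FULL = re.compile(
    r'uWSGI listen queue of socket "(?P<socket>[^"]+)" \(fd: \d+\) full !!! '
    r'\((?P<current>\d+)/(?P<limit>\d+)\)'
)

# Matches the worker-death line exactly as quoted in this thread.
WORKER_KILLED = re.compile(
    r'worker (?P<worker>\d+) \(pid: (?P<pid>\d+)\) died, '
    r'killed by signal (?P<signal>\d+)'
)


def summarize_uwsgi_log(text):
    """Count queue-full warnings per socket and worker deaths per signal.

    Hypothetical helper: it scans the whole text, so it works whether the
    log entries are on separate lines or run together.
    """
    queue_full = Counter(m.group("socket") for m in QUEUE_FULL.finditer(text))
    killed = Counter(int(m.group("signal")) for m in WORKER_KILLED.finditer(text))
    return {"queue_full": dict(queue_full), "killed_by_signal": dict(killed)}


if __name__ == "__main__":
    sample = (
        "DAMN ! worker 1 (pid: 20) died, killed by signal 9 :( trying respawn ...\n"
        'Sun Oct 27 00:13:02 2019 - *** uWSGI listen queue of socket '
        '"0.0.0.0:8080" (fd: 8) full !!! (101/100) ***\n'
        'Sun Oct 27 00:13:03 2019 - *** uWSGI listen queue of socket '
        '"0.0.0.0:8080" (fd: 8) full !!! (101/100) ***\n'
    )
    print(summarize_uwsgi_log(sample))
```

A steadily growing queue-full count suggests requests are arriving faster than the workers can answer them, which points at either stuck workers or unusually heavy traffic.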
Brian,
Sorry for the delay, but I don't know what's going on here. Mark and Abhilash must both be traveling.
Brian Carpenter writes:
  File "/usr/lib/python3.6/site-packages/django/http/cookie.py", line 1, in <module>
    from http import cookies
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 771, in get_code
  File "<frozen importlib._bootstrap_external>", line 482, in _validate_bytecode_header
MemoryError
DAMN ! worker 1 (pid: 20) died, killed by signal 9 :( trying respawn ...
It seems to be "Django all the way down" to here. There's nothing maintained by Mailman there, and I doubt our Django expertise extends to much of its internals.
Then a little while later I see a whole bunch of the following lines:
Sun Oct 27 00:13:02 2019 - *** uWSGI listen queue of socket "0.0.0.0:8080" (fd: 8) full !!! (101/100) ***
And again, uwsgi is a different project (and in Debian at least it depends on an ancient, unmaintained GNU package, which in turn depends on an EOL'ed version of GNU Guile).
Would someone help me troubleshoot this? Thank you.
Sorry, but I have no idea what the actual trouble is. It doesn't seem to be in our code; it seems to be in the Django or uwsgi initialization, before Mailman code gets started. You might try the Django lists first, since the main problem seems to be Django exhausting memory.
The uwsgi problem, I believe, may be a separate known bug. I seem to recall a report besides yours of uwsgi falling down this way and being unable to get up without a system reboot.
I use Apache + mod_wsgi; Abhilash recommends gunicorn as the WSGI host. I'm not sure what Mark uses. Is it possible to change the WSGI host? If not, can you try a different version of uwsgi?
Steve
Thanks Steve. I am using Abhilash's docker setup for this Mailman 3 server. I think I found the problem, however: dang abusive bots! Semrush and Ahrefs were the two culprits. I finally figured out that this had to be a webserver issue, so I checked the nginx access logs and those two bots were everywhere. Nginx makes it very easy to block abusive bots, so I put an end to their free access. I haven't had a repeat of the issue since.
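For anyone wanting to repeat the check described above, here is a rough sketch of counting requests per crawler in an nginx access log. It assumes nginx's default "combined" log format, where the user agent is the last double-quoted field on each line; the function names, the needle strings "semrush" and "ahrefs", and the sample log lines (including the addresses and user-agent strings) are made up for illustration and are not taken from the actual server.

```python
import re
from collections import Counter

# In nginx's default "combined" format the user agent is the last quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user-agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = LAST_QUOTED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_bot_hits(lines, needles=("semrush", "ahrefs")):
    """Count requests whose user agent contains one of the needle substrings.

    The needles are assumptions: adjust them to whatever shows up in your log.
    """
    hits = Counter()
    for agent, n in count_user_agents(lines).items():
        lowered = agent.lower()
        for needle in needles:
            if needle in lowered:
                hits[needle] += n
    return hits


if __name__ == "__main__":
    # Illustrative lines only; real entries come from the nginx access log.
    sample = [
        '203.0.113.5 - - [27/Oct/2019:00:12:58 +0000] "GET /hyperkitty/ HTTP/1.1" '
        '200 5120 "-" "Mozilla/5.0 (compatible; SemrushBot/6~bl; +http://www.semrush.com/bot.html)"',
        '203.0.113.9 - - [27/Oct/2019:00:12:59 +0000] "GET /postorius/lists/ HTTP/1.1" '
        '200 2048 "-" "Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)"',
        '198.51.100.7 - - [27/Oct/2019:00:13:00 +0000] "GET / HTTP/1.1" '
        '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    for bot, n in count_bot_hits(sample).most_common():
        print(bot, n)
```

To run it against a real log, pass an open file object instead of the sample list, since the functions only need an iterable of lines.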
Brian Carpenter writes:
Thanks Steve. I am using Abhilash's docker setup for this Mailman 3 server. I think I found the problem, however: dang abusive bots!
Good.
Semrush and Ahrefs were the two culprits. I finally figured out that this had to be a webserver issue, so I checked the nginx access logs and those two bots were everywhere. Nginx makes it very easy to block abusive bots, so I put an end to their free access.
Useful to know!
I haven't had a repeat of the issue since.
Good!
I hope we'll have more information about uwsgi in the future, as that seems to be Debian's default configuration, too.
Steve
participants (2)
- Brian Carpenter
- Stephen J. Turnbull