Monthly mailman-web job using a lot of memory
Hi,
Is it normal for the monthly job:
@monthly www-data [ -f /usr/bin/django-admin ] && flock -n /var/run/mailman3-web/cron.monthly /usr/share/mailman3-web/manage.py runjobs monthly
to use nearly 3½GiB of memory?
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
770203 www-data 20 0 4516756 3.4g 1.6g R 99.7 43.5 12:08.00 python3 /usr/share/mailman3-web/manage.py runjo+
I assume it's updating the full text indexes or something. There are around 250k messages in our archives.
In the job list I see:
appname - jobname - when - help
hyperkitty - update_and_clean_index - monthly - Update the full-text index and clean old entries
so I'm guessing it's that, but there is also:
hyperkitty - update_index - hourly - Update the full-text index
which hasn't been noticeably a problem, so I wonder why the "and clean old entries" part is so much more heavyweight, what exactly it does, and whether there's any way to make it lighter on resources?
Thanks, Andy
On Fri, Jul 01, 2022 at 08:40:23AM +0000, Andy Smith wrote:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
770203 www-data 20 0 4516756 3.4g 1.6g R 99.7 43.5 12:08.00 python3 /usr/share/mailman3-web/manage.py runjo+
This has been going for nearly 6 hours now at 100% CPU and I've no idea if it's ever going to finish. If I strace it, it is doing *something*, albeit very slowly. I can't see what it needs the 100% CPU for.
Output from 10 minutes or so of:
$ sudo strace --output='|tee strace.txt' -tp 770203
is at:
http://sprunge.us/RCGLmv
It's only 145k and doesn't seem to do much. An occasional open, seek, read, close…
Any good way to find out what it's spending its CPU time on? "top" says it is 100% "user" so I think it's something the python code is doing, not anything inside the kernel.
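For a process that is stuck at 100% user-space CPU in Python, a sampling profiler such as py-spy (e.g. `py-spy dump --pid 770203` or `py-spy top --pid 770203`) can show the live Python stack without restarting the job. For future runs you could also wrap the suspect code in cProfile. A minimal sketch of the cProfile approach; the hot loop here is purely hypothetical, just to show the shape of the output:

```python
import cProfile
import io
import pstats


def hypothetical_hot_loop(n):
    # Stand-in for whatever the indexer is doing; purely illustrative.
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
hypothetical_hot_loop(200_000)
profiler.disable()

# Report the functions sorted by cumulative CPU time; in a real run this
# would name the haystack/whoosh functions eating the time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Attaching py-spy to the already-running PID is the less invasive option here, since cProfile requires starting the job under the profiler.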
Thanks, Andy
On Jul 1, 2022, at 6:50 AM, Andy Smith <andy@strugglers.net> wrote:
On Fri, Jul 01, 2022 at 08:40:23AM +0000, Andy Smith wrote:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
770203 www-data 20 0 4516756 3.4g 1.6g R 99.7 43.5 12:08.00 python3 /usr/share/mailman3-web/manage.py runjo+
This has been going for nearly 6 hours now at 100% CPU and I've no idea if it's ever going to finish. If I strace it, it is doing *something*, albeit very slowly. I can't see what it needs the 100% CPU for.
Output from 10 minutes or so of:
$ sudo strace --output='|tee strace.txt' -tp 770203
is at:
It's only 145k and doesn't seem to do much. An occasional open, seek, read, close…
Any good way to find out what it's spending its CPU time on? "top" says it is 100% "user" so I think it's something the python code is doing, not anything inside the kernel.
What indexer are you using? If it’s whoosh, this is wholly expected.
Switch to xapian. You’ll thank me.
There are good implementation details in this list archive.
- Mark
mark@pdc-racing.net | 408-348-2878
On 7/1/22 1:40 AM, Andy Smith wrote:
In the job list I see:
appname - jobname - when - help
hyperkitty - update_and_clean_index - monthly - Update the full-text index and clean old entries
so I'm guessing it's that, but there is also:
hyperkitty - update_index - hourly - Update the full-text index
which hasn't been noticeably a problem, so I wonder why the "and clean old entries" part is so much more heavyweight, what exactly it does, and whether there's any way to make it lighter on resources?
The difference is that the update_and_clean_index job calls hyperkitty.search_indexes.update_index() with remove=True. See https://gitlab.com/mailman/hyperkitty/-/blob/master/hyperkitty/search_indexe.... The comments say "Setting remove to True is extremely slow, it needs to scan the entire index and database."
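Conceptually, the remove pass has to compare every document id in the search index against the database and issue a delete for each stale one, which is why it scales so badly. A simplified sketch of that behaviour (this is illustrative only, not the actual haystack/hyperkitty code or API):

```python
def clean_index(index_ids, db_ids):
    """Sketch of what remove=True does: any document id present in the
    search index but no longer in the database gets deleted.

    Both arguments are collections of document id strings; the real code
    walks the whole index and the whole database, which is the slow part.
    """
    stale = set(index_ids) - set(db_ids)
    for doc_id in sorted(stale):
        # The real backend issues one delete per stale document, so the
        # cost grows with both index size and database size.
        print(f"removing {doc_id}")
    return stale


# Toy data: one document exists in the index but not in the database.
removed = clean_index(
    index_ids=["hyperkitty.email.1", "hyperkitty.email.29351"],
    db_ids=["hyperkitty.email.1"],
)
```

The hourly update_index job skips this scan entirely, which is why it never shows up as a problem.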
I second Mark Dadgar's comment to replace whoosh with xapian. It will help, although it requires rebuilding the entire search index.
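For reference, switching haystack from Whoosh to Xapian is roughly a one-entry settings change plus a full rebuild. A sketch, assuming the xapian-haystack backend and the system Xapian bindings are installed; the index path below is an example, so check where your install keeps its settings and data:

```python
# mailman-web settings fragment (example only; adjust paths for your
# install). Assumes the xapian-haystack package provides the backend.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "xapian_backend.XapianEngine",
        # Example location; Whoosh installs often use a similar directory.
        "PATH": "/var/lib/mailman3/web/xapian_index",
    },
}
```

After changing the engine, the index must be rebuilt from scratch, e.g. with haystack's `manage.py rebuild_index` management command run as the web user.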
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan
Hi,
On Fri, Jul 01, 2022 at 12:57:41PM -0700, Mark Sapiro wrote:
I second Mark Dadgar's comment to replace whoosh with xapian. It will help, although it requires rebuilding the entire search index.
Thanks both. Yes I am currently using whoosh which is the Debian packages' default.
I decided to dramatically reduce the number of messages in our archives, as the vast majority of them belong to lists that had never actually been looked at under Mailman 2, and external archives of them already exist.
For ~6k messages it still took approx 16 hours to run that monthly job, and during that time spewed out thousands of these errors:
Failed to remove document 'hyperkitty.email.29351' from Whoosh:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/haystack/backends/whoosh_backend.py", line 309, in remove
    self.index.delete_by_query(q=self.parser.parse('%s:"%s"' % (ID, whoosh_id)))
  File "/usr/lib/python3/dist-packages/whoosh/index.py", line 365, in delete_by_query
    w = self.writer()
  File "/usr/lib/python3/dist-packages/whoosh/index.py", line 464, in writer
    return SegmentWriter(self, **kwargs)
  File "/usr/lib/python3/dist-packages/whoosh/writing.py", line 515, in __init__
    raise LockError
whoosh.index.LockError
…so I'm not even sure if it worked.
I will definitely have to look in to switching to Xapian.
Thanks, Andy
On 7/2/22 12:04 PM, Andy Smith wrote:
For ~6k messages it still took approx 16 hours to run that monthly job, and during that time spewed out thousands of these errors:
Failed to remove document 'hyperkitty.email.29351' from Whoosh:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/haystack/backends/whoosh_backend.py", line 309, in remove
    self.index.delete_by_query(q=self.parser.parse('%s:"%s"' % (ID, whoosh_id)))
  File "/usr/lib/python3/dist-packages/whoosh/index.py", line 365, in delete_by_query
    w = self.writer()
  File "/usr/lib/python3/dist-packages/whoosh/index.py", line 464, in writer
    return SegmentWriter(self, **kwargs)
  File "/usr/lib/python3/dist-packages/whoosh/writing.py", line 515, in __init__
    raise LockError
whoosh.index.LockError
…so I'm not even sure if it worked.
I think the warnings come from the hourly update_index job which can't obtain a lock on the index because the update_and_clean_index job already has it locked. Once the update_and_clean_index job finishes the update_index job should run and do the right thing.
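That contention is easy to demonstrate with a small experiment. A sketch using fcntl.flock, which behaves like the non-blocking writer lock here (two separate opens of the same file act like two processes as far as flock() is concerned; the file name is arbitrary):

```python
import fcntl
import tempfile

# Stand-ins for the two jobs: each gets its own open file description,
# so their flock() locks conflict just as two processes' would.
path = tempfile.NamedTemporaryFile(delete=False).name
monthly_job = open(path, "w")
hourly_job = open(path, "w")

# The long-running monthly job takes the lock first.
fcntl.flock(monthly_job, fcntl.LOCK_EX)

try:
    # The hourly job tries a non-blocking lock, like whoosh's
    # SegmentWriter, and fails immediately while the monthly job holds it.
    fcntl.flock(hourly_job, fcntl.LOCK_EX | fcntl.LOCK_NB)
    hourly_got_lock = True
except BlockingIOError:
    hourly_got_lock = False

print("hourly job got the lock while monthly held it:", hourly_got_lock)

# Once the monthly job releases the lock, the hourly job succeeds.
fcntl.flock(monthly_job, fcntl.LOCK_UN)
fcntl.flock(hourly_job, fcntl.LOCK_EX | fcntl.LOCK_NB)
hourly_got_lock_after = True
```

This matches the observed behaviour: the LockError spam stops on its own once the monthly job releases the index.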
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan
On Sat, Jul 02, 2022 at 07:04:09PM +0000, Andy Smith wrote:
For ~6k messages it still took approx 16 hours to run that monthly job
[…]
I will definitely have to look in to switching to Xapian.
I switched to Xapian and the monthly reindex job now takes ~23 seconds! This seems implausibly fast but searches work; I can't see any issues.
Thanks, Andy
On Jul 4, 2022, at 3:11 AM, Andy Smith <andy@strugglers.net> wrote:
On Sat, Jul 02, 2022 at 07:04:09PM +0000, Andy Smith wrote:
For ~6k messages it still took approx 16 hours to run that monthly job
[…]
I will definitely have to look in to switching to Xapian.
I switched to Xapian and the monthly reindex job now takes ~23 seconds! This seems implausibly fast but searches work; I can't see any issues.
Yup - it really is that fast. 👍
- Mark
mark@pdc-racing.net | 408-348-2878
Participants (3):
- Andy Smith
- Mark Dadgar
- Mark Sapiro