I have now switched to Xapian but, in a way, my original questions still stand. Is there a way of monitoring the progress of "update_index_one_list"? What can I do to the specification of the server to make that process go (much) faster?
By the way, just in case anyone else is looking to use Xapian, installing and setting up Xapian for HyperKitty can be boiled down to:
- Download the git repo at https://github.com/notanumber/xapian-haystack
- Run the "install_xapian.sh" script (supplying the version number of Xapian)
- pip install xapian-haystack
- Edit the MM3 settings.py file to switch to using Xapian
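For the last step, the change amounts to pointing Haystack at the Xapian backend. This is a minimal sketch rather than a definitive configuration: the engine string is the one the xapian-haystack README gives, while the BASE_DIR value and the "xapian_index" directory name are assumptions for illustration and should be adjusted to match your own install.

```python
import os

# Assumption: a real settings.py normally defines BASE_DIR already;
# this hypothetical path is only here so the fragment stands alone.
BASE_DIR = "/opt/mailman/web"

HAYSTACK_CONNECTIONS = {
    "default": {
        # Backend class provided by the xapian-haystack package.
        "ENGINE": "xapian_backend.XapianEngine",
        # Directory for the Xapian index; the name is an assumption.
        "PATH": os.path.join(BASE_DIR, "xapian_index"),
    },
}
```

The index has to be rebuilt after switching backends, since the existing Whoosh index is not reused.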
Regards
Philip
On Fri, 14 Jan 2022 at 07:57, Philip Colmer <philip.colmer@linaro.org> wrote:
On Fri, 14 Jan 2022 at 07:50, Mark Dadgar <mark@pdc-racing.net> wrote:
On Jan 13, 2022, at 11:46 PM, Philip Colmer <philip.colmer@linaro.org> wrote:
Yesterday, I ran "mailman-web update_index_one_list" against a mailing list. The command output "Indexing 328077 emails" and that is the last I've heard from it.
The process is still running but, looking in ~mailman/web/logs, I can't see anything happening related to that list.
Is there any way, without stopping the process, to determine how far it has got or what it is actually doing?
For future indexing, what makes the most difference to the speed? Would it benefit from multiple cores if I resize the AWS instance to a larger type? What is the bottleneck for this process?
Which indexer are you using? Whoosh is interminably slow. Think “days to generate a list index” slow.
Yeah, we're using Whoosh. Good to see it lives up to its name :)
I recommend Xapian. It is ridiculously fast.
Thank you. I had been put off using Xapian because the documentation, from an MM3 perspective, seemed sparse and confusing, but I'll stick with it to try and get it working if Whoosh is the root problem here.
Philip