After a little digging, it turns out that the mailman3-web Debian GNU/Linux package I'm using is configured to use Whoosh as its Haystack backend, and Whoosh is not fit for volumes larger than a few hundred megabytes. I should switch to another backend from the list of supported backends: https://django-haystack.readthedocs.io/en/master/backend_support.html
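A minimal sketch of what the switch might look like in the Django settings, assuming the Xapian backend provided by the separate xapian-haystack package (`xapian_backend.XapianEngine`) is chosen. Where the package keeps its settings file is not covered here, and the index directory below is a hypothetical new location kept apart from the existing Whoosh index:

```python
# Hypothetical excerpt of the Django settings used by mailman3-web.
# Assumes the xapian-haystack package (which provides xapian_backend)
# and the Xapian Python bindings are installed.
HAYSTACK_CONNECTIONS = {
    "default": {
        # Previously: "haystack.backends.whoosh_backend.WhooshEngine"
        "ENGINE": "xapian_backend.XapianEngine",
        # Hypothetical new directory, separate from the Whoosh index.
        "PATH": "/var/lib/mailman3/web/xapian_index",
    },
}
```

Since the new backend starts with an empty index, the archives would then have to be reindexed, for example with Haystack's `rebuild_index` management command.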
On 04/10/2020 13:01, Loïc Dachary wrote:
Hi,
I'm in the process of importing a large number of mbox files (~30,000) for a few hundred mailing lists. So far, around 300,000 mails (~12GB) have been imported from two mailing lists. The hyperkitty_import process took ~8 hours and created a 10GB MySQL database. Running update_index_one_list for the two lists took a total of ~72 hours and created ~2GB of index in /var/lib/mailman3/web/fulltext_index, which is consistent with there being a lot of attachments (probably 10GB of the 12GB).
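To put those figures side by side, here is a back-of-the-envelope calculation using only the approximate numbers quoted in the previous paragraph, assuming 1 GB = 10^9 bytes and treating every "~" value as exact:

```python
# Back-of-the-envelope rates from the approximate figures quoted above.
# Assumes 1 GB = 10**9 bytes; all inputs are approximations ("~").
MAILS = 300_000
MBOX_BYTES = 12 * 10**9      # ~12GB of imported mbox data
IMPORT_HOURS = 8             # hyperkitty_import
INDEX_HOURS = 72             # update_index_one_list, both lists
INDEX_BYTES = 2 * 10**9      # ~2GB fulltext_index


def per_second(count: int, hours: float) -> float:
    """Average number of items processed per second over `hours`."""
    return count / (hours * 3600)


import_rate = per_second(MAILS, IMPORT_HOURS)   # mails/s during import
index_rate = per_second(MAILS, INDEX_HOURS)     # mails/s during indexing
avg_mail_kb = MBOX_BYTES / MAILS / 1000         # average mail size in kB
index_ratio = INDEX_BYTES / MBOX_BYTES          # index size relative to data

print(f"import:   {import_rate:.1f} mails/s")
print(f"indexing: {index_rate:.2f} mails/s ({INDEX_HOURS / IMPORT_HOURS:.0f}x slower)")
print(f"average mail: {avg_mail_kb:.0f} kB, index/data: {index_ratio:.2f}")
```

This prints roughly 10.4 mails/s for the import versus 1.16 mails/s for indexing (9x slower), an average of 40 kB per mail, and an index about 0.17 times the size of the imported data.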
When I search for a single word via the full-text search web interface, it takes around 30 seconds to complete (even when the same search is repeated), and I can see the process grow to use up to 6GB of resident memory. It is running on a recent physical machine with decent I/O and an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz.
Is this consistent with other people's experience?
Cheers
-- Loïc Dachary, Artisan Logiciel Libre