We've now completed the indexing of our lists, some of which are quite large (100K+ messages per list). I wanted to share our findings from using Xapian to do this.
Firstly, using "-v 2", e.g. "mailman-web update_index_one_list -v 2 <email address>", causes the command to print out the progress being made in batches of 1,000 messages.
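For anyone who wants elapsed-time stamps next to those progress lines, here is a minimal sketch in Python. It only wraps the command quoted above; the helper names (build_index_command, run_with_timestamps) are my own and hypothetical, and it makes no assumption about the exact wording of the progress output, it just prefixes whatever the command prints.

```python
import subprocess
import time


def build_index_command(list_address, verbosity=2):
    """Build the argument list for the indexing command described above."""
    return ["mailman-web", "update_index_one_list", "-v", str(verbosity), list_address]


def run_with_timestamps(command):
    """Run a command and prefix each output line with the elapsed seconds."""
    start = time.monotonic()
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so no progress line is missed
        text=True,
    )
    for line in proc.stdout:
        elapsed = time.monotonic() - start
        print(f"[{elapsed:8.1f}s] {line.rstrip()}")
    return proc.wait()  # exit status of the command


# Example (assumes mailman-web is on the PATH; the address is a placeholder):
# run_with_timestamps(build_index_command("mylist@example.com"))
```

Comparing the timestamps of successive batch lines gives a rough idea of whether the indexing rate is steady or slowing down as the list grows.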
Secondly, Xapian is single-threaded. It only allows one writer at a time. Therefore, we focussed on the speed of the storage and, on the advice received from the Xapian list, memory. Since we were using AWS EC2 with EBS storage, I elected to move the EBS storage from gp2 to gp3 - which has a higher baseline I/O performance - and I selected an r6i.8xlarge as this guaranteed 10 Gbps of throughput to the EBS storage. It also delivered 32 vCPUs and 256 GB of RAM. It was possibly overkill but it did the job (see below) and we've now switched back to a t3a.xlarge.
Speed-wise, with the above configuration, the system indexed a list of nearly 85K messages in 46 minutes.
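As a rough guide for planning, that figure can be turned into a rate and used to estimate other lists. The sketch below assumes 85,000 messages exactly (the real count was slightly lower) and that indexing time scales linearly with message count, which I have not verified; the function names are my own.

```python
def messages_per_minute(messages, minutes):
    """Average indexing rate over a completed run."""
    return messages / minutes


def estimate_minutes(messages, rate_per_minute):
    """Estimated run time for a list, assuming the rate stays constant."""
    return messages / rate_per_minute


# Figures from the run described above (85K is rounded up slightly).
rate = messages_per_minute(85_000, 46)
print(f"{rate:.0f} messages per minute")  # about 1848

# Estimate for a 100K-message list at the same rate.
print(f"{estimate_minutes(100_000, rate):.0f} minutes")  # about 54

# Seconds between the 1,000-message progress lines printed with "-v 2".
print(f"{60 * 1000 / rate:.0f} seconds per batch")  # about 32
```

So on this hardware, a progress line roughly every half a minute would be in line with what we saw.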
I hope that is helpful information to anyone else who finds themselves migrating large Mailman 2 archives.
Regards
Philip
On Fri, 14 Jan 2022 at 17:33, Mark Sapiro <mark@msapiro.net> wrote:
On 1/14/22 12:38 AM, Philip Colmer wrote:
I have now switched to Xapian but, in a way, my original questions still stand. Is there a way of monitoring the progress of "update_index_one_list"? What can I do to the specification of the server to make that process go (much) faster?
HyperKitty does the indexing by calling the Haystack update_index command. See
https://django-haystack.readthedocs.io/en/master/management_commands.html#up...
This command has options such as --verbosity and --workers. I think verbosity is set from the option provided to update_index_one_list and if set to 2 will give some progress info.
--workers could increase parallelism, but HyperKitty unconditionally passes it as 0.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org
To unsubscribe send an email to mailman-users-leave@mailman3.org
https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/