Cron Error: <mailman@gw> /opt/mailman/mm/bin/django-admin runjobs monthly
Is this a race condition?
<CUT> ERROR OCCURED IN JOB: update_and_clean_index (APP: hyperkitty) Traceback (most recent call last): File "/opt/mailman/mm/venv/lib/python3.9/site-packages/django_extensions/management/commands/runjobs.py", line 40, in runjobs job().execute() File "/opt/mailman/mm/venv/lib/python3.9/site-packages/hyperkitty/jobs/update_and_clean_index.py", line 37, in execute run_with_lock(update_index, remove=True) File "/opt/mailman/mm/venv/lib/python3.9/site-packages/hyperkitty/lib/utils.py", line 181, in run_with_lock log.exception("Failed to update the fulltext index: %s", e) File "/opt/mailman/mm/venv/lib/python3.9/site-packages/flufl/lock/_lockfile.py", line 470, in __exit__ self.unlock() File "/opt/mailman/mm/venv/lib/python3.9/site-packages/flufl/lock/_lockfile.py", line 420, in unlock raise NotLockedError('Already unlocked') flufl.lock._lockfile.NotLockedError: Already unlocked <CUT>
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On 1/31/23 23:33, Odhiambo Washington wrote:
Is this a race condition?
<CUT> ERROR OCCURED IN JOB: update_and_clean_index (APP: hyperkitty) Traceback (most recent call last): File "/opt/mailman/mm/venv/lib/python3.9/site-packages/django_extensions/management/commands/runjobs.py", line 40, in runjobs job().execute() File "/opt/mailman/mm/venv/lib/python3.9/site-packages/hyperkitty/jobs/update_and_clean_index.py", line 37, in execute run_with_lock(update_index, remove=True) File "/opt/mailman/mm/venv/lib/python3.9/site-packages/hyperkitty/lib/utils.py", line 181, in run_with_lock log.exception("Failed to update the fulltext index: %s", e) File "/opt/mailman/mm/venv/lib/python3.9/site-packages/flufl/lock/_lockfile.py", line 470, in __exit__ self.unlock() File "/opt/mailman/mm/venv/lib/python3.9/site-packages/flufl/lock/_lockfile.py", line 420, in unlock raise NotLockedError('Already unlocked') flufl.lock._lockfile.NotLockedError: Already unlocked <CUT>
Yes, it is a race between the monthly hyperkitty update_and_clean_index job and the hourly hyperkitty update_index job
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Wed, Feb 1, 2023 at 8:31 PM Mark Sapiro <mark@msapiro.net> wrote:
On 1/31/23 23:33, Odhiambo Washington wrote:
Is this a race condition?
<CUT> ERROR OCCURED IN JOB: update_and_clean_index (APP: hyperkitty) Traceback (most recent call last): File
"/opt/mailman/mm/venv/lib/python3.9/site-packages/django_extensions/management/commands/runjobs.py",
line 40, in runjobs job().execute() File
"/opt/mailman/mm/venv/lib/python3.9/site-packages/hyperkitty/jobs/update_and_clean_index.py",
line 37, in execute run_with_lock(update_index, remove=True) File
"/opt/mailman/mm/venv/lib/python3.9/site-packages/hyperkitty/lib/utils.py",
line 181, in run_with_lock log.exception("Failed to update the fulltext index: %s", e) File
"/opt/mailman/mm/venv/lib/python3.9/site-packages/flufl/lock/_lockfile.py",
line 470, in __exit__ self.unlock() File
"/opt/mailman/mm/venv/lib/python3.9/site-packages/flufl/lock/_lockfile.py",
line 420, in unlock raise NotLockedError('Already unlocked') flufl.lock._lockfile.NotLockedError: Already unlocked <CUT>
Yes, it is a race between the monthly hyperkitty update_and_clean_index job and the hourly hyperkitty update_index job
Okay. This is a bit off from my understanding.
What is the cause and how do I mitigate it?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On 2/2/23 04:26, Odhiambo Washington wrote:
Okay. This is a bit off from my understanding.
What is the cause and how do I mitigate it?
The cause is the monthly hyperkitty update_and_clean_index job runs for a long time, i.e. hours, because it is rebuilding the entire search index so during the time it's running multiple hourly hyperkitty update_index jobs are started and they are competing for the same lock.
See https://gitlab.com/mailman/hyperkitty/-/blob/master/example_project/settings... for mitigation.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Fri, Feb 3, 2023 at 3:23 AM Mark Sapiro <mark@msapiro.net> wrote:
On 2/2/23 04:26, Odhiambo Washington wrote:
Okay. This is a bit off from my understanding.
What is the cause and how do I mitigate it?
The cause is the monthly hyperkitty update_and_clean_index job runs for a long time, i.e. hours, because it is rebuilding the entire search index so during the time it's running multiple hourly hyperkitty update_index jobs are started and they are competing for the same lock.
See
https://gitlab.com/mailman/hyperkitty/-/blob/master/example_project/settings... for mitigation.
What is the danger of replacing my settings.py with this one, seeing as it differs in a number of ways with mine?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On 2/2/23 21:25, Odhiambo Washington wrote:
On Fri, Feb 3, 2023 at 3:23 AM Mark Sapiro <mark@msapiro.net> wrote:
See
https://gitlab.com/mailman/hyperkitty/-/blob/master/example_project/settings... for mitigation.
What is the danger of replacing my settings.py with this one, seeing as it differs in a number of ways with mine?
If your's works, don't do it. The danger is the differences might be significant and break things. Also, that one is for HyperKitty only and doesn't include Postorius settings.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Fri, Feb 3, 2023 at 6:47 PM Mark Sapiro <mark@msapiro.net> wrote:
On 2/2/23 21:25, Odhiambo Washington wrote:
On Fri, Feb 3, 2023 at 3:23 AM Mark Sapiro <mark@msapiro.net> wrote:
See
https://gitlab.com/mailman/hyperkitty/-/blob/master/example_project/settings...
for mitigation.
What is the danger of replacing my settings.py with this one, seeing as it differs in a number of ways with mine?
If your's works, don't do it. The danger is the differences might be significant and break things. Also, that one is for HyperKitty only and doesn't include Postorius settings.
So in my case, I should at least increase the value of HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to, say, 1800?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On 2/5/23 01:13, Odhiambo Washington wrote:
So in my case, I should at least increase the value of HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to, say, 1800?
It probably isn't even set in your case and defaults to 900 if it isn't set. Yes, you can set it to a larger value. If your archive search engine is woosh and your archive is even moderately large, 1800 probably isn't nearly enough. There was no indication of timing in your OP, but you should determine when that cron job started and when the error was reported. The difference is the absolute minimum you should set for HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE, but you should add some margin to that.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Sun, Feb 5, 2023 at 8:32 PM Mark Sapiro <mark@msapiro.net> wrote:
On 2/5/23 01:13, Odhiambo Washington wrote:
So in my case, I should at least increase the value of HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to, say, 1800?
It probably isn't even set in your case and defaults to 900 if it isn't set.
I have never seen this config option in any of the settings files and so believe that no one has it set - except the DEVs, maybe.
Yes, you can set it to a larger value. If your archive search engine is woosh and your archive is even moderately large, 1800 probably isn't nearly enough.
Yes, my archive search engine is whoosh. And my archives size is 2.5GB+ For starters I will make it 3600 and observe next time.
There was no indication of timing in your OP, but you should determine when that cron job started and when the error was reported. The difference is the absolute minimum you should set for HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE, but you should add some margin to that.
Sadly I deleted the email. I will have to determine those aspects next time. I need to make a request for a small improvement on the documentation at: https://docs.mailman3.org/en/latest/install/virtualenv.html#cron-jobs-for-ma... The problem I am facing (the subject of this thread is that of a race between the "monthly hyperkitty update_and_clean_index" job and the hourly "hyperkitty update_index job" Would it be possible to add such details to the section, so that it is clear what the corn jobs do and perhaps what can lead to such race conditions (if that's not too much to ask)?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On Sun, Feb 5, 2023 at 8:32 PM Mark Sapiro <mark@msapiro.net> wrote:
On 2/5/23 01:13, Odhiambo Washington wrote:
So in my case, I should at least increase the value of HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to, say, 1800?
In the last episode, I did set:
HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE = 3600
However, the same error has reared its head again.
It probably isn't even set in your case and defaults to 900 if it isn't set. Yes, you can set it to a larger value. If your archive search engine is woosh and your archive is even moderately large, 1800 probably isn't nearly enough. There was no indication of timing in your OP, but you should determine when that cron job started and when the error was reported.
Now that I have the error message - mail was generated at 0600hrs - what should I grep for in mailmanweb.log to ascertain when the cron job started?
The difference is the absolute minimum you should set for HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE, but you should add some margin to that.
..so as to get this difference, that I need to set as the _ bsolute minimum_, with some margin?
Thanks in advance.
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On 2/28/23 21:33, Odhiambo Washington wrote:
Now that I have the error message - mail was generated at 0600hrs - what should I grep for in mailmanweb.log to ascertain when the cron job started?
You could look in /etc/crontab to see what time the cron.monthly jobs are run or you could look in syslog for 'cron\.monthly'.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Wed, Mar 1, 2023 at 6:13 PM Mark Sapiro <mark@msapiro.net> wrote:
On 2/28/23 21:33, Odhiambo Washington wrote:
Now that I have the error message - mail was generated at 0600hrs - what should I grep for in mailmanweb.log to ascertain when the cron job
started?
You could look in /etc/crontab to see what time the cron.monthly jobs are run or you could look in syslog for 'cron\.monthly'.
In FreeBSD we have the /var/log/cron. Mar 1 00:00:00 gw /usr/sbin/cron[66206]: (mailman) CMD (/opt/mailman/mm/bin/django-admin runjobs monthly)
So the job was started at 00:00, and the email was generated at 06:00: 2023-03-01 06:00:52 1pXCi4-000N2h-2D <= mailman@gw.kictanet.or.ke U=mailman P=local S=1466 T="Cron <mailman@gw> /opt/mailman/mm/bin/django-admin runjobs monthly" from <mailman@gw.kictanet.or.ke> for mailman
That is 6hrs and the lock was still there. Given that my search engine is whoosh and my archive is heading to 3GB, I should probably try setting the value to 10hrs for a start?
PS: Is there any documentation (for laymen) on how to set up an alternative search engine for MM3?
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
On 3/2/23 02:46, Odhiambo Washington wrote:
PS: Is there any documentation (for laymen) on how to set up an alternative search engine for MM3?
I recommend xapian. See http://django-haystack.readthedocs.io/en/v2.4.1/installing_search_engines.ht... and https://docs.mailman3.org/en/latest/config-web.html?highlight=xapian#enable-...
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On Thu, Mar 2, 2023 at 8:50 PM Mark Sapiro <mark@msapiro.net> wrote:
On 3/2/23 02:46, Odhiambo Washington wrote:
PS: Is there any documentation (for laymen) on how to set up an
alternative
search engine for MM3?
I recommend xapian. See
http://django-haystack.readthedocs.io/en/v2.4.1/installing_search_engines.ht... and
https://docs.mailman3.org/en/latest/config-web.html?highlight=xapian#enable-...
I have migrated from Whoosh to Xapian.
I re-indexed manually and:
(venv) (venv) [mailman@gw /opt/mailman/mm]$ time /opt/mailman/mm/bin/django-admin update_index Indexing 37747 emails
real 43m43.417s user 11m40.643s sys 0m25.808s (venv) (venv) [mailman@gw /opt/mailman/mm]$ ls -alh xapian_index/ total 4676760 drwxr-xr-x 2 mailman mailman 512B Mar 4 10:00 . drwxr-xr-x 10 mailman mailman 1.0K Mar 4 08:58 .. -rw-r--r-- 1 mailman mailman 92M Mar 4 10:00 docdata.glass -rw-r--r-- 1 mailman mailman 0B Mar 4 10:00 flintlock -rw-r--r-- 1 mailman mailman 146B Mar 4 10:00 iamglass -rw-r--r-- 1 mailman mailman 1.3G Mar 4 09:41 position.glass -rw-r--r-- 1 mailman mailman 619M Mar 4 10:00 postlist.glass -rw-r--r-- 1 mailman mailman 256M Mar 4 10:00 termlist.glass
(venv) (venv) [mailman@gw /opt/mailman/mm]$ du -hc -d1 fulltext_index/ # Disk space taken by Whoosh indexes 1.9G fulltext_index/ 1.9G total (venv) (venv) [mailman@gw /opt/mailman/mm]$ du -hc -d1 xapian_index/ # Disk space taken by Xapian indexes 2.2G xapian_index/ 2.2G total root@gw:/opt/mailman/mm #
Am I right to say that this should allow me to set HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to 3600?
Thank you.
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
Odhiambo Washington writes:
I have migrated from Whoosh to Xapian.
I re-indexed manually and:
(venv) (venv) [mailman@gw /opt/mailman/mm]$ time /opt/mailman/mm/bin/django-admin update_index Indexing 37747 emails
real 43m43.417s user 11m40.643s sys 0m25.808s [...] Am I right to say that this should allow me to set HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to 3600?
Yes, although I'd be conservative, as well as allowing for the indexes to grow over time, and set it to 5400 (ie, ~2x the elapsed time).
Steve
On Sat, Mar 4, 2023 at 7:00 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Odhiambo Washington writes:
I have migrated from Whoosh to Xapian.
I re-indexed manually and:
(venv) (venv) [mailman@gw /opt/mailman/mm]$ time /opt/mailman/mm/bin/django-admin update_index Indexing 37747 emails
real 43m43.417s user 11m40.643s sys 0m25.808s [...] Am I right to say that this should allow me to set HYPERKITTY_JOBS_UPDATE_INDEX_LOCK_LIFE to 3600?
Yes, although I'd be conservative, as well as allowing for the indexes to grow over time, and set it to 5400 (ie, ~2x the elapsed time).
Steve
Thank you.
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
participants (3)
-
Mark Sapiro
-
Odhiambo Washington
-
Stephen J. Turnbull