On 5 Feb 2024, at 15:45, Marco van Tol <mvantol@ripe.net> wrote:
On 29 Jan 2024, at 18:29, Mark Sapiro <mark@msapiro.net> wrote:
On 1/29/24 03:09, Marco van Tol wrote:
Right now I’m back at the original issue, as I see no other solution than to go back to Whoosh.
Thanks!
Marco van Tol
Paste:
Indexing 194620 emails [ERROR/MainProcess] Failed indexing 156001 - 157000 (retry 5/5): Term too long (> 245): XSUBJECThttp://www.google.com/url?q=%68%74%74%70%73%3a%2f%2f%68%64%72%65%64%74%75%62... (pid 32): Term too long (> 245): XSUBJECThttp://www.google.com/url?q=%68%74%74%70%73%3a%2f%2f%68%64%72%65%64%74%75%62...
I'm not sure why the patch you are using doesn't avoid this, but you could try the other patch at <https://github.com/alexsilva/xapian-haystack/commit/a53523d2d0d13929a0729d487e7af79b57ee17a6> instead. If that fails, you could always find the offending message in the database maybe with a query like
SELECT * FROM hyperkitty_email WHERE subject LIKE 'http://www.google.com/url?q=\%68\%74\%74%';
and modify or delete it - it's probably spam anyway.
I tried the other patch, which looked very promising until something that had come in over SMTP some day threw a spanner in the works:
[ERROR/MainProcess] Failed indexing 287001 - 288000 (retry 5/5): Term too long (> 245): XTEXTº@[åeèúp¢i*h{õimô;]>ò&žôyþiýã#dzç8"¹ë= ;æmyš€vqe.âés:æä>üzúõœ'âú·ž[]kzñ-µ€3æfdñù£8çô<bœkkñ/ãžæjïw¿òþp-ùšã7/'ûvksqé (pid 404): Term too long (> 245): XTEXTº@[ åeèúp¢i*h{õimô;]>ò&žôyþiýã#dzç8"¹ë=;æmyš€vqe.âés:æä>üzúõœ'âú·ž[]kzñ-µ€3æfdñù£8çô<bœkkñ/ãžæjïw¿òþp-ùšã7/' ûvksqé
I agree that this is very likely the result of some spam, but the main point is that something that comes in over SMTP shouldn’t cause manual work on the listserver side.
I’ll try it with a maximum length of 122, which should never explode into more than 244 bytes, unless I’m overlooking something.
This ended up working. :-)
Summary: Patch from https://github.com/alexsilva/xapian-haystack/commit/a53523d2d0d13929a0729d48...
But with TERM_LENGTH_LIMIT = 122
And str() instead of force_str()
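
For anyone reading this later, here is a minimal sketch of the length reasoning, not the actual patch. It assumes the 245 in the error message is a limit on the UTF-8 encoded size of a term; truncate_term and MAX_TERM_BYTES are hypothetical names that do not come from xapian-haystack. Capping at 122 characters keeps terms made of one- or two-byte characters at or below 244 bytes, but a character such as "€" takes three bytes in UTF-8, so the sketch also checks the byte length after the character cap.

```python
# Hypothetical helper for illustration only; not part of xapian-haystack.
TERM_LENGTH_LIMIT = 122   # character cap used in this thread
MAX_TERM_BYTES = 245      # assumed byte limit, taken from "Term too long (> 245)"


def truncate_term(term, char_limit=TERM_LENGTH_LIMIT, byte_limit=MAX_TERM_BYTES):
    """Return `term` cut so that its UTF-8 encoding fits within `byte_limit` bytes."""
    # First cap the number of characters, as in the summary above.
    text = str(term)[:char_limit]
    encoded = text.encode("utf-8")
    if len(encoded) <= byte_limit:
        return text
    # Characters of three or four bytes can still exceed the limit, so cut
    # on bytes and drop any partial character left at the end.
    return encoded[:byte_limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    for sample in ("a" * 500, "\u00e9" * 500, "\u20ac" * 500):
        cut = truncate_term(sample)
        print(len(cut), "characters,", len(cut.encode("utf-8")), "bytes")
```

With two-byte characters the character cap alone is enough (122 x 2 = 244); the byte check only matters for input with wider characters.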
Thanks so much for all your efforts!
Marco van Tol