On May 5, 2020, at 2:27 PM, Mark Sapiro <mark@msapiro.net> wrote:
On 5/5/20 8:55 AM, Mark Dadgar wrote:
ERROR/MainProcess] Failed indexing 1 - 1000 (retry 5/5): Term too long (> 245): XSUBJECT95251413
It isn't clear from looking at it, and I never would have noticed had I not happened to look at the raw email with less, but the XSUBJECT95251413 in the above is the entire long word. This is how less displays it:
XSUBJECT9<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>5<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>2<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>5<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>1<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>4<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>1<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>3<U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B><U+200B>
Each of those <U+200B> represents a Unicode zero-width space.
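To illustrate why a term that looks 16 characters long can exceed the limit, here is a minimal Python sketch. It assumes the limit is counted in UTF-8 bytes (U+200B encodes to three bytes); the helper name and the sample term are hypothetical and are not taken from the indexer's code.

```python
ZWSP = "\u200b"  # Unicode zero-width space, invisible when rendered


def strip_zwsp(term):
    """Return (term without zero-width spaces, raw byte length, cleaned byte length)."""
    stripped = term.replace(ZWSP, "")
    return stripped, len(term.encode("utf-8")), len(stripped.encode("utf-8"))


# Hypothetical sample: visible characters padded with zero-width spaces
term = "XSUBJECT9" + ZWSP * 7 + "5" + ZWSP * 7 + "2"
stripped, raw_len, clean_len = strip_zwsp(term)

print(repr(term))   # repr() exposes the hidden \u200b characters
print(stripped, raw_len, clean_len)
```

Printing with repr() is a quick way to see the hidden characters without reaching for less.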
That’s gnarly.
BTW, the patch did not fix it. I haven’t had a chance to look more deeply into it yet.
- Mark
mark@pdc-racing.net | 408-348-2878