hyperkitty failed to create a thread

What could be the reason for HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one...

This looks exactly like the problem I have been experiencing, see <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/C...>

...

Any ideas how to improve HyperKitty's ability to detect threaded messages?

Sadly, no. Let's hope the problem gets resolved.

Marvin

-- Blog: https://mg.guelker.eu PGP/GPG ID: F1D8799FBCC8BC4F

Abhilash Raj

6:01 p.m.

On Mon, Feb 11, 2019, at 9:55 AM, Marvin Gülker wrote:

...

Hi,

Am 11. Februar 2019 um 19:47 Uhr +0200 schrieb Danil Smirnov:

...
What could be a reason of HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one...

This looks exactly like the problem I have been experiencing, see <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/C...>

...
Any ideas how to improve HyperKitty ability to detect threaded messages?

Sadly, no. Let's hope the problem gets resolved.

Please create a bug report at https://gitlab.com/mailman/hyperkitty/issues with the affected messages.

It would help us recreate the problem if you could attach the raw messages to the issue. I am assuming the messages aren't sensitive since the archives are open. Otherwise, feel free to omit any sensitive information.

I haven't looked at the threading code in HK in a long time, but my impression is that it looks at the 'In-Reply-To' header to figure out if the current email is a response to a previous email that it received.

In some cases, the order of the incoming messages can cause this to break, like when the reply comes in first and the original message comes in later. I am not sure exactly what is happening here, but looking at the raw messages should help us reproduce and possibly fix the issue.
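To illustrate the ordering problem described above, here is a minimal sketch of In-Reply-To threading that does not depend on arrival order. It is not HyperKitty's actual code; the function name `build_threads` and the `(message_id, in_reply_to)` input shape are assumptions made for the example. The idea is to record every parent link first and only resolve thread roots afterwards, so a reply that arrives before its parent still ends up in the same thread.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads using only In-Reply-To.

    `messages` is an iterable of (message_id, in_reply_to) pairs in
    arrival order; in_reply_to is None for a thread starter.
    (Hypothetical input shape, chosen for this sketch.)
    """
    # First pass: remember each message's parent, whatever the arrival order.
    parent = {}
    for message_id, in_reply_to in messages:
        parent[message_id] = in_reply_to

    def root_of(message_id):
        # Walk up the parent links until there is no parent, the parent
        # was never received, or a loop is detected.
        seen = set()
        while True:
            seen.add(message_id)
            up = parent.get(message_id)
            if up is None or up in seen:
                return message_id
            message_id = up

    # Second pass: bucket every received message under its root.
    threads = defaultdict(list)
    for message_id in parent:
        threads[root_of(message_id)].append(message_id)
    return dict(threads)
```

A parent that never arrives still acts as the thread key, so two replies to the same missing message are grouped together rather than becoming two separate threads.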

-- thanks, Abhilash Raj (maxking)

tlhackque

6:27 p.m.

On 11-Feb-19 13:01, Abhilash Raj wrote:

...

On Mon, Feb 11, 2019, at 9:55 AM, Marvin Gülker wrote:

...
Hi,

Am 11. Februar 2019 um 19:47 Uhr +0200 schrieb Danil Smirnov:

...
What could be a reason of HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one... This looks exactly like the problem I have been experiencing, see <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/C...>

...
Any ideas how to improve HyperKitty ability to detect threaded messages?

Sadly, no. Let's hope the problem gets resolved.

Please create a bug report at https://gitlab.com/mailman/hyperkitty/issues with the affected messages.

It would help us recreate the problem if you could attach the raw messages to the issue. I am assuming the messages aren't sensitive since the archives are open. Otherwise, feel free to omit any sensitive information.

I haven't looked at the threading code in HK in a long time, but my impression is that it looks at the 'In-Reply-To' header to figure out if the current email is a response to a previous email that it received.

In some cases, the order of the incoming messages can cause this to break, like when the reply comes in first and the original message comes in later. I am not sure exactly what is happening here, but looking at the raw messages should help us reproduce and possibly fix the issue.

I believe that for threading to work more or less reliably, the thing to do is to look at the 'References' header.

That should give you the thread in order, and allow any message received out of sequence to be put in the proper location in your display.

'References' should be a superset of In-Reply-To (which is at most 1 Message-ID), so you only need In-Reply-To if there is no References - or to handle a client that doesn't obey the RFCs.

See https://tools.ietf.org/html/rfc5537#page-15 & https://tools.ietf.org/html/rfc5322#section-3.6.4
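The "References first, In-Reply-To as fallback" rule can be sketched as follows. This is only an illustration under stated assumptions: the function name `parent_candidates` is hypothetical, the message is parsed with Python's standard `email` package, and identifiers are picked out with a simple angle-bracket pattern rather than a full RFC grammar.

```python
import re
from email import message_from_string

# Anything between "<" and the next ">" is treated as one opaque identifier.
MSG_ID = re.compile(r"<[^>]+>")


def parent_candidates(raw_message):
    """Return the ancestor IDs of a message, oldest first.

    Uses References when present; otherwise falls back to the first
    identifier found in In-Reply-To. Returns [] for a thread starter.
    """
    msg = message_from_string(raw_message)
    references = MSG_ID.findall(msg.get("References", ""))
    if references:
        return references  # the last entry is the immediate parent
    return MSG_ID.findall(msg.get("In-Reply-To", ""))[:1]
```

Because the pattern skips whitespace between identifiers, a folded References header needs no separate unfolding step here.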

Stephen J. Turnbull

4:09 p.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

tlhackque writes:

...

I believe that for threading to work more or less reliably, the thing to do is to look at the 'References' header.

That should give you the thread in order, and allow any message received out of sequence to be put in the proper location in your display.

This is more difficult than it seems, since References is not defined for mail as far as I know (it's a netnews concept originally, and RFC 5537 is a netnews-specific RFC). Although it is adopted by most mail clients, there's no guarantee of strict conformance. In mail, In-Reply-To is more reliably present, but of course asynchronicity means you have no guarantee of a complete set of all links. Even if all clients conform to the netnews RFC, you still need to create the full tree (full conformance to RFC 5537 means it will be a tree), and break ties between branches in some arbitrary way to create a total order. It's worse if you *don't* have conformance from all clients: you can get a DAG or even something that isn't even a DAG. So, even today, you can't assume a well-behaved ancestry graph.

...

'References' should be a superset of In-Reply-To (which is at most 1 Message-ID), so you only need In-Reply-To if there is no References - or to handle a client that doesn't obey the RFCs.

I suppose HyperKitty uses References (it works for messages that have proper Message-IDs ;-), but I don't know what algorithm it uses. Might be worth looking into, as well as considering a more Postelian parsing of Message-IDs. Specifically, take the field body, unfold it, strip leading and trailing whitespace and leading "<" and trailing ">", and whatever's left is the message ID.

Alternatively, strip everything that's not atext or "@" (including inside the purported Message-ID). This won't break any RFC 5537-valid Message-IDs, but might identify two different, nonconforming Message-IDs as the same (too bad if they can't take a joke!), or identify a nonconforming message with a conforming one (<sad_emoji />).
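The two normalizations proposed above might look like this in Python. Both function names are hypothetical, and two details are my own reading rather than something the message specifies: the lenient version strips every leading "<" and trailing ">" (not just one), and the aggressive version keeps "." in addition to atext, since dot-atom-text permits it.

```python
import re


def lenient_msgid(field_body):
    """Unfold, trim whitespace, then drop leading "<" and trailing ">"."""
    # Header unfolding: a line break followed by whitespace becomes a space.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", field_body)
    return unfolded.strip().lstrip("<").rstrip(">")


# atext as listed in RFC 5322, plus "." and "@" (keeping "." is an assumption).
_NOT_ATEXT = re.compile(r"[^A-Za-z0-9!#$%&'*+\-/=?^_`{|}~.@]")


def aggressive_msgid(field_body):
    """Remove every character that is not atext, "." or "@"."""
    return _NOT_ATEXT.sub("", field_body)
```

For a well-formed identifier such as `<abc@example.org>` both functions return the same key, `abc@example.org`; they only differ on malformed input.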

Thoughts?

Steve

tlhackque

5:18 p.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

Notes inline.

On 12-Feb-19 11:09, Stephen J. Turnbull wrote:

...

tlhackque writes:

...
I believe that for threading to work more or less reliably, the thing to do is to look at the 'References' header.

That should give you the thread in order, and allow any message received out of sequence to be put in the proper location in your display.

This is more difficult than it seems, since References is not defined for mail as far as I know (it's a netnews concept originally, and RFC 5537 is a netnews-specific RFC).

Not sure why you believe this. RFC2822 3.6.4 defines References for e-mail.

See https://tools.ietf.org/html/rfc2822#page-25

As I wrote in a later post, the message-ID syntactically requires the <> (just one set).

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

Look at that for the full syntax and reference.

...

Although it is adopted by most mail clients, there's no guarantee of strict conformance.

I agree that conformance has traditionally been hit-or-miss with almost everything about e-mail. Sigh.

I also agree that life is hard. There have been clients that use the same message-id for all messages, and it's a long-standing problem. E.g. See https://cr.yp.to/immhf/thread.html

Not only do you not have a guarantee of ever getting all links, you don't have a guarantee of the order in which they arrive. So one can have gaps (parent, reply 2 delayed, reply 3 references reply 2 and parent, reply 2 might arrive - or not). You also can have a reply that references 2 or more branches. E.g.

/--R0 p/--r1
\ --r2 /

And a reply (r3) comes that references r1 AND r2 (and the parent) - which you probably want to show as a descendant of both r1 AND r2. R4 replies to r2 - but that's also an implicit response to r1.

Once you've sorted that out, r5 comes along that references R0, r4, and P.

2822 does say:

Therefore, trying to form a
   "References:" field for a reply that has multiple parents is
   discouraged and how to do so is not defined in this document.

But programmers are rarely discouraged - with GUIs, it's pretty easy to intuit that one might like to check the boxes on several branches of a thread and respond "Of course, you're all right - see my cat photo". :-)

5537 tries to make life easier - until you get to:

If the resulting References header field would, after unfolding,
   exceed 998 characters in length (including its field name but not the
   final CRLF), it MUST be trimmed (and otherwise MAY be trimmed).
   Trimming means removing any number of message identifiers from its
   content, except that the first message identifier and the last two
   MUST NOT be removed.

So, even if you get all the messages, you don't necessarily get all the references that you need.
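The trimming rule quoted above leaves the choice of which identifiers to drop up to the implementation. A minimal sketch, assuming the oldest removable identifier is dropped first (that policy and the name `trim_references` are my assumptions, not something RFC 5537 prescribes):

```python
def trim_references(message_ids, limit=998):
    """Trim a References list so the unfolded field fits within `limit`.

    The first identifier and the last two are never removed, as the
    quoted rule requires. Which of the others go first is a free
    choice; this sketch removes the oldest removable one each time.
    """
    ids = list(message_ids)

    def field_length(items):
        # Length includes the field name but not the final CRLF.
        return len("References: " + " ".join(items))

    while field_length(ids) > limit and len(ids) > 3:
        del ids[1]  # index 0 is protected, so index 1 is the oldest removable
    return ids
```

After trimming, a reader of the header sees the thread root and the two most recent ancestors but may have lost everything in between, which is exactly the gap discussed here.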

I'm sure glad that I'm not working on Hyperkitty! But I'm not holding my breath for when MM3 is complete enough to adopt in my environment.

Yes, "In-Reply-To" is more often present (likely because it's modestly simpler for RFC-readers to understand - it's one thing, not a list.) It's less expressive (hence, informative.) One can try to use it as a fallback when no References is present (and that's what 2822 implies).

The whole undertaking in the real world is an example of the halting problem. Or Parkinson's Law. The best one can do is to come up with a "good enough" approximation in the available time, and tinker with/improve it until bored.

It's likely to be a bunch of heuristics. And if you're not careful, throw in the Peter principle :-(

...

In mail, In-Reply-To is more reliably present, but of course asynchronicity means you have no guarantee of a complete set of all links. Even if all clients conform to the netnews RFC, you still need to create the full tree (full conformance to RFC 5537 means it will be a tree), and break ties between branches in some arbitrary way to create a total order. It's worse if you *don't* have conformance from all clients: you can get a DAG or even something that isn't even a DAG. So, even today, you can't assume a well-behaved ancestry graph.

...
'References' should be a superset of In-Reply-To (which is at most 1 Message-ID), so you only need In-Reply-To if there is no References - or to handle a client that doesn't obey the RFCs.

I suppose HyperKitty uses References (it works for messages that have proper Message-IDs ;-), but I don't know what algorithm it uses. Might be worth looking into, as well as considering a more Postelian parsing of Message-IDs. Specifically, take the field body, unfold it, strip leading and trailing whitespace and leading "<" and trailing ">", and whatever's left is the message ID.

Alternatively, strip everything that's not atext or "@" (including inside the purported Message-ID).

See my other post. The left and right halves can, besides being atoms, be non-folding quotes or literals. So you have to handle that. Although I have recently seen e-mail clients that are confused when you do it. (Ran into that generating content-id for multipart/related. Yes, MIME makes plain e-mail look straightforward.)

Although it has a defined syntax, the semantics are that message-id is an opaque globally-unique identifier. Attempting to parse it as anything but '<[^>]+>' is likely to be a mistake. (On receipt; generating is the other side of Postel-correctness... I commend RFC 2468 to anyone who hasn't read it.)
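Treating the identifier as opaque amounts to pulling out each `<...>` token with the pattern given above and comparing the tokens as plain strings. A minimal sketch (the name `extract_msgids` is hypothetical):

```python
import re

# The pattern from the text: "<", one or more non-">" characters, ">".
OPAQUE_MSG_ID = re.compile(r"<[^>]+>")


def extract_msgids(field_body):
    """Return every <...> token in a header body, brackets included."""
    return OPAQUE_MSG_ID.findall(field_body)
```

Nothing inside the brackets is interpreted, so quoted strings and domain literals need no special handling. Malformed input such as a doubled bracket yields a token that simply will not match any real Message-ID.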

I doubt 'stripping' is worth doing - when a message-id is present, I've almost always seen it include the required <>.

Until yesterday's hyperkitty bug, I've never seen <<>>. I've rarely seen the <> missing - but I'd consider that a warning that the message format is suspect. As in this case.

My suggestion is that if you can't find a message-ID, keep the message in an "unthreaded" bucket. If you sort that by subject (omitting "[listtag]" and "Re:*", case insensitive, and in multiple languages) then by date, you probably have a useful presentation. And a list of MUAs that need bug reports :-)

(Yes, although 2822 says that "Re:" is the proper introducer in the subject of a reply, "RE:" is common, and I've seen it (incorrectly) translated in the headers.)

As I said, "life is hard".

...

This won't break any RFC 5537-valid Message-IDs, but might identify two different, nonconforming Message-IDs as the same (too bad if they can't take a joke!), or identify a nonconforming message with a conforming one (<sad_emoji />).

Thoughts?

Steve


Stephen J. Turnbull

2:42 a.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

tlhackque writes:

...

Not sure why you believe this. RFC2822 3.6.4 defines References for e-mail.

Ouch! I went to RFC 5322, searched for "References" and was taken to "Section 7. References". Evidently it was already positioned at the end of the appendices. :-(

...

As I wrote in later post, the message-ID syntactically requires the <>. (just one set).

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

Look at that for the full syntax and reference.

Sure. It's messy because the id-left could be a "local-part", which could be almost anything if quoted, and the id-right could be a domain literal, which is a little more restricted.

...

2822 does say:
Therefore, trying to form a
   "References:" field for a reply that has multiple parents is
   discouraged and how to do so is not defined in this document.
But programmers are rarely discouraged - with GUIs, it's pretty easy to intuit that one might like to check the boxes on several branches of a thread and respond "Of course, you're all right - see my cat photo". :-)

Oh, I've done this by hand! ;-)

...

5537 tries to make life easier - until you get to:

5322 has adopted the 5537 language for the abstract construction of the field. It doesn't say anything at all about the logical length issue and trimming.

...

The best one can do is to come up with a "good enough" approximation in the available time, and tinker with/improve it until bored.

Which is what Jamie Zawinski did. His algorithm (adopted by IMAP as the standard for threading IMAP servers) has three features: (1) it's an algorithm (guaranteed to terminate on finite input :-), (2) it allows for various tie-breaking methods, and (3) if you have enough messages from the thread and all In-Reply-To and References conform to the 5537 language, it will be consistent with all References data.
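For readers unfamiliar with that approach, here is a simplified sketch loosely modelled on the linking step of Zawinski's published algorithm as I understand it. It is not a complete or authoritative implementation: pruning of empty containers, subject grouping and sorting are all omitted, and the names `Container` and `link_messages` are my own.

```python
class Container:
    """A node in the thread tree; may be a placeholder for an unseen message."""

    def __init__(self, message_id):
        self.message_id = message_id
        self.has_message = False  # stays False if the message never arrives
        self.parent = None
        self.children = []


def _is_ancestor(maybe_ancestor, node):
    # True if maybe_ancestor is node itself or appears above it.
    while node is not None:
        if node is maybe_ancestor:
            return True
        node = node.parent
    return False


def _link(parent, child):
    if _is_ancestor(child, parent):
        return  # refuse links that would create a loop
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = parent
    parent.children.append(child)


def link_messages(messages):
    """messages: iterable of (message_id, references), references oldest first.

    Returns the root containers. Arrival order does not matter because
    referenced-but-unseen messages get placeholder containers.
    """
    table = {}

    def container(message_id):
        if message_id not in table:
            table[message_id] = Container(message_id)
        return table[message_id]

    for message_id, references in messages:
        this = container(message_id)
        this.has_message = True
        previous = None
        for ref_id in references:
            ref = container(ref_id)
            # Chain the references together, but keep any existing parent.
            if previous is not None and ref.parent is None and ref is not this:
                _link(previous, ref)
            previous = ref
        # The message's own References are trusted for its direct parent.
        if previous is not None and previous is not this:
            _link(previous, this)
    return [c for c in table.values() if c.parent is None]
```

Every loop is bounded by the input size and loop-creating links are rejected, which is where the termination guarantee mentioned above comes from.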

...

See my other post. The left and right halves can, besides being atoms, be non-folding quotes or literals. So you have to handle that.

Of course, but that's just a SMOP. The harder problem is figuring out what to do about non-conforming input.

...

Although it has a defined syntax, the semantics are that message-id is an opaque globally-unique identifier. Attempting to parse it as anything but '<[^>]+>' is likely to be a mistake.

That's true in a world where people actually follow the rules.

...

I doubt 'stripping' is worth doing - when a message-id is present, I've almost always seen it include the required <>.

Since the delimiters are constants and required, consistently stripping them doesn't hurt with a well-formed msg-id (the delimiters aren't allowed in a msg-id). And you do have to strip the whitespace, because it's likely that different MUAs will do different things with it since they edit the old value (space-separated, tab-separated). The point is to get to the unique content.

It's also possible to keep the literal content (after stripping whitespace) as well as the "cleaned" version, and use whichever one corresponds to a real message or to a different value of References in the same thread. It would be amusing if *both* were associated with real messages, but that seems unlikely.
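Keeping both the literal and the "cleaned" form could be sketched as a small index that stores each message under both keys and tries the literal key first on lookup. The helper names and the plain-dict index are assumptions made for illustration only.

```python
def keys_for(msgid_text):
    """Return (literal, cleaned) lookup keys for one identifier."""
    literal = msgid_text.strip()
    cleaned = literal.lstrip("<").rstrip(">")
    return literal, cleaned


def index_message(index, msgid_text, message):
    # Register the message under both forms; the first registration wins.
    for key in keys_for(msgid_text):
        index.setdefault(key, message)


def lookup(index, reference_text):
    # Prefer an exact literal match, then fall back to the cleaned form.
    for key in keys_for(reference_text):
        if key in index:
            return index[key]
    return None
```

With this arrangement a munged reference like `<<abc@example.org>>` still resolves to the message stored as `<abc@example.org>`, while well-formed references keep matching on their literal text.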

...

Until yesterday's hyperkitty bug, I've never seen <<>>.

At least one of my correspondents has a client that frequently creates "addresses" of the form "<a@b.net <a@b.net>>". I don't recall seeing "<<...>>" before, but I rarely look at Message-IDs.

I've rarely seen the <> missing - but I'd consider that a warning that the message format is suspect. As in this case.

Agreed.

...

My suggestion is that if you can't find a message-ID,

A Message-ID field, or a valid msg-id in the References or In-Reply-To fields?

...

keep the message in an "unthreaded" bucket. If you sort that by subject (omitting "[listtag]" and "Re:*" (case insensitive, and in multiple languages) then by date, you probably have a useful presentation. And a list of MUAs that need bug reports :-)

The thing is that a message without a Message-ID (which I believe should not happen in Mailman, Mailman will assign one before distributing IIRC) is going to end up being a singleton thread. If there are replies to it, I don't see why they would be likely to be temporally adjacent. If they have proper replies, they will end up as proper thread roots, not in the unthreaded bucket. Am I missing something?

If it's an unusable munged reference in References, the munged reference may be visible as a placeholder (no real message) in a separate subthread, or pruned (and invisible) because no real message is indicated by it. The indicated message will be in a separate subthread if it has a valid References (including no References, in which case it will be a thread root). And, of course, if it's not the immediate parent, other messages' References fields will likely allow the message to be threaded correctly. AFAICS "stripping and cleaning" an invalid msg-id is highly unlikely to duplicate a valid msg-id associated with a different message, although there's a good chance it won't allow identification of any message at all -- which is where we started.

-- Associate Professor Division of Policy and Planning Science http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull@sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN

tlhackque

4:36 a.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

On 13-Feb-19 21:42, Stephen J. Turnbull wrote:

...

...
As I wrote in later post, the message-ID syntactically requires the <>. (just one set).

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

Look at that for the full syntax and reference.

Sure. It's messy because the id-left could be a "local-part", which could be almost anything if quoted, and the id-right could be a domain literal, which is a little more restricted.

No. It's not an address. "local part" has to do with an address. In a Message-ID, the corresponding piece is "id-left", which is supposed to be something generated by the host that makes the message unique within the namespace defined by "id-right". And while the RFC recommends that id-right be a domain name, it needn't be. In fact, as one of the references I used pointed out, not all hosts have them.

One could generate a UUID and use that for both id-left and id-right & be conforming (though given some client bugs, I'd change the '-' in the standard UUID presentation to '.', or just delete it.) Or one could generate a UUID for the right part, use it for all messages sent by a client, and use something else for the left - like a hash of the message+headers. I tend to use a domain name for the right part, and UUID for the left. Then I prefix the left with a number when I need a Content-ID for a body part. The UUID pretty much makes the right part unnecessary, but since it's syntactically required, when I have a domain name it can be useful for forensics. When I don't, the UUID is fine.
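The generation scheme described above can be sketched with the standard `uuid` module. The function names are hypothetical, and using the UUID's hex form (which simply drops the "-" characters) is one of the two options the text mentions.

```python
import uuid


def make_message_id(domain=None):
    """Build <uuid@domain>, or <uuid@uuid> when no domain name is available."""
    left = uuid.uuid4().hex  # 32 hex digits, no "-"
    right = domain if domain else uuid.uuid4().hex
    return f"<{left}@{right}>"


def make_content_id(message_id, part_number):
    """Derive a Content-ID by prefixing the left part with a number."""
    # message_id[1:] drops the leading "<" so the prefix lands inside it.
    return f"<{part_number}.{message_id[1:]}"
```

Python's standard library also offers `email.utils.make_msgid()`, which follows a different scheme; the sketch above only mirrors the approach described in this message.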

The advantage of using a domain name for id-right is that there is a global registration system (DNS) that makes it unlikely for conflicts to occur. And that was true for RFC733 time, when hosts were king. Modulo all the "example.com". But once PCs came around, it doesn't work as well. DHCP-assigned names, un-named clients just don't provide the same uniqueness as bbn-tenex.com used to.

I think the '@' was just syntactic sugar - if you accept a domain name for id-right, then, as in an e-mail address, it indicates the scope of the unique part (id-left). But just because it can look like an e-mail address doesn't mean it is one.

Treat the whole thing as an opaque string. Nothing else is safe.

...

...
2822 does say:
Therefore, trying to form a
   "References:" field for a reply that has multiple parents is
   discouraged and how to do so is not defined in this document.
But programmers are rarely discouraged - with GUIs, it's pretty easy to intuit that one might like to check the boxes on several branches of a thread and respond "Of course, you're all right - see my cat photo". :-)
Oh, I've done this by hand! ;-)

Yes, but did you fix the References header to match?

...
5537 tries to make life easier - until you get to:

5322 has adopted the 5537 language for the abstract construction of the field. It doesn't say anything at all about the logical length issue and trimming.

Yes, but you brought up the news RFCs, IIRC in saying that they defined threading.

My quote was verbatim. 5537 does say you must trim. The point is that the news rfcs may have introduced References, but they have different obstacles to reconstructing threading.

...

...
The best one can do is to come up with a "good enough" approximation in the available time, and tinker with/improve it until bored.

Which is what Jamie Zawinski did. His algorithm (adopted by IMAP as the standard for threading IMAP servers) has three features: (1) it's an algorithm (guaranteed to terminate on finite input :-), (2) it allows for various tie-breaking methods, and (3) if you have enough messages from the thread and all In-Reply-To and References conform to the 5537 language, it will be consistent with all References data.

I haven't looked at that. Given the trimming in 3.4.4 of 5537, I don't see how this can produce the whole thread, unless you are guaranteed to have the complete thread to fill the gaps. And you're not. (3) is the issue - you need "enough" and in the worst case, you get 3 references (first and last two). Plus, you can lose messages for strange reasons: The moderator deleted one for inappropriate content. A message was copied to the list and the poster. The response to the list is lost. But a reply from the poster happens later. So the response in question is never seen by the server, though the reference is. Life is hard :-)

As a practical matter, I don't expect a sane coder to trim unless necessary. So there's a good chance that you can reconstruct a complete thread in real life. But "necessary" when your main disk was an 8" floppy (or paper tape) meant something different from what it means now that my notebook PC has a couple of TB of SSD. That doesn't mean that the code has changed.

Heuristics are fine. Guaranteeing that the resulting algorithm terminates is important. But real life doesn't get you to "enough messages" 100% of the time.

...

...
See my other post. The left and right halves can, besides being atoms, be non-folding quotes or literals. So you have to handle that.

Of course, but that's just a SMOP. The harder problem is figuring out what to do about non-conforming input.

And, as I've noted: lost input.

...
Although it has a defined syntax, the semantics are that message-id is an opaque globally-unique identifier. Attempting to parse it as anything but '<[^>]+>' is likely to be a mistake.

That's true in a world where people actually follow the rules.

When they don't, trying to guess at the semantics of an opaque ID will seem to work for a while. But it amounts to the halting problem. There are lots of variants of "generate a globally unique ID with an '@' in the middle". I may have created a new one today :-)

...
I doubt 'stripping' is worth doing - when a message-id is present, I've almost always seen it include the required <>.

Since the delimiters are constants and required, consistently stripping them doesn't hurt with a well-formed msg-id (the delimiters aren't allowed in a msg-id). And you do have to strip the whitespace, because it's likely that different MUAs will do different things with it since they edit the old value (space-separated, tab-separated). The point is to get to the unique content.

Yes. But if they're well-formed, the <> are there, so stripping them only saves a couple of bytes.

If they're not, all bets are off. <two<three@example.net> - stripping the outer <>s doesn't help.

<"two<three"@example.net> is a valid message-ID, and distinct from <"twothree"@example.net>. And is <twothree@example.net> distinct from <"twothree"@example.net>? (Unnecessary quoting, or a distinct message-ID?) If you treat it as opaque, you don't care. Take the whole thing, <@> included, as given and look for it in the other fields. The most you might do is remove quotes (and escapes) and use the left and right parts as your key.

The whitespace is the [CFWS] on either side of the "<" id-left "@" id-right ">" production in 2822.

That's where you must strip it to get the unique ID.

...

It's also possible to keep the literal content (after stripping whitespace) as well as the "cleaned" version, and use whichever one corresponds to a real message or to a different value of References in the same thread. It would be amusing if *both* were associated with real messages, but that seems unlikely.

You could also fingerprint the user agent (e.g. by the order of headers, format of message-ID), and correct for its bugs. But I'm inclined to report client bugs and get them fixed. Meantime, their messages are unthreaded (but not lost). It's just not worth working around other people's bugs - there are better uses for your time. Just ask the DNS people. They just had a Flag Day to get rid of years of built-up workarounds in nameservers...

...

...
Until yesterday's hyperkitty bug, I've never seen <<>>.

At least one of my correspondents has a client that frequently creates "addresses" of the form "<a@b.net <a@b.net>>". I don't recall seeing "<<...>>" before, but I rarely look at Message-IDs.

I've rarely seen the <> missing - but I'd consider that a warning that the message format is suspect. As in this case.

Agreed.

...
My suggestion is that if you can't find a message-ID,

A Message-ID field, or a valid msg-id in the References or In-Reply-To fields?

I meant the latter. It's the references that you need to reconstruct the thread - to the extent possible.

...

...
keep the message in an "unthreaded" bucket. If you sort that by subject (omitting "[listtag]" and "Re:*" (case insensitive, and in multiple languages) then by date, you probably have a useful presentation. And a list of MUAs that need bug reports :-)

The thing is that a message without a Message-ID (which I believe should not happen in Mailman, Mailman will assign one before distributing IIRC) is going to end up being a singleton thread. If there are replies to it, I don't see why they would be likely to be temporally adjacent. If they have proper replies, they will end up as proper thread roots, not in the unthreaded bucket. Am I missing something?

Yes. A message ID should exist. These days I think all MTAs will assign one if the client doesn't.

What I tried to say was that there will be times where you can't figure out where to put a message into the thread graph (or forest). Because it references a message that you can't find, or it doesn't have a References or In-Reply-To, or those fields are corrupt.

In those cases, put them in a bucket. Call them orphans.

Most conversations come in bursts. And if I reply to you, my reply will (modulo time errors) be later than your post.

Further, there's a pretty good chance that my subject will be "Re: yours".

So, if that bucket is sorted by subject (primary key) and date (secondary), there's a fair approximation that within the set of orphans, they'll be near each other. Which is better than nothing.

You're right that if my client generates good headers and yours doesn't, yours will be orphaned. But at least all your replies will share a subject, so in that bucket will be your side of the conversation, roughly in order. And if you quote my text when replying, both sides.

If several people have clients that orphan their replies, they all will cluster because of the subject. And be in rough time order.

You do want to skip any [list] tag and [Re:]* because they don't change the subject.

These are heuristics - you might come up with better ones. The idea is that they don't rely on or try to fix any particular fault in the client's posts. Either a message conforms, or it doesn't. If it does, it gets threaded as everyone expects. If it doesn't, it's an orphan, and there's one place to find it. And with a modest effort, this should keep related ones close to each other. It's better than just keeping the orphans sorted by date - the subject tends to be stable. But it's not perfect. E.g. the not-uncommon 'Re: dogs are better (was cats rule)' won't keep the thread together.
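The orphan-bucket heuristic can be sketched as a sort on a normalized subject followed by the date. Everything named here is an assumption for illustration: the function names, the `(subject, date, message_id)` tuple shape, and especially the short list of reply prefixes (Re, AW, SV, Antw), which is only a sample of what "multiple languages" would require.

```python
import re
from datetime import datetime

# Leading "[listtag]" blocks and reply prefixes, repeated in any order.
# The prefix list is illustrative, not exhaustive.
_PREFIX = re.compile(
    r"^\s*(?:\[[^\]]*\]\s*|(?:re|aw|sv|antw)\s*:\s*)+",
    re.IGNORECASE,
)


def subject_key(subject):
    """Strip list tags and reply prefixes, then fold case for comparison."""
    return _PREFIX.sub("", subject).strip().casefold()


def sort_orphans(orphans):
    """Sort (subject, date, message_id) tuples by subject, then by date."""
    return sorted(orphans, key=lambda o: (subject_key(o[0]), o[1]))
```

For example, `subject_key("[Mailman-users] RE: Re: Openvpn and Network mapping")` gives `"openvpn and network mapping"`, so that message sorts next to the original post. A renamed subject such as 'Re: dogs are better (was cats rule)' is not handled and lands under "dogs".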

But it seems better than nothing, without endless improvement. Once the afflicted users learn that messages they post are put into the Orphans thread, they can complain to their clients' developers. Meantime, life should be tolerable - but annoying enough for them to keep the pressure on for a fix :-)

...

If it's an unusable munged reference in References, the munged reference may be visible as a placeholder (no real message) in a separate subthread, or pruned (and invisible) because no real message is indicated by it. The indicated message will be in a separate subthread if it has a valid References (including no References, in which case it will be a thread root). And, of course, if it's not the immediate parent, other messages' References fields will likely allow the message to be threaded correctly. AFAICS "stripping and cleaning" an invalid msg-id is highly unlikely to duplicate a valid msg-id associated with a different message, although there's a good chance it won't allow identification of any message at all -- which is where we started.

Yes, and that's why I think it's wasted effort. Opaque means "Opaque". If you don't have one that conforms, use something else, and use the imperfections to get the bug(s) fixed.

At least with an orphans bucket, you don't end up with "invisible" messages. They're just not where you expected. Like the SPAM folder :-) You know where to look, and you know that anything in there means there's a bug to fix.

Stephen J. Turnbull

3:08 a.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

tlhackque writes:

...

No. It's not an address. "local part" has to do with an address. in a Message-ID,

It's syntactically a local-part or other stuff. RFC 5322:

4.5.4. Obsolete Identification Fields

The obsolete "In-Reply-To:" and "References:" fields differ from the current syntax in that they allow phrase (words or quoted strings) to appear. The obsolete forms of the left and right sides of msg-id allow interspersed CFWS, making them syntactically identical to local-part and domain, respectively.

...

It's in "id-left", which is supposed to be something generated by the host that makes the message unique within the namespace defined by "id-right".

Right, those are the *semantics recommended to implementers*, for the reasons which you summarize accurately. But a validating parser doesn't care about that.

...

Treat the whole thing as an opaque string. Nothing else is safe.

s/else// and I'll agree with you. But safety is not an issue in threading (except for DoS if the procedure might be non-terminating; I suppose you could argue DoS if a user misses an out-of-order mail from their boss and gets fired :-).

...

Yes, but did you fix the References header to match?

Of course I did. :-) That was long before I was a Mailman developer and started reading the RFCs, also of course.

...

Yes, but you brought up the news RFCs, IIRC in saying that they defined threading.

Actually you brought up the news RFCs. I was wondering why. I guess it was to argue (implicitly) that there's a bug in the 2822 (and 5322) spec, and that 5537's solution tends to muck things up?

...

My quote was verbatim. 5537 does say you must trim. The point is that the news rfcs may have introduced References, but they have different obstacles to reconstructing threading.

I don't think that's right, because mail suffers from the same line length restriction, with no exception for References. Mail just has strictly more specification bugs. :-(

...

I haven't looked at that. Given the trimming in 3.4.4 of 5537, I don't see how this can produce the whole thread, unless you are guaranteed to have the complete thread to fill the gaps.

It can't because you aren't. But it seems unlikely that anybody is producing msg-ids longer than 74 characters, so 998/(74+1) - 1 = 12 is a long enough gap that you probably don't care that they're the same thread. At least not in any forum I participate in!

It's true that an MUA that doesn't produce either References or In-Reply-To will necessarily break threading, and that "enough" missing messages can do so. But both are pretty uncommon in my experience. Even AppleMail gets those right AFAICT. ;-)

...

Plus, you can lose messages for strange reasons: The moderator deleted one for inappropriate content. A message was copied to the list and the poster. The response to the list is lost. But a reply from the poster happens later. So the response in question is never seen by the server, though the reference is. Life is hard :-)

Sure, but Jamie's algorithm *will get that right*. What remains is a UI question: missing message placeholder, to display or not to display?

...

Heuristics are fine. Guaranteeing that the resulting algorithm terminates is important. But real life doesn't get you to "enough messages" 100% of the time.

100% isn't necessary. If the thread is sufficiently broken that Jamie's msg-id-based procedure doesn't work, his full algorithm falls back to your idea of collecting the singletons by subject and sorting. (He doesn't specify the ordering criteria, rather what objects (= subthreads of the same parent) are to be compared. Your suggestion of date seems most appropriate for the leftovers.)

...

...
...
See my other post. The left and right halves can, besides being atoms, be non-folding quotes or literals. So you have to handle that.

Of course, but that's just a SMOP. The harder problem is figuring out what to do about non-conforming input.

...

And, as I've noted: lost input.

Lost input does not prevent thread reconstruction as long as (1) all messages have a Message-ID, (2) "enough" messages have a non-empty References containing "enough" msg-ids, (3) those References do not misorder the msg-ids or introduce msg-ids corresponding to messages not in the thread. It should be obvious why I don't want to define "enough" (except in terms like "enough means Jamie's algorithm can reconstruct the thread" ;-), but in particular losing *one* message will certainly not prevent reconstruction.
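The point that one lost message need not break a thread can be illustrated with a much-simplified sketch in the spirit of the References-walking step of Jamie's algorithm. This is not his full algorithm and not HyperKitty's code: the names `Container`, `build_threads` and the example msg-ids are hypothetical, msg-ids are treated as opaque strings, and pruning of placeholders, the subject fallback, and most conflict handling are left out (the first parent recorded for a message wins).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Container:
    msg_id: str
    present: bool = False               # False: placeholder for a message never seen
    parent: Optional["Container"] = None
    children: list = field(default_factory=list)


def _is_ancestor(a, b):
    """Return True if a is b or one of b's ancestors."""
    while b is not None:
        if b is a:
            return True
        b = b.parent
    return False


def _link(parent, child):
    # Keep the first parent recorded for a child, and never create a loop.
    if child.parent is not None or _is_ancestor(child, parent):
        return
    child.parent = parent
    parent.children.append(child)


def build_threads(messages):
    """messages: iterable of (msg_id, [reference, ...]); IDs are opaque strings."""
    table = {}

    def get(msg_id):
        if msg_id not in table:
            table[msg_id] = Container(msg_id)
        return table[msg_id]

    for msg_id, refs in messages:
        node = get(msg_id)
        node.present = True
        # Link each reference to the next one, and the last one to the message.
        chain = [get(r) for r in refs] + [node]
        for parent, child in zip(chain, chain[1:]):
            _link(parent, child)
    # Thread roots are the containers that never received a parent.
    return [c for c in table.values() if c.parent is None]


# A -> B -> C -> D, where B was never archived.
roots = build_threads([
    ("<a@example.net>", []),
    ("<c@example.net>", ["<a@example.net>", "<b@example.net>"]),
    ("<d@example.net>", ["<a@example.net>", "<b@example.net>", "<c@example.net>"]),
])
```

In this example the missing message B ends up as a placeholder container between A and C, so the thread still has a single root; whether to display such a placeholder is the UI question raised earlier.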

...

When they don't, trying to guess at the semantics of an opaque ID will seem to work for a while. But it amounts to the halting problem. There are lots of variants of "generate a globally unique ID with an '@' in the middle". I may have created a new one today :-)

The problem I'm trying to address with the "stripping" is incorrect copying of msg-ids by MUAs that try to parse them, as we saw here. Apparently the bug was in Mailman itself, so it's not required, I guess.

...

Yes. But if they're well-formed, the <> are there, so stripping them only saves a couple of bytes.

If they're not, all bets are off. <two<three@example.net> - stripping the outer <>s doesn't help.

<"two<three"@example.net> is a valid message-Id, and distinct from <"twothree"@example.net>.

Sure. Is it likely?

...

<twothree@example.net> distinct from <"twothree"@example.net>? (unnecessary quoting, or a distinct message-ID) If you treat it as opaque, you don't care. Take the whole thing, <@> included, as given and look for it in the other fields. The most you might do is remove quotes (and escapes) and use the left and right parts as your key.

Which is basically what my stripping and cleaning procedure would do. The question is "does it help or hurt, on net?" If there was a widely distributed MUA out there that doubled the delimiters, it would help. Since it was Mailman doing that, it's a bad idea.

...

You could also fingerprint the user agent (e.g. by the order of headers, format of message-ID), and correct for its bugs. But I'm inclined to report client bugs and get them fixed. Meantime, their messages are unthreaded (but not lost). It's just not worth working around other people's bugs - there are better uses for your time.

That's not my experience. See Reply-To munging. Depends on the bug, of course, but all too often people think it's our job to help them deal with bad MUA design.

...

Yes. A message ID should exist. These days I think all MTAs will assign one if the client doesn't.

...

What I tried to say was that there will be times where you can't figure out where to put a message into the thread graph (or forest), because it references a message that you can't find, or it doesn't have a References or In-Reply-To, or those fields are corrupt.

None of those prevent you from threading that message. The only reasons you won't be able to thread a message at all are when no other available message references it (eg, if several MUAs in a row supply In-Reply-To but not References, and the middle messages are missing or their identification fields are corrupt), and when there are References that have conflicting opinions on where that message belongs. Of course if a message lacks References and In-Reply-To it will be identified as a thread root (unless some descendent manually corrects References ;-). Even then it could be grafted into the tree more or less correctly using your sort on subject and date procedure.

...

These are heuristics - you might come up with better ones.

Yours are already implemented in Pipermail, I believe, I'm not sure about HyperKitty. I don't think Jamie's algorithm messed with [name] and [serial number] because they weren't common at that time, but he stripped Re: and its nonconforming variants, as well as "Fwd:".
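The subject-and-date fallback mentioned here could look roughly like the sketch below. The prefix list (Re:, Fw:/Fwd:, the German Aw:, a counter such as Re[2]:, and a bracketed list tag) is only an illustrative assumption, not the rule set of Jamie's algorithm, Pipermail, or HyperKitty, and the function names are hypothetical. Note that stripping every leading bracketed tag would also remove things like "[PATCH]", which may or may not be wanted.

```python
import re
from collections import defaultdict

# One leading reply/forward marker or one leading "[tag]" (illustrative choice).
_PREFIX = re.compile(
    r"^\s*(?:(?:re|fwd?|aw)(?:\[\d+\])?\s*:\s*|\[[^\]]*\]\s*)",
    re.IGNORECASE,
)


def normalize_subject(subject):
    """Strip reply/forward markers and list tags, then fold case and spaces."""
    previous = None
    while previous != subject:          # repeat until nothing more is stripped
        previous = subject
        subject = _PREFIX.sub("", subject, count=1)
    return " ".join(subject.split()).lower()


def group_orphans(orphans):
    """orphans: iterable of (date, subject, msg_id) for messages with no usable parent.

    Returns {normalized subject: [msg_id, ...]} with each list sorted by date.
    """
    groups = defaultdict(list)
    for date, subject, msg_id in orphans:
        groups[normalize_subject(subject)].append((date, msg_id))
    return {key: [m for _, m in sorted(items)] for key, items in groups.items()}
```

Dates only need to be mutually comparable here (for example `datetime` objects or ISO 8601 strings).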

...

At least with an orphans bucket, you don't end up with "invisible" messages.

Nothing I suggested produces invisible messages, just a different set of orphans or occasionally a misthreaded message that seems to be a duplicate if different msg-ids are munged to the same string. At least not in Jamie's algorithm. I don't know exactly what algorithms are used in Pipermail and HyperKitty.

Interesting discussion, but I've reached the point of diminishing returns. I will be checking to see if any of your suggestions are improvements over what HyperKitty currently does.

Stephen J. Turnbull

7:09 a.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

tlhackque writes:

...

No. It's not an address. "local part" has to do with an address. in a Message-ID,

It's syntactically a local-part or other stuff. RFC 5322:

4.5.4. Obsolete Identification Fields

...

It's in "id-left", which is supposed to be something generated by the host that makes the message unique within the namespace defined by "id-right".

Right, those are the *semantics recommended to implementers*, for the reasons which you summarize accurately. But a validating parser doesn't care about that.

...

Treat the whole thing as an opaque string. Nothing else is safe.

s/else// and I'll agree with you. But safety is not an issue in threading (except for DoS if the procedure might be non-terminating; I suppose you could argue DoS if a user misses out-of-order mail from their boss and gets fired).

...

Yes, but did you fix the References header to match?

Of course I did. That was long before I was a Mailman developer and started reading the RFCs, of course.

...

My quote was verbatim. 5537 does say you must trim. The point is that the news rfcs may have introduced References, but they have different obstacles to reconstructing threading.

I don't think that's right, because mail suffers from the same line length restriction, with no exception for References. Mail has strictly more specification bugs. :-(

...

I haven't looked at that. Given the trimming in 3.4.4 of 5537, I don't see how this can produce the whole thread, unless you are guaranteed to have the complete thread to fill the gaps.

It can't because you aren't. But it seems unlikely that anybody is producing msg-ids longer than 74 characters, so 998/(74+1) = 13 is a long enough gap that you probably don't care that they're the same thread. At least not in any forum I participate in!

...

Plus, you can lose messages for strange reasons: The moderator deleted one for inappropriate content. A message was copied to the list and the poster. The response to the list is lost. But a reply from the poster happens later. So the response in question is never seen by the server, though the reference is. Life is hard :-)

Sure, but Jamie's algorithm *will get that right* as long as all messages have a semantically correct In-Reply-To or References, and at least one descendent of the missing message mentions both it and its parent. What remains is a UI question: missing message placeholder, to display or not to display?

...

Heuristics are fine. Guaranteeing that the resulting algorithm terminates is important. But real life doesn't get you to "enough messages" 100% of the time.

100% isn't necessary. If the thread is sufficiently broken that Jamie's msg-id-based procedure doesn't work, his full algorithm falls back to your idea of collecting the singletons by subject and sorting. (He doesn't specify the ordering criteria, rather what objects (= subthreads of the same parent) are to be compared. Your suggestion of date seems most appropriate for the leftovers.)

...

...
...
See my other post. The left and right halves can, besides being atoms, be non-folding quotes or literals. So you have to handle that.

Of course, but that's just a SMOP. The harder problem is figuring out what to do about non-conforming input.

...

And, as I've noted: lost input.

...

When they don't, trying to guess at the semantics of an opaque ID will seem to work for a while. But it amounts to the halting problem. There are lots of variants of "generate a globally unique ID with an '@' in the middle". I may have created a new one today :-)

...

Yes. But if they're well-formed, the <> are there, so stripping them only saves a couple of bytes.

If they're not, all bets are off. <two<three@example.net> - stripping the outer <>s doesn't help.

<"two<three"@example.net> is a valid message-Id, and distinct from <"twothree"@example.net>.

Sure. Is it likely?

...

<twothree@example.net> distinct from <"twothree"@example.net>? (unnecessary quoting, or a distinct message-ID) If you treat it as opaque, you don't care. Take the whole thing, <@> included, as given and look for it in the other fields. The most you might do is remove quotes (and escapes) and use the left and right parts as your key.

...

You could also fingerprint the user agent (e.g. by the order of headers, format of message-ID), and correct for its bugs. But I'm inclined to report client bugs and get them fixed. Meantime, their messages are unthreaded (but not lost). It's just not worth working around other people's bugs - there are better uses for your time.

That's not my experience. See Reply-To munging. Depends on the bug, of course, but all too often people think it's our job to help them deal with bad MUA design.

...

Yes. A message ID should exist. These days I think all MTAs will assign one if the client doesn't.

...

What I tried to say was that there will be times where you can't figure out where to put a message into the thread graph (or forest), because it references a message that you can't find, or it doesn't have a References or In-Reply-To, or those fields are corrupt.

...

These are heuristics - you might come up with better ones.

...

At least with an orphans bucket, you don't end up with "invisible" messages.

Interesting discussion, but I've reached the point of diminishing returns. I will be checking to see if any of your suggestions are improvements over what HyperKitty currently does, for sure!

Steve

Mark Sapiro

7:03 p.m.

New subject: Implementing threading [was: hyperkitty failed to create a thread]

On 2/12/19 8:09 AM, Stephen J. Turnbull wrote:

...

I suppose HyperKitty uses References (it works for messages that have proper Message-IDs ;-), but I don't know what algorithm it uses. Might be worth looking into, as well as considering a more Postelian parsing of Message-IDs. Specifically, take the field body, unfold it, strip leading and trailing whitespace and leading "<" and trailing ">", and whatever's left is the message ID.

Hyperkitty uses In-Reply-To: and if absent falls back to the last item in References: to determine a message's parent.

I haven't looked at what happens if the parent is not in the archive but arrives later.

We could look at earlier References: items to try to find one we know, but this could result in bad threading if a later reference arrives after we think we've determined the parent.

The unfolding of Message-ID: is now correct. See <https://gitlab.com/mailman/hyperkitty/merge_requests/125>.
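As a rough illustration of the two rules described above (the "Postelian" msg-id clean-up and the parent choice of In-Reply-To first, then the last item of References), here is a minimal sketch. It is not HyperKitty's actual code; the function names are hypothetical, In-Reply-To is assumed to carry a single msg-id, and References is split on whitespace, which ignores the rare quoted forms.

```python
import re


def unfold(field_body):
    """Remove the line break of each folded line, keeping its whitespace."""
    return re.sub(r"\r?\n(?=[ \t])", "", field_body)


def normalize_msg_id(field_body):
    """Unfold, strip whitespace, then drop one leading "<" and one trailing ">"."""
    value = unfold(field_body).strip()
    if value.startswith("<"):
        value = value[1:]
    if value.endswith(">"):
        value = value[:-1]
    return value


def choose_parent(in_reply_to, references):
    """In-Reply-To if present, else the last item of References, else None."""
    if in_reply_to and in_reply_to.strip():
        return normalize_msg_id(in_reply_to)
    if references and references.strip():
        items = unfold(references).split()
        return normalize_msg_id(items[-1])
    return None
```

With this rule a well-formed `<id>` and a bare `id` both normalize to `id`, while a doubled-bracket value such as `< <id>>` comes out as ` <id>` (leading space and inner brackets intact), so it would not match the bare form.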

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Mark Sapiro

9:20 p.m.

On 2/11/19 10:01 AM, Abhilash Raj wrote:

...

Please create a bug report at https://gitlab.com/mailman/hyperkitty/issues with the affected messages.

It would help us recreate the problem if you could attach a raw messages in the issue. I am assuming the messages aren't sensitive since the archive are open. Otherwise, feel free to omit any sensitive information.

I haven't looked at the threading code in HK since a long time, but my impression is that it looks at the 'In-Reply-To' header to figure out if the current email is a response to a previous email that it received.

In some cases, the order of the incoming messages can cause this to break, like when the reply comes in first and the original messages comes in later. I am not sure exactly what is happening here, but looking at the raw messages should help us reproduce and possibly fix the issue.

I downloaded the February mbox from <https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/export/wlug@lists.wlug.org-2019-02.mbox.gz?start=2019-02-01&end=2019-03-01> and imported it to a test list with 'django-admin hyperkitty_import' and see the same issue.

Examination of the mbox shows for the first message

Message-ID: <
 <SN6PR01MB4719346533C08BF4FF62B880C36C0@SN6PR01MB4719.prod.exchangelabs.com>>

This will be wrapped, so here's an excerpt

Message-ID: <
 <SN6....prod.exchangelabs.com>>

Note the folding of the header and the double angle brackets.

The next message has (again excerpted to avoid wrapping)

Message-ID: <
 <BL0....prod.exchangelabs.com>>
In-Reply-To: SN6....prod.exchangelabs.com

Note again the doubling of the angle brackets in the Message-ID and the absence of angle brackets in the In-Reply-To:

The Message-ID: and In-Reply-To: headers of subsequent messages have the same issues.

In looking at other mboxes downloaded from Hyperkitty, I see that the absence of References: and the dropping of the angle brackets from In-Reply-To: seem to be usual, and importing such a mbox preserves the threading.

However, the doubling of the angle brackets is not usual and is almost certainly the cause of the problem. Do you have any of the original messages? What do the headers look like in those?

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Marvin Gülker

10:08 p.m.

Hi,

On 11 February 2019 at 13:20 -0800, Mark Sapiro wrote:

...

I downloaded the February mbox from <https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/export/wlug@lists.wlug.org-2019-02.mbox.gz?start=2019-02-01&end=2019-03-01> and imported it to a test list with 'django-admin hyperkitty_import' and see the same issue.

Here's the Hyperkitty Mbox for my affected mailinglist: <https://lists.secretchronicles.org/hyperkitty/list/tsc-devel@lists.secretchronicles.org/export/tsc-devel@lists.secretchronicles.org-2018-09.mbox.gz?start=2018-09-01&end=2018-10-01>

Maildir from my local machine as received from Mailman via e-mail: <https://files.guelker.eu/misc/tsc-devel.tar.gz>

...

Examination of the mbox shows for the first message [...] Note the folding of the header and the double angle brackets.

I can confirm that I see this behaviour for my affected mailinglist as well. The Message-ID header is broken over two lines with double angle brackets in the Hyperkitty MBox linked above:

Message-ID: <
 <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>>

However, the original message as it was delivered by Mailman to me does not have such a weird message-id header:

Message-ID:
 <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>

In this message, the header is broken exactly as shown above after the colon of "Message-ID:". Maybe some problem with parsing long Message-ID headers, or ones folded over multiple lines?
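To make the folding concrete, here is a small sketch of how such a header looks when read with Python's standard `email` package under its default (compat32) policy. It only illustrates folding and unfolding; it makes no claim about where the doubled brackets in the archive come from. The Subject line and body are made up for the example, and `unfold` is a hypothetical helper.

```python
import re
from email import message_from_string

# The Message-ID is folded right after the colon, as in the delivered message.
RAW = (
    "Message-ID:\n"
    " <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def unfold(value):
    """Drop each line break that is followed by whitespace, then trim."""
    return re.sub(r"\r?\n(?=[ \t])", "", value).strip()


msg = message_from_string(RAW)   # default policy is compat32
raw_value = msg["Message-ID"]    # under compat32 the embedded line break is kept
msg_id = unfold(raw_value)       # single well-formed <...> token
```

Code that strips "<" and ">" from `raw_value` without unfolding and trimming first would be working on a string that begins with a line break and a space, not with "<".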

With regard to bug reporting on gitlab.com: I already have accounts on a ton of sites, and at some point I'd actually like to stop creating new ones all the time...

Marvin

-- Blog: https://mg.guelker.eu PGP/GPG ID: F1D8799FBCC8BC4F

Abhilash Raj

10:15 p.m.

On Mon, Feb 11, 2019, at 2:08 PM, Marvin Gülker wrote:

...

Hi,

On 11 February 2019 at 13:20 -0800, Mark Sapiro wrote:

...
I downloaded the February mbox from <https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/export/wlug@lists.wlug.org-2019-02.mbox.gz?start=2019-02-01&end=2019-03-01> and imported it to a test list with 'django-admin hyperkitty_import' and see the same issue.

Here's the Hyperkitty Mbox for my affected mailinglist: <https://lists.secretchronicles.org/hyperkitty/list/tsc-devel@lists.secretchronicles.org/export/tsc-devel@lists.secretchronicles.org-2018-09.mbox.gz?start=2018-09-01&end=2018-10-01>

Maildir from my local machine as received from Mailman via e-mail: <https://files.guelker.eu/misc/tsc-devel.tar.gz>

...
Examination of the mbox shows for the first message [...] Note the folding of the header and the double angle brackets.

I can confirm that I see this behaviour for my affected mailinglist as well. The Message-ID header is broken over two lines with double angle brackets in the Hyperkitty MBox linked above:
Message-ID: <
 <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>>
However, the original message as it was delivered by Mailman to me does not have such a weird message-id header:
Message-ID:
 <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>
In this message, the header is broken exactly as shown above after the colon of "Message-ID:". Maybe some problem with parsing long Message-ID headers, or ones folded over multiple lines?

With regard to bug reporting on gitlab.com: I already have accounts on a ton of sites, and at some point I'd actually like to stop creating new ones all the time...

I created an issue on Gitlab for this with this thread:

https://gitlab.com/mailman/hyperkitty/issues/216

...

Marvin

-- Blog: https://mg.guelker.eu PGP/GPG ID: F1D8799FBCC8BC4F

Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/

-- thanks, Abhilash Raj (maxking)

tlhackque

10:22 p.m.

On 11-Feb-19 17:08, Marvin Gülker wrote:

...

Hi,

On 11 February 2019 at 13:20 -0800, Mark Sapiro wrote:

...
I downloaded the February mbox from <https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/export/wlug@lists.wlug.org-2019-02.mbox.gz?start=2019-02-01&end=2019-03-01> and imported it to a test list with 'django-admin hyperkitty_import' and see the same issue.

Here's the Hyperkitty Mbox for my affected mailinglist: <https://lists.secretchronicles.org/hyperkitty/list/tsc-devel@lists.secretchronicles.org/export/tsc-devel@lists.secretchronicles.org-2018-09.mbox.gz?start=2018-09-01&end=2018-10-01>

Maildir from my local machine as received from Mailman via e-mail: <https://files.guelker.eu/misc/tsc-devel.tar.gz>

...
Examination of the mbox shows for the first message [...] Note the folding of the header and the double angle brackets. I can confirm that I see this behaviour for my affected mailinglist as well. The Message-ID header is broken over two lines with double angle brackets in the Hyperkitty MBox linked above:
Message-ID: <
 <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>>
However, the original message as it was delivered by Mailman to me does not have such a weird message-id header:
Message-ID:
 <153582973006.27514.7508703206376217479@alexandria.secretchronicles.org>
In this message, the header is broken exactly as shown above after the colon of "Message-ID:". Maybe some problem with parsing long Message-ID headers, or ones folded over multiple lines?

With regard to bug reporting on gitlab.com: I already have accounts on a ton of sites, and at some point I'd actually like to stop creating new ones all the time...

Marvin

The double angle brackets are a problem.

The folding is legal. Any whitespace in a header (not inside tokens, such as quoted strings, of course) can be replaced by <CR><LF><one whitespace character>. So

Message-ID: <glorp>

is identical to

Message-ID:
 <glorp>

(See RFC2822 3.2.3 for a more precise explanation)

A message-ID within <<>> is not valid; whatever generated it has a bug.

RFC2822 grammar

message-id = "Message-ID:" msg-id CRLF

in-reply-to = "In-Reply-To:" 1*msg-id CRLF

references = "References:" 1*msg-id CRLF

msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]

id-left = dot-atom-text / no-fold-quote / obs-id-left

id-right = dot-atom-text / no-fold-literal / obs-id-right

no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE

no-fold-literal = "[" *(dtext / quoted-pair) "]"

specials = "(" / ")" / "<" / ">" / "[" / "]" / ":" / ";" / "@" / "\" / "," / "." / DQUOTE ; Special characters used in other parts of the syntax

atext = ALPHA / DIGIT / "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "/" / "=" / "?" / "^" / "_" / "`" / "{" / "|" / "}" / "~" ; Any character except controls, SP, and specials. Used for atoms

atom = [CFWS] 1*atext [CFWS]

dot-atom = [CFWS] dot-atom-text [CFWS]

dot-atom-text = 1*atext *("." 1*atext)
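The grammar above can be turned into a rough validity check. The sketch below is a simplification, not a full RFC 2822 parser: it ignores the obsolete forms (obs-id-left, obs-id-right) and comments in CFWS, approximates qtext and dtext as "anything except whitespace, backslash and the delimiter", and the function name is hypothetical.

```python
import re

# atext, taken from the grammar quoted above.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM_TEXT = rf"{ATEXT}+(?:\.{ATEXT}+)*"
# Simplified no-fold-quote and no-fold-literal (assumption: no whitespace inside).
NO_FOLD_QUOTE = r'"(?:[^"\\\s]|\\.)*"'
NO_FOLD_LITERAL = r"\[(?:[^\[\]\\\s]|\\.)*\]"

MSG_ID = re.compile(
    rf"\s*<(?:{DOT_ATOM_TEXT}|{NO_FOLD_QUOTE})"
    rf"@(?:{DOT_ATOM_TEXT}|{NO_FOLD_LITERAL})>\s*"
)


def is_valid_msg_id(field_body):
    """Unfold the field body and check it against the simplified msg-id rule."""
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field_body)
    return MSG_ID.fullmatch(unfolded) is not None
```

Under this check a folded but otherwise well-formed Message-ID passes, while a value wrapped in doubled angle brackets, or one with no angle brackets at all, is rejected.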

Mark Sapiro

2:06 a.m.

On 2/11/19 9:47 AM, Danil Smirnov wrote:

...

What could be a reason of HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one...

Any ideas how to improve HyperKitty ability to detect threaded messages?

Thanks for the report.

The MR at <https://gitlab.com/mailman/hyperkitty/merge_requests/125> will fix this.

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Danil Smirnov

7:51 p.m.

Thank you very much for the quick fix, Mark and Abhilash!

Abhilash, could you include the fix into the "latest" version of mailman-web container please?

Btw I wasn't able to find a changelog of Mailman docker images anywhere and 'Releases' page in Github is quite outdated too... Is there any other place I should check?

Danil

Tue, 12 Feb 2019, 4:06 Mark Sapiro <mark@msapiro.net>:

...

On 2/11/19 9:47 AM, Danil Smirnov wrote:

...
What could be a reason of HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one...

Any ideas how to improve HyperKitty ability to detect threaded messages?

Thanks for the report.

The MR at <https://gitlab.com/mailman/hyperkitty/merge_requests/125> will fix this.

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

Abhilash Raj

4:46 p.m.

On Thu, Feb 14, 2019, at 11:51 AM, Danil Smirnov wrote:

...

Thank you very much for the quick fix, Mark and Abhilash!

Abhilash, could you include the fix into the "latest" version of mailman-web container please?

They are nightly builds, if the fix was merged and the CI went through, they should already be in there.

...

Btw I wasn't able to find a changelog of Mailman docker images anywhere and 'Releases' page in Github is quite outdated too... Is there any other place I should check?

I have admittedly been too busy to write release notes, but the recent releases have not seen many changes other than version bumps for the included Mailman packages.

...

Danil

Tue, 12 Feb 2019, 4:06 Mark Sapiro <mark@msapiro.net>:

...
On 2/11/19 9:47 AM, Danil Smirnov wrote:

...
What could be a reason of HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one...

Any ideas how to improve HyperKitty ability to detect threaded messages?

Thanks for the report.

The MR at <https://gitlab.com/mailman/hyperkitty/merge_requests/125> will fix this.

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

-- thanks, Abhilash Raj (maxking)

Danil Smirnov

10:13 a.m.

Hi guys,

I've updated my containers (though I can't confirm for sure that the fix is in there; I do see that the 'latest' tag of 'mailman-web' was rebuilt last night).

Could you advise how to rebuild the failing threads please? I ran an index update with the 'django-admin update_index' command, but it seems irrelevant to the issue; I still have all those messages separated: https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/2019/2/

Thank you very much!

Danil

Sat, 16 Feb 2019 at 18:46, Abhilash Raj <maxking@asynchronous.in>:

...

On Thu, Feb 14, 2019, at 11:51 AM, Danil Smirnov wrote:

...
Thank you very much for the quick fix, Mark and Abhilash!

Abhilash, could you include the fix into the "latest" version of mailman-web container please?

They are nightly builds, if the fix was merged and the CI went through, they should already be in there.

...
Btw I wasn't able to find a changelog of Mailman docker images anywhere and 'Releases' page in Github is quite outdated too... Is there any other place I should check?

I have admittedly been too busy to write release notes, but the recent releases have seen not too much changes other than version bumps for the included Mailman packages.

...
Danil

Tue, 12 Feb 2019, 4:06 Mark Sapiro <mark@msapiro.net>:

...
On 2/11/19 9:47 AM, Danil Smirnov wrote:

...
What could be a reason of HyperKitty failing to create a thread from chained emails?

For example, see "RECENTLY ACTIVE DISCUSSIONS" on page https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/

"Openvpn and Network mapping" discussion displayed as five threads instead of only one...

Any ideas how to improve HyperKitty ability to detect threaded messages?

...
...
Thanks for the report.

The MR at <https://gitlab.com/mailman/hyperkitty/merge_requests/125> will fix this.

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

-- thanks, Abhilash Raj (maxking)

Mark Sapiro

3:41 p.m.

On 2/17/19 2:13 AM, Danil Smirnov wrote:

...

Could you advise how to rebuild the failing threads please? I've done index update with 'django-admin update_index' command but it seems irrelevant to the issue, I still have all those messages separated: https://wlug.mailman3.com/hyperkitty/list/wlug@lists.wlug.org/2019/2/

I've not tried it and have no idea if it will work, but the query

select message_id, message_id_hash, in_reply_to from hyperkitty_email;

will show those things. You will see some (most) entries have a message_id of the form 'left-part@right-part' with no angle brackets, but some are of form ' <left-part@right-part>'.

In those cases where 'left-part@right-part' of in_reply_to is in a message_id of form ' <left-part@right-part>', you could try altering that in_reply_to to ' <left-part@right-part>'

After doing this, you probably need to run 'django-admin runjob thread_order_depth'.

Obviously, make backups before you start.

An alternative might be to 'fix' the message_id which may or may not require recomputing message_id_hash.
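The in_reply_to repair suggested above could be scripted along these lines. This is only a sketch under several assumptions: it uses the table and column names from the query above, it talks to the database through Python's `sqlite3` module purely for illustration (a real installation may use a different database and driver), `repair_in_reply_to` is a hypothetical helper, and like the suggestion itself it has not been tried against a real HyperKitty database. Make backups first.

```python
import sqlite3


def repair_in_reply_to(conn):
    """Point bare in_reply_to values at the munged ' <left@right>' message_ids.

    Returns the number of rows whose in_reply_to was changed.
    """
    cur = conn.cursor()
    # Find message_ids stored as ' <left-part@right-part>' (leading space, brackets).
    cur.execute(
        "SELECT message_id FROM hyperkitty_email WHERE message_id LIKE ?",
        (" <%>",),
    )
    munged = {row[0][2:-1]: row[0] for row in cur.fetchall()}

    updated = 0
    for bare, stored in munged.items():
        # Replies refer to the bare 'left-part@right-part' form; rewrite them
        # to the form that is actually stored as the parent's message_id.
        cur.execute(
            "UPDATE hyperkitty_email SET in_reply_to = ? WHERE in_reply_to = ?",
            (stored, bare),
        )
        updated += cur.rowcount
    conn.commit()
    return updated
```

As noted above, 'django-admin runjob thread_order_depth' would probably still need to be run afterwards.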

-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

participants (6)

Abhilash Raj
Danil Smirnov
Mark Sapiro
Marvin Gülker
Stephen J. Turnbull
tlhackque
