Notes inline.
On 12-Feb-19 11:09, Stephen J. Turnbull wrote:
tlhackque writes:
I believe that for threading to work more or less reliably, the thing to do is to look at the 'References' header.
That should give you the thread in order, and allow any message received out of sequence to be put in the proper location in your display.
This is more difficult than it seems, since References is not defined for mail as far as I know (it's a netnews concept originally, and RFC 5337 is a netnews-specific RFC).
Not sure why you believe this. RFC2822 3.6.4 defines References for e-mail.
See https://tools.ietf.org/html/rfc2822#page-25
As I wrote in later post, the message-ID syntactically requires the <>. (just one set).
msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
Look at that for the full syntax and reference.
Although it is adopted by most mail clients, there's no guarantee of strict conformance.
I agree that conformance has traditionally been hit-or-miss with almost everything about e-mail. Sigh.
I also agree that life is hard. There have been clients that use the same message-id for all messages, and it's a long-standing problem. E.g. See https://cr.yp.to/immhf/thread.html
Not only do you not have a guarantee of ever getting all links, you don't have a guarantee of the order in which they arrive. So one can have gaps ( parent, reply2-delayed, reply 3 references reply 2 and parent, reply 2 might arrive - or note.) You also can have a reply that references 2 or more branches. E.g.
/--R0
p/--r1
\ --r2 /
And a reply (r3) comes that references r1 AND r2 (and the parent) - which you probably want to show as a descendant of both r1 AND r2. R4 replies to r2 - but that's also an implicit response to r1.
Once you's sorted that out, r5 comes along that references R0, r4, and P.
2822 does say:
Therefore, trying to form a
"References:" field for a reply that has multiple parents is
discouraged and how to do so is not defined in this document.
But programmers are rarely discouraged - with GUIs, it's pretty easy to intuit that one might like to check the boxes on several branches of a thread and respond "Of course, you're all right - see my cat photo". :-)
5537 tries to make life easier - until you get to:
If the resulting References header field would, after unfolding,
exceed 998 characters in length (including its field name but not the
final CRLF), it MUST be trimmed (and otherwise MAY be trimmed).
Trimming means removing any number of message identifiers from its
content, except that the first message identifier and the last two
MUST NOT be removed.
So, even if you get all the messages, you don't necessarily get all the references that you need.
I'm sure glad that I'm not working on Hyperkitty! But I'm not holding my breath for when MM3 is complete enough to adopt in my environment.
Yes, "In-Reply-To" is more often present (likely because it's modestly simpler for RFC-readers to understand - it's one thing, not a list.) It's less expressive (hence, informative.) One can try to use it as a fallback when no References is present (and that's what 2822 implies).
The whole undertaking in the real world is an example of the halting problem. Or Parkinson's Law. The best one can do is to come up with a "good enough" approximation in the available time, and tinker with/improve it until bored.
It's likely to be a bunch of heuristics. And if you're not careful, throw in the Peter principle :-(
In mail, In-Reply-To is more reliably present, but of course asynchronicity means you have no guarantee of a complete set of all links. Even if all clients conform to the netnews RFC, you still need to create the full tree (full conformance to RFC 5537 means it will be a tree), and break ties between branches in some arbitrary way to create a total order. It's worse if you *don't* have conformance from all clients: you can get a DAG or even something that isn't even a DAG. So, even today, you can't assume a well-behaved ancestry graph.
'References' should be a superset of Reply-To (which is at most 1 Message-ID), so you only need Reply-To if there is no References - or to handle a client that doesn't obey the RFCs.
I suppose HyperKitty uses References (it works for messages that have proper Message-IDs ;-), but I don't know what algorithm it uses. Might be worth looking into, as well as considering a more Postelian parsing of Message-IDs. Specifically, take the field body, unfold it, strip leading and trailing whitespace and leading "<" and trailing ">", and whatever's left is the message ID.
Alternatively, strip everything that's not atext or "@" (including inside the purported Message-ID).
See my other post. The left and right halves can, besides being atoms, be non-folding quotes or literals. So you have to handle that. Although I have recently seen e-mail clients that are confused when you do it. (Ran into that generating content-id for multipart/related. Yes, MIME makes plain e-mail look straightforward.)
Although it has a defined syntax, the semantics are that message-id is an opaque globally-unique identifier. Attempting to parse it as anything but '<[^>]+>' is likely to be a mistake. (On receipt; generating is the other side of Postel-correctness... I commend RFC 2468 to anyone who hasn't read it.)
I doubt 'striping' is worth doing - when an message-id is present, I've almost always seen it include the required <>.
Until yesterday's hyperkitty bug, I've never seen <<>>. I've rarely seen the <> missing - but I'd consider that a warning that the message format is suspect. As in this case.
My suggestion is that if you can't find a message-ID, keep the message in an "unthreaded" bucket. If you sort that by subject (omitting "[listtag]" and "Re:*" (case insensitive, and in multiple languages) then by date, you probably have a useful presentation. And a list of MUAs that need bug reports :-)
Yes, althought 2822 says that "Re:" is the proper introducer in the subject of a reply, RE: is common, and I've seen it (incorrectly) translated in the headers.)
As I said, "life is hard".
This won't break any RFC 5537-valid Message-IDs, but might identify two different, nonconforming Message-IDs as the same (too bad if they can't take a joke!), or identify a nonconforming message with a conforming one (<sad_emoji />.
Thoughts?
Steve
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/