I believe that for threading to work more or less reliably, the thing to do is to look at the 'References' header.
That should give you the thread in order, and allow any message received out of sequence to be put in the proper location in your display.
This is more difficult than it seems, since References is not defined for mail as far as I know (it's a netnews concept originally, and RFC 5537 is a netnews-specific RFC). Although it is adopted by most mail clients, there's no guarantee of strict conformance. In mail, In-Reply-To is more reliably present, but of course asynchronicity means you have no guarantee of a complete set of all links. Even if all clients conform to the netnews RFC, you still need to create the full tree (full conformance to RFC 5537 means it will be a tree), and break ties between branches in some arbitrary way to create a total order. It's worse if you don't have conformance from all clients: you can get a DAG or even something that isn't even a DAG. So, even today, you can't assume a well-behaved ancestry graph.
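To make the "build the tree, then break ties" step concrete, here is a rough sketch. It assumes each message has already been reduced to a hypothetical (msg_id, parent_id, sort_key) triple, where sort_key is whatever you choose for tie-breaking (a parsed Date, say). The function name and the depth-first ordering are my own choices for illustration, not anything an RFC prescribes. Messages whose parent hasn't arrived are treated as roots, and anything caught in a cycle is appended at the end instead of looping forever.

```python
from collections import defaultdict

def thread_order(messages):
    """Return message IDs in a depth-first total order.

    `messages` is an iterable of (msg_id, parent_id_or_None, sort_key)
    triples; this shape is an assumption made for the sketch.
    """
    msgs = list(messages)
    key = {mid: k for mid, _, k in msgs}
    children = defaultdict(list)
    roots = []
    for mid, parent, _ in msgs:
        if parent in key and parent != mid:
            children[parent].append(mid)
        else:
            # No parent, unknown parent, or self-reference: treat as a root.
            roots.append(mid)

    ordered, seen = [], set()

    def visit(start):
        stack = [start]
        while stack:
            mid = stack.pop()
            if mid in seen:
                continue
            seen.add(mid)
            ordered.append(mid)
            # Push in reverse so the sibling with the smallest key pops first.
            stack.extend(sorted(children[mid], key=key.get, reverse=True))

    for mid in sorted(roots, key=key.get):
        visit(mid)
    # Anything still unvisited sits on a cycle (malformed ancestry).
    for mid in sorted(key, key=key.get):
        if mid not in seen:
            visit(mid)
    return ordered
```

Sorting siblings by sort_key is exactly the "arbitrary" tie-break mentioned above; any other consistent key would do as well.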
'References' should be a superset of In-Reply-To (which is at most 1 Message-ID), so you only need In-Reply-To if there is no References - or to handle a client that doesn't obey the RFCs.
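A minimal sketch of that fallback, assuming the headers have already been unfolded into a plain dict (a hypothetical shape picked for this example, as is the function name). It takes the last ID in References as the direct parent and only consults In-Reply-To when References yields nothing.

```python
import re

# Anything of the form <...> with no whitespace or angle brackets inside.
MSGID_RE = re.compile(r"<[^<>\s]+>")

def parent_id(headers):
    """Return the Message-ID of the direct parent, or None.

    `headers` maps header names to unfolded field bodies (an assumption
    of this sketch, not a real library's data structure).
    """
    refs = MSGID_RE.findall(headers.get("References", ""))
    if refs:
        # The last entry in References is the message being replied to.
        return refs[-1]
    irt = MSGID_RE.findall(headers.get("In-Reply-To", ""))
    return irt[0] if irt else None
```

For example, parent_id({"In-Reply-To": "<c@y>"}) returns "<c@y>", and a message with neither header returns None.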
I suppose HyperKitty uses References (it works for messages that have proper Message-IDs ;-), but I don't know what algorithm it uses. Might be worth looking into, as well as considering a more Postelian parsing of Message-IDs. Specifically, take the field body, unfold it, strip leading and trailing whitespace and leading "<" and trailing ">", and whatever's left is the message ID.
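The Postelian parse described above might look something like this. It is only a sketch: the function name is made up, and "unfold" is approximated by dropping CR and LF characters, which is looser than the formal unfolding rule.

```python
def lenient_message_id(field_body):
    """Extract a Message-ID from a raw field body, tolerating junk."""
    # Unfold: drop the line breaks left over from folded header lines.
    unfolded = field_body.replace("\r", "").replace("\n", "")
    stripped = unfolded.strip()
    # Remove at most one leading "<" and one trailing ">".
    if stripped.startswith("<"):
        stripped = stripped[1:]
    if stripped.endswith(">"):
        stripped = stripped[:-1]
    return stripped
```

Whatever is left is taken as the ID, so a field body such as "<weird id@host>" yields "weird id@host" instead of being rejected.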
Alternatively, strip everything that's not atext or "@" (including inside the purported Message-ID). This won't break any RFC 5537-valid Message-IDs, but might identify two different, nonconforming Message-IDs as the same (too bad if they can't take a joke!), or identify a nonconforming message with a conforming one (<sad_emoji />).
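A sketch of the stripping approach, with one loudly flagged assumption: besides atext and "@", it also keeps ".", since dot-atom Message-IDs contain dots and dropping them would merge IDs like a.b@host and ab@host. That goes slightly beyond the literal wording above. The function name is hypothetical.

```python
import re

# Everything that is NOT atext, "@", or "." (keeping "." is this
# sketch's assumption, see the lead-in).
_NOT_KEPT = re.compile(r"[^A-Za-z0-9!#$%&'*+\-/=?^_`{|}~.@]")

def squeeze_message_id(field_body):
    """Drop every character outside the kept set, brackets included."""
    return _NOT_KEPT.sub("", field_body)
```

Note the collision risk mentioned above is real: "<abc def@example.com>" and "<abcdef@example.com>" both squeeze to the same string.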