Hyperkitty search on Message-ID?
I've been working with a rather large import and am finding that I can't get a search on a Message-ID to return anything. I can find the message by querying the DB and then look at it, but this is less than ideal and requires DB access.
mailmanweb=> select message_id_hash from hyperkitty_email where message_id = '19991231211903.C2870@abq-mail-01.ihighway.net';
 message_id_hash
IWKEJSPTJODEHARQ2VI6BEFQKNEJYM22
Any ideas how to get it indexing the ID so a layperson can search on it?
-- Bryan Fields
727-409-1194 - Voice http://bryanfields.net
On 12/5/24 04:25, Bryan Fields wrote:
I've been working with a rather large import and am finding that I can't seem to get the search on a Message-ID: to return anything. ... Any ideas how to get it indexing the ID so a layperson can search on it?
Message-ID is not one of the fields indexed. These are defined at https://gitlab.com/mailman/hyperkitty/-/blob/master/hyperkitty/search_indexe...
A bit obscure for laypersons, but one can compute the message_id_hash from the Message-ID:
from base64 import b32encode
from hashlib import sha1
def get_message_id_hash(msg_id):
    return b32encode(sha1(msg_id).digest()).decode()
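A slightly more defensive sketch of the same computation, assuming the hash is taken over the UTF-8 bytes of the Message-ID with any surrounding angle brackets removed (the value stored in hyperkitty_email.message_id in the query above has no brackets). The message_url() helper and its archive_base default are hypothetical: the /list/<list>/message/<hash>/ layout matches common HyperKitty deployments, but the URL prefix on your own site may differ.

```python
from base64 import b32encode
from hashlib import sha1


def get_message_id_hash(msg_id: str) -> str:
    """Return the base32-encoded SHA-1 of a Message-ID.

    Assumption: surrounding whitespace and one pair of enclosing angle
    brackets are not part of the hashed value, so '<abc@example.com>'
    and 'abc@example.com' give the same hash.
    """
    msg_id = msg_id.strip()
    if msg_id.startswith("<") and msg_id.endswith(">"):
        msg_id = msg_id[1:-1]
    # sha1() needs bytes, so encode the str before hashing.
    return b32encode(sha1(msg_id.encode("utf-8")).digest()).decode("ascii")


def message_url(list_address: str, msg_id: str,
                archive_base: str = "https://lists.example.org/archives") -> str:
    """Hypothetical helper: build a HyperKitty message permalink.

    archive_base is a placeholder; adjust it to your installation.
    """
    return f"{archive_base}/list/{list_address}/message/{get_message_id_hash(msg_id)}/"


if __name__ == "__main__":
    print(message_url("list@example.org",
                      "<19991231211903.C2870@abq-mail-01.ihighway.net>"))
```

A SHA-1 digest is 20 bytes, so the base32 result is always 32 characters from A-Z and 2-7 with no padding, which is a quick sanity check against values already in the database.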
I can't help but wonder, though, how one has the Message-ID without having the message. I'm wondering in general how often, if ever, the layperson would be searching the archive for a Message-ID.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On 12/5/24 2:02 PM, Mark Sapiro wrote:
Message-ID is not one of the fields indexed. These are defined at https://gitlab.com/mailman/hyperkitty/-/blob/master/hyperkitty/search_indexe...
Looks like it would be trivial to add there. Think anyone would be opposed to adding support for this?
A bit obscure for laypersons, but one can compute the message_id_hash from the Message-ID
from base64 import b32encode
from hashlib import sha1
def get_message_id_hash(msg_id):
    return b32encode(sha1(msg_id).digest()).decode()
This is great to know. I assume the thread ID is generated the same way?
I can't help but wonder though how one has the Message-ID without having the message. I'm wondering in general how often if ever the layperson would be searching the archive for a Message-ID.
Well, I've been working with an import of 34 years of mail archives (200k messages or so), and we have several people spot-checking what we have and fixing threading based on their own archives. Since they already have the message, it would be easy for them to search for its Message-ID. The older archives predate RFC 2822 by a decade or so, and In-Reply-To is missing or just wrong on quite a few of them. From what I can see, the list was migrated from LISTSERV to Majordomo in 1993, then to Mailman 2 in 2008, and what I have for an archive came from the mail-to-NNTP spool, at least for the pre-1993 list.
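For spot checkers who already hold the message, a small helper could pull the Message-ID out of a saved raw copy and print its hash. This is a sketch under assumptions: the checker can save the message as a plain RFC 822 text file (for example an .eml export), the hash strips enclosing angle brackets as described earlier in the thread, and the file name is passed as the only argument. The lenient compat32 parsing policy is chosen because very old messages often have malformed headers.

```python
#!/usr/bin/env python3
"""Print a message's Message-ID and its HyperKitty-style hash (sketch)."""
import sys
from base64 import b32encode
from email import message_from_bytes, policy
from hashlib import sha1
from typing import Optional


def hash_message_id(msg_id: str) -> str:
    # Assumption: whitespace and enclosing angle brackets are not hashed.
    msg_id = msg_id.strip()
    if msg_id.startswith("<") and msg_id.endswith(">"):
        msg_id = msg_id[1:-1]
    return b32encode(sha1(msg_id.encode("utf-8")).digest()).decode("ascii")


def message_id_from_bytes(raw: bytes) -> Optional[str]:
    # compat32 is the lenient legacy policy; it copes with old, sloppy headers.
    msg = message_from_bytes(raw, policy=policy.compat32)
    value = msg.get("Message-ID")
    return None if value is None else str(value).strip()


def main(path: str) -> int:
    with open(path, "rb") as fp:
        msg_id = message_id_from_bytes(fp.read())
    if msg_id is None:
        print(f"{path}: no Message-ID header found", file=sys.stderr)
        return 1
    print(msg_id, hash_message_id(msg_id))
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} MESSAGE_FILE", file=sys.stderr)
    else:
        sys.exit(main(sys.argv[1]))
```

Messages that lack a Message-ID header entirely are reported rather than hashed, since any hash would then depend on whatever ID the import tool invented.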
Thanks,
Bryan Fields
727-409-1194 - Voice http://bryanfields.net
On 12/5/24 14:26, Bryan Fields wrote:
On 12/5/24 2:02 PM, Mark Sapiro wrote:
A bit obscure for laypersons, but one can compute the message_id_hash from the Message-ID
from base64 import b32encode
from hashlib import sha1
def get_message_id_hash(msg_id):
    return b32encode(sha1(msg_id).digest()).decode()
Actually, that should be
return b32encode(sha1(msg_id.encode()).digest()).decode()
as sha1 wants bytes. I actually made a very simple command-line converter for myself:
#!/usr/bin/env python3
# Print the message_id_hash of the Message-ID given as the first argument.
import sys
from base64 import b32encode
from hashlib import sha1

print(b32encode(sha1(sys.argv[1].encode()).digest()).decode())
This is great to know. I assume the thread ID is generated the same way?
The thread ID is the message_id_hash of the first message in the thread, so yes.
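A toy illustration of that relationship, which may help when re-threading by hand: follow reply links back to the earliest message and hash its Message-ID. The parents mapping is a hypothetical in-memory stand-in for In-Reply-To data; HyperKitty itself keeps threads in its database, so this only shows how a thread ID relates to its root message, not how HyperKitty assigns threads.

```python
from base64 import b32encode
from hashlib import sha1
from typing import Dict, Optional


def message_id_hash(msg_id: str) -> str:
    # base32(SHA-1(Message-ID)), assuming no enclosing angle brackets.
    return b32encode(sha1(msg_id.encode("utf-8")).digest()).decode("ascii")


def thread_id(msg_id: str, parents: Dict[str, Optional[str]]) -> str:
    """Return the hash of the first message in msg_id's thread.

    parents maps each Message-ID to the Message-ID it replies to, or None
    for a thread starter. A reply whose parent is not in the map is treated
    as starting its own thread; a reply loop stops instead of spinning.
    """
    seen = set()
    current = msg_id
    while True:
        parent = parents.get(current)
        if parent is None or parent not in parents or parent in seen:
            break
        seen.add(current)
        current = parent
    return message_id_hash(current)


parents = {
    "root@example.com": None,
    "reply1@example.com": "root@example.com",
    "reply2@example.com": "reply1@example.com",
}
print(thread_id("reply2@example.com", parents) == message_id_hash("root@example.com"))
```

Fixing a broken In-Reply-To in this model just means changing one entry in parents, after which every message below it reports the new root's hash as its thread ID.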
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (2)
- Bryan Fields
- Mark Sapiro