Johannes Rohr writes:
IndexError: list index out of range
That isn't the problem, it's the earlier failure, I think:
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2069, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1334, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found '@localhost>'
It appears that HyperKitty is trying to find a domain name (the common use for "dot_atom_text" in mail headers in a Message-ID), but is finding "@localhost>" instead.
First I would try the check_hk_import script which is provided with HyperKitty. (You may also want to use the Mailman 2.1 cleanarch script to check for unescaped 'From ' lines, or the script at https://www.msapiro.net/scripts/cleanarch2 which can do that and check for unparseable Date: headers as well.)
If that fails to identify the problem, try
grep -iE '^(Message-ID|In-Reply-To).*@localhost>' problem_mbox_file | wc -l
to see how often that string appears, then
grep -iE '^(Message-ID|In-Reply-To).*@localhost>' problem_mbox_file | head -15
to see if you can identify the problem text.[1] The problem character before the "@" is probably "<", "@", or ".", but it may be one of these: ()<>[]:;@\,." (note that the double quote is itself a disallowed character). Other ASCII punctuation characters are allowed in message IDs.
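If eyeballing the grep output gets tedious, the same check can be roughed out in Python. This is only a sketch under my own assumptions: the names diagnose_id_left and scan_line are made up for illustration, the list of disallowed characters is the RFC 5322 "specials" set mentioned above, and like grep it looks at raw lines, so it cannot tell a real header from a body line that happens to look like one.

```python
import re

# RFC 5322 "specials", none of which may appear in the left part of a msg-id.
# "." is legal only between atoms, so it is checked separately.
SPECIALS = set('()<>[]:;@\\,"')
HEADER_RE = re.compile(r'^(Message-ID|In-Reply-To):\s*(.*)$', re.IGNORECASE)


def diagnose_id_left(id_left):
    """Return a list of reasons why id_left is not valid dot-atom-text."""
    problems = []
    if not id_left:
        problems.append("empty")
    if id_left.startswith(".") or id_left.endswith(".") or ".." in id_left:
        problems.append("misplaced dot")
    bad = sorted(set(id_left) & SPECIALS)
    if bad:
        problems.append("specials: " + "".join(bad))
    return problems


def scan_line(line, suffix="@localhost>"):
    """Return (token, problems) pairs for ids on this line ending in suffix."""
    match = HEADER_RE.match(line)
    if not match:
        return []
    results = []
    for token in match.group(2).split():
        if token.endswith(suffix):
            head = token[:-len(suffix)]
            # Drop the single "<" that legitimately opens a msg-id.
            id_left = head[1:] if head.startswith("<") else head
            results.append((token, diagnose_id_left(id_left)))
    return results


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python3 check_ids.py problem_mbox_file
    with open(sys.argv[1], encoding="latin-1") as mbox:
        for number, line in enumerate(mbox, 1):
            for token, problems in scan_line(line):
                print(number, token, "; ".join(problems) or "looks ok")
```

An id reported as "looks ok" is not necessarily valid; it only means that none of the characters listed above turned up before the "@".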
Once we know how the message id(s) is (are) malformed, we can discuss how to deal with them.
Steve
Footnotes: [1] In theory we should also check References but those usually have multiple continuations, which is annoying to deal with in grep.
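If the References headers do need checking, the continuation lines can be joined up before matching. A minimal sketch, assuming continuation lines start with a space or tab as usual; unfold_headers and references_matching are names I made up, and as with grep nothing here separates headers from message bodies.

```python
def unfold_headers(lines):
    """Yield logical lines, joining continuation lines onto the line before."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if current is not None and line[:1] in (" ", "\t"):
            # Folded header: append the continuation to the pending line.
            current += " " + line.strip()
            continue
        if current is not None:
            yield current
        current = line
    if current is not None:
        yield current


def references_matching(lines, needle="@localhost>"):
    """Return unfolded References headers that contain needle."""
    return [
        header
        for header in unfold_headers(lines)
        if header.lower().startswith("references:") and needle in header
    ]
```

Passing an open file object (for example one opened with encoding="latin-1") as lines should work, since the function only iterates over it line by line.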