On 6/28/20 12:55 PM, Mark Dadgar wrote:
So in trying to download the recent archives ("export/trackjunkies-new.mbox.gz?start=2019-10-10&end=2020-06-28"), I end up with an 8.5MB file with one day’s worth of posts in it (2019-10-10).
I see these errors in the mailman-web.log:
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2060, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1325, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found ' <BYAPR02MB5976132151FF19510269C44CB3940@BYAPR02MB5976.namprd02.prod.outlook.com>>'
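
The string quoted in the error contains a space and an extra pair of angle brackets, so one way to find the offending message is to scan the stored IDs for those characters. A minimal sketch, assuming the values have already been pulled into a plain list of strings (how they are fetched from the database is left out, and the sample IDs and the name find_suspect_ids are made up for illustration):

```python
import re

# Characters that should not appear in a stored ID, assuming the
# angle brackets are added only when the header is built.
SUSPECT = re.compile(r"[<>\s]")


def find_suspect_ids(message_ids):
    """Return the IDs that contain '<', '>' or whitespace."""
    return [mid for mid in message_ids if SUSPECT.search(mid)]


# Made-up sample values, not taken from the real archive.
sample = [
    "abc123@example.com",
    " <def456@example.com>",
    "ghi789@example.com>",
]

for mid in find_suspect_ids(sample):
    # repr() makes leading or trailing whitespace visible.
    print(repr(mid))
```

Printing with repr() is deliberate: a leading space is easy to miss otherwise.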
If I'm reading the above correctly, it appears that there is a message in the archive database whose message_id value is 'BYAPR02MB5976132151FF19510269C44CB3940@BYAPR02MB5976.namprd02.prod.outlook.com>'. The trailing '>' is causing HyperKitty's Email.as_message() method to try to set a value of '<BYAPR02MB5976132151FF19510269C44CB3940@BYAPR02MB5976.namprd02.prod.outlook.com>>'. Or maybe there's leading white space in the value, or both.

Try the following patch.

--- a/hyperkitty/models/email.py
+++ b/hyperkitty/models/email.py
@@ -24,7 +24,7 @@ import logging
 import os
 import re
 from email.message import EmailMessage
-from email.utils import formataddr
+from email.utils import formataddr, make_msgid
 
 from django.conf import settings
 from django.db import IntegrityError, models
@@ -175,7 +175,12 @@ class Email(models.Model):
         header_date = self.date.astimezone(tz).replace(microsecond=0)
         # Date format: http://tools.ietf.org/html/rfc5322#section-3.3
         msg["Date"] = header_date.strftime("%a, %d %b %Y %H:%M:%S %z")
-        msg["Message-ID"] = "<%s>" % self.message_id
+        try:
+            msg["Message-ID"] = "<%s>" % re.sub('[<>\s]', '',
+                                                self.message_id)
+        except:
+            msg["Message-ID"] = make_msgid()
         if self.in_reply_to:
             msg["In-Reply-To"] = unfold(self.in_reply_to)

Also note that if you got 8.5 MB for one day's messages, probably the entire range will time out and you'll have to get it in pieces.

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
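
Getting the export "in pieces" could look something like the following. This is only a sketch: it reuses the export path and the start/end parameters from the original request, the 30-day chunk size and the function names are arbitrary choices, and it assumes the end date is exclusive. If end turns out to be inclusive, messages from the boundary days would appear in two adjacent files.

```python
from datetime import date, timedelta


def date_chunks(start, end, days=30):
    """Yield (chunk_start, chunk_end) pairs that cover start..end."""
    step = timedelta(days=days)
    current = start
    while current < end:
        # The last chunk is cut short so it never runs past `end`.
        nxt = min(current + step, end)
        yield current, nxt
        current = nxt


def export_paths(list_name, start, end, days=30):
    """Build one relative export path per chunk."""
    return [
        "export/%s.mbox.gz?start=%s&end=%s"
        % (list_name, s.isoformat(), e.isoformat())
        for s, e in date_chunks(start, end, days)
    ]


paths = export_paths("trackjunkies-new", date(2019, 10, 10), date(2020, 6, 28))
for path in paths:
    print(path)
```

Each printed path can then be fetched separately and the resulting mbox files concatenated. If a single chunk still times out, a smaller value for days should help.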