On Jun 18, 2020, at 4:08 PM, Mark Sapiro <mark@msapiro.net> wrote:
On 6/18/20 3:33 PM, Mark Dadgar wrote:
Any thoughts on how to salvage ~20 years of list archives?
The archives shouldn't be an issue. I don't think they are involved in the corruption, but anyway, hopefully you still have the archive mbox you imported. Then for the messages since then, you should be able to get a mbox from hyperkitty with something like
with the obvious substitutions and appropriate start and end dates.
So in trying to download the recent archives ("export/trackjunkies-new.mbox.gz?start=2019-10-10&end=2020-06-28"), I end up with an 8.5MB file with one day’s worth of posts in it (2019-10-10).
I see these errors in the mailman-web.log:
Traceback (most recent call last):
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2060, in get_msg_id
    token, value = get_dot_atom_text(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1325, in get_dot_atom_text
    raise errors.HeaderParseError("expected atom at a start of "
email.errors.HeaderParseError: expected atom at a start of dot-atom-text but found ' <BYAPR02MB5976132151FF19510269C44CB3940@BYAPR02MB5976.namprd02.prod.outlook.com>>'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/hyperkitty/views/mlist.py", line 332, in stream_mbox
    msg = email.as_message()
  File "/usr/lib/python3/dist-packages/hyperkitty/models/email.py", line 175, in as_message
    msg["Message-ID"] = "<%s>" % self.message_id
  File "/usr/lib/python3.8/email/message.py", line 409, in __setitem__
    self._headers.append(self.policy.header_store_parse(name, val))
  File "/usr/lib/python3.8/email/policy.py", line 148, in header_store_parse
    return (name, self.header_factory(name, value))
  File "/usr/lib/python3.8/email/headerregistry.py", line 602, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.8/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.8/email/headerregistry.py", line 530, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2117, in parse_message_id
    token, value = get_msg_id(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 2064, in get_msg_id
    token, value = get_obs_local_part(value)
  File "/usr/lib/python3.8/email/_header_value_parser.py", line 1509, in get_obs_local_part
    obs_local_part[1].token_type=='dot'):
IndexError: list index out of range
[pid: 1359|app: 0|req: 12933/12933] 76.102.110.193 () {60 vars in 1504 bytes} [Sun Jun 28 19:52:57 2020] GET /mailman3/hyperkitty/list/trackjunkies@pdc-racing.net/export/trackjunkies-new.mbox.gz?start=2019-10-10&end=2020-06-28 => generated 5572382 bytes in 4271 msecs (HTTP/1.1 200) 5 headers in 203 bytes (377 switches on core 1)
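Reading the two tracebacks together, the stored message_id for that post appears to already carry a leading space and its own angle brackets, so the "<%s>" % self.message_id line in as_message() would produce a doubly bracketed value that the header parser rejects. A minimal sketch of cleaning such a value before it is wrapped is below. The helper names normalize_message_id and format_message_id_header are hypothetical and not part of HyperKitty, and this is only an illustration of the idea, not a tested patch:

```python
def normalize_message_id(raw):
    """Return the bare id with surrounding whitespace and angle brackets removed.

    Hypothetical helper, not a HyperKitty function.
    """
    value = raw.strip()
    # Peel off any number of wrapping bracket layers, e.g. " <id>" or "<<id>>".
    while value.startswith("<") and value.endswith(">"):
        value = value[1:-1].strip()
    return value


def format_message_id_header(raw):
    """Wrap the cleaned id in exactly one pair of angle brackets."""
    return "<%s>" % normalize_message_id(raw)


# The value below is inferred from the error message in the log above.
stored = " <BYAPR02MB5976132151FF19510269C44CB3940@BYAPR02MB5976.namprd02.prod.outlook.com>"
print(format_message_id_header(stored))
```

Whether the cleanup belongs in the export code or in a one-time fix of the stored message_id values is a separate question; the sketch only shows the normalization itself.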
Thoughts? It would be nice to salvage the last 8 months of archives.
- Mark
mark@pdc-racing.net | 408-348-2878