On 6/21/19 12:57 AM, andrew.bernard@gmail.com wrote:
A curious problem, but seems to be well defined. I am importing mbox format files with 25 years' worth of archives in 250 files from a Listserv list into hyperkitty. All is going swimmingly, except I get lots of partial emails with sender of None put into the current month, with text that has clearly been truncated. As I look at each one, I see that the line in the imported text is immediately after a line starting with 'From'. Clearly, 'From:' is an RFC 822 header, but 'From ' is not, and yet there is much evidence here that this is messing up the import.
Is this a known bug? If not, how do I report it?
As other responses in this thread have pointed out, this is a well-known issue with the mbox format: lines beginning with 'From ' are message separators, so an unescaped 'From ' line in a message body is treated as the start of a new message.
There is a (Python 2) script at <https://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/bin/cleanarch> that will process an mbox and prefix with '>' all lines beginning with 'From ' that don't look like real 'From ' lines or which aren't immediately followed by a line that looks like a valid header line.
It isn't perfect because it won't handle a message that contains in its body a copy of another message containing a Unix From_ line, but it can help with most unescaped From_ lines.
If you want to use that script, replace lines 55 and 56
import paths
from Mailman.i18n import C_
with
def C_(s): return s
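For anyone who wants to see the idea behind the escaping rule described above, here is a minimal Python 3 sketch. It is not the cleanarch code: the two regular expressions and the function name escape_from_lines are my own assumptions about what "looks like a real 'From ' line" and "looks like a valid header line" might mean, and the real script may well use different tests.

```python
import re

# Assumed shape of a genuine separator: "From addr Www Mmm dd hh:mm:ss yyyy".
# This pattern is illustrative only, not copied from cleanarch.
FROM_LINE = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
)
# Assumed shape of a header line: printable non-space, non-colon
# characters followed by a colon (e.g. "Subject:").
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def escape_from_lines(lines):
    """Return a copy of lines with suspicious 'From ' lines prefixed by '>'."""
    out = []
    for i, line in enumerate(lines):
        if line.startswith("From "):
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            # Keep the line only if it looks like a separator AND the
            # following line looks like a header; otherwise escape it.
            genuine = FROM_LINE.match(line) and HEADER_LINE.match(next_line)
            if not genuine:
                line = ">" + line
        out.append(line)
    return out
```

Something like escape_from_lines(open("list.mbox").readlines()) would give the escaped lines to write back out, where list.mbox is a placeholder file name. As with the real script, a quoted message that carries its own From_ line followed by a header will still slip through.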
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan