I (along with some other admins) run a mailman3 (3.2.1, on Debian stable) mailing list for notmuch (https://notmuchmail.org). It's our main method of reviewing, so we get lots of mail from git-send-email. Occasionally people send it with Content-Transfer-Encoding 8bit. As far as I can tell, what happens then is that mailman3 re-encodes it as base64, taking the line endings as CRLF. git-am is unable to apply the patches that have been re-encoded.
This seems somewhat related to https://gitlab.com/mailman/mailman/-/issues/269. Like one commenter there, I would be happy to be able to turn off the re-encoding, although from playing with the Python email libraries I have the feeling that this might not be that easy. I'm writing in the hope that someone can suggest another workaround, perhaps some kind of line-ending rewriting before base64 encoding. I guess with 8bit transfer encoding there probably isn't a clear distinction between the line-endings on the wire and the intended line-endings for delivery, but it could be inferred from the "X-Mailer: git-send-email" header.
As a test, I sent the same patch through our mailman instance with Content-Transfer-Encoding base64 and that applies fine, since the original \n line endings are inside the base64 container. It's a known mis-feature of git-am that it can handle either CRLF line endings or base64 encoding, but not the combination of both. I have a local workaround involving re-formatting the patches, but I'd really like anyone to be able to apply the patches from the list archive.
Quoted parts are reordered.
Executive summary: Mail occasionally arrives at Mailman, Mailman reassembles it with a git patch whose CRLF line endings are encoded inside BASE64, and git can't apply the patch without "fixing" the line endings. I don't think this is a Mailman problem. Analysis below.
It would be very helpful to have a copy of such a mail (verbatim as delivered to Mailman's MTA would be best, but at least all Content-Type, Content-Disposition, and Content-Transfer-Encoding header fields).
David Bremner writes:
Occasionally people send it with Content-Transfer-Encoding 8bit.
I guess with 8bit transfer encoding there probably isn't a clear distinction between the line-endings on the wire, and the intended line-endings for delivery,
It's clear from RFC 2045: line endings on the wire must be CRLF.[1] What they are at either end is ambiguous because it's up to the MTAs, *and will be different on different receiving hosts if the native conventions of the hosts are different.* This means that as far as Mailman is concerned those line endings are *supposed* to be CRLF and nothing else. (It's not clear what should happen if 8bit conflicts with the MIME content type or with the content of the patch body, as discussed below, but that's the sending system's issue.)
I don't think this is a Mailman 3 problem (except possibly in the archive). The canonical [content transfer] encoding model[2] implies that since Mailman receives the message "off the wire" via LMTP (rather than as a system file), if the MIME content type is text/*, (1) the line breaks Mailman sees in the message body are always CRLF after content transfer decoding, (2) the line breaks Mailman emits to its MTA are always CRLF "inside" the BASE64, and (3) the receiving mail system (usually the MTA) is responsible for decoding the BASE64 and then transforming the CRLFs to the local convention. (1) and (2) seem to be consistent with your report, and would indicate that Mailman is conforming to the relevant standards.
I wonder if the sending systems are using a binary MIME content type such as application/patch? That would be incompatible with the 8bit content transfer encoding, and might confuse the systems on either end as to what encoding should be done. (I guess it might also confuse Mailman; I'll have to check. But it doesn't point *only* at Mailman.) In that case, the MTA will decode the BASE64 but *not* transform the CRLF to LF, which is also consistent with the observed behavior.
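To make the text/* versus binary distinction concrete, here is a minimal sketch of step (3) of the canonical encoding model on the receiving side, assuming a single BASE64-encoded body and an LF host. The function name `decode_for_local_storage` is hypothetical; real MTAs and MUAs do this inside their own delivery code, not through any function like this.

```python
import base64


def decode_for_local_storage(encoded_body: bytes, content_type: str,
                             local_newline: bytes = b"\n") -> bytes:
    """Undo BASE64, then convert CRLF to the local convention only for text/*.

    Hypothetical illustration of RFC 2049, Section 4: non-text types are
    treated as opaque bytes, so any CRLFs inside them are left untouched.
    """
    canonical = base64.b64decode(encoded_body)
    # "text/plain; charset=utf-8" -> "text"
    maintype = content_type.split("/", 1)[0].strip().lower()
    if maintype == "text":
        return canonical.replace(b"\r\n", local_newline)
    return canonical


# A tiny patch in canonical (CRLF) form, BASE64-encoded for transport.
wire = base64.b64encode(b"-old\r\n+new\r\n")

print(decode_for_local_storage(wire, "text/x-diff"))        # b'-old\n+new\n'
print(decode_for_local_storage(wire, "application/patch"))  # b'-old\r\n+new\r\n'
```

Under a text type the patch lands with LF endings and git-am is happy; under application/patch the CRLFs survive decoding, which matches the symptom described in the original report.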
Of course it's *possible* that Mailman is receiving LF and transforming that to CRLF. But that implies sender behavior inconsistent with the 8bit content transfer encoding, and there's nothing in the email processing model that suggests Mailman might do that under the reported conditions. In particular, the 8bit content transfer encoding is the identity, so even if Mailman "notices" that a part is 8bit, it should do nothing about it except maybe complain about NULs, solo CRs or LFs, and lines longer than 998 bytes. And I think it would show up in other contexts, with similarly distressing consequences, if Mailman were doing it.
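The complaints mentioned above are purely mechanical checks from RFC 2045, Section 2.8. A minimal sketch of them as a standalone function follows; the name `eightbit_violations` is hypothetical and this is not Mailman's code.

```python
from typing import List


def eightbit_violations(body: bytes, max_line: int = 998) -> List[str]:
    """Report how `body` breaks the RFC 2045, Section 2.8 rules for 8bit data.

    Octets above 127 are allowed and not flagged; only NULs, CR or LF
    outside a CRLF pair, and over-long lines are reported.
    """
    problems = []
    if b"\x00" in body:
        problems.append("contains NUL")
    # Drop every proper CRLF pair; any CR or LF still present was solo.
    leftover = body.replace(b"\r\n", b"")
    if b"\r" in leftover:
        problems.append("bare CR")
    if b"\n" in leftover:
        problems.append("bare LF")
    # Line length is measured between CRLF separators.
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        if len(line) > max_line:
            problems.append(f"line {number} longer than {max_line} octets")
    return problems


print(eightbit_violations(b"r\xc3\xa9sum\xc3\xa9\r\n"))  # []
print(eightbit_violations(b"-old\n+new\n"))             # ['bare LF']
```

A body that really arrived with bare LFs under "8bit" would be flagged here, which is the sender-side inconsistency the paragraph above describes.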
I have a local workaround involving re-formatting the patches, but I'd really like anyone to be able to apply the patches from the list archive.
If your archive is hosted on an LF system and the content-type applied to the patch is text/*, it "should" just work and the archived patches being impossible to apply "as is" is probably a Mailman issue.
This seems somewhat related to https://gitlab.com/mailman/mailman/-/issues/269
Not the same. The problem there is that for some reason Mailman is altering the text (removing trailing whitespace). That's unambiguously wrong regardless of MIME content type or content transfer encoding, and damaging in the case of format=flowed. I'm going to have to revisit that issue.
I guess with 8bit transfer encoding there probably isn't a clear distinction between the line-endings on the wire, and the intended line-endings for delivery, but it could be inferred from the "X-Mailer: git-send-email" header.
No, *you* as receiver can make that inference, but *we* as a middleman can't. git is used on both LF and CRLF platforms. If we're receiving CRLF, we have to send it back out.
This implies that if you want to, you can write a plug-in handler to make this transformation inside Mailman, and insert it into the processing pipeline before the archive handler.
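As a minimal sketch of the transformation such a handler might perform, here it is written against Python's standard `email` package only. The function name `normalize_git_patch_newlines` is hypothetical, and the Mailman side (the handler class, its registration, and its placement before the archive handler) is not shown. It assumes single-part text messages carrying an `X-Mailer: git-send-email` header. Because `set_content()` re-splits the text on CR, LF, and CRLF, it would also damage a patch that legitimately touches CRLF files, which is why a middleman can't do this by default.

```python
import base64
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def normalize_git_patch_newlines(msg: EmailMessage) -> bool:
    """Rewrite CRLF to LF in a single-part git-send-email text message.

    Hypothetical helper, not part of Mailman. Returns True if msg changed.
    """
    # Only act on messages that identify themselves as git-send-email output.
    if "git-send-email" not in str(msg.get("X-Mailer", "")):
        return False
    # Keep the sketch simple: single-part text/* bodies only.
    if msg.is_multipart() or msg.get_content_maintype() != "text":
        return False
    text = msg.get_content()  # transfer encoding undone, charset decoded
    if "\r\n" not in text:
        return False
    # The arguments are evaluated before set_content() clears the old
    # Content-* headers, so the original subtype and charset are kept.
    msg.set_content(
        text.replace("\r\n", "\n"),
        subtype=msg.get_content_subtype(),
        charset=msg.get_content_charset() or "utf-8",
        cte="base64",
    )
    return True


raw = (b"From: dev@example.org\r\n"
       b"X-Mailer: git-send-email 2.30.2\r\n"
       b"Content-Type: text/plain; charset=utf-8\r\n"
       b"Content-Transfer-Encoding: base64\r\n"
       b"\r\n" + base64.b64encode(b"-old\r\n+new\r\n") + b"\r\n")
msg = BytesParser(policy=policy.default).parsebytes(raw)
print(normalize_git_patch_newlines(msg))  # True
print(msg.get_payload(decode=True))       # b'-old\n+new\n'
```

Keying on the X-Mailer header keeps the rewrite away from ordinary list traffic, but it is still a site-specific policy choice rather than standards-mandated behavior.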
Footnotes: [1] RFC 2045, Section 2.8. 8bit Data
"8bit data" refers to data that is all represented as relatively short lines with 998 octets or less between CRLF line separation sequences [RFC-821]), but octets with decimal values greater than 127 may be used. As with "7bit data" CR and LF octets only occur as part of CRLF line separation sequences and no NULs are allowed.
That is, Content-Transfer-Encoding: 8bit is not binary, does not admit non-text, and requires that line-endings be CRLF.
[2] RFC 2049, Section 4. Canonical Encoding Model
Participants (2): David Bremner, Stephen J. Turnbull