Mailman rewrites the Cc header incorrectly
I just noticed that if I send an email to a list with a Cc header that is broken into multiple lines, and the opening and closing double quotes end up on different lines, the email addresses after that point are rewritten incorrectly.
For example,
CC: "ABC DEF
 (XYZ)" <name@example.com>
It will become the following after going through the mailing list:
CC: XYZ <"ABC DEF"@mail.server.name>
It seems that Outlook often splits the To/Cc header into multiple lines, and it often adds double quotes around the users' full names. Is the rewriting of the Cc header handled by Mailman's code or by another Python class? Thanks.
Regards, Alan So
On 5/14/20 7:44 PM, Alan So wrote:
I just noticed that if I send an email to a list with a Cc header that is broken into multiple lines, and the opening and closing double quotes end up on different lines, the email addresses after that point are rewritten incorrectly.
For example,
CC: "ABC DEF
 (XYZ)" <name@example.com>
tl;dr It is not valid to fold a header inside a quoted string.
First of all, the above is invalid. Presumably the address is intended to be
"ABC DEF (XYZ)" <name@example.com>
RFC 5322, sec 2.2.3 says in part:
"The general rule is that wherever this specification allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP."
and the definitions in sec 3.4 say in part:
address = mailbox / group
mailbox = name-addr / addr-spec
name-addr = [display-name] angle-addr
angle-addr = [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr
display-name = phrase
In this case, the quoted-string "ABC DEF (XYZ)" is a phrase which is the display-name, and the only places where comments or folding white space are allowed are immediately preceding the "<" or following the ">" surrounding the addr-spec. Folding in the middle of the quoted string is not allowed.
Were there no quotes, e.g.
ABC DEF (XYZ) <name@example.com>
then ABC DEF is a phrase and (XYZ) is a comment, and folding between DEF and (XYZ) would be allowed.
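Independent of where folding is permitted, RFC 5322 sec. 2.2.3 also makes unfolding purely mechanical: any CRLF that is immediately followed by WSP is removed. A minimal sketch of that rule, using a hypothetical helper named unfold() (not part of Mailman or the standard library):

```python
import re

# Hypothetical helper, not Mailman code: RFC 5322 sec. 2.2.3 unfolding,
# i.e. remove every CRLF that is immediately followed by a space or tab.
_FOLD = re.compile(r'\r\n(?=[ \t])')


def unfold(header_value: str) -> str:
    """Return header_value with all RFC 5322 folds removed."""
    return _FOLD.sub('', header_value)


folded = '"ABC DEF\r\n (XYZ)" <name@example.com>'
print(unfold(folded))  # "ABC DEF (XYZ)" <name@example.com>

# A CRLF that is not followed by WSP is not a fold, so it is left alone.
print(repr(unfold('abc\r\ndef')))  # 'abc\r\ndef'
```

Only the CRLF is dropped; the WSP that follows it stays, so the display name keeps a single space between DEF and (XYZ).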
It will become the following after going through the mailing list:
CC: XYZ <"ABC DEF"@mail.server.name>
Which is clearly wrong, but GIGO.
It seems that Outlook often splits the To/Cc header into multiple lines, and it often adds double quotes around the users' full names. Is the rewriting of the Cc header handled by Mailman's code or by another Python class?
I haven't looked at the code to see exactly what's responsible here. I do note that adding double quotes around a display name is not only allowed, but is required if the display name contains "specials".
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Just FYI.
I omitted the "<" and ">" when I replaced the real testing address with name@example.com.
It should be,
CC: "ABC DEF
 (XYZ)" <name@example.com>
On 5/14/20 9:24 PM, Alan So wrote:
Just FYI.
I omitted the "<" and ">" when I replaced the real testing address with name@example.com.
It should be,
CC: "ABC DEF
 (XYZ)" <name@example.com>
Actually, you didn't omit them. It's just the way HyperKitty renders what it thinks is an email address that makes it appear they aren't there. If you download the thread from HyperKitty, you'll see they are there.
Anyway, I am unable to duplicate your issue with your example data. I sent this raw message:
To: test@example.com
From: Mark <mark@msapiro.net>
Cc: abc def
 (ghi) <abc@example.com>,
 "abc def
 (ghi)" <def@example.com>
Subject: test
Message-Id: <msgid@example.com>
Date: Sat, 16 May 2020 09:22:12 -0700
Hi
and the message I received from the list included these headers:
To: test@example.com
From: Mark <mark@msapiro.net>
Message-Id: <msgid@example.com>
Date: Sat, 16 May 2020 09:22:12 -0700
CC: "abc def (ghi)" <abc@example.com>, "abc def
 (ghi)" <def@example.com>
Subject: [List] test
I do observe something similar to what you report that has nothing to do with folding. If I omit the trailing ">" in an address as in
From: Mark <mark@msapiro.net
the address becomes
From: "Mark <mark"@msapiro.net
So possibly your issue involves missing angle brackets.
In order to say more, I'd need to see the actual headers as sent and received or at least an accurate munged representation.
-- Mark Sapiro <mark@msapiro.net>
It is exactly the file I used to test (listname and list server munged). I can send you the full header personally if you want.
Sent====> (I use "cat file | sendmail listname@listserv.domain" to send)
From: ktso@cuhk.edu.hk
To: listname@listserv.domain
CC: "ABC DEF
 (XYZ)" <ktso@cuhk.edu.hk>
Subject: testing please ignore
Message-ID: <sfhifdgdfdsjHHIHIo987sd@98vD9FVhdv.8shy65FGvsd>
hello
Received====>
From: ktso@cuhk.edu.hk
To: listname@listserv.domain
Subject: testing please ignore
Message-ID: <sfh39897897jHHIHIo987sd@98vD9FVhdv.8shy65FGvsd>
CC: XYZ <"ABC DEF"@fqdn.domain>
On 5/19/20 4:09 AM, Alan So wrote:
It is exactly the file I used to test (listname and list server munged). I can send you the full header personally if you want.
I did a similar test, using exactly this message:
From: mark@msapiro.net
To: list@example.com
CC: "ABC DEF
 (XYZ)" <ktso@cuhk.edu.hk>
Subject: testing please ignore
Message-ID: <sfhifdgdfdsjHHIHIo987sd@98vD9FVhdv.8shy65FGvsd>
Hello
and I posted it to the list via "mailman inject". The message I received back from the list, with some headers removed, was:
From: mark@msapiro.net
To: list@example.com
Message-ID: <sfhifdgdfdsjHHIHIo987sd@98vD9FVhdv.8shy65FGvsd>
Message-ID-Hash: NUQVMF52CDRIRH4RXEM2VV7MU3HYXASG
X-Message-ID-Hash: NUQVMF52CDRIRH4RXEM2VV7MU3HYXASG
Date: Tue, 19 May 2020 17:24:34 -0700
CC: "ABC DEF
 (XYZ)" <ktso@cuhk.edu.hk>
X-Mailman-Version: 3.3.2b1
Precedence: list
Subject: [List] testing please ignore
List-Id: <list.example.com>
Archived-At:
<https://msapiro.net/archives/list/list@example.com/message/NUQVMF52CDRIRH4RXEM2VV7MU3HYXASG/>
List-Archive: <https://msapiro.net/archives/list/list@example.com/>
List-Help: <mailto:list-request@example.com?subject=help>
List-Post: <mailto:list@example.com>
List-Subscribe: <mailto:list-join@example.com>
List-Unsubscribe: <mailto:list-leave@example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Hello
_______________________________________________
List mailing list -- list@example.com
To unsubscribe send an email to list-leave@example.com
Note that the folded CC: header in the received message is exactly the same as in the original message.
Are you looking at the raw message received from the list or some MUA's rendering of it?
-- Mark Sapiro <mark@msapiro.net>
Thanks a lot, Mark. I just managed to perform the same test using "mailman inject", and the result is the same as yours. Therefore, it is quite probable that Postfix rewrites the headers before passing the message to Mailman via LMTP. We will continue to check the issue with our Postfix installation. Thanks again for your help!
I further checked the LMTP conversation using "tcpdump -i lo port 8024", and the CC header still appears as-is during LMTP:
220 list.host.domain GNU Mailman LMTP runner 2.0
LHLO list.host.domain
250 list.host.domain
MAIL FROM:<ktso@myhost.domain>
250 OK
RCPT TO:<listname@listserv.domain>
250 OK
DATA
354 End data with <CR><LF>.<CR><LF>
Received: from myhost.domain (myhost.domain [192.168.123.123]) by list.host.domain (Postfix) with ESMTP id 9370C155349E for <listname@listserv.domain>; Fri, 22 May 2020 09:53:58 +0800 (HKT)
Received: by myhost.domain (Postfix, from userid 1000) id 6D848320BA5; Fri, 22 May 2020 09:53:58 +0800 (HKT)
From: ktso@cuhk.edu.hk
To: listname@listserv.domain
CC: "ABC DEF
 (XYZ)" <ktso@cuhk.edu.hk>
Subject: testing please ignore
Message-ID: <sYGHb7tfH7549ub3To987sd@9oHHH7934v.8shy65FGvsd>
Date: Fri, 22 May 2020 09:53:58 +0800 (HKT)
hello
.
250 Ok
QUIT
221 Bye
Since the CC header remains unchanged when we test using "mailman inject", we may further check whether the Mailman LMTP runner changes the CC header.
On 5/21/20 7:52 PM, Alan So wrote:
Since the CC header remains unchanged when we test using "mailman inject", we may further check whether the Mailman LMTP runner changes the CC header.
You are correct. If I post your test via email rather than "mailman inject", I see the issue. I'm working on figuring out why. (Sorry for spamming you with test Ccs.)
I'm at a loss to explain why "mailman inject" doesn't show this. I have captured the queued message, and the only differences I see between the one from the lmtp runner and the one from inject are that the one from the lmtp runner has an X-MailFrom: header and these metadata entries:
'received_time': datetime.datetime(2020, 5, 22, 3, 45, 24, 532711),
'to_list': True,
I even tried running "mailman inject" with -m to_list=True, and it still didn't fail.
However, if I take the qfile that dumps as
> $ bin/mailman qfile 1590119124.5909426+9a002936a59e194f70187405ac624173cbf470b5.pck
> [----- start pickle -----]
> <----- start object 1 ----->
> Received: by msapiro.net (Postfix, from userid 1000)
> id 60624340253; Thu, 21 May 2020 20:45:24 -0700 (PDT)
> From: mark@msapiro.net
> To: list@example.com
> CC: "ABC DEF
> (XYZ)" <mark@msapiro.net>
> Subject: testing please ignore
> Message-ID: <something_more@example.net>
> Date: Fri, 22 May 2020 09:53:58 +0800 (HKT)
> Message-ID-Hash: NDG3FHLPO35F2Y4YXFAKFIKDXAC6FHGM
> X-Message-ID-Hash: NDG3FHLPO35F2Y4YXFAKFIKDXAC6FHGM
> X-MailFrom: mark@msapiro.net
>
> hello
>
> <----- start object 2 ----->
> { '_parsemsg': False,
> 'listid': 'list.example.com',
> 'original_size': 326,
> 'received_time': datetime.datetime(2020, 5, 22, 3, 45, 24, 532711),
> 'to_list': True,
> 'version': 3}
> [----- end pickle -----]
and put it in the 'in' queue, the received message from the list does contain
CC: XYZ <ABCDEF@msapiro.net>
Thus, the issue occurs downstream of lmtp runner. It will take more investigation to determine where.
-- Mark Sapiro <mark@msapiro.net>
Mark Sapiro wrote:
Thus, the issue occurs downstream of lmtp runner. It will take more investigation to determine where.
The issue occurs in mailman/handlers/avoid_duplicates.py. That module will rewrite the Cc: header after possibly deleting some of the entries.
That module calls msg.get_all('cc') to get the original Ccs and calls email.utils.getaddresses() on that to get a list of all the name, address pairs.
In our case, msg.get_all('cc') returns one of two things. If the message came via LMTP with CRLF line endings, it returns
['"ABC DEF\r\n (XYZ)" <mark@msapiro.net>']
If the message came from "mailman inject" with LF line endings, it returns
['"ABC DEF\n (XYZ)" <mark@msapiro.net>']
Then email.utils.getaddresses() parses the items in the list (with the same internal address parser that email.utils.parseaddr() uses) to make a list (of one, in our case) of name, address pairs. The result is different depending on the line endings, which explains why it works with "mailman inject" but not with LMTP.
In the case where the line endings are LF, the return is
[('ABC DEF\n (XYZ)', 'mark@msapiro.net')]
but if the line endings are CRLF, the return is
[('XYZ', 'ABC DEF')]
If the folded display name is not quoted, as in
ABC DEF (XYZ) <mark@msapiro.net>
then the return is
('ABC DEF (XYZ)', 'mark@msapiro.net')
regardless of the line ending, as it should be. I can't really fault parseaddr() here, because the only time it produces an unexpected result is when the CRLF line folding occurs in the middle of a quoted string, which is clearly non-compliant.
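The behavior described above can be reproduced outside Mailman with only the standard library. This is a sketch, assuming Python 3 and the legacy compat32 policy that email.message_from_string() uses by default; the helper name cc_pairs() is hypothetical. As far as I can tell from CPython's email/_parseaddr.py, the legacy address parser treats a bare "\r" as the end of a quoted string, which would explain why only the CRLF case goes wrong.

```python
import email
from email.utils import getaddresses


def cc_pairs(line_ending):
    """Parse a test message folded inside the quoted display name.

    Hypothetical helper: returns the raw Cc values from msg.get_all()
    and the (name, address) pairs from email.utils.getaddresses().
    """
    lines = [
        'From: mark@msapiro.net',
        'To: list@example.com',
        'CC: "ABC DEF',
        ' (XYZ)" <mark@msapiro.net>',  # continuation line starts with WSP
        'Subject: testing please ignore',
        '',
        'hello',
        '',
    ]
    # message_from_string() uses the legacy compat32 policy by default,
    # which keeps the line break inside the raw header value.
    msg = email.message_from_string(line_ending.join(lines))
    values = msg.get_all('cc', [])
    return values, getaddresses(values)


for label, eol in (('CRLF', '\r\n'), ('LF', '\n')):
    values, pairs = cc_pairs(eol)
    print(label, values, pairs)
```

The pairs printed should match the results reported above: [('XYZ', 'ABC DEF')] for CRLF and [('ABC DEF\n (XYZ)', 'mark@msapiro.net')] for LF.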
For Mailman to defend against this, something like the attached patch would do.
-- Mark Sapiro <mark@msapiro.net>
On 5/22/20 2:32 PM, Mark Sapiro wrote:
For Mailman to defend against this, something like the attached patch would do.
The patch attached to the prior post has a typo, a misplaced colon. The one attached here should be correct.
-- Mark Sapiro <mark@msapiro.net>
Thank you very much for your hard work, Mark! It is a bit strange that parseaddr behaves differently with CRLF and LF. Maybe on a Windows server the behavior would be consistent.
May I ask whether, and when, this patch will be committed to the repository so that it takes effect in a future release of Mailman? Thanks.
On 5/24/20 7:20 PM, Alan So wrote:
Thank you very much for your hard work, Mark! It is a bit strange that parseaddr behaves differently with CRLF and LF. Maybe on a Windows server the behavior would be consistent.
It might be that on Windows, "mailman inject" would create the message with CRLF line endings.
May I ask whether, and when, this patch will be committed to the repository so that it takes effect in a future release of Mailman? Thanks.
In the absence of an issue at <https://gitlab.com/mailman/mailman/-/issues>, probably never. <hint>
The bottom line is the original mail created by Outlook is defective.[1] While Mailman could defend against this defect, starting to do that opens the possibility that there are other areas of the code that need a similar defense.
Have you asked Microsoft to fix Outlook to not create these defective messages? That's the real solution.
That said, if an issue is filed, someone might consider adding something along the lines of <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/Y6DCI4Z2J57K5577MGFXUEUXXQVOIOBE/attachment/2/avoid_duplicates_patch.txt> to the code base.
[1] given an address of the form
display-name (comment) <addr-spec>
in a header, it can be folded either between the display-name and the (comment) or between the (comment) and the <addr-spec>, but if display-name (comment) is quoted as in
"display-name (comment)" <addr-spec>
"display-name (comment)" is now a phrase within which folding is not allowed and this can only be folded between the "display-name (comment)" and the <addr-spec>.
-- Mark Sapiro <mark@msapiro.net>
Mark Sapiro wrote:
In the absence of an issue at https://gitlab.com/mailman/mailman/-/issues, probably never. <hint>
Can anyone with a GitLab account file an issue? If it is appropriate, I can try to file one.
Have you asked Microsoft to fix Outlook to not create these defective messages? That's the real solution.
Yes, and I agree. My colleague has actually done so, but it has probably not yet been escalated to the appropriate tier of support. Here is my colleague's comment: "I can submit this problem to Microsoft. However, it may take a long time to be fixed, or it may not even be important enough to be fixed."
On 5/25/20 12:12 AM, Alan So wrote:
Mark Sapiro wrote:
In the absence of an issue at https://gitlab.com/mailman/mailman/-/issues, probably never. <hint>
Can anyone with a GitLab account file an issue? If it is appropriate, I can try to file one.
Yes, anyone with a gitlab account can file an issue.
Have you asked Microsoft to fix Outlook to not create these defective messages? That's the real solution.
Yes, and I agree. My colleague has actually done so, but it has probably not yet been escalated to the appropriate tier of support. Here is my colleague's comment: "I can submit this problem to Microsoft. However, it may take a long time to be fixed, or it may not even be important enough to be fixed."
At least you're trying and that's a good thing even if they don't fix it.
-- Mark Sapiro <mark@msapiro.net>
Thanks. Just filed an issue: https://gitlab.com/mailman/mailman/-/issues/725
On 5/26/20 12:33 AM, Alan So wrote:
Thanks. Just filed an issue: https://gitlab.com/mailman/mailman/-/issues/725
Thank you for your "well formed" issue. I have started to work on this, but I have run into a snag.
The patch at <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/Y6DCI4Z2J57K5577MGFXUEUXXQVOIOBE/attachment/2/avoid_duplicates_patch.txt> doesn't work. The problem is that msg.get_all(header, []) does not parse the header and return a list of addresses. It just returns a list whose items are the raw values of the header instances in msg. I.e., if there is only one Cc: header, msg.get_all('cc', []) returns a list of one string, which is the raw value of that header. If there were two Cc: headers in the message, it would return a list of two strings with the two raw values.
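To illustrate the point about msg.get_all(): it returns one raw string per header instance and does no address parsing. A small sketch with two Cc: headers (unusual, and used here only to show the per-instance behavior; the addresses are made up):

```python
import email

raw = (
    'From: mark@msapiro.net\n'
    'To: list@example.com\n'
    'Cc: "last, first" <user@example.com>, other@example.com\n'
    'Cc: third@example.com\n'
    'Subject: test\n'
    '\n'
    'Hello\n'
)
msg = email.message_from_string(raw)

# One raw string per Cc: header instance; commas are not split on.
cc_values = msg.get_all('cc', [])
print(cc_values)
# ['"last, first" <user@example.com>, other@example.com', 'third@example.com']

# Headers that are absent yield the supplied default.
print(msg.get_all('bcc', []))  # []
```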
Thus, we would need to parse the returned strings into the individual email addresses. The simple-minded way to do that is:
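    import re
    from email.utils import parseaddr

    addrs = []  # msg and header come from the surrounding handler code
    for hdr in msg.get_all(header, []):
        for addr in hdr.split(','):
            addrs.append(parseaddr(re.sub('[\r\n]', '', addr)))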
but that will fail if the header contains a quoted display name that includes a comma, such as
"last, first" <user@example.com>
So, I need to go back to using getaddresses, but do the unfolding first. Watch the issue for the final fix.
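A sketch of the difference between the two approaches. naive_addresses() reproduces the simple-minded loop, and unfolded_addresses() follows the "unfold first, then getaddresses()" idea; both function names and the FOLD_RE pattern are hypothetical, and the fix actually merged into Mailman may be written differently. FOLD_RE also accepts a bare LF because messages injected with "mailman inject" had LF line endings.

```python
import re
from email.utils import getaddresses, parseaddr

# Hypothetical pattern: a fold is a line break (CRLF, or bare LF as seen with
# "mailman inject") that is immediately followed by a space or tab.
FOLD_RE = re.compile(r'\r?\n(?=[ \t])')


def naive_addresses(header_values):
    """Split on commas, then parse each piece (breaks on quoted commas)."""
    addrs = []
    for hdr in header_values:
        for addr in hdr.split(','):
            addrs.append(parseaddr(re.sub('[\r\n]', '', addr)))
    return addrs


def unfolded_addresses(header_values):
    """Unfold each raw value first, then let getaddresses() do the splitting."""
    return getaddresses([FOLD_RE.sub('', value) for value in header_values])


values = ['"ABC DEF\r\n (XYZ)" <mark@msapiro.net>, "last, first" <user@example.com>']
print(naive_addresses(values))  # three entries: the quoted comma splits one address in two
print(unfolded_addresses(values))
# [('ABC DEF (XYZ)', 'mark@msapiro.net'), ('last, first', 'user@example.com')]
```

The naive version copes with the fold (it strips the line break) but turns "last, first" into two bogus entries, while unfolding first leaves the comma handling to getaddresses(), which understands quoted strings.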
-- Mark Sapiro <mark@msapiro.net>
Thanks again for the work. I just noticed that the issue has already been closed. Unfolding the header line looks like a logical approach, but according to the following Python issue it seems that the email package in Python >= 3.3 unfolds the header before parsing it:
On 5/26/20 9:10 PM, Alan So wrote:
Thanks again for the work. I just noticed that the issue has already been closed. Unfolding the header line looks like a logical approach, but according to the following Python issue it seems that the email package in Python >= 3.3 unfolds the header before parsing it:
Yes, the issue is closed because the fix has been merged and will be in Mailman Core 3.3.2.
I agree that it appears that there would have been no issue if Mailman created messages using the SMTP policy and retrieved the addresses using the addresses attribute of the headers.
However, for historical reasons, Mailman Core still uses the compat32 policy (Python 3.2 compatibility) and uses email.utils.getaddresses() to get the addresses from the header value, so the "fix" for <https://bugs.python.org/issue11050> doesn't apply.
Changing Mailman Core to use email.EmailMessage() objects rather than the legacy email.Message() objects is a worthwhile goal, but would require extensive changes to fix all the things that broke as a result.
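For comparison, a sketch of the modern API Mark mentions: parsing with the SMTP policy yields an email.message.EmailMessage whose address headers expose an addresses attribute. As far as I know, this policy removes the line breaks from a folded value before parsing it, so the folded quoted display name comes through intact. This is illustrative only; it is not how Mailman Core currently parses messages.

```python
import email
from email import policy

raw = (
    'From: mark@msapiro.net\r\n'
    'To: list@example.com\r\n'
    'CC: "ABC DEF\r\n'
    ' (XYZ)" <mark@msapiro.net>\r\n'
    'Subject: testing please ignore\r\n'
    '\r\n'
    'hello\r\n'
)

# policy.SMTP is the default (non-compat32) policy with CRLF line endings;
# with it the parser returns an EmailMessage instead of a legacy Message.
msg = email.message_from_string(raw, policy=policy.SMTP)

# msg['cc'] is a structured address header; .addresses holds Address objects.
for addr in msg['cc'].addresses:
    print(repr(addr.display_name), addr.addr_spec)
```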
Anyway, I really appreciate your diligence in following through repeatedly on this issue. Were it not for that, I wouldn't have been motivated to fix it. For a look at why even a trivial change is not trivial, see the commits at <https://gitlab.com/mailman/mailman/-/merge_requests/652/commits>. The first of these is developing good tests for the issue and then fixing the code so the tests pass, and the second, unanticipated one was due to the fact that Python 3.5 didn't maintain the original ordering of the addresses in Cc: so I couldn't compare against a fixed result. This is due to the fact that the underlying code uses dictionaries, and prior to Python 3.6 dictionaries didn't preserve order.
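The ordering point can be illustrated with a hypothetical de-duplication helper (not Mailman's avoid_duplicates code) that drops Cc addresses already covered elsewhere and rebuilds the header value. Because it collects the survivors in a dict, its output order is only predictable where dicts preserve insertion order (guaranteed by the language from Python 3.7, and already true of CPython 3.6); on 3.5 a test comparing against a fixed string could fail.

```python
from email.utils import formataddr, getaddresses


def dedupe_cc(cc_values, already_seen):
    """Hypothetical sketch: rebuild a Cc value without duplicate addresses.

    cc_values: raw Cc header values (as from msg.get_all('cc', [])).
    already_seen: lowercase addresses to drop (e.g. recipients who will
    get the message anyway).
    """
    kept = {}  # lowercase address -> (name, address); insertion-ordered on 3.7+
    for name, addr in getaddresses(cc_values):
        key = addr.lower()
        if key and key not in already_seen and key not in kept:
            kept[key] = (name, addr)
    return ', '.join(formataddr(pair) for pair in kept.values())


print(dedupe_cc(['c@example.com, A <a@example.com>, b@example.com, a@example.com'],
                {'b@example.com'}))
# c@example.com, A <a@example.com>
```

On a Python without ordered dicts, the same call could return the two surviving addresses in either order, so a test there would need an order-insensitive comparison or an explicitly ordered mapping such as collections.OrderedDict.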
-- Mark Sapiro <mark@msapiro.net>
Mark Sapiro wrote:
Yes, the issue is closed because the fix has been merged and will be in Mailman Core 3.3.2.
Thanks again.
I read RFC 5322 again, but I am not sure whether I am interpreting it correctly. The display-name seems to allow folding in the middle.
3.2.4 quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
A quoted-string is treated as a unit. That is, quoted-string is identical to atom, semantically. Since a quoted-string is allowed to contain FWS, folding is permitted. Also note that since quoted-pair is allowed in a quoted-string, the quote and backslash characters may appear in a quoted-string so long as they appear as a quoted-pair.
3.2.5 word = atom / quoted-string
phrase = 1*word / obs-phrase
3.4 display-name = phrase
Anyway, I really appreciate your diligence in following through repeatedly on this issue. Were it not for that, I wouldn't have been motivated to fix it. For a look at why even a trivial change is not trivial, see the commits at https://gitlab.com/mailman/mailman/-/merge_requests/652/commits. The first of these is developing good tests for the issue and then fixing the code so the tests pass, and the second, unanticipated one was due to the fact that Python 3.5 didn't maintain the original ordering of the addresses in Cc: so I couldn't compare against a fixed result. This is due to the fact that the underlying code uses dictionaries, and prior to Python 3.6 dictionaries didn't preserve order.
As our organization chose free, open-source software to provide this service, I think this might be the only way we can get the issue resolved, since there is no official support service for a volunteer-based project. Most importantly, without your detailed explanation it would have been really hard for us to study the issue further and file it in the bug tracking system.
It is a good lesson anyway, since it gives me some solid experience with running code under different versions of Python, and developing unit tests is sometimes not so trivial.
On 5/27/20 2:35 AM, Alan So wrote:
I read RFC 5322 again, but I am not sure whether I am interpreting it correctly. The display-name seems to allow folding in the middle.
3.2.4 quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE [CFWS]
A quoted-string is treated as a unit. That is, quoted-string is identical to atom, semantically. Since a quoted-string is allowed to contain FWS, folding is permitted. ...
You are correct. I missed that, so the messages created by Outlook are not defective.
-- Mark Sapiro <mark@msapiro.net>
Mark Sapiro wrote:
I haven't looked at the code to see exactly what's responsible here. I do note that adding double quotes around a display name is not only allowed, but is required if the display name contains "specials".
So it looks like Outlook still treats the (XYZ) part as a comment when folding, even though it is within the double quotes. In our case it is quite often folded between DEF and (XYZ) when there are four or more email addresses in the Cc header. Since full names in our organization are normally in the format "Firstname Lastname (Dept)", we hit the issue quite easily when users send to a list with more than four addresses in the Cc header.