Issues with international characters (replaced by ?) when mail is sent as text/html
Hi,
I have noticed that any mail sent as text/html, e.g. with the following headers:
Content-Language: en-US-large
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit
is slightly transformed by mailman3 - the body is put into its own part, but all the international letters are replaced by question marks.
Having searched the list I tried:
- adding following to mailman.cfg:
[language.master]
description: English (USA)
charset: utf-8
enabled: yes

[language.en]
description: English (USA) utf8
charset: utf-8
enabled: yes
- changing preferred language of a mailing list to something else via shell, e.g.:
m.preferred_language = 'pl'
- changed default_language to 'pl' in mailman.cfg
- Ensured all relevant glibc locales are generated (not sure if that matters though in this case)
None of this seems to have any positive result - all mails, whether under 'pl' or 'en' setting still get all non-ascii chars replaced with question marks (in html case, probably(?) in any case where it becomes multipart).
Any ideas what to do so that html mails are passed through without their characters being filtered out? Do I have to recreate all mailing lists under the new [language.en/master] settings?
On 1/11/22 7:08 AM, Michal Soltys wrote:
Hi,
I have noticed that any mail sent as text/html, e.g. with the following headers:
Content-Language: en-US-large
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 8bit
is slightly transformed by mailman3 - the body is put into its own part, but all the international letters are replaced by question marks.
The body being put in its own part is because of message decoration. In order to add the list's message header and/or footer the message is recast as multipart/mixed and the header and/or footer are added as separate MIME parts. See https://wiki.list.org/x/4030707 for an explanation of why this is done for non-plain text messages.
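The recasting described above can be pictured with the standard library alone. This is only a rough sketch of the resulting MIME layout, not Mailman's decoration code; the `wrap_with_footer` helper and the sample strings are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def wrap_with_footer(html_body, footer_text):
    """Build a multipart/mixed message: original HTML part, then a footer part."""
    outer = MIMEMultipart("mixed")
    # The original body becomes its own text/html part.
    outer.attach(MIMEText(html_body, "html", "utf-8"))
    # The list footer is added as a separate text/plain part.
    footer = MIMEText(footer_text, "plain", "utf-8")
    footer["Content-Disposition"] = "inline"
    outer.attach(footer)
    return outer


html = "<p>Zażółć gęślą jaźń</p>"
wrapped = wrap_with_footer(html, "Sent via an example list")
parts = wrapped.get_payload()
```

When the inner part keeps its charset, decoding it returns the original text, which is what one would expect from a correctly decorated message.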
Having searched the list I tried:
- adding following to mailman.cfg:
[language.master]
description: English (USA)
charset: utf-8
enabled: yes

[language.en]
description: English (USA) utf8
charset: utf-8
enabled: yes
I would have expected that to work.
- changing preferred language of a mailing list to something else via shell, e.g.:
m.preferred_language = 'pl'
That should be m.preferred_language = getUtility(ILanguageManager).get('pl')
- changed default_language to 'pl' in mailman.cfg
I would have expected any of those things to work, but I can see that they don't.
I'm looking into why.
- Ensured all relevant glibc locales are generated (not sure if that matters though in this case)
None of this seems to have any positive result - all mails, whether under 'pl' or 'en' setting still get all non-ascii chars replaced with question marks (in html case, probably(?) in any case where it becomes multipart).
Any ideas what to do so that html mails are passed through without their characters being filtered out? Do I have to recreate all mailing lists under the new [language.en/master] settings?
No, this is a bug. It has to do with message decoration (adding of headers and footers). You can probably avoid this issue by setting the list's filter_content and convert_html_to_plaintext to True. I am looking at why this occurs and fixing it.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
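The two shell-level suggestions in this reply (assigning a Language object instead of a bare string, and enabling filter_content plus convert_html_to_plaintext as a workaround) could be wrapped roughly as follows. This is a sketch only: the helper names and the optional `manager` parameter are hypothetical, and the `zope.component` and `mailman.interfaces.languages` import paths are assumed to match the installed Mailman 3.

```python
def set_preferred_language(mlist, code, manager=None):
    """Assign a Language object (not a bare string) to the list."""
    if manager is None:
        # Assumed import paths for a Mailman 3 installation.
        from zope.component import getUtility
        from mailman.interfaces.languages import ILanguageManager
        manager = getUtility(ILanguageManager)
    language = manager.get(code)
    if language is None:
        raise ValueError(f"language {code!r} is not known to the language manager")
    mlist.preferred_language = language
    return language


def apply_plaintext_workaround(mlist):
    """Enable the two list settings suggested above as a workaround."""
    mlist.filter_content = True
    mlist.convert_html_to_plaintext = True
```

Inside `mailman shell` with the list bound to `m`, this would be called as `set_preferred_language(m, 'pl')`; changes made in the shell normally still need a `commit()` afterwards.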
On 1/11/22 2:08 PM, Mark Sapiro wrote:
No, this is a bug. It has to do with message decoration (adding of headers and footers). You can probably avoid this issue by setting the list's filter_content and convert_html_to_plaintext to True. I am looking at why this occurs and fixing it.
See https://gitlab.com/mailman/mailman/-/issues/967 which I just filed.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On 22/01/12 00:59, Mark Sapiro wrote:
On 1/11/22 2:08 PM, Mark Sapiro wrote:
No, this is a bug. It has to do with message decoration (adding of headers and footers). You can probably avoid this issue by setting the list's filter_content and convert_html_to_plaintext to True. I am looking at why this occurs and fixing it.
See https://gitlab.com/mailman/mailman/-/issues/967 which I just filed.
Thanks for looking into it. Meanwhile, I'll try the suggested workarounds.
https://gitlab.com/mailman/mailman/-/issues/967 is now fixed. This is the patch to fix it. See https://gitlab.com/mailman/mailman/-/merge_requests/946

```
diff --git a/src/mailman/handlers/decorate.py b/src/mailman/handlers/decorate.py
index ec0ba8ed385e529b2b4c959e41d7d8985cc783e5..53ea861de13c76d1571c3664ec8358bf9c97b4e0 100644
--- a/src/mailman/handlers/decorate.py
+++ b/src/mailman/handlers/decorate.py
@@ -18,13 +18,13 @@
 """Decorate a message by sticking the header and footer around it."""
 
 import re
+import copy
 import logging
 
 from email.mime.text import MIMEText
 from email.utils import formataddr
 from mailman.archiving.mailarchive import MailArchive
 from mailman.core.i18n import _
-from mailman.email.message import Message
 from mailman.interfaces.handler import IHandler
 from mailman.interfaces.mailinglist import IListArchiverSet
 from mailman.interfaces.template import ITemplateLoader
@@ -176,26 +176,12 @@ def process(mlist, msg, msgdata):
     # Because of the way Message objects are passed around to process(), we
     # need to play tricks with the outer message -- i.e. the outer one must
     # remain the same instance. So we're going to create a clone of the outer
-    # message, with all the header chrome intact, then copy the payload to it.
-    # This will give us a clone of the original message, and it will form the
-    # basis of the interior, wrapped Message.
-    inner = Message()
-    # Which headers to copy? Let's just do the Content-* headers
-    for h, v in msg.items():
-        if h.lower().startswith('content-'):
-            inner[h] = v
-    inner.set_payload(msg.get_payload())
-    # For completeness
-    inner.set_unixfrom(msg.get_unixfrom())
-    inner.preamble = msg.preamble
-    inner.epilogue = msg.epilogue
-    # Don't copy get_charset, as this might be None, even if
-    # get_content_charset isn't. However, do make sure there is a default
-    # content-type, even if the original message was not MIME.
-    inner.set_default_type(msg.get_default_type())
-    # BAW: HACK ALERT.
-    if hasattr(msg, '__version__'):
-        inner.__version__ = msg.__version__
+    # message, with all the header chrome intact, then delete unwanted headers.
+    inner = copy.deepcopy(msg)
+    # Which headers to keep? Let's just do the Content-* headers
+    for h, v in inner.items():
+        if not h.lower().startswith('content-'):
+            del inner[h]
     # Now, play games with the outer message to make it contain three
     # subparts: the header (if any), the wrapped message, and the footer (if
     # any).
```

--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
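The approach taken by the patch (clone the whole message, then delete every header that is not Content-*) can be tried outside Mailman with the standard library. The sketch below is not Mailman's code: the sample message and the `make_inner` helper are made up for illustration, and it only shows that an 8bit UTF-8 body survives the clone-and-strip step byte for byte.

```python
import copy
from email import message_from_bytes

# Hypothetical sample: an 8bit text/html message with non-ASCII letters.
BODY = "<p>Zażółć gęślą jaźń</p>\n".encode("utf-8")
RAW = (
    b"From: sender@example.com\n"
    b"Subject: test\n"
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/html; charset=UTF-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + BODY


def make_inner(msg):
    """Clone msg and keep only its Content-* headers."""
    inner = copy.deepcopy(msg)
    # Iterate over a snapshot of the header names while deleting.
    for name in set(inner.keys()):
        if not name.lower().startswith("content-"):
            del inner[name]
    return inner


msg = message_from_bytes(RAW)
inner = make_inner(msg)
serialized = inner.as_bytes()
```

The clone keeps the payload in whatever internal form the parser produced, so no re-encoding step gets a chance to substitute question marks, and the original message object is left untouched.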
On 1/12/22 17:56, Mark Sapiro wrote:
https://gitlab.com/mailman/mailman/-/issues/967 is now fixed. This is the patch to fix it. See https://gitlab.com/mailman/mailman/-/merge_requests/946
Checked, works fine. Thanks again.
participants (2)
- Mark Sapiro
- Michal Soltys