UnicodeEncodeError: 'ascii' codec can't encode character
Hi,
I've accidentally sent a message from an unregistered address to one of my lists. Mailman correctly held the message back for moderation, but when I go to the page for held messages in Postorius, I receive a 500 error. I've included an excerpt from the log below. To me, it looks as if it errors on processing my last name (which has a non-ASCII character in it).
This is Postorius 1.1.2 on Ubuntu 18.04 LTS from the repos. I've searched the tracker for the problem and found one ticket[1], but it has something to do with logging, so I suppose this is a different problem.
Ideas? And how do I discard the message without access to the held messages page?
Python version is 3.6.8.
Log excerpt:
Dec 09 13:21:04 2019 (3090) REST request handler error:
Traceback (most recent call last):
  File "/usr/lib/python3.6/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/lib/python3/dist-packages/mailman/database/transaction.py", line 50, in wrapper
    rtn = function(*args, **kws)
  File "/usr/lib/python3/dist-packages/mailman/rest/wsgiapp.py", line 214, in __call__
    return super().__call__(environ, start_response)
  File "falcon/api.py", line 215, in falcon.api.API.__call__ (falcon/api.c:2872)
  File "falcon/api.py", line 189, in falcon.api.API.__call__ (falcon/api.c:2419)
  File "/usr/lib/python3/dist-packages/mailman/rest/post_moderation.py", line 167, in on_get
    resource = self._make_collection(request)
  File "/usr/lib/python3/dist-packages/mailman/rest/helpers.py", line 159, in _make_collection
    for resource in collection]
  File "/usr/lib/python3/dist-packages/mailman/rest/helpers.py", line 159, in <listcomp>
    for resource in collection]
  File "/usr/lib/python3/dist-packages/mailman/rest/post_moderation.py", line 157, in _resource_as_dict
    resource = self._make_resource(request.id)
  File "/usr/lib/python3/dist-packages/mailman/rest/post_moderation.py", line 78, in _make_resource
    resource['msg'] = msg.as_string()
  File "/usr/lib/python3.6/email/message.py", line 158, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/usr/lib/python3.6/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.6/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.6/email/generator.py", line 243, in _handle_text
    msg.set_payload(payload, charset)
  File "/usr/lib/python3.6/email/message.py", line 315, in set_payload
    payload = payload.encode(charset.output_charset)
UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 269: ordinal not in range(128)
Dec 09 13:21:04 2019 (3090) 127.0.0.1 - - "GET /3.0/lists/tsc-devel@lists.secretchronicles.org/held?count=10&page=1 HTTP/1.1" 500 59
-- Blog: https://mg.guelker.eu
Marvin Gülker writes:
I've accidentally sent a message from an unregistered address to one of my lists. Mailman correctly held the message back for moderation, but when I go to the page for held messages in Postorius, I receive a 500 error. I've included an excerpt from the log below. To me, it looks as if it errors on processing my last name (which has a non-ASCII character in it).
You're the "it's 2019, let's use SMTP UTF8 everywhere" guy, right? If you're right about the umlaut being the trigger, the first thing I'd look at is a feature-negotiation problem in the MTA delivering to Mailman. I'm pretty sure our LMTP does not offer the SMTP UTF8 feature.
Ideas? And how do I discard the message without access to the held messages page?
I would guess this is in the shunt queue. If there's only one file there, you can just delete it.
To get more information about the content of posts, use "mailman qfiles".
UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 269: ordinal not in range(128)
Assuming you're right about your name being the trigger, U+FFFD REPLACEMENT CHARACTER is not in your name. So the non-ASCII character was replaced *on the way in*, and on the way out Mailman assumes ASCII in the header because Mailman doesn't know how to negotiate SMTP UTF8, and SMTP UTF8 specifies UTF8 in a context where there is no MIME way to indicate it to Mailman. So Mailman has no way of knowing it needs to encode to anything but ASCII (as specified by the Ancient Wisdom of RFC 5322).
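The mechanism described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the surname arrived as Latin-1 bytes and was decoded as UTF-8 with replacement somewhere upstream; the thread does not establish where the U+FFFD actually came from.

```python
# Hypothetical input: the surname encoded as Latin-1, which is not valid UTF-8.
raw = "G\u00fclker".encode("latin-1")

# Decoding with errors="replace" swaps the undecodable byte for U+FFFD.
decoded = raw.decode("utf-8", errors="replace")
print(repr(decoded))

# Encoding the result as ASCII then fails in the same way as in the log.
try:
    decoded.encode("ascii")
except UnicodeEncodeError as error:
    print(error)
```

The point is only that U+FFFD marks data that was already damaged before the failing encode step ran.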
I'll put SMTP UTF8 support as a feature request in the tracker. I'm not sure how difficult this will be; it presumably involves RFC something (6532 or so?) internationalization support etc. and may not be trivial. So no short term promises, just medium term aspirations (next year's GSoC task list, for example).
Steve
Hi,
On 10 December 2019 at 17:09 +0900, Stephen J. Turnbull wrote:
You're the "it's 2019, let's use SMTP UTF8 everywhere" guy, right?
No, I'm not that guy. I've not said anything like that, nor have I knowingly configured Postfix to use that. I've looked up how to do it now that you mention it, and found the smtputf8_enable parameter[1]. I have not used it, but it turns out that it is enabled by default. I can turn it off and see if that makes any difference in the future. Testing with telnet, Postfix indeed announced "250 SMTPUTF8" on EHLO.
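For anyone who wants to repeat the EHLO check without reading the telnet output by eye, here is a small sketch. The helper name and the sample reply text are made up for illustration; only the "250 SMTPUTF8" keyword comes from the test described above.

```python
def advertises_smtputf8(ehlo_reply):
    """Return True if a raw EHLO reply lists the SMTPUTF8 extension."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD params" or "250 KEYWORD" (last line).
        if len(line) < 4 or not line.startswith("250"):
            continue
        words = line[4:].split()
        if words and words[0].upper() == "SMTPUTF8":
            return True
    return False


# Hypothetical reply text, similar to what a telnet session shows.
sample_reply = "250-mail.example.org\n250-PIPELINING\n250-8BITMIME\n250 SMTPUTF8"
print(advertises_smtputf8(sample_reply))
```

Against a live server, Python's smtplib offers the same check through SMTP.ehlo() followed by SMTP.has_extn("smtputf8").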
If you're right about the umlaut being the trigger, the first thing I'd look at is a feature-negotiation problem in the MTA delivering to Mailman. I'm pretty sure our LMTP does not offer the SMTP UTF8 feature.
In that case the problem should go away with disabling SMTPUTF8, I suppose?
I would guess this is in the shunt queue. If there's only one file there, you can just delete it.
There was a whole bunch of files in that directory. I've taken a look at several of them, and as they were all spam, I've taken the liberty of deleting them all (this list is so low volume that I know all the senders personally anyway). However, even after restarting Mailman, the 500 error persists when visiting the held messages page in Postorius.
I've also looked into all the other queue directories, they were all empty.
Assuming you're right about your name being the trigger, U+FFFD REPLACEMENT CHARACTER is not in your name.
I confess that I haven't visited the held messages page for a very long time (because I didn't have a reason to; I received no notifications of held messages). Only after I received the notification that my erroneous message was being held did I visit it, and I was faced with the 500 error. If Mailman failed to notify me of held messages previously, they might have been in hold status already and one of them might be the cause.
Marvin
PS: Sorry for the missing link [1] in my last mail. Here it is: https://gitlab.com/mailman/hyperkitty/issues/201
-- Blog: https://mg.guelker.eu
On 12/10/19 5:35 AM, Marvin Gülker wrote:
On 10 December 2019 at 17:09 +0900, Stephen J. Turnbull wrote:
I would guess this is in the shunt queue. If there's only one file there, you can just delete it.
The issue in this case is not a shunted message. It is a message held for moderation. The message itself is in Mailman's var/messages/ directory. It is pointed to by an entry in the database 'message' table.
The var/messages/ directory contains two levels of subdirectories and the message files. The file name is the message-id hash. The first subdirectory is the first two characters of the name and the second subdirectory is the next two characters of the name. E.g. a message with message-id hash = UPZXRAAO6Q7HDDJR7LH5NAYTIX57C32X will be stored as a pickle in the file var/messages/UP/ZX/UPZXRAAO6Q7HDDJR7LH5NAYTIX57C32X.
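The directory layout just described can be written down as a short Python sketch. The function name is made up for illustration; the hash and the resulting path are the example given above.

```python
import posixpath


def message_store_path(messages_dir, message_id_hash):
    """Build the file path for a stored message from its Message-ID hash."""
    return posixpath.join(
        messages_dir,
        message_id_hash[:2],   # first subdirectory: characters 1-2 of the hash
        message_id_hash[2:4],  # second subdirectory: characters 3-4 of the hash
        message_id_hash,       # file name: the full hash
    )


print(message_store_path("var/messages", "UPZXRAAO6Q7HDDJR7LH5NAYTIX57C32X"))
```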
The underlying issue here is probably <https://bugs.python.org/issue32330>, but that knowledge doesn't help us. We need to get rid of the offending message in the message store. The first step is in mailman shell. Note the paths /opt/mailman/mm/bin/ and /opt/mailman/mm/var/ will need to be adjusted.
$ /opt/mailman/mm/bin/mailman shell
Welcome to the GNU Mailman shell
>>> import os
>>> import pickle
>>> for root, dirs, files in os.walk('/opt/mailman/mm/var/messages'):
...     for fn in files:
...         print(fn)
...         with open(os.path.join(root, fn), 'rb') as fp:
...             msg = pickle.load(fp)
...         print(msg['Message-ID-Hash'])
...
This should print the file name and Message-ID-Hash (two identical lines) for each message in the messages directory until it prints only one line followed by the UnicodeEncodeError exception. That last line is the file name/Message-ID-Hash of the offending message.
Then you need to run the query
DELETE FROM message WHERE message_id_hash = 'xxx';
against the mailman database where xxx is the hash value you found above.
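If you prefer to run the query from Python rather than from a database client, a minimal sketch could look like the following. It uses an in-memory SQLite database with a simplified stand-in for the message table; the column types, the sample hashes, and the use of SQLite are assumptions for illustration, and a real installation would connect to whatever database Mailman is configured to use.

```python
import sqlite3

# Stand-in database with a simplified, assumed schema for the message table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, message_id TEXT, message_id_hash TEXT)"
)
conn.executemany(
    "INSERT INTO message (message_id, message_id_hash) VALUES (?, ?)",
    [("<keep@example.com>", "HASHTOKEEP"), ("<bad@example.com>", "HASHTODELETE")],
)

# Pass the hash as a bound parameter so that it is quoted correctly.
offending_hash = "HASHTODELETE"
with conn:  # commits on success, rolls back on error
    deleted = conn.execute(
        "DELETE FROM message WHERE message_id_hash = ?", (offending_hash,)
    ).rowcount

remaining = [row[0] for row in conn.execute("SELECT message_id_hash FROM message")]
print(deleted, remaining)
```

Checking the number of deleted rows before committing is a cheap safeguard against a mistyped hash.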
I confess that I haven't visited the held messages page for a very long time (because I didn't have a reason to; I received no notifications of held messages). Only after I received the notification that my erroneous message was being held did I visit it, and I was faced with the 500 error. If Mailman failed to notify me of held messages previously, they might have been in hold status already and one of them might be the cause.
As of Mailman core 3.3.0 there is a 'mailman notify' command designed to be run periodically by cron to notify owners/moderators of pending requests.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On 12/10/19 10:25 AM, Mark Sapiro wrote:
/opt/mailman/mm/bin/mailman shell
Welcome to the GNU Mailman shell
>>> import os
>>> import pickle
>>> for root, dirs, files in os.walk('/opt/mailman/mm/var/messages'):
...     for fn in files:
...         print(fn)
...         with open(os.path.join(root, fn), 'rb') as fp:
...             msg = pickle.load(fp)
...         print(msg['Message-ID-Hash'])
...
The above is not visible in the HyperKitty archive in my prior reply in this thread. Trying again with markdown block quote to see if that works.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi,
On 10 December 2019 at 10:25 -0800, Mark Sapiro wrote:
The underlying issue here is probably <https://bugs.python.org/issue32330>, but that knowledge doesn't help us. We need to get rid of the offending message in the message store. The first step is in mailman shell. Note the paths /opt/mailman/mm/bin/ and /opt/mailman/mm/var/ will need to be adjusted.
Thanks for this information. Ubuntu's mailman stores the messages below /var/lib/mailman3/messages. The "mailman" command is in the $PATH.
This should print the file name and Message-ID-Hash (two identical lines) for each message in the messages directory until it prints only one line followed by the UnicodeEncodeError exception. That last line is the file name/Message-ID-Hash of the offending message.
I've executed that code, but it does not cause an exception at all. It quietly runs through. I've attached the transcript of the shell session to this mail. With a little crude use of grep(1), however, I found my held message in WQ/ZS/WQZSFFJLVMWQHMG4VYKGXJEKQVQXVK67.
Does the messages directory contain only held messages? I've looked at some messages and all of these I looked at were spam. The directory structure contains 40 messages in total, accumulating to 480 KiB disk space.
If required, I can zip that directory up and e-mail it to you for inspection.
DELETE FROM message WHERE message_id_hash = 'xxx';
As the exception you expected did not occur, I didn't do this yet. I can make a database and directory backup and then run it, though. Should I try?
I assume after this I should also delete the respective files from the messages/ directory?
As of Mailman core 3.3.0 there is a 'mailman notify' command designed to be run periodically by cron to notify owners/moderators of pending requests.
I did receive an e-mail to the list owner address that told me that the e-mail from my incorrect e-mail address "requires approval". That was the first time I received such an e-mail. In parallel, on the incorrect e-mail address, I received the notification for moderator approval.
Marvin
-- Blog: https://mg.guelker.eu
On 12/11/19 12:43 AM, Marvin Gülker wrote:
Thanks for this information. Ubuntu's mailman stores the messages below /var/lib/mailman3/messages. The "mailman" command is in the $PATH.
This should print the file name and Message-ID-Hash (two identical lines) for each message in the messages directory until it prints only one line followed by the UnicodeEncodeError exception. That last line is the file name/Message-ID-Hash of the offending message.
I've executed that code, but it does not cause an exception at all. It quietly runs through. I've attached the transcript of the shell session to this mail. With a little crude use of grep(1), however, I found my held message in WQ/ZS/WQZSFFJLVMWQHMG4VYKGXJEKQVQXVK67.
Here's a revised script:
$ mailman shell
Welcome to the GNU Mailman shell
>>> import os
>>> import pickle
>>> for root, dirs, files in os.walk('/var/lib/mailman3/messages'):
...     for fn in files:
...         print(fn)
...         with open(os.path.join(root, fn), 'rb') as fp:
...             msg = pickle.load(fp)
...         print(msg['Message-ID'])
...         x = msg.as_string()
...
I have adjusted the paths for your package, changed the second print to print the message-id, and added the
x = msg.as_string()
line. Now we expect to see the hash of each message-id followed by the message-id, followed by the exception for the offending message.
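Because the loop above stops at the first message that fails, a variant that keeps going and collects every failure may be handy when more than one stored message is broken. The following is a sketch under that assumption; the function name is made up, and it catches any exception from as_string() rather than only UnicodeEncodeError. As with the script above, only unpickle files you trust.

```python
import os
import pickle


def find_unflattenable(messages_dir):
    """Return (file name, Message-ID, error) for each stored message
    whose as_string() call raises an exception."""
    offenders = []
    for root, _dirs, files in os.walk(messages_dir):
        for fn in files:
            with open(os.path.join(root, fn), 'rb') as fp:
                msg = pickle.load(fp)
            try:
                msg.as_string()
            except Exception as error:
                # Record the failure and carry on with the next file.
                offenders.append((fn, msg['Message-ID'], repr(error)))
    return offenders
```

Calling find_unflattenable('/var/lib/mailman3/messages') and printing the result would then list every offending file name together with its message-id.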
Does the messages directory contain only held messages? I've looked at some messages and all of these I looked at were spam. The directory structure contains 40 messages in total, accumulating to 480 KiB disk space.
Yes, the directory and the corresponding database table contain only messages held for moderator approval.
DELETE FROM message WHERE message_id_hash = 'xxx';
As the exception you expected did not occur, I didn't do this yet. I can make a database and directory backup and then run it, though. Should I try?
I assume after this I should also delete the respective files from the messages/ directory?
Actually, a better way to deal with this is in mailman shell
Welcome to the GNU Mailman shell
>>> ms = getUtility(IMessageStore)
>>> ms.delete_message('xxx')
>>>
where xxx is the message-id value printed from the above script.
As of Mailman core 3.3.0 there is a 'mailman notify' command designed to be run periodically by cron to notify owners/moderators of pending requests.
I did receive an e-mail to the list owner address that told me that the e-mail from my incorrect e-mail address "requires approval". That was the first time I received such an e-mail. In parallel, on the incorrect e-mail address, I received the notification for moderator approval.
Yes, those are the only messages unless you are running mailman notify. It seems, however, that the latest Debian (hence Ubuntu) package doesn't have 3.3 yet.
As to why you didn't receive owner notices about the held spam, perhaps they were spam filtered or somehow otherwise lost.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi,
On 11 December 2019 at 09:55 -0800, Mark Sapiro wrote:
Here's a revised script: [...] Now we expect to see the hash of each message-id followed by the message-id followed by the exception for the offending message.
Success! This time it crashes with the expected exception. I've attached the shell transcript.
As expected, it's a spam message. I've uploaded the offending file here for you to look at: <https://ftp.secretchronicles.org/misc/Q5PQZI4QEE4C42XNZMXESSKDCSZBUT4T> This way I don't risk getting caught by spam filters. I'm going to take it down again soon, though; I don't want to distribute spam messages.
Now that the problem is identified, I'm going to purge it from both database and file system. Thank you for your help. If you need any further information for debugging the problem, please tell me.
As to why you didn't receive owner notices about the held spam, perhaps they were spam filtered or somehow otherwise lost.
Maybe. I'll ensure I update quickly when the new Ubuntu LTS comes out to get that functionality. Installing Mailman not via the repos is not really an option for me; this list is for a hobby project and should run with as low maintenance effort as possible (i.e. automatic updates).
Thanks a lot. Marvin
-- Blog: https://mg.guelker.eu
On 12/11/19 11:40 AM, Marvin Gülker wrote:
Success! This time it crashes with the expected exception. I've attached the shell transcript.
Good.
As expected, it's a spam message. I've uploaded the offending file here for you to look at: <https://ftp.secretchronicles.org/misc/Q5PQZI4QEE4C42XNZMXESSKDCSZBUT4T>
I get a Forbidden status on that URL.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On 11 December 2019 at 11:48 -0800, Mark Sapiro wrote:
I get a Forbidden status on that URL.
Oops. cp copied the file permissions and I didn't check. Should work now.
-- Blog: https://mg.guelker.eu
On 12/11/19 12:53 PM, Marvin Gülker wrote:
On 11 December 2019 at 11:48 -0800, Mark Sapiro wrote:
I get a Forbidden status on that URL.
Oops. cp copied the file permissions and I didn't check. Should work now.
OK. I have verified that this is due to <https://bugs.python.org/issue32330> and has actually been reported as <https://gitlab.com/mailman/mailman/issues/441> and worked around in Mailman by <https://gitlab.com/mailman/mailman/merge_requests/350> which is included in Mailman 3.2.0 and up.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi,
On 11 December 2019 at 13:49 -0800, Mark Sapiro wrote:
OK. I have verified that this is due to <https://bugs.python.org/issue32330> and has actually been reported as <https://gitlab.com/mailman/mailman/issues/441> and worked around in Mailman by <https://gitlab.com/mailman/mailman/merge_requests/350> which is included in Mailman 3.2.0 and up.
Thanks, and sorry for bugging you about old versions then. I'm going to forward the problem to Ubuntu's bugtracker then.
But I now have a different problem. I've removed all offending messages (there was more than one that caused the UnicodeEncodeError), but I still can't access the page for the held messages queue. It gives an error 500, now with a different exception in the log. I've included it below. It looks to me as if the messages haven't been properly deleted.
What I did was this: for each offending message hash
- I ran the DELETE query
- I deleted the file from the file system
- I ran the delete_message() function
I originally wanted to run only the delete_message() function, but a quick check on the DB and the file system after running it for one hash told me that it neither deletes the message from the DB nor from the file system. So I did everything at once. I do have a backup of both the DB and the filesystem from before I started this, in case this broke something.
Is there something left that I have to do now?
Here's the exception:
Dec 12 08:34:30 2019 (9570) REST request handler error:
Traceback (most recent call last):
  File "/usr/lib/python3.6/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/lib/python3/dist-packages/mailman/database/transaction.py", line 50, in wrapper
    rtn = function(*args, **kws)
  File "/usr/lib/python3/dist-packages/mailman/rest/wsgiapp.py", line 214, in __call__
    return super().__call__(environ, start_response)
  File "falcon/api.py", line 215, in falcon.api.API.__call__ (falcon/api.c:2872)
  File "falcon/api.py", line 189, in falcon.api.API.__call__ (falcon/api.c:2419)
  File "/usr/lib/python3/dist-packages/mailman/rest/post_moderation.py", line 167, in on_get
    resource = self._make_collection(request)
  File "/usr/lib/python3/dist-packages/mailman/rest/helpers.py", line 159, in _make_collection
    for resource in collection]
  File "/usr/lib/python3/dist-packages/mailman/rest/helpers.py", line 159, in <listcomp>
    for resource in collection]
  File "/usr/lib/python3/dist-packages/mailman/rest/post_moderation.py", line 157, in _resource_as_dict
    resource = self._make_resource(request.id)
  File "/usr/lib/python3/dist-packages/mailman/rest/post_moderation.py", line 78, in _make_resource
    resource['msg'] = msg.as_string()
AttributeError: 'NoneType' object has no attribute 'as_string'
-- Blog: https://mg.guelker.eu
On 12/12/19 12:42 AM, Marvin Gülker wrote:
But I now have a different problem. I've removed all offending messages (there was more than one that caused the UnicodeEncodeError), but I still can't access the page for the held messages queue. It gives an error 500, now with a different exception in the log. I've included it below. It looks to me as if the messages haven't been properly deleted.
They have, but I apologize - more is required.
What I did was this: for each offending message hash
- I ran the DELETE query
- I deleted the file from the file system
- I ran the delete_message() function
I originally wanted to run only the delete_message() function, but a quick check on the DB and the file system after running it for one hash told me that it neither deletes the message from the DB nor from the file system.
The argument for delete_message() is not the hash, it is the <message-id> including the angle brackets.
So I did everything at once. I do have a backup of both the DB and the filesystem from before I started this, in case this broke something.
Is there something left that I have to do now?
Yes, my bad for not realizing this.
There are still references to these messages in the _request table. The columns in that table are
id | key | request_type | data_hash | mailing_list_id
key is the <message-id> of the request message
request_type is 1 for a held message
mailing_list_id is a numeric index into the mailinglist table
The issue is there are still held message requests there for the messages you removed. You need to delete those entries from the _request table.
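As a sketch of what such a cleanup could look like, the following uses an in-memory SQLite database with simplified stand-ins for the two tables. The column types, the message_id column in the message table, and the sample rows are assumptions for illustration. It removes only those held-message requests whose message is no longer in the message table, which is narrower than clearing the whole queue.

```python
import sqlite3

# Stand-in database with simplified, assumed schemas for both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, message_id TEXT);
    CREATE TABLE _request (
        id INTEGER PRIMARY KEY, "key" TEXT, request_type INTEGER,
        data_hash TEXT, mailing_list_id INTEGER
    );
    INSERT INTO message (message_id) VALUES ('<still-here@example.com>');
    INSERT INTO _request ("key", request_type, mailing_list_id) VALUES
        ('<still-here@example.com>', 1, 1),
        ('<removed@example.com>', 1, 1);
""")

# Delete held-message requests (request_type 1) that point at a message
# which has already been removed from the message table.
with conn:  # commits on success, rolls back on error
    deleted = conn.execute(
        'DELETE FROM _request WHERE request_type = 1 '
        'AND "key" NOT IN (SELECT message_id FROM message)'
    ).rowcount

remaining = [row[0] for row in conn.execute('SELECT "key" FROM _request')]
print(deleted, remaining)
```

Take a database backup first, as was done earlier in this thread, before running anything like this against a live installation.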
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi,
On 13 December 2019 at 17:10 -0800, Mark Sapiro wrote:
key is the <message-id> of the request message
request_type is 1 for a held message
mailing_list_id is a numeric index into the mailinglist table
The issue is there are still held message requests there for the messages you removed. You need to delete those entries from the _request table.
Thank you again for your valuable help. I've now taken the easy route and run
DELETE FROM _request WHERE request_type=1;
i.e. I cleared the entire held messages queue. As it turned out, there were no requests of any type other than type 1. I've deleted all remaining files below /var/lib/mailman3/messages as well (after seeing it was only spam anyway, this time non-broken spam, but still spam).
Now, finally, I can go to the held messages queue in Postorius again without a 500 error. The list of held messages there is now empty, but this is obviously expected. Let's hope that disabling SMTPUTF8 prevents the problem from occurring again. In any case, I've asked a friend to report the problem to the Ubuntu tracker: https://bugs.launchpad.net/ubuntu/+source/mailman3/+bug/1856181
I'm back on track now. Thanks everyone!
Marvin
-- Blog: https://mg.guelker.eu
Marvin Gülker writes:
On 10 December 2019 at 17:09 +0900, Stephen J. Turnbull wrote:
You're the "it's 2019, let's use SMTP UTF8 everywhere" guy, right?
No, I'm not that guy.
I'm sorry about that. I knew I didn't know for sure, but not being able to get at the moderation queue seemed urgent enough to be worth the chance of being wrong.
I have not used it, but it turns out that it is enabled by default. I can turn it off and see if that makes any difference in the future. Testing with telnet, Postfix indeed announced "250 SMTPUTF8" on EHLO.
If you're right about the umlaut being the trigger, the first thing I'd look at is a feature-negotiation problem in the MTA delivering to Mailman. I'm pretty sure our LMTP does not offer the SMTP UTF8 feature.
In that case the problem should go away with disabling SMTPUTF8, I suppose?
Well, since it's predicated on a bug in Postfix, I wouldn't bet on it, especially since Postfix probably dominates Mailman 3 installations. But it's cheap to test, and since you don't explicitly want SMTP UTF8 at this point, little harm in turning it off. *If* there is a bug, that will prevent the problem in the future, but it wouldn't fix the problem in the held messages queue.
I would guess this is in the shunt queue. If there's only one file there, you can just delete it.
This was a bad guess. I'm not sure why it isn't in the shunt queue (it shouldn't have been able to escape into the main rule chains with that bad breakage in the header), but Mailman doesn't try to moderate shunted messages, that's the whole point of the shunt queue -- they're out of the way.
There was a whole bunch of files in that directory. I've taken a look at several of them, and as they were all spam, I've taken the liberty of deleting them all (this list is so low volume that I know all the senders personally anyway).
A tiny bit of good came of it, at least.
However, even after restarting Mailman, the 500 error persists when visiting the held messages page in Postorius.
I've also looked into all the other queue directories, they were all empty.
I guess it's in the MessageStore, then, which is by default in /var/tmp/mailman/messages/. (Defined in schema.cfg.) I guess it's a standard qfile that you can examine with "mailman qfiles /path/to/file". (I've never actually looked at a "raw" held message file in Mailman 3, haven't had a failure of this type.) If not, you can unpickle it with Python (if that makes no sense to you, ask; I'm running out of steam and need sleep).
Assuming you're right about your name being the trigger, U+FFFD REPLACEMENT CHARACTER is not in your name.
I'm starting to wonder about this. It's still possible, but if you don't deliberately use SMTP UTF8 then your mail client probably doesn't either, and equally possibly it's a different message causing the problem (something spammy).
On 11 December 2019 at 03:26 +0900, Stephen J. Turnbull wrote:
Well, since it's predicated on a bug in Postfix, I wouldn't bet on it, especially since Postfix probably dominates Mailman 3 installations. But it's cheap to test, and since you don't explicitly want SMTP UTF8 at this point, little harm in turning it off. *If* there is a bug, that will prevent the problem in the future, but it wouldn't fix the problem in the held messages queue.
I've now set "smtputf8_enable" to "no". Let's hope this prevents further problems like this.
For the rest, I've followed up with Mark in the other post. The "var_dir" is set to "/var/lib/mailman3" for me.
I'm starting to wonder about this. It's still possible, but if you don't deliberately use SMTP UTF8 then your mail client probably doesn't either, and equally possibly it's a different message causing the problem (something spammy).
Given the amount of spam messages I found in /var/lib/mailman3, this is not at all unlikely.
Marvin
-- Blog: https://mg.guelker.eu
Participants (3): Mark Sapiro, Marvin Gülker, Stephen J. Turnbull