Stephen J Turnbull wrote:

> On 5/6/23 11:27, Dave Hall via Mailman-users wrote:
>
> > django.db.utils.OperationalError: (1366, "Incorrect string value: '\xE1\x90\xA7\x0A\x0AO...' for column mailman3web.hyperkitty_email.content at row 1")
> >
> > Is there a known cause for this?
>
> Almost always nonconforming user clients. You write "frequently", but I'm guessing it's maybe 1% or 2% of all the messages, right? I suspect it's a particular author or institution using such a client.
>
> > Is it fixed in some release more recent than 0+20180916?
>
> This is almost surely not something we can fix. Based on the error, it's in third party code (Django) that we import. It's possible we could provide a more explanatory message, but the only thing we can sensibly do is hold the message for an admin to edit it by hand. As long as the text in question and the Content-Type header aren't too sensitive, we can help with identifying charset (often UTF-8 as Mark suggests but why someone would be encoding FINAL MIDDLE DOT is a mystery to me).
>
> On 2023-05-07 07:39 Mark Sapiro writes:
>
> > I'm not sure what's going on here. My first thought is the database is MySQL or MariaDB and this is a 4-byte UTF-8 encoding and the database column charset definition is utf8 and not utf8mb4, but '\xE1\x90\xA7\x0A\x0A' is the 3-byte UTF-8 encoding for CANADIAN SYLLABICS FINAL MIDDLE DOT ('\xE1\x90\xA7') followed by two newlines. so I'm not sure what the issue is.
>
> I'm guessing home-made client, possibly Asian or a spammer, which provides no content-type header. Even today I see raw 8-bit encoded text without an encoding spec from legitimate Japanese and Chinese sources, and of course with spammers all bets are off -- they might even do it deliberately to crash filters.
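
Mark's byte-level reading of the error string can be checked with a few lines of Python. This is only a sketch: the helper name `describe_bytes` is made up for illustration, and the sample is just the five bytes visible in the error message before the truncation.

```python
import unicodedata

def describe_bytes(raw):
    """Decode raw as UTF-8; return (code point, name, UTF-8 length) per character."""
    rows = []
    for ch in raw.decode("utf-8"):
        # Control characters such as newline have no Unicode name, so pass a default.
        name = unicodedata.name(ch, "<unnamed control character>")
        rows.append((f"U+{ord(ch):04X}", name, len(ch.encode("utf-8"))))
    return rows

# The bytes shown in the OperationalError, up to the point where it is truncated.
sample = b"\xe1\x90\xa7\x0a\x0a"
for row in describe_bytes(sample):
    print(row)
```

The first row confirms a 3-byte sequence, so on its face this character should fit in a 3-byte utf8 column.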
I have located an email with a similar error in a V2.1 .mbox file. The post was by a faculty member in my organization using Thunderbird 60.2.1 on a Mac - OSX 10.13. The list was used to correspond with students taking a programming course.
The message body is multi-part MIME with two sections:
This is a multi-part message in MIME format.
--------------4BAB6CD38F4927471D366565
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
and
--------------4BAB6CD38F4927471D366565
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: 8bit
The hexadecimal string \xEF\xBB\xBF\xEF\xBB\xBF appears in one place in each section.
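
Decoding those six bytes in Python shows which characters they are, though not why the database would reject them. The sketch below also includes a small check for characters that need four bytes in UTF-8, which are the ones a 3-byte utf8 column in MySQL/MariaDB cannot hold. The function name `needs_utf8mb4` is hypothetical, and the emoji is only a test value, not something taken from the message.

```python
import unicodedata

def needs_utf8mb4(text):
    """Return code points above U+FFFF, i.e. those that take 4 bytes in UTF-8."""
    return [f"U+{ord(c):04X}" for c in text if ord(c) > 0xFFFF]

# The byte string found in each MIME section of the message.
bom_pair = b"\xef\xbb\xbf\xef\xbb\xbf".decode("utf-8")
names = [unicodedata.name(c) for c in bom_pair]
print(names)

# An empty list means nothing here needs more than 3 bytes per character.
print(needs_utf8mb4(bom_pair))

# For contrast, a character outside the Basic Multilingual Plane (test value only).
print(needs_utf8mb4("ok \U0001F600"))
```

Each EF BB BF triple is U+FEFF, the character commonly used as a byte order mark, and like the one in the error above it is only three bytes long.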
I'm not sure exactly what this means. I did check my MariaDB (as Mark hinted) and found that it is set to utf8 rather than utf8mb4. I'd be willing to change this, but would I also have to do something to update the Mailman databases, since those tables have already been created?
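
For concreteness, converting existing tables would presumably mean statements of the following shape. This is an untested sketch: `ALTER TABLE ... CONVERT TO CHARACTER SET` is standard MySQL/MariaDB syntax, but the helper name, the choice of collation, and the table list are assumptions. Only `hyperkitty_email` is named in the error above, and nothing here has been run against a Mailman database.

```python
def conversion_statements(tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build one ALTER TABLE statement per table name (assumed, not verified, table list)."""
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        for table in tables
    ]

# Only the table named in the OperationalError; other tables would need listing too.
for statement in conversion_statements(["hyperkitty_email"]):
    print(statement)
```

A backup beforehand seems prudent, and the Django database settings would presumably also need to request utf8mb4 for the connection.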
Thanks.
-Dave