On 2023-05-07 07:39 Mark Sapiro writes:
On 5/6/23 11:27, Dave Hall via Mailman-users wrote:
django.db.utils.OperationalError: (1366, "Incorrect string value: '\\xE1\\x90\\xA7\\x0A\\x0AO...' for column `mailman3web`.`hyperkitty_email`.`content` at row 1")
Is there a known cause for this?
Almost always nonconforming user clients. You write "frequently", but I'm guessing it's maybe 1% or 2% of all the messages, right? I suspect it's a particular author or institution using such a client.
Is it fixed in some release more recent than 0+20180916?
This is almost surely not something we can fix. Based on the error, it's in third-party code (Django) that we import. It's possible we could provide a more explanatory message, but the only thing we can sensibly do is hold the message for an admin to edit it by hand. As long as the text in question and the Content-Type header aren't too sensitive, we can help with identifying the charset (often UTF-8, as Mark suggests, but why someone would be encoding FINAL MIDDLE DOT is a mystery to me).
I'm not sure what's going on here. My first thought was that the database is MySQL or MariaDB, the string contains a 4-byte UTF-8 encoding, and the database column charset definition is utf8 and not utf8mb4. But '\\xE1\\x90\\xA7\\x0A\\x0A' is the 3-byte UTF-8 encoding for CANADIAN SYLLABICS FINAL MIDDLE DOT ('\\xE1\\x90\\xA7') followed by two newlines, so I'm not sure what the issue is.
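For anyone who wants to check the byte analysis above, here is a minimal Python sketch. It assumes only the five bytes quoted in the error message (the trailing "O..." is truncated in the error and is left out), and the variable names are purely illustrative.

```python
import unicodedata

# Bytes quoted in the error message; the truncated "O..." tail is omitted.
raw = b"\xE1\x90\xA7\x0A\x0A"

# Decode as UTF-8 and look at the first character.
text = raw.decode("utf-8")
first = text[0]

char_name = unicodedata.name(first)          # Unicode character name
code_point = f"U+{ord(first):04X}"           # code point in U+XXXX form
utf8_length = len(first.encode("utf-8"))     # bytes needed in UTF-8

print(char_name, code_point, utf8_length)
```

A 3-byte sequence like this one fits in MySQL's 3-byte utf8 charset, which is why the utf8 versus utf8mb4 explanation does not obviously apply to these particular bytes.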
I'm guessing a home-made client, possibly Asian or a spammer, which provides no Content-Type header. Even today I see raw 8-bit encoded text without an encoding spec from legitimate Japanese and Chinese sources, and of course with spammers all bets are off -- they might even do it deliberately to crash filters.
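If you want to look for such messages yourself, the following is a rough sketch using only Python's standard email package. It is not anything Mailman or HyperKitty does; the function name has_undeclared_8bit is made up for illustration, and it assumes you have the raw message bytes in hand (for example from an mbox or a held message).

```python
from email import message_from_bytes, policy


def has_undeclared_8bit(raw_message: bytes) -> bool:
    """Return True if any text part contains 8-bit bytes but declares no charset.

    Hypothetical helper for illustration only.
    """
    msg = message_from_bytes(raw_message, policy=policy.compat32)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body text of their own
        if part.get_content_maintype() != "text":
            continue  # only text parts are of interest here
        if part.get_content_charset() is not None:
            continue  # a charset is declared, so the sender told us the encoding
        payload = part.get_payload(decode=True) or b""
        if any(byte > 0x7F for byte in payload):
            return True
    return False


# Usage example with a message that has no Content-Type header at all.
sample = b"From: a@example.com\nSubject: test\n\n\xe1\x90\xa7\n\nOK\n"
print(has_undeclared_8bit(sample))
```

A True result only says the sender did not declare an encoding; it does not say which charset the bytes are actually in, so identifying that still needs a human look at the text.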
-- University of Tsukuba Faculty of Policy and Planning Sciences Tennodai 1-1-1, Tsukuba 305-8573 JAPAN tel/fax: +81-29-853-5091 turnbull@sk.tsukuba.ac.jp https://turnbull.sk.tsukuba.ac.jp/