[MM3-users] Re: import21: A string literal cannot contain NUL (0x00) characters.

Feb. 13, 2018


      Henrik Rasmussen writes:
...
It seems that this issue is still unsolved. Is there any estimated
date for a fix? https://gitlab.com/mailman/hyperkitty/issues/155 .
Problem-oriented executive summary:
Why do you think this is a problem that Mailman should solve?
What is your desired treatment of this data?
tl;dr for Unicode wonks and wonkabees:
My personal take is similar to Simon's.
Unicode NULs are rather dangerous, because any time you convert to an
ASCII-compatible encoding and store it where a C routine can get at
it, you risk data corruption when a C string function decides that the
NUL is end-of-string, and ignores the tail of the string.  To deal
with NULs embedded in an array of C chars, you have to forgo the
whole suite of str*[1], *printf, etc. stdlib functions and either (A)
dive down to mem* functions, keeping accurate char[] lengths on the
side, or (B) create a complete suite of array-handling functions that
deal with this for you.
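
The truncation hazard can be seen without writing any C. The following is a minimal sketch using Python's standard ctypes module, which exposes both the NUL-terminated view of a buffer and the full-length view. The variable names are illustrative only.

```python
import ctypes

data = b"abc\x00def"          # 7 bytes with an embedded NUL

# Python bytes objects carry their own length, so nothing is lost here.
python_len = len(data)

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(data)

# .value reads up to the first NUL, as a str*-style C function would.
c_view = buf.value

# .raw returns the whole buffer; slicing by a length kept "on the side"
# is the analogue of approach (A) with the mem* functions.
raw_view = buf.raw[:python_len]

print(python_len, c_view, raw_view)
```

The tail after the NUL is still in memory, but anything that relies on NUL termination never sees it.
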
I agree with Simon's guess that you probably have corrupted data or
metadata that you're passing to HyperKitty.  (By "corrupted metadata"
I mean that you've got binary data labelled with "text", or that you
have text encoded in something like UTF-16 that normally includes 0x00
*bytes* as a component of non-NUL *characters*, but labelled with an
ASCII-compatible encoding such as ISO 8859-1.  The latter is a common
hack to insert binary data, or "any-encoded" text, into text
databases.)  My guess rests on the long tradition of avoiding NULs in
string data, which comes from both their standard interpretation in
ASCII (memory padding which is to be *ignored*[2]) and the danger that
they pose to C programs that use the C stdlib to handle "character"
data.
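
To illustrate the mislabelling case, here is a small sketch of how ordinary text encoded as UTF-16 turns into a string full of NUL characters once it is decoded with an ISO 8859-1 label. The sample text is made up for the example.

```python
text = "Hi"

# UTF-16 (little-endian, no BOM) stores each of these characters as two
# bytes, the second of which is 0x00.
utf16_bytes = text.encode("utf-16-le")      # b'H\x00i\x00'

# Mislabelled as ISO 8859-1, every byte becomes one character, so the
# 0x00 bytes turn into NUL characters in the decoded string.
mislabelled = utf16_bytes.decode("iso-8859-1")
nul_count = mislabelled.count("\x00")

# Decoding with the correct label recovers the original text, with no NULs.
recovered = utf16_bytes.decode("utf-16-le")

print(repr(mislabelled), nul_count, repr(recovered))
```
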
It's true that both the Mailman 3 suite and Django are written in
Python, which has implemented NUL-handling strategy (B) for us.
However, the Mailman suite eventually calls into a few C (or C++)
libraries, especially the database backends, as in this issue.  We can't do
anything about those backends, of course.  Filtering the incoming data
is something we could do, I guess, but as Simon points out that would
be pretty expensive, and filtering erroneous NULs silently would
likely result in inserting corrupt data in the databases.
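
As an alternative to silent filtering, a site could check its own data before importing it. The sketch below only reports which fields contain NULs, so the source of the bad data can be tracked down. The function name and the dictionary layout are hypothetical and are not part of Mailman or HyperKitty.

```python
def find_nul_fields(fields):
    """Return the sorted names of fields whose text contains a NUL.

    `fields` is a hypothetical mapping of field name to str, for example
    the subject and body of a message about to be stored.
    """
    return sorted(name for name, value in fields.items() if "\x00" in value)


message = {
    "subject": "Re: import21",
    "body": "H\x00i\x00",     # looks like UTF-16 decoded as ISO 8859-1
}

suspect = find_nul_fields(message)
if suspect:
    # Report instead of stripping, so nothing is altered silently.
    print("fields containing NUL:", ", ".join(suspect))
```
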
If you're really sure that you have valid ASCII (or Unicode) NULs
embedded in your text data, you could try configuring an alternative
backend (both for Mailman core and for Django) that handles NULs as
valid characters.  I don't know if there are any, let alone whether we
support any, though.
Footnotes:
[1]  Including the strn* functions!  They protect against buffer
*overruns*, but NUL is still end-of-string.
[2]  Here's what ECMA-48 "Control Functions for Coded Character Sets",
which I believe to be identical to ISO 6429 except that ECMA standards
are free to read and ISO charges about $100 for this, says about NUL:
    8.3.88 NUL - NULL
    Notation: (C0)
    Representation: 00/00
    NUL is used for media-fill or time-fill. NUL characters may be
    inserted into, or removed from, a data stream without affecting
    the information content of that stream, but such action may affect
    the information layout and/or the control of equipment.
Unicode doesn't say how to interpret NUL at all, but does recommend
ISO 6429 as one good way to interpret control characters not otherwise
dealt with in Unicode (and it's the only way that Unicode mentions).

Stephen J. Turnbull