import21: A string literal cannot contain NUL (0x00) characters.
Docker-mailman 0.1.1.
I am facing errors when using import21 to import list archives. The error message is "A string literal cannot contain NUL (0x00) characters." and the import stops, so only mails up to 2009 are imported. The duplicates are mails that were already imported on an earlier attempt. If a mail contains illegal characters, I would expect the import to skip that particular mail and continue, or to ignore the NUL character.
bash-4.3# python manage.py hyperkitty_import --verbosity 3 --since "01.01.1970" --list-address example@mailman3.ku.dk /opt/mailman-web-data/tmp/example@mailman3.ku.dk.mbox
Duplicate email with message-id '857A4D84DE6D1D41837DD9284F2729AB08F0C385@srv1.example.ku.dk'
<857A4D84DE6D1D41837DD9284F2729AB09DA5E45@srv1.example.ku.dk> (149)
Duplicate email with message-id '857A4D84DE6D1D41837DD9284F2729AB09DA5E45@srv1.example.ku.dk'
<857A4D84DE6D1D41837DD9284F2729AB09E20126@srv1.example.ku.dk> (150)
Failed adding message <857A4D84DE6D1D41837DD9284F2729AB09E20126@srv1.example.ku.dk>: A string literal cannot contain NUL (0x00) characters.
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/base.py", line 305, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python2.7/site-packages/django/core/management/base.py", line 356, in execute
    output = self.handle(*args, **options)
  File "/usr/local/lib/python2.7/site-packages/hyperkitty/management/commands/hyperkitty_import.py", line 278, in handle
    importer.from_mbox(mbfile)
  File "/usr/local/lib/python2.7/site-packages/hyperkitty/management/commands/hyperkitty_import.py", line 152, in from_mbox
    add_to_list(self.list_address, message)
  File "/usr/local/lib/python2.7/site-packages/hyperkitty/lib/incoming.py", line 149, in add_to_list
    email.save()
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 796, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 824, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 908, in _save_table
    result = self._do_insert(cls._base_manager, using, fields, update_pk, raw)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/base.py", line 947, in _do_insert
    using=using, raw=raw)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 1043, in _insert
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 1054, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
ValueError: A string literal cannot contain NUL (0x00) characters.
bash-4.3#
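The behaviour asked for above (skip the offending mail and carry on) can be sketched roughly as follows. This is only an illustration in Python 3, not how hyperkitty_import is written: `import_all`, `store` and `demo_store` are hypothetical names, and `store` merely stands in for whatever saves one message and is assumed to raise ValueError on bad input.

```python
def import_all(messages, store):
    """Try to store every message; collect failures instead of aborting.

    `store` is a hypothetical callable (an assumption for this sketch),
    expected to raise ValueError when a message cannot be saved.
    """
    failed = []
    for index, message in enumerate(messages):
        try:
            store(message)
        except ValueError as exc:
            # Remember which message failed and why, then keep going.
            failed.append((index, str(exc)))
    return failed


def demo_store(text):
    # Stand-in for a database layer that rejects NUL characters.
    if "\x00" in text:
        raise ValueError("A string literal cannot contain NUL (0x00) characters.")


failures = import_all(["first", "bad\x00mail", "third"], demo_store)
print(failures)
```

Collecting the failures rather than printing and moving on makes it possible to review the skipped mails afterwards.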
Hi Henrik,
On Sep 20, 2017, at 08:58, Henrik Rasmussen <her@adm.ku.dk> wrote:
I am facing errors when using import21 to import list archives. The error message is "A string literal cannot contain NUL (0x00) characters." and the import stops, so only mails up to 2009 are imported. The duplicates are mails that were already imported on an earlier attempt. If a mail contains illegal characters, I would expect the import to skip that particular mail and continue, or to ignore the NUL character.
bash-4.3# python manage.py hyperkitty_import --verbosity 3 --since "01.01.1970" --list-address example@mailman3.ku.dk /opt/mailman-web-data/tmp/example@mailman3.ku.dk.mbox
This is actually a bug in HyperKitty, not import21 (which is part of the Core). I don’t see this issue previously reported. Can you submit a new issue here:
https://gitlab.com/mailman/hyperkitty/issues
Thanks, -Barry
It seems that this issue is still unsolved. Is there any estimated date for a fix? https://gitlab.com/mailman/hyperkitty/issues/155 .
Thanks.
Henrik
On 02/12/2018 09:02 AM, Henrik Rasmussen wrote:
It seems that this issue is still unsolved. Is there any estimated date for a fix? https://gitlab.com/mailman/hyperkitty/issues/155 .
Thanks.
Henrik
I commented on the issue; here is my comment for the lazy:
That's actually an issue with the PostgreSQL backend. It doesn't allow storing null characters in strings.
More information can be found in this bug report https://code.djangoproject.com/ticket/28201 and this discussion https://groups.google.com/forum/#!topic/django-developers/D1gvXYCezEc/discus...
TLDR: Saving such values is currently not supported, it's highly unlikely you want to store the null character in the first place, so you should clean your inputs (in this case the file) to strip them out.
I'm not sure HyperKitty should actually do anything about that. Checking all inputs to strip null characters doesn't sound like a practical idea. If you import the values in any other way (loaddata, for instance) you'd still run into this issue and there's nothing we can do about that.
So I guess the sane thing to do is to say that your input file has errors that you should fix. But then again, I don't really have a say in this. It's up to @maxking and @abompard to decide that.
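For what "clean your inputs" could look like once a value has already been decoded to text, here is a minimal Python 3 sketch. The helper name `strip_nul` is made up for illustration; it is not part of HyperKitty or Django. It returns the number of removed characters so the caller can log the change instead of filtering silently.

```python
def strip_nul(text):
    """Return (cleaned_text, number_of_NUL_characters_removed)."""
    removed = text.count("\x00")
    return text.replace("\x00", ""), removed


cleaned, removed = strip_nul("Subject: hello\x00 world\x00")
print(repr(cleaned), removed)
```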
I'd better do the same then, copying my answer from the issue :-)
Thanks, but there doesn't seem to be any null character in the input file. I have tried several things:
First I tried tr < file-with-nulls -d '\000' > /opt/mailman-web-data/tmp/${LIST}.mbox
and a docker exec -it mailman-web python manage.py hyperkitty_import --verbosity 3 --since "01.01.1970" --list-address ${LIST} /opt/mailman-web-data/tmp/${LIST}.mbox
(with $LIST being the name of the list).
Then I tried vim -u NONE -U NONE -c 'set hls dy=uhex' input file, searching for NULL characters (ctrl-v 000 <enter>).
I have tried sed -n '/\x0/p' input file
and grep -Pa '\x00' input file
and other commands that should show me any occurrences of null characters.
None of them indicates any null characters in the file that hyperkitty_import complains about.
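One possible explanation, offered here as an assumption rather than a diagnosis of this particular file, is that the NUL only appears after the mail's transfer encoding has been decoded. Tools such as tr, sed and grep look at the raw bytes of the mbox, so they would not see a NUL that is written as =00 (quoted-printable) or hidden inside base64. A small Python 3 sketch using only the standard library:

```python
import base64
import quopri

# Raw bytes as they might sit in an mbox file (made-up sample data).
raw_qp = b"Hello=00World"                       # quoted-printable: =00 is byte 0x00
raw_b64 = base64.b64encode(b"Hello\x00World")   # base64 of the same payload

# A raw byte search finds nothing ...
qp_raw_has_nul = b"\x00" in raw_qp
b64_raw_has_nul = b"\x00" in raw_b64

# ... but the decoded payload does contain a NUL.
qp_decoded_has_nul = b"\x00" in quopri.decodestring(raw_qp)
b64_decoded_has_nul = b"\x00" in base64.b64decode(raw_b64)

print(qp_raw_has_nul, qp_decoded_has_nul)
print(b64_raw_has_nul, b64_decoded_has_nul)
```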
On 02/13/2018 03:59 AM, Henrik Rasmussen wrote:
I'd better do the same then, copying my answer from the issue :-) ...
See my comment at <https://gitlab.com/mailman/hyperkitty/issues/155#note_58898247>.
It would be interesting to see the actual content of the mbox file for that message, the Message-ID of which is in the original error message.
There may be an issue in HyperKitty that needs to be addressed, but we need to see the message for that.
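To pull a single message out of an mbox by its Message-ID, something along these lines should work with Python 3's standard mailbox module. The function name `find_by_message_id` and the demo file contents are invented for this sketch; the real path and Message-ID would come from the error output.

```python
import mailbox
import os
import tempfile


def find_by_message_id(mbox_path, message_id):
    """Return the raw bytes of the first message with this Message-ID, or None."""
    wanted = message_id.strip().strip("<>")
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, message in box.iteritems():
            current = str(message["Message-ID"] or "").strip().strip("<>")
            if current == wanted:
                # get_bytes returns the stored message without the "From " line.
                return box.get_bytes(key)
    finally:
        box.close()
    return None


# Demo with a throwaway two-message mbox (made-up data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mbox")
    with open(path, "wb") as handle:
        handle.write(
            b"From a@example.org Wed Sep 20 08:58:00 2017\n"
            b"Message-ID: <one@example.org>\n"
            b"Subject: first\n\nbody one\n\n"
            b"From b@example.org Wed Sep 20 08:59:00 2017\n"
            b"Message-ID: <two@example.org>\n"
            b"Subject: second\n\nbody two\n"
        )
    found = find_by_message_id(path, "<two@example.org>")
    missing = find_by_message_id(path, "<none@example.org>")

print(found)
```

Returning bytes rather than a decoded string avoids any re-encoding, so the output shows what is really stored in the file.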
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Henrik Rasmussen writes:
It seems that this issue is still unsolved. Is there any estimated date for a fix? https://gitlab.com/mailman/hyperkitty/issues/155 .
Problem-oriented executive summary:
Why do you think this is a problem that Mailman should solve? What is your desired treatment of this data?
tl;dr for Unicode wonks and wonkabees:
My personal take is similar to Simon's.
Unicode NULs are rather dangerous, because any time you convert to an ASCII-compatible encoding and store it where a C routine can get at it, you risk data corruption when a C string function decides that the NUL is end-of-string, and ignores the tail of the string. To deal with NULs embedded in an array of C chars, you have to forgo the whole suite of str*[1], *printf, etc. stdlib functions and either (A) dive down to mem* functions, keeping accurate char[] lengths on the side, or (B) create a complete suite of array-handling functions that deal with this for you.
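The truncation can be seen from Python 3 through ctypes, which exposes the same char array both as a NUL-terminated C string and as a byte array of known length. This is only a small illustration of the point, not code from Mailman or any database backend.

```python
import ctypes

data = b"abc\x00def"

# A C char array holding the data plus one trailing NUL terminator.
buf = ctypes.create_string_buffer(data)

as_c_string = buf.value   # read the way str* functions do: stops at the first NUL
as_raw_bytes = buf.raw    # read with an explicit length: keeps every byte

print(len(buf), as_c_string, as_raw_bytes)
```

Here `.value` corresponds to the str* view, where the tail after the NUL is lost, and `.raw` corresponds to strategy (A), where the length is kept alongside the data.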
I agree with Simon's guess that you probably have corrupted data or metadata that you're passing to HyperKitty. (By "corrupted metadata" I mean that you've got binary data labelled with "text", or that you have text encoded in something like UTF-16 that normally includes 0x00 *bytes* as a component of non-NUL *characters*, but labelled with an ASCII-compatible encoding such as ISO 8859/1. The latter is a common hack to insert binary data, or "any-encoded" text, into text databases.) This is based on the long tradition of avoiding NULs in string data based on both their standard interpretation as ASCII (memory padding which is to be *ignored*[2]) and the danger that they pose to C programs that use the C stdlib to handle "character" data.
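The mislabelled-encoding case is easy to reproduce in Python 3. In this sketch a short UTF-16 string is decoded as ISO 8859-1, so every 0x00 byte turns into a NUL character; the sample text is arbitrary.

```python
text = "Hej"

# UTF-16 (little endian) stores each of these characters as two bytes,
# the second of which is 0x00.
utf16_bytes = text.encode("utf-16-le")

# Decoding with the wrong label maps every byte to one character.
mislabelled = utf16_bytes.decode("iso-8859-1")
nul_count = mislabelled.count("\x00")

# Decoding with the right label gives the original text back.
correct = utf16_bytes.decode("utf-16-le")

print(repr(mislabelled), nul_count, correct)
```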
It's true that both Mailman 3 suite and Django are written in Python, which has implemented NUL-handling strategy (B) for us. However, Mailman suite eventually calls into a few C (or C++) libraries, especially the database backends, as in this issue. We can't do anything about those backends, of course. Filtering the incoming data is something we could do, I guess, but as Simon points out that would be pretty expensive, and filtering erroneous NULs silently would likely result in inserting corrupt data in the databases.
If you're really sure that you have valid ASCII (or Unicode) NULs embedded in your text data, you could try configuring an alternative backend (both for Mailman core and for Django) that handles NULs as valid characters. I don't know if there are any, let alone whether we support any, though.
Footnotes: [1] Including the strn* functions! They protect against buffer *overruns*, but NUL is still end-of-string.
[2] Here's what ECMA-48 "Control Functions for Coded Character Sets", which I believe to be identical to ISO 6429 except that ECMA standards are free to read and ISO charges about $100 for this, says about NUL:
8.3.88 NUL - NULL
Notation: (C0)
Representation: 00/00
NUL is used for media-fill or time-fill. NUL characters may be
inserted into, or removed from, a data stream without affecting
the information content of that stream, but such action may affect
the information layout and/or the control of equipment.
Unicode doesn't say how to interpret NUL at all, but does recommend ISO 6429 as one good way to interpret control characters not otherwise dealt with in Unicode (and it's the only way that Unicode mentions).
Stephen J. Turnbull writes:
It seems that this issue is still unsolved. Is there any estimated date for a fix? https://gitlab.com/mailman/hyperkitty/issues/155 .
Why do you think this is a problem that Mailman should solve? What is your desired treatment of this data?
I was originally just asking for help and was asked by Barry Warsaw to submit the issue: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
I tried to clean up the files as suggested by Simon Hanna. When that didn't work, I was simply reporting that, thinking that there might be a problem somewhere else.
I am sorry if I did anything wrong by asking for help.
-- Henrik Rasmussen
I also ran into the same error.
For me, I had to make sure I had no lines in my import mbox file that started with =00
Hope that helps.
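A rough way to locate such messages before importing is to walk every message part, decode its transfer encoding, and look for NUL bytes in the result. The sketch below uses Python 3's standard mailbox and email handling; `messages_with_nul` and the demo data are invented for illustration, and it only inspects bodies, not encoded headers.

```python
import mailbox
import os
import tempfile


def messages_with_nul(mbox_path):
    """Return Message-IDs of messages with a NUL byte in any decoded part."""
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            for part in message.walk():
                if part.is_multipart():
                    continue
                # decode=True undoes quoted-printable or base64 encoding.
                payload = part.get_payload(decode=True) or b""
                if b"\x00" in payload:
                    hits.append(str(message["Message-ID"]))
                    break
    finally:
        box.close()
    return hits


# Demo: one clean message, one whose body has a line starting with =00.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.mbox")
    with open(path, "wb") as handle:
        handle.write(
            b"From a@example.org Wed Sep 20 08:58:00 2017\n"
            b"Message-ID: <clean@example.org>\n"
            b"Content-Transfer-Encoding: quoted-printable\n\n"
            b"all good\n\n"
            b"From b@example.org Wed Sep 20 08:59:00 2017\n"
            b"Message-ID: <dirty@example.org>\n"
            b"Content-Transfer-Encoding: quoted-printable\n\n"
            b"before\n=00after\n"
        )
    hits = messages_with_nul(path)

print(hits)
```

The reported Message-IDs can then be checked by hand to decide whether to fix or drop those messages.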
participants (6)
- Barry Warsaw
- Henrik Rasmussen
- kawhite@ancestry.com
- Mark Sapiro
- Simon Hanna
- Stephen J. Turnbull