Hyperkitty - Non-ascii sender address
Hi,
I imported some mbox archives into HyperKitty.
I am getting a lot of errors like:

    Failed adding message <5b0900ab.1c69fb81.70ab.2f87@mx.google.com>: ('Non-ascii sender address', <email.message.EmailMessage object at 0x7f05f1606048>)
    Non-ascii sender address from "Stéphane blipblop <stlablipblop at gmail.com>" about [HATLAB][It] Re: A débarrasser !

It's probably because of "Stéphane".
Any advice on how to fix it?
Name: HyperKitty Version: 1.2.1
Thanks,
Thomas.
On 10/28/18 12:03 PM, Guiseppin Thomas wrote:
> I imported some mbox archives into HyperKitty.
> I am getting a lot of errors like:
>
>     Failed adding message <5b0900ab.1c69fb81.70ab.2f87@mx.google.com>: ('Non-ascii sender address', <email.message.EmailMessage object at 0x7f05f1606048>)
>     Non-ascii sender address from "Stéphane blipblop <stlablipblop at gmail.com>" about [HATLAB][It] Re: A débarrasser !
>
> It's probably because of "Stéphane".
The 'Non-ascii sender address' message comes from

    try:
        from_str = header_to_unicode(message['From'])
        from_name, from_email = parseaddr(from_str)
        from_name = from_name.strip()
        sender_address = from_email.encode('ascii').decode("ascii").strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        raise ValueError("Non-ascii sender address", message)

Possibly the exception is thrown in parseaddr(), but much more likely the email address, not the display name, contains non-ASCII characters.
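
To illustrate the distinction between display name and address, here is a minimal sketch using only the standard library. It assumes the From: value has already been decoded to a str (HyperKitty does that with its own header_to_unicode helper, which is not reproduced here), and the function name ascii_sender_address is hypothetical.

```python
from email.utils import parseaddr


def ascii_sender_address(from_str):
    """Return the address part of a decoded From: value.

    Hypothetical helper that mirrors the encode/decode check quoted above:
    it raises ValueError when the address itself is not pure ASCII.
    """
    from_name, from_email = parseaddr(from_str)
    try:
        return from_email.encode('ascii').decode('ascii').strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        raise ValueError("Non-ascii sender address", from_str)


# A non-ASCII display name on its own passes the check.
print(ascii_sender_address('St\u00e9phane blipblop <stlablipblop@gmail.com>'))

# Non-ASCII inside the address part is what triggers the error.
try:
    ascii_sender_address('St\u00e9phane blipblop <st\u00e9phane@gmail.com>')
except ValueError as exc:
    print(exc.args[0])
```

Only the address returned by parseaddr() goes through the ASCII check, which is why an accented display name alone should not produce this error.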
> Any advice on how to fix it?
All the other messages should have been imported. If you can find and fix the 'bad' ones in the mbox, you can make a new mbox with just those and import that. You could also import the entire mbox with fixed messages. The already added ones won't be re-imported because of duplicate Message-ID:, but it's extra processing.
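
One way to build such an mbox is sketched below, assuming you collect the Message-IDs from the "Failed adding message <...>" log lines. The function name copy_failed_messages and the file paths are hypothetical; the copied messages would still need their From: headers fixed by hand before re-importing.

```python
import mailbox


def copy_failed_messages(src_path, dst_path, failed_ids):
    """Copy messages whose Message-ID is in failed_ids into a new mbox.

    Returns the number of messages copied. Hypothetical helper, not part
    of HyperKitty.
    """
    wanted = {mid.strip() for mid in failed_ids}
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    copied = 0
    try:
        for msg in src:
            # str() guards against header values that are not plain strings.
            msg_id = str(msg['Message-ID'] or '').strip()
            if msg_id in wanted:
                dst.add(msg)
                copied += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return copied


if __name__ == '__main__':
    # Placeholder paths; the ID is the one from the error shown in this thread.
    ids = ['<5b0900ab.1c69fb81.70ab.2f87@mx.google.com>']
    print(copy_failed_messages('archive.mbox', 'failed.mbox', ids))
```

Importing the smaller mbox avoids reprocessing messages that were already added.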
If the actual bad message From: header is
From: Stéphane blipblop <stlablipblop@gmail.com>
or
From: Stéphane blipblop <stlablipblop at gmail.com>
That shouldn't cause the error, so I'm guessing that 'blipblop' is not the actual value and that the actual value in <stlablipblop at gmail.com> contains non-ASCII characters.
-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
Hi,
https://gitlab.com/mailman/hyperkitty/blob/master/hyperkitty/lib/incoming.py...
I changed the "sender" part to this:

    # Sender
    try:
        from_str = header_to_unicode(message['From'])
        from_name, from_email = parseaddr(from_str)
        from_name = unidecode.unidecode(from_name).strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        raise (ValueError("Non-ascii sender address", message))

    try:
        sender_address = from_email.encode('ascii').decode("ascii").strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        if from_name:
            sender_address = re.sub("[^a-z0-9]", "", from_name.lower())
            if not sender_address:
                sender_address = "unknown@unknown.com"
        else:
            sender_address = "unknown@unknown.com"
        print("Non-ascii sender address -- Sender address replaced by ",
              sender_address)

I got this output (example):

    Non-ascii sender address -- Sender address replaced by stephaneblipblopstlablipblopgmailcom

It's not perfect and I'm working on something better, but basically it works.
I think there is an issue in the original code. Because of

    raise ValueError("Non-ascii sender address", message)

the second part will never be called:

    if not sender_address:
        if from_name:
            sender_address = re.sub("[^a-z0-9]", "", from_name.lower())
            if not sender_address:
                sender_address = "unknown"
            sender_address = "{}@example.com".format(sender_address)
        else:
            sender_address = "unknown@example.com"

If you replace the raise ValueError("Non-ascii sender address", message) with a simple print(), you get an error due to an unassigned variable (sender_address).
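
The unassigned-variable error can be reproduced with a small standalone sketch. The function name sender_without_raise is hypothetical, and the fallback logic is reduced to a single line for brevity.

```python
def sender_without_raise(from_email):
    """Mimic the try/except with the raise swapped for a print()."""
    try:
        sender_address = from_email.encode('ascii').decode('ascii').strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Nothing is assigned on this path.
        print("Non-ascii sender address")
    # If the except branch ran, sender_address was never bound, so this
    # lookup fails before the fallback can do anything.
    if not sender_address:
        sender_address = "unknown@example.com"
    return sender_address


try:
    sender_without_raise('st\u00e9phane@gmail.com')
except UnboundLocalError as exc:
    print(type(exc).__name__)
```

With an all-ASCII address the try block assigns sender_address and the fallback is reached normally, so that code is only skipped when the exception is raised.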
Mailman-users mailing list -- mailman-users@mailman3.org
To unsubscribe send an email to mailman-users-leave@mailman3.org
https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
On 10/31/18 5:57 AM, Thomas G wrote:
> Hi,
>
> https://gitlab.com/mailman/hyperkitty/blob/master/hyperkitty/lib/incoming.py...
>
> I changed the "sender" part to this:
>
>     # Sender
>     try:
>         from_str = header_to_unicode(message['From'])
>         from_name, from_email = parseaddr(from_str)
>         from_name = unidecode.unidecode(from_name).strip()

What is unidecode?

>     except (UnicodeDecodeError, UnicodeEncodeError):
>         raise (ValueError("Non-ascii sender address", message))
>
>     try:
>         sender_address = from_email.encode('ascii').decode("ascii").strip()
>     except (UnicodeDecodeError, UnicodeEncodeError):
>         if from_name:
>             sender_address = re.sub("[^a-z0-9]", "", from_name.lower())
>             if not sender_address:
>                 sender_address = "unknown@unknown.com"
>         else:
>             sender_address = "unknown@unknown.com"
>         print("Non-ascii sender address -- Sender address replaced by ",
>               sender_address)
>
> I got this output (example):
>
>     Non-ascii sender address -- Sender address replaced by stephaneblipblopstlablipblopgmailcom
>
> It's not perfect and I'm working on something better, but basically it works.

The original code intends to provide some value for the email address where parseaddr returns a null address.
You have an address, and you would do better to remove the non-ASCII characters from the address than to try to make an address out of the sanitized real name.
Your issue is parseaddr() returns an email address with non-ascii.
I would suggest that instead of

    try:
        sender_address = from_email.encode('ascii').decode("ascii").strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        if from_name:
            sender_address = re.sub("[^a-z0-9]", "", from_name.lower())
            if not sender_address:
                sender_address = "unknown@unknown.com"
        else:
            sender_address = "unknown@unknown.com"

you do

    try:
        sender_address = from_email.encode('ascii').decode("ascii").strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        sender_address = from_email.encode('ascii', errors='replace').decode("ascii").strip()

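
For reference, a small standalone sketch of what errors='replace' does to such an address. The function name to_ascii_address and the sample address are hypothetical.

```python
def to_ascii_address(from_email):
    """Return from_email as ASCII, substituting '?' for non-ASCII characters."""
    try:
        return from_email.encode('ascii').decode('ascii').strip()
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Each character that cannot be encoded becomes a single '?'.
        return from_email.encode('ascii', errors='replace').decode('ascii').strip()


print(to_ascii_address('st\u00e9phane@gmail.com'))  # st?phane@gmail.com
```

The result is no longer a deliverable address, but it keeps the rest of the original address intact instead of deriving one from the display name.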
> I think there is an issue in the original code. Because of
>
>     raise ValueError("Non-ascii sender address", message)
>
> the second part will never be called:

It will if there is no exception.

>     if not sender_address:
>         if from_name:
>             sender_address = re.sub("[^a-z0-9]", "", from_name.lower())
>             if not sender_address:
>                 sender_address = "unknown"
>             sender_address = "{}@example.com".format(sender_address)
>         else:
>             sender_address = "unknown@example.com"
>
> If you replace the raise ValueError("Non-ascii sender address", message) with a simple print(), you get an error due to an unassigned variable (sender_address).

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
> you do
>
>     try:
>         sender_address = from_email.encode('ascii').decode("ascii").strip()
>     except (UnicodeDecodeError, UnicodeEncodeError):
>         sender_address = from_email.encode('ascii', errors='replace').decode("ascii").strip()

That's perfect, thanks!
participants (3): Guiseppin Thomas, Mark Sapiro, Thomas G