On Fri, 2020-10-02 at 20:28 -0700, Mark Sapiro wrote:
> [...]
> Part of what's going on is that some MM 2.1 archives have email addresses
> obfuscated by replacing "@" with " at ", so the import process reverses
> this and changes the From: header to
>
>     From: Firstname Surname@example-UK <user@example.com>
>
> which now contains two email addresses, albeit not separated by a comma,
> and picks the first. The message then gets properly archived with sender
>
>     Firstname Surname@example-UK
>
> which is a syntactically valid email address and doesn't cause the
> ResponseNotReady() exception in sync_mailman, so I don't see that issue.
I saw the message archived too, but I did pick something up in sync_mailman.
I'm a bit old school, so my first go-to is normally strace. From that I saw the sync connect to the REST qrunner, querying each email address for its Mailman ID. However, when it came to querying the dodgy email, it did not get that far: the connection was made and then closed. Initially I thought the REST qrunner closed its connection before the sync could even get its query out (i.e. GET /3.0/users/Firstname Surname@example-UK), but an strace on both the REST qrunner and the sync shows the connection being made and closed immediately afterwards at both ends. Twice, in fact (IPv4 and IPv6). Previous calls were as you would expect, with the GET and the HTTP/1.0 response.
The ResponseNotReady() exception in the traceback, raised from the client.get_user(self.address) call in Sender.set_mailman_id (hyperkitty/models/sender.py), only makes sense if the sync was attempting to talk on a closed socket.
So I didn't get it, and I still don't. Why did the sync not get as far as the REST "GET /3.0/users/Firstname Surname@example-UK" and "HTTP/1.0 404 Not Found" exchange, as it had with the previous (valid) email? Why was the REST socket closed without a query being made? My only thought last night was that the sync established the REST connection, then decided to close it because the email was invalid, but carried on trying to get the mailman_id, hence the exception.
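For reference, here is what Python's standard http.client does with that path. This is a hedged illustration only: sync_mailman talks to the REST qrunner through mailmanclient and whatever HTTP layer that uses, so it does not show what actually happened there, and the host and port below are placeholders. An HTTP/1.x request line separates method, target and version with single spaces, so the target cannot itself contain one; the sketch shows that http.client's putrequest() rejects such a path before anything is written, that a later getresponse() on the same connection raises ResponseNotReady, and that percent-encoding the address with urllib.parse.quote() removes the space.

```python
import http.client
from urllib.parse import quote

# The dodgy sender address from the archive, and the REST path built for it
# in the form seen in the strace (GET /3.0/users/<address>).
ADDRESS = "Firstname Surname@example-UK"
RAW_PATH = "/3.0/users/" + ADDRESS
# Percent-encoded alternative: the space becomes %20 and "@" becomes %40.
QUOTED_PATH = "/3.0/users/" + quote(ADDRESS, safe="")


def raw_request_errors(host: str = "localhost", port: int = 8001) -> list[str]:
    """Return, in order, the http.client exceptions hit for RAW_PATH.

    host/port are placeholders: HTTPConnection only opens a socket when a
    request is actually sent, and here nothing ever is.
    """
    errors = []
    conn = http.client.HTTPConnection(host, port)
    try:
        try:
            # putrequest() rejects paths containing spaces or control
            # characters before writing anything.
            conn.putrequest("GET", RAW_PATH)
        except http.client.InvalidURL:
            errors.append("InvalidURL")
        try:
            # No request was sent, so there is no response to wait for.
            conn.getresponse()
        except http.client.ResponseNotReady:
            errors.append("ResponseNotReady")
    finally:
        conn.close()
    return errors


if __name__ == "__main__":
    print(raw_request_errors())  # ['InvalidURL', 'ResponseNotReady']
    print(QUOTED_PATH)           # /3.0/users/Firstname%20Surname%40example-UK
```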
Anyway, you then stepped in with your suggestion: I added the correct email to hyperkitty_sender, updated the faulty email record in hyperkitty_email to point to the correct hyperkitty_sender record, deleted the dodgy hyperkitty_sender record, and the sync passed. All good for me, anyway.
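The manual fix above can be sketched with the standard library's sqlite3 module. It runs against a hypothetical, heavily simplified stand-in for the two HyperKitty tables: the table names come from this thread, but the column names (address, name, mailman_id, sender_id) are assumptions, and the real schema has more columns and constraints. The three steps run in one transaction, so a failure part-way leaves the data as it was; INSERT OR IGNORE is SQLite syntax, so a PostgreSQL or MySQL install would need its own equivalent.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the two HyperKitty tables.
SCHEMA = """
CREATE TABLE hyperkitty_sender (
    address    TEXT PRIMARY KEY,
    name       TEXT,
    mailman_id TEXT
);
CREATE TABLE hyperkitty_email (
    id        INTEGER PRIMARY KEY,
    sender_id TEXT NOT NULL REFERENCES hyperkitty_sender(address)
);
"""


def repoint_sender(conn: sqlite3.Connection, bad_address: str,
                   good_address: str, name: str) -> None:
    """Move every email from bad_address to good_address, then drop bad_address."""
    with conn:  # one transaction: commit all three steps or roll back all
        # 1. Add the correct sender (no-op if it already exists).
        conn.execute(
            "INSERT OR IGNORE INTO hyperkitty_sender (address, name) VALUES (?, ?)",
            (good_address, name),
        )
        # 2. Point the faulty email records at the correct sender.
        conn.execute(
            "UPDATE hyperkitty_email SET sender_id = ? WHERE sender_id = ?",
            (good_address, bad_address),
        )
        # 3. Delete the dodgy sender, which nothing references any more.
        conn.execute("DELETE FROM hyperkitty_sender WHERE address = ?", (bad_address,))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make step 3 fail if step 2 missed a row
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO hyperkitty_sender (address) VALUES (?)",
                 ("Firstname Surname@example-UK",))
    conn.execute("INSERT INTO hyperkitty_email (id, sender_id) VALUES (1, ?)",
                 ("Firstname Surname@example-UK",))
    repoint_sender(conn, "Firstname Surname@example-UK", "user@example.com",
                   "Firstname Surname")
    print(conn.execute("SELECT id, sender_id FROM hyperkitty_email").fetchall())
```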
> The difference is either the fact that my test runs with the HEAD of the
> GitLab branch, which may not have the issue, or the difference between the
> real From: header and the sanitized one is significant.
If the sync works for you on trunk now, there seems little point in figuring out why it does not work with the current Debian package. Hopefully Debian will have an update soon, and for now I can carry on with my import and fix the issue if it happens again.
However, it would be interesting to know what ends up in your archive. I suspect it will have the dodgy "Firstname Surname@example-UK" address as the sender. If that is the case, then, as you mention in https://gitlab.com/mailman/hyperkitty/-/issues/320#note_423030205, HyperKitty's wrapper should certainly be smarter: call email.utils.parseaddr() first and only try replacing " at " if the returned address doesn't contain "@". Otherwise you end up with a bad import and a dodgy email address in your archives.
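A minimal sketch of that parse-first approach, using only the standard library. The function name deobfuscate_from is hypothetical (it is not HyperKitty's actual wrapper), and it assumes the MM 2.1 obfuscation is exactly "user at example.com". If parseaddr() already finds an "@", the header is left alone; otherwise " at " is turned back into "@" only inside the angle brackets (or in the whole value when there are none) and the result is parsed again, so a display name containing the word "at" is never turned into a second address.

```python
import re
from email.utils import parseaddr

# MM 2.1 archive obfuscation (assumed form): "user@example.com" -> "user at example.com".
_OBSCURED_AT = re.compile(r"\s+at\s+")
# The address part of a "Display Name <address>" header.
_ANGLE_ADDR = re.compile(r"<([^<>]*)>")


def deobfuscate_from(header: str) -> tuple[str, str]:
    """Return (display name, address), undoing " at " only when needed.

    Hypothetical helper for illustration, not HyperKitty's real wrapper.
    """
    name, addr = parseaddr(header)
    if "@" in addr:
        # A real address was found: leave everything alone, even if the
        # display name happens to contain the word "at".
        return name, addr

    match = _ANGLE_ADDR.search(header)
    if match:
        # Undo the obfuscation inside <...> only; the display name is untouched.
        fixed = _OBSCURED_AT.sub("@", match.group(1))
        header = header[:match.start(1)] + fixed + header[match.end(1):]
    else:
        # Bare address with no display name.
        header = _OBSCURED_AT.sub("@", header)
    return parseaddr(header)


if __name__ == "__main__":
    obscured = "Firstname Surname at example-UK <user at example.com>"
    # Naive reversal rewrites the display name too:
    print(obscured.replace(" at ", "@"))  # Firstname Surname@example-UK <user@example.com>
    print(deobfuscate_from(obscured))     # ('Firstname Surname at example-UK', 'user@example.com')
```

Undoing the obfuscation on the bracketed text, rather than on parseaddr()'s returned address, avoids depending on how parseaddr() handles whitespace inside a malformed local part.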
Thanks -- Alex