Settings for prevention of spam obfuscation


Richard Rosner

Feb. 23, 2024

12:45 p.m.

Are there any settings that would make it easier to recognize spam as spam? E.g. some mail programs, including Outlook, allow the user to set a From address that most mail clients will show instead of the true sender address, while the sender's real address is still present in the Sender header field. While some people and companies carelessly abuse that functionality, hiding their true mail address, those bad practices are also abused by spammers and others. Does Mailman have any settings that would remove such obfuscations, e.g. in this case detect the manipulated From address and replace it with the Sender address?

While I do run my own spam filter, it's getting harder practically every day to make sure you detect all spam while not having too many false positives. If it were possible to reverse at least the most obvious manipulations, that would probably already help. Also, it would make it much easier for the users to see at first glance who actually wrote an email. A message always looks less trustworthy when the sender's address is shown as user3168@suspiciousdomain.gg rather than being manipulated to say support@yourprovider.com.


Stephen J. Turnbull

February 2024

2:27 p.m.

Richard Rosner writes:

...

Are there any settings that may help with making it easier seeing spam as spam?

I gather you mean that you are letting spam go through to your subscribers, and you want to give them more information? For practical purposes, you've already lost, though. Studies show that users are not very good at picking up on such details.

...

E.g. some mail programs, including Outlook, allow the user to set a From address that most mail clients will show instead of the true sender address, while the sender's real address is still present in the Sender header field. While some people and companies carelessly abuse that functionality, hiding their true mail address, those bad practices are also abused by spammers and others.

What you consider "abuse" or "bad practice" is often very useful. For example, I have validated four different addresses that I can set in Gmail, each of which I use for specific recipients. All of them are "public" identities; if you know me, you likely know more than one of those addresses. Of course, all such messages have my Gmail address as Sender. I would be very angry if you dug that out and substituted it for my public identity.

If you are seriously considering such a strategy, you should do a study to see whether you'd catch more spam than ham this way.

In any case, competent spammers are spoofing *all* of the identities involved in creating and injecting the email. The only way to effectively authenticate email is to use cryptographic signatures. Since very few users do this themselves, you're left with DKIM. Technically speaking DKIM only authenticates the sending host. But on the assumption that the sending host authenticates its users, if the From field is included in that host's DKIM signature, you can trust the authenticity of the author identity in the From field.

Mailman does not perform DKIM verification, but your MTA almost certainly does already. It probably also already performs the DMARC "From alignment" check, which is even more strict than the DKIM test I described above. From alignment requires a valid DKIM signature including the From field, and the domain in the From address must be the same as the domain that signed the message. Your spam filter should be using those before forwarding to Mailman.
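The From alignment test described above can be sketched in a few lines. This is only an illustration, not what any particular MTA does: it assumes the MTA has already verified DKIM and recorded the result in an Authentication-Results field, it trusts every such field it finds (a real filter should only trust fields carrying its own authserv-id), and it uses strict domain equality rather than the relaxed organizational-domain match that DMARC also allows. The function names are made up for this example.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def from_domain(msg):
    """Return the lowercased domain of the From address, or '' if there is none."""
    _, addr = parseaddr(msg.get("From", ""))
    _, at, domain = addr.rpartition("@")
    return domain.lower() if at else ""


def dkim_pass_domains(msg):
    """Collect the header.d= domain of every dkim=pass result recorded in the message."""
    domains = set()
    for value in msg.get_all("Authentication-Results", []):
        # Results are separated by ';', so the match never crosses into another result.
        for m in re.finditer(r"dkim=pass\b[^;]*?header\.d=([\w.-]+)", value, re.I):
            domains.add(m.group(1).lower())
    return domains


def is_aligned(msg):
    """Strict alignment: a passing DKIM signer domain equals the From domain."""
    dom = from_domain(msg)
    return bool(dom) and dom in dkim_pass_domains(msg)


if __name__ == "__main__":
    raw = (
        "From: support@yourprovider.com\n"
        "Authentication-Results: mx.example.net;"
        " dkim=pass header.d=suspiciousdomain.gg\n"
        "\n"
        "body\n"
    )
    print(is_aligned(message_from_string(raw)))
```

A message signed by one domain while claiming a From address in another fails this check even though its DKIM signature is valid, which is the point of the alignment requirement.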

...

Does mailman have any settings that would remove any such obfuscations, e.g. in this case detect the manipulated from address and replace it with the sender address?

No. In fact, the only optional manipulations of From that Mailman performs are (a) to remove the address for anonymized lists, and (b) to change the address to that of the list in cases where changes to the message performed by the list would invalidate the DKIM signature.
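Since Mailman has no such setting, any From/Sender comparison would have to happen elsewhere, for example in the spam filter. As a rough sketch of the comparison being asked about (the function name is hypothetical, and addresses are compared case-insensitively as a simplification):

```python
from email import message_from_string
from email.utils import parseaddr


def from_sender_mismatch(msg):
    """Return (from_addr, sender_addr) if both are present and differ, else None."""
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    sender_addr = parseaddr(msg.get("Sender", ""))[1].lower()
    if from_addr and sender_addr and from_addr != sender_addr:
        return from_addr, sender_addr
    return None


if __name__ == "__main__":
    raw = (
        "From: Support <support@yourprovider.com>\n"
        "Sender: user3168@suspiciousdomain.gg\n"
        "\n"
        "body\n"
    )
    print(from_sender_mismatch(message_from_string(raw)))
```

Note that a legitimate "send as" setup, such as the Gmail identities mentioned earlier in this reply, triggers the same mismatch, so the result is at best one signal among many and not a reason to rewrite the From field.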

If you are enabling features that invalidate the original DKIM signature, you can provide more information to your subscribers via the ARC (Authenticated Received Chain) protocol. It's better to have this implemented in the MTA, but as long as the MTA is configured to provide verification of incoming SPF and DKIM results in the Authentication-Results field, Mailman can optionally participate in ARC for you.

...

While I do run my own spam filter, it's getting harder practically every day to make sure you detect all spam while not having too many false positives. If it were possible to reverse at least the most obvious manipulations, that would probably already help. Also, it would make it much easier for the users to see at first glance who actually wrote an email. A message always looks less trustworthy when the sender's address is shown as user3168@suspiciousdomain.gg rather than being manipulated to say support@yourprovider.com.

I think the odds are good that you'll annoy some of the more sophisticated subscribers, but not save anyone from spam.

I haven't collected statistics on this, but anomalous message IDs are often noticeable, frequently in messages "from support@someprovider.com". Of course almost none of your users will be checking that, but your spam filter may be able to use failed alignment between Message-ID and DKIM signer.
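A minimal sketch of such a Message-ID check, assuming the filter can see the raw headers. It reads the `d=` tag straight from the DKIM-Signature field without verifying the signature, and it accepts a Message-ID host that is a subdomain of the signer; both choices, and the function names, are assumptions made for illustration. Many legitimate senders generate Message-IDs on unrelated hosts, so a mismatch is a hint, not proof.

```python
import re
from email import message_from_string


def dkim_signer_domains(msg):
    """Return the d= tag of every DKIM-Signature header (signatures are NOT verified)."""
    domains = set()
    for value in msg.get_all("DKIM-Signature", []):
        # Tags are separated by ';'; header folding may add whitespace around them.
        for tag in value.split(";"):
            name, _, val = tag.strip().partition("=")
            if name.strip() == "d" and val.strip():
                domains.add(val.strip().lower())
    return domains


def message_id_domain(msg):
    """Return the part of the Message-ID after '@', or '' if it is missing or malformed."""
    m = re.search(r"<[^<>@\s]+@([^<>\s]+)>", msg.get("Message-ID", ""))
    return m.group(1).lower() if m else ""


def message_id_matches_signer(msg):
    """True if the Message-ID host equals, or is a subdomain of, some signer domain."""
    mid = message_id_domain(msg)
    return any(mid == d or mid.endswith("." + d) for d in dkim_signer_domains(msg))


if __name__ == "__main__":
    raw = (
        "Message-ID: <abc123@random-host.invalid>\n"
        "DKIM-Signature: v=1; a=rsa-sha256; d=someprovider.com; s=sel;\n"
        " h=from:to:subject; bh=AAAA=; b=BBBB==\n"
        "\n"
        "body\n"
    )
    print(message_id_matches_signer(message_from_string(raw)))
```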

You haven't described your use case in any detail, so the following advice may not apply for your lists. But generically the strategies that have high success rates are

Human moderation, preferably after a highly accurate filter, of course.
Members-only list.
Crank up the spam points for invalid or missing DKIM signature and for failed From alignment (but this may cause problems for some of your posters).

There are other strategies specific to the list. One list I know had success by giving negative spam points to mentions of the list's topic.
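The "spam points" idea in the last two suggestions is simple additive scoring. The sketch below only illustrates the arithmetic; the weights, parameter names, and the whole-word topic match are invented for this example and are not taken from rspamd, SpamAssassin, or any list's real configuration.

```python
def spam_points(has_valid_dkim, from_aligned, text, topic_words,
                dkim_penalty=2.0, alignment_penalty=3.0, topic_bonus=1.5):
    """Add points for authentication failures, subtract points for on-topic text."""
    score = 0.0
    if not has_valid_dkim:
        score += dkim_penalty          # invalid or missing DKIM signature
    if not from_aligned:
        score += alignment_penalty     # failed From alignment
    words = set(text.lower().split())
    if words & {w.lower() for w in topic_words}:
        score -= topic_bonus           # the message mentions the list's topic
    return score


if __name__ == "__main__":
    print(spam_points(False, False, "buy pills now", {"mailman"}))
    print(spam_points(True, True, "Mailman upgrade question", {"mailman"}))
```

Raising the two penalties is what "cranking up" means here, and it is also where the false positives come from: every legitimate poster without a valid, aligned signature starts with the same handicap as a spammer.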

Steve

Richard Rosner

3:56 p.m.

Well, I don't want to give users additional information, just rectify the manipulated information. And yes, while there may be edge cases where these bad practices do come in handy, the vast majority of mails (at least from what I can tell from my lists) are either spam or unnecessary abuse of features. I can't tell if it's because Microsoft sets bad defaults for Outlook or because the admins being hired are incompetent and the competent ones either retire or quit, but there's just too much going on in mails that's nonsense. E.g. encoding all text, including things like the subject, with base64 is just ridiculous when what was sent is only plain text that uses a small subset of UTF-8 characters. And that's just one very common bad practice, far from the only one.

And sure, competent spammers may be able to evade much more. I've seen several emails that must have had manipulated sender data, but that could only be concluded from context, not from analyzing the headers. But thankfully most spammers are far from competent, so spam filters can easily detect well over 50% of the spam. It's only the slightly more competent spammers, using tactics that many non-spammers also abuse, that give me a headache. I could easily crank up the spam points for various indications, so there would be pretty much no spam left in the false negative category. But sadly, I know from experience that this will lead to a much larger number of false positives. For the most common senders that won't be sending spam, I already created whitelisting rules, but that's about it. And until someone can create an ML model that's not gobbling up resources for training and that doesn't need thousands of emails to train with, there's only so much you can do.

While DKIM and DMARC are great, they are useless when the vast majority of senders don't have any DKIM signatures. For ARC it's not as uncommon to be present, but the situation isn't that much better.

Thomas Ward

5:01 p.m.

I think we forget that DMARC can operate in SPF-only mode. If you truly want stronger antispam controls, then deny all posts from nonmembers, and/or implement SpamAssassin at the Postfix level with extended OSINT pattern sets and policies. That will catch MOST stuff but will also likely have a higher false-positive moderation/reject rate.

Sent from my Galaxy

...

-------- Original message --------
From: Richard Rosner <rrosner5@gmail.com>
Date: 2/24/24 10:57 (GMT-05:00)
To: mailman-users@mailman3.org
Subject: [MM3-users] Re: Settings for prevention of spam obfuscation

...

Richard Rosner

5:08 p.m.

Good to know, but even SPF isn't used broadly enough.

And no, locking out all non-members doesn't work in this case; that's not how these lists are supposed to work. And after some trials with SpamAssassin, it doesn't seem to have any good documentation (or it's lacking features left and right, which would be even worse), so I'm sticking with rspamd.

Also, what you propose is the exact opposite of what I'm trying to do. I do not want to have the false positive rate skyrocket, that's the whole point of this thread.

Stephen J. Turnbull

7:43 p.m.

Richard Rosner writes:

...

And yes, while there may be edge cases where these bad practices

If you're willing to impose that judgment on your posters, just announce that you're going to start discarding posts that don't satisfy From alignment, give it a week, and start doing it. That will ensure that the Sender is always in the From field. No more, and no less. I wouldn't do it, but it's a perfectly effective way to achieve what you seem to believe is ideal.

...

And until someone can create an ML model that's not gobbling up resources for training and that doesn't need thousands of emails to train with, there's only so much you can do.

SpamBayes has been around for decades.[1] It's not ChatGPT-level, but it also doesn't take "significant resources" or thousands of emails for training (your current spambucket and INBOX will be a decent start) and you can train it on the fly as mail comes in. The only resources required are a couple of hours to install it and learn how to feed ham and spam corpuses into it, and a few minutes every so often to feed increments of ham and spam to it. You can even avoid that amount of effort by feeding the spam and ham it outputs back into it.[2] Almost certainly you can plug it into rspamd.
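To show why training "on the fly" is cheap, here is a toy word-level naive Bayes classifier with add-one smoothing that learns one message at a time. It is a simplified stand-in, not SpamBayes: SpamBayes has its own tokenizer and combines token evidence differently, and the class and method names below are invented for this sketch.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Word-level naive Bayes with add-one smoothing, trained one message at a time."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Add one message to the 'spam' or 'ham' corpus; no retraining pass is needed."""
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        """Return P(spam | text) under the naive independence assumption."""
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        total_msgs = sum(self.messages.values())
        tokens = self.tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an empty corpus never produces log(0).
            logp = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_words = sum(self.words[label].values())
            for tok in tokens:
                logp += math.log((self.words[label][tok] + 1) / (total_words + vocab))
            log_score[label] = logp
        # Clamp the difference so math.exp cannot overflow on long messages.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    clf = TinyBayes()
    clf.train("cheap pills buy now", "spam")
    clf.train("postfix config for mailman list", "ham")
    print(clf.spam_probability("buy cheap pills"))
```

Training is just counter updates, so feeding increments of ham and spam costs almost nothing; the price of such a simple model is accuracy, which is where a mature tokenizer and scoring scheme earn their keep.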

...

While DKIM and DMARC are great, they are useless when the vast majority of senders don't have any DKIM signatures. For ARC it's not as uncommon to be present, but the situation isn't that much better.

Please stop. These claims are not going to pass on this list. We know better.[3] If you are in a special situation where they bear some resemblance to the truth, please explain. Maybe we can come up with a solution for that situation together. But they are wildly wrong in the context of Internet email in general.

Look, Richard, spam is extremely frustrating to all of us. There's a lot of effort and talent that has gone into combatting spam, and a fair amount of knowledge about it has accumulated in the people who participate in Mailman lists. If you want help with the particular spam that's getting into your lists, we'll be happy to help (although help channels for your preferred filtering software might be more productive). But the way to get help is to explain what your problem is, and what you've done that isn't working, in some detail.

You telling us how we can fix it in Mailman is almost always a non-starter. Either it's going to screw up the mail system with little anti-spam effect, or it's been tried and shown to be ineffective. It's certainly not persuasive if you haven't shown us what the problem is at a more granular level than "we all hate spam!" It's possible that we'll come back to your suggestion sooner or later. Sometimes there's no other way that's better. But that's quite rare. ;-)

Footnotes: [1] In fact, I've recently seen the (otherwise unsupported ;-) claim that spam-filtering was the original motivation for ML research. :-)

[2] This leads to overfitting and increased error rates after a while. YMMV.

[3] For example, in my current spam-bucket, all of which I've eyeballed, there are 7519 Message-IDs and 9521 DKIM-Signatures. Of 206 messages from external sites in my now spam-free INBOX, there are 336 DKIM-Signatures. Since it's common to have more than one DKIM-Signature per message, that doesn't prove that essentially all of those messages are signed. But it sure seems likely! And ARC is much less common: 4903 ARC-Seals in spam, and 155 in INBOX.
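The caveat in this footnote, that header counts do not prove every message is signed, goes away if messages are counted instead of headers. A small sketch of that count, using only the standard library (the function name is made up for this example):

```python
from email import message_from_string


def signature_stats(messages):
    """Return (messages seen, messages with at least one DKIM-Signature, DKIM-Signature headers)."""
    total = signed = headers = 0
    for msg in messages:
        sigs = msg.get_all("DKIM-Signature", [])
        total += 1
        signed += bool(sigs)      # counts the message once, however many signatures it has
        headers += len(sigs)
    return total, signed, headers


if __name__ == "__main__":
    sample = [
        message_from_string("DKIM-Signature: v=1; d=a.example\n\nbody\n"),
        message_from_string("Subject: unsigned\n\nbody\n"),
    ]
    print(signature_stats(sample))
```

Any iterable of parsed messages works as input; a `mailbox.mbox` object opened on a local spam folder or INBOX file (the path depends on the mail setup) can be passed in directly, because iterating it yields message objects with the same `get_all` method.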

Richard Rosner

9:21 p.m.

Stephen J. Turnbull wrote:

...

announce that you're going to start discarding posts that don't satisfy From alignment, give it a week, and start doing it. That will ensure that the Sender is always in the From field. No more, and no less. I wouldn't do it, but it's a perfectly effective way to achieve what you seem to believe is ideal.

Not an option. It's not an internal list. It's a list for "outsiders" to have a single address to write to and reach several relevant people. I wish I could enforce good practices on the world, but that won't be happening any time soon.

...

it also doesn't take "significant resources" or thousands of emails for training (your current spambucket and INBOX will be a decent start) and you can train it on the fly as mail comes in. The only resources required are a couple of hours to install it and learn how to feed ham and spam corpuses into it, and a few minutes every so often to feed increments of ham and spam to it. You can even avoid that amount of effort by feeding the spam and ham it outputs back into it.[2] Almost certainly you can plug it into rspamd.

Sounds interesting, but seems quite unusable. Development seems to have stopped almost two decades ago, so it requires Python 2, and I doubt it can handle Python 3. Sorry, but I'm not turning my mail server into a dumpster fire.

...

...
Please stop. These claims are not going to pass on this list. We know better.[3] If you are in a special situation where they bear some resemblance to the truth, please explain. Maybe we can come up with a solution for that situation together. But they are wildly wrong in the context of Internet email in general.

Guess what, I do not care about what you think "Internet email in general" is. That's only your viewpoint, and it doesn't match what I see going through my publicly addressable mailing lists. Sure, I'm not keeping books on every single mail that comes in and what it has in its headers. And sure, the vast majority of mails, which can't have a DKIM signature in the first place for infrastructural reasons, are already accepted by whitelisting rules, so they are irrelevant, at least as long as the whitelisting in rspamd keeps doing its job. The journey to that was hard enough. But still, the number of mails having at least either DKIM or ARC isn't that large. And even worse, DKIM is more common in the spam mails I have in my spam folder right now. Sure, those didn't have manipulated sender addresses; the biggest manipulation there was the almost boring method of putting arbitrary text into the From field before the address so it would be shown as the name. But then you have badly written mail clients that make it easier for such lazy tricks to get by.
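The display-name trick described here, an address-like string placed in front of the real address, is easy to test for in a filter. A rough sketch, with a deliberately loose address pattern and a hypothetical function name:

```python
import re
from email.utils import parseaddr

# Loose pattern for "something that looks like an email address"; not a full RFC 5322 parser.
ADDR_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def display_name_spoof(from_header):
    """Return (claimed, actual) if the display name contains a different address, else None."""
    name, actual = parseaddr(from_header)
    actual = actual.lower()
    for claimed in ADDR_RE.findall(name):
        if claimed.lower() != actual:
            return claimed.lower(), actual
    return None


if __name__ == "__main__":
    header = '"support@yourprovider.com" <user3168@suspiciousdomain.gg>'
    print(display_name_spoof(header))
```

This catches only display names that contain a literal address. A name such as "Your Provider Support" passes untouched, so the check is a narrow filter signal and does nothing about what a mail client chooses to display.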

But in the end, this simply wasn't the topic of this thread. The question was if reversing at least some of the most ridiculous manipulations of emails was possible through mailman settings.

Stephen J. Turnbull

9:17 a.m.

Richard Rosner writes:

...

But in the end, this simply wasn't the topic of this thread. The question was if reversing at least some of the most ridiculous manipulations of emails was possible through mailman settings.

True. When you post in the future, I promise not to offer any help that you didn't explicitly request.

Steve

Richard Rosner

1:52 p.m.

At least it should be part of a completely new topic, if so desired. Commenting on something quite unrelated to the question at hand, apart from referencing other topics, usually isn't the most helpful.
