Emergency moderation and clearing out a lot of held messages


bob B

April 28, 2022

5:59 p.m.

We had an issue where a program went rogue and sent out 30k email alerts to one of our lists, so now I have a few questions.

When the issue happened I turned on emergency moderation and we have resolved the issue so..

How do I easily delete 30k held messages? It looks like the GUI only allows me to do 200 at a time. Is there a way to flush all held messages easily?
When I enabled Emergency moderation, I expected it to stop notifying the admin on every held message, but it appears that is not the case? Also, does it reply to the sender of every held message when Emergency moderation is on?

It seems like the steps are: turn on "Emergency moderation", turn OFF "Notify users of held messages", and turn OFF "Admin immed notify".


Mark Sapiro

April 2022

6:13 p.m.

On 4/28/22 10:59, bob B via Mailman-users wrote:

...

We had an issue where a program went rogue and sent out 30k email alerts to one of our lists, so now I have a few questions.

When the issue happened I turned on emergency moderation and we have resolved the issue so..

How do I easily delete 30k held messages? It looks like the GUI only allows me to do 200 at a time. Is there a way to flush all held messages easily?

See https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/... and the thread containing it.

...

When I enabled Emergency moderation, I expected it to stop notifying the admin on every held message, but it appears that is not the case? Also, does it reply to the sender of every held message when Emergency moderation is on?

It seems like the steps are: turn on "Emergency moderation", turn OFF "Notify users of held messages", and turn OFF "Admin immed notify".

MM 2.1 didn't notify the admins of emergency holds. This is not the case in MM 3. It probably should be, but isn't. Senders are or aren't notified per "Notify users of held messages".

So the steps above are correct.
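The same three steps can in principle be applied from mailman shell instead of the web UI. The following is a minimal sketch, not something posted in this thread. It assumes the Postorius labels map to the core mailing list attributes emergency, respond_to_post_requests ("Notify users of held messages") and admin_immed_notify; check those names against your installed version before relying on them. The helper name quiet_emergency_hold is hypothetical.

```python
from types import SimpleNamespace


def quiet_emergency_hold(mlist):
    """Hold all posts and suppress per-message notices (assumed attribute names)."""
    mlist.emergency = True                   # "Emergency moderation" on
    mlist.respond_to_post_requests = False   # assumed: "Notify users of held messages" off
    mlist.admin_immed_notify = False         # "Admin immed notify" off
    return mlist


# Stand-in object so the sketch runs outside Mailman. In "mailman shell -l LIST"
# you would pass the shell's m variable instead and then call commit().
demo = quiet_emergency_hold(SimpleNamespace())
print(demo.emergency, demo.respond_to_post_requests, demo.admin_immed_notify)
```

As with the web UI, nothing takes effect until the change is saved, which in the shell means calling commit().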

-- 
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan

bob B

7:31 p.m.

sorry that makes sense, users not senders, but what is the setting for the sender? I have emergency moderation on a list, and if someone sends to the list they get this message:

Your mail to 'maillistalpha@XXXX.com' with the subject

    test

Is being held until the list moderator can review it for approval.

The message is being held because:

    Emergency moderation is in effect for this list

Either the message will get posted to the list, or you will receive notification of the moderator's decision.

Can I turn this off for a list?

So if a mail list is getting bombed and I have moderation on, the mailman is then trying to send messages back for each incoming email, when it should just be holding it without notifying the sender.

Mark Sapiro

8:25 p.m.

On 4/28/22 12:31, bob B via Mailman-users wrote:

...

sorry that makes sense, users not senders, but what is the setting for the sender? I have emergency moderation on a list, and if someone sends to the list they get this message:

Your mail to 'maillistalpha@XXXX.com' with the subject

    test

Is being held until the list moderator can review it for approval.

The message is being held because:

    Emergency moderation is in effect for this list

Either the message will get posted to the list, or you will receive notification of the moderator's decision.

Can I turn this off for a list?

As you suggested in your OP, set "Notify users of held messages" to No.

Is that not working? It should.


bob B

9:53 p.m.

sorry, it is working, maybe I did not save the setting and just had the settings page up.

bob B

8:09 p.m.

Mark, thanks for the assistance, but I'm a little lost in the interpreter and had no luck googling. How do I get back to the >>> prompt?

I enter

    bash-5.0$ mailman shell -l maillistappha@XXXXX.org
    Welcome to the GNU Mailman shell
    Use commit() to commit changes.
    Use abort() to discard changes since the last commit.
    Exit with ctrl+D does an implicit commit() but exit() does not.
    The variable 'm' is the maillistappha@XXXXX.org mailing list
    >>> from mailman.app.moderator import handle_message
    >>> requestdb = IListRequests(m)
    >>> for req in requestdb.held_requests:
    ...     if req.request_type == RequestType.held_message:
    ...         handle_message(m, req.id, Action.discard)
    ...

********* This is where I am stuck **********

I hit return and it takes me to a new line with no characters. How do I get back up a level to the >>> prompt to enter commit()?

Mark Sapiro

8:22 p.m.

On 4/28/22 13:09, bob B via Mailman-users wrote:

...

Mark, thanks for the assistance, but I'm a little lost in the interpreter and had no luck googling. How do I get back to the >>> prompt?

I enter

    bash-5.0$ mailman shell -l maillistappha@XXXXX.org
    Welcome to the GNU Mailman shell
    Use commit() to commit changes.
    Use abort() to discard changes since the last commit.
    Exit with ctrl+D does an implicit commit() but exit() does not.
    The variable 'm' is the maillistappha@XXXXX.org mailing list
    >>> from mailman.app.moderator import handle_message
    >>> requestdb = IListRequests(m)
    >>> for req in requestdb.held_requests:
    ...     if req.request_type == RequestType.held_message:
    ...         handle_message(m, req.id, Action.discard)
    ...

********* This is where I am stuck **********

I hit return and it takes me to a new line with no characters. How do I get back up a level to the >>> prompt to enter commit()?

You wait until it finishes processing the 30K messages and then you'll get the >>> prompt.
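Because the loop prints nothing until it finishes, a 30K-message purge looks like a hang. A minimal sketch of a variant that reports progress and commits in batches follows. The helper discard_in_batches and its parameters are hypothetical; the Mailman calls are passed in as arguments so the function itself has no Mailman dependency. The trailing comment shows how it might be wired up inside mailman shell with the same handle_message, Action.discard and commit() that appear in the session above.

```python
def discard_in_batches(request_ids, discard, commit, batch_size=500, report=print):
    """Call discard(request_id) for each id, committing every batch_size items."""
    done = 0
    for request_id in list(request_ids):   # snapshot first: discarding changes the queue
        discard(request_id)
        done += 1
        if done % batch_size == 0:
            commit()                        # keep each transaction small
            report(f"discarded {done} held messages")
    commit()                                # commit the final partial batch
    report(f"finished: {done} held messages discarded")
    return done


# Hypothetical wiring inside "mailman shell -l LIST" (not run here):
#   ids = [req.id for req in IListRequests(m).held_requests
#          if req.request_type == RequestType.held_message]
#   discard_in_batches(ids, lambda rid: handle_message(m, rid, Action.discard), commit)
```

Committing as it goes also means an interrupted run keeps the work already done, at the cost of not being able to abort() the whole purge.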


Dan Caballero

November 2022

11:04 p.m.

We're running Mailman in AWS using an RDS MySQL database. It seems to take unusually long to approve or discard even a few messages. What should be a normal response time for processing as few as 10 messages?

I've done some testing in the past running Mailman3 in a Docker container and using a local instance of MariaDB. Even in that situation I still found the processing slow.

Please advise.

-- Dan

Stephen J. Turnbull

11:48 a.m.

Dan Caballero writes:

...

We're running Mailman in AWS using an RDS MySQL database. It seems to take unusually long to approve or discard even a few messages.

I assume you are using Postorius for this?

...

What should be a normal response time for processing as few as 10 messages?

I don't know about MySQL/MariaDB. All of the production instances I have access to are backed by PostgreSQL. Response using Postorius is not instantaneous, but 2-3 seconds is the maximum. Also, I don't think I've seen multiple messages needing the same treatment in one visit in many months (maybe a year), but I don't recall those "feeling" longer than a single message request.

...

I've done some testing in the past running Mailman3 in a Docker container and using a local instance of MariaDB. Even in that situation I still found the processing slow.

I guess since you talk about "10 at once" you have much larger lists and/or much larger user databases and/or heavier traffic per user and/or more "spam". It would help to know what the scale of your Mailman installation is in those dimensions.

I don't think Mailman's Python code is likely to be the bottleneck, rather it's probably database queries. Have you looked at MySQL's logs for queries related to those operations?

Steve

Dan Caballero

5:51 p.m.

Thanks for the quick reply. We have over 2000 lists but a relative few receive large amounts of spam or unusual amounts of messages which aren't intended to be delivered to list members. We've been using Mailman for decades so there's been a lot of creative applications for the system which do not correlate to a standard discussion list or listserv.

I'm currently removing 2800 messages from one such list via Python shell and it's been going for 16 hours now. It's only half done.

I had previously increased timeout settings so that Postorius would not cause a browser timeout. That helps the vast majority of our list owners/moderators do what they need for small batches of messages.

I'll try replicating this latest issue in my test environment where logging is easier to inspect.

I'll follow up with my findings and more questions.

Thank you!

Dan Caballero

7:04 p.m.

I'm seeing similar messages in both our prod instance and the test instance I run on my desktop.

Production:

"2022-11-09T18:37:49.106105Z 19100868 [Note] Aborted connection 19100868 to db: 'mailman3_prod' user: 'mailman' host: '10.6.5.187' (Got an error reading communication packets)"

Test:

"2022-11-09 18:46:07 52 [Warning] Aborted connection 52 to db: 'mailman3_prod' user: 'mailman' host: '172.18.0.3' (Got an error reading communication packets)"

I had previously increased the default value for "max_allowed_packet" to 12MB in our production db. The test db has that set to 16MB by default. When we first set this up we were getting errors based on larger than average messages not making it into the database due to that setting. Once I bumped it up those problems were resolved.

Any thoughts on what could be causing these error messages? We run Mailman in a container, so is it possible that it's related to the Python environment somehow?

Thanks.

-- Dan

Stephen J. Turnbull

3:01 p.m.

I don't have a helpful suggestion yet but would like to clarify:

Dan Caballero writes:

...

"2022-11-09T18:37:49.106105Z 19100868 [Note] Aborted connection 19100868 to db: 'mailman3_prod' user: 'mailman' host: '10.6.5.187' (Got an error reading communication packets)"

db: mailman3_prod is the core database, not the Django (Postorius and HyperKitty, or mailman-web) database, is that correct?

Caballero, Danny (Dan)

3:17 p.m.

I don't understand. We have a single database configured for everything. I don't recall reading anything in the set-up documentation about using more than 1 database for Mailman3.

Let me know if I'm missing something!

-- Dan


Mark Sapiro

5:12 p.m.

On 11/11/22 07:17, Caballero, Danny (Dan) wrote:

...

I don't understand. We have a single database configured for everything. I don't recall reading anything in the set-up documentation about using more than 1 database for Mailman3.

A single database for everything is fine. Some of our docs, e.g. <https://docs.mailman3.org/en/latest/install/virtualenv.html#setup-database>, suggest two databases, mailman for Mailman core and mailmanweb for Postorius/HyperKitty, but either way is fine.


Stephen J. Turnbull

6:23 p.m.

Caballero, Danny (Dan) writes:

...

I don't understand. We have a single database configured for everything. I don't recall reading anything in the set-up documentation about using more than 1 database for Mailman3.

I was trying to help localize the database request to one of the three major components: core, Postorius, HyperKitty. If you use the two-database scheme documented here: https://docs.mailman3.org/en/latest/install/virtualenv.html#virtualenv-insta... then knowing the database would at least identify whether the slowness is associated with core or with the web apps.

The excruciating details:

Conceptually, there are three separate databases: the one used by Mailman core to manage everything but authentication and archives, the one used by Django to manage authentication for HyperKitty and Postorius, and the one used by HyperKitty to manage archives. They don't share tables, and there are no name collisions for the tables so it doesn't really matter how the backend handles them.

The two database configuration is not really a recommendation. I suppose the practice derives from an abundance of caution, as there are concepts like "User" that must be implemented in somewhat different ways across components (core doesn't authenticate users, while Postorius's primary function is authentication of users). I myself would worry that there might be name collisions among tables, which would cause mayhem, but that's probably excessive.

Mark Sapiro

6:58 p.m.

On 11/11/22 10:23, Stephen J. Turnbull wrote:

...

The two database configuration is not really a recommendation. I suppose the practice derives from an abundance of caution, as there are concepts like "User" that must be implemented in somewhat different ways across components (core doesn't authenticate users, while Postorius's primary function is authentication of users). I myself would worry that there might be name collisions among tables, which would cause mayhem, but that's probably excessive.

FWIW, the Mailman3 installations at both mail.python.org and lists.mailman3.org use a single database and, with the exception of Mailman core's tables and Postorius' template table, tables are prefixed with application names like auth_, django_ and hyperkitty_.


Caballero, Danny (Dan)

12:36 a.m.

Thanks for some clarification.

I cleared out 2800 held messages via the Mailman shell ("mailman shell -l") and it took about 28 hours. Anything done via Mailman shell is a connection from Mailman core to the database, correct?

Although I can't replicate that many messages via Postorius/Django, I do see similar slowness there as well, at a minute or more of processing per message.

I recreated the above Mailman shell clean up via Docker container running MariaDB and noticed similar processing time for discarding held messages. That leads me to conclude that the issue isn't network related or specific to AWS RDS.

I'm open to testing and switching over to Postgres if that is the straightforward solution.

Dan Caballero Systems Administrator Academic Computing Solutions IMSS - Caltech https://imss.caltech.edu


Mark Sapiro

2:37 p.m.

On 11/11/22 16:36, Caballero, Danny (Dan) wrote:

...

Thanks for some clarification.

I cleared out 2800 held messages via the Mailman shell ("mailman shell -l") and it took about 28 hours. Anything done via Mailman shell is a connection from Mailman core to the database, correct?

That's correct.

...

Although I can't replicate that many messages via Postorius/Django, I do see similar slowness there as well, at a minute or more of processing per message.

I recreated the above Mailman shell clean up via Docker container running MariaDB and noticed similar processing time for discarding held messages. That leads me to conclude that the issue isn't network related or specific to AWS RDS.

I'm assuming the docker container installation also had all the lists/users?

It seems this must be related to a large number of lists/users and we must be doing a number of queries that doesn't scale well. This seems excessive. We'll have to look at the processing to see what we're doing.

There have been some recent improvements in scaling. What Mailman core version is this?


Dan Caballero

11:52 p.m.

Thanks Mark!

We're using GNU Mailman 3.3.5 (Tom Sawyer). The tests I did via the Docker container used a database dump from our production MySQL server.

We have 2050 lists. The address table in the database has 95028 rows.

-- Dan

Mark Sapiro

4:45 a.m.

On 11/16/22 15:52, Dan Caballero wrote:

...

Thanks Mark!

We're using GNU Mailman 3.3.5 (Tom Sawyer). The tests I did via the Docker container used a database dump from our production MySQL server.

We have 2050 lists. The address table in the database has 95028 rows.

For what it's worth, mail.python.org has 188 Mailman 3 lists and 85852 rows in the address table, and handling a message waiting moderation takes only a second or two. Thus, whatever the scaling issue is, it's likely related to the number of lists as opposed to the number of list members.


Stephen J. Turnbull

5:46 a.m.

Hi, Mark, Dan

I'm no database expert, but is it possible that there's some index that's missing? (Just a WAG, feel free to ignore if you know more about databases than "adding indices solves all problems" ;-)



Dan Caballero

December 2022

11:24 p.m.

Any further thoughts on this potential 'scaling' issue?

Thanks in advance.

-- Dan

Mark Sapiro

8:20 p.m.

On 12/7/22 15:24, Dan Caballero wrote:

...

Any further thoughts on this potential 'scaling' issue?

This may be related to https://gitlab.com/mailman/mailman/-/issues/1026

Try the following in mailman shell

  >>> pendings = getUtility(IPendings)
  >>> count = 0
  >>> for token, data in pendings.find(pend_type='data'):
  ...     count += 1
  ...
  >>> count

What is the value printed for count and how long does the for loop take?
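To answer the timing part of that question without a stopwatch, the loop can be wrapped in a timer. A minimal sketch using only the standard library follows; count_and_time is a hypothetical helper, and in mailman shell its argument would be pendings.find(pend_type='data') from the snippet above.

```python
import time


def count_and_time(iterable):
    """Return (number of items, elapsed seconds) for one full pass over iterable."""
    start = time.perf_counter()
    count = sum(1 for _ in iterable)
    return count, time.perf_counter() - start


# Stand-in data so the sketch runs anywhere; replace with the pendings query.
count, seconds = count_and_time(range(1000))
print(f"{count} entries in {seconds:.3f}s")
```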


Dan Caballero

6:38 p.m.

Hi Mark,

I ran the commands and it took about 35 seconds for the loop to run.

Here's the result.

...

...
...
count 4168

-- Dan

Mark Sapiro

10:01 p.m.

On 12/13/22 10:38, Dan Caballero wrote:

...

Hi Mark,

I ran the commands and it took about 35 seconds for the loop to run.

Here's the result.

...
...
...
count 4168

So that's at least part of the issue. How does 35 seconds compare to the length of time to process one moderated message through Postorius?
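For comparison, the shell run Dan reported earlier in the thread (2800 held messages in about 28 hours) works out to a per-message cost very close to the 35-second loop time. A small worked calculation, using only figures quoted in this thread:

```python
held_messages = 2800   # messages discarded via mailman shell
elapsed_hours = 28     # reported wall-clock time for that run
loop_seconds = 35      # reported time for one pass over the 'data' pendings

seconds_per_message = elapsed_hours * 3600 / held_messages
print(f"{seconds_per_message:.0f} s per discarded message")   # 36 s
print(f"ratio to loop time: {seconds_per_message / loop_seconds:.2f}")
```

That would be consistent with roughly one full pass over the pendings for every handled message, although the numbers alone do not prove it.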


Abhilash Raj

5:49 a.m.

On 12/14/22 03:31, Mark Sapiro wrote:

...

On 12/13/22 10:38, Dan Caballero wrote:

...
Hi Mark,

I ran the commands and it took about 35 seconds for the loop to run.

Here's the result.

...
...
...
count 4168

So that's at least part of the issue. How does 35 seconds compare to the length of time to process one moderated message through Postorius?

It seems to be due to an inefficient lookup of the actual entry in the pendings table, which needs to be purged after handling a held message. The lookup scales linearly with the number of entries in the pendedkeyvalue and pendings tables.

I was actually browsing through the perf metrics on mail.python.org and found that the slowest API response from Core is handling held messages, so I spent some time last night trying to improve it [1]. The linked MR should significantly reduce the time, although there is more that can be done to make it even better.

I haven't had a chance to benchmark my solution yet, but I am hoping it will reduce the latency quite significantly. I'll report on the linked issue [2] after deploying [1] on m.p.o. how much it helped.

-- thanks, Abhilash Raj (maxking)

Abhilash Raj

6:10 a.m.

On 12/14/22 03:31, Mark Sapiro wrote:

...

On 12/13/22 10:38, Dan Caballero wrote:

...
Hi Mark,

I ran the commands and it took about 35 seconds for the loop to run.

Here's the result.

...
...
...
count 4168

So that's at least part of the issue. How does 35 seconds compare to the length of time to process one moderated message through Postorius?

Probably should've read more of the thread before previous reply ;-)

So, the issue _could_ be due to latency to the remote database vs local database that we have on m.p.o. Mailman3 is not very efficient when it comes to total no. of database queries done per operation, which is something I have been (very slowly) tracking and fixing on a case-by-case basis.

So, for example, handling 1 held message on m.p.o made 1.1k PostgreSQL database calls in total, and even though each took only a few hundred microseconds, the total added up to an average response time of roughly 2 seconds on the handle message API endpoint. If you handle more than 1, it scales linearly.

My suspicion is that in your case, Dan, due to a remote instance, the latency per call is higher (on the order of a few or tens of milliseconds is my guess) and then, depending on the total no. of entries you have in the pendings and pendedkeyvalue tables (with some filters, not _all_ entries), it adds up to a high value.
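The effect of per-call latency can be put in rough numbers. The sketch below multiplies the roughly 1.1k calls per held message reported for m.p.o by a few per-call latencies. The latency values are illustrative assumptions, not measurements from either site, and the model ignores all non-database time.

```python
CALLS_PER_HELD_MESSAGE = 1100   # "1.1k" calls reported for m.p.o; other sites will differ


def db_time_seconds(latency_ms, calls=CALLS_PER_HELD_MESSAGE):
    """Database time for one held message if every call costs latency_ms."""
    return calls * latency_ms / 1000.0


for latency_ms in (0.5, 5, 30):   # assumed latencies: local, nearby, remote
    print(f"{latency_ms:>4} ms/call -> {db_time_seconds(latency_ms):5.1f} s per held message")
```

Under this simple model the call count matters as much as the latency, which is why reducing the number of queries helps local and remote databases alike.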

The no. of database calls here is of order n^2, where n is the entries in pendings of type "held message" and "data" and their respective linked relationships in pendedkeyvalue table.

My MR makes it so that we don't need to scan all entries of "type"="data", so that part will become constant time. And the pendings of type "held message" will be limited to the no. of held messages in a single MailingList (compared to all mailing lists, as today), so it will help depending on the distribution of held messages across lists. I am also exploring ways to make that second query constant time as well.
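The difference between scanning every pending entry and going straight to the one that is needed can be illustrated with plain Python. This is a toy model only: it does not reproduce Mailman's schema or the code in the MR, and the names find_by_scan and find_by_key are hypothetical. It counts how many rows each approach touches to locate one entry.

```python
def find_by_scan(rows, wanted_id):
    """Linear scan: touches rows until the match is found."""
    touched = 0
    for row_id, payload in rows:
        touched += 1
        if row_id == wanted_id:
            return payload, touched
    return None, touched


def find_by_key(index, wanted_id):
    """Keyed lookup: touches one entry regardless of table size."""
    return index.get(wanted_id), 1


rows = [(i, f"pending-{i}") for i in range(4168)]   # 4168 = count reported in this thread
index = dict(rows)

print(find_by_scan(rows, 4167))   # last row, so every row is touched
print(find_by_key(index, 4167))
```

Repeating a scan like this once per held message is the kind of pattern that makes a bulk purge grow much faster than the queue itself.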

-- thanks, Abhilash Raj (maxking)

Dan Caballero

6:42 p.m.

Thanks Abhilash. Just to clarify, I do observe the latency in my test/dev environment, which is a Docker Desktop container running with a local MariaDB instance. I load a backup from my production system onto the local MariaDB instance for all dev/test work.

Dan Caballero

6:40 p.m.

I just did another test with our production system in AWS. It took about 70 seconds for Postorius to respond with the page stating "The selected messages were accepted". I did notice that the message that was held was released and delivered in a quarter of that time; about 15 seconds.

I use a test list with a few of my own email accounts subscribed to it.
