Mailman not able to process "big" requests

Jonas Felber

April 24, 2018

5:41 a.m.

Hi,

We migrated from mailman2 to mailman3 a week ago. Since then we have been unable to send messages to our students through the mailing list. Here are some symptoms:

1. I go to the mass subscribe page and enter 400 email addresses.
2. I click subscribe, and the browser (in my case Safari) tries to load the page for a few seconds.
3. The request times out.
4. Once the service becomes responsive again, I see that only some of the addresses got subscribed (~50 in this case).

To me this really looks like the addresses are processed on the "web thread", which gets killed once the request times out. Looking at https://gitlab.com/mailman/postorius/blob/master/src/postorius/views/list.py... seems to confirm my suspicion.

A quick overview of our infrastructure: the TCP hops are browser -> haproxy in kubernetes-entry-router -> nginx in kubernetes-ingress -> nginx in mailman -> mailman.

We mitigated the issue by giving it more CPU power and increasing the timeouts in one of the nginx endpoints. We also have a problem with sending emails to "large" (>50) groups: they do not seem to get sent out at all. Do you have any idea what the problem could be?
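Because each request crosses several proxies, each with its own timeout, raising the limit on one nginx hop only helps if no other hop gives up first: roughly, the effective limit is the smallest timeout along the chain. A minimal sketch of that reasoning follows. The timeout values are made up (the real settings were not posted), and "mailman" is assumed to stand for the application server behind the last nginx.

```python
from typing import Dict, Tuple


def effective_timeout(hops: Dict[str, float]) -> Tuple[str, float]:
    """Return the hop with the smallest timeout (in seconds).

    A request that takes longer than this value is cut off at that hop,
    no matter how generous the timeouts on the other hops are.
    """
    name = min(hops, key=lambda hop: hops[hop])
    return name, hops[name]


# Hypothetical per-hop timeouts in seconds; the real values were not posted.
chain = {
    "haproxy (kubernetes-entry-router)": 50.0,
    "nginx (kubernetes-ingress)": 60.0,
    "nginx (mailman)": 60.0,
    "mailman": 30.0,
}

if __name__ == "__main__":
    print(effective_timeout(chain))   # ('mailman', 30.0)
    chain["nginx (mailman)"] = 300.0  # raise the timeout on just one nginx hop
    print(effective_timeout(chain))   # still ('mailman', 30.0)
```

With these illustrative numbers, raising one nginx timeout changes nothing, because the request is still cut off at the 30-second hop.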

We would be really glad if you could help us get our Mailman up and running, as I currently have to send emails out "by hand". I will happily provide more details about our configuration; just let me know what you need.

Kind regards,

Jonas

Jonas Felber <jonas.felber@vis.ethz.ch> Secretary & CTF President

VIS - Verein der Informatik Studierenden an der ETH Zürich, CAB E 31, Universitätstr. 6, ETH Zentrum, CH-8092 Zürich, https://www.vis.ethz.ch


Abhilash Raj

April 2018

7:36 a.m.

On Mon, Apr 23, 2018, at 10:41 PM, Jonas Felber wrote:

> For me this really looks like the emails are sent on the "web-thread" that gets killed once the request times out. Looking at this https://gitlab.com/mailman/postorius/blob/master/src/postorius/views/list.py... seems to confirm my suspicions.

Yes, you are correct. Mass subscription currently just subscribes each address one by one, and thus times out.

I have created an issue for this https://gitlab.com/mailman/postorius/issues/264.
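To make the failure mode concrete, here is a toy model. It is not Postorius code: `subscribe` stands in for whatever call actually creates a membership, time is simulated in integer milliseconds, and the 200 ms per address and 30 s request budget are made-up numbers. It assumes, as described above, that the work stops when the request times out and that subscriptions already made are kept. The second half shows one common alternative, which the thread does not say Postorius uses: the request only queues the addresses, and a separate worker drains the queue in batches.

```python
from collections import deque
from typing import Callable, Iterable, List


class RequestTimeout(Exception):
    """Raised when the (simulated) proxy chain gives up on the HTTP request."""


def serial_mass_subscribe(addresses: Iterable[str],
                          subscribe: Callable[[str], None],
                          cost_ms: int,
                          deadline_ms: int) -> int:
    """Subscribe each address in turn inside the web request.

    Subscriptions made before the deadline are not rolled back, so a
    timeout leaves the list only partially subscribed.
    """
    elapsed = 0
    count = 0
    for addr in addresses:
        if elapsed + cost_ms > deadline_ms:
            raise RequestTimeout(f"gave up after {count} subscriptions")
        subscribe(addr)
        elapsed += cost_ms
        count += 1
    return count


def enqueue_mass_subscribe(addresses: Iterable[str], queue: deque) -> int:
    """Alternative: the request only queues the work and returns at once."""
    before = len(queue)
    queue.extend(addresses)
    return len(queue) - before


def drain_queue(queue: deque, subscribe: Callable[[str], None],
                batch_size: int = 50) -> int:
    """Worker outside the request cycle: process the queue in batches."""
    processed = 0
    while queue:
        for _ in range(min(batch_size, len(queue))):
            subscribe(queue.popleft())
            processed += 1
    return processed


if __name__ == "__main__":
    addresses = [f"student{i}@example.org" for i in range(400)]

    members: List[str] = []
    try:
        serial_mass_subscribe(addresses, members.append,
                              cost_ms=200, deadline_ms=30_000)
    except RequestTimeout as exc:
        print(exc, "->", len(members), "of", len(addresses), "subscribed")

    members.clear()
    queue: deque = deque()
    enqueue_mass_subscribe(addresses, queue)  # the request returns immediately
    drain_queue(queue, members.append)        # done later by a worker
    print(len(members), "of", len(addresses), "subscribed")
```

With these numbers the serial run stops after 150 of the 400 addresses, while the queued version subscribes all of them, because the worker is not bound by the HTTP timeout.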

> Some quick overview of our infrastructure: The TCP endpoints are: browser -> haproxy in kubernetes-entry-router -> nginx in kubernetes-ingress -> nginx in mailman -> mailman

I would very much appreciate it if you'd share your configuration and/or a blog post on your setup. I have been wanting to set up Mailman on Kubernetes for a while, but haven't had time to do it.

> We mitigated the issue by giving it more CPU power and increasing the timeouts in one of the nginx endpoints. We also have a problem with sending out emails to "large" (>50) groups, as they seem to not getting sent out at all. Do you have any idea what the problem here could be?

Which MTA are you using? What do you see in mailman's logs and MTA's logs?

I haven't seen this kind of problem before with Mailman 3. This very list is hosted using Mailman 3 and I believe has more than 50 subscribers.

-- Abhilash Raj maxking@asynchronous.in

Mark Sapiro

2:47 p.m.

On 04/24/2018 12:36 AM, Abhilash Raj wrote:

> On Mon, Apr 23, 2018, at 10:41 PM, Jonas Felber wrote:
>
>> We also have a problem with sending out emails to "large" (>50) groups, as they seem to not getting sent out at all. Do you have any idea what the problem here could be?
>
> Which MTA are you using? What do you see in mailman's logs and MTA's logs?
>
> I haven't seen this kind of problem before with Mailman 3. This very list is hosted using Mailman 3 and I believe has more than 50 subscribers.

Nor have I seen this issue. This list currently has 158 non-digest members. There are several @python.org Mailman 3 lists with over 100 non-digest members including the active yt-users@python.org list with 301 members, and I have gotten no reports of an issue like this.

Also, there are over 600 lists at <https://lists.fedoraproject.org/>, and I don't think this is an issue there either.

-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan

Darren Smith

3:21 p.m.

Our installation at https://lists.rootsweb.ancestry.com has around 33,000 lists and somewhere around 1.5 million members. We have lists with 5,000+ members.

Note that I haven't tried any mass subscriptions. When we upgraded from Mailman 2 to Mailman 3, we used the old .pck files and ran the import on each of the lists. It took about 6 weeks to do the full import of all 33,000 lists. I'm not joking.
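For scale, a rough back-of-the-envelope figure, assuming "about 6 weeks" means 42 days of continuous importing (the actual schedule was not stated):

```latex
\frac{33\,000\ \text{lists}}{42\ \text{days}} \approx 786\ \text{lists/day},
\qquad
\frac{42 \times 86\,400\ \text{s}}{33\,000\ \text{lists}} \approx 110\ \text{s per list}.
```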

As for sending messages, there isn't any problem sending that many messages out, but then we are also not trying to subscribe list members at the same time. We are, though, in the process of importing the archives from our old Mailman 2 system, and this doesn't seem to affect the rest of the system.

We have seen some inefficiencies in the new system. For example, the Postorius front page, which loads all of the information for 200 lists if the user sets "lists per page" to 200, was bringing down the system. We have our own metadata system for searching lists by state/region/surname (ours is a genealogy mailing list system), so we replaced the front page with our own "find a list" functionality.

The other main problem was the page where users can see which lists they are subscribed to. That page was also effectively a denial of service if the user was subscribed to several hundred lists, which is common for a genealogy mailing list system.

We actually ended up creating our own API layer (only accessible on the local machine) and routing certain calls through it to work around the most painful issues. One of these days (when we get a moment to breathe) we are going to post some of our changes to the project, so that we can ditch our own API and just use the internal REST API.

To be fair, the inefficiencies are not necessarily in Mailman or the REST API itself, but in Postorius making hundreds of calls to load a single page. That said, if the API supported loading multiple lists at once instead of one at a time, that might help as well.
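The cost described above is the familiar "N+1 requests" pattern: rendering one page costs one round trip per list. The toy sketch below uses an in-memory fake that only counts calls. `FakeRestApi`, `get_list` and the batch method `get_lists` are hypothetical and do not describe the actual Mailman REST API or mailmanclient; the batch call is the kind of feature suggested above, not something the thread says exists.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakeRestApi:
    """In-memory stand-in for a REST backend that counts round trips.

    Hypothetical: this is not the Mailman REST API or mailmanclient.
    """
    lists: Dict[str, dict]
    calls: int = 0

    def get_list(self, list_id: str) -> dict:
        """One round trip per list."""
        self.calls += 1
        return self.lists[list_id]

    def get_lists(self, list_ids: List[str]) -> List[dict]:
        """Hypothetical batch endpoint: one round trip for many lists."""
        self.calls += 1
        return [self.lists[i] for i in list_ids]


def page_serial(api: FakeRestApi, ids: List[str]) -> List[str]:
    """Build a page by fetching every list individually (N round trips)."""
    return [api.get_list(i)["display_name"] for i in ids]


def page_batched(api: FakeRestApi, ids: List[str]) -> List[str]:
    """Build the same page with a single batched request."""
    return [d["display_name"] for d in api.get_lists(ids)]


if __name__ == "__main__":
    data = {f"list{i}": {"display_name": f"List {i}"} for i in range(200)}
    ids = list(data)

    serial = FakeRestApi(data)
    batched = FakeRestApi(data)
    assert page_serial(serial, ids) == page_batched(batched, ids)
    print(serial.calls, "round trips vs", batched.calls)  # 200 round trips vs 1
```

Both functions produce the same page; only the number of round trips differs, which is what dominates when each call goes over HTTP.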

So while we have not seen any timeout issues on the call to mass subscribe users, we HAVE found several other calls that are effectively a denial of service and have gotten around them ourselves, with plans to contribute to the project in the near future.

-Darren

Jonas Felber

3:43 p.m.

Thank you for your support. I have been informed that we are now able to send out emails again. The problem was that the crashes we got while mass subscribing led to database corruption.

(From what I know, the problem was on our side: Kubernetes did not properly restore some files after restarting, so the database got corrupted. Apparently we now run a cleanup script on each restart.)

There are still some problems, but I just wanted to send out a quick update.

Thanks for your support,

Jonas

Jonas Felber <jonas.felber@vis.ethz.ch> Secretary & CTF President

VIS - Verein der Informatik Studierenden an der ETH Zürich CAB E 31, Universitätstr. 6, ETH Zentrum, CH-8092 Zürich https://www.vis.ethz.ch

