Hyperkitty messages stuck in spool
Hi.
I am seeing an issue whereby messages get stuck in the /opt/mailman/var/archives/hyperkitty/spool directory. I can sometimes clear these by restarting mailman-web and sending another message through to the list, but I am having trouble identifying the root cause.
This started happening after moving to a new host running the latest Debian, using a venv rather than Debian packages.
Any suggestions? Thanks. Andrew.
On 11/4/21 5:06 AM, Andrew Hodgson wrote:
Hi.
I am seeing an issue whereby messages get stuck in the /opt/mailman/var/archives/hyperkitty/spool directory.
This happens when Mailman's attempt to archive the message gets a non-success status from HyperKitty. The pickled message object gets stored in the above spool/ directory. Each subsequent message to archive retries the messages in the spool.
The information from HyperKitty about the non-success is logged in the /opt/mailman/var/logs/mailman.log file, and messages in the spool can be examined with the mailman qfile command.
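For example, something like the following (a sketch only, assuming the usual venv layout under /opt/mailman; the spool file name is just a placeholder):

    # see what is sitting in the archiver spool
    ls -l /opt/mailman/var/archives/hyperkitty/spool/
    # dump one of the pickled messages to see which post it is
    /opt/mailman/venv/bin/mailman qfile /opt/mailman/var/archives/hyperkitty/spool/<spooled-file>
    # and look in the log for the corresponding archiving error
    grep -i hyperkitty /opt/mailman/var/logs/mailman.log | tail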
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Mark Sapiro wrote:
On 11/4/21 5:06 AM, Andrew Hodgson wrote:
Hi.
I am seeing an issue whereby messages get stuck in the /opt/mailman/var/archives/hyperkitty/spool directory.
This happens when Mailman's attempt to archive the message gets a non-success status from HyperKitty. The pickled message object gets stored in the above spool/ directory. Each subsequent message to archive retries the messages in the spool.
Ok, thanks for this. It's happening because the Gunicorn process running the Django server is being killed due to OOM conditions. Are there any settings I can change to troubleshoot/throttle this in Gunicorn or Django?
Thanks. Andrew.
On 11/18/21 6:38 AM, Andrew Hodgson wrote:
Ok, thanks for this. It's happening because the Gunicorn process running the Django server is being killed due to OOM conditions. Are there any settings I can change to troubleshoot/throttle this in Gunicorn or Django?
I don't think there's anything that can be done in gunicorn or django about this. This is an OS issue. You need to give the server more memory or allocate a (bigger) swap file.
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Mark Sapiro writes:
On 11/18/21 6:38 AM, Andrew Hodgson wrote:
Ok, thanks for this. It's happening because the Gunicorn process running the Django server is being killed due to OOM conditions. Are there any settings I can change to troubleshoot/throttle this in Gunicorn or Django?
I don't think there's anything that can be done in gunicorn or django about this. This is an OS issue. You need to give the server more memory or allocate a (bigger) swap file.
If those things are difficult, depending on OS and the hardware, it may also be possible to tell the OS that gunicorn is an important process that should not be killed. But in the long run, the server needs more memory.
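On a systemd-based Linux, one way to do that is a drop-in that lowers the service's OOM score (a sketch only; the unit name mailmanweb.service is an example, so use whatever unit actually runs gunicorn on your system):

    # /etc/systemd/system/mailmanweb.service.d/oom.conf
    [Service]
    # negative values make the OOM killer less likely to pick this service
    OOMScoreAdjust=-500

    # then: systemctl daemon-reload && systemctl restart mailmanweb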
In theory there could be a memory leak in Python or one of the Mailman apps, but I haven't heard of one. And at least the Linux kernel seems to be perfectly happy to kill random processes that are well-behaved, so the fact that gunicorn is getting killed is not strong evidence that it's causing the memory pressure; we'd really need to see the stats.
Steve
Stephen J. Turnbull wrote:
Mark Sapiro writes:
On 11/18/21 6:38 AM, Andrew Hodgson wrote:
Ok, thanks for this. It's happening because the Gunicorn process running the Django server is being killed due to OOM conditions. Are there any settings I can change to troubleshoot/throttle this in Gunicorn or Django?
I don't think there's anything that can be done in gunicorn or django about this. This is an OS issue. You need to give the server more memory or allocate a (bigger) swap file.
If those things are difficult, depending on OS and the hardware, it may also be possible to tell the OS that gunicorn is an important process that should not be killed. But in the long run, the server needs more memory.
Thanks for this; it was due to changing cloud providers. With my old provider I installed Debian from an ISO file, which created a swap partition, but on the new provider I used their cloud image, which has no swap. Swap is now created and everything is back to normal. It's running on an instance with 2GB of RAM, and based on the usage I am seeing, I would consider this an absolute minimum for a standard Mailman 3 stack.
Thanks. Andrew.
On Nov 19, 2021, at 5:47 AM, Andrew Hodgson <andrew@hodgson.io> wrote:
Stephen J. Turnbull wrote:
Mark Sapiro writes:
On 11/18/21 6:38 AM, Andrew Hodgson wrote:
Ok, thanks for this. It's happening because the Gunicorn process running the Django server is being killed due to OOM conditions. Are there any settings I can change to troubleshoot/throttle this in Gunicorn or Django?
I don't think there's anything that can be done in gunicorn or django about this. This is an OS issue. You need to give the server more memory or allocate a (bigger) swap file.
If those things are difficult, depending on OS and the hardware, it may also be possible to tell the OS that gunicorn is an important process that should not be killed. But in the long run, the server needs more memory.
Thanks for this; it was due to changing cloud providers. With my old provider I installed Debian from an ISO file, which created a swap partition, but on the new provider I used their cloud image, which has no swap. Swap is now created and everything is back to normal. It's running on an instance with 2GB of RAM, and based on the usage I am seeing, I would consider this an absolute minimum for a standard Mailman 3 stack.
I had this same experience on Digital Ocean. The short-term fix was enabling swap and the long-term fix was upping my service level to something with more memory. Without swap, I don’t think 2GB is realistically sufficient to run a full mm3 instance.
The cloud providers really don’t like you to run swap because it hammers their SSDs.
- Mark
mark@pdc-racing.net | 408-348-2878
On Nov 19, 2021, at 11:20 AM, Mark Dadgar <mark@pdc-racing.net> wrote:
On Nov 19, 2021, at 5:47 AM, Andrew Hodgson <andrew@hodgson.io> wrote:
Thanks for this; it was due to changing cloud providers. With my old provider I installed Debian from an ISO file, which created a swap partition, but on the new provider I used their cloud image, which has no swap. Swap is now created and everything is back to normal. It's running on an instance with 2GB of RAM, and based on the usage I am seeing, I would consider this an absolute minimum for a standard Mailman 3 stack.
I had this same experience on Digital Ocean. The short-term fix was enabling swap and the long-term fix was upping my service level to something with more memory. Without swap, I don’t think 2GB is realistically sufficient to run a full mm3 instance.
I'm running a Linode VM with 2 cores and 4 GB RAM with 9 lists that are not high traffic most of the time. I would say this is the very minimum of power. (Yes, I'm using Docker containers, which may be adding some overhead.) I'll need to increase my service when the list traffic increases post-pandemic.
Seth
There may be some benefit in exploring these notes on configuring the Linux memory managers:
https://gist.github.com/JPvRiel/bcc5b20aac0c9cce6eefa6b88c125e03
My thought in suggesting this is that if you can put some back pressure on the running processes, to indicate that memory is indeed at a premium, you may be able to extract more from the hardware you have. For example, use 'ulimit' to prevent specific processes from seeing all the memory they would otherwise see.
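As an illustration only (the figures here are arbitrary), that could mean capping the address space in the script that launches gunicorn, or setting the equivalent per-service limit under systemd:

    # in the shell that starts gunicorn: cap virtual memory for this process tree
    ulimit -v 524288        # value is in kB, i.e. roughly 512MB

    # or, with systemd, in the unit's [Service] section:
    #   MemoryMax=512M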
In any case, to address the last point Mark makes, I would suggest adding a quite limited amount of swap - perhaps 200-300MB, rather than 1-2GB, for the same reason. You give the kernel somewhere to put things at need without giving it the scope to go crazy!
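If it helps, creating a small swap file is only a few commands (the size follows the suggestion above; the path is arbitrary):

    fallocate -l 256M /swapfile    # or: dd if=/dev/zero of=/swapfile bs=1M count=256
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    # make it persistent across reboots
    echo '/swapfile none swap sw 0 0' >> /etc/fstab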
Finally, I would suggest going through the installation and removing everything you can manage that's not critical to running the mailing list. For example, mysql often runs many threads with lots of buffers, and on my MM install its virtual memory size (the number of bytes it theoretically could use) is just under 3GB. Its 'resident' size (the amount it's actually using now) is however just 139MB, with another 2.3GB shared (with other processes on the system -- think c-lib, tls etc). Tuning mysql to a smaller memory footprint would probably result in several hundred MB lower use.
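As a sketch of the sort of thing I mean (the values are purely illustrative, and the file location varies by distribution):

    # e.g. /etc/mysql/conf.d/small-footprint.cnf
    [mysqld]
    innodb_buffer_pool_size = 64M
    max_connections         = 30
    performance_schema      = OFF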
Similarly, the same VM is running 'snapd', which claims a virtual size of 1.2GB and a resident size of 32K; perhaps it's not necessary to run snapd permanently, or even to have it installed? It's also using 'lxcfs', the Linux containers framework, which, while using very little resident memory, is still claiming 290MB of virtual space.
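(These per-process virtual and resident figures are the kind of thing you can see with, for example:

    ps -eo pid,comm,vsz,rss --sort=-rss | head -20

where VSZ and RSS are reported in kB.)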
My point is that all these virtual allocations add up, and while the kernel will overcommit (to account for memory requested but never used) and tries hard, it cannot know which things to prioritise. Making some of the choices for it can only help.
--
FWIW, my mail server VM is set up with 4 virtual CPUs, 5GB RAM (of which about 1.7G is actively used and the rest is file block buffer, some of which will be for shared code files) and 4GB swap (of which it's using 260K). It's running a fairly default setup of mariadb, dovecot, mailman3, exim4, fail2ban, and spamassassin, and is quite stable. NB I don't run postorius or django on this server... they're elsewhere. I do recall adding more RAM a while back, having initially allocated 2GB, and at the same time adding the swap space; it looks from these numbers that the swap isn't needed, and I could, if I wanted, have just given it 3GB rather than 5GB of RAM (but the server has 64GB and I'm not bothered!).
HTH,
Ruth
On 19/11/2021 16:20, Mark Dadgar wrote:
On Nov 19, 2021, at 5:47 AM, Andrew Hodgson <andrew@hodgson.io> wrote:
Thanks for this; it was due to changing cloud providers. With my old provider I installed Debian from an ISO file, which created a swap partition, but on the new provider I used their cloud image, which has no swap. Swap is now created and everything is back to normal. It's running on an instance with 2GB of RAM, and based on the usage I am seeing, I would consider this an absolute minimum for a standard Mailman 3 stack.
I had this same experience on Digital Ocean. The short-term fix was enabling swap and the long-term fix was upping my service level to something with more memory. Without swap, I don’t think 2GB is realistically sufficient to run a full mm3 instance.
The cloud providers really don’t like you to run swap because it hammers their SSDs.
- Mark
mark@pdc-racing.net | 408-348-2878
--
Software Manager & Engineer
Tel: 01223 414180
Blog: http://www.ivimey.org/blog
LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/
participants (6)
- Andrew Hodgson
- Mark Dadgar
- Mark Sapiro
- Ruth Ivimey-Cook
- Seth Seeger
- Stephen J. Turnbull