Hi,
We are having an issue with high memory usage on our Mailman3 server. The server initially had 16GB RAM, and memory usage climbed from 5% to 92% within a week. We then increased the memory to 32GB RAM, and usage is climbing in the same pattern.
Any idea why this is happening, or any suggestions?
Kind regards,
Prasanth Nair
TL;DR: quite likely your system has plenty of memory and is operating normally.
I'm not a Mailman expert and don't know how much memory it needs to work. However, monitoring memory usage on Linux can be subtle. Linux views memory as a resource to be used: any time data is read from or written to a file, a copy is kept in memory (the page cache), just in case it is needed again. This fills memory with recently used data. When memory is close to full, cached data that has not been referenced in a while is discarded to make room for newer data. Linux does *not* free up memory until there is some demand for it.
For this reason, Linux servers tend to operate with > 90% of memory "used". In this context, "used" simply means some possibly useful data is sitting in that memory; it does not mean the system actually needs all of it there.
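A quick way to see the distinction is "free". The "available" column estimates how much memory could be handed to new processes without swapping, counting cache the kernel can drop, while "used" on its own can look alarming on a perfectly healthy box. For example (the numbers below are only illustrative):

    # free -h
                  total        used        free      shared  buff/cache   available
    Mem:           31Gi        4.2Gi       1.1Gi       310Mi        26Gi        26Gi
    Swap:          2.0Gi          0B       2.0Gi

Here most of the 31 GiB is buff/cache, which the kernel gives back the moment an application needs it.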
The purpose of memory is to make the system faster by avoiding reading data from files, using a copy in memory instead. If your system is performing well, you don't need to add memory. If the system is performing poorly, and the bottleneck is CPU cycles, adding memory probably won't help much. If, however, your system is not performing well and the bottleneck appears to be storage, adding memory might improve performance, by allowing the system to avoid paging to the swap disk and perhaps to avoid reading data from files.
I use "vmstat" to begin looking at this. Consider:
# vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 311260 133136 315388    0    0   143    39  136  242  4  2 94  1  0
 0  0      0 309716 133136 315388    0    0     0     0  199  377  2  1 97  0  0
 0  0      0 310512 133136 315388    0    0     0     0  200  365  3  1 97  0  0
This is an idle system, so there's not much going on. Ignore the first line of data; it shows averages since boot rather than current activity. The "si" and "so" columns show data being paged in from and out to swap. If that is happening, adding more memory will very likely help performance. The "wa" column is the percentage of time the system has work to do but is blocked waiting on something (usually storage) before it can make progress. If your system has a high "wa" percentage, adding memory might improve performance. OTOH if "us" + "sy" is approaching 100%, you are running out of CPU cycles and memory is unlikely to help.
You can find a very nice discussion complete with examples of all this here: https://access.redhat.com/solutions/1160343
-- Stephen
On 2/15/22 08:11, Stephen Daniel wrote:
TL;DR: quite likely your system has plenty of memory and is operating normally. ...
Just for reference, mail.python.org supports 231 MM 2 lists and 182 MM 3 lists with 16 GB of ram and no swap.
The server that supports this list and also the www.list.org web site has 4 GB of ram and no swap.
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Thank you for the replies from Mark and Stephen.
One reason Prasanth & I are concerned with this specific system is that, the first time memory usage crept up, it got to the point where the OOM killer stepped in and, as is usual for that mechanism, one of the processes it killed was sshd, thus removing access to the system :(. I know the OOM killer can be tuned so that it leaves processes like sshd alone, but it will nevertheless start killing *some* processes if it sees an out-of-memory situation.
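(For what it's worth, the tuning I have in mind is something like this with systemd; a sketch only, and the unit may be called ssh.service or sshd.service depending on the distro:

    # /etc/systemd/system/sshd.service.d/override.conf
    [Service]
    # -1000 tells the kernel's OOM killer never to pick this process
    OOMScoreAdjust=-1000

followed by "systemctl daemon-reload && systemctl restart sshd". That protects sshd itself, but something else still gets killed when memory runs out.)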
So, as Prasanth said, we doubled the memory available and stuck an alarm on it to trigger when usage exceeded 85% for 5 minutes. I've saved a screenshot of the monitored memory usage over the last couple of weeks:
https://1drv.ms/u/s!Aq7dn8GDGLdCodsU-pybJBdckxWU1A?e=BJY1UA
(It is stored on OneDrive in case you don't recognise the URL format)
We moved the server to the larger memory size on 4th February. On 11th Feb, it triggered the alarm and we rebooted the server. Memory usage is starting to climb again, albeit not at the pace it did previously.
What is particularly curious for us is that we have four Mailman 3 servers. Two of them are single domain and two are hosting two domains. This server is one of the servers hosting two domains, but it is also the ONLY one of the four servers exhibiting this memory pattern. By comparison, the memory usage for the other server hosting two domains looks like this:
https://1drv.ms/u/s!Aq7dn8GDGLdCodsVyEufkeQqit_cGA?e=ifrJHs
The biggest difference we are aware of with the "problematic" server is that some of the list archives are quite large by comparison with our other servers. The largest is 15K messages, then 2K then 1K. The other three servers have lists with relatively small archives by comparison (double or triple digits at worst).
Finally, and just FYI, we're using PostgreSQL for the database and Xapian for the indexing engine.
Regards
Philip
One small correction:
The largest is 15K messages, then 2K then 1K.
That should be *discussions*, not messages. The message count is way higher than the discussion count.
Philip
I'm not an expert, and you might be better served by posting this question in one of the communities that serve sysadmins; however, my reading says that the oom-killer is invoked when the system cannot allocate more virtual memory, not more physical memory. The graph in your screenshot is physical memory usage.
The first thing I would try is increasing the size of your swap space. That adds to the available virtual memory without actually buying more RAM. If you really do need more RAM, the result will be that the system stays up and the oom-killer stays quiet, but performance is bad and the "si" and "so" columns in vmstat climb into triple digits or higher.
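If the machine doesn't have a spare partition handy, a swap file works just as well. Roughly (size and path are only an example):

    # fallocate -l 8G /swapfile    # or: dd if=/dev/zero of=/swapfile bs=1M count=8192
    # chmod 600 /swapfile
    # mkswap /swapfile
    # swapon /swapfile
    # echo '/swapfile none swap sw 0 0' >> /etc/fstab    # persist across reboots

Then keep an eye on vmstat as described earlier.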
Thanks, Stephen.
The reason Prasanth & I wanted to check with this list first was to see whether anyone else is seeing similar behaviour. Our servers are *only* running Mailman 3 and associated packages like PostgreSQL and Postfix.
Regards
Philip
On 2/16/22 00:18, Philip Colmer wrote:
One of the reasons why Prasanth & I are concerned with this specific system is because the first time memory usage crept up, it got to the point where oomkiller stepped in and, as is usual for that particular mechanism, one of the processes it killed was sshd, thus removing access to the system :(. I know that oomkiller can be tuned to stop it killing processes like sshd but it will, nevertheless, start killing *some* processes if it sees an out of memory situation.
It sounds like there is a memory leak somewhere. This is an issue with some versions of Python. I know both Python 3.10.2 and 3.9.10 were released ahead of schedule to fix this. See https://bugs.python.org/issue46347
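If you're not sure exactly which interpreter your installation runs under, something like this should show it (paths depend on how Mailman was installed, e.g. a venv under /opt/mailman):

    # mailman info          # prints the Mailman and Python versions, among other things
    # python3 --version     # the system interpreter, which may or may not be the one Mailman uses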
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
On Thu, 17 Feb 2022 at 01:02, Mark Sapiro <mark@msapiro.net> wrote:
It sounds like there is a memory leak somewhere. This is an issue with some versions of Python. I know both Python 3.10.2 and 3.9.10 were released ahead of schedule to fix this. See https://bugs.python.org/issue46347
Thanks, Mark. We're using Python 3.8; I've looked at the open issues and don't immediately see any that might be memory-leak related in a way that would affect Mailman 3.
That said, there are quite a few libraries and other "moving parts" involved in this project, so we're going to track memory usage over time by process to see whether this really is a leak and, if it is, what the likeliest culprit might be.
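The plan is nothing fancy, roughly along these lines (a sketch; the log path and interval are arbitrary):

    #!/bin/sh
    # memlog.sh -- record the 15 biggest processes by resident memory
    # run from cron, e.g.:  */5 * * * * /usr/local/bin/memlog.sh
    {
      date
      ps -eo pid,rss,vsz,comm --sort=-rss | head -n 16
      echo
    } >> /var/log/memlog.txt

Comparing those snapshots over a few days should show whether any single process keeps growing or whether it really is just cache.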
Philip
On 2/17/22 07:44, Philip Colmer wrote:
Thanks, Mark. We're using Python 3.8; I've looked at the open issues and don't immediately see any that might be memory-leak related in a way that would affect Mailman 3.
One of the MM3 installations I support is running Python 3.8.10 and supports several things in addition to Mailman, and I'm not seeing this issue there.
--
Mark Sapiro <mark@msapiro.net>
San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan
Philip Colmer writes:
[Nothing about individual processes.] *Something* is using that memory. ps(1) should help identify it when your alarm is triggered, and top(1) allows monitoring in real time. For example, the information about the size of archives is interesting only if HyperKitty is using the memory. I guess it might be indicative of core usage, since there's an awful lot of traffic, but core won't be reading archives into memory or anything like that; it doesn't even know or care where the archives are. It only knows where on the Internet to send messages so the archiver that lives there can add them. And I find it hard to imagine why Postorius would be consuming memory.
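For example, something like this run when memory looks high will show where the resident memory actually lives (exact column names vary a little between ps implementations):

    $ ps axo pid,user,rss,pmem,comm --sort=-rss | head -n 15

If one of the Mailman runners, whatever serves the web UI (uwsgi, gunicorn, or similar), or a postgres backend dominates that list, you have your suspect.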
What is particularly curious for us is that we have four Mailman 3 servers.
Are these servers identically configured, or nearly so? If so, perhaps you can find "interesting" differences in memory usage across them.
Finally, and just FYI, we're using PostgreSQL for the database and Xapian for the indexing engine.
Either of those would be a weird cause. The point of an RDBMS is to store data in mass storage, although I suppose it's possible to write queries that would return huge quantities of data into memory. And the indexer is not a long-running process, as far as I know, so it should run, exit, and release memory. Again, if it's related to one of those it should turn up in ps.
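For example, something along these lines shows whether the PostgreSQL backends are holding a lot of resident memory (the process name is normally "postgres", but check your install):

    $ ps -C postgres -o pid,rss,pmem,cmd --sort=-rss

Bear in mind that backends share shared_buffers, so their RSS figures overlap and can overstate the total; a single backend that keeps growing is the interesting signal.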
participants (5):
- Mark Sapiro
- Philip Colmer
- Prasanth Nair
- Stephen Daniel
- Stephen J. Turnbull