I'm not an expert, and you might be better served by posting this question in one of the communities that serve sysadmins; however, my reading says that the oom-killer is invoked when the system cannot allocate more virtual memory, not more physical memory. The graph in your screenshot shows physical memory usage.
The first thing I would try is increasing the size of your swap partition. That adds to the amount of available virtual memory without actually buying more RAM. If you really do need more RAM, the result will be that the system stays up and the oom-killer stays quiet, but performance suffers and the "si" and "so" columns in vmstat climb into the triple digits or higher.
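If you want to confirm whether the box is actually swapping heavily, a rough sketch like the one below can be left running alongside your monitoring. It is Linux-only and simply samples the pswpin/pswpout counters in /proc/vmstat (roughly what vmstat's si/so columns are derived from); the 5-second interval is just a placeholder, not anything your setup requires.

#!/usr/bin/env python3
# Sketch: approximate vmstat's si/so columns by sampling the kernel's
# pswpin/pswpout counters (pages swapped in/out since boot) twice and
# converting the delta to KiB per second. Linux-only.
import time
import resource

def read_swap_counters():
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            key, value = line.split()
            if key in ("pswpin", "pswpout"):
                counters[key] = int(value)
    return counters

INTERVAL = 5  # seconds between samples (placeholder value)
PAGE_KIB = resource.getpagesize() // 1024

before = read_swap_counters()
time.sleep(INTERVAL)
after = read_swap_counters()

si = (after["pswpin"] - before["pswpin"]) * PAGE_KIB / INTERVAL
so = (after["pswpout"] - before["pswpout"]) * PAGE_KIB / INTERVAL
print(f"swap in:  {si:.0f} KiB/s")
print(f"swap out: {so:.0f} KiB/s")

If both rates stay near zero even while the usage graph looks high, the "used" memory is most likely page cache rather than pressure the oom-killer would act on.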
On Wed, Feb 16, 2022 at 3:18 AM Philip Colmer <philip.colmer@linaro.org> wrote:
Thank you for the replies from Mark and Stephen.
One of the reasons why Prasanth & I are concerned with this specific system is because the first time memory usage crept up, it got to the point where oomkiller stepped in and, as is usual for that particular mechanism, one of the processes it killed was sshd, thus removing access to the system :(. I know that oomkiller can be tuned to stop it killing processes like sshd but it will, nevertheless, start killing *some* processes if it sees an out of memory situation.
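For reference, the sort of tuning I mean is giving sshd an oom_score_adj of -1000 so the kernel never picks it (the tidier way is OOMScoreAdjust=-1000 in the sshd systemd unit). A minimal sketch of the /proc approach, assuming the daemon's process name is literally "sshd" and that the script runs as root, looks like this; it does not persist across sshd restarts:

#!/usr/bin/env python3
# Sketch: exempt every running "sshd" process from the OOM killer by
# writing -1000 to /proc/<pid>/oom_score_adj. Requires root; the
# protection lasts only as long as those processes do.
import os

TARGET = "sshd"          # assumed process name of the SSH daemon
OOM_SCORE_ADJ = "-1000"  # -1000 means "never OOM-kill this process"

for pid in os.listdir("/proc"):
    if not pid.isdigit():
        continue
    try:
        with open(f"/proc/{pid}/comm") as f:
            name = f.read().strip()
        if name == TARGET:
            with open(f"/proc/{pid}/oom_score_adj", "w") as f:
                f.write(OOM_SCORE_ADJ)
            print(f"protected pid {pid}")
    except OSError:
        # process went away between listing and writing, or insufficient rights
        continue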
So, as Prasanth said, we doubled the memory available and stuck an alarm on it to trigger when usage exceeded 85% for 5 minutes. I've saved a screenshot of the monitored memory usage over the last couple of weeks:
https://1drv.ms/u/s!Aq7dn8GDGLdCodsU-pybJBdckxWU1A?e=BJY1UA
(It is stored on OneDrive in case you don't recognise the URL format)
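For anyone who wants to reproduce the check locally rather than through the monitoring service, a rough equivalent of that 85% alarm (assuming Linux and the MemAvailable field in /proc/meminfo) is below. It prints two figures, because "used" memory that is really reclaimable page cache inflates the naive number:

#!/usr/bin/env python3
# Sketch: compute memory usage two ways from /proc/meminfo and compare
# against an 85% threshold. Values in /proc/meminfo are reported in kB.

def meminfo():
    fields = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":")
            fields[key] = int(rest.split()[0])
    return fields

m = meminfo()
used_naive = 100 * (m["MemTotal"] - m["MemFree"]) / m["MemTotal"]
used_avail = 100 * (m["MemTotal"] - m["MemAvailable"]) / m["MemTotal"]

print(f"used, ignoring reclaimable cache: {used_naive:.1f}%")
print(f"used, based on MemAvailable:      {used_avail:.1f}%")
if used_avail > 85:
    print("over the 85% threshold")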
We changed the server to the larger memory size on 4th February. On 11th Feb, it triggered the alarm and we rebooted the server. Memory usage is starting to climb again, albeit not at the pace it was previously.
What is particularly curious for us is that we have four Mailman 3 servers. Two of them are single domain and two are hosting two domains. This server is one of the servers hosting two domains, but it is also the ONLY one of the four servers exhibiting this memory pattern. By comparison, the memory usage for the other server hosting two domains looks like this:
https://1drv.ms/u/s!Aq7dn8GDGLdCodsVyEufkeQqit_cGA?e=ifrJHs
The biggest difference we are aware of with the "problematic" server is that some of the list archives are quite large by comparison with our other servers. The largest is 15K messages, then 2K then 1K. The other three servers have lists with relatively small archives by comparison (double or triple digits at worst).
Finally, and just FYI, we're using PostgreSQL for the database and Xapian for the indexing engine.
Regards
Philip
On Tue, 15 Feb 2022 at 16:36, Mark Sapiro <mark@msapiro.net> wrote:
On 2/15/22 08:11, Stephen Daniel wrote:
TL;DR: quite likely your system has plenty of memory and is operating normally. ...
On Tue, Feb 15, 2022 at 5:28 AM Prasanth Nair <prasanth.nair@linaro.org> wrote:
Hi,
We are having an issue with high memory usage on our Mailman3 server. Initially, our server had 16GB RAM and the memory usage went up from 5% to 92% within a week. So, we increased the memory capacity to 32GB RAM and we can see that the memory usage is going up in the same pattern.
Any idea why this is happening, or any suggestions?
Just for reference, mail.python.org supports 231 MM 2 lists and 182 MM 3 lists with 16 GB of ram and no swap.
The server that supports this list and also the www.list.org web site has 4 GB of ram and no swap.
--
Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
The highway is for gamblers, better use your sense - B. Dylan