-----Original Message-----
From: Stephen J. Turnbull <steve@turnbull.jp>
Sent: Thursday, November 13, 2025 7:35 AM

Hirayama, Pat writes:
I finally looked at the host (rather than the logs of each container) and realized that the oom-killer was killing django-admin.
I don't have a django-admin process in my installation as far as I know. (I didn't check to see if some process renames itself 'django-admin' though.) Are you sure that's what got killed?
Pretty sure:

Nov 12 00:08:00 lists kernel: [60897.148840] systemd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Nov 12 00:08:00 lists kernel: [60897.149647] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/docker/bae673e621f3e3c380c360ca891c5737803384f8b87006af73bb9a679909cea6,task=django-admin,pid=552788,uid=8
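For anyone else digging through `dmesg` or the journal for this: the second kernel line is the useful one, since `task=` names the process that was killed and `task_memcg=` names the memory cgroup it lived in, which in this Docker setup is `/docker/<container ID>`. Here is a minimal Python sketch that pulls those fields out. It assumes the comma-separated `key=value` layout shown above; other kernels or cgroup drivers (e.g. systemd's `docker-<id>.scope` naming) may format the line differently, and `parse_oom_kill` is just an illustrative name:

```python
import re

# Matches key=value pairs in the kernel's "oom-kill:" summary line,
# e.g. "task=django-admin" or "pid=552788". Bare flags such as
# "global_oom" have no "=" and are skipped.
_FIELD = re.compile(r"(\w+)=([^,\s]+)")


def parse_oom_kill(line: str) -> dict:
    """Return the key=value fields from an 'oom-kill:' kernel log line."""
    marker = "oom-kill:"
    start = line.find(marker)
    if start == -1:
        raise ValueError("not an oom-kill line")
    fields = dict(_FIELD.findall(line[start + len(marker):]))
    # With a cgroup path of the form /docker/<id>, the tail is the container ID.
    memcg = fields.get("task_memcg", "")
    if memcg.startswith("/docker/"):
        fields["container_id"] = memcg[len("/docker/"):]
    return fields
```

The resulting `container_id` can be matched against the output of `docker ps --no-trunc` to see which container the killed process belonged to.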
Sure enough to focus my attention on my django container.
As a temporary workaround, I've limited memory on the mailman-django-uwsgi container to 4 GiB and set its RestartPolicy to always. Since I only did that this morning, I won't know until after midnight (presumably) whether it worked.
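The same change can be scripted with the Docker SDK for Python (the third-party `docker` package); from the command line it corresponds roughly to `docker update --memory 4g --restart always mailman-django-uwsgi`. A sketch, assuming the SDK is installed and can reach the Docker daemon; `update_kwargs` and `apply_limits` are hypothetical helper names:

```python
GIB = 1024 ** 3  # bytes per gibibyte


def update_kwargs(limit_gib: int, restart: str = "always") -> dict:
    """Keyword arguments for docker-py's Container.update()."""
    if limit_gib <= 0:
        raise ValueError("memory limit must be positive")
    return {
        # Hard memory cap in bytes (docker-py also accepts strings like "4g").
        "mem_limit": limit_gib * GIB,
        # Docker restart policy; "always" restarts the container whenever it stops.
        "restart_policy": {"Name": restart},
    }


def apply_limits(container_name: str, limit_gib: int) -> None:
    """Apply the memory cap and restart policy to an existing container."""
    import docker  # third-party SDK; imported here so update_kwargs works without it

    client = docker.from_env()
    container = client.containers.get(container_name)
    container.update(**update_kwargs(limit_gib))
```

Usage would be something like `apply_limits("mailman-django-uwsgi", 4)`. A limit applied this way belongs to the running container only: if the stack is recreated from a Compose file, the memory limit and restart policy need to be set there too, or they will be lost.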
What do you mean by "work"? If you've got a process blowing past 4 GB, it's probably going to die of ENOMEM. I hope it doesn't manage to try to allocate 10 GB.
It seems to take time to have an impact. The instance remains available for several hours -- usually becoming unavailable while I'm sleeping.
I'm thinking that this points to a memory leak of some kind?
I would think not. Something's allocating gobs of memory and it's not getting collected, but I doubt it's a process forgetting to delete garbage. I think it's just a runaway.
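If it's a runaway rather than a leak, one way to catch it in the act is to check which processes hold the most resident memory before the OOM killer fires. A minimal, Linux-only sketch that reads `VmRSS` from `/proc/<pid>/status`; run on the host, it also sees processes inside containers. Scheduling it (cron, a systemd timer) is left out, and the function names are illustrative:

```python
import os
from typing import List, Optional, Tuple


def rss_kib(pid: int) -> Optional[int]:
    """Resident set size of a process in KiB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            for line in fh:
                if line.startswith("VmRSS:"):
                    # Line looks like "VmRSS:   123456 kB".
                    return int(line.split()[1])
    except OSError:
        # Process exited while we were scanning, or we lack permission.
        return None
    return None  # kernel threads have no VmRSS line


def top_rss(n: int = 5) -> List[Tuple[int, str, int]]:
    """Return up to n (rss_kib, command_name, pid) tuples, largest first."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        kib = rss_kib(pid)
        if kib is None:
            continue
        try:
            with open(f"/proc/{pid}/comm") as fh:
                # comm holds the command name, truncated to 15 characters.
                name = fh.read().strip()
        except OSError:
            continue
        procs.append((kib, name, pid))
    procs.sort(reverse=True)
    return procs[:n]


if __name__ == "__main__":
    for kib, name, pid in top_rss():
        print(f"{kib / 1024:10.1f} MiB  {pid:>8}  {name}")
```

Logging this every few minutes overnight would show whether one process (for example a `django-admin` job) grows suddenly or creeps up over hours.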
That's a good suggestion.
FWIW, the instance has remained available all night, so limiting memory on the container seems to be working for now. <snip>
I have seen reports that uWSGI systems use a lot more memory than Gunicorn systems, but I don't have hands-on experience to confirm that or analyze why. I'm also not sure using Whoosh (instead of Xapian, Elasticsearch, or Solr) is a good idea: I found it to be *extremely* slow on initial indexing of a system with lots of archives migrated from Mailman 2, and I wouldn't be surprised if it uses a lot of memory too (I have stuck to Xapian since then, so I can't confirm or analyze that).
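For reference, the full-text backend is selected through django-haystack's `HAYSTACK_CONNECTIONS` setting in the Django settings file. A sketch of switching from Whoosh to Xapian, assuming the `xapian-haystack` package and the Xapian Python bindings are installed; the `BASE_DIR` value and the index directory name are hypothetical placeholders:

```python
import os

# Hypothetical placeholder; a real settings file normally defines BASE_DIR already.
BASE_DIR = "/opt/mailman-web-data"

HAYSTACK_CONNECTIONS = {
    "default": {
        # Whoosh would be "haystack.backends.whoosh_backend.WhooshEngine".
        # This engine class comes from the xapian-haystack package.
        "ENGINE": "xapian_backend.XapianEngine",
        # Directory where the Xapian index files are written.
        "PATH": os.path.join(BASE_DIR, "fulltext_index"),
    },
}
```

After switching backends the index has to be built from scratch (haystack's `rebuild_index` management command), and on a large migrated archive that initial indexing is the expensive step.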
I'll take a look at that. Thanks, Stephen!
-p
Pat Hirayama
Pronouns: he/him/his
Systems Engineer
IT | Systems Engineering - Infrastructure
Fred Hutch Cancer Center
O 206.667.4856
phirayam@fredhutch.org