Which folders should I share between pods with a Kubernetes persistentVolumeClaim?

I use Mailman with Kubernetes, and I have four pods.
I currently use a persistentVolumeClaim for two folders in the Mailman configuration, "cache" and "messages". I share these folders because Mailman needs them to be consistent, but I'm wondering whether I should share other folders as well, such as locks.
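To make it concrete, here is a minimal sketch of the kind of shared-folder claim and mount I mean (the names, storage size, and mount path are illustrative, and ReadWriteMany assumes a storage class that supports it):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: mailman-shared              # illustrative name
    spec:
      accessModes:
        - ReadWriteMany                 # so several pods can mount the same folder
      resources:
        requests:
          storage: 1Gi

and in the pod template of the deployment:

    spec:
      volumes:
        - name: mailman-shared
          persistentVolumeClaim:
            claimName: mailman-shared
      containers:
        - name: mailman-core
          volumeMounts:
            - name: mailman-shared
              mountPath: /opt/mailman/cache   # illustrative path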
Here are all the folders in Mailman: archives, cache, data, etc, lists, locks, logs, master.pid, messages, queue, templates.
I ask this question because I often have a problem when creating a mailing list. I think the problem is with the locks, but I'm not sure whether it is caused by the locks folder not being shared.
If you want more details, here's the trace I sometimes get when someone tries to create a mailing list:
File "falcon/app.py", line 365, in falcon.app.App.__call__ File "/usr/lib/python3.12/site-packages/mailman/rest/lists.py", line 322, in on_post mlist = create_list(**validator(request)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/mailman/app/lifecycle.py", line 102, in create_list call_name(config.mta.incoming).create(mlist) File "/usr/lib/python3.12/site-packages/mailman/mta/postfix.py", line 96, in create self.regenerate() File "/usr/lib/python3.12/site-packages/mailman/mta/postfix.py", line 106, in regenerate with Lock(lock_file): ^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/flufl/lock/_lockfile.py", line 467, in __exit__ self.unlock() File "/usr/lib/python3.12/site-packages/flufl/lock/_lockfile.py", line 417, in unlock raise NotLockedError('Already unlocked') flufl.lock._lockfile.NotLockedError: Already unlocked
When I check the locks folder, I have these two files, and I'm wondering if I should delete them:

-rw-rw---- 2 mailman nogroup 97 Feb 6 2025 master.lck
-rw-rw---- 2 mailman nogroup 97 Feb 6 2025 master.lck|mailman-core-deployment-5f59557b66-bkkf4|16|4268829230172994873
Do you have any answer for me?

antoine.depoisier--- via Mailman-users writes:
I use Mailman with Kubernetes, and I have four pods.
Mailman is known to work in Docker, but the frequently used configuration is basically divided by application: all of the core mail- and list-handling processes in one container, the web administration processes in another, the archiver in another, and perhaps the RDBMS and MTA each have their own. So I don't understand what you mean by "with Kubernetes", and definitely not the node topology. Are you running any of the 5 applications just mentioned across multiple nodes?
I currently use a persistentVolumeClaim for two folders in the Mailman configuration, "cache" and "messages". I share these folders because Mailman needs them to be consistent, but I'm wondering whether I should share other folders as well, such as locks.
All of the processes need access to locks and to their relevant queue directories. I can't tell you offhand which processes need access to which directories.
I ask this question because I often have a problem when creating a mailing list. I think the problem is with the locks, but I'm not sure whether it is caused by the locks folder not being shared.
I would guess it's related. It's certainly very bad that a file that is supposed to be locked appears to be unlocked when the time comes for the lock owner to unlock it:
flufl.lock._lockfile.NotLockedError: Already unlocked
[...]
When I check the locks folder, I have these two files, and I'm wondering if I should delete them:

-rw-rw---- 2 mailman nogroup 97 Feb 6 2025 master.lck
-rw-rw---- 2 mailman nogroup 97 Feb 6 2025 master.lck|mailman-core-deployment-5f59557b66-bkkf4|16|4268829230172994873
Don't do that. If you want those to go away, stop Mailman. As Obi-Wan Kenobi said, "These are not the locks you are looking for." The lock in your trace is specific to the Postfix routing files. The master.lck is used by the master process that controls all the others.
Do you have any answer for me?
Need to know the topology of the network of nodes hosting Mailman processes.
Steve

Hi,
Thank you for your answer. I also think the problem comes from the locks, but I'll give you more details about my infrastructure. First, my nodes each run only an instance of mailman-core.
I have an existing MTA that handles all email and sends it to a proxy, which dispatches the email between the existing pods. Then the email is transferred back to the MTA, and the MTA sends it.
I also have a custom web app and API that send all mailing-list management requests to the service that dispatches them between the pods.
I don't use the archiver.
I only have one RDBMS that all pods use.
For the queue folder, I'm not sure whether I should share it between all pods, because one instance of mailman-core means one worker, if I'm right, and in that case I want many workers because I will have high traffic. I don't know exactly how mailman-core manages its queues, but if you think I can add this folder to a persistentVolumeClaim, I'll try it :D

antoine.depoisier--- via Mailman-users writes:
First, my nodes each run only an instance of mailman-core. I have an existing MTA that handles all email and sends it to a proxy, which dispatches the email between the existing pods. Then the email is transferred back to the MTA, and the MTA sends it.
So you are distributing mailman core across multiple nodes? And "proxy" = "load balancer" (such as HAProxy)?
That is probably subject to a number of problems, unless you do things that I'm not sure how to do. For example, when a user account is disabled by bounces, every so often Mailman will send a "hey, are you alive again?" message to the user. If enough of those bounce, the user gets unsubscribed. The problem is that that status is recorded in the PostgreSQL database that all Mailman instances have access to, and I think they'll probably all send those messages. At best there's a race condition where more than one might send the message.
Under some conditions moderators may get daily messages about bounces. I suspect they would get one from each Mailman instance.
I think digests will be broken unless you share the lists subfolder, because each Mailman instance will accumulate only the messages it processes, so chances are good on an active list that digest subscribers will get multiple partial digests when they should get only one.
As I'll describe below, Mailman tends to spawn a lot of processes, many of which don't have much work to do. Now you're dividing very little work across multiple nodes, which seems very wasteful.
You haven't said where your MTA lives. If it's Postfix, it needs to share the $data_dir folder with a Mailman instance that is responsible for list creation. Every time you restart Mailman it recreates the files in that folder, so if it's shared among Mailman instances there will be delays due to file locking.
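To make that concrete: with the usual Postfix integration, main.cf points at the maps Mailman regenerates in $data_dir, roughly like the following (the paths are illustrative; adjust them to your Mailman var directory):

    # Postfix main.cf -- illustrative paths
    transport_maps = hash:/opt/mailman/var/data/postfix_lmtp
    local_recipient_maps = hash:/opt/mailman/var/data/postfix_lmtp
    relay_domains = hash:/opt/mailman/var/data/postfix_domains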
So unless you have an insane amount of list traffic to process (say, a G7 national government ;-), I wonder if the multi-instance approach is necessary. Mailman is not implemented to be operated that way -- you're asking for trouble. The design is such that I imagine it can be done with careful tuning, but the current default configuration didn't consider such a use case. You don't need to answer, you know your use case and I don't, but you may save yourself a lot of trouble by just running with a little more resources on a single node.
Mailman uses a lot of memory just to get started (about 2GB unless you're really aggressive and do unsupported tuning of which runners are started and what modules are loaded), but then it easily scales without increasing resources except CPU to some extent. For example I've worked on a system that processes hundreds of thousands of incoming posts a day on a single Linode (2vCPU, 16GB) running core, Postorius, HyperKitty, Xapian, and nginx (PostgreSQL got its own VM for some reason). CPU usage on that system never gets above 25%, active memory usage generally 20-25% and there's usually a substantial amount free despite Linux's strategy of aggressively caching files, load average normally around 2.5 and never more than 5. The only tuning we did there was to bump the out queue's slices to 8 and in queue to 2, but all running on that same Linode (which pushes Mailman's memory usage to over 2GB). Running "ls -R queue" gives all empty subfolders about 2/3 of the time.
For the queue folder, I'm not sure whether I should share it between all pods, because one instance of mailman-core means one worker, if I'm right,
No, one instance of Mailman core means a minimum of 15 processes (one master and 14 runners) in the default configuration. About half of those have nothing to do most of the time. You can probably whittle that 15 down to 11 with some care at a cost of a certain amount of functionality.
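If you do go down that road, the per-runner knob lives in mailman.cfg. A sketch only -- which runners are actually safe to stop depends on the features you use, and the two section names below are just examples:

    [runner.nntp]
    start: no

    [runner.digest]
    start: no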
Most runners have their own queues as subfolders of 'queue'. Each queue consists of message files whose names are derived from the timestamp of creation and a hash. Each runner processes its queue in order of the timestamps. When its task is complete, it passes the message to the next runner in sequence by moving the file into the appropriate subfolder (with the same filename).
By the nature of email, each message is independent of all the others. So we can process them in parallel, as long as there is a way to assign one and only one runner to each message in a queue. The way we do that is to take the top N bits of the hash component of the filename, which we call a slice. Thus each queue has 1, 2, 4, 8, etc. slices. To configure multiple slices for the out runner (usually the bottleneck because it's the one that talks almost directly to the network[1]), add
[runner.out]
instances: 4
to your mailman.cfg and restart.
That's what I recommend you do.
Footnotes:
[1] At least Postfix optimizes relaying by opening a connection to the remote smtpd while still talking to Mailman, and only queues the file on disk if the remote connection fails.

On 2/6/25 02:37, antoine.depoisier--- via Mailman-users wrote:
Here are all the folders in Mailman: archives, cache, data, etc, lists, locks, logs, master.pid, messages, queue, templates.
Here's my opinion.
archives - If the prototype archiver is enabled for a list, its posts are archived here in maildir format. This should be shared for consistency, but if you don't care about this, you can put enable: no in the [archiver.prototype] section of mailman.cfg to disable it (see the snippet after this list).
cache - you're already sharing
data - doesn't need to be shared unless it contains a sqlite mailman.db that's actually used.
etc - doesn't need to be shared.
lists - should be shared as it contains the mailboxes (digest.mmdf) for accumulating messages for a list's digest
locks - should be shared
master.pid - must not be shared as it contains the pid of the master process running on that node.
messages - you're already sharing this, which you should.
queue - this is tricky. If it isn't shared, once a message gets queued in virgin/ or in/ on a node, only that node's runners will process it through its queues. This may have an impact on load balancing.
templates - doesn't need to be shared, but should be synced between nodes.
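For the archives item above, the mailman.cfg snippet I mean is:

    [archiver.prototype]
    enable: no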
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan