[MM3-users] Re: Mailman backend maintenance task

Aug. 24, 2021


Dear Steve:
I really appreciate that you also took the time to provide more insight
into how Mailman works and how its different components interact with
each other.
There are two "minor" tasks and one major task that we need to
accomplish in PostgreSQL: rebuilding the index behind the
hyperkitty_email table's primary key, and rebuilding another index on
the same table that supports a uniqueness constraint (these are the
minor ones); and running VACUUM FULL on the whole database to
effectively reduce the size of the hyperkitty_email table, which is
the largest table in this database, and, if possible, to give back to
the SAN the extra space we had to add to cope with that table's
growth.
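A minimal sketch of the statement sequence for these tasks, assuming they are run by hand from psql (which autocommits by default). The index names in the usage block are hypothetical: `hyperkitty_email_pkey` follows PostgreSQL's default `<table>_pkey` naming, and the unique-constraint index name must be looked up in the real schema. Note that VACUUM FULL rewrites the table into a new file and rebuilds its indexes as part of that, so if it runs on hyperkitty_email the separate REINDEX steps may be redundant; it also needs free disk space for the new copy until it finishes.

```python
import re
from typing import List, Optional

# Plain lowercase identifiers only, so names can be pasted into SQL safely.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def _checked(name: str) -> str:
    """Reject anything that is not a simple, unquoted PostgreSQL identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def maintenance_statements(
    indexes: List[str],
    vacuum_table: Optional[str] = None,
    concurrently: bool = False,
) -> List[str]:
    """Return the REINDEX and VACUUM FULL statements, in execution order.

    VACUUM FULL, and REINDEX ... CONCURRENTLY, cannot run inside a
    transaction block: send each statement separately on an autocommit
    connection.
    """
    option = " CONCURRENTLY" if concurrently else ""  # needs PostgreSQL 12 or later
    statements = [f"REINDEX INDEX{option} {_checked(name)};" for name in indexes]
    if vacuum_table is None:
        statements.append("VACUUM FULL;")  # every table in the current database
    else:
        statements.append(f"VACUUM FULL {_checked(vacuum_table)};")  # one table only
    return statements


if __name__ == "__main__":
    # Hypothetical index names: check the real ones with \d hyperkitty_email in psql.
    for sql in maintenance_statements(
        ["hyperkitty_email_pkey", "hyperkitty_email_unique_idx"],
        vacuum_table="hyperkitty_email",
    ):
        print(sql)
```

Passing `vacuum_table` limits the exclusive lock to that one table instead of taking it on every table in turn.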
VACUUM FULL takes one of the strongest locks, if not the strongest,
among all the locking mechanisms PostgreSQL provides to guarantee
transactional integrity (as far as I know, tables are more or less
recreated when running VACUUM FULL).
The idea we were considering was, more or less, to stop all of the
components, then start up only PostgreSQL, and then perform each of
the tasks mentioned above. Our PostgreSQL release does allow "online"
index rebuilds (drop + rebuild concurrently), but we did not want to
take the risk that, for whatever reason, new records might be added to
the table breaking the uniqueness of either constraint (fixing such
cases turns out to be a real pain most of the time). Alternatively, we
could lock the table ourselves to get more or less the same effect,
true, but in the end the impact on any connections trying to run DML
would be the same. In other words, rather than having lots of
transactions from either core or web component processes fail, we
prefer to stop everything. But this of course has the worst downside:
we would receive
Let me now go back to your kind explanation to check whether I got
things right. You mention that, regardless of the connectivity that
Mailman's core processes might have with the database, emails will
be received and queued until the connection is back and they can be
delivered to the distribution lists' members. Could you please
confirm whether this task would actually be delegated to Postfix (or
whichever MTA is integrated with Mailman)? If so, this is exactly what
we need: we could afford to delay reception by a couple of days, as
long as we are sure that no mail will be lost.
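If Mailman core were stopped entirely, a minimal sketch of the relevant check, assuming Postfix treats the unreachable Mailman LMTP listener as a temporary failure: deferred mail stays in Postfix's queue and is retried until it is older than `maximal_queue_lifetime` (5d by default, confirm with `postconf maximal_queue_lifetime`), after which it is bounced. The 12-hour margin is a hypothetical choice, and the parser only handles a single number with an optional unit.

```python
import re
from datetime import timedelta

# Postfix time units; a bare number uses the parameter's default unit,
# which is days ("d") for maximal_queue_lifetime.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_postfix_time(value: str, default_unit: str = "d") -> timedelta:
    """Parse a simple Postfix time value such as '5d', '12h' or '3'."""
    match = re.fullmatch(r"\s*(\d+)\s*([smhdw]?)\s*", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    number, unit = match.groups()
    return timedelta(**{_UNITS[unit or default_unit]: int(number)})


def outage_fits(
    outage: timedelta,
    maximal_queue_lifetime: str = "5d",  # Postfix default; confirm with postconf
    margin: timedelta = timedelta(hours=12),  # hypothetical safety margin
) -> bool:
    """True if deferred mail should still be queued when the outage ends."""
    return outage + margin < parse_postfix_time(maximal_queue_lifetime)
```

With the default settings, a two-day outage fits comfortably, while a five-day one would risk bounces.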
Once more, thanks a lot for your kind help. Best regards.
-----"Stephen J. Turnbull" <[1]stephenjturnbull@gmail.com> wrote: -----
To: [2]eugenio.jordan@esa.int
From: "Stephen J. Turnbull" <[3]stephenjturnbull@gmail.com>
Date: 08/24/2021 05:33PM
Cc: [4]mailman-users@mailman3.org
Subject: [MM3-users] Mailman backend maintenance task
Hi, Eugenio,
I wrote this earlier, but I am in the middle of moving my office, so my
mail has been intermittent.  I have been following your discussion
with Abhilash, and he's definitely the expert, especially if you are
using the containers he creates and distributes.  But there's some
information here he hasn't mentioned yet.
Please consider the following discussion as background that will help
you understand some of the considerations involved.  I've only run the
Mailman 3 suite with all three components (core, Postorius, and
HyperKitty) running constantly on the same host, so I have no
experience with this kind of issue.
[5]eugenio.jordan@esa.int writes:
> Our customer is currently using PostgreSQL as the backend, and we
> would like to perform some maintenance tasks, namely running VACUUM
> FULL, or at least rebuilding the index behind the hyperkitty_email
> primary key. We have been asked about the real impact of such an
> initiative. Though the latter is related to archiving, I haven't
> found a way to stop just the HyperKitty or Django related processes
> without stopping Mailman's core, which would prevent mails addressed
> to distribution lists from being delivered. Could you please confirm
> whether I am correct?
Mailman core can certainly run without either Postorius or HyperKitty.
Controlling Mailman core (moderation, helping users) without Postorius
is annoying, but it can be done.  If you stop the HyperKitty process,
what should happen, I believe, is that posts for archiving will
accumulate in the 'archive' queue until HyperKitty is available again.
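To watch that backlog, a minimal sketch, assuming Mailman 3's on-disk layout where each queue is a subdirectory of the queue directory holding one `.pck` file per pending message. The path in the usage block is hypothetical; the real one depends on the installation's var_dir.

```python
from collections import Counter
from pathlib import Path, PurePath
from typing import Dict, Iterable


def count_pending(paths: Iterable[PurePath]) -> Dict[str, int]:
    """Count .pck files per queue, keyed by the queue directory's name."""
    return dict(Counter(p.parent.name for p in paths if p.suffix == ".pck"))


def queue_backlog(queue_root: Path) -> Dict[str, int]:
    """Scan <queue_root>/<queue name>/*.pck on disk."""
    return count_pending(queue_root.glob("*/*.pck"))


if __name__ == "__main__":
    # Hypothetical location; adjust to the installation's var_dir.
    print(queue_backlog(Path("/opt/mailman/var/queue")))
```

A growing `archive` count while HyperKitty is down, shrinking again once it is back, would match the behaviour described above.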
It's been a while since I studied this, so I could be completely wrong,
but as I understand it, HyperKitty and Postorius are not daemon
processes.  Rather, they are WSGI applications, which means they are
subprocesses spawned and controlled by your webserver.  In order to
keep them from running, you would reconfigure the webserver not to
call those WSGI applications.  How that is done is specific to the
webserver and the WSGI module.  If you are running from Docker
containers, most likely you can just stop their containers.
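One way to "not call those WSGI applications" without touching the rest of the server setup is to point the webserver's WSGI entry at a stand-in for the duration of the maintenance. A minimal sketch, assuming a hypothetical module that is not part of Mailman, Postorius, or HyperKitty: it answers every request with 503 so the real web UI, and therefore its Django tables, are never touched.

```python
from typing import Callable, Iterable, List, Tuple
from wsgiref.simple_server import make_server

BODY = b"This service is temporarily unavailable for database maintenance.\n"


def maintenance_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Answer every request with 503 so the real web UI is never invoked."""
    headers: List[Tuple[str, str]] = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(BODY))),
        ("Retry-After", "3600"),  # hint to clients: try again in an hour
    ]
    start_response("503 Service Unavailable", headers)
    return [BODY]


if __name__ == "__main__":
    # Local smoke test only; in production the webserver's WSGI config
    # would point at maintenance_app instead of Postorius/HyperKitty.
    make_server("127.0.0.1", 8000, maintenance_app).serve_forever()
```

How the switch is made (mod_wsgi, uWSGI, gunicorn, ...) is server-specific, as noted above.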
The larger problem is that Mailman core uses an RDBMS.  Normally both
Django and Mailman are referring to the same RDBMS, PostgreSQL in your
case.  I'm not familiar with the vacuum operation; if it requires
taking the whole RDBMS down, and Mailman is running on that RDBMS,
you're out of luck.  Mailman can accept posts and queue them, but it
can't deliver them to subscribers without access to the RDBMS tables.
If it's just a matter of locking some tables while others remain
available, then it should work, because the tables used by core and
HyperKitty are disjoint as far as I know.  (I think Postorius and
HyperKitty both use Django's user tables for authorization, so if
those tables are locked, both Postorius and HyperKitty will have to be
down, but core can continue running because its database is completely
separate.)
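To decide which components must be down for a given set of locked tables, a rough sketch, assuming Django's `<app_label>_<model>` table naming for the web side and that any unprefixed table belongs to Mailman core. The prefix map is an approximation, not an authoritative list of either schema.

```python
from typing import Iterable, List

SHARED = "Django (Postorius and HyperKitty)"

# Approximate, hypothetical prefix map based on Django's "<app_label>_<model>"
# table naming; the first matching prefix wins, so longer ones come first.
PREFIXES = [
    ("hyperkitty_", "HyperKitty"),
    ("postorius_", "Postorius"),
    ("django_mailman3_", SHARED),
    ("django_", SHARED),
    ("auth_", SHARED),
    ("account_", SHARED),
    ("socialaccount_", SHARED),
]


def owner(table: str) -> str:
    """Guess which component uses a table; unprefixed names are assumed core."""
    for prefix, component in PREFIXES:
        if table.startswith(prefix):
            return component
    return "Mailman core (assumed)"


def components_to_stop(locked_tables: Iterable[str]) -> List[str]:
    """Components that should be down while these tables are locked."""
    return sorted({owner(table) for table in locked_tables})
```

For the maintenance discussed here, `components_to_stop(["hyperkitty_email"])` yields only HyperKitty, which is the point about disjoint tables.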
> Regarding the former, as far as I have read, the "mappings" lists
> -> addresses are stored only in the database, so if we run some
> kind of procedure or task like vacuum which will exclusively lock
> tables, or want to have the database stopped anyway for a cold
> backup or whatever, Mailman will not work, that is, again, the
> mails addressed to the distribution lists will not be
> delivered. Will you please confirm this point, too?
That depends on how "vacuum" works.  If you can tell the RDBMS not to
lock Mailman core's tables, it can continue to run.  Definitely, if the
database is not running, Mailman will continue to receive and store
the posts, but it won't be able to distribute mail to subscribers
until its tables are available again.  At that point the queued posts
will be processed normally, except that if there's a large backlog,
they won't go out in a quick burst, it may take a while.
One possible worry, depending on how you are connected to the
Internet, is that your provider may interpret the sudden burst of
outgoing mail when Mailman comes back online as spam or some other
sort of mischief.  Mailman has no way to throttle this: it feeds
messages to the MTA until the MTA says "stop".  Then it waits until
the MTA is ready again.
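The feeding behaviour described above can be modelled in a few lines. This is a simplified sketch of that description, not Mailman's actual outgoing runner; `send` is a hypothetical callable standing in for the SMTP hand-off, returning False on a temporary refusal (a 4xx reply).

```python
import time
from typing import Callable, Iterable


def feed_mta(
    messages: Iterable[str],
    send: Callable[[str], bool],
    wait: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hand messages to the MTA in order, pausing whenever it pushes back.

    There is no rate limit of its own: the only brake is the MTA saying "stop".
    """
    delivered = 0
    for message in messages:
        while not send(message):  # MTA said "stop": wait, then retry this message
            sleep(wait)
        delivered += 1
    return delivered
```

Because nothing paces the loop except the MTA's refusals, a large backlog goes out as fast as the MTA will accept it, which is exactly what may look like a burst to an upstream provider.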
Regards,
Steve
This message is intended only for the recipient(s) named above. It may contain proprietary information and/or
protected content. Any unauthorised disclosure, use, retention or dissemination is prohibited. If you have received
this e-mail in error, please notify the sender immediately. ESA applies appropriate organisational measures to protect
personal data, in case of data privacy queries, please contact the ESA Data Protection Officer (dpo@esa.int).
References

[1] mailto:stephenjturnbull@gmail.com
[2] mailto:eugenio.jordan@esa.int
[3] mailto:stephenjturnbull@gmail.com
[4] mailto:mailman-users@mailman3.org
[5] mailto:eugenio.jordan@esa.int

Eugenio Jordan (external)