Healthchecks and Django

Philip Colmer

Jan. 4, 2022

8:12 a.m.

Hello

We have a number of Mailman 3 servers running on AWS infrastructure. To help us maintain that, we're using Route 53 Health Checks to alert us when something stops responding.

For the MM3 servers, we're probing "/mailman3/lists/".

We've observed that when a server gets patched during a maintenance window, it will get rebooted and there is a phase where Django responds to web queries but the underlying MM3 services aren't quite in place yet. As a result, we then get emails from Django telling us about the service not being available.

Given the number of AWS health checkers, that can result in quite a few emails ...

Is there another URL we could use to check the health of a MM3 server without triggering the emails during service startup? Or, if not, is there a way of stopping Django from sending out the emails? We'd get alerted of a problem via AWS anyway, so I'm not too worried about not receiving the Django emails.

Thank you.

Philip

Stephen J. Turnbull

January 2022

9:36 a.m.

Philip Colmer writes:

...

> For the MM3 servers, we're probing "/mailman3/lists/".

You could create a path in urls.py called "250" that does nothing but respond with a 250 status (and maybe a hello-world page). The issue with /mailman3/lists/ is that it queries core for the lists (which, if successful, proves that core is up as a side effect).
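A minimal sketch of such a do-nothing probe target follows. It swaps in a different technique from the one suggested above: rather than a urls.py entry, it wraps the WSGI application so the probe is answered before Django (and therefore core) is consulted. The function name `with_health_probe` and the probe path are hypothetical, and the sketch returns 200, the conventional HTTP success code, rather than 250.

```python
# Hypothetical WSGI wrapper: answers the probe path itself and passes
# every other request through to the wrapped application unchanged.
#
# The urls.py variant suggested in the thread would look roughly like:
#     path("250", lambda request: HttpResponse("OK"))
# using django.urls.path and django.http.HttpResponse.


def with_health_probe(app, probe_path="/250"):
    """Return a WSGI app that short-circuits requests for probe_path."""

    def wrapper(environ, start_response):
        if environ.get("PATH_INFO") == probe_path:
            body = b"OK\n"
            start_response(
                "200 OK",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Anything else is handled by the real application.
        return app(environ, start_response)

    return wrapper
```

In a deployment this would be applied once in the site's WSGI entry point, for example `application = with_health_probe(application)`. A probe answered this way only proves that the web server and WSGI process are up; it says nothing about core.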

...

> Is there another URL we could use to check the health of a MM3 server

What does this mean to you? Our Mailman 3 suite consists of three applications, core, Postorius, and HyperKitty, which might or might not be installed, and if installed might or might not be running on the same host. Core in turn depends on the MTA and starts a bunch of qrunners. Any of the above might fall over on its own. (I don't think there's any way to check on the health of a particular qrunner from the net, though.)

...

> Or, if not, is there a way of stopping Django from sending out the emails?

There probably is, but I don't know it. Mark might, but if not, asking on a Django list would be a better place.
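One possible approach, sketched here under assumptions: Django sends its error emails through the `django.utils.log.AdminEmailHandler` logging handler, and handlers accept standard `logging` filters. A filter can drop records whose request came from the health checker. The class name `SkipHealthCheckErrors` and the default User-Agent marker are assumptions; check the access logs for the string the checkers actually send.

```python
import logging


class SkipHealthCheckErrors(logging.Filter):
    """Drop log records whose request carries a health-checker User-Agent."""

    def __init__(self, marker="Route53-Health-Check"):
        # `marker` is an assumed substring of the checker's User-Agent.
        super().__init__()
        self.marker = marker.lower()

    def filter(self, record):
        # Django attaches the request to records on the django.request logger.
        request = getattr(record, "request", None)
        meta = getattr(request, "META", None) or {}
        agent = str(meta.get("HTTP_USER_AGENT", "")).lower()
        # Returning False suppresses the record for this handler.
        return self.marker not in agent


# Fragment to merge into the site's existing LOGGING setting (assumed to
# already define a "mail_admins" handler, as Django's defaults do).
LOGGING_FRAGMENT = {
    "filters": {
        "skip_health_checks": {"()": SkipHealthCheckErrors},
    },
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
            "filters": ["skip_health_checks"],
        },
    },
}
```

Errors triggered by real users still produce email; only records matching the marker are dropped, so the AWS alert remains the sole signal for failed probes.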

Philip Colmer

2:15 p.m.

On Tue, 4 Jan 2022 at 09:36, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:

> Philip Colmer writes:
>
> > For the MM3 servers, we're probing "/mailman3/lists/".
>
> You could create a path in urls.py called "250" that does nothing but respond with a 250 status (and maybe a hello-world page). The issue with /mailman3/lists/ is that it queries core for the lists (which, if successful, proves that core is up as a side effect).
>
> > Is there another URL we could use to check the health of a MM3 server
>
> What does this mean to you? Our Mailman 3 suite consists of three applications, core, Postorius, and HyperKitty, which might or might not be installed, and if installed might or might not be running on the same host. Core in turn depends on the MTA and starts a bunch of qrunners. Any of the above might fall over on its own. (I don't think there's any way to check on the health of a particular qrunner from the net, though.)

That's a very good point. It would be nice if, at some point, an actual healthcheck endpoint could be included that returned some information about the state of the different applications as a JSON blob. It is increasingly useful to try and spot application problems before users do :)

> > Or, if not, is there a way of stopping Django from sending out the emails?
>
> There probably is, but I don't know it. Mark might, but if not, asking on a Django list would be a better place.

Thank you.

Regards

Philip

Stephen J. Turnbull

6:39 a.m.

Philip Colmer writes:

...

> That's a very good point. It would be nice if, at some point, an actual healthcheck endpoint could be included that returned some information about the state of the different applications as a JSON blob.

This would have to be a separate application, since Mailman core is not designed to be exposed to the Internet (it has no builtin authn/authz, and so needs to be fronted by ssh, the MTA, and the webserver), and the other two are optional. There's a coverage hole in that it's possible that core and the MTA are up (and therefore posting works), but the webserver is down and so an HTTP probe won't work (you really don't want a separate webserver written by say me running on a production host <shiver/>).

Can you point me to docs (preferably an IETF RFC, and don't forget my unicorn pony if the RFC doesn't exist which it probably doesn't :-) on these healthcheck widgets?

Steve

Philip Colmer

8:30 a.m.

On Wed, 5 Jan 2022 at 06:39, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:

> Philip Colmer writes:
>
> > That's a very good point. It would be nice if, at some point, an actual healthcheck endpoint could be included that returned some information about the state of the different applications as a JSON blob.
>
> This would have to be a separate application, since Mailman core is not designed to be exposed to the Internet (it has no builtin authn/authz, and so needs to be fronted by ssh, the MTA, and the webserver), and the other two are optional. There's a coverage hole in that it's possible that core and the MTA are up (and therefore posting works), but the webserver is down and so an HTTP probe won't work (you really don't want a separate webserver written by say me running on a production host <shiver/>).
>
> Can you point me to docs (preferably an IETF RFC, and don't forget my unicorn pony if the RFC doesn't exist which it probably doesn't :-) on these healthcheck widgets?

I've found https://datatracker.ietf.org/doc/html/draft-inadarei-api-health-check which is currently a draft.
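As a rough illustration of the kind of JSON blob discussed in this thread, the sketch below aggregates per-component probes into one document. It is loosely modelled on the `status` and `checks` members described in that draft and is not a complete or validated implementation of it. The function name `health_document` and the component names in the usage note are hypothetical, and how each component is actually probed is left to the caller.

```python
import json


def health_document(checks):
    """Build (http_status, json_body) from a mapping of component probes.

    `checks` maps a component name to a zero-argument callable that
    returns a truthy value when the component is healthy.
    """
    results = {}
    for name, probe in checks.items():
        try:
            healthy = bool(probe())
            output = ""
        except Exception as exc:  # a probe that raises counts as a failure
            healthy, output = False, str(exc)
        entry = {"status": "pass" if healthy else "fail"}
        if output:
            entry["output"] = output
        # Each component maps to a list of check results.
        results[name] = [entry]

    all_pass = all(r[0]["status"] == "pass" for r in results.values())
    overall = "pass" if all_pass else "fail"
    # A failing overall status is reported with an error-class HTTP code
    # so that simple probes can rely on the status line alone.
    http_status = 200 if all_pass else 503
    return http_status, json.dumps({"status": overall, "checks": results})
```

A caller might pass something like `{"core": check_core, "postorius": check_postorius}`, where each hypothetical function performs its own local probe. Because core is not meant to be reachable from the Internet, such probes would need to run on the host or network where core listens.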

Regards

Philip

Stephen J. Turnbull

12:47 p.m.

Philip Colmer writes:

...

> I've found https://datatracker.ietf.org/doc/html/draft-inadarei-api-health-check which is currently a draft.

Christmas all over again! Thanks!

Steve
