Mailman3-Web lots of database queries
Hey!
I recently set up a Mailman 3 instance and it works quite nicely. I use the Debian packages from buster. After running it for a few days, I noticed unexpected network load (my Postgres database is on a separate machine). Turning on the Postgres log revealed that Mailman 3 constantly queries the database:
[...]
2018-11-17 16:33:15.417 UTC [8304] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:15.385901+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:15.663 UTC [8305] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:15.621505+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:15.664 UTC [8306] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:15.624011+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:15.908 UTC [8308] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:15.870547+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:15.910 UTC [8307] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:15.871611+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.152 UTC [8310] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.116762+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.154 UTC [8309] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.114878+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.390 UTC [8311] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.358211+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.393 UTC [8312] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.359704+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.634 UTC [8313] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.596602+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.636 UTC [8314] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.598296+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.882 UTC [8315] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.842795+00:00'::timestamptz) LIMIT 1
2018-11-17 16:33:16.884 UTC [8316] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:16.841916+00:00'::timestamptz) LIMIT 1
[...]
Is this behaviour intended?
Stefan
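[Editor's note: the poll rate can be read straight off the log excerpt above, which shows roughly two queries every 250 ms. A back-of-envelope sketch of the daily query volume this implies; the interval and worker count are assumptions inferred from the timestamps, not taken from any configuration file:]

```python
# Rough estimate of the daily query volume implied by the log excerpt.
# Both constants are assumptions read off the log timestamps (two nearly
# simultaneous queries arriving roughly every quarter second).
POLL_INTERVAL_S = 0.25   # assumed: each worker polls about every 250 ms
WORKERS = 2              # assumed: two processes polling concurrently

queries_per_second = WORKERS / POLL_INTERVAL_S
queries_per_day = queries_per_second * 86_400

print(f"{queries_per_second:.0f} queries/s, {queries_per_day:,.0f} queries/day")
```

At that rate the frontend issues several hundred thousand queries a day even when the lists are completely idle, which explains the unexpected traffic on a remote database link.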
On 11/17/18 8:36 AM, Stefan Tatschner wrote:
Hey!
I recently set up a Mailman 3 instance and it works quite nicely. I use the Debian packages from buster. After running it for a few days, I noticed unexpected network load (my Postgres database is on a separate machine). Turning on the Postgres log revealed that Mailman 3 constantly queries the database:
[...] 2018-11-17 16:33:15.417 UTC [8304] mailman3web@mailman3web LOG: statement: SELECT "django_q_ormq"."id", "django_q_ormq"."key", "django_q_ormq"."payload", "django_q_ormq"."lock" FROM "django_q_ormq" WHERE ("django_q_ormq"."key" = 'default' AND "django_q_ormq"."lock" < '2018-11-17T16:32:15.385901+00:00'::timestamptz) LIMIT 1 ...
Is this behaviour intended?
No, this is not normal. It apparently comes from Django qcluster; at least, qcluster is what uses the django_q_ormq table. These repeated queries are not normal, though, and I don't see them on the sites I administer.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan
On Sun, 2018-11-18 at 14:38 -0800, Mark Sapiro wrote:
No, this is not normal. It apparently comes from Django qcluster; at least, qcluster is what uses the django_q_ormq table. These repeated queries are not normal, though, and I don't see them on the sites I administer.
Thank you for the pointer to qcluster. I don't know much about these web components and their relationships. According to the documentation [1] it seems that qcluster is not needed as long as cron jobs are configured. Since Debian ships the entries in /etc/cron.d [2, 3], I disabled the qcluster components.
Also, there is a Debian packaging issue: qcluster is started twice when using the provided service files:
- the uwsgi ini script starts qcluster
- the mailman3-qcluster service starts it as well
I know the Debian part is out of scope here; I just want it documented somewhere.
Stefan
On Mon, 2018-11-19 at 07:31 +0100, Stefan Tatschner wrote:
Since Debian ships the entries in /etc/cron.d [2, 3], I disabled the qcluster components.
I think I have misread the documentation, and I assume this component is needed. :)
I will keep digging to find the root cause of this database load.
Stefan
On Sun, 2018-11-18 at 14:38 -0800, Mark Sapiro wrote:
No. This is not normal. This is apparently coming from Django qcluster. At least it is qcluster that uses the django_q_ormq table, but these repeated queries are not normal, at least I don't see them in the sites I administer.
These calls are normal: qcluster polls the database at a high rate:
https://django-q.readthedocs.io/en/latest/configure.html#poll
Fortunately the poll interval is configurable and can be reduced, but polling still happens.
Stefan
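[Editor's note: the poll interval referenced above is part of django-q's Q_CLUSTER dict in Django's settings.py. A minimal sketch follows; the cluster name and all numeric values are illustrative, not the Debian package's defaults:]

```python
# settings.py (fragment) - illustrative values, not Debian's defaults.
# With the ORM broker, "poll" is the number of seconds each worker waits
# between checks of the django_q_ormq table, so raising it directly
# reduces the query rate seen in the Postgres log.
Q_CLUSTER = {
    "name": "mailman3web",  # assumed cluster name
    "orm": "default",       # use the Django ORM broker (the default database)
    "workers": 2,
    "poll": 30,             # poll every 30 seconds instead of sub-second
    "timeout": 300,
    "retry": 360,
}
```

The trade-off is latency: with a 30-second poll, a queued task may wait up to 30 seconds before a worker picks it up, which is usually acceptable for archive indexing on a small site.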
On Thu, Nov 22, 2018, at 2:14 AM, Stefan Tatschner wrote:
On Sun, 2018-11-18 at 14:38 -0800, Mark Sapiro wrote:
No. This is not normal. This is apparently coming from Django qcluster. At least it is qcluster that uses the django_q_ormq table, but these repeated queries are not normal, at least I don't see them in the sites I administer.
These calls are normal: qcluster polls the database at a high rate:
https://django-q.readthedocs.io/en/latest/configure.html#poll
Fortunately the poll interval is configurable and can be reduced, but polling still happens.
So, if you are using Django's ORM to keep the task states in Django Q, this is bound to happen, depending on how frequently you run tasks. In HyperKitty, we have some minutely tasks, which means the polling is going to be high.
You can, however, configure Django Q to queue tasks outside the database in a high-speed in-memory store such as Redis [1]. Other options such as MongoDB and Amazon SQS are also available.
Stefan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
-- thanks, Abhilash Raj (maxking)
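[Editor's note: switching the broker to Redis as suggested above is also just a Q_CLUSTER change in settings.py. A hedged sketch, assuming a local Redis on the default port; the cluster name and connection values are illustrative:]

```python
# settings.py (fragment) - moves Django Q's task queue off Postgres and
# onto Redis, so the polling loop hits the in-memory broker instead of
# the django_q_ormq table. Host/port/db values are illustrative.
Q_CLUSTER = {
    "name": "mailman3web",  # assumed cluster name
    "workers": 2,
    "timeout": 300,
    "retry": 360,
    "redis": {
        "host": "127.0.0.1",  # assumed: Redis runs on the web host
        "port": 6379,
        "db": 0,
    },
}
```

This keeps the Postgres link quiet at the cost of running one more service; for a site where the frontend and database are separated by a VPN, colocating Redis with the frontend removes the cross-link polling traffic entirely.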
On Mon, 2018-12-03 at 10:01 -0800, Abhilash Raj wrote:
So, if you are using Django's ORM to keep the task states in Django Q, this is bound to happen, depending on how frequently you run tasks. In HyperKitty, we have some minutely tasks, which means the polling is going to be high.
You can, however, configure Django Q to queue tasks outside the database in a high-speed in-memory store such as Redis [1]. Other options such as MongoDB and Amazon SQS are also available.
Is it really necessary to have this much complexity? Reducing database load by introducing another database seems wrong to me.
Stefan
On Mon, Dec 3, 2018, at 11:44 AM, Stefan Tatschner wrote:
On Mon, 2018-12-03 at 10:01 -0800, Abhilash Raj wrote:
So, if you are using Django's ORM to keep the task states in Django Q, this is bound to happen, depending on how frequently you run tasks. In HyperKitty, we have some minutely tasks, which means the polling is going to be high.
You can, however, configure Django Q to queue tasks outside the database in a high-speed in-memory store such as Redis [1]. Other options such as MongoDB and Amazon SQS are also available.
Is it really necessary to have this much complexity? Reducing database load by introducing another database seems wrong to me.
Redis is not exactly a database; it is used as a message broker.
The ORM backend in Django Q exists for low-throughput workloads where you can afford to poll the database that frequently without negative effects on the rest of your system. It also lets you avoid installing a message broker like Redis.
If your system handles a ton of traffic, then it is recommended to use Redis or a similar higher-performance broker.
-- thanks, Abhilash Raj (maxking)
On Mon, 2018-12-03 at 11:56 -0800, Abhilash Raj wrote:
The ORM backend in Django Q exists for low-throughput workloads where you can afford to poll the database that frequently without negative effects on the rest of your system. It also lets you avoid installing a message broker like Redis.
If your system handles a ton of traffic, then it is recommended to use Redis or a similar higher-performance broker.
My system handles very low traffic (at most 10 mails per day). I noticed the database polls because, after migrating from Mailman 2 to Mailman 3, my VPN traffic between the web frontend and the database server increased dramatically.
This architecture might scale wonderfully for large and distributed systems, but it is questionable for low-traffic sites. Maintaining multiple databases (or brokers) for a few small mailing lists is just too much.
Stefan
On Mon, Dec 3, 2018, at 12:04 PM, Stefan Tatschner wrote:
On Mon, 2018-12-03 at 11:56 -0800, Abhilash Raj wrote:
The ORM backend in Django Q exists for low-throughput workloads where you can afford to poll the database that frequently without negative effects on the rest of your system. It also lets you avoid installing a message broker like Redis.
If your system handles a ton of traffic, then it is recommended to use Redis or a similar higher-performance broker.
My system handles very low traffic (at most 10 mails per day). I noticed the database polls because, after migrating from Mailman 2 to Mailman 3, my VPN traffic between the web frontend and the database server increased dramatically.
This architecture might scale wonderfully for large and distributed systems, but it is questionable for low-traffic sites. Maintaining multiple databases (or brokers) for a few small mailing lists is just too much.
And I agree with that :)
As the current system relies on polling and cron to run some routine tasks, it is quite inefficient on low-usage systems. We can fix that with a more event-based design; that is on my list of things to change in HyperKitty.
Stefan
-- thanks, Abhilash Raj (maxking)
Stefan Tatschner writes:
This architecture might scale wonderfully for large and distributed systems, but it is questionable for low traffic sites. Maintaining multiple databases (or call it broker) for a few small mailing lists is just too much.
But the problem this thread is talking about arises precisely when you use a simple single-database approach. I'm not saying there is no way to optimize the current architecture, but this is what we have.
Unfortunately the original developers did the work for their employer, and no longer have that support. Looking at the git log, every year since support ended there's a spate of commits in the summer. If you want to write up an RFE in a paragraph or so and post it to the gitlab tracker, this might make a good Google Summer of Code proposal -- we'd have support for an intern and the experts would be more available for consultation. We didn't participate last year, but we've had good success in getting projects approved in the past.
Note that if we get GSoC support, the *student* is responsible for planning detailed requirements and design. (For example, there may be better backend databases, or we could implement a cache of some kind in HyperKitty itself. That level of detail isn't needed yet.) I believe that there's also an active third party project to work on the UI of HyperKitty, so maybe we can make this the Summer of HyperKitty.
Steve
-- Associate Professor, Division of Policy and Planning Science
Faculty of Systems and Information, University of Tsukuba
Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
Email: turnbull@sk.tsukuba.ac.jp  Tel: 029-853-5175
http://turnbull.sk.tsukuba.ac.jp/
On Wed, 2018-12-05 at 10:29 +0900, Stephen J. Turnbull wrote:
If you want to write up an RFE in a paragraph or so and post it to the gitlab tracker, this might make a good Google Summer of Code proposal -- we'd have support for an intern and the experts would be more available for consultation. We didn't participate last year, but we've had good success in getting projects approved in the past.
Here you are: https://gitlab.com/mailman/hyperkitty/issues/208
Stefan
On 12/3/18 12:04 PM, Stefan Tatschner wrote:
My system does handle very low traffic (at max 10 mails per day). I noticed the database polls, since after migrating from mailman2 to mailman3 my VPN traffic between the web frontend and the database server increased like crazy.
Relevant to this thread, see <https://gitlab.com/mailman/hyperkitty/issues/207>.
-- Mark Sapiro <mark@msapiro.net>, San Francisco Bay Area, California
"The highway is for gamblers, better use your sense" - B. Dylan
participants (4)
- Abhilash Raj
- Mark Sapiro
- Stefan Tatschner
- Stephen J. Turnbull