Database deadlock when trying to delete a member.
Hi,
I’ve got a strange error. I migrated from mailman 2.1 to 3 using the migration guide on the website and things have been working fine. But I’m in the process of testing the new syncmembers command to replace some inhouse mailman API code we wrote. I’ve tracked my problem down to what appears to be a deadlock in the postgresql database. When trying to delete certain users the SQL query hangs and never returns.
After doing some tracing and tracking it down to the database, I enabled DB logging of all queries and saw that the following query was never returning:
DELETE FROM member WHERE member.id = 166814
If I try to run it manually I get the same issue, in psql it never returns. mailman=# explain analyze DELETE FROM member WHERE member.id = 166814; [… 60 seconds passes…] ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: while deleting tuple (2998,72) in relation "member" I’m just wondering if there are any commands or scripts I can run to verify the database and it’s constraints to see if there is some error in the database data? Or where you’d recommend I go from here? It only impacts the user on this specific list. If I add this user to another list and remove them, everything works fine.
This is on a test system so I can do various testing. I’m trying to find the root cause so that I can verify my production system isn’t impacted by the same thing.
Setup Details: OS: Oracle Linux 7 (Redhat) Installed using pip via virtualenv talking to a local pgsql database on the same box. Data was migrated from a mailman 2.1 install into mailman3 using migration instructions on the website.
PIP Modules versions: django-mailman3 1.3.5 mailman 3.3.3 mailman-hyperkitty 1.1.0 mailmanclient 3.3.2 postorius 1.3.4 psycopg2-binary 2.8.5
RPM packages: postgresql12-12.6-1PGDG.rhel7.x86_64 postgresql12-server-12.6-1PGDG.rhel7.x86_64 postgresql12-libs-12.6-1PGDG.rhel7.x86_64 postgresql12-devel-12.6-1PGDG.rhel7.x86_64
-Simon
Hi Simon,
On Mar 17, 2021, at 10:16 PM, Simon Coggins <s.coggins@cqu.edu.au> wrote:
Hi,
I’ve got a strange error. I migrated from mailman 2.1 to 3 using the migration guide on the website and things have been working fine. But I’m in the process of testing the new syncmembers command to replace some inhouse mailman API code we wrote. I’ve tracked my problem down to what appears to be a deadlock in the postgresql database. When trying to delete certain users the SQL query hangs and never returns.
After doing some tracing and tracking it down to the database, I enabled DB logging of all queries and saw that the following query was never returning:
DELETE FROM member WHERE member.id = 166814
If I try to run it manually I get the same issue, in psql it never returns. mailman=# explain analyze DELETE FROM member WHERE member.id = 166814; [… 60 seconds passes…] ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: while deleting tuple (2998,72) in relation "member" I’m just wondering if there are any commands or scripts I can run to verify the database and it’s constraints to see if there is some error in the database data? Or where you’d recommend I go from here? It only impacts the user on this specific list. If I add this user to another list and remove them, everything works fine.
Are you able to retrieve that Member object using API?
This1 the Python code which would try to delete a Member object and I am assuming is being called from syncmembers command. I am not entire sure about what would cause the database to hang, if there was some database constraint that failed, I would imagine some error to be returned.
One thing you can try is see if you can unsubscribe the user using the API or
mailman shell
command.
You can unsubscribe from API by calling a DELETE
method at
http://localhost:9001/3.1/members/<member_id>
This would give us some indication about whether there is some bug in the implementation of syncmembers command or if there is something else going on.
This is on a test system so I can do various testing. I’m trying to find the root cause so that I can verify my production system isn’t impacted by the same thing.
Setup Details: OS: Oracle Linux 7 (Redhat) Installed using pip via virtualenv talking to a local pgsql database on the same box. Data was migrated from a mailman 2.1 install into mailman3 using migration instructions on the website.
PIP Modules versions: django-mailman3 1.3.5 mailman 3.3.3 mailman-hyperkitty 1.1.0 mailmanclient 3.3.2 postorius 1.3.4 psycopg2-binary 2.8.5
What is the version of sqlalchemy package? There is a new 1.4.0 version that came out recently and is incompatible with Core right now. There are backwards incompatible changes which might need some changes in Core to add support.
RPM packages: postgresql12-12.6-1PGDG.rhel7.x86_64 postgresql12-server-12.6-1PGDG.rhel7.x86_64 postgresql12-libs-12.6-1PGDG.rhel7.x86_64 postgresql12-devel-12.6-1PGDG.rhel7.x86_64
-Simon
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
-- thanks, Abhilash Raj (maxking)
On 18 Mar 2021, at 5:05 pm, Abhilash Raj <maxking@asynchronous.in> wrote:
Hi Simon,
On Mar 17, 2021, at 10:16 PM, Simon Coggins <s.coggins@cqu.edu.au> wrote:
Hi,
I’ve got a strange error. I migrated from mailman 2.1 to 3 using the migration guide on the website and things have been working fine. But I’m in the process of testing the new syncmembers command to replace some inhouse mailman API code we wrote. I’ve tracked my problem down to what appears to be a deadlock in the postgresql database. When trying to delete certain users the SQL query hangs and never returns.
After doing some tracing and tracking it down to the database, I enabled DB logging of all queries and saw that the following query was never returning:
DELETE FROM member WHERE member.id = 166814
If I try to run it manually I get the same issue, in psql it never returns. mailman=# explain analyze DELETE FROM member WHERE member.id = 166814; [… 60 seconds passes…] ^CCancel request sent ERROR: canceling statement due to user request CONTEXT: while deleting tuple (2998,72) in relation "member" I’m just wondering if there are any commands or scripts I can run to verify the database and it’s constraints to see if there is some error in the database data? Or where you’d recommend I go from here? It only impacts the user on this specific list. If I add this user to another list and remove them, everything works fine.
Are you able to retrieve that Member object using API?
This1 the Python code which would try to delete a Member object and I am assuming is being called from syncmembers command. I am not entire sure about what would cause the database to hang, if there was some database constraint that failed, I would imagine some error to be returned.
Ok I think I’m getting closer. It appears to be an open transaction that is locking the database record. While mailman is running, if I try to execute "DELETE FROM member WHERE member.id = 166507;” on the mailman table in psql the query hangs and never returned. But if I shutdown mailman and try to run the query again, it executed immediately. So I’m pretty sure there is an open transaction that is causing the entry to be locked.
One thing you can try is see if you can unsubscribe the user using the API or
mailman shell
command.You can unsubscribe from API by calling a
DELETE
method athttp://localhost:9001/3.1/members/<member_id>
This would give us some indication about whether there is some bug in the implementation of syncmembers command or if there is something else going on.
Trying via shell or via API to delete the user all fail and hang. I’m able to receive the user/membership/list data without any issue both via mailman command and via the API.
This is on a test system so I can do various testing. I’m trying to find the root cause so that I can verify my production system isn’t impacted by the same thing.
Setup Details: OS: Oracle Linux 7 (Redhat) Installed using pip via virtualenv talking to a local pgsql database on the same box. Data was migrated from a mailman 2.1 install into mailman3 using migration instructions on the website.
PIP Modules versions: django-mailman3 1.3.5 mailman 3.3.3 mailman-hyperkitty 1.1.0 mailmanclient 3.3.2 postorius 1.3.4 psycopg2-binary 2.8.5
What is the version of sqlalchemy package? There is a new 1.4.0 version that came out recently and is incompatible with Core right now. There are backwards incompatible changes which might need some changes in Core to add support.
SQLAlchemy 1.3.19
RPM packages: postgresql12-12.6-1PGDG.rhel7.x86_64 postgresql12-server-12.6-1PGDG.rhel7.x86_64 postgresql12-libs-12.6-1PGDG.rhel7.x86_64 postgresql12-devel-12.6-1PGDG.rhel7.x86_64
-Simon
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
-- thanks, Abhilash Raj (maxking)
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
Simon Coggins, Principal Systems Engineer Digital Services Directorate CQUniversity, Lvl 6.14, 160 Ann Street Brisbane, QLD, 4000 Email: s.coggins@cqu.edu.au Web: http://www.cqu.edu.au/
Just to add to this. Looking at the active queries in PostgreSQL there appear to be a bunch of commits that haven't finished. I don't know how it all fits together but it looks like the commits aren't finishing and transactions are staying open, so when I hit an open transaction all write operations for the DB hang until the transaction has finished.
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
pid | age | usename |
query
--------+-----------------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15767 | 00:01:58.30353 | mailman | SELECT member.id AS member_id, member._member_id AS member__member_id, member.role AS member_role, member.list_id AS member_list_id, member.moderation_action AS member_moderation_action, mem
ber.address_id AS member_address_id, member.preferences_id AS member_preferences_id, member.user_id AS member_user_id, member.bounce_score AS member_bounce_score, member.last_bounce_received AS member_last_bounce_received, member.la
st_warning_sent AS member_last_warning_sent, member.total_warnings_sent AS member_total_warnings_sent +
| | | FROM member JOIN mailinglist ON member.list_id = mailinglist.list_id JOIN preferences ON preferences.id = member.preferences_id
+
| | | WHERE mailinglist.process_bounces = true AND member.total_warnings_sent >= mailinglist.bounce_you_are_disabled_warnings AND preferences.delivery_status = 3
18033 | 00:05:28.255636 | mailman | DELETE FROM member WHERE member.id = 169331
15762 | 00:11:42.842237 | mailman | COMMIT
15740 | 00:11:59.12324 | mailman | COMMIT
15760 | 00:12:01.212976 | mailman | COMMIT
15757 | 00:12:01.542144 | mailman | COMMIT
15755 | 00:12:01.708328 | mailman | COMMIT
15754 | 00:12:01.804737 | mailman | COMMIT
15751 | 00:12:01.921644 | mailman | COMMIT
15745 | 00:12:02.72934 | mailman | COMMIT
15743 | 00:12:02.795885 | mailman | COMMIT
15742 | 00:12:02.948795 | mailman | COMMIT
15738 | 00:12:03.3471 | mailman | COMMIT
15664 | 00:12:08.480084 | mailman | COMMIT
(21 rows)
I've checked my production system and I can see the same COMMIT queries running. Some for 2 days (since the last restart).
This is helpful.
The two pending queries looks like the job of bounce runner. Since it runs every 2m to check if any Member’s bounce score is above the threshold and if any warning needs to be sent or delivery needs to be disabled.
It looks to me that the database query has been running for 1min 58sec, which is longer than I’d expect. Are you seeing that query *always* there when you dump the list of queries?
I am not sure what the empty COMMIT queries are really.
Abhilash
On Mar 18, 2021, at 5:31 PM, s.coggins@cqu.edu.au wrote:
Just to add to this. Looking at the active queries in PostgreSQL there appear to be a bunch of commits that haven't finished. I don't know how it all fits together but it looks like the commits aren't finishing and transactions are staying open, so when I hit an open transaction all write operations for the DB hang until the transaction has finished.
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query FROM pg_stat_activity WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' ORDER BY query_start desc; pid | age | usename |
query--------+-----------------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15767 | 00:01:58.30353 | mailman | SELECT member.id AS member_id, member._member_id AS member__member_id, member.role AS member_role, member.list_id AS member_list_id, member.moderation_action AS member_moderation_action, mem ber.address_id AS member_address_id, member.preferences_id AS member_preferences_id, member.user_id AS member_user_id, member.bounce_score AS member_bounce_score, member.last_bounce_received AS member_last_bounce_received, member.la st_warning_sent AS member_last_warning_sent, member.total_warnings_sent AS member_total_warnings_sent + | | | FROM member JOIN mailinglist ON member.list_id = mailinglist.list_id JOIN preferences ON preferences.id = member.preferences_id
+ | | | WHERE mailinglist.process_bounces = true AND member.total_warnings_sent >= mailinglist.bounce_you_are_disabled_warnings AND preferences.delivery_status = 3
18033 | 00:05:28.255636 | mailman | DELETE FROM member WHERE member.id = 169331 15762 | 00:11:42.842237 | mailman | COMMIT 15740 | 00:11:59.12324 | mailman | COMMIT 15760 | 00:12:01.212976 | mailman | COMMIT 15757 | 00:12:01.542144 | mailman | COMMIT 15755 | 00:12:01.708328 | mailman | COMMIT 15754 | 00:12:01.804737 | mailman | COMMIT 15751 | 00:12:01.921644 | mailman | COMMIT 15745 | 00:12:02.72934 | mailman | COMMIT 15743 | 00:12:02.795885 | mailman | COMMIT 15742 | 00:12:02.948795 | mailman | COMMIT 15738 | 00:12:03.3471 | mailman | COMMIT 15664 | 00:12:08.480084 | mailman | COMMIT (21 rows)
I've checked my production system and I can see the same COMMIT queries running. Some for 2 days (since the last restart).
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
-- thanks, Abhilash Raj (maxking)
You are correct it’s the bounce runner locking the tables. If I kill -15 the bounce runner, the session lock releases and my sync continues correctly.
├─python3,60157 /opt/mailman/venv/bin/runner -C /etc/mailman.cfg --runner=bounces:0:1
So something in the bounce runner is not closing transactions correctly from the looks of it.
-Simon
[signature_201226327]<https://www.cqu.edu.au/> Mr Simon Coggins Principal Systems Engineer | Digital Services Directorate CQUniversity Australia, Level 6.13, 160 Ann Street, Brisbane, QLD 4000 P +61 7 3295 1182 | X 51182 | M 0408 115 861 | E s.coggins@cqu.edu.au<mailto:.coggins@cqu.edu.au> [cquconnect]<https://www.cqu.edu.au/social-media> I respectfully acknowledge the Traditional Owners of the land on which we work and learn, and pay respect to the First Nations Peoples and their Elders, past, present and future.
This communication may contain privileged or confidential information. If you have received this in error, please return to sender and delete. CRICOS: 00219C | RTO Code 40939
From: Abhilash Raj <maxking@asynchronous.in> Date: Friday, 19 March 2021 at 4:58 pm To: Simon Coggins <s.coggins@cqu.edu.au> Cc: "mailman-users@mailman3.org" <mailman-users@mailman3.org> Subject: [EXTERNAL] Re: [MM3-users] Database deadlock when trying to delete a member.
This is an external email from: maxking@asynchronous.in
This is helpful.
The two pending queries looks like the job of bounce runner. Since it runs every 2m to check if any Member’s bounce score is above the threshold and if any warning needs to be sent or delivery needs to be disabled.
It looks to me that the database query has been running for 1min 58sec, which is longer than I’d expect. Are you seeing that query *always* there when you dump the list of queries?
I am not sure what the empty COMMIT queries are really.
Abhilash
On Mar 18, 2021, at 5:31 PM, s.coggins@cqu.edu.au wrote:
Just to add to this. Looking at the active queries in PostgreSQL there appear to be a bunch of commits that haven't finished. I don't know how it all fits together but it looks like the commits aren't finishing and transactions are staying open, so when I hit an open transaction all write operations for the DB hang until the transaction has finished.
postgres=# SELECT pid, age(clock_timestamp(), query_start), usename, query FROM pg_stat_activity WHERE query != '<IDLE>' AND query NOT ILIKE '%pg_stat_activity%' ORDER BY query_start desc; pid | age | usename | query
--------+-----------------+-------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
15767 | 00:01:58.30353 | mailman | SELECT member.id AS member_id, member._member_id AS member__member_id, member.role AS member_role, member.list_id AS member_list_id, member.moderation_action AS member_moderation_action, mem ber.address_id AS member_address_id, member.preferences_id AS member_preferences_id, member.user_id AS member_user_id, member.bounce_score AS member_bounce_score, member.last_bounce_received AS member_last_bounce_received, member.la st_warning_sent AS member_last_warning_sent, member.total_warnings_sent AS member_total_warnings_sent + | | | FROM member JOIN mailinglist ON member.list_id = mailinglist.list_id JOIN preferences ON preferences.id = member.preferences_id
- | | | WHERE mailinglist.process_bounces = true AND member.total_warnings_sent >= mailinglist.bounce_you_are_disabled_warnings AND preferences.delivery_status = 3 18033 | 00:05:28.255636 | mailman | DELETE FROM member WHERE member.id = 169331 15762 | 00:11:42.842237 | mailman | COMMIT 15740 | 00:11:59.12324 | mailman | COMMIT 15760 | 00:12:01.212976 | mailman | COMMIT 15757 | 00:12:01.542144 | mailman | COMMIT 15755 | 00:12:01.708328 | mailman | COMMIT 15754 | 00:12:01.804737 | mailman | COMMIT 15751 | 00:12:01.921644 | mailman | COMMIT 15745 | 00:12:02.72934 | mailman | COMMIT 15743 | 00:12:02.795885 | mailman | COMMIT 15742 | 00:12:02.948795 | mailman | COMMIT 15738 | 00:12:03.3471 | mailman | COMMIT 15664 | 00:12:08.480084 | mailman | COMMIT (21 rows)
I've checked my production system and I can see the same COMMIT queries running. Some for 2 days (since the last restart).
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/<https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org>
-- thanks, Abhilash Raj (maxking)
Ok after bit tracking I’ve worked out that the bouncer is dangling transactions when it’s sending warning messages. When https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#... is called and it get to update last_warn_sent it does the following in the DB:
2021-03-23 14:42:50.622 AEST [11693] LOG: statement: UPDATE member SET last_warning_sent='2021-03-23T04:42:50.622015'::timestamp, total_warnings_sent=1 WHERE member.id = 169331
But it never COMMITs this transaction. So all write operations are now hung waiting for the transction to be closed as the member record has been modified. As a test I added config.db.commit() after https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#... and restarted. And my mailman syncmembers worked perfectly and I didn’t have any hangs it correctly removed everyone it should have. In the database logs I can see the COMMIT.
I’m pretty sure this is not the right spot for a config.db.commit() but hopefully this helps point to what might be going wrong.
-Simon
[signature_201226327]<https://www.cqu.edu.au/> Mr Simon Coggins Principal Systems Engineer | Digital Services Directorate CQUniversity Australia, Level 6.13, 160 Ann Street, Brisbane, QLD 4000 P +61 7 3295 1182 | X 51182 | M 0408 115 861 | E s.coggins@cqu.edu.au<mailto:.coggins@cqu.edu.au> [cquconnect]<https://www.cqu.edu.au/social-media> I respectfully acknowledge the Traditional Owners of the land on which we work and learn, and pay respect to the First Nations Peoples and their Elders, past, present and future.
This communication may contain privileged or confidential information. If you have received this in error, please return to sender and delete. CRICOS: 00219C | RTO Code 40939
From: Simon Coggins <s.coggins@cqu.edu.au> Date: Tuesday, 23 March 2021 at 11:01 am To: Abhilash Raj <maxking@asynchronous.in> Cc: "mailman-users@mailman3.org" <mailman-users@mailman3.org> Subject: Re: [EXTERNAL] Re: [MM3-users] Database deadlock when trying to delete a member.
You are correct it’s the bounce runner locking the tables. If I kill -15 the bounce runner, the session lock releases and my sync continues correctly.
├─python3,60157 /opt/mailman/venv/bin/runner -C /etc/mailman.cfg --runner=bounces:0:1
So something in the bounce runner is not closing transactions correctly from the looks of it.
-Simon
[signature_201226327]<https://www.cqu.edu.au/> Mr Simon Coggins Principal Systems Engineer | Digital Services Directorate CQUniversity Australia, Level 6.13, 160 Ann Street, Brisbane, QLD 4000 P +61 7 3295 1182 | X 51182 | M 0408 115 861 | E s.coggins@cqu.edu.au<mailto:.coggins@cqu.edu.au> [cquconnect]<https://www.cqu.edu.au/social-media> I respectfully acknowledge the Traditional Owners of the land on which we work and learn, and pay respect to the First Nations Peoples and their Elders, past, present and future.
This communication may contain privileged or confidential information. If you have received this in error, please return to sender and delete. CRICOS: 00219C | RTO Code 40939
From: Abhilash Raj <maxking@asynchronous.in> Date: Friday, 19 March 2021 at 4:58 pm To: Simon Coggins <s.coggins@cqu.edu.au> Cc: "mailman-users@mailman3.org" <mailman-users@mailman3.org> Subject: [EXTERNAL] Re: [MM3-users] Database deadlock when trying to delete a member.
This is an external email from: maxking@asynchronous.in
This is helpful.
The two pending queries looks like the job of bounce runner. Since it runs every 2m to check if any Member’s bounce score is above the threshold and if any warning needs to be sent or delivery needs to be disabled.
It looks to me that the database query has been running for 1min 58sec, which is longer than I’d expect. Are you seeing that query *always* there when you dump the list of queries?
I am not sure what the empty COMMIT queries are really.
Abhilash
On 3/22/21 9:53 PM, Simon Coggins wrote:
Ok after bit tracking I’ve worked out that the bouncer is dangling transactions when it’s sending warning messages. When https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#... is called and it get to update last_warn_sent it does the following in the DB:
2021-03-23 14:42:50.622 AEST [11693] LOG: statement: UPDATE member SET last_warning_sent='2021-03-23T04:42:50.622015'::timestamp, total_warnings_sent=1 WHERE member.id = 169331
But it never COMMITs this transaction. So all write operations are now hung waiting for the transction to be closed as the member record has been modified. As a test I added config.db.commit() after https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#... and restarted. And my mailman syncmembers worked perfectly and I didn’t have any hangs it correctly removed everyone it should have. In the database logs I can see the COMMIT.
I’m pretty sure this is not the right spot for a config.db.commit() but hopefully this helps point to what might be going wrong.
Thank you for all the research on this. It is extremely helpful.
Instead of adding a commit after line 277 please try adding the @transactional decorator to the send_warnings_and_remove method at line 210
@transactional
@dbconnection
def send_warnings_and_remove(self, store):
I think that is the proper fix. Let us know if that works.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Thanks I’ll give that a try. Also in my tracing of the code I noticed https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/interfaces/data... it starts with a single quote and ends with triple quotes so the single quote is never terminated.
-Simon
From: Mark Sapiro <mark@msapiro.net> Date: Wednesday, 24 March 2021 at 4:10 am To: "mailman-users@mailman3.org" <mailman-users@mailman3.org> Subject: [MM3-users] Re: [EXTERNAL] Re: Database deadlock when trying to delete a member.
On 3/22/21 9:53 PM, Simon Coggins wrote:
Ok after bit tracking I’ve worked out that the bouncer is dangling transactions when it’s sending warning messages. When https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L258<https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L258> is called and it get to update last_warn_sent it does the following in the DB:
2021-03-23 14:42:50.622 AEST [11693] LOG: statement: UPDATE member SET last_warning_sent='2021-03-23T04:42:50.622015'::timestamp, total_warnings_sent=1 WHERE member.id = 169331
But it never COMMITs this transaction. So all write operations are now hung waiting for the transction to be closed as the member record has been modified. As a test I added config.db.commit() after https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L277<https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L277> and restarted. And my mailman syncmembers worked perfectly and I didn’t have any hangs it correctly removed everyone it should have. In the database logs I can see the COMMIT.
I’m pretty sure this is not the right spot for a config.db.commit() but hopefully this helps point to what might be going wrong.
Thank you for all the research on this. It is extremely helpful.
Instead of adding a commit after line 277 please try adding the @transactional decorator to the send_warnings_and_remove method at line 210
@transactional @dbconnection def send_warnings_and_remove(self, store):
I think that is the proper fix. Let us know if that works.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/<https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org>
On 3/23/21 3:14 PM, Simon Coggins wrote:
Thanks I’ll give that a try. Also in my tracing of the code I noticed https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/interfaces/data... it starts with a single quote and ends with triple quotes so the single quote is never terminated.
Actually, that's a typo, but it's not an error. in Python, two strings without an intervening operator are just concatenated, so
"Interface for creating new databases."""
is just
"Interface for creating new databases." concatenated with "" (the null string).
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On 3/23/21 3:23 PM, Mark Sapiro wrote:
Actually, that's a typo, but it's not an error. in Python, two strings without an intervening operator are just concatenated, so
"Interface for creating new databases."""
is just
"Interface for creating new databases." concatenated with "" (the null string).
I looked and found a total of 3 which are being fixed by <https://gitlab.com/mailman/mailman/-/merge_requests/809>
You can see these are really inconsequential with something like
$ python3 Python 3.9.1 (default, Feb 6 2021, 11:54:30) [GCC 7.5.0] on linux Type "help", "copyright", "credits" or "license" for more information.
print("Interface for creating new databases.""") Interface for creating new databases.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
After making that change and doing some testing. So far I’ve not had any issues and the sync are working as expected now.
-Simon
From: Mark Sapiro <mark@msapiro.net> Date: Wednesday, 24 March 2021 at 4:10 am To: "mailman-users@mailman3.org" <mailman-users@mailman3.org> Subject: [MM3-users] Re: [EXTERNAL] Re: Database deadlock when trying to delete a member.
On 3/22/21 9:53 PM, Simon Coggins wrote:
Ok after bit tracking I’ve worked out that the bouncer is dangling transactions when it’s sending warning messages. When https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L258<https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L258> is called and it get to update last_warn_sent it does the following in the DB:
2021-03-23 14:42:50.622 AEST [11693] LOG: statement: UPDATE member SET last_warning_sent='2021-03-23T04:42:50.622015'::timestamp, total_warnings_sent=1 WHERE member.id = 169331
But it never COMMITs this transaction. So all write operations are now hung waiting for the transction to be closed as the member record has been modified. As a test I added config.db.commit() after https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L277<https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L277> and restarted. And my mailman syncmembers worked perfectly and I didn’t have any hangs it correctly removed everyone it should have. In the database logs I can see the COMMIT.
I’m pretty sure this is not the right spot for a config.db.commit() but hopefully this helps point to what might be going wrong.
Thank you for all the research on this. It is extremely helpful.
Instead of adding a commit after line 277 please try adding the @transactional decorator to the send_warnings_and_remove method at line 210
@transactional @dbconnection def send_warnings_and_remove(self, store):
I think that is the proper fix. Let us know if that works.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/<https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org>
I’ve been running with this change as you suggested for a few weeks now in our test environment and I’ve not hit any issues. So this fixes my problem. Will this change be added to mailman core or should I look at manually patching during my upgrades?
-Simon
From: Mark Sapiro <mark@msapiro.net> Date: Wednesday, 24 March 2021 at 4:10 am To: "mailman-users@mailman3.org" <mailman-users@mailman3.org> Subject: [MM3-users] Re: [EXTERNAL] Re: Database deadlock when trying to delete a member.
On 3/22/21 9:53 PM, Simon Coggins wrote:
Ok after bit tracking I’ve worked out that the bouncer is dangling transactions when it’s sending warning messages. When https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L258<https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L258> is called and it get to update last_warn_sent it does the following in the DB:
2021-03-23 14:42:50.622 AEST [11693] LOG: statement: UPDATE member SET last_warning_sent='2021-03-23T04:42:50.622015'::timestamp, total_warnings_sent=1 WHERE member.id = 169331
But it never COMMITs this transaction. So all write operations are now hung waiting for the transction to be closed as the member record has been modified. As a test I added config.db.commit() after https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L277<https://gitlab.com/mailman/mailman/-/blob/3.3.4/src/mailman/model/bounce.py#L277> and restarted. And my mailman syncmembers worked perfectly and I didn’t have any hangs it correctly removed everyone it should have. In the database logs I can see the COMMIT.
I’m pretty sure this is not the right spot for a config.db.commit() but hopefully this helps point to what might be going wrong.
Thank you for all the research on this. It is extremely helpful.
Instead of adding a commit after line 277 please try adding the @transactional decorator to the send_warnings_and_remove method at line 210
@transactional @dbconnection def send_warnings_and_remove(self, store):
I think that is the proper fix. Let us know if that works.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/<https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org>
On 4/8/21 6:04 PM, Simon Coggins wrote:
I’ve been running with this change as you suggested for a few weeks now in our test environment and I’ve not hit any issues. So this fixes my problem. Will this change be added to mailman core or should I look at manually patching during my upgrades?
The fix is at <https://gitlab.com/mailman/mailman/-/merge_requests/811> and will be in the next (3.3.5) release of core.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (4)
-
Abhilash Raj
-
Mark Sapiro
-
s.coggins@cqu.edu.au
-
Simon Coggins