Mails are stuck in Mailman Pipeline Queue
Hi Team,
I am currently running Mailman 3 version 3.3.1 with the built-in Postfix version 2.10.1-6 on RHEL 7.5 and PostgreSQL 11.7 with default values. I am running almost 2000 lists on a single VM with 64 GB RAM and 8 cores, on which all the Mailman 3 components run. I have been running this setup in production for the past 2 years and 6 months.
Suddenly I am seeing an unusual situation: most of the mails sent to this server go to the pipeline queue and are not released from there.
No configuration changes have been made and I'm not sure why the mails are getting stuck there. I have recreated the list but no luck.
I could not identify the root cause. Please help in this regard.
-- Thanks & Regards, Shashi Kanth.K 9052671936
Hi Team,
We have restarted the services as well but no change.
We tried manually moving the pck file from the pipeline queue to the out queue; the file disappeared from the out queue but the message is not getting delivered.
We also moved the pck file from the pipeline queue to the out queue with the extension .bak, still no use.
Not sure what to try next. Please help.
-- Thanks & Regards, Shashi Kanth.K 9052671936
On 3/26/22 10:28, Shashikanth Komandoor wrote:
Hi Team,
We have restarted the services as well but no change.
Odd.
We tried manually moving the pck file from the pipeline queue to the out queue; the file disappeared from the out queue but the message is not getting delivered.
This is expected because if the message hasn't been processed through the pipeline, it has no recipients in the metadata, so the outgoing runner delivered it to no one. Also, it skipped archiving and digesting.
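To check whether a queued file has already been through the pipeline, one could look for recipients in its metadata. The following is only a minimal sketch: it assumes a queue file holds two consecutive pickles (the message, then the metadata dict) and that the recipient set is stored under the key "recipients". The helper names are hypothetical; mailman qfile is the supported way to view these files. It would need to run inside Mailman's own Python environment so the message class can be unpickled, and only on queue files you trust.

```python
import pickle


def read_queue_file(path):
    """Load the two pickled objects (message, metadata) from a queue file.

    Assumption: the file contains exactly two consecutive pickles,
    the message first and the metadata dict second.
    """
    with open(path, "rb") as fp:
        msg = pickle.load(fp)
        msgdata = pickle.load(fp)
    return msg, msgdata


def has_recipients(msgdata):
    """Return True if the metadata carries a non-empty 'recipients' entry.

    Assumption: the pipeline stores the computed recipient set under
    the key 'recipients'.
    """
    return bool(msgdata.get("recipients"))
```

A file taken straight from the pipeline queue would be expected to return False here, which matches the behaviour described above when such a file is dropped into the out queue.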
We also moved the pck file from the pipeline queue to the out queue with the extension .bak, still no use.
Same as above.
Not sure what to try next. Please help.
My first guess is that the pipeline runner isn't running. You restarted the services, presumably including Mailman core, and that might have fixed it, but there are various ways the pipeline runner could have died that would cause the master not to restart it.
You need to stop Mailman core and then start it rather than just restarting. That may help.
Also look in mailman.log. Assuming the pipeline runner died at some point, there should be a log entry and possibly an exception with a traceback indicating why.
You can see which runners are running with something like
ps -fwwA | grep runner= | sed 's/.*runner=//'
This should produce a list something like
archive:0:1
bounces:0:1
command:0:1
in:0:1
lmtp:0:1
nntp:0:1
out:0:1
pipeline:0:1
rest:0:1
retry:0:1
task:0:1
virgin:0:1
digest:0:1
rest:0:1
rest:0:1
except, depending on version, task may not be there.
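The same check can be scripted so that missing runners are reported directly. This is a rough sketch, not part of Mailman: the function names and the EXPECTED set are assumptions based on the listing above, with task left out because it may not exist in every version.

```python
import subprocess

# Runner names taken from the example listing; "task" is omitted because
# it is only present in some Mailman versions.
EXPECTED = {"archive", "bounces", "command", "digest", "in", "lmtp",
            "nntp", "out", "pipeline", "rest", "retry", "virgin"}


def running_runners(ps_output):
    """Return the set of runner names found in `ps -fwwA` style output."""
    found = set()
    for line in ps_output.splitlines():
        # Runner processes carry an argument such as --runner=pipeline:0:1
        _, sep, rest = line.rpartition("runner=")
        tokens = rest.split()
        if sep and tokens:
            found.add(tokens[0].split(":", 1)[0])
    return found


def missing_runners(ps_output, expected=EXPECTED):
    """Return the expected runner names that do not appear in the output."""
    return sorted(set(expected) - running_runners(ps_output))


def live_ps_output():
    """Capture the process list from the local machine."""
    result = subprocess.run(["ps", "-fwwA"], stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout
```

Calling missing_runners(live_ps_output()) on the Mailman host should give an empty list when every expected runner is up. A runner that is listed can still be blocked, so this only rules out the case where it has died.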
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Thank you Mark for your response.
All the runners you listed are shown as running.
I don't find any traceback when the Mailman service starts.
But the traceback below is observed when the Mailman service stops:
Mar 27 09:01:19 2022 (5859) out runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5859) out runner exiting.
Mar 27 09:01:19 2022 (5860) pipeline runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5864) digest runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5846) Master watcher caught SIGTERM. Exiting.
Mar 27 09:01:19 2022 (5858) nntp runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5853) archive runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5858) nntp runner exiting.
Mar 27 09:01:19 2022 (5853) archive runner exiting.
Mar 27 09:01:19 2022 (5854) bounces runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5854) bounces runner exiting.
Mar 27 09:01:19 2022 (5855) command runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5855) command runner exiting.
Mar 27 09:01:19 2022 (5863) virgin runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5863) virgin runner exiting.
Mar 27 09:01:19 2022 (5862) retry runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5862) retry runner exiting.
Mar 27 09:01:19 2022 (5857) lmtp runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5857) lmtp runner exiting.
Mar 27 09:01:19 2022 (5856) in runner caught SIGTERM. Stopping.
Mar 27 09:01:19 2022 (5856) in runner exiting.
[2022-03-27 09:01:19 +0530] [5861] [INFO] Handling signal: term
[2022-03-27 09:01:19 +0530] [18009] [INFO] Worker exiting (pid: 18009)
[2022-03-27 09:01:19 +0530] [18004] [INFO] Worker exiting (pid: 18004)
[2022-03-27 09:01:21 +0530] [5861] [INFO] Shutting down: Master
Mar 27 09:01:22 2022 (5864) Uncaught runner exception:
Mar 27 09:01:22 2022 (5864) Traceback (most recent call last):
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 173, in _one_iteration
    self._process_one_file(msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 266, in _process_one_file
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/runners/digest.py", line 332, in _dispose
    digest_members = set(mlist.digest_members.members)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/roster.py", line 254, in members
    DeliveryMode.summary_digests)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/roster.py", line 226, in _get_members
    if member.delivery_mode in delivery_modes:
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 205, in delivery_mode
    return self._lookup('delivery_mode')
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 167, in _lookup
    pref = getattr(self.address.preferences, preference)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 117, in address
    if self._address is None
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 287, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 723, in get
    value = self.callable_(state, passive)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 760, in _load_for_state
    session, state, primary_key_identity, passive
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 114, in signal_handler
    raise RunnerInterrupt
mailman.interfaces.runner.RunnerInterrupt
Mar 27 09:01:22 2022 (5864) SHUNTING: 1648351882.6693938+c2e6ed709646a7680c73059b52d2f0451836e591
Mar 27 09:01:22 2022 (5864) digest runner exiting.
Mar 27 09:01:22 2022 (5860) Uncaught runner exception:
Mar 27 09:01:22 2022 (5860) Traceback (most recent call last):
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 173, in _one_iteration
    self._process_one_file(msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 266, in _process_one_file
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/runners/pipeline.py", line 37, in _dispose
    process(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/pipelines.py", line 50, in process
    handler.process(mlist, msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/handlers/member_recipients.py", line 84, in process
    for member in mlist.regular_members.members
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/handlers/member_recipients.py", line 83, in <genexpr>
    recipients = set(member.address.email
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/roster.py", line 239, in members
    yield from self._get_members(DeliveryMode.regular)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/roster.py", line 226, in _get_members
    if member.delivery_mode in delivery_modes:
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 205, in delivery_mode
    return self._lookup('delivery_mode')
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 167, in _lookup
    pref = getattr(self.address.preferences, preference)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 287, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 723, in get
    value = self.callable_(state, passive)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 760, in _load_for_state
    session, state, primary_key_identity, passive
  File "<string>", line 1, in <lambda>
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 850, in _emit_lazyload
    session.query(self.mapper), primary_key_identity
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/ext/baked.py", line 603, in _load_on_pk_identity
    setup, tuple(elem is None for elem in primary_key_identity)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 114, in signal_handler
    raise RunnerInterrupt
mailman.interfaces.runner.RunnerInterrupt
Mar 27 09:01:22 2022 (5860) SHUNTING: 1648351882.8537986+9bf59cb23a71a4bae65536bfdbec0abb261adaa0
Mar 27 09:01:22 2022 (5860) pipeline runner exiting.
Mar 27 09:01:23 2022 (5846) Master stopped
I have shared more of the log because I could not tell where the clue to the issue might be. Please advise.
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
On 3/26/22 21:01, Shashikanth Komandoor wrote:
But the below trace back is observed during mailman service stopping:
This doesn't say much ...
...
Mar 27 09:01:22 2022 (5864) Uncaught runner exception:
Mar 27 09:01:22 2022 (5864) Traceback (most recent call last):
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 173, in _one_iteration
...
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 114, in signal_handler
    raise RunnerInterrupt
mailman.interfaces.runner.RunnerInterrupt
Mar 27 09:01:22 2022 (5864) SHUNTING: 1648351882.6693938+c2e6ed709646a7680c73059b52d2f0451836e591
Mar 27 09:01:22 2022 (5864) digest runner exiting.
Above, the digest runner was interrupted while processing and shunted the message it was working on.
Below, the pipeline runner was interrupted while processing and shunted the message it was working on.
Mar 27 09:01:22 2022 (5860) Uncaught runner exception:
Mar 27 09:01:22 2022 (5860) Traceback (most recent call last):
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 173, in _one_iteration
    self._process_one_file(msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 266, in _process_one_file
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/runners/pipeline.py", line 37, in _dispose
    process(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/pipelines.py", line 50, in process
    handler.process(mlist, msg, msgdata)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/handlers/member_recipients.py", line 84, in process
    for member in mlist.regular_members.members
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/handlers/member_recipients.py", line 83, in <genexpr>
    recipients = set(member.address.email
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/roster.py", line 239, in members
    yield from self._get_members(DeliveryMode.regular)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/roster.py", line 226, in _get_members
    if member.delivery_mode in delivery_modes:
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 205, in delivery_mode
    return self._lookup('delivery_mode')
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/model/member.py", line 167, in _lookup
    pref = getattr(self.address.preferences, preference)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 287, in __get__
    return self.impl.get(instance_state(instance), dict_)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/attributes.py", line 723, in get
    value = self.callable_(state, passive)
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 760, in _load_for_state
    session, state, primary_key_identity, passive
  File "<string>", line 1, in <lambda>
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/orm/strategies.py", line 850, in _emit_lazyload
    session.query(self.mapper), primary_key_identity
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/sqlalchemy/ext/baked.py", line 603, in _load_on_pk_identity
    setup, tuple(elem is None for elem in primary_key_identity)
The above says the runner was doing a database lookup. This may be a coincidence or there may be some database locking issue that underlies the issue with the pipeline runner.
  File "/var/lib/mailman/venv/lib64/python3.6/site-packages/mailman/core/runner.py", line 114, in signal_handler
    raise RunnerInterrupt
mailman.interfaces.runner.RunnerInterrupt
Mar 27 09:01:22 2022 (5860) SHUNTING: 1648351882.8537986+9bf59cb23a71a4bae65536bfdbec0abb261adaa0
Mar 27 09:01:22 2022 (5860) pipeline runner exiting.
I suggest doing the following, assuming your Mailman database is not sqlite3 (if it is, just do step 2):
1. Stop Mailman core and maybe other Mailman services too.
2. Assuming the two shunted messages are still in Mailman's shunt queue, run
   mailman unshunt
   to unshunt them. If there are more shunted messages, inspect them with
   mailman qfile
   to see if you want them. They may be spam.
3. Restart your database server.
4. Start Mailman core and any other stopped services.
Also, look in the database server logs for any issues.
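To see which messages were shunted, and by which runner, before running mailman unshunt, the SHUNTING entries can be pulled out of mailman.log. This is a minimal sketch that assumes the log has one entry per line in the format shown in this thread; the function name and regular expressions are illustrative only.

```python
import re

# "(5864) SHUNTING: 1648351882.6693938+c2e6..." -> pid and queue file id
SHUNT_RE = re.compile(r"\((\d+)\) SHUNTING: (\S+)")
# "(5864) digest runner exiting." -> pid and runner name
RUNNER_RE = re.compile(r"\((\d+)\) (\w+) runner (?:caught|exiting)")


def shunted_entries(log_lines):
    """Return (runner_name, queue_file_id) pairs for each SHUNTING entry."""
    names = {}    # pid -> runner name, learned from start/stop messages
    shunts = []   # (pid, queue file id) in log order
    for line in log_lines:
        match = RUNNER_RE.search(line)
        if match:
            names[match.group(1)] = match.group(2)
        match = SHUNT_RE.search(line)
        if match:
            shunts.append((match.group(1), match.group(2)))
    # Resolve names last, so a runner line after the SHUNTING line still counts.
    return [(names.get(pid, "unknown"), file_id) for pid, file_id in shunts]
```

For example, passing the lines of mailman.log via open(path) would list each shunted file id next to the runner that shunted it. The log location depends on the installation.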
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Thank you Mark. This solution worked fine for me.
-- Thanks & Regards, Shashi Kanth.K 9052671936
participants (2): Mark Sapiro, Shashikanth Komandoor