LMTP to Postfix problem
On one of my mailman 3 servers, I am having a problem with Postfix failing to deliver email to Mailman. I keep seeing this in the mail.log:
status=deferred (connect to 127.0.0.1[127.0.0.1]:8024: Connection refused)
What is causing this and how can I fix it?
--
Please let me know if you need further assistance.
Thank you for your business. We appreciate our clients.

Brian Carpenter
EMWD.com
EMWD's Knowledgebase: https://clientarea.emwd.com/index.php/knowledgebase
EMWD's Community Forums: http://discourse.emwd.com/
On 6/18/20 12:05 PM, Brian Carpenter wrote:
> On one of my mailman 3 servers, I am having a problem with Postfix failing to deliver email to Mailman. I keep seeing this in the mail.log:
>
> status=deferred (connect to 127.0.0.1[127.0.0.1]:8024: Connection refused)
>
> What is causing this and how can I fix it?
Do you have
    lmtp_host: 127.0.0.1
    lmtp_port: 8024
in the [mta] section of mailman.cfg? (these are the defaults)
$ /opt/mailman/mm/bin/mailman conf | grep lmtp
[mta] lmtp_host: 127.0.0.1
[mta] lmtp_port: 8024
Is Mailman's lmtp runner listening on port 8024?
$ sudo netstat -lntp | grep python
tcp   0   0 127.0.0.1:8024   0.0.0.0:*   LISTEN   1867/python
tcp   0   0 127.0.0.1:8000   0.0.0.0:*   LISTEN   1199/python
tcp   0   0 127.0.0.1:8001   0.0.0.0:*   LISTEN   1871/python
Is there some firewall issue blocking access to the port even from localhost?
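A minimal way to run through all three checks from the shell (a sketch; it assumes the same /opt/mailman/mm layout shown above and that ss, ufw and nc are installed):

$ /opt/mailman/mm/bin/mailman conf | grep lmtp
$ sudo ss -lntp | grep 8024              # ss is the modern replacement for netstat
$ sudo ufw status verbose                # confirm nothing blocks loopback traffic
$ printf 'LHLO test\r\nQUIT\r\n' | nc 127.0.0.1 8024   # manual LMTP handshake

If the manual connection is refused while a listener is shown on 8024, suspect a firewall or policy layer; if nothing is listening, look at the lmtp runner itself.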
--
Mark Sapiro <mark@msapiro.net>          The highway is for gamblers,
San Francisco Bay Area, California      better use your sense - B. Dylan
On 6/18/20 4:09 PM, Mark Sapiro wrote:
> Do you have
>
>     lmtp_host: 127.0.0.1
>     lmtp_port: 8024
>
> in the [mta] section of mailman.cfg? (these are the defaults)
>
> $ /opt/mailman/mm/bin/mailman conf | grep lmtp
> [mta] lmtp_host: 127.0.0.1
> [mta] lmtp_port: 8024
I do.
> Is Mailman's lmtp runner listening on port 8024?
>
> $ sudo netstat -lntp | grep python
> tcp   0   0 127.0.0.1:8024   0.0.0.0:*   LISTEN   1867/python
> tcp   0   0 127.0.0.1:8000   0.0.0.0:*   LISTEN   1199/python
> tcp   0   0 127.0.0.1:8001   0.0.0.0:*   LISTEN   1871/python
It is.
> Is there some firewall issue blocking access to the port even from localhost?
Let me describe the problem a little better. Mail delivery from Postfix to Mailman works most of the time. However, in the past week or so, LMTP has started refusing connections from Postfix about once a day. Port 8024 is open in the firewall (UFW). There are no errors in the smtp.log file. Restarting Mailman core seems to get things running again.
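For reference, "restarting core" on this kind of setup is typically something like the following, after which Postfix's deferred mail can be flushed (a sketch; the 'mailman' user name and the paths are assumptions, and a systemd unit may be used instead):

$ sudo -u mailman /opt/mailman/mm/bin/mailman restart
$ mailq | tail -n 1          # how much mail is sitting in the deferred queue
$ sudo postqueue -f          # ask Postfix to retry deferred mail right away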
On 6/18/20 1:23 PM, Brian Carpenter wrote:
> Let me describe the problem a little better. Mail delivery from Postfix to Mailman works most of the time. However, in the past week or so, LMTP has started refusing connections from Postfix about once a day. Port 8024 is open in the firewall (UFW). There are no errors in the smtp.log file. Restarting Mailman core seems to get things running again.
When this happens, is the LMTP runner still running or has it died? Is there anything relevant in mailman.log regarding the lmtp runner?
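A quick way to check both (a sketch; the mailman.log path assumes a /opt/mailman/mm/var/logs layout, so adjust it to yours):

$ ps -ef | grep '[l]mtp'      # is the lmtp runner process still alive?
$ grep -iE 'lmtp|traceback' /opt/mailman/mm/var/logs/mailman.log | tail -n 50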
On 6/18/20 5:16 PM, Mark Sapiro wrote:
> When this happens, is the LMTP runner still running or has it died? Is there anything relevant in mailman.log regarding the lmtp runner?
The log entry posted today was:
Jun 18 15:07:32 2020 (29862) lmtp runner started.
That was when I restarted Mailman core. If it happens again, I will see if the lmtp runner is running. My feeling is that it died, but I don't know for sure.
On 6/18/20 2:24 PM, Brian Carpenter wrote:
> The log entry posted today was:
>
> Jun 18 15:07:32 2020 (29862) lmtp runner started.
>
> That was when I restarted Mailman core. If it happens again, I will see if the lmtp runner is running. My feeling is that it died, but I don't know for sure.
If you have it, look through mailman.log for entries around the last time the connections started being refused. See if there are entries about the runner exiting. Also, just before you restarted, there will be several "runner caught SIGTERM. Stopping." and "runner exiting." messages. If the lmtp runner is among those, it was still running when you restarted.
What you are looking for is some exception/traceback from the time connections start to be refused.
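For example, something along these lines (again assuming a /opt/mailman/mm/var/logs path):

$ grep -nE 'caught SIGTERM|runner exiting|Traceback' /opt/mailman/mm/var/logs/mailman.log | tail -n 40

If no "lmtp runner" line appears in the group of SIGTERM/exiting messages from the restart, the runner had already died earlier.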
On 6/18/20 6:19 PM, Mark Sapiro wrote:
> If you have it, look through mailman.log for entries around the last time the connections started being refused. See if there are entries about the runner exiting. Also, just before you restarted, there will be several "runner caught SIGTERM. Stopping." and "runner exiting." messages. If the lmtp runner is among those, it was still running when you restarted.
So the lmtp runner was NOT running when I restarted Mailman core; however, I saw no error messages in mailman.log at the time of the first "Connection refused" message.
On 6/18/20 4:17 PM, Brian Carpenter wrote:
> So the lmtp runner was NOT running when I restarted Mailman core; however, I saw no error messages in mailman.log at the time of the first "Connection refused" message.
Check your system logs to see if there's anything about the OS killing the process. Also check the Postfix log for deliveries to mailman just before it died. Just stabbing in the dark, but maybe some huge message caused it to grow beyond some memory limit and get killed by the OS.
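A quick check of the kernel log for OOM activity might look like this (a sketch; Debian-style log locations assumed):

$ sudo journalctl -k --since "2 days ago" | grep -iE 'out of memory|oom|killed process'
$ # or, on a syslog-based system:
$ sudo grep -iE 'out of memory|oom-killer|killed process' /var/log/syslog /var/log/kern.log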
On 6/18/20 7:36 PM, Mark Sapiro wrote:
> Check your system logs to see if there's anything about the OS killing the process. Also check the Postfix log for deliveries to mailman just before it died. Just stabbing in the dark, but maybe some huge message caused it to grow beyond some memory limit and get killed by the OS.
I am replying off-list at this point. It is definitely an OOM issue. Here are the log entries from right before lmtp crashed:
Jun 18 00:00:12 mm4 kernel: [19548549.158382] postgres invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Jun 18 00:00:12 mm4 kernel: [19548549.160719] postgres cpuset=/ mems_allowed=0
Jun 18 00:00:12 mm4 kernel: [19548549.161524] CPU: 0 PID: 7243 Comm: postgres Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2
Jun 18 00:00:12 mm4 kernel: [19548549.163047] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Jun 18 00:00:12 mm4 kernel: [19548549.164885] Call Trace:
Jun 18 00:00:12 mm4 kernel: [19548549.165402]  dump_stack+0x5c/0x80
Jun 18 00:00:12 mm4 kernel: [19548549.166097]  dump_header+0x6b/0x283
Jun 18 00:00:12 mm4 kernel: [19548549.167019]  ? do_try_to_free_pages+0x2ec/0x370
Jun 18 00:00:12 mm4 kernel: [19548549.167864]  oom_kill_process.cold.30+0xb/0x1cf
Jun 18 00:00:12 mm4 kernel: [19548549.168760]  ? oom_badness+0x23/0x140
Jun 18 00:00:12 mm4 kernel: [19548549.169534]  out_of_memory+0x1a5/0x430
Jun 18 00:00:12 mm4 kernel: [19548549.170174]  __alloc_pages_slowpath+0xbd8/0xcb0
Jun 18 00:00:12 mm4 kernel: [19548549.171060]  __alloc_pages_nodemask+0x28b/0x2b0
Jun 18 00:00:12 mm4 kernel: [19548549.171973]  filemap_fault+0x3bd/0x780
Jun 18 00:00:12 mm4 kernel: [19548549.172643]  ? alloc_set_pte+0xf2/0x560
Jun 18 00:00:12 mm4 kernel: [19548549.173369]  ? filemap_map_pages+0x1ed/0x3a0
Jun 18 00:00:12 mm4 kernel: [19548549.174404]  ext4_filemap_fault+0x2c/0x40 [ext4]
Jun 18 00:00:12 mm4 kernel: [19548549.175259]  __do_fault+0x36/0x130
Jun 18 00:00:12 mm4 kernel: [19548549.175930]  __handle_mm_fault+0xe6c/0x1270
Jun 18 00:00:12 mm4 kernel: [19548549.176765]  handle_mm_fault+0xd6/0x200
Jun 18 00:00:12 mm4 kernel: [19548549.177644]  __do_page_fault+0x249/0x4f0
Jun 18 00:00:12 mm4 kernel: [19548549.178407]  ? async_page_fault+0x8/0x30
Jun 18 00:00:12 mm4 kernel: [19548549.179156]  async_page_fault+0x1e/0x30
Jun 18 00:00:12 mm4 kernel: [19548549.179915] RIP: 0033:0x55d875a3fe10
Jun 18 00:00:12 mm4 kernel: [19548549.180569] Code: Bad RIP value.
Jun 18 00:00:12 mm4 kernel: [19548549.181181] RSP: 002b:00007ffcad32f248 EFLAGS: 00010206
Jun 18 00:00:12 mm4 kernel: [19548549.182132] RAX: 000055d877d512b0 RBX: 000055d877d60b93 RCX: 00007ffcad32f2c0
Jun 18 00:00:12 mm4 kernel: [19548549.183365] RDX: 0000000000000005 RSI: 000055d877d60b93 RDI: 000055d877d512b0
Jun 18 00:00:12 mm4 kernel: [19548549.184712] RBP: 00007ffcad32f270 R08: 0000000000000001 R09: 00007ffcad32f388
Jun 18 00:00:12 mm4 kernel: [19548549.186338] R10: 00007f2585002d40 R11: 0000000000000000 R12: 0000000000000005
Jun 18 00:00:12 mm4 kernel: [19548549.187583] R13: 000055d877d4b3a0 R14: 00007f2581ad05b8 R15: 0000000000000000
Jun 18 00:00:12 mm4 kernel: [19548549.188952] Mem-Info:
Jun 18 00:00:12 mm4 kernel: [19548549.189468] active_anon:197899 inactive_anon:206498 isolated_anon:0
Jun 18 00:00:12 mm4 kernel: [19548549.189468]  active_file:253 inactive_file:278 isolated_file:23
Jun 18 00:00:12 mm4 kernel: [19548549.189468]  unevictable:0 dirty:0 writeback:0 unstable:0
Jun 18 00:00:12 mm4 kernel: [19548549.189468]  slab_reclaimable:21867 slab_unreclaimable:55017
Jun 18 00:00:12 mm4 kernel: [19548549.189468]  mapped:17008 shmem:24450 pagetables:4890 bounce:0
Jun 18 00:00:12 mm4 kernel: [19548549.189468]  free:13184 free_pcp:340 free_cma:0
Jun 18 00:00:12 mm4 kernel: [19548549.195739] Node 0 active_anon:791596kB inactive_anon:825992kB active_file:1012kB inactive_file:1112kB unevictable:0kB isolated(anon):0kB isolated(file):92kB mapped:68032kB dirty$
Jun 18 00:00:12 mm4 kernel: [19548549.200559] Node 0 DMA free:8152kB min:352kB low:440kB high:528kB active_anon:1056kB inactive_anon:5048kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB prese$
Jun 18 00:00:12 mm4 kernel: [19548549.204993] lowmem_reserve[]: 0 1950 1950 1950 1950
Jun 18 00:00:12 mm4 kernel: [19548549.205877] Node 0 DMA32 free:44584kB min:44700kB low:55872kB high:67044kB active_anon:790540kB inactive_anon:820944kB active_file:1012kB inactive_file:1112kB unevictable:0kB wri$
Jun 18 00:00:12 mm4 kernel: [19548549.211109] lowmem_reserve[]: 0 0 0 0 0
Jun 18 00:00:12 mm4 kernel: [19548549.211826] Node 0 DMA: 8*4kB (UE) 31*8kB (UME) 20*16kB (UME) 22*32kB (UME) 13*64kB (UME) 5*128kB (UME) 5*256kB (UM) 4*512kB (UM) 2*1024kB (UM) 0*2048kB 0*4096kB = 8152kB
Jun 18 00:00:12 mm4 kernel: [19548549.214414] Node 0 DMA32: 378*4kB (UMEH) 472*8kB (UEH) 412*16kB (UEH) 196*32kB (UMEH) 63*64kB (UMEH) 63*128kB (UE) 16*256kB (UE) 4*512kB (ME) 0*1024kB 2*2048kB (M) 1*4096kB (M) =$
Jun 18 00:00:12 mm4 kernel: [19548549.217336] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 18 00:00:12 mm4 kernel: [19548549.218937] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 18 00:00:12 mm4 kernel: [19548549.220513] 39729 total pagecache pages
Jun 18 00:00:12 mm4 kernel: [19548549.221286] 14715 pages in swap cache
Jun 18 00:00:12 mm4 kernel: [19548549.222022] Swap cache stats: add 10809560, delete 10794845, find 18524127383/18527544787
Jun 18 00:00:12 mm4 kernel: [19548549.223482] Free swap  = 0kB
Jun 18 00:00:12 mm4 kernel: [19548549.224081] Total swap = 524284kB
Jun 18 00:00:12 mm4 kernel: [19548549.224712] 524154 pages RAM
Jun 18 00:00:12 mm4 kernel: [19548549.225222] 0 pages HighMem/MovableOnly
Jun 18 00:00:12 mm4 kernel: [19548549.225894] 13385 pages reserved
Then there was this little bit of info:
Jun 18 00:00:12 mm4 kernel: [19548549.442555] Out of memory: Kill process 29676 (python3) score 37 or sacrifice child
Jun 18 00:00:12 mm4 kernel: [19548549.444508] Killed process 29676 (python3) total-vm:192224kB, anon-rss:93084kB, file-rss:0kB, shmem-rss:0kB
Jun 18 00:00:12 mm4 kernel: [19548549.464907] oom_reaper: reaped process 29676 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
I suspect the above process was the lmtp runner.
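One way to confirm: the number in parentheses in mailman.log entries (like the "(29862)" in the "lmtp runner started" line above) is the runner's PID, so grepping for the killed PID should show which runner it was (assuming the same log path as before):

$ grep '(29676)' /opt/mailman/mm/var/logs/mailman.log | head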
So the server has 2 GB of RAM with about a dozen MM3 lists on it. Is this an issue with tuning the PostgreSQL server, or do I need to add more memory? I am curious how much RAM your servers have and whether you have run into any OOM problems.
If you want me to post this to the list, I will.
On 6/18/20 8:38 PM, Brian Carpenter wrote:
> I am replying off-list at this point. It is definitely an OOM issue. Here are the log entries from right before lmtp crashed:
Obviously I didn't reply off-list. It's been a looong day.
On 6/18/20 5:38 PM, Brian Carpenter wrote:
> Then there was this little bit of info:
>
> Jun 18 00:00:12 mm4 kernel: [19548549.442555] Out of memory: Kill process 29676 (python3) score 37 or sacrifice child
> Jun 18 00:00:12 mm4 kernel: [19548549.444508] Killed process 29676 (python3) total-vm:192224kB, anon-rss:93084kB, file-rss:0kB, shmem-rss:0kB
> Jun 18 00:00:12 mm4 kernel: [19548549.464907] oom_reaper: reaped process 29676 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> I suspect the above process was the lmtp runner.
>
> So the server has 2 GB of RAM with about a dozen MM3 lists on it. Is this an issue with tuning the PostgreSQL server, or do I need to add more memory? I am curious how much RAM your servers have and whether you have run into any OOM problems.
I haven't seen any OOM issues on the servers I manage. Those include:

- mail.python.org: 16 GB RAM, 261 MM 2.1 lists, 139 MM 3 lists
- lists.mailman3.org: 4 GB RAM and, other than tests and demos, only this list
- www.grizz.org: 8 GB RAM. This is my bicycling club's server: 7 MM 2.1 lists, only 2 of which are really active, and 7 MM 3 lists including 2 test/demo, with MySQL for the club database and Postgres for Mailman 3. It also supports some non-Mailman club email and the club's web site, including a MoinMoin wiki.
The grizz.org domain was recently migrated from a larger server in a colo that supported multiple domains to a DigitalOcean droplet which initially had only 4 GB RAM, but that seemed marginally enough at best, so I upgraded it to 8 GB.
So the bottom line is that I haven't run Mailman 3 on a server with only 2 GB, and that may well not be enough.
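For sizing, a rough look at per-runner footprint plus a stopgap swap file (the kernel log above shows "Free swap = 0kB") might look like this; it is only a sketch, and the sizes and PostgreSQL settings shown are examples to review, not recommendations:

$ free -m                                          # overall headroom and swap usage
$ ps -o pid,rss,cmd -C python3 | grep -i runner    # rough RSS per Mailman runner
$ sudo fallocate -l 2G /swapfile && sudo chmod 600 /swapfile
$ sudo mkswap /swapfile && sudo swapon /swapfile   # add a 2 GB swap file (size is illustrative)
$ sudo -u postgres psql -c "SHOW shared_buffers; SHOW work_mem;"   # PostgreSQL memory knobs worth reviewing on a 2 GB host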
On Jun 18, 2020, at 10:23 PM, Mark Sapiro <mark@msapiro.net> wrote:
> The grizz.org domain was recently migrated from a larger server in a colo that supported multiple domains to a DigitalOcean droplet which initially had only 4 GB RAM, but that seemed marginally enough at best, so I upgraded it to 8 GB.
My 4 GB DO droplet runs a couple of MM3 lists, including one that does ~4000 posts/month to ~400 recipients, with Postgres, and shows no signs of the OOM killer.
In fact, the only time I’ve ever seen it is when I was using the mailman3 shell to search for bad characters in the archives the other day.
- Mark
mark@pdc-racing.net | 408-348-2878
Are you running in a VM or as a dockerized process?

For a VM:

Did you tune your kernel settings? I've increased my TCP and open-files/processes limits and do not have deferred issues, with 300 lists, 500k daily email volume, and lists of over 5k recipients (see the sketch after the list below).
Other tuning:
- Django threads
- recipients chunk size for the MTA
- RDS DB connection timeout
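A sketch of the kind of kernel/network and file-limit tuning described above (all values are illustrative assumptions, not recommendations, and the 'mailman' user name is an assumption; the current recipient chunking in core can be checked with 'mailman conf'):

$ cat <<'EOF' | sudo tee /etc/sysctl.d/99-mailman-tuning.conf
net.core.somaxconn = 1024
net.ipv4.tcp_max_syn_backlog = 2048
fs.file-max = 200000
EOF
$ sudo sysctl --system
$ echo 'mailman soft nofile 65536' | sudo tee -a /etc/security/limits.d/mailman.conf
$ echo 'mailman hard nofile 65536' | sudo tee -a /etc/security/limits.d/mailman.conf
$ /opt/mailman/mm/bin/mailman conf | grep -i max_recipients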
Participants (4): Brian Carpenter, Mark Dadgar, Mark Sapiro, sunil soprey