Splitting hosts: mta&core vs postorius
I'm trying to set up mailman3 on two hosts, one of which runs core + exim, the other runs postorius (&possibly hyperkitty).
I have I think got core set up correctly, and postorius is sort of usable, but anything in the web ui that talks to core ends up with a white screen.
I think what is happening is a postorius URL ends up with the core (or perhaps no) hostname, and as there's nothing there to answer it fails.
I've checked the settings and I don't think I have misconfigured postorius. Is it possible that the bad url is being passed from core to postorius in some way and reused without changing the host?
In the config at present, both hosts are visible to the browser, and each host can communicate with the other.
Any ideas on how to pin down the problem better?
Ruth
On 6/19/20 4:34 PM, Ruth Ivimey-Cook wrote:
I'm trying to set up mailman3 on two hosts, one of which runs core + exim, the other runs postorius (&possibly hyperkitty).
I have I think got core set up correctly, and postorius is sort of usable, but anything in the web ui that talks to core ends up with a white screen.
I think what is happening is a postorius URL ends up with the core (or perhaps no) hostname, and as there's nothing there to answer it fails.
What are your Django settings for MAILMAN_REST_API_URL, MAILMAN_REST_API_USER and MAILMAN_REST_API_PASS and do these agree with the [webservice] settings for hostname, port, use_https, admin_user and admin_pass in mailman.cfg in core and is the port open to the outside on the core machine?
I've checked the settings and I don't think I have misconfigured postorius. Is it possible that the bad url is being passed from core to postorius in some way and reused without changing the host?
In the config at present, both hosts are visible to the browser, and each host can communicate with the other.
Any ideas on how to pin down the problem better?
Do you see REST requests from the Postorius machine in core's mailman.log and access.log?
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Thanks Mark,
On 20/06/2020 03:53, Mark Sapiro wrote:
On 6/19/20 4:34 PM, Ruth Ivimey-Cook wrote:
I'm trying to set up mailman3 on two hosts, one of which runs core + exim, the other runs postorius (&possibly hyperkitty).
I have I think got core set up correctly, and postorius is sort of usable, but anything in the web ui that talks to core ends up with a white screen.
I think what is happening is a postorius URL ends up with the core (or perhaps no) hostname, and as there's nothing there to answer it fails.
What are your Django settings for MAILMAN_REST_API_URL, MAILMAN_REST_API_USER and MAILMAN_REST_API_PASS and do these agree with the [webservice] settings for hostname, port, use_https, admin_user and admin_pass in mailman.cfg in core and is the port open to the outside on the core machine?
greyarea-post is the mailman-core and exim server.
greyarea-web1 is the postorius and hyperkitty server.
On greyarea-web1, in django-settings.py,
MAILMAN_REST_API_URL = 'http://greyarea-post:8870'
MAILMAN_REST_API_USER = 'restadmin'
MAILMAN_REST_API_PASS = XXXX
and both the above-named hosts are also included in ALLOWED_HOSTS along with localhost.
On greyarea-post, in mailman.cfg:
[webservice]
admin_pass: XXXXX
admin_user: restadmin
api_version: 3.1
hostname: 0.0.0.0
port: 8870
use_https: no
The password and username are as on web1.
Do you see REST requests from the Postorius machine in core's
mailman.log and access.log?
Yes, there are lots of requests of the form:
[07/Jun/2020:15:00:01 +0000] "GET /3.1/lists?count=10&page=1 HTTP/1.1" 200 470 "-" "GNU Mailman REST client v3.3.1"
I made a new list using the core CLI, and as a result of navigating directly to the list archive page I get to see:
[06/Jun/2020:23:00:02 +0000] "GET /3.1/lists/chbcdaily@ch-bc.org.uk HTTP/1.1" 200 365 "-" "GNU Mailman REST client v3.3.1"
... but in neither case do I get a complete screen on the browser or 'follow-up' requests.
I don't have a file access.log on the greyarea-post.
On the greyarea-web1 I have mailmansuite.log with the last few lines:
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/base.py", line 74, in rest_data
    response, content = self._connection.call(self._url)
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/connection.py", line 124, in call
    'Could not connect to Mailman API: ', repr(e))
mailmanclient.restbase.connection.MailmanConnectionError: ('Could not connect to Mailman API: ', 'ConnectionError(MaxRetryError("HTTPConnectionPool(host=\'0.0.0.0\', port=8870): Max retries exceeded with url: /3.1/domains/ch-bc.org.uk (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7f5416db8518>: Failed to establish a new connection: [Errno 111] Connection refused\',))",),)')
ERROR 2020-06-03 01:02:06,115 29002 django.request Service Unavailable: /postorius/lists/
the uwsgi-error_*.log contains many lines of the form:
api: url is http://greyarea-post.cam.ivimey.org:8870/3.1/lists?count=10&page=1
api: url is http://greyarea-post.cam.ivimey.org:8870/3.1/lists?count=10&page=1
My reason for thinking, in my original post, that the wrong host address is being used is that the ConnectionError above contains 0.0.0.0 in its url for /3.1/domains/... but 0.0.0.0 is only present as the listen address on the core web service and does not appear in the config for postorius.
I hope this helps,
Ruth
--
Software Manager & Engineer Tel: 01223 414180 Blog: [4]http://www.ivimey.org/blog LinkedIn: [5]http://uk.linkedin.com/in/ruthivimeycook/
On 6/20/20 1:37 PM, Ruth Ivimey-Cook wrote:
Thanks Mark,
Ignore bracketed numbers below ([1], [2], etc.) They come from conversion of the HTML only post to plain text.
On 20/06/2020 03:53, Mark Sapiro wrote:
What are your Django settings for MAILMAN_REST_API_URL, MAILMAN_REST_API_USER and MAILMAN_REST_API_PASS and do these agree with the [webservice] settings for hostname, port, use_https, admin_user and admin_pass in mailman.cfg in core and is the port open to the outside on the core machine?
greyarea-post is the mailman-core and exim server.
greyarea-web1 is the postorius and hyperkitty server.
On greyarea-web1, in django-settings.py,
MAILMAN_REST_API_URL = '[1]http://greyarea-post:8870'
MAILMAN_REST_API_USER = 'restadmin'
MAILMAN_REST_API_PASS = XXXX
and both the above-named hosts are also included in ALLOWED_HOSTS along with localhost.
On greyarea-post, in mailman.cfg:
[webservice]
admin_pass: XXXXX
admin_user: restadmin
api_version: 3.1
hostname: 0.0.0.0
port: 8870
use_https: no
The password and username are as on web1.
That all looks good.
Do you see REST requests from the Postorius machine in core's
mailman.log and access.log?
Yes, there are lots of requests of the form:
[07/Jun/2020:15:00:01 +0000] "GET /3.1/lists?count=10&page=1 HTTP/1.1" 200 470 "-" "GNU Mailman REST client v3.3.1"
I made a new list using the core CLI, and as a result of navigating directly to the list archive page I get to see:
[06/Jun/2020:23:00:02 +0000] "GET /3.1/lists/chbcdaily@ch-bc.org.uk HTTP/1.1" 200 365 "-" "GNU Mailman REST client v3.3.1"
... but in neither case do I get a complete screen on the browser or 'follow-up' requests.
I don't have a file access.log on the greyarea-post.
Yes, that's actually a gunicorn log so would only be on greyarea-web1, but the mailman.log tells us the requests are getting to the core REST API, so we don't need that info.
On the greyarea-web1 I have mailmansuite.log with the last few lines:
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/base.py", line 74, in rest_data
    response, content = self._connection.call(self._url)
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/connection.py", line 124, in call
    'Could not connect to Mailman API: ', repr(e))
mailmanclient.restbase.connection.MailmanConnectionError: ('Could not connect to Mailman API: ', 'ConnectionError(MaxRetryError("HTTPConnectionPool(host=\'0.0.0.0\', port=8870): Max retries exceeded with url: /3.1/domains/ch-bc.org.uk (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7f5416db8518>: Failed to establish a new connection: [Errno 111] Connection refused\',))",),)')
ERROR 2020-06-03 01:02:06,115 29002 django.request Service Unavailable: /postorius/lists/
OK. This is telling us that mailmanclient (the REST API Python bindings that Postorius uses to access core) is having trouble establishing a connection to the core REST server, but it clearly gets there some of the time based on the mailman.log entries.
the uwsgi-error_*.log contains many lines of the form:
api: url is [2]http://greyarea-post.cam.ivimey.org:8870/3.1/lists?count=10&page=1
api: url is [3]http://greyarea-post.cam.ivimey.org:8870/3.1/lists?count=10&page=1
My reason for thinking, in my original post, that the wrong host address is being used is that the ConnectionError above contains 0.0.0.0 in its url for /3.1/domains/... but 0.0.0.0 is only present as the listen address on the core web service and does not appear in the config for postorius.
The ConnectionError above comes from requests.request (docs at <https://requests.readthedocs.io/en/master/>). Client only provides a URL in the call to request and the URL is the baseurl + the path and the baseurl is MAILMAN_REST_API_URL which you have as http://greyarea-post:8870. Entries above have http://greyarea-post.cam.ivimey.org:8870/. Is that the correct FQDN?
I suspect a DNS lookup issue. How does greyarea-web1 resolve greyarea-post? You could try a FQDN that greyarea-web1 can lookup in DNS rather than relying on /etc/resolv.conf and/or /etc/hosts.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
My reason for thinking, in my original post, that the wrong host address is being used is that the ConnectionError above contains 0.0.0.0 in its url for /3.1/domains/... but 0.0.0.0 is only present as the listen address on the core web service and does not appear in the config for postorius.
The ConnectionError above comes from requests.request (docs at <https://requests.readthedocs.io/en/master/>). Client only provides a URL in the call to request and the URL is the baseurl + the path and the baseurl is MAILMAN_REST_API_URL which you have as http://greyarea-post:8870. Entries above have http://greyarea-post.cam.ivimey.org:8870/. Is that the correct FQDN?
I suspect a DNS lookup issue. How does greyarea-web1 resolve greyarea-post? You could try a FQDN that greyarea-web1 can lookup in DNS rather than relying on /etc/resolv.conf and/or /etc/hosts.
No, ignore that... I edited the FQDN down to the hostname in the email for simplicity, but even if it was in the log the hostname does map properly (i.e. host greyarea-post and host greyarea.cam.ivimey.org return the same thing).
In referring to the ConnectionError from:
'ConnectionError(MaxRetryError("HTTPConnectionPool(host=\'0.0.0.0\', port=8870): Max retries exceeded with url: /3.1/domains/ch-bc.org.uk (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7f5416db8518>: Failed to establish a new connection: [Errno 111] Connection refused\',))",),)')
if you reconstruct the url passed in it was presumably:
"http://0.0.0.0:8870/3.1/domains/ch-bc.org.uk"
... and if that is correct I am not surprised to see an error message.
But perhaps that isn't what it was...
Ruth
On 6/20/20 5:41 PM, Ruth Ivimey-Cook wrote:
My reason for thinking, in my original post, that the wrong host address is being used is that the ConnectionError above contains 0.0.0.0 in its url for /3.1/domains/... but 0.0.0.0 is only present as the listen address on the core web service and does not appear in the config for postorius.
The ConnectionError above comes from requests.request (docs at <https://requests.readthedocs.io/en/master/>). Client only provides a URL in the call to request and the URL is the baseurl + the path and the baseurl is MAILMAN_REST_API_URL which you have as http://greyarea-post:8870. Entries above have http://greyarea-post.cam.ivimey.org:8870/. Is that the correct FQDN?
I suspect a DNS lookup issue. How does greyarea-web1 resolve greyarea-post? You could try a FQDN that greyarea-web1 can lookup in DNS rather than relying on /etc/resolv.conf and/or /etc/hosts.
No, ignore that... I edited the FQDN down to the hostname in the email for simplicity, but even if it was in the log the hostname does map properly (i.e. host greyarea-post and host greyarea.cam.ivimey.org return the same thing).
Is the above a typo? You are saying greyarea.cam.ivimey.org. Did you mean that, or greyarea-post.cam.ivimey.org as in the http://greyarea-post.cam.ivimey.org:8870/3.1/lists?count=10&page=1 log messages?
In referring to the ConnectionError from:
'ConnectionError(MaxRetryError("HTTPConnectionPool(host=\'0.0.0.0\', port=8870): Max retries exceeded with url: /3.1/domains/ch-bc.org.uk (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7f5416db8518>: Failed to establish a new connection: [Errno 111] Connection refused\',))",),)')
if you reconstruct the url passed in it was presumably:
I don't think so. I think it was http://greyarea-post:8870/3.1/domains/ch-bc.org.uk and requests.request couldn't resolve that to the correct IP.
If you look at Mailman client <https://gitlab.com/mailman/mailmanclient/-/blob/master/src/mailmanclient/res...> you'll see it makes the URL from the baseurl passed to the instantiation of the Connection class which occurs at <https://gitlab.com/mailman/mailmanclient/-/blob/master/src/mailmanclient/cli...> which in turn is called from Postorius at <https://gitlab.com/mailman/postorius/-/blob/master/src/postorius/utils.py#L4...> which makes the baseurl argument from '%s/3.1' % settings.MAILMAN_REST_API_URL.
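As a rough illustration of the URL construction described above, the following sketch builds (but does not send) a request from a base URL plus a path. It is not the mailmanclient code: the function name build_rest_request and the use of urllib are assumptions made for the example, and only the '%s/3.1' composition and the setting values come from the thread.

```python
import base64
import urllib.request

API_VERSION = "3.1"  # same value as api_version in the [webservice] section quoted earlier


def build_rest_request(rest_api_url, user, password, path):
    """Return an unsent Request for `path` under `rest_api_url` (hypothetical helper)."""
    # Base URL is composed the way Mark describes: '%s/3.1' % MAILMAN_REST_API_URL
    base_url = "%s/%s" % (rest_api_url.rstrip("/"), API_VERSION)
    url = "%s/%s" % (base_url, path.lstrip("/"))
    # HTTP Basic auth: base64 of "user:password"
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


example = build_rest_request(
    "http://greyarea-post:8870", "restadmin", "XXXX", "domains/ch-bc.org.uk"
)
print(example.full_url)
```

Built this way, the host in the request can only come from the configured base URL. A request that targets any other host must have taken its URL from somewhere else, for example from a link contained in an earlier reply.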
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
In referring to the ConnectionError from:
'ConnectionError(MaxRetryError("HTTPConnectionPool(host=\'0.0.0.0\', port=8870): Max retries exceeded with url: /3.1/domains/ch-bc.org.uk (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7f5416db8518>: Failed to establish a new connection: [Errno 111] Connection refused\',))",),)')
if you reconstruct the url passed in it was presumably:
I don't think so. I think it was http://greyarea-post:8870/3.1/domains/ch-bc.org.uk and requests.request couldn't resolve that to the correct IP.
If you look at Mailman client <https://gitlab.com/mailman/mailmanclient/-/blob/master/src/mailmanclient/res...> you'll see it makes the URL from the baseurl passed to the instantiation of the Connection class which occurs at <https://gitlab.com/mailman/mailmanclient/-/blob/master/src/mailmanclient/cli...> which in turn is called from Postorius at <https://gitlab.com/mailman/postorius/-/blob/master/src/postorius/utils.py#L4...> which makes the baseurl argument from '%s/3.1' % settings.MAILMAN_REST_API_URL.
I set up tcpdump looking at port 8870 traffic. This is what I saw (TCP ACK packets omitted):

02:10:44.099300 IP greyarea-web1.cam.ivimey.org.38690 > greyarea-post.cam.ivimey.org.8870: Flags [P.], seq 1:255, ack 1, win 502, options [nop,nop,TS val 77293118 ecr 279301477], length 254
E..2..@.@.XX.. C.. B.""..Z...l............. ..f>...eGET /3.1/domains HTTP/1.1
Host: greyarea-post.cam.ivimey.org:8870
User-Agent: GNU Mailman REST client v3.3.1
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Authorization: Basic cmVzdGFkbWluOj1ldWJocXA5cjgzcnZucWxKZWl1ZnY6YWI=

Connect to greyarea-post with a URL like http://greyarea-post/3.1/domains

02:10:44.099748 IP greyarea-post.cam.ivimey.org.8870 > greyarea-web1.cam.ivimey.org.38690: Flags [.], ack 255, win 508, options [nop,nop,TS val 279301479 ecr 77293118], length 0
E..4.?@.@..... B.. C"..".l...Z......:...... ...g..f>

02:10:44.105704 IP greyarea-post.cam.ivimey.org.8870 > greyarea-web1.cam.ivimey.org.38690: Flags [P.], seq 1:154, ack 255, win 508, options [nop,nop,TS val 279301485 ecr 77293118], length 153
E....@@.@..... B.. C"..".l...Z......`...... ...m..f>HTTP/1.1 200 OK
Server: gunicorn/20.0.4
Date: Sun, 21 Jun 2020 02:10:44 GMT
Connection: close
content-length: 299
content-type: application/json

Reply from greyarea-post saying OK.... (more)

02:10:44.105753 IP greyarea-post.cam.ivimey.org.8870 > greyarea-web1.cam.ivimey.org.38690: Flags [P.], seq 154:453, ack 255, win 508, options [nop,nop,TS val 279301485 ecr 77293118], length 299
E.._.A@.@..... B.. C"..".l._.Z............. ...m..f>{"start": 0, "total_size": 1, "entries": [{"alias_domain": null, "description": null, "mail_host": "ch-bc.org.uk", "self_link": "http://0.0.0.0:8870/3.1/domains/ch-bc.org.uk", "http_etag": "\"5d474f0be89222e6122b6dc375a18ff01190419d\""}], "http_etag": "\"c6f15186189ff5cf5b67f2d06dcf4ea97ea447a9\""}

02:10:44.105757 IP greyarea-web1.cam.ivimey.org.38690 > greyarea-post.cam.ivimey.org.8870: Flags [.], ack 453, win 501, options [nop,nop,TS val 77293124 ecr 279301485], length 0
E..4..@.@.YT.. C.. B.""..Z...l............. ..fD...m

... the Body of reply, with the JSON text:

{
  "start": 0,
  "total_size": 1,
  "entries": [
    {
      "alias_domain": null,
      "description": null,
      "mail_host": "ch-bc.org.uk",
      "self_link": "http://0.0.0.0:8870/3.1/domains/ch-bc.org.uk",
      "http_etag": "\"5d474f0be89222e6122b6dc375a18ff01190419d\""
    }
  ],
  "http_etag": "\"c6f15186189ff5cf5b67f2d06dcf4ea97ea447a9\""
}
On 6/20/20 7:24 PM, Ruth Ivimey-Cook wrote:
... the Body of reply, with the JSON text:
{ "start": 0, "total_size": 1, "entries": [ { "alias_domain": null, "description": null, "mail_host": "ch-bc.org.uk", "self_link": "http://0.0.0.0:8870/3.1/domains/ch-bc.org.uk", "http_etag": "\"5d474f0be89222e6122b6dc375a18ff01190419d\"" } ], "http_etag": "\"c6f15186189ff5cf5b67f2d06dcf4ea97ea447a9\"" }
All these self_link references are generated from self.api.path_t which is a reference to (in the case of 3.1) <https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/core/api.py#L65... which gets the host portion from config.webservice.hostname, i.e. the value of hostname in the webservice section of mailman.cfg, and actually <sigh>, looking back at <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...> that's exactly where you said you set it to 0.0.0.0.
Change that to greyarea-post
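To make the self_link point concrete, here is a small sketch that scans a REST collection reply for self_link hosts a remote client cannot use. The helper names and the UNROUTABLE_HOSTS set are assumptions for illustration. The sample reply is trimmed from the JSON captured in this thread, with the http_etag fields left out.

```python
import json
from urllib.parse import urlparse

# Wildcard listen addresses that make no sense as a destination for a remote client.
UNROUTABLE_HOSTS = {"0.0.0.0", "::"}


def self_link_hosts(payload):
    """Collect the host part of every self_link in a REST collection reply."""
    hosts = set()
    for entry in payload.get("entries", []):
        link = entry.get("self_link")
        if link:
            hosts.add(urlparse(link).hostname)
    return hosts


def unusable_self_link_hosts(payload):
    """Return the self_link hosts that a client on another machine cannot connect to."""
    return sorted(self_link_hosts(payload) & UNROUTABLE_HOSTS)


# Trimmed copy of the reply to GET /3.1/domains shown in the tcpdump capture.
reply = json.loads("""
{
  "start": 0,
  "total_size": 1,
  "entries": [
    {
      "mail_host": "ch-bc.org.uk",
      "self_link": "http://0.0.0.0:8870/3.1/domains/ch-bc.org.uk"
    }
  ]
}
""")

print(unusable_self_link_hosts(reply))
```

If the client follows such a link from the web host, it connects to port 8870 on its own machine, where nothing is listening. That would be consistent with the "Connection refused" for host 0.0.0.0 in the Postorius log.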
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On 21/06/2020 04:52, Mark Sapiro wrote:
On 6/20/20 7:24 PM, Ruth Ivimey-Cook wrote:
... the Body of reply, with the JSON text:
{ "start": 0, "total_size": 1, "entries": [ { "alias_domain": null, "description": null, "mail_host": "ch-bc.org.uk", "self_link": "http://0.0.0.0:8870/3.1/domains/ch-bc.org.uk", "http_etag": "\"5d474f0be89222e6122b6dc375a18ff01190419d\"" } ], "http_etag": "\"c6f15186189ff5cf5b67f2d06dcf4ea97ea447a9\"" }
All these self_link references are generated from self.api.path_t which is a reference to (in the case of 3.1) <https://gitlab.com/mailman/mailman/-/blob/master/src/mailman/core/api.py#L65... which gets the host portion from config.webservice.hostname, i.e. the value of hostname in the webservice section of mailman.cfg, and actually <sigh>, looking back at <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...> that's exactly where you said you set it to 0.0.0.0.
Change that to greyarea-post
Problem is, if I do that (which I did in the beginning), the core REST service listens on localhost / 127.0.0.1 (it obviously recognises that name as its own host), and so cannot be connected to by anything. That is why I set it to 0.0.0.0 to begin with, and the documentation implies (fairly strongly) that this is what the parameter is for.
IMO, the hostname parameter needs to be split up into a "what address should the rest of the world use to address me" and separately "what address(es) should I listen on". Multiple listen addresses enable multihomed hosts to get the right connectivity, too.
Would that be hard?
Ruth
--
Software Manager & Engineer Tel: 01223 414180 Blog: http://www.ivimey.org/blog LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/
On 6/21/20 9:38 AM, Ruth Ivimey-Cook wrote:
Problem is, if I do that (which I did in the beginning), the core REST service listens on localhost / 127.0.0.1 (it obviously recognises that name as its own host), and so cannot be connected to by anything. That is why I set it to 0.0.0.0 to begin with, and the documentation implies (fairly strongly) that this is what the parameter is for.
What documentation? I do understand what you are saying and I see the issue. I don't understand why it works for this poster <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...> but maybe that's specific to Docker containers.
IMO, the hostname parameter needs to be split up into a "what address should the rest of the world use to address me" and separately "what address(es) should I listen on". Multiple listen addresses enable multihomed hosts to get the right connectivity, too.
Try something like this and let us know if it works? If so, we can do something like add a config setting for this.
--- a/src/mailman/rest/gunicorn.py
+++ b/src/mailman/rest/gunicorn.py
@@ -53,7 +53,7 @@ def make_gunicorn_server():
     # We load some options from config.webservice, while other extended options
     # for gunicorn can be defined in configuration: section. We also load up
     # some logging options since gunicorn sets up it's own loggers.
-    host = config.webservice.hostname
+    host = '0.0.0.0'
     port = int(config.webservice.port)
     log_path = os.path.join(config.LOG_DIR, config.logging.http['path'])
     options = {
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Ruth Ivimey-Cook writes:
On 21/06/2020 04:52, Mark Sapiro wrote:
Change that to greyarea-post
Problem is, if I do that (which I did in the beginning), the core rest service listens on localhost / 127.0.0.1
I don't understand. I'm not in a position to test with Mailman itself right now, but the standalone gunicorn[1] does the right thing (takes an IP address and binds to the corresponding interface, or takes a domain name, resolves it to the IP address, and binds that to the corresponding interface). It does not substitute localhost.
It could be a Mailman bug (Mailman 3 is usually configured with all components running on the same host, and that's how it's tested), but first, does greyarea-post resolve to a routable address on the relevant network when queried from that host?
IMO, the hostname parameter needs to be split up into a "what address should the rest of the world use to address me" and separately "what address(es) should I listen on". Multiple listen addresses enable multihomed hosts to get the right connectivity, too.
I'm not clear on what you're saying. An address is a rendezvous point; if you're not listening on the address that others are using, you won't meet. What good does it do to split them up?
It's not clear to me why you think Mailman might want to be multihomed (which is different from the host wanting to be multihomed). I don't object to the work to make it so. But my paranoia says it's best if Mailman is cordoned off into the DMZ with webservers and other hazmat. By design, Mailman core doesn't need to talk to the Internet or to the secure internal network -- Internet functions are delegated to the MTA and the webapps, and local configuration is via shell access which in principle could be console.
I'd rather not enable the complexity of a multihomed application unless there's an important use case. (I don't make the final decision, but I am listened to by Those Who Do.)
Footnotes: [1] gunicorn is the Python module that manages web services for the REST interface.
Stephen J. Turnbull wrote:
Ruth Ivimey-Cook writes:
On 21/06/2020 04:52, Mark Sapiro wrote:
Change that to greyarea-post
Problem is, if I do that (which I did in the beginning), the core rest service listens on localhost / 127.0.0.1
I don't understand. I'm not in a position to test with Mailman itself right now, but the standalone gunicorn[1] does the right thing (takes an IP address and binds to the corresponding interface, or takes a domain name, resolves it to the IP address, and binds that to the corresponding interface). It does not substitute localhost.
Sorry not to reply to ruth on this before, my main experience of this issue was running in a Kubernetes cluster using Docker, using https://github.com/maxking/docker-mailman as the basis for the images. What follows is my own research on this issue and it may be completely wrong.
Taking the default Mailman Core setting:
[webservice] hostname: localhost
This makes Mailman Core listen on the loopback address (via name lookup) on the configured port for incoming REST requests. When it responds to these requests, it includes in the response a self_link address which matches the [webservice] hostname parameter. Connecting clients (such as Postorius) need to connect to Mailman Core on the given hostname (e.g. http://localhost in the default case, where Mailman Core and the web components are installed on the same machine).
However if the [webservice] hostname: parameter is set to something like 0.0.0.0, then although you can direct Postorius (via DNS lookup or IP address) at the Mailman Core instance, it fails to work properly because the resulting response sent back from Mailman Core will include self_links pointing at 0.0.0.0 which will not be usable as a URL by the REST client.
A fix for this issue is to set the [webservice] hostname parameter to a FQDN or IP address which can be resolved by the machine running the Web components, and as long as the base URL that Postorius tries to use for the Mailman Core access is the same as in the [webservice] hostname parameter, you are good to go.
It appears that when a FQDN is specified in the [webservice] hostname parameter, the system looks up the IP address for that host using the system's name resolution function. This has the side effect of potentially listening on the wrong interface (localhost) if the system hosts file is set such that the machine's FQDN points to 127.0.0.1. In my case my containers were behind CoreDNS and this wasn't an issue for me, but had I tried this on, for example, a Debian system I would have come up against the same issue that Ruth has, as it will automatically set the FQDN of the system to resolve to 127.0.0.1 by way of the hosts file. Adding the network IP address of the system with the FQDN in the hosts file should fix this issue.
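A quick way to check for this on the core host is to ask the local resolver what the configured name maps to and flag any loopback result. The sketch below uses only the standard library; the helper names are made up for the example, and the check treats any address in 127.0.0.0/8 (or ::1) as loopback.

```python
import ipaddress
import socket


def resolved_addresses(hostname):
    """Return the sorted set of IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def is_loopback(address):
    """True if address is in 127.0.0.0/8 or is ::1 (any IPv6 scope id is ignored)."""
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def loopback_addresses(hostname):
    """Return the resolved addresses of hostname that are loopback addresses."""
    return [addr for addr in resolved_addresses(hostname) if is_loopback(addr)]


if __name__ == "__main__":
    # Run on the core host with the name used for hostname in mailman.cfg.
    name = socket.getfqdn()
    print(name, resolved_addresses(name), loopback_addresses(name))
```

A non-empty loopback list for the name in mailman.cfg would suggest that the REST server binds to an interface the web host cannot reach, which matches what Ruth reports.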
IMO, the hostname parameter needs to be split up into a "what address should the rest of the world use to address me" and separately "what address(es) should I listen on". Multiple listen addresses enable multihomed hosts to get the right connectivity, too.
I'm not clear on what you're saying. An address is a rendezvous point; if you're not listening on the address that others are using, you won't meet. What good does it do to split them up?
I agree with this, we need a setting for which interface for Gunicorn to listen on, and what Mailman Core sends out in the self_link attribute when responding to REST requests.
Thanks. Andrew.
IMO, the hostname parameter needs to be split up into a "what address should the rest of the world use to address me" and separately "what address(es) should I listen on". Multiple listen addresses enable multihomed hosts to get the right connectivity, too.
I'm not clear on what you're saying. An address is a rendezvous point; if you're not listening on the address that others are using, you won't meet. What good does it do to split them up?
I agree with this, we need a setting for which interface for Gunicorn to listen on, and what Mailman Core sends out in the self_link attribute when responding to REST requests.
Exactly! :-)
Ruth
Stephen J. Turnbull wrote:
I don't understand. I'm not in a position to test with Mailman itself right now, but the standalone gunicorn[1] does the right thing (takes an IP address and binds to the corresponding interface, or takes a domain name, resolves it to the IP address, and binds that to the corresponding interface). It does not substitute localhost.
The localhost address is coming in via the hosts file, which is a relatively standard practice IMO. I could change it, but then perhaps other things would not work. I would prefer to be able to specify what I need. There are also things like systemd-resolved which will do the localhost substitution for you even if /etc/hosts doesn't.
I did try with the gunicorn patch Mark mentioned (host = "0.0.0.0") and changing the cfg file hostname back to the full name of the host, and that now works!! -- in that the problem I showed earlier is gone.
However I then get another issue. From the mailman.log file I now see in the last few lines. For clarity I've numbered them like this "{1}":
{1} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains HTTP/1.1" 200 320 "-" "GNU Mailman REST client v3.3.1"
{2} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/ch-bc.org.uk HTTP/1.1" 200 215 "-" "GNU Mailman REST client v3.3.1"
{3} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/greyarea-web1.cam.ivimey.org/lists?advertised=true&count=0&page=1 HTTP/1.1" 404 26 "-" "GNU Mailman REST client v3.3.1"
At this point the browser screen reports "Something went wrong" and the message "HTTP Error 404: {"title": "404 Not Found"}"
Previously, the process stopped at line {1}. Now that line and line {2} are executed (AFAIK) correctly, but then line {3} is issued. I have no idea why something is looking for "domains/greyarea-web1":
- Only the "ch-bc.org.uk" domain is listed in the reply to line {1}, as expected;
- "greyarea-web1" doesn't appear in any lists I can show with the "mailman" cli command (and I think I tried them all);
- "greyarea-web1" doesn't appear anywhere in the source or config other than as the name of the host it's running on or as an ALLOWED_HOST;
- I tried using "curl -X DELETE" (from the api docs) to delete a domain called that and it says no such list exists.
My only possible clue is that I think, at some point in the journey of configuring mailman, I might have created a list called that on the host. However since then I have wiped the install files and reinstalled. I cannot make sense of this being the answer given the checks just listed.
I went looking through the python source and can see that the get() function does indeed throw a 404 exception if the domain isn't found, but I cannot see which code is parsing the reply from the first line.
So I have two questions:
How can I fix this problem?
Why does getting a 404 reply here cause the Postorius web page to produce a "something went wrong" / 404 and nothing else, when it could simply have reported that something had gone wrong with one specific domain?
I'd rather not enable the complexity of a multihomed application unless there's an important use case. (I don't make the final decision, but I am listened to by Those Who Do.)
You wonder about multihome hosts, but should a host have more than one interface (as several of my VMs and physical hosts do), being able to specify which of the possible addresses to listen on could be very useful in the case that "all of them" is the wrong answer.
There is no complexity for the developers beyond what is already supported and working. The only point I can see wrong here is the mailman3-core is using one hostname value for two distinct purposes, and IMO it shouldn't.
Ideally, and this is icing on the cake, a listen address specification should allow multiple IPs to be passed in, in which case the server listens on all of them. I expect the python libraries already support this.
Thanks to all for the help so far,
Ruth
On 6/22/20 12:21 PM, Ruth Ivimey-Cook wrote:
The localhost address is coming in via the hosts file, which is a relatively standard practice IMO. I could change it, but then perhaps other things would not work. I would prefer to be able to specify what I need. There are also things like systemd-resolved which will do the localhost-substitution for you even if /etc/hosts doesn't.
I have done further testing. See my comment at <https://gitlab.com/mailman/mailman/-/issues/735#note_366074014>.
You need to set hostname: in mailman.cfg to the actual external IP address of the Mailman host machine or to a name which resolves to that IP both internally and externally, and also set that host name/IP in the MAILMAN_REST_API_URL setting on the Postorius machine.
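The agreement between the two sides can be checked mechanically. The following is a minimal sketch, assuming the [webservice] section can be read with Python's standard configparser (Mailman itself uses its own config loader) and that the Django MAILMAN_REST_API_URL value is passed in as a plain string. The function names and the example host mailman-core.example.com are hypothetical.

```python
import configparser


def rest_url_from_cfg(cfg_text):
    """Build the REST base URL implied by the [webservice] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    ws = parser["webservice"]
    # use_https is optional here; treat a missing value as plain http.
    scheme = "https" if ws.getboolean("use_https", fallback=False) else "http"
    return f"{scheme}://{ws['hostname']}:{ws['port']}"


def settings_match(cfg_text, django_rest_url):
    """True when Django's MAILMAN_REST_API_URL points at the same base URL."""
    return rest_url_from_cfg(cfg_text).rstrip("/") == django_rest_url.rstrip("/")


# Hypothetical example values, not taken from any real installation.
EXAMPLE_CFG = """
[webservice]
hostname: mailman-core.example.com
port: 8001
use_https: no
"""

if __name__ == "__main__":
    print(rest_url_from_cfg(EXAMPLE_CFG))
    print(settings_match(EXAMPLE_CFG, "http://mailman-core.example.com:8001"))
```

A mismatch here (for example a Django URL still pointing at localhost) would be one thing to rule out before looking further.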
I did try with the gunicorn patch Mark mentioned (host = "0.0.0.0") and changing the cfg file hostname back to the full name of the host, and that now works!! -- in that the problem I showed earlier is gone.
Remove that patch and do the above.
However I then get another issue. From the mailman.log file I now see in the last few lines. For clarity I've numbered them like this "{1}":
{1} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains HTTP/1.1" 200 320 "-" "GNU Mailman REST client v3.3.1"
{2} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/ch-bc.org.uk HTTP/1.1" 200 215 "-" "GNU Mailman REST client v3.3.1"
{3} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/greyarea-web1.cam.ivimey.org/lists?advertised=true&count=0&page=1 HTTP/1.1" 404 26 "-" "GNU Mailman REST client v3.3.1"
At this point the browser screen reports "Something went wrong" and the message "HTTP Error 404: {"title": "404 Not Found"}"
Make the above changes and see if that solves this. If not, let us know and we'll try to help.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mark,
Thank you for your thoughts so far.
On 6/23/20 3:22 PM, Mark Sapiro wrote:
I have done further testing. See my comment at <https://gitlab.com/mailman/mailman/-/issues/735#note_366074014>.
You need to set hostname: in mailman.cfg to the actual external IP address of the Mailman host machine or to a name which resolves to that IP both internally and externally, and also set that host name/IP in the MAILMAN_REST_API_URL setting on the Postorius machine.
I've done that, but there is no observed change in the browser (still get same 404 error).
Specifically, I reset hostname to be a newly-added CNAME of the actual host address, and ensured that the 0.0.0.0 addresses were no longer there.
However I then get another issue. From the mailman.log file I now see in the last few lines. For clarity I've numbered them like this "{1}":
{1} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains HTTP/1.1" 200 320 "-" "GNU Mailman REST client v3.3.1"
{2} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/ch-bc.org.uk HTTP/1.1" 200 215 "-" "GNU Mailman REST client v3.3.1"
{3} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/greyarea-web1.cam.ivimey.org/lists?advertised=true&count=0&page=1 HTTP/1.1" 404 26 "-" "GNU Mailman REST client v3.3.1"
At this point the browser screen reports "Something went wrong" and the message "HTTP Error 404: {"title": "404 Not Found"}"
Make the above changes and see if that solves this. If not, let us know and we'll try to help.
A tcpdump now shows the GET of
/3.1/domains/mailman-web.ivimey.org/lists&advertised=true&count=0&page=1
directed at mailman-core and the reply being 404 not found.
The extract from core's mailman.log is now:
[24/Jun/2020:20:41:11 +0000] "GET /3.1/domains/mailman-web.ivimey.org/lists?advertised=true&count=0&page=1 HTTP/1.1" 404 26 "-" "GNU Mailman REST client v3.3.1"
And the entry for postorius in mailmansuite.log:
ERROR 2020-06-24 21:41:11,191 13310 postorius Un-handled exception: HTTP Error 404: {"title": "404 Not Found"}
Traceback (most recent call last):
  File "/opt/mailman3/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
[snip]
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/page.py", line 37, in __init__
    self._create_page()
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/page.py", line 62, in _create_page
    response, content = self._connection.call(self._build_url())
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/connection.py", line 116, in call
    error_msg, response, None)
urllib.error.HTTPError: HTTP Error 404: {"title": "404 Not Found"}
Note that the mailman-web.ivimey.org address is the newly-added CNAME of greyarea-web1, mailman-core is likewise the CNAME of greyarea-post.
There is nothing I can see in the other logs that looks out of the ordinary.
I guess this indicates that my earlier wondering if I had something left over from a past install is dispelled, as that name didn't exist until today.
On 6/24/20 2:10 PM, Ruth Ivimey-Cook wrote:
Mark,
You need to set hostname: in mailman.cfg to the actual external IP address of the Mailman host machine or to a name which resolves to that IP both internally and externally, and also set that host name/IP in the MAILMAN_REST_API_URL setting on the Postorius machine.
I've done that, but there is no observed change in the browser (still get same 404 error).
Specifically, I reset hostname to be a newly-added CNAME of the actual host address, and ensured that the 0.0.0.0 addresses were no longer there.
OK, so setting hostname: to this CNAME, which resolves to the external IP on the Mailman core host, allows the REST interface to listen on that external interface and solves that issue without patching the rest runner. Correct?
A tcpdump now shows the GET of
/3.1/domains/mailman-web.ivimey.org/lists&advertised=true&count=0&page=1
directed at mailman-core and the reply being 404 not found.
The extract from core's mailman.log is now:
[24/Jun/2020:20:41:11 +0000] "GET /3.1/domains/mailman-web.ivimey.org/lists?advertised=true&count=0&page=1 HTTP/1.1" 404 26 "-" "GNU Mailman REST client v3.3.1"
And the entry for postorius in mailmansuite.log:
ERROR 2020-06-24 21:41:11,191 13310 postorius Un-handled exception: HTTP Error 404: {"title": "404 Not Found"}
Traceback (most recent call last):
  File "/opt/mailman3/lib/python3.6/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
[snip]
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/page.py", line 37, in __init__
    self._create_page()
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/page.py", line 62, in _create_page
    response, content = self._connection.call(self._build_url())
  File "/opt/mailman3/lib/python3.6/site-packages/mailmanclient/restbase/connection.py", line 116, in call
    error_msg, response, None)
urllib.error.HTTPError: HTTP Error 404: {"title": "404 Not Found"}
Note that the mailman-web.ivimey.org address is the newly-added CNAME of greyarea-web1, mailman-core is likewise the CNAME of greyarea-post.
There is nothing I can see in the other logs that looks out of the ordinary.
I guess this indicates that my earlier wondering if I had something left over from a past install is dispelled, as that name didn't exist until today.
OK, but you still haven't reported whether you looked at the domains in Django. These are in the Django database on the Postorius server, and I don't know what's there, but if there is a domain there which is not known to core, that could cause this 404. Further if that domain is not fully qualified, something could just be appending the hostname: to it causing that name to be the one requested.
Some of this is guesswork and possibly overreaching on my part, but I would like to be assured that there are no extraneous domains in Django's database.
I wrote in the post at https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
Go to your Django admin web UI and look at your Django Mailman 3 Mail domains. This will be at a URL like https://greyarea-web1.cam.ivimey.org/admin/django_mailman3/maildomain/
Have you done that?
I want to help resolve this issue, but right now I want to rule out the possibility that there are domains known to Django that aren't known to core.
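The comparison being asked for can be sketched as a set difference. This is only an illustration, assuming the Django side has been exported as a list of domain names and that core's GET /3.1/domains response is JSON with an "entries" array whose items carry a "mail_host" key. The function name and the sample data are hypothetical.

```python
import json


def domains_unknown_to_core(django_domains, core_domains_json):
    """Return the domains Django knows about that core does not list."""
    payload = json.loads(core_domains_json)
    # Assumed response shape: {"entries": [{"mail_host": "..."}, ...]}.
    # The "entries" key may be absent when core has no domains at all.
    core = {entry["mail_host"] for entry in payload.get("entries", [])}
    return sorted(set(django_domains) - core)


if __name__ == "__main__":
    sample = '{"entries": [{"mail_host": "ch-bc.org.uk"}], "total_size": 1}'
    # Hypothetical Django-side list; a non-empty result would explain a 404.
    print(domains_unknown_to_core(
        ["ch-bc.org.uk", "greyarea-web1.cam.ivimey.org"], sample))
```

Any name this reports is one that Postorius could ask core about and get a 404 for.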
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On 6/22/20 12:23 PM, Ruth Ivimey-Cook wrote:
I agree with this, we need a setting for which interface for Gunicorn to listen on, and what Mailman Core sends out in the self_link attribute when responding to REST requests.
Exactly! :-)
Except if there were a bind_host setting to specify the host to bind the REST API to and you set that to 0.0.0.0 you would have exactly what leads to the situation below, so we need to understand what's happening there (although I think I do - see below).
On 6/22/20 12:21 PM, Ruth Ivimey-Cook wrote:
I did try with the gunicorn patch Mark mentioned (host = "0.0.0.0") and changing the cfg file hostname back to the full name of the host, and that now works!! -- in that the problem I showed earlier is gone.
However I then get another issue. From the mailman.log file I now see in the last few lines. For clarity I've numbered them like this "{1}":
{1} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains HTTP/1.1" 200 320 "-" "GNU Mailman REST client v3.3.1"
{2} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/ch-bc.org.uk HTTP/1.1" 200 215 "-" "GNU Mailman REST client v3.3.1"
{3} [22/Jun/2020:00:58:24 +0000] "GET /3.1/domains/greyarea-web1.cam.ivimey.org/lists?advertised=true&count=0&page=1 HTTP/1.1" 404 26 "-" "GNU Mailman REST client v3.3.1"
At this point the browser screen reports "Something went wrong" and the message "HTTP Error 404: {"title": "404 Not Found"}"
Tracebacks would really help. You should be able to find them in the log configured as LOGGING['handlers']['file']['filename'] in your Django settings.
Previously, the process stopped at line {1}. Now that and line {2} are executed (AFAIK) correctly, but then line {3} is issued. I have no idea why something is looking for "domains/greyarea-web1":
- Only the "ch-bc.org.uk" domain is listed in the reply to line {1}, as expected;
Go to your Django admin web UI and look at your Django Mailman 3 Mail domains. This will be at a URL like https://greyarea-web1.cam.ivimey.org/admin/django_mailman3/maildomain/
Do you see this domain there? If so, delete it.
- "greyarea-web1" doesn't appear in any lists I can show with the "mailman" cli command (and I think I tried them all);
But that's looking at Mailman core which is not where Postorius is getting this list.
- "greyarea-web1" doesn't appear anywhere in the source or config other than as the name of the host its running on or as an ALLOWED_HOST;
- I tried using "curl -X DELETE" (from the api docs) to delete a domain called that and it says no such list exists.
Again, you're talking to core, not Django.
My only possible clue is that I think, at some point in the journey of configuring mailman, I might have created a list called that on the host. However since then I have wiped the install files and reinstalled. I cannot make sense of this being the answer given the checks just listed.
Did you wipe the database on greyarea-web1?
I went looking through the python source and can see that the get() function does indeed throw a 404 exception if the domain isn't found, but I cannot see which code is parsing the reply from the first line
So I have two questions:
- How can I fix this problem?
See above.
- Why does getting a 404 reply here cause the Postorius web page to produce a "something went wrong" / 404 and nothing else, when it could have simply reported that something had gone wrong with one specific domain?
Because the code isn't robust enough :(
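For illustration only, a more forgiving loop might look like the sketch below. It is not Postorius code. It assumes a caller-supplied fetch_lists(domain) callable (hypothetical) that raises urllib.error.HTTPError the way the traceback from mailmanclient shows, and it records a 404 against the one domain instead of failing the whole page.

```python
from urllib.error import HTTPError


def lists_per_domain(domains, fetch_lists):
    """Fetch lists for each domain, recording 404s instead of aborting."""
    results = {}
    problems = {}
    for domain in domains:
        try:
            results[domain] = fetch_lists(domain)
        except HTTPError as error:
            if error.code != 404:
                # Anything other than "not found" is still treated as fatal.
                raise
            problems[domain] = f"HTTP Error {error.code}"
    return results, problems
```

The caller could then render the lists it did get and show a per-domain warning for each entry in problems.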
...
Ideally, and this is icing on the cake, a listen address specification should allow multiple IPs to be passed in, in which case the server listens on all of them. I expect the python libraries already support this.
We could implement a bind_address setting that would pass to gunicorn's --bind parameter anything acceptable to gunicorn, although multiple values are tricky as they apparently require multiple -b/--bind options and a single -b/--bind doesn't support a list of values.
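One way the multiple-value case could be handled is to expand a single setting into repeated options. The sketch below assumes a hypothetical comma-separated bind_address value. No such setting exists in mailman.cfg at the time of this thread, and the function name is made up for illustration.

```python
def gunicorn_bind_args(bind_address):
    """Turn 'addr1, addr2' into ['--bind', 'addr1', '--bind', 'addr2']."""
    addresses = [part.strip() for part in bind_address.split(",") if part.strip()]
    args = []
    for address in addresses:
        # gunicorn takes one address per --bind option, so repeat the flag.
        args.extend(["--bind", address])
    return args


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-range address used as a placeholder.
    print(gunicorn_bind_args("127.0.0.1:8001, 192.0.2.10:8001"))
```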
But, I'm not convinced that this is something anyone needs. I'm still waiting for a response to <https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...>.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
On 24/06/2020 00:20, Mark Sapiro wrote:
Go to your Django admin web UI and look at your Django Mailman 3 Mail domains. This will be at a URL like https://greyarea-web1.cam.ivimey.org/admin/django_mailman3/maildomain/
Do you see this domain there? If so, delete it.
I hadn't logged in as admin before. Having now done so, I can confirm that this screen shows "0 mail domains".
Because I could, I then visited "VIEW SITE" and see a mailing lists page, which is I think the same one I have been trying to visit, except of course now I am logged in.
I do now see the correct, and only, mailing list configured on the core server.
Returning to the previous "maildomain" page, that now shows the expected mailing list as well.
Other admin pages seem to be working fine as well.
I think, possibly, that it is now installed and working. ... But I don't know why.
Ruth
-- Software Manager & Engineer Tel: 01223 414180 Blog: http://www.ivimey.org/blog LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/
On 6/24/20 6:14 PM, Ruth Ivimey-Cook wrote:
I think, possibly, that it is now installed and working. ... But I don't know why.
It's working because it should. The question is why wasn't it ;)
It may have something to do with being logged in. It may fail again under the right circumstances. If so, try to narrow down what those are.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Ruth Ivimey-Cook writes:
The localhost address is coming in via the hosts file, which is a relatively standard practice IMO.
OK, so I did a little research on this. You don't need to read this to continue discussion, but since other users have expressed interest in splitting hostname from listener IP address, I am posting what I've learned about aliasing hostname to localhost. This doesn't apply directly to multihomed hosts as also mentioned in this thread, and additions and corrections are, of course, welcome.
It had been my understanding that conventional wisdom is that "127.0.0.1 localhost" is *all* that you should put in /etc/hosts in most cases, so I did a couple of web searches. As far as my search-fu got me, there is no standard beyond putting localhost itself in there, which is used by a lot of boot scripts. But as a matter of common practice, the Debian family supplies an /etc/hosts that defaults to aliasing the host's FQDN to 127.0.1.1 (not 127.0.0.1), apparently as a kludge to keep GNOME happy: http://www.leonardoborda.com/blog/127-0-1-1-ubuntu-debian/
systemd-resolved does something different; in the case that there is no interface configured *and* no /etc/hosts entry, it resolves the "local, configured hostname" to 127.0.0.2 (again, not 127.0.0.1). But if there is an interface configured, it resolves to that interface's IP. Apparently this is intended to speed up systemd-based boot processes by allowing them to run in parallel. The Fedora issue tracker reports that it leads to a problem similar to the one being discussed, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=1116538#c25
Bottom line: neither the Debian family nor the Red Hat family aliases hostname to localhost; by default it's a separate "interface". And Red Hat does it conditionally on no real interface found.
As far as I can tell, these workarounds are about placating software that breaks in one way or another if the host's own FQDN does not resolve at all when they start up, presumably because the resolver can't reach the network -- but they don't care if it's consistent with DNS as long as it connects to the local host, which leads to a fairly arbitrary practice of aliasing the hostname to localhost. All of which suggests to me that such software should be special-casing the host's various identifiers (hostname, canonical FQDN, /etc/hosts aliases, all of which it can easily discover) and translate them to "localhost", rather than asking the resolver to do it and creating an inconsistency with DNS.
Of course, getting other software to Do The Right Thing is pretty hopeless, but the solution is simple: wait for the network to come up and use the configured interface's address by looking up the canonical hostname. This works for GNOME and Postfix (mentioned in the references above), and for Mailman.
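To see which of these cases applies on a given machine, something like the following sketch can help. It uses only the standard library. The function names are hypothetical, and what own_hostname_addresses() returns depends entirely on the local resolver configuration (/etc/hosts, DNS, and so on).

```python
import ipaddress
import socket


def is_loopback(address):
    """True for any address in the loopback range, e.g. 127.0.1.1."""
    return ipaddress.ip_address(address).is_loopback


def own_hostname_addresses():
    """Resolve this host's FQDN and return (name, sorted unique addresses)."""
    name = socket.getfqdn()
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return name, sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    fqdn, addresses = own_hostname_addresses()
    for address in addresses:
        print(fqdn, address)
```

If the host's own name resolves to an address for which is_loopback() is true, a server that binds to the resolved hostname will only be reachable locally.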
AFAICS, the exceptions are (1) running a high-availability "cloud" system that needs to scale very quickly on-demand (in which case you want to start things in parallel and need to deal with the real race conditions involved), and (2) a need to do work while disconnected. The former is over my pay grade but apparently Fedora has not managed to get it 100% reliable even with systemd. The latter case can be dealt with by putting your static IP address in /etc/hosts when you have one, which I presume
I could change it, but then perhaps other things would not work. I would prefer to be able to specify what I need.
Notwithstanding the discussion above about why it "shouldn't" break anything, I wouldn't be surprised if something breaks -- such configuration choices tend to ramify. I sympathize with you not wanting to change a configuration that works for you.
Steve
On Thu, Jun 25, 2020, at 12:33 AM, Stephen J. Turnbull wrote:
Ruth Ivimey-Cook writes:
The localhost address is coming in via the hosts file, which is a relatively standard practice IMO.
Sorry to chime in pretty late on this topic, but I didn't know about the research you did above. Thanks Steve!
I have had some of these issues in the past, but they were related to mail routing from Postfix to the Mailman container. The problem was that Docker would assign DNS names to each container based on their names and magically update every container's /etc/hosts so they can talk to other containers through these Docker-given DNS names. However, the Postfix running on the host won't be able to, because Docker doesn't put this DNS name anywhere else.
So, if I put 0.0.0.0 for LMTP server to bind to, it would use that address to generate Postfix's transport maps and of course it won't be able to reach Core running at an IP assigned at runtime. But the way I got around the problem was by adding regex based routing for Postfix. Exim worked because all I had to do was make the directories visible on the host.
About HTTP requests, I would say Core doesn't really need to know the external address that it is reachable at, only the interface it needs to bind to. In a containerized environment, the IP/FQDN could actually change, so if we are optimizing this for situations like containers/multi-host, it would require an Admin to update the configs.
Instead, what I was thinking was to just ignore the host, port in the self_link in Mailmanclient. The purpose of self_link is mostly to reflect the path to the resource itself in the API and I am not sure if it requires the complete URL with scheme, host and port. That should allow Web components to reach the resources without Core knowing or caring about what address it is externally reachable at.
Thoughts?
I could change it, but then perhaps other things would not work. I would prefer to be able to specify what I need.
Notwithstanding the discussion above about why it "shouldn't" break anything, I wouldn't be surprised if something breaks -- such configuration choices tend to ramify. I sympathize with you not wanting to change a configuration that works for you.
Steve
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
-- thanks, Abhilash Raj (maxking)
On 26/06/2020 06:56, Abhilash Raj wrote:
About HTTP requests, I would say Core doesn't really need to know the external address that it is reachable at, only the interface it needs to bind to. In a containerized environment, the IP/FQDN could actually change, so if we are optimizing this for situations like containers/multi-host, it would require an Admin to update the configs.
Instead, what I was thinking was to just ignore the host, port in the self_link in Mailmanclient. The purpose of self_link is mostly to reflect the path to the resource itself in the API and I am not sure if it requires the complete URL with scheme, host and port. That should allow Web components to reach the resources without Core knowing or caring about what address it is externally reachable at.
That seems very sensible to me.
Indeed, I would make the self_link text itself exclude host, port, etc... django already has to have a working address so there is nothing gained by needing to repeat it.
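On the client side, the idea could be sketched roughly as below. This is not mailmanclient code. It only shows, with the standard library, how the path of a self_link might be kept while the scheme, host and port come from the address the web side is already configured with. The function name and example hosts are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_self_link(self_link, base_url):
    """Keep the path and query of self_link; take scheme and host from base_url."""
    link = urlsplit(self_link)
    base = urlsplit(base_url)
    return urlunsplit((base.scheme, base.netloc, link.path, link.query, ""))


if __name__ == "__main__":
    print(rebase_self_link(
        "http://localhost:8001/3.1/domains/ch-bc.org.uk",
        "https://mailman-core.example.com:8001",
    ))
```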
Regards,
Ruth.
-- Software Manager & Engineer Tel: 01223 414180 Blog: http://www.ivimey.org/blog LinkedIn: http://uk.linkedin.com/in/ruthivimeycook/
Stephen,
On 6/22/20 6:23 PM, Stephen J. Turnbull wrote:
By design, Mailman core doesn't need to talk to the Internet or to the secure internal network -- Internet functions are delegated to the MTA and the webapps, and local configuration is via shell access which in principle could be console.
Sorry, I missed this earlier.
Ironically I am doing exactly what you are suggesting -- splitting the django frontend web app from the mailman core. In my current case I have put the mailman core on the same host as my mailserver, but that was for the sake of simplicity while I got that split sorted -- I am using exim and there is a big benefit when exim can see the list directory names. I may eventually move mailman-core elsewhere.
However, I find the mailman shell program not very intuitive, and so would prefer to have the web app for admin purposes.
Regards,
Ruth
Ruth Ivimey-Cook writes:
On 6/22/20 6:23 PM, Stephen J. Turnbull wrote:
By design, Mailman core doesn't need to talk to the Internet or to the secure internal network -- Internet functions are delegated to the MTA and the webapps, and local configuration is via shell access which in principle could be console.
Sorry, I missed this earlier.
I was more or less trying to describe what I thought you were trying to do, and noting that Mailman is designed for it. Thank you for the confirmation.
I am using exim and there is a big benefit when exim can see the list directory names.
Indeed. (I wrote the proof of concept Exim config for Mailman 3 on Debian.) But that's *all* it needs to see. It should be possible to get that information to Exim in some other way, perhaps through a small app that talks to the REST API. If that's generalizable to other MTAs (especially Postfix, but also Sendmail and maybe qmail) it's worth us looking into (and if it's Exim-specific, at this stage I'd say "patches welcome"). This is also the kind of thing that an NFS read-only mount does fine (of course you may not want NFS on a system in the DMZ :-/ ).
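As a rough idea of what such a small app might do, the sketch below turns a REST response into a plain list of addresses. It assumes that GET /3.1/lists returns JSON with an "entries" array whose items carry a "fqdn_listname" key. How the result would be fed to Exim (or any other MTA) is left open, and the function name is hypothetical.

```python
import json


def list_addresses(lists_json):
    """Extract the sorted list posting addresses from a /3.1/lists response."""
    payload = json.loads(lists_json)
    # "entries" may be missing entirely when no lists exist.
    return sorted(entry["fqdn_listname"] for entry in payload.get("entries", []))


if __name__ == "__main__":
    # Hypothetical sample response, trimmed to the one field used here.
    sample = '{"entries": [{"fqdn_listname": "announce@example.com"}]}'
    for address in list_addresses(sample):
        print(address)
```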
However, I find the mailman shell program not very intuitive, and so would prefer to have the web app for admin purposes.
By "local configuration" I mean anything that requires access to the .cfg files or installing code. Anything you can do via Postorius (or soon, via Affinity, hi Brian!) is not local configuration in this sense.
Hi Ruth,
I had a similar issue before and noted how the Docker system does this.
In mailman.cfg do something like:
[webservice]
hostname: mailman-core.Domain.tld
port: 8001
Set hostname to something that resolves to the current IP on which you want external clients to connect to the REST interface, manipulating the hosts file on the local machine if needed. Set port as desired.
This fixed a similar issue for me.
Thanks. Andrew.
I've created https://gitlab.com/mailman/mailman/-/issues/735 for this and I've also done an experiment and found what I think is a workaround. If you set

[webservice]
hostname: greyarea-post.cam.ivimey.org

(or maybe just greyarea-post would do), and then add to /etc/hosts on greyarea-post

0.0.0.0 greyarea-post.cam.ivimey.org

(or whatever name you used for hostname), this will get gunicorn to bind to the public interface and will include the proper hostname in the self_link responses.
The current implementation of [webservice] hostname is confusing and difficult to work with. It's not uncommon on a Linux server to have the following in the /etc/hosts file so that the machine resolves its own name. This is not connected to mailman at all, it could just be general practice.
/etc/hosts:
127.0.1.1 mailman.example.com mailman
However, if you have that in place together with this Mailman config file,
/etc/mailman/mailman.cfg:
[webservice]
hostname: mailman.example.com
port: 8001
mailman-core will only listen on localhost. Maybe you don't want that. It shouldn't be necessary to play games with /etc/hosts because the problem could be resolved this way:
[webservice]
hostname: mailman.example.com
bind_to: 0.0.0.0
port: 8001
or
[webservice]
hostname: mailman.example.com
listen: 0.0.0.0
port: 8001
Then any outgoing API messages or other communications would reference the so-called hostname, "mailman.example.com", which remote clients would use.
If the "listen" or "bind_to" field is absent, fall-back to the current behavior.
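The fall-back rule could be sketched as follows. Note that bind_to is the field proposed in this message, not an existing Mailman setting, and that Python's configparser is used here only as a stand-in for Mailman's own config loader. The function name is hypothetical.

```python
import configparser


def listen_address(cfg_text):
    """Return (bind host, port), falling back to hostname when bind_to is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    ws = parser["webservice"]
    # Proposed behaviour: an explicit bind_to wins; otherwise bind to
    # whatever hostname resolves to, which is the current behaviour.
    bind_host = ws.get("bind_to", fallback=ws["hostname"])
    return bind_host, ws.getint("port")


WITH_BIND = """
[webservice]
hostname: mailman.example.com
bind_to: 0.0.0.0
port: 8001
"""

WITHOUT_BIND = """
[webservice]
hostname: mailman.example.com
port: 8001
"""

if __name__ == "__main__":
    print(listen_address(WITH_BIND))
    print(listen_address(WITHOUT_BIND))
```

In both cases hostname would remain the value used in outgoing links. Only the listening address changes.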
To add some context, we had intended to split up mailman-core and mailman-web on separate machines. With such a configuration mailman-core should definitely listen on the public IP or 0.0.0.0.
At the same time, this configuration is a standard way to provision servers "in general":
/etc/hosts:
127.0.1.1 server1.example.com server1
There are multiple articles about the topic if you google "127.0.1.1". This technique allows a server's name to always be resolvable locally, it's always correct and doesn't need to be updated if the external IP address changes, and it is robust to complexities about NAT or proxies. This allows an application on the server to resolve its own hostname consistently. At the same time, a remote client would use traditional DNS, discover an IP (not "127.0.1.1"), and connect to the server remotely. Otherwise, every time the server's IP changes, you need to take the extra step of modifying /etc/hosts. With server automation you'd need to discover the correct IP first.
So, adding a "listen" or "bind_to" field is a "bug report/request for enhancement". It would be helpful in allowing mailman-core and mailman-web to run on separate machines.
However our latest plan as of this moment is to revert back to a one-server implementation. mailman-core and mailman-web will be on the same machine. Thus, the enhancement is not critical. It still seems "theoretically" useful to implement the fix, unless the documentation states "it is recommended that core and web should be on the same machine" or similar.
On 1/18/24 05:36, samuel.d.darwin@gmail.com wrote:
However our latest plan as of this moment is to revert back to a one-server implementation. mailman-core and mailman-web will be on the same machine. Thus, the enhancement is not critical. It still seems "theoretically" useful to implement the fix, unless the documentation states "it is recommended that core and web should be on the same machine" or similar.
See the fourth paragraph at https://docs.mailman3.org/projects/mailman/en/latest/src/mailman/rest/docs/r...
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
the fourth paragraph: "Because the REST server has full administrative access, it should never be exposed to the public internet. By default it only listens to connections on localhost. Don’t change this unless you really know what you’re doing."
Let's say that you are trying to embed Hyperkitty into a larger Django project that's horizontally deployed in an autoscaling group or K8S. The mailman-core component would be better served from a fixed IP address and a single server. In order for mailman-web to communicate with mailman-core, mailman-core needs to listen on a network interface, not localhost. So, you'd like to have a config field, listen on a reachable interface, and then configure the AWS Security Groups or firewall to limit access. The proposed "listen" field is optional, and in its absence, it defaults to the current behavior, which is localhost. This scenario matches the "really know what you’re doing" case. You'd be facilitating a more advanced architecture, although not requiring it, and still default to localhost. However, again I say "theoretical", because we ran into other difficulties with embedding Hyperkitty in a larger Django project. It seems easier to break that into bite-sized pieces, and only address them one at a time.
participants (6)
- Abhilash Raj
- Andrew Hodgson
- Mark Sapiro
- Ruth Ivimey-Cook
- samuel.d.darwin@gmail.com
- Stephen J. Turnbull