[MM3-users] Re: REST API notes

June 29, 2017

      ...
On Jun 28, 2017, at 01:47 AM, tlhackque--- via Mailman-users wrote:
...
As a beginner with the REST API, I found it necessary to read code and
experiment to figure out how to get things done.
Thanks for the feedback.  Other than the work supporting Postorius and
HyperKitty, I think you've done some of the more extensive exploration of the
REST API, so your comments here (and greatly appreciated bug reports) are
quite valuable, and I think will help us make the REST API both more usable
and more discoverable.
I didn't plan to become so involved, but that's what happens when one
tries to use something new - perhaps for a slightly unexpected purpose.
...
It's important to point out some history of the REST API.

I understand/expected that there was history; there is still art and
evolution in software design.
I hope the tone doesn't come across as critical; I only mean to report
what I found so you see it from outside the development bubble.
based traversal, I was able to port the extensive existing REST implementation
to the new underlying library fairly easily.
All this to say that some of the problems you describe are due to historical
accident.  And that's served us well(-ish) while we only had to support
mailmanclient and P/HK.  But as the REST API is used in more places, those
deficiencies are becoming more glaring.  I think if we ever make progress on
Lemme (our authenticating REST proxy), we'll likely encounter the same
problems.  But at this point, it's infeasible to completely reimplement the
REST machinery.
I don't think it needs to be re-implemented.  There are some fairly
simple things that could address the challenges I've found.  I'll try to
add some notes to/add new bug reports as I have a chance.  (But I'm
really trying to make progress on my project, of which this is a small
part...)
On 28-Jun-17 23:19, Barry Warsaw wrote:
...
Much of that is because these are doctests, i.e. testable documentation.  It's
a tradeoff between making them useful as docs but also testable without too
much clutter.  It's very much oriented toward Python because that makes
testability easier.  It would be good to have documentation for pure HTTP/JSON
consumers, but it would be imperative for that also to be testable so we're
sure it remains accurate, aside from any requirements to keep the docs in
sync.  Suggestions and contributions would be very much welcome here.
If the APIs are supposed to be equivalent, I'd try to define a
consistent mapping between the http (S!) API and the internal Python
API.  Since the implementation is pretty consistent, that shouldn't be
too painful.
If that can be done, then the same tests can verify the Python API and
prove that the HTTP functions are equivalent.
The other thing to consider is some tests that are higher level and can
become documentation examples.
E.g.
Starting with an empty installation, create a list with an owner and a
subscriber.  Which would be in order, every API call necessary to
implement this, and then verify (with GETs) that all the objects exist
with the right properties.
Currently that takes me about 220 lines of source (Perl, includes blank
and closing '}' lines, but still...)
Not counting loops, I think there are 10 places where I call the API for
what is expected to be an 'atomic' create of a list.
...
...
The REST API has a hybrid interface: Requests are made with
application/x-www-form-urlencoded POST, PUT, PATCH and DELETE HTTP requests.
Requests are also accepted as parameter strings.
The responses are JSON.  (This is rather surprising - one would expect JSON
requests - and I hope someday they'll be accepted, as the split complicates
clients.  I suspect this evolved to simplify the initial Web GUI client
(Postorius), but it precludes using standard JSON-in/JSON-out client
libraries.)
Absolutely.  I want to support JSON-in/JSON-out.  Again, the current mismatch
is due to historical implementation decisions, but there's nothing in
principle preventing us from accepting JSON in, afaict.  We'd have to continue
to accept both, for backward compatibility reasons.
I noted somewhere that this should be easy to implement.  I expect that
you get an associative array ("dict"?) with the webform parameters.
Decoding JSON also usually provides an associative array.  The
parameters have the same names.  So you should be able to address this at
your API entry point, e.g. if the inbound request is Content-Type:
application/json, decode as JSON; if application/x-www-form-urlencoded, decode as
today.  Then you have the same parameters, and they go downstream for
decoding in the array - again as today.
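A minimal sketch of that entry-point dispatch, in Python since that is what Mailman is written in.  This is not Mailman's code: the function name decode_params is hypothetical, and it only assumes that the handler can see the Content-Type header and the raw request body.

```python
import json
from urllib.parse import parse_qsl


def decode_params(content_type, body):
    """Return request parameters as a dict, whatever the body encoding."""
    # Drop any "; charset=..." suffix and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = body.decode("utf-8")
    if media_type == "application/json":
        params = json.loads(text)
        if not isinstance(params, dict):
            raise ValueError("JSON request body must be an object")
        return params
    if media_type == "application/x-www-form-urlencoded":
        # parse_qsl yields (name, value) pairs; a later duplicate wins here.
        return dict(parse_qsl(text, keep_blank_values=True))
    raise ValueError("unsupported Content-Type: " + media_type)
```

Either branch hands the same name/value dictionary downstream, so the existing validation would not need to know which encoding the client chose.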
...
...
Error responses are often misidentified as content-type: application/json,
but actually contain a text/plain error message.  This isn't universally
true; for example the 401 response actually IS JSON.  A client has to guess
and handle decoding exceptions.
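For what it's worth, the guessing a client ends up doing looks roughly like the sketch below (Python, standard library only; decode_body and the "kind" labels are names made up for illustration).  It ignores the declared Content-Type and simply tries JSON first.

```python
import json


def decode_body(body):
    """Decode a response body, falling back to plain text if it is not JSON."""
    text = body.decode("utf-8", errors="replace")
    if not text.strip():
        return {"kind": "empty", "data": None}
    try:
        return {"kind": "json", "data": json.loads(text)}
    except ValueError:
        # The header promised JSON but the body is a plain-text message.
        return {"kind": "text", "data": text}
```

One caveat with this approach: a plain-text body that happens to be valid JSON (a bare number, say) is reported as JSON, which is one more reason an accurate Content-Type matters.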
That's clearly a bug.  The Content-Type should always be accurate.  It would
be a new feature to also support JSON error responses.
Except that for some responses, you already do...
...
The API presents resources hierarchically, rooted at '/'.  The top level
resources include /users, /lists, /addresses, /domains, /system.
The next level is a resource id.
Kind of.  It all depends on what the top-level resource returns.  This
makes sense when you think about the object based traversal machinery.
Each path component points internally to an object, and the subsequent path
component is handled by that object.  If that returns another object, then the
next remaining path component is consumed by *that* object.  And so on until
we reach the end of the path.  Then the object at the leaf responds to the
HTTP command, and each object knows how to represent itself as JSON, and it
knows its canonical location within the resource tree (there can be multiple
paths to any particular object, but there's always one canonical location).
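A toy sketch of the traversal described above may help readers who think in code.  It is only an illustration of the idea, not the actual machinery: the Resource class, its child() method and traverse() are hypothetical names, and the two paths to the same list are invented for the example.

```python
class Resource:
    """A node in the tree: handles one path component, then delegates."""

    def __init__(self, canonical_path, children=None):
        self.canonical_path = canonical_path
        self.children = children or {}

    def child(self, component):
        # Each object decides what the next path component means.
        return self.children[component]

    def to_json(self):
        # Every object knows its one canonical location.
        return {"self_link": self.canonical_path}


def traverse(root, path):
    """Walk the path one component at a time and return the leaf object."""
    node = root
    for component in path.strip("/").split("/"):
        if component:
            node = node.child(component)
    return node


mlist = Resource("/lists/mylist.example.com")
root = Resource("/", {"lists": Resource("/lists", {
    "mylist.example.com": mlist,
    "mylist@example.com": mlist,  # a second path to the same object
})})
```

Whichever path reaches mlist, its to_json() reports the same self_link, which is the "one canonical location" property.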
That's a better description from inside.  I was trying to summarize what
it looks like to someone trying to use the API....
...
There is an id which is stable for the lifetime of an object, and a current
name (which can be changed).  For example, a list has a name like
mylist@example.com, and a list_id of mylist.example.com.  But if you change
the list name, it becomes mynewname@example.net, while the list_id remains
mylist.example.com.
Lists are weird in that they have two identifiers.  One is the posting
address, what we internally (and slightly incorrectly) call the "fully
qualified list name".  I don't particularly like that nomenclature anymore,
but we live with it.
We all have similar artifacts in any non-trivial code...
A posting address can change, e.g. if you rename or rehome a list.
The (RFC 2919) List-ID is assigned when the mailing list is created and it's
immutable.  Section 4 discourages changing the List-ID and Mailman takes that
as a requirement.  Rename a list or rehome it to a different domain on the
same server and the List-ID will never change.
Yes.
Again for historical reasons, many APIs both internal and external used the
posting address to identify the list, but this is wrong exactly because that
can change.  I've slowly been converting APIs to accept both the posting
address and the List-ID when identifying the mailing list.  New APIs generally
accept only the List-ID.  Bottom line, it's best to use the List-ID.
I noticed that, and indeed go to some pains to fetch the list_id for
chained operations.
...
Lists are associated with "domains", which are the "domain part" of the
list's address.  That is, the part after the @.  This is sometimes referred
to as the "mail host", but there need not be a real host.
Yep, again historical nomenclature.
I understand the history.  But I was very, very confused by the overlaid
uses.  Even in the GUI, I couldn't decide what I should enter.  (And I'm
fairly familiar with e-mail RFCs and history...perhaps that's my
problem.)  Documentation needs improvement.  Feel free to use (or
improve) my words.
I haven't looked at the code (not being a Python person), but I'd be
surprised if this was not feasible.
...
...
A domain has to be created before you can create a list with an address in
that domain.
Some APIs (e.g. mailman create CLI) will create the domain automatically if
it doesn't already exist, and unless you disable that explicitly.
I was writing exclusively about the REST API.  I couldn't create a list
without the domain, so I create the domain unconditionally (and ignore
the 'it exists' error); that's cheaper than query+create.  "Create if
not exists" would be a useful API extension, since clients would see a
success code either way.  (200 OK for exists, 201 Created for new).
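The "create unconditionally and ignore the 'it exists' error" approach can be sketched as follows.  The post argument is a hypothetical caller-supplied function that sends a form-encoded POST and returns the HTTP status; the assumption that a duplicate domain is reported as 400 is mine and should be checked against the server you are talking to.

```python
def ensure_domain(post, mail_host, description=None):
    """Create a domain, treating an 'already exists' refusal as success.

    `post` is a hypothetical callable: post(path, params) -> status code.
    """
    params = {"mail_host": mail_host}
    if description is not None:
        params["description"] = description
    status = post("/domains", params)
    if status == 201:
        return "created"
    if status == 400:
        # ASSUMPTION: the server reports a duplicate domain as 400.
        # A 400 can also mean a malformed request, so this is lossy.
        return "exists"
    raise RuntimeError("unexpected status %d creating %s" % (status, mail_host))
```

A server-side "create if not exists" that answers 200 or 201 would remove the need to guess what a 400 means.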
...
...
Resources are created with a POST to their top-level resource.  To create a
domain, post to /domains with mail_host => the domain, and (optionally)
description => a description for the GUI.  The response isn't JSON as one
would expect.  In fact, it's an empty application/json response with a 201
status.
To create a list, one POSTs to /lists.  This post takes a restricted set of
parameters; in the case of a list, just its fqdn_listname, (and an optional
style - which isn't well defined).  The response isn't JSON as one would
expect.  The Location header of the response contains a URL of the new list.
What else would you expect?  From my reading of books like RESTful Web
Services (admittedly a long while ago), that's exactly the proper response to
an appending POST.  Return a 201 with a Location header to the new resource
and empty content.
I didn't read the book - but I have talked to a number of services that
claim to provide a RESTful API.
Of course, REST is a style, not a specification, so mileage varies...
I would expect the new resource address to be in the JSON body of the
response.  I don't mind the Location header (if you imagine the response
to be a 301, it kind of makes sense).  But a client (well, at least
mine) wants a consistent place to look for the result of an operation.
That would be the HTTP Status + JSON reply.
This is inconsistent - and with JSON REST client libraries, it turns out
to be more difficult to get at the headers.
(I patched the one that I use...)
My 3 cents: if an interface is JSON, it should be usable entirely as
JSON.  JSON in, all JSON out....  This simplifies life for a client:
Get a response, decode the JSON - and everything is in the decoded JSON
(usually an array or associative array in whatever language).  Not some
in the JSON, some in the Response headers (and some in the HTTP status -
which usually is replicated in the response JSON), and some plaintext in
the response body.
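Until the API does this itself, a client can fold the pieces into one structure on its side.  A rough Python sketch, assuming the caller has already decoded the body (if any); fold_response and the result keys are invented names, not anything the API defines.

```python
def fold_response(status, headers, data=None):
    """Fold status, Location header and decoded body into one dictionary."""
    # Header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    result = {"status": status, "ok": 200 <= status < 300}
    if "location" in lowered:
        # The new resource's URL arrives here after a creating POST.
        result["location"] = lowered["location"]
    if data is not None:
        result["data"] = data
    return result
```

The rest of the client then looks in one place, whether the interesting value came from the status line, a header or the body.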
...
...
To configure the list, you have to follow up with a PUT or PATCH to that URL
plus /config.  This is where you can set the description, posting policy, etc.
It's unrealistic to do a PUT; even if you're cloning another list, there's no
programmatic way to determine which attributes are writable.  PATCH what you
know...
Mailing list resources are somewhat unique in that they probably have the most
properties of any resource/object in the system.  That's not surprising if you
think about it, but it does make PUTing to a mailing list more or less
impractical.  That's certainly not true of other, smaller resources though,
and of course, you still want symmetry there (plus, implementation-wise, it's
almost a no brainer to support both PUT and PATCH).
The problem is that from a client, there is no way to know which
attributes are writable, and which are immutable.  So get (template),
modify, put (new) doesn't work.
There should be a way to find out what attributes are writable.  Or the
API could ignore writes to immutable attributes, and return "success"
(or in a JSON reply, "partial success") to the client....
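In the meantime the workaround is a client-maintained allowlist.  A sketch in Python: build_patch is a hypothetical helper, and the writable set is whatever the client has learned by reading code or by trial and error, since the API offers no way to ask.

```python
def build_patch(source_config, writable, overrides=None):
    """Build a PATCH body from a GET'd config, keeping only known-writable keys.

    `writable` is a client-side guess; the server does not publish it.
    """
    wanted = dict(source_config)
    wanted.update(overrides or {})
    return {name: value for name, value in wanted.items() if name in writable}
```

For example, cloning a config while changing the description would pass the other list's GET result as source_config and {"description": ...} as overrides; anything not in the allowlist is silently dropped rather than rejected by the server.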
...
...
Members are associated with e-mail addresses - which belong to Users.
Users can have multiple addresses, addresses can be linked to only one user,
but may be unlinked, and members associate "subscribers" to mailing lists,
where a subscriber is either a user-with-preferred-address, or an address.
I saw that.  I didn't look into the unlinking part, as (oddly enough)
I'm only trying to do a restricted subset of the possible operations.
All I need to do is create & configure a list (with a couple of members)
on the fly.  (Oh, and post an announcement of the fact on a related
list.)  The tricky part is that it's all done without human intervention.
...
...
You create a user by posting to /users with email => the email address, and
optionally display_name => the name string.  (A user can also be implicitly
created by subscribing an e-mail address, but that gets confusing.)  The
e-mail address is the primary email address for the user.  More later.
Again, you get a Location header in the response, which you can use to PATCH
/preferences to set delivery_status, etc.  These preferences are part of a
hierarchy - many have a system default, a list default, the user default, and
a subscription to that list value.  You can find the User associated with a
list by a GET of /addresses/address@example.net.
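The preference hierarchy mentioned above can be pictured as a lookup that walks from the most specific level to the most general.  The Python sketch below only illustrates that idea; the level order (subscription, user, list, system) follows my description here and is an assumption, and resolve_preference is a made-up name.

```python
def resolve_preference(name, levels):
    """Return (level, value) for the first level that sets `name`.

    `levels` is a list of (level_name, prefs_dict) pairs, ordered from
    most specific to most general.  The order is an assumption.
    """
    for level_name, prefs in levels:
        value = prefs.get(name)
        if value is not None:
            return level_name, value
    return None, None
```

This is also why a client may have to PATCH more than one /preferences resource: each level is a separate resource.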
Technically, the returned user isn't necessarily associated with a list.  It
*may* be subscribed to the list, but that relationship is represented by a
member.
I understood that.
...
This GET returns two URLs: user => the user owning this address, and
self_link => the address object.
*All* resources in the REST API have a self_link, and while that may seem
redundant, it's not.  As mentioned above, you can take various paths through
the resource tree to get to a particular resource, but regardless of that,
every resource has exactly one canonical location.  That location is
represented by the self_link.
Yes.
Note also that there are both an email and original_email attribute.  The
latter preserves case.  The former is used internally by Mailman as a
resource key, etc.  (Though exactly what happens if John@example.net and
john@example.net both subscribe is unclear.)
They can't.  Mailman is case-preserving case-insensitive for email addresses.
Technically speaking, john@example.com and JOHN@example.com can be different
mailboxes, but that never happens in practice anymore, and Mailman has always
explicitly treated them as the same address.  This goes back to the earliest
days of Mailman.
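"Case-preserving, case-insensitive" can be sketched like this in Python.  It is an illustration only: the AddressBook class is hypothetical, and lower-casing the whole address to form the key is an assumption about how the email attribute is derived from original_email.

```python
class AddressBook:
    """Stores addresses by a case-folded key while keeping the original."""

    def __init__(self):
        self._by_key = {}

    def add(self, original_email):
        key = original_email.lower()  # ASSUMPTION: key is the lowercased address
        if key in self._by_key:
            raise ValueError("address already registered: " + key)
        record = {"email": key, "original_email": original_email}
        self._by_key[key] = record
        return record

    def get(self, any_case_email):
        return self._by_key.get(any_case_email.lower())
```

Under this model John@example.net and john@example.net map to the same key, so the second registration is refused rather than creating a second address.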
I still run into systems that demand exact case for delivery.  I agree
that two different humans assigned to such a mailbox would go (or
already are) crazy.  But I have seen John using john and JOHN as two
different identities.  I have suggested using '+' subaddressing instead.
Then again, 70% of the e-mail address validators that people come up
with reject clark+kent@example.net because of the '+'...
I am curious as to whether anything breaks if I subscribe to a mailman
list with a plus address, but it's low on my list...
...
...
Once a list is configured, you can add members.  This requires the list_id -
which you don't (officially) have.  So you do a GET on the list resource, to
get the list_id.  Then you can subscribe a member with list_id => (the list
id), subscriber => email address.  Optionally you can
pre_verify/confirm/approve the member and/or add a display_name.  POST to
/members.  Again, you get a Location header back.  You can't specify
everything you might like as a creation attribute.  So you may have to PATCH
the member to, for example, set the moderation_status.
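As a concrete illustration, the form-encoded body for that POST to /members could be built as below.  The flag names pre_verified, pre_confirmed and pre_approved and the literal "true" are my reading of what is called pre_verify/confirm/approve above; treat them, and the helper name subscription_body, as assumptions to verify against the server's documentation.

```python
from urllib.parse import urlencode


def subscription_body(list_id, subscriber, display_name=None,
                      pre_verified=False, pre_confirmed=False,
                      pre_approved=False):
    """Build the x-www-form-urlencoded body for a POST to /members."""
    params = {"list_id": list_id, "subscriber": subscriber}
    if display_name:
        params["display_name"] = display_name
    # ASSUMPTION: flag names and the "true" spelling are not confirmed here.
    flags = (("pre_verified", pre_verified),
             ("pre_confirmed", pre_confirmed),
             ("pre_approved", pre_approved))
    for name, enabled in flags:
        if enabled:
            params[name] = "true"
    return urlencode(params)
```

Attributes that cannot be given here, such as the moderation setting, still need the follow-up PATCH to the URL returned in the Location header.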
This is true, but it also kind of makes sense if you grok the way preferences
work, i.e. hierarchically as you describe above.  TBH, I don't particularly
like the way preferences are modeled either internally or through the REST
API, but I haven't been able to come up with anything better.
...
You may also need to PATCH the member /preferences if you want to set
list-level delivery status, etc.
All true.  In each of those cases, you may be talking to a different
preference resource.
...
Consider adding at least an owner when creating a list.
I think that would be a useful addition to the REST API for list creation.
...
One challenge is that almost everything requires multiple REST operations to
set up.  But REST (by definition) is stateless.  So the best you can do is
order operations & hope.
I don't understand this.  REST is stateless, but not all HTTP operations are
idempotent.  POST certainly isn't, by definition, so if you use it to create a
new resource under a collection, you clearly cannot modify that resource
through PUT or PATCH until the resource is created, which only happens when
POST succeeds.
I've written up an issue on this with some more detail/thoughts and will
post (er, no pun intended) it shortly.
The short form is that (as noted below), I want to atomically
create a list - either it exists (configured the way I need it to be),
or it doesn't exist.  That's not possible.  So I create a domain, create
a user, configure the user, create a list, configure the list, add the
user - but at any point in the middle, a message could arrive for the
not-quite-configured list.  This isn't good...
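To make the window concrete, here is the sequence written out as data in Python.  The paths and parameter names are the ones mentioned in this thread; the "<...>" placeholders stand for values only known at run time (Location headers, the fetched list_id), and creation_plan and exposed_steps are hypothetical helper names.

```python
def creation_plan(domain, list_name, member_email, list_config, user_prefs):
    """Return the ordered REST calls needed to set up one list."""
    fqdn_listname = "%s@%s" % (list_name, domain)
    return [
        ("POST", "/domains", {"mail_host": domain}),
        ("POST", "/users", {"email": member_email}),
        ("PATCH", "<user Location>/preferences", user_prefs),
        ("POST", "/lists", {"fqdn_listname": fqdn_listname}),
        ("PATCH", "<list Location>/config", list_config),
        ("GET", "<list Location>", None),  # to learn the list_id
        ("POST", "/members",
         {"list_id": "<list_id>", "subscriber": member_email}),
    ]


def exposed_steps(plan):
    """Steps that run after the list exists but before setup is complete."""
    for index, (method, path, _params) in enumerate(plan):
        if method == "POST" and path == "/lists":
            return plan[index + 1:]
    return []
```

Everything exposed_steps() returns runs while the list can already receive mail, which is the gap an atomic "create configured list" call would close.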
...
The mailman client examples refer to transactions (e.g. in users.rst, there is
'transaction.commit()') - but REST can't hold state.  It does appear that the
server uses DB transactions to ensure that any given REST operation is ACID,
but the composite operations (e.g. create a list and set its config) cannot
be Atomic.  This is an architectural flaw in the API.
I think what you're getting at is that some POSTs to create new resources do
not accept the same parameters as the subsequent PUT or PATCH on the newly
created resource.  I think that may be true, but it's not universally so.
E.g. POST on /domains creates a new domain, returning the location of the new
resource.  You can provide both a description and an owner, and you can also
provide a mail_host.  But mail_host is immutable, and PATCH or PUT support
changing all mutable attributes on the domain.
That should be the general rule; if an attribute on a resource is mutable, you
should be able to PATCH and PUT it, *and* you should be able to specify all
mutable attributes, plus some immutable ones on the POST that creates the
resource.

Yes.  I can't specify everything about an object on many posts - list
creation being the most glaring example.
Also, things linked to the object - e.g. if I create a member, I have
to: POST (create the member); PATCH (set moderation_action);
PATCH (preferences) to set
(delivery_status, hide_address, receive_list_copy, receive_own_postings).
That's three REST calls for what should be one - ATOMIC - transaction.
Plus, this is separate from the list creation, so there's a window
between creating the list (even if it were atomic - which it isn't) and
the time its required members are subscribed.
...
GET should return all attributes, mutable or immutable.  This rule
may not be strictly adhered to for all resources and collections, but I would
consider that a bug, not an architectural flaw.
AFAIK, this is all working as any well-defined web service should work.
...
...
Symbolic names (which are required) for attributes are in the
src/mailman/interfaces/(class).py.
Please note that the interfaces under this package are for the *internal* API,
which often, but not always, is exposed in the REST API.  These two APIs serve
different purposes so they cannot be a one-to-one mapping, and there are many
resources in the REST API that don't correspond directly to internal objects,
and there are many internal objects that are not exposed to the REST API.
There was no other place I could find that defined attribute values for
enumerated items.  E.g.
delivery_status => 'by_user' - interfaces is where I found how to spell
'by_user'.
Maybe I missed something more obvious.
...
...
What you can get/set comes from src/mailman/rest/(class).py.
Yes, so clients of the REST API are served by objects in this package.
Plugins and other internal operations are served by the objects that implement
interfaces in the src/mailman/interfaces package.  This is a strict separation
of concerns, and it mirrors other 'external facing' interfaces, of which the
command line is another example.
I keep struggling with wanting to understand the external model of the
API without having to understand the internal implementation.   When the
API is formally documented, I encourage you to de-emphasize the internal
model and concentrate on the external interfaces in client terms...
Hope that helps.
Yes, thanks.
...
-Barry