On 28-Jun-17 23:19, Barry Warsaw wrote:
On Jun 28, 2017, at 01:47 AM, tlhackque--- via Mailman-users wrote:
As a beginner with the REST API, I found it necessary to read code and experiment to figure out how to get things done.
Thanks for the feedback. Other than the work supporting Postorius and HyperKitty, I think you've done some of the more extensive exploration of the REST API, so your comments here (and greatly appreciated bug reports) are quite valuable, and I think will help us make the REST API both more usable and more discoverable.
I didn't plan to become so involved, but that's what happens when one tries to use something new - perhaps for a slightly unexpected purpose.
It's important to point out some history of the REST API.
I understand/expected that there was history; there is still art and evolution in software design. I hope the tone doesn't come across as critical; I only mean to report what I found so you see it from outside the development bubble.
... based traversal, I was able to port the extensive existing REST implementation to the new underlying library fairly easily.
All this to say that some of the problems you describe are due to historical accident. And that's served us well(-ish) while we only had to support mailmanclient and P/HK. But as the REST API is used in more places, those deficiencies are becoming more glaring. I think if we ever make progress on Lemme (our authenticating REST proxy), we'll likely encounter the same problems. But at this point, it's infeasible to completely reimplement the REST machinery.
I don't think it needs to be re-implemented. There are some fairly simple things that could address the challenges I've found. I'll try to add some notes to/add new bug reports as I have a chance. (But I'm really trying to make progress on my project, of which this is a small part...)
Much of that is because these are doctests, i.e. testable documentation. It's a tradeoff between making them useful as docs but also testable without too much clutter. It's very much oriented toward Python because that makes testability easier. It would be good to have documentation for pure HTTP/JSON consumers, but it would be imperative for that also to be testable so we're sure it remains accurate, aside from any requirements to keep the docs in sync. Suggestions and contributions would be very much welcome here.
If the APIs are supposed to be equivalent, I'd try to define a consistent mapping between the http (S!) API and the internal Python API. Since the implementation is pretty consistent, that shouldn't be too painful.
If that can be done, then the same tests can verify the Python API and prove that the HTTP functions are equivalent.
The other thing to consider is some tests that are higher level and can become documentation examples. E.g., starting with an empty installation, create a list with an owner and a subscriber. That would show, in order, every API call necessary to implement this, and then verify (with GETs) that all the objects exist with the right properties.
Currently that takes me about 220 lines of source (Perl, includes blank and closing '}' lines, but still...) Not counting loops, I think there are 10 places where I call the API for what is expected to be an 'atomic' create of a list.
The REST API has a hybrid interface: Requests are made with application/x-www-form-urlencoded POST, PUT, PATCH and DELETE HTTP requests. Requests are also accepted as parameter strings.
The responses are JSON. (This is rather surprising - one would expect JSON requests - and I hope someday they'll be accepted, as the split complicates clients. I suspect this evolved to simplify the initial Web GUI client (Postorius), but it precludes using standard JSON-in/JSON-out client libraries.)
Absolutely. I want to support JSON-in/JSON-out. Again, the current mismatch is due to historical implementation decisions, but there's nothing in principle preventing us from accepting JSON in, afaict. We'd have to continue to accept both, for backward compatibility reasons.
I noted somewhere that this should be easy to implement. I expect that you get an associative array ("dict"?) with the webform parameters. Decoding JSON also usually provides an associative array. The parameters have the same name. So you should be able to address this at your API entry point, e.g. if the inbound request is Content-Type: application/json, decode as JSON; if application/x-www-form-urlencoded, decode as today. Then you have the same parameters, and they go downstream for decoding in the array - again as today.
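A minimal sketch of that entry-point idea, in Python. This is not Mailman's actual code; the function name decode_params is made up for illustration, and it only shows that both encodings can end up as the same dict of parameters.

```python
import json
from urllib.parse import parse_qsl


def decode_params(content_type, body):
    """Return the request parameters as a dict, whatever the encoding."""
    # Drop any "; charset=..." suffix and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json":
        params = json.loads(body)
        if not isinstance(params, dict):
            raise ValueError("expected a JSON object")
        return params
    if media_type == "application/x-www-form-urlencoded":
        # Form values always arrive as strings.
        return dict(parse_qsl(body, keep_blank_values=True))
    raise ValueError("unsupported Content-Type: " + media_type)
```

One difference the downstream code would still have to tolerate: JSON can carry numbers and booleans, while form values are always strings.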
Error responses are often misidentified as content-type: application/json, but actually contain a text/plain error message. This isn't universally true; for example the 401 response actually IS JSON. A client has to guess and handle decoding exceptions.
That's clearly a bug. The Content-Type should always be accurate. It would be a new feature to also support JSON error responses.
Except that for some responses, you already do...
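Until the Content-Type is reliable, a client can guard the decode roughly like this. It is a sketch only; decode_body is a hypothetical helper name, not part of any client library.

```python
import json


def decode_body(content_type, text):
    """Return (kind, payload), where kind is "json", "text" or "empty"."""
    if not text:
        return ("empty", None)
    if "json" in content_type.lower():
        try:
            return ("json", json.loads(text))
        except ValueError:
            # Mislabelled body: fall through and treat it as plain text.
            pass
    return ("text", text)
```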
The API presents resources hierarchically, rooted at '/'. The top level resources include /users, /lists, /addresses, /domains, /system.
The next level is a resource id.
Kind of. It all depends on which top-level resource is returned. This makes sense when you think about the object based traversal machinery.
Each path component points internally to an object, and the subsequent path component is handled by that object. If that returns another object, then the next remaining path component is consumed by *that* object. And so on until we reach the end of the path. Then the object at the leaf responds to the HTTP command, and each object knows how to represent itself as JSON, and it knows its canonical location within the resource tree (there can be multiple paths to any particular object, but there's always one canonical location).
That's a better description from inside. I was trying to summarize what it looks like to someone trying to use the API....
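For readers outside the code, the traversal described above can be pictured with a toy model. The Resource class, the traverse function and the example paths are all invented for illustration and are not Mailman's real classes.

```python
class Resource:
    """A node in a toy resource tree."""

    def __init__(self, canonical_path, children=None):
        self.canonical_path = canonical_path
        self.children = children or {}

    def child(self, name):
        # Each object decides how to interpret the next path component.
        return self.children[name]

    def as_json(self):
        # Every resource reports its one canonical location.
        return {"self_link": self.canonical_path}


def traverse(root, path):
    """Walk the path one component at a time, starting from root."""
    node = root
    for component in path.strip("/").split("/"):
        if component:
            node = node.child(component)
    return node


# The same address object is reachable by two different paths.
address = Resource("/addresses/anne@example.com")
user = Resource("/users/1", {
    "addresses": Resource("/users/1/addresses", {"anne@example.com": address}),
})
root = Resource("/", {
    "addresses": Resource("/addresses", {"anne@example.com": address}),
    "users": Resource("/users", {"1": user}),
})
```

Whichever path is walked, the leaf object reports the same self_link.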
There is an id which is stable for the lifetime of an object, and a current name (which can be changed). For example, a list has a name like mylist@example.com, and a list_id of mylist.example.com. But if you change the list name, it becomes mynewname@example.net, while the list_id remains mylist.example.com.
Lists are weird in that they have two identifiers. One is the posting address, what we internally (and slightly incorrectly) call the "fully qualified list name". I don't particularly like that nomenclature anymore, but we live with it.
We all have similar artifacts in any non-trivial code...
A posting address can change, e.g. if you rename or rehome a list.
The (RFC 2919) List-ID is assigned when the mailing list is created and it's immutable. Section 4 discourages changing the List-ID and Mailman takes that as a requirement. Rename a list or rehome it to a different domain on the same server and the List-ID will never change. Yes. Again for historical reasons, many APIs both internal and external used the posting address to identify the list, but this is wrong exactly because that can change. I've slowly been converting APIs to accept both the posting address and the List-ID when identifying the mailing list. New APIs generally accept only the List-ID. Bottom line, it's best to use the List-ID.
I noticed that, and indeed go to some pains to fetch the list_id for chained operations.
Lists are associated with "domains", which are the "domain part" of the list's address. That is, the part after the @. This is sometimes referred to as the "mail host", but there need not be a real host.
Yep, again historical nomenclature.
I understand the history. But I was very, very confused by the overlaid uses. Even in the GUI, I couldn't decide what I should enter. (And I'm fairly familiar with e-mail RFCs and history...perhaps that's my problem.) Documentation needs improvement. Feel free to use (or improve) my words.
I haven't looked at the code (not being a Python person), but I'd be surprised if this was not feasible.
A domain has to be created before you can create a list with an address in that domain. Some APIs (e.g. the mailman create CLI) will create the domain automatically if it doesn't already exist, unless you disable that explicitly.
I was writing exclusively about the REST API. I couldn't create a list without the domain, so I create the domain unconditionally (and ignore the 'it exists' error); that's cheaper than query+create. "Create if not exists" would be a useful API extension, since clients would see a success code either way. (200 OK for exists, 201 Created for new).
Resources are created with a POST to their top-level resource. To create a domain, post to /domains with mail_host => the domain, and (optionally) description => a description for the GUI. The response isn't JSON as one would expect. In fact, it's an empty application/json response with a 201 status.
To create a list, one POSTs to /lists. This post takes a restricted set of parameters; in the case of a list, just its fqdn_listname (and an optional style - which isn't well defined). The response isn't JSON as one would expect. The Location header of the response contains a URL of the new list.
What else would you expect? From my reading of books like RESTful Web Services (admittedly a long while ago), that's exactly the proper response to an appending POST. Return a 201 with a Location header to the new resource and empty content.
I didn't read the book - but I have talked to a number of services that claim to provide a RESTful API. Of course, REST is a style, not a specification, so mileage varies...
I would expect the new resource address to be in the JSON body of the response. I don't mind the Location header (if you imagine the response to be a 301, it kind of makes sense). But a client (well, at least mine) wants a consistent place to look for the result of an operation. That would be the HTTP Status + JSON reply.
This is inconsistent - and with JSON REST client libraries, it turns out to be more difficult to get at the headers. (I patched the one that I use...)
My 3 cents: if an interface is JSON, it should be usable entirely as JSON. JSON in, all JSON out.... This simplifies life for a client:
Get a response, decode the JSON - and everything is in the decoded JSON (usually an array or associative array in whatever language). Not some in the JSON, some in the Response headers (and some in the HTTP status - which usually is replicated in the response JSON), and some plaintext in the response body.
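What I do on the client side amounts to something like the following sketch. The function name normalize and the key names http_status and location are my own choices, not anything the API defines.

```python
def normalize(status, headers, decoded):
    """Fold HTTP status, Location header and decoded JSON into one dict.

    decoded is the already-parsed JSON body (a dict), or None when the
    response body was empty.
    """
    result = dict(decoded or {})
    result["http_status"] = status
    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            result["location"] = value
    return result
```

With this, the rest of the client looks in one place regardless of whether the server answered with a body, a header, or both.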
To configure the list, you have to follow up with a PUT or PATCH to that URL - /config. This is where you can set the description, posting policy, etc. It's unrealistic to do a PUT; even if you're cloning another list, there's no programmatic way to determine which attributes are writable. PATCH what you know...
Mailing list resources are somewhat unique in that they probably have the most properties of any resource/object in the system. That's not surprising if you think about it, but it does make PUTing to a mailing list more or less impractical. That's certainly not true of other, smaller resources though, and of course, you still want symmetry there (plus, implementation-wise, it's almost a no brainer to support both PUT and PATCH).
The problem is that from a client, there is no way to know which attributes are writable, and which are immutable. So get (template), modify, put (new) doesn't work.
There should be a way to find out what attributes are writable. Or the API could ignore writes to immutable attributes, and return "success" (or in a JSON reply, "partial success") to the client....
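Lacking that, the only workaround I see is a hand-maintained list of read-only names on the client. A sketch, where build_patch is a hypothetical helper and the read_only set passed in is the client's own guess, not something the API reports:

```python
def build_patch(template, overrides, read_only):
    """Merge overrides into a copy of template and drop read-only keys.

    template  -- config dict fetched with GET from an existing list
    overrides -- the values to change for the new list
    read_only -- attribute names the client believes are immutable
    """
    merged = {**template, **overrides}
    return {key: value for key, value in merged.items()
            if key not in read_only}
```

The weakness is obvious: every time the server adds or changes an immutable attribute, the client's list is silently wrong.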
Members are associated with e-mail addresses - which belong to Users. Users can have multiple addresses, addresses can be linked to only one user, but may be unlinked, and members associate "subscribers" to mailing lists, where a subscriber is either a user-with-preferred-address, or an address.
I saw that. I didn't look into the unlinking part, as (oddly enough) I'm only trying to do a restricted subset of the possible operations. All I need to do is create & configure a list (with a couple of members) on the fly. (Oh, and post an announcement of the fact on a related list.) The tricky part is that it's all done without human intervention.
You create a user by posting to /users with email => the email address, and optionally display_name => the name string. (A user can also be implicitly created by subscribing an e-mail address, but that gets confusing.) The e-mail address is the primary email address for the user. More later. Again, you get a Location header in the response, which you can use to PATCH /preferences to set delivery_status, etc. These preferences are part of a hierarchy - many have a system default, a list default, the user default, and a subscription to that list value. You can find the User associated with a list by a GET of /addresses/address@example.net.
Technically, the returned user isn't necessarily associated with a list. It *may* be subscribed to the list, but that relationship is represented by a member.
I understood that.
This GET returns two URLs: user => the user owning this address, and self_link => the address object.
*All* resources in the REST API have a self_link, and while that may seem redundant, it's not. As mentioned above, you can take various paths through the resource tree to get to a particular resource, but regardless of that, every resource has exactly one canonical location. That location is represented by the self_link.
Yes.
Note also that there are both an email and original_email attribute. The latter preserves case. The former is used internally by Mailman as a resource key, etc. (Though exactly what happens if John@example.net and john@example.net both subscribe is unclear.)
They can't. Mailman is case-preserving case-insensitive for email addresses. Technically speaking, john@example.com and JOHN@example.com can be different mailboxes, but that never happens in practice anymore, and Mailman has always explicitly treated them as the same address. This goes back to the earliest days of Mailman.
I still run into systems that demand exact case for delivery. I agree that two different humans assigned to such a mailbox would go (or already are) crazy. But I have seen John using john and JOHN as two different identities. I have suggested using '+' subaddressing instead. Then again, 70% of the e-mail address validators that people come up with reject clark+kent@example.net because of the '+'...
I am curious as to whether anything breaks if I subscribe to a mailman list with a plus address, but it's low on my list...
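To make "case-preserving, case-insensitive" concrete, here is a toy model. The AddressBook class is invented for illustration and says nothing about how Mailman stores addresses; only the email and original_email attribute names come from the API.

```python
class AddressBook:
    """Toy registry keyed on the lower-cased address."""

    def __init__(self):
        self._by_key = {}

    def add(self, original_email):
        key = original_email.lower()
        if key in self._by_key:
            # John@ and john@ are treated as the same address.
            raise ValueError("address already registered: " + key)
        record = {"email": key, "original_email": original_email}
        self._by_key[key] = record
        return record

    def get(self, email):
        # Lookups ignore case; the stored spelling is preserved.
        return self._by_key.get(email.lower())
```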
Once a list is configured, you can add members. This requires the list_id - which you don't (officially) have. So you do a GET on the list resource, to get the list_id. Then you can subscribe a member with list_id => (the list id), subscriber => email address. Optionally you can pre_verify/confirm/approve the member and/or add a display_name. POST to /members. Again, you get a Location header back. You can't specify everything you might like as a creation attribute. So you may have to PATCH the member to, for example, set the moderation_action.
This is true, but it also kind of makes sense if you grok the way preferences work, i.e. hierarchically as you describe above. TBH, I don't particularly like the way preferences are modeled either internally or through the REST API, but I haven't been able to come up with anything better.
You may also need to PATCH the member /preferences if you want to set list-level delivery status, etc.
All true. In each of those cases, you may be talking to a different preference resource.
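The hierarchical lookup can be pictured as "first layer that has a value wins". This is a sketch only: effective_preference is a made-up name, and the order in which layers are passed is whatever the caller assumes the hierarchy to be, not a statement of Mailman's actual resolution order.

```python
def effective_preference(name, *layers):
    """Return the first value set for name, searching layers in order.

    Each layer is a dict of preferences; pass the most specific layer
    first and the system defaults last. Unset values are None or absent.
    """
    for layer in layers:
        value = layer.get(name)
        if value is not None:
            return value
    return None
```

For example, with an empty member layer, a user layer of {"delivery_status": "by_user"} and a system layer of {"delivery_status": "enabled"}, the lookup yields the user's value.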
Consider adding at least an owner when creating a list. I think that would be a useful addition to the REST API for list creation.
One challenge is that almost everything requires multiple REST operations to set up. But REST (by definition) is stateless. So the best you can do is order operations & hope.
I don't understand this. REST is stateless, but not all HTTP operations are idempotent. POST certainly isn't, by definition, so if you use it to create a new resource under a collection, you clearly cannot modify that resource through PUT or PATCH until the resource is created, which only happens when POST succeeds.
I've written up an issue on this with some more detail/thoughts and will post (er, no pun intended) it shortly. The short form is that (as noted below), I want to atomically create a list - either it exists (configured the way I need it to be), or it doesn't exist. That's not possible. So I create a domain, create a user, configure the user, create a list, configure the list, add the user - but at any point in the middle, a message could arrive for the not-quite-configured list. This isn't good...
The mailman client examples refer to transactions (e.g. in users.rst, there is 'transaction.commit()') - but REST can't hold state. It does appear that the server uses DB transactions to ensure that any given REST operation is ACID, but the composite operations (e.g. create a list and set its config) cannot be atomic. This is an architectural flaw in the API.
I think what you're getting at is that some POSTs to create new resources do not accept the same parameters as the subsequent PUT or PATCH on the newly created resource. I think that may be true, but it's not universally so. E.g. POST on /domains creates a new domain, returning the location of the new resource. You can provide both a description and an owner, and you can also provide a mail_host. But mail_host is immutable, and PATCH or PUT support changing all mutable attributes on the domain.
That should be the general rule; if an attribute on a resource is mutable, you should be able to PATCH and PUT it, *and* you should be able to specify all mutable attributes, plus some immutable ones on the POST that creates the resource.
Yes. I can't specify everything about an object on many posts - list creation being the most glaring example.
Also, things linked to the object - e.g. if I create a member, I have to: POST (create the member); PATCH (set moderation_action); PATCH (preferences) to set delivery_status, hide_address, receive_list_copy, receive_own_postings. That's three REST calls for what should be one ATOMIC transaction.
Plus, this is separate from the list creation, so there's a window between creating the list (even if it were atomic - which it isn't) and the time its required members are subscribed.
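Reduced to its skeleton, the sequence looks like the sketch below. The create_list function and its call parameter are hypothetical: call(method, url, data) stands in for whatever HTTP client is used and is assumed to return (status, location, decoded_body). Paths are relative to the API root, and the separate user-creation and member-preference steps are left out to keep it short.

```python
def create_list(call, fqdn_listname, subscriber, config):
    """Create, configure and populate a list; return (list_url, member_url)."""
    mail_host = fqdn_listname.split("@", 1)[1]

    # 1. Domain first; an "already exists" error is deliberately ignored.
    call("POST", "/domains", {"mail_host": mail_host})

    # 2. Create the list; its URL comes back in the Location header.
    status, list_url, _ = call("POST", "/lists",
                               {"fqdn_listname": fqdn_listname})
    if status != 201:
        raise RuntimeError("list creation failed: %s" % status)

    # 3. From here until the end, the list exists but is incomplete.
    call("PATCH", list_url + "/config", config)

    # 4. Subscribing needs the list_id, which takes another GET.
    _, _, body = call("GET", list_url, None)
    status, member_url, _ = call("POST", "/members", {
        "list_id": body["list_id"],
        "subscriber": subscriber,
    })
    if status != 201:
        raise RuntimeError("subscription failed: %s" % status)
    return list_url, member_url
```

Every numbered step is its own request, so a failure or an incoming message between steps 2 and 4 finds a half-built list.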
GET should return all attributes, mutable or immutable. This rule may not be strictly adhered to for all resources and collections, but I would consider that a bug, not an architectural flaw.
AFAIK, this is all working as any well-defined web service should work.
Symbolic names (which are required) for attributes are in the src/mailman/interfaces/(class).py.
Please note that the interfaces under this package are for the *internal* API, which often, but not always, is exposed in the REST API. These two APIs serve different purposes so they cannot be a one-to-one mapping, and there are many resources in the REST API that don't correspond directly to internal objects, and there are many internal objects that are not exposed to the REST API.
There was no other place I could find that defined attribute values for enumerated items. E.g. delivery_status => 'by_user' - interfaces is where I found how to spell 'by_user'.
Maybe I missed something more obvious.
What you can get/set comes from src/mailman/rest/(class).py.
Yes, so clients of the REST API are served by objects in this package. Plugins and other internal operations are served by the objects that implement interfaces in the src/mailman/interfaces package. This is a strict separation of concerns, and it mirrors other 'external facing' interfaces, of which the command line is another example.
I keep struggling with wanting to understand the external model of the API without having to understand the internal implementation. When the API is formally documented, I encourage you to de-emphasize the internal model and concentrate on the external interfaces in client terms...
Hope that helps.
Yes, thanks.
-Barry