Ahoi,
Except for [1], I was not able to find anything about what kind of data is stored in the database(s) of a Mailman 3 system.
Could somebody kindly point me to some documentation I may have missed,
or share some insights?
greetings aiz
P.S. This post is related to another one about database sizing.
[1] https://docs.mailman3.org/en/latest/architecture.html#understanding-user-and...
Alexander Inzinger-Zrock via Mailman-users writes:
> Except for [1], I was not able to find anything about what kind of data is stored in the database(s) of a Mailman 3 system.
I don't think there's a formal schema in the documentation. It's an object database backed by a relational database managed by SQLAlchemy. You can get the SQL schema from your RDBMS, I suppose.
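As a minimal sketch of pulling the schema out of the RDBMS, assuming the SQLite backend: the function below reads the stored `CREATE TABLE` statements from SQLite's `sqlite_master` catalog using only the standard library. The `dump_schema` helper and the `demo_address` table are hypothetical names made up for illustration, not Mailman's real tables. On PostgreSQL, `pg_dump --schema-only` produces the equivalent output.

```python
import sqlite3


def dump_schema(conn):
    """Return {table_name: CREATE TABLE statement} for every user table."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "  # skip SQLite's internal tables
        "ORDER BY name"
    ).fetchall()
    return {name: sql for name, sql in rows}


if __name__ == "__main__":
    # Hypothetical stand-in database; connect to your own database file instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_address (id INTEGER PRIMARY KEY, email TEXT)")
    for name, sql in dump_schema(conn).items():
        print(f"-- {name}\n{sql};\n")
```

Reading the catalog rather than the application code shows what is actually stored, including any tables created by migrations.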
Regarding the semantics of the data, they're defined by the model classes in mailman/src/mailman/model.
About sizes, I have a brand-new, nearly empty dedicated Mailman PostgreSQL installation that takes up 69MB for the whole cluster. (That's for 129 users, split across 6 lists of about 25 subscribers each, data for all three components: Postorius, HyperKitty, and core. HyperKitty usage is negligible, only 10 messages total so far.)
A Mailman 3 installation with a 20 year history (recently upgraded from Mailman 2), somewhere between 500 and 1000 lists, and on the order of 100,000 users takes up 103MB as a text dump[1]. That's going to compress by a double-digit factor when loaded back into Postgres, I bet. This is just core and Postorius data.[2] In other words, the mailing list cost of Mailman is going to be insignificant compared to the overhead of a PostgreSQL installation unless you're really huge.
There's a third system I don't have access to any more (another recent upgrade), and IIRC the (dedicated to Mailman) database cluster took up about 128MB on disk, with ~20k lists and ~50k users. I don't even want to think about mail throughput! Some of that database space was archives, but it's hard to say how much. Tens of megabytes at least, probably (the predecessor Mailman 2 installation was about 15 years old). The primary use case for lists was periodic automatically generated reports on system health and usage statistics, and notifications, which weren't archived. But there were hundreds of discussion and announce lists for customers and internal users, many of which were archived.
The Mailman and Postorius tables in the database don't grow significantly with time. The deciding factors are the number of users and the number of lists.
So the bottom line is, Mailman database size and growth are a rounding error on any modern system.
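To put the figures above on a common scale, here is a back-of-envelope sketch that divides each reported size by its user count. The helper name `kib_per_user` is made up for illustration, "MB" is assumed to mean 1024 KiB, and the user counts are the rough ones quoted above, so treat the results as orders of magnitude only.

```python
def kib_per_user(total_mb, users):
    """Average storage per user in KiB, assuming 1 MB = 1024 KiB."""
    return total_mb * 1024 / users


# (label, size in MB, approximate user count) as quoted in this thread
sites = [
    ("new install, whole cluster", 69, 129),
    ("20-year site, text dump", 103, 100_000),
    ("third site, cluster on disk", 128, 50_000),
]

for label, size_mb, users in sites:
    print(f"{label}: {kib_per_user(size_mb, users):.2f} KiB per user")
```

The small installation comes out at hundreds of KiB per user because the fixed PostgreSQL overhead dominates, while the two large sites land at a few KiB per user.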
Archives are another matter. But again, the database cost of the metadata is insignificant compared to message storage and (likely to be even bigger) full-text indexing. I'd estimate that the 500-list site adds about 5TB of message data a year and about the same in indexing.[2] But the rate of growth for any given site is going to be extremely case-dependent, and the part that's due to the database metadata is going to be a tiny fraction, unless your users are extremely economical with their messages.
Steve
[1] The PostgreSQL cluster supports several other applications; it takes up gigabytes, almost none of it Mailman.
[2] It doesn't use HyperKitty for archiving so I don't know how much it would add in database metadata.
participants (2)
- Alexander Inzinger-Zrock
- Stephen J. Turnbull