Setup Question: Disk-Layout (Migrating from Mailman 2.1)

Hello,
I am currently running the latest available Mailman-2 version on Ubuntu 20.04! Now it is really time to move on! :-) I want to set up a new server (Ubuntu-24.04) and install Mailman-3 with the Virtualenv setup method: https://docs.mailman3.org/en/latest/install/virtualenv.html As I know from my old Mailman-2.1, the archives can get quite big... ;-) So it seems very important to provide enough (expandable) disk space in the right place. (I plan to use a separate partition with LVM for the path in question.)
But I could find no information about the location where HyperKitty stores the archives on an Ubuntu/Linux system. I assume the archives are stored underneath "/var/...", but this is only a guess. Is this guess right?
Another question is: How much space do I have to plan (roughly) for the "/opt" partition for the Mailman install if the archives are stored under "/var/..."?
Any guidance is very much appreciated!
Thank you very much
Chris

- On 4/17/25 12:18, christian.schneider@tu-dortmund.de wrote:
I want to set up a new server (Ubuntu-24.04) and install Mailman-3 with the Virtualenv setup method: https://docs.mailman3.org/en/latest/install/virtualenv.html As I know from my old Mailman-2.1 the archives could get quite big... ;-) So it seems very important to provide enough (expandable) disk space at the right place. (I plan to use a separate partition with LVM for the path in question)
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
Not quite.
To the best of my knowledge, by default, a venv installation is fully contained under /opt/mailman, with archives and stuff stored at /opt/mailman/var/lib/...
Another question is: How much space do I have to plan (roughly) for the "/opt"-partition for the mailman install if the archives are stored under "/var/..."?
Depends on your archives, the way things are archived (you can choose different database backends, and these might make a difference in how efficiently or inefficiently data is stored) and whether you want to use the full-text search index, which adds another big chunk of data.
I can give you a few pointers regarding my setup: my mailman 2.1 data directory takes up about 8.2 GB of disk space, most of which are archives, naturally (8.1 GB). I've imported most of the big archives to mailman 3 already, with a full-text search index, and that uses 5.2 GB disk space currently, but I'm still missing a few archives, so that'll still grow a bit. It will probably end up at about the same size, but I can come back to that once my import is fully done (which sadly will take a few days).
Now, the 8.1 GB of mailman 2.1 archives are stored pretty inefficiently, because they are essentially duplicated (as a private mbox archive and public text archives). Mailman 3 doesn't do that - at least not directly - but instead stores archives once into the database, and queries the database whenever the archives are accessed via hyperkitty, so I would *assume* the size you need for the mailman 3 data after a successful migration is just a bit shy of what you needed for the mailman 2.1 data, maybe even a bit less.
Mihai
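For a rough idea of how much an existing Mailman 2.1 data directory occupies before sizing the new volume, a small script can total the file sizes. This is a minimal sketch using only the Python standard library; `dir_size_bytes` and `demo_dir_size` are hypothetical helper names, and the result is the apparent file size, which can differ from the block usage that `du` reports.

```python
import os
import tempfile
from pathlib import Path


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def demo_dir_size():
    """Build a throwaway tree with known file sizes and measure it."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "archives" / "private").mkdir(parents=True)
        (base / "archives" / "private" / "list.mbox").write_bytes(b"x" * 1000)
        (base / "lists").mkdir()
        (base / "lists" / "config.pck").write_bytes(b"y" * 24)
        return dir_size_bytes(base)


if __name__ == "__main__":
    print(demo_dir_size(), "bytes")
```

On a real system one would call `dir_size_bytes()` on the actual Mailman 2.1 data directory; where that lives depends on how Mailman was installed.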

Thank you for your instant reply!
Mihai Moldovan wrote:
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right? Not quite. To the best of my knowledge, by default, a venv installation is fully contained under /opt/mailman, with archives and stuff stored at /opt/mailman/var/lib/...
This information is very helpful! Currently I have about 45 GB of archives, and these archives only hold messages from the last year... I run a clean-up script on the first day of every month which deletes posts older than 365 days... So there is a lot of disk space needed... ;-) (and even more if I plan to keep posts around for a longer time)
Another question is: How much space do I have to plan (roughly) for the "/opt"-partition for the mailman install if the archives are stored under "/var/..."? Depends on your archives, the way things are archived (you can choose different database backends, and these might make a difference in how efficiently or inefficiently data is stored) and whether you want to use the full-text search index, which adds another big chunk of data. I can give you a few pointers regarding my setup: my mailman 2.1 data directory takes up about 8.2 GB of disk space, most of which are archives, naturally (8.1 GB). I've imported most of the big archives to mailman 3 already, with a full-text search index, and that uses 5.2 GB disk space currently, but I'm still missing a few archives, so that'll still grow a bit. It will probably end up at about the same size, but I can come back to that once my import is fully done (which sadly will take a few days). Now, the 8.1 GB of mailman 2.1 archives are stored pretty inefficiently, because they are essentially duplicated (as a private mbox archive and public text archives). Mailman 3 doesn't do that - at least not directly - but instead stores archives once into the database, and queries the database whenever the archives are accessed via hyperkitty, so I would *assume* the size you need for the mailman 3 data after a successful migration is just a bit shy of what you needed for the mailman 2.1 data, maybe even a bit less.
Perfect! :-) If I mount an extra partition of 250 GB at /opt, this should carry me for a longer time... And - if I set up a logical volume with LVM - there is an option to expand the space with ease... ;-)
Thank you very much for your help!
Chris
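The sizing reasoning above can be written down as simple arithmetic. The sketch below is only an illustration under stated assumptions: it takes the 45 GB per year and the 250 GB volume from this thread, assumes a constant growth rate with no deletion of old posts, and uses a hypothetical `overhead_factor` to stand in for extra space such as a full-text search index. The real Mailman 3 footprint may differ from the Mailman 2.1 numbers.

```python
def years_of_headroom(volume_gb, growth_gb_per_year, overhead_factor=1.0):
    """Estimate how many years a volume lasts at a constant growth rate.

    overhead_factor is a hypothetical multiplier for additional data
    (for example a search index); 1.0 means no extra overhead.
    """
    if growth_gb_per_year <= 0 or overhead_factor <= 0:
        raise ValueError("growth and overhead must be positive")
    return volume_gb / (growth_gb_per_year * overhead_factor)


if __name__ == "__main__":
    # Figures from the thread: 250 GB volume, about 45 GB of archives per year.
    print(round(years_of_headroom(250, 45), 1))
    # Same volume with an assumed 50% overhead on top of the raw archives.
    print(round(years_of_headroom(250, 45, overhead_factor=1.5), 1))
```

With these inputs the estimate is roughly five and a half years without overhead, and less once any overhead is assumed, which is where the ability to grow a logical volume becomes useful.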

On Thu, Apr 17, 2025 at 3:14 PM Christian Schneider < christian.schneider@tu-dortmund.de> wrote:
Perfect! :-) If I mount an extra partition of 250 GB at /opt, this should carry me for a longer time... And - if I set up a logical volume with LVM - there is an option to expand the space with ease... ;-)
Does your DB backend also store the data files in /opt? I doubt it. The archives will be stored in the DB backend. So unless you use the SQLite backend, the data will go wherever you configure your DB to store data, whether it's MariaDB or PostgreSQL.
-- Best regards, Odhiambo WASHINGTON, Nairobi, KE

But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
The Virtualenv Installation guide has you install PostgreSQL as the database backend, and HyperKitty stores attachments in the database as blobs. The default data directory for PostgreSQL on Ubuntu 24.04 LTS is: /var/lib/postgresql/16/main
However, if you wanted, you could change the location before initializing the database by updating the data_directory setting in the postgresql.conf file located at: /etc/postgresql/16/main/postgresql.conf
Another option could be to use symlinks. Move the data directory to a location of your choice and then create a symlink in the original location that points to the new/actual location.
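The symlink approach can be sketched as follows. This is a minimal illustration that runs against a temporary directory, not against a real PostgreSQL installation; `relocate_with_symlink` and `demo_relocate` are hypothetical helper names. On a real server the database would have to be stopped first, and ownership and permissions of the moved directory would have to be preserved; none of that is handled here.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(original, new_location):
    """Move a directory and leave a symlink at the original path."""
    original = Path(original)
    new_location = Path(new_location)
    if original.is_symlink():
        raise ValueError(f"{original} is already a symlink")
    if new_location.exists():
        raise FileExistsError(f"{new_location} already exists")
    # Make sure the parent of the new location exists before moving.
    new_location.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(original), str(new_location))
    # The old path now points at the new/actual location.
    original.symlink_to(new_location, target_is_directory=True)
    return original


def demo_relocate():
    """Exercise the helper on a throwaway tree that mimics /var and /opt."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        old = base / "var" / "lib" / "data"
        old.mkdir(parents=True)
        (old / "PG_VERSION").write_text("16\n")
        new = base / "opt" / "data"
        link = relocate_with_symlink(old, new)
        return (
            link.is_symlink(),
            (link / "PG_VERSION").read_text(),
            link.resolve() == new.resolve(),
        )


if __name__ == "__main__":
    print(demo_relocate())
```

Changing the configured data directory avoids the extra indirection of a symlink, so the sketch is mainly useful for seeing what the symlink variant amounts to.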


I just set up the Ubuntu-24 server and placed the extra partition at /opt
German Rodriguez wrote:
The Virtualenv Installation guide has you install PostgreSQL as the database backend and HyperKitty stores attachments in the database as blobs. The default data directory for PostgreSQL in Ubuntu 24.04 LTS is located at: /var/lib/postgresql/16/main However, if you wanted you could always change the location before initializing the database by updating the setting in the postgresql.conf file located at: /etc/postgresql/16/main/postgresql.conf
Based on the above suggestion, this is the approach I will follow:
- Using PostgreSQL as proposed in the guide for the Virtualenv setup
- Moving the PostgreSQL data directory from /var to /opt (perhaps /opt/postgresql/16/main)
With all dynamic data kept on /opt (a logical volume, current size 250 GB) I am pretty safe for all that might come! xD
Thank you for all the input!
Chris

On 4/17/25 3:18 AM, christian.schneider@tu-dortmund.de wrote:
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
As answered in other replies, HyperKitty stores its archives in tables in the configured database.
However, if the prototype archiver is enabled, those archives are stored in maildir format in Mailman's var/archives/prototype/ directory.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
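Since the prototype archiver writes plain Maildir directories, their contents can be inspected with Python's standard `mailbox` module. The following is a minimal sketch; `count_maildir_messages` and `demo_count` are hypothetical helper names, and the assumption that each list has its own Maildir below var/archives/prototype/ should be checked against the actual directory layout on the server.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def count_maildir_messages(path):
    """Return the number of messages in an existing Maildir."""
    md = mailbox.Maildir(path, create=False)
    try:
        return len(md)
    finally:
        md.close()


def demo_count():
    """Create a throwaway Maildir with three messages and count them."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "prototype" / "example.list"
        target.parent.mkdir(parents=True)
        md = mailbox.Maildir(str(target), create=True)
        for n in range(3):
            msg = EmailMessage()
            msg["From"] = "a@example.com"
            msg["Subject"] = f"test {n}"
            msg["Message-ID"] = f"<{n}@example.com>"
            msg.set_content("hello")
            md.add(msg)
        md.close()
        return count_maildir_messages(str(target))


if __name__ == "__main__":
    print(demo_count())
```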

I wondered about the difference of the two archivers (prototype/hyperkitty) and whether both should be enabled.
Mark explains here that the prototype archiver is for storing the raw email message, whereas HyperKitty only stores select information. https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...

- On 4/17/25 18:47, German Rodriguez wrote:
I wondered about the difference of the two archivers (prototype/hyperkitty) and whether both should be enabled.
Mark explains here that the prototype archiver is for storing the raw email message, whereas HyperKitty only stores select information. https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
Interesting. I probably want the prototype archiver enabled, even if that means duplicating most of the archives and doubling the amount of data stored.
I was not aware that HyperKitty doesn't store all information. I'd like to have all information archived (in order to be able to restore it again, if need be), even if it isn't published.
Now... is there a way to import mbox files into the prototype archiver? Looking around, I haven't found anything mentioning this. Since it's just using the Maildir format, I guess I could find a tool that unpacks messages from an mbox to a Maildir destination and get that done, but if mailman has some command to do that, it would be even better.
Mihai

On 4/17/25 10:20 AM, Mihai Moldovan wrote:
Now... is there a way to import mbox files into the prototype archiver? Looking around, I haven't found anything mentioning this. Since it's just using the Maildir format, I guess I could find a tool that unpacks messages from an mbox to a Maildir destination and get that done, but if mailman has some command to do that, it would be even better.
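There is no Mailman command to do this, but a Python script is very simple. Something like:
from mailbox import Maildir, mbox
# Open the existing mbox and the existing target Maildir (neither is created).
mb = mbox('path/to/mbox', create=False)
md = Maildir('path/to/maildir', create=False)
# Copy every message from the mbox into the Maildir.
for msg in mb:
    md.add(msg)
mb.close()
md.close()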
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

- On 4/17/25 19:44, Mihai Moldovan wrote:
Great, that will definitely create it in the format expected by Python and hence mailman! Thank you very much.
Then again, I realized that just adding messages from a Mailman 2.1 mbox to the prototype Maildir is not the correct route to take, because they'd be missing at least the Message-ID-Hash and Archived-At headers (the latter should probably point to the HyperKitty archives, which makes it just a bit more difficult). There might be other headers that need to be added or mangled; these are just the two that stood out to me.
Getting this right is more involved, especially since these headers are generated in the guts of mailman and commonly seem to be retrieved from a msgdata object coming from the switchboard.
New incoming messages will have the correct data, of course, but imported ones wouldn't, so I'll have to use a more sophisticated approach to handle them, probably by going through mailman's email wrapper, figuring out how to generate a msgdata object for a message, using RFC2369.process and maybe more.
Mihai
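To see which messages in an old mbox would lack the headers mentioned above, the mbox can be scanned with the standard `mailbox` module before importing anything. This is a minimal sketch that only reports missing headers and does not generate them; `find_missing_headers` and `demo_missing` are hypothetical helper names, and the header values in the demo are placeholders, not real hashes or archive URLs.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path

REQUIRED_HEADERS = ("Message-ID-Hash", "Archived-At")


def find_missing_headers(mbox_path, required=REQUIRED_HEADERS):
    """Return {message key: [missing header names]} for incomplete messages."""
    mb = mailbox.mbox(mbox_path, create=False)
    try:
        report = {}
        for key, msg in mb.iteritems():
            missing = [name for name in required if msg[name] is None]
            if missing:
                report[key] = missing
        return report
    finally:
        mb.close()


def demo_missing():
    """Write a two-message mbox: one without and one with the headers."""
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "example.mbox")
        mb = mailbox.mbox(path, create=True)
        old = EmailMessage()
        old["From"] = "a@example.com"
        old["Message-ID"] = "<1@example.com>"
        old.set_content("imported from Mailman 2.1")
        mb.add(old)
        new = EmailMessage()
        new["From"] = "b@example.com"
        new["Message-ID"] = "<2@example.com>"
        new["Message-ID-Hash"] = "PLACEHOLDERHASH"  # placeholder value
        new["Archived-At"] = "<https://lists.example.com/placeholder>"
        new.set_content("already processed")
        mb.add(new)
        mb.close()
        return find_missing_headers(path)


if __name__ == "__main__":
    print(demo_missing())
```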

On 4/17/25 11:35 AM, Mihai Moldovan wrote:
New incoming messages will have the correct data, of course, but imported ones wouldn't, so I'll have to use a more sophisticated approach to handle them, probably by going through mailman's email wrapper, figuring out how to generate a msgdata object for a message, using RFC2369.process and maybe more.
Here's an example script. You need to run this with /opt/mailman/mm/venv/bin/python to get access to the mailman imports.
from mailbox import Maildir, mbox
from mailman.email.message import Message
from mailman.handlers.rfc2369 import process
from mailman.interfaces.listmanager import IListManager
from mailman.utilities.email import add_message_hash
from zope.component import getUtility
# Read the mbox with Mailman's Message class as the message factory.
mb = mbox('path/to/mbox', factory=Message, create=False)
md = Maildir('path/to/maildir', create=False)
# Look up the target list; replace 'your.list.id' with the real list ID.
mlist = getUtility(IListManager).get_by_list_id('your.list.id')
for msg in mb:
    # Add the Message-ID-Hash header.
    add_message_hash(msg)
    # Run the RFC 2369 handler to add the list headers for this list.
    process(mlist, msg, {})
    md.add(msg)
mb.close()
md.close()
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (6)
- Christian Schneider
- christian.schneider@tu-dortmund.de
- German Rodriguez
- Mark Sapiro
- Mihai Moldovan
- Odhiambo Washington