Setup Question: Disk-Layout (Migrating from Mailman 2.1)

Hello,
currently I am running the latest available Mailman-2 version on Ubuntu 20.04! Now it is really time to move on! :-) I want to set up a new server (Ubuntu-24.04) and install Mailman-3 with the Virtualenv setup method: https://docs.mailman3.org/en/latest/install/virtualenv.html As I know from my old Mailman-2.1, the archives can get quite big... ;-) So it seems very important to provide enough (expandable) disk space at the right place. (I plan to use a separate partition with LVM for the path in question.)
But I could find no information about the location where HyperKitty stores the archives on an Ubuntu/Linux system. I assume the archives are stored underneath "/var/...", but this is only a guess. Is this guess right?
Another question is: How much space do I have to plan (roughly) for the "/opt" partition for the Mailman install if the archives are stored under "/var/..."?
Any guidance is very much appreciated!
Thank you very much
Chris

- On 4/17/25 12:18, christian.schneider@tu-dortmund.de wrote:
I want to set up a new server (Ubuntu-24.04) and install Mailman-3 with the Virtualenv setup method: https://docs.mailman3.org/en/latest/install/virtualenv.html As I know from my old Mailman-2.1 the archives could get quite big... ;-) So it seems very important to provide enough (expandable) disk space at the right place. (I plan to use a separate partition with LVM for the path in question)
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
Not quite.
To the best of my knowledge, by default, a venv installation is fully contained under /opt/mailman, with archives and stuff stored at /opt/mailman/var/lib/...
Another question is: How much space do I have to plan (roughly) for the "/opt"-partition for the mailman install if the archives are stored under "/var/..."?
Depends on your archives, the way things are archived (you can choose different database backends, and these might make a difference in how efficiently or inefficiently data is stored) and whether you want to use the full-text search index, which adds another big chunk of data.
I can give you a few pointers regarding my setup: my mailman 2.1 data directory takes up about 8.2 GB of disk space, most of which are archives, naturally (8.1 GB). I've imported most of the big archives to mailman 3 already, with a full-text search index, and that uses 5.2 GB disk space currently, but I'm still missing a few archives, so that'll still grow a bit. It will probably end up at about the same size, but I can come back to that once my import is fully done (which sadly will take a few days).
Now, the 8.1 GB of mailman 2.1 archives are stored pretty inefficiently, because they are essentially duplicated (as a private mbox archive and public text archives). Mailman 3 doesn't do that - at least not directly - but instead stores archives once into the database, and queries the database whenever the archives are accessed via hyperkitty, so I would *assume* the size you need for the mailman 3 data after a successful migration is just a bit shy of what you needed for the mailman 2.1 data, maybe even a bit less.
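To get a baseline figure like the ones quoted above, the size of the existing Mailman 2.1 data can be measured before the new volume is planned. The following is only a minimal sketch using the Python standard library; the function name `tree_size` and the path in the usage note are illustrative assumptions, not part of Mailman.

```python
import os


def tree_size(root):
    """Return the total size in bytes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, `print(tree_size('/var/lib/mailman/archives') / 1024**3)` would print the size in GiB, assuming the Mailman 2.1 archives live at that path on your system (adjust it if they do not).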
Mihai

Thank you for your instant reply!
Mihai Moldovan wrote:
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
Not quite. To the best of my knowledge, by default, a venv installation is fully contained under /opt/mailman, with archives and stuff stored at /opt/mailman/var/lib/...
This information is very helpful! Currently I have about 45 GB of archives, and these archives only hold messages from the last year... I run a clean-up script on the first day of every month which deletes posts older than 365 days... So a lot of disk space is needed... ;-) (and even more if I plan to keep posts around for a longer time)
Another question is: How much space do I have to plan (roughly) for the "/opt"-partition for the mailman install if the archives are stored under "/var/..."?
Depends on your archives, the way things are archived (you can choose different database backends, and these might make a difference in how efficiently or inefficiently data is stored) and whether you want to use the full-text search index, which adds another big chunk of data.
I can give you a few pointers regarding my setup: my mailman 2.1 data directory takes up about 8.2 GB of disk space, most of which are archives, naturally (8.1 GB). I've imported most of the big archives to mailman 3 already, with a full-text search index, and that uses 5.2 GB disk space currently, but I'm still missing a few archives, so that'll still grow a bit. It will probably end up at about the same size, but I can come back to that once my import is fully done (which sadly will take a few days).
Now, the 8.1 GB of mailman 2.1 archives are stored pretty inefficiently, because they are essentially duplicated (as a private mbox archive and public text archives). Mailman 3 doesn't do that - at least not directly - but instead stores archives once into the database, and queries the database whenever the archives are accessed via hyperkitty, so I would *assume* the size you need for the mailman 3 data after a successful migration is just a bit shy of what you needed for the mailman 2.1 data, maybe even a bit less.
Perfect! :-) If I mount an extra partition of 250 GB at /opt, this should carry me for a long time... And - if I set up a logical volume with LVM - there is an option to expand the space with ease... ;-)
Mihai
Thank you very much for your help!
Chris

On Thu, Apr 17, 2025 at 3:14 PM Christian Schneider < christian.schneider@tu-dortmund.de> wrote:
Thank you for your instant reply!
Mihai Moldovan wrote:
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
Not quite. To the best of my knowledge, by default, a venv installation is fully contained under /opt/mailman, with archives and stuff stored at /opt/mailman/var/lib/...
This information is very helpful! Currently I have about 45 GB of archives, and these archives only hold messages from the last year... I run a clean-up script on the first day of every month which deletes posts older than 365 days... So a lot of disk space is needed... ;-) (and even more if I plan to keep posts around for a longer time)
Another question is: How much space do I have to plan (roughly) for the "/opt"-partition for the mailman install if the archives are stored under "/var/..."?
Depends on your archives, the way things are archived (you can choose different database backends, and these might make a difference in how efficiently or inefficiently data is stored) and whether you want to use the full-text search index, which adds another big chunk of data.
I can give you a few pointers regarding my setup: my mailman 2.1 data directory takes up about 8.2 GB of disk space, most of which are archives, naturally (8.1 GB). I've imported most of the big archives to mailman 3 already, with a full-text search index, and that uses 5.2 GB disk space currently, but I'm still missing a few archives, so that'll still grow a bit. It will probably end up at about the same size, but I can come back to that once my import is fully done (which sadly will take a few days).
Now, the 8.1 GB of mailman 2.1 archives are stored pretty inefficiently, because they are essentially duplicated (as a private mbox archive and public text archives). Mailman 3 doesn't do that - at least not directly - but instead stores archives once into the database, and queries the database whenever the archives are accessed via hyperkitty, so I would *assume* the size you need for the mailman 3 data after a successful migration is just a bit shy of what you needed for the mailman 2.1 data, maybe even a bit less.
Perfect! :-) If I mount an extra partition of 250 GB at /opt, this should carry me for a long time... And - if I set up a logical volume with LVM - there is an option to expand the space with ease... ;-)
Does your DB backend also store the data files in /opt? I doubt it. The archives will be stored in the DB backend. So unless you use the SQLite backend, the data will go wherever you configure your DB to store its data, whether it's MariaDB or PostgreSQL.
-- Best regards, Odhiambo WASHINGTON, Nairobi,KE +254 7 3200 0004/+254 7 2274 3223 In an Internet failure case, the #1 suspect is a constant: DNS. "Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-) [How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]

But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
The Virtualenv Installation guide has you install PostgreSQL as the database backend and HyperKitty stores attachments in the database as blobs. The default data directory for PostgreSQL in Ubuntu 24.04 LTS is located at: /var/lib/postgresql/16/main
However, if you wanted you could always change the location before initializing the database by updating the setting in the postgresql.conf file located at: /etc/postgresql/16/main/postgresql.conf
Another option could be to use symlinks. Move the data directory to a location of choice and then create a symlink in the original location that points to the new/actual location.

Sorry. Didn't leave proper spacing after the original quote.
My complete reply
The Virtualenv Installation guide has you install PostgreSQL as the database backend and HyperKitty stores attachments in the database as blobs. The default data directory for PostgreSQL in Ubuntu 24.04 LTS is located at: /var/lib/postgresql/16/main
However, if you wanted you could always change the location before initializing the database by updating the setting in the postgresql.conf file located at: /etc/postgresql/16/main/postgresql.conf
Another option could be to use symlinks. Move the data directory to a location of choice and then create a symlink in the original location that points to the new/actual location.
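The symlink approach can be sketched as follows. This is only an illustration of the mechanism on ordinary directories, and the helper name `relocate_with_symlink` is hypothetical. For a real PostgreSQL data directory the server must be stopped first, and ownership and permissions must be preserved; note that `shutil.move` falls back to a copy across filesystems and does not preserve file ownership, so a tool such as `rsync -a` is probably the safer choice for the actual move. The alternative mentioned above is the `data_directory` setting in postgresql.conf.

```python
import os
import shutil


def relocate_with_symlink(old_path, new_path):
    """Move old_path to new_path and leave a symlink at old_path."""
    old_path = os.path.abspath(old_path)
    new_path = os.path.abspath(new_path)
    if os.path.islink(old_path):
        raise ValueError(f"{old_path} is already a symlink")
    if os.path.exists(new_path):
        raise FileExistsError(f"{new_path} already exists")
    # Make sure the parent of the new location exists.
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    # Renames on the same filesystem, copies and deletes otherwise.
    shutil.move(old_path, new_path)
    # The original location now points at the new one.
    os.symlink(new_path, old_path, target_is_directory=True)
```

A call such as `relocate_with_symlink('/var/lib/postgresql/16/main', '/opt/postgresql/16/main')` would mirror the layout discussed in this thread, with the caveats above.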

I just set up the Ubuntu-24 server and placed the extra partition at /opt
German Rodriguez wrote:
The Virtualenv Installation guide has you install PostgreSQL as the database backend and HyperKitty stores attachments in the database as blobs. The default data directory for PostgreSQL in Ubuntu 24.04 LTS is located at: /var/lib/postgresql/16/main
However, if you wanted you could always change the location before initializing the database by updating the setting in the postgresql.conf file located at: /etc/postgresql/16/main/postgresql.conf
Using the above suggestion this is the way I will follow:
- Using Postgresql as proposed in the guide for Virtualenv-Setup
- Moving the data directory of Postgresql from /var to /opt (/opt/postgresql/16/main (perhaps))
With all dynamic data kept on /opt (a logical volume, current size 250 GB) I am pretty safe for all that might come! xD
Thank you for all the input!
Chris

On 4/17/25 3:18 AM, christian.schneider@tu-dortmund.de wrote:
But I could find no information about the location where hyperkitty stores the archives on an Ubuntu-/Linux-system. I assume the archives are stored underneath "/var/..." this is only a guess. But is this guess right?
As answered in other replies, HyperKitty stores its archives in tables in the configured database.
However, if the prototype archiver is enabled, those archives are stored in maildir format in Mailman's var/archives/prototype/ directory.
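Since the prototype archive is a plain Maildir, it can be inspected with Python's standard `mailbox` module. The sketch below counts the messages and sums their sizes; the function name `summarize_maildir` is an illustrative assumption, and the exact per-list directory below var/archives/prototype/ should be checked on your own installation.

```python
import mailbox


def summarize_maildir(path):
    """Return (message_count, total_bytes) for the Maildir at path."""
    # create=False raises an error instead of creating a missing mailbox.
    md = mailbox.Maildir(path, create=False)
    try:
        count = 0
        total = 0
        for key in md.iterkeys():
            count += 1
            total += len(md.get_bytes(key))
        return count, total
    finally:
        md.close()
```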
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

I wondered about the difference of the two archivers (prototype/hyperkitty) and whether both should be enabled.
Mark explains here that the prototype archiver is for storing the raw email message, whereas HyperKitty only stores select information. https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...

- On 4/17/25 18:47, German Rodriguez wrote:
I wondered about the difference of the two archivers (prototype/hyperkitty) and whether both should be enabled.
Mark explains here that the prototype archiver is for storing the raw email message, whereas HyperKitty only stores select information. https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/...
Interesting. I probably want the prototype archiver enabled, even if that means duplicating most of the archives and doubling the amount of data stored.
I was not aware that Hyperkitty doesn't store all information. I'd like to have all information archived (in order to be able to restore it again, if need be), even if they aren't published.
Now... is there a way to import mbox files into the prototype archiver? Looking around, I haven't found anything mentioning this. Since it's just using the Maildir format, I guess I could find a tool that unpacks messages from an mbox to a Maildir destination and get that done, but if mailman has some command to do that, it would be even better.
Mihai

On 4/17/25 10:20 AM, Mihai Moldovan wrote:
Now... is there a way to import mbox files into the prototype archiver? Looking around, I haven't found anything mentioning this. Since it's just using the Maildir format, I guess I could find a tool that unpacks messages from an mbox to a Maildir destination and get that done, but if mailman has some command to do that, it would be even better.
There is no Mailman command to do this, but a Python script is very simple. Something like
    from mailbox import Maildir, mbox
    mb = mbox('path/to/mbox', create=False)
    md = Maildir('path/to/maildir', create=False)
    for msg in mb:
        md.add(msg)
    mb.close()
    md.close()
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

- On 4/17/25 19:44, Mihai Moldovan wrote:
Great, that will definitely create it in the format expected by Python and hence mailman! Thank you very much.
Then again, I realized that just adding messages from a mailman 2.1 mbox to the prototype Maildir is not the correct route to take, because they'd be missing at least the Message-ID-Hash and Archived-At headers (the last one should probably point to the Hyperkitty archives, which makes it just a bit more difficult). There might be other headers that need to be added or mangled; these are just the two that stood out to me.
Getting this right is more involved, especially since these headers are generated in the guts of mailman and commonly seem to be retrieved from a msgdata object coming from the switchboard.
New incoming messages will have the correct data, of course, but imported ones wouldn't, so I'll have to use a more sophisticated approach to handle them, probably by going through mailman's email wrapper, figuring out how to generate a msgdata object for a message, using RFC2369.process and maybe more.
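Whether imported messages actually lack these headers can be checked directly on the Maildir. The following is a rough sketch using only the standard library; the function name `find_incomplete` and the choice to treat a bare `<>` value as missing are assumptions made for illustration.

```python
import mailbox

REQUIRED = ('Message-ID-Hash', 'Archived-At')


def find_incomplete(maildir_path, required=REQUIRED):
    """Map Message-ID -> list of required headers that are missing or empty."""
    md = mailbox.Maildir(maildir_path, create=False)
    try:
        report = {}
        for key in md.iterkeys():
            msg = md.get_message(key)
            # A header counts as missing if absent, blank, or just "<>".
            missing = [
                name for name in required
                if not str(msg.get(name) or '').strip(' <>')
            ]
            if missing:
                report[msg.get('Message-ID', key)] = missing
        return report
    finally:
        md.close()
```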
Mihai

On 4/17/25 11:35 AM, Mihai Moldovan wrote:
New incoming messages will have the correct data, of course, but imported ones wouldn't, so I'll have to use a more sophisticated approach to handle them, probably by going through mailman's email wrapper, figuring out how to generate a msgdata object for a message, using RFC2369.process and maybe more.
Here's an example script. You need to run this with /opt/mailman/mm/venv/bin/python to get access to the mailman imports.
    from mailbox import Maildir, mbox
    from mailman.email.message import Message
    # The handler module is named rfc_2369 (with an underscore).
    from mailman.handlers.rfc_2369 import process
    from mailman.interfaces.listmanager import IListManager
    from mailman.utilities.email import add_message_hash
    from zope.component import getUtility
    mb = mbox('path/to/mbox', factory=Message, create=False)
    md = Maildir('path/to/maildir', create=False)
    mlist = getUtility(IListManager).get_by_list_id('your.list.id')
    for msg in mb:
        # Add the Message-ID-Hash header, then the RFC 2369 List-* headers.
        add_message_hash(msg)
        process(mlist, msg, {})
        md.add(msg)
    mb.close()
    md.close()
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

- On 4/18/25 04:23, Mark Sapiro wrote:
On 4/17/25 11:35 AM, Mihai Moldovan wrote:
New incoming messages will have the correct data, of course, but imported ones wouldn't, so I'll have to use a more sophisticated approach to handle them, probably by going through mailman's email wrapper, figuring out how to generate a msgdata object for a message, using RFC2369.process and maybe more.
Here's an example script. You need to run this with /opt/mailman/mm/venv/bin/python to get access to the mailman imports.
    from mailbox import Maildir, mbox
    from mailman.email.message import Message
    from mailman.handlers.rfc_2369 import process
    from mailman.interfaces.listmanager import IListManager
    from mailman.utilities.email import add_message_hash
    from zope.component import getUtility
    mb = mbox('path/to/mbox', factory=Message, create=False)
    md = Maildir('path/to/maildir', create=False)
    mlist = getUtility(IListManager).get_by_list_id('your.list.id')
    for msg in mb:
        add_message_hash(msg)
        process(mlist, msg, {})
        md.add(msg)
    mb.close()
    md.close()
Thank you.
Unfortunately, I'm having a hard time getting this to work correctly. My current approach is (which is modified from yours, for instance to use Prototype.archive_message() instead of writing directly to a Maildir):
    import copy
    import sys
    from mailbox import mbox
    from mailman.archiving.prototype import Prototype
    from mailman.core import initialize
    from mailman.email.message import Message
    from mailman.handlers.rfc_2369 import process
    from mailman.interfaces.listmanager import IListManager
    from mailman.interfaces.mailinglist import IMailingList
    from mailman.utilities.email import add_message_hash
    from zope.component import getUtility
    # Initialize Mailman before using its utilities.
    initialize.initialize()
    if len(sys.argv) < 3:
        print('Usage: {0} <list-id> <mbox-file>'.format(sys.argv[0]), file=sys.stderr)
        sys.exit(1)
    mb = mbox(sys.argv[2], factory=Message, create=False)
    mlist = getUtility(IListManager).get_by_list_id(sys.argv[1])
    for msg in mb:
        try:
            add_message_hash(msg)
            process(mlist, msg, {})
            Prototype.archive_message(mlist, msg)
        except Exception as e:
            print("Error when adding {0}: {1}".format(msg['message-id'], str(e)),
                  file=sys.stderr)
    mb.close()
This returns "Error when adding None: '_PartialFile' object has no attribute 'header_max_count'" for each message.
This, including getting None for the Message-ID, stumped me and I got on to debugging this.
Indeed, even something as simple as
    for msg in mb:
        print(type(msg))
        print(msg['message-id'])
        print(msg)
        exit(0)
results in getting a None for the Message-ID and a stack trace:
    <class 'mailman.email.message.Message'>
    None
    Traceback (most recent call last):
      File "/root/mailman3/prototype-import.py", line 29, in <module>
        print(msg)
      File "/usr/lib/python3.12/email/message.py", line 165, in __str__
        return self.as_string()
               ^^^^^^^^^^^^^^^^
      File "/usr/lib/python3/dist-packages/mailman/email/message.py", line 55, in as_string
        value = email.message.Message.as_string(self)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib/python3.12/email/message.py", line 188, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/usr/lib/python3.12/email/generator.py", line 98, in flatten
        policy = policy.clone(max_line_length=self.maxheaderlen)
                 ^^^^^^^^^^^^
    AttributeError: '_PartialFile' object has no attribute 'clone'. Did you mean: 'close'?
This eventually led me to realize that passing mailman.email.message.Message as the factory to mbox() internally calls factory(msg), and the base class of Message (email.message.Message) has an __init__ function that takes the policy parameter, so it looks as if it's registering the mailbox message incorrectly as the policy handler, which is totally wrong of course.
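The behaviour described here can be reproduced with the standard library alone, without Mailman. The sketch below is a minimal reproduction under the assumption that `email.message.Message` behaves like its Mailman subclass in this respect; the sample message and the helper name `load_first_with_factory` are made up for the demonstration.

```python
import email.message
import mailbox
import os
import tempfile

RAW = (
    "From sender@example.com Thu Apr 17 12:00:00 2025\n"
    "Message-ID: <demo@example.com>\n"
    "Subject: demo\n"
    "\n"
    "body\n"
)


def load_first_with_factory(factory):
    """Write a one-message mbox and return its first message via factory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mbox")
        with open(path, "w", newline="\n") as fp:
            fp.write(RAW)
        mb = mailbox.mbox(path, factory=factory, create=False)
        try:
            return next(iter(mb))
        finally:
            mb.close()


# mbox calls factory(file_object); Message takes that argument as its policy.
broken = load_first_with_factory(email.message.Message)
print(type(broken.policy).__name__)  # the file wrapper, not a real policy
print(broken["message-id"])          # no headers were parsed

# Without a factory, mbox parses the message itself.
ok = load_first_with_factory(None)
print(type(ok).__name__, ok["message-id"])
```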
If I change the call to mb = mbox(sys.argv[2], factory=None, create=False), the output looks more promising, so converting the data to mailman.email.message.Message first seems to have been the wrong idea on my part:
<class 'mailbox.mboxMessage'>
<mailman.1.1399151409.10314.x2go-i18n@lists.x2go.org>
Return-Path: <mailman-bounces@lists.x2go.org>
[...]
and indeed, if I let the import actually happen, it works.
The imported messages do have a Message-ID-Hash, but the Archived-At and List-Archive headers are empty (literally <>).
Do you have an example message that was archived through Prototype? What are the proper header values for this module? I believe that Archived-At should be the Message-ID-Hash and the List-Archive header should contain... well, given that the Prototype archiver is not meant to be publicly available, probably a file:// URL?
Mihai

On 4/19/25 1:55 PM, Mihai Moldovan wrote:
This eventually led me to realize that passing mailman.email.message.Message as the factory to mbox() internally calls factory(msg), and the base class of Message (email.message.Message) has an __init__ function that takes the policy parameter, so it looks as if it's registering the mailbox message incorrectly as the policy handler, which is totally wrong of course.
This appears to be a bug in mailbox.mbox. To work around it, try this:
    from email import message_from_bytes
    mb = mbox(sys.argv[2], create=False)
    for key in mb.iterkeys():
        msg = message_from_bytes(mb.get_bytes(key), Message)
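The same workaround can be wrapped in a small generator that runs without Mailman installed. This is only a sketch: `ListMessage` is a hypothetical stand-in for mailman.email.message.Message, and `iter_messages` is an illustrative helper name.

```python
import email.message
import mailbox
from email import message_from_bytes


class ListMessage(email.message.Message):
    """Stand-in for mailman.email.message.Message (assumption for the demo)."""


def iter_messages(mbox_path, message_class):
    """Yield each message in the mbox, parsed as message_class."""
    # No factory here: the raw bytes are parsed explicitly instead.
    mb = mailbox.mbox(mbox_path, create=False)
    try:
        for key in mb.iterkeys():
            yield message_from_bytes(mb.get_bytes(key), message_class)
    finally:
        mb.close()
```

In the import script, the loop would then read `for msg in iter_messages(sys.argv[2], Message):` with Mailman's own Message class passed in.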
and indeed, if I let the import actually happen, it works.
Still, it's probably better if the message is a mailman.email.message.Message object rather than an email.message.Message object.
The imported messages, do have a Message-ID-Hash, but the Archived-At and List-Archive headers are empty (literally <>).
Those headers are added by the rfc_2369 handler based on the archivers configured for the list. They will never point to the prototype archiver or the message in the prototype archive.
Normally they will point to HyperKitty and to the archived message in HyperKitty. This is the case even for messages in the prototype archiver as those messages are just the message as delivered to the list members absent any personalization.
Do you have an example message that was archived through Prototype? What are the proper header values for this module? I believe that Archived-At should be the Message-ID-Hash and the List-Archive header contain... well, given that the Prototype archiver is not meant to be publicly available, probably a file:// URL?
Typically they will be something like
    List-Archive: <https://example.com/archives/list/listname@example.com/>
    Archived-At: <https://example.com/archives/list/listname@example.com/message/xxxxxx/>
where xxxxxx is the message-id hash value.
Note that the generation of these headers by the rfc_2369 handler depends on this script being run in an installation that has hyperkitty and mailman-hyperkitty installed and the hyperkitty archiver enabled for the list.
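The shape of these two headers can be illustrated with a small sketch. It assumes that the message-id hash is the base32-encoded SHA-1 digest of the Message-ID without its angle brackets, and that the URLs follow the pattern shown above; both are assumptions for illustration, and the real values always come from Mailman and the HyperKitty plug-in. The function names are hypothetical.

```python
from base64 import b32encode
from hashlib import sha1


def message_id_hash(message_id):
    """Return a base32 SHA-1 hash of the Message-ID (assumed algorithm)."""
    mid = message_id.strip()
    if mid.startswith('<') and mid.endswith('>'):
        mid = mid[1:-1]
    return b32encode(sha1(mid.encode('utf-8')).digest()).decode('ascii')


def hyperkitty_headers(base_url, list_address, message_id):
    """Build List-Archive and Archived-At values following the pattern above."""
    list_url = f"{base_url.rstrip('/')}/list/{list_address}/"
    permalink = f"{list_url}message/{message_id_hash(message_id)}/"
    return {
        'List-Archive': f'<{list_url}>',
        'Archived-At': f'<{permalink}>',
    }
```

Calling `hyperkitty_headers('https://example.com/archives', 'listname@example.com', '<abc@example.com>')` gives values of the same form as the example headers, which can be handy for spot-checking imported messages.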
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan

- On 4/20/25 02:26, Mark Sapiro wrote:
This appears to be a bug in mailbox.mbox. To work around it, try this:
[...]
Thanks, that works!
Those headers are added by the rfc_2369 handler based on the archivers configured for the list. They will never point to the prototype archiver or the message in the prototype archive.
Normally they will point to HyperKitty and to the archived message in HyperKitty. This is the case even for messages in the prototype archiver as those messages are just the message as delivered to the list members absent any personalization.
Yes, it iterates through all archivers and gets the permalink and archiver's URL, adding it to the message if the data is non-empty.
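That loop can be modelled roughly as follows. This is a simplified sketch, not Mailman's actual code: the two archiver classes are hypothetical stand-ins, the list is represented by a plain string, and the message by a dict.

```python
class NoLinkArchiver:
    """Stand-in for an archiver that offers no public URLs (hypothetical)."""

    def list_url(self, mlist):
        return None

    def permalink(self, mlist, msg):
        return None


class WebArchiver:
    """Stand-in for a web-facing archiver (hypothetical)."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip('/')

    def list_url(self, mlist):
        return f"{self.base_url}/list/{mlist}/"

    def permalink(self, mlist, msg):
        return f"{self.list_url(mlist)}message/{msg['Message-ID-Hash']}/"


def archive_headers(archivers, mlist, msg):
    """Collect List-Archive / Archived-At headers from all archivers."""
    headers = []
    for archiver in archivers:
        url = archiver.list_url(mlist)
        if url:  # skip None and empty values
            headers.append(('List-Archive', f'<{url}>'))
        link = archiver.permalink(mlist, msg)
        if link:
            headers.append(('Archived-At', f'<{link}>'))
    return headers
```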
Note that the generation of these headers by the rfc_2369 handler depends on this script being run in an installation that has hyperkitty and mailman-hyperkitty installed and the hyperkitty archiver enabled for the list.
There is one other thing that caught me off-guard: I had the Hyperkitty archiver enabled, of course, but the service was not running (I essentially turned off everything, including the cron jobs, for the migration, to make sure that no new data is coming in).
In order for mailman to query the required data, it has to communicate with the archivers (like Hyperkitty) and that in turn means that the archiver must be running.
After starting Hyperkitty, the headers magically gained proper values!
Thank you very much for your help.
Mihai

Mihai Moldovan writes:
In order for mailman to query the required data, it has to communicate with the archivers (like Hyperkitty) and that in turn means that the archiver must be running.
This is true for HyperKitty, it seems, but not true in general. As long as the algorithm is independent of the archive's organization, the algorithm can be implemented in the plug-in that the archiver provides to Mailman core. The IETF's Mailarchive has done this for a generation using an algorithm similar to HyperKitty's.
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan

Mihai Moldovan writes:
The imported messages, do have a Message-ID-Hash, but the Archived-At and List-Archive headers are empty (literally <>). [...] What are the proper header values for this module?
Empty or omitted. See site-packages/mailman/archiving/prototype.py.
If they're produced at all, I suppose that's a bug. Mailman core gets the contents of those from the 'list_url' and 'permalink' callbacks defined by the archiver, which in the case of the prototype archiver return None. So Mailman should omit those headers and log an error message IMO.
I believe your web-facing archive is HyperKitty. I suppose you should import those callbacks from HyperKitty (I think they're actually in the mailman_hyperkitty Python package) in order for your backup to reflect "archived message plus headers omitted by HyperKitty". I haven't checked to see if you can actually just do
    from mailman.archiving.prototype import Prototype
    from mailman_hyperkitty import Archiver
    class Backup(Prototype):
        list_url = Archiver.list_url
        permalink = Archiver.permalink
but if that doesn't work, cargo-culting the code should be straightforward.
I forget if your goal was to reproduce the incoming message or the outgoing message. If the incoming message, archiving happens pretty late in the pipeline so banned attachments have been stripped and HTML-to-text conversion and the like have already been done.
-- GNU Mailman consultant (installation, migration, customization) Sirius Open Source https://www.siriusopensource.com/ Software systems consulting in Europe, North America, and Japan
participants (7)
- Christian Schneider
- christian.schneider@tu-dortmund.de
- German Rodriguez
- Mark Sapiro
- Mihai Moldovan
- Odhiambo Washington
- Stephen J. Turnbull