Hi
What exactly are we supposed to include in our mailman backups? Obviously the database, but are there any other files that should be included to make sure we can properly restore?
Regards, Alain
On 5/22/19 6:31 AM, Alain Kohli wrote:
Hi
What exactly are we supposed to include in our mailman backups? Obviously the database, but are there any other files that should be included to make sure we can properly restore?
I would back up all of Mailman's var/ directory. Additionally, I would back up the directory where Haystack's backend search engine stores its indices. For the default Whoosh backend with default settings this is the fulltext_index/ directory in the same directory as your Django settings.py file.
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
I have noticed that especially the var/messages directory has a rather considerable size compared to the rest. What exactly is this for and what would be the consequences if that was lost?
And more generally, would not backing up var/ at all only result in the loss of some short term data (e.g. messages that are held or currently processed) or would long term data also be lost? Or in other words, what exactly would be non-recoverable if that isn't backed up?
On 5/26/19 7:00 AM, Alain Kohli wrote:
I have noticed that especially the var/messages directory has a rather considerable size compared to the rest. What exactly is this for and what would be the consequences if that was lost?
When a message is held for moderator approval, a pickle of the message is stored in the var/messages directory and an entry pointing to it is made in the message table in the database. An entry is also made in the list's pending requests. These are required so the held message can be handled.
There is an issue in that when these messages are handled, they aren't removed. See <https://gitlab.com/mailman/mailman/issues/257>.
Thus, most of what's in the var/messages directory is old, but there are also currently held messages there that would be lost.
You could periodically remove older messages, but this would leave entries pointing to them in the message table in the database, which could cause errors but probably won't.
The real solution is a fix for <https://gitlab.com/mailman/mailman/issues/257>.
And more generally, would not backing up var/ at all only result in the loss of some short term data (e.g. messages that are held or currently processed) or would long term data also be lost? Or in other words, what exactly would be non-recoverable if that isn't backed up?
That depends in part on what your backend database is. If it's SQLite, the database is probably in the var/data directory. If not, you need to back up the MySQL or PostgreSQL mailman database.
var/archives contains archived messages for the 'prototype' archiver and messages that are queued for HyperKitty.
var/cache contains a cache of recently used templates.
var/data may contain a SQLite database and may contain Postfix mappings. The Postfix mappings can be regenerated by the mailman aliases command.
var/etc may contain your mailman.cfg if you don't point to a different one.
var/lists contains accumulated messages for the next digest for your lists.
var/locks contains locks and should probably not be restored from a backup.
var/logs contains Mailman's logs.
var/messages is discussed above.
var/queue contains Mailman's current queues. Except for 'shunt' this is all things currently in process.
var/templates may contain custom templates.
Also, I didn't mention it before, but your Django settings.py and settings_local.py should also be backed up.
--
Mark Sapiro <mark@msapiro.net>
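
The directory rundown above can be turned into a small backup helper. What follows is only a sketch in Python, not something posted in the thread: the function name backup_var, the SKIP_DIRS set and the settings/ prefix inside the archive are made up for illustration. It archives var/ while leaving out var/locks, and adds any extra files you pass in (for example settings.py and settings_local.py). If your database is MySQL or PostgreSQL, it still has to be dumped separately.

```python
import tarfile
from pathlib import Path

# Subdirectories of var/ to leave out of the archive. 'locks' follows the
# advice above; the set itself is an illustrative choice.
SKIP_DIRS = {"locks"}


def backup_var(var_dir, archive_path, extra_files=()):
    """Write var_dir (minus SKIP_DIRS) and extra_files to a gzipped tarball."""
    var_dir = Path(var_dir)

    def skip_excluded(tarinfo):
        # tarinfo.name is relative to the archive root, e.g. 'var/locks/x'.
        parts = Path(tarinfo.name).parts
        if len(parts) > 1 and parts[1] in SKIP_DIRS:
            return None  # returning None drops the entry and its children
        return tarinfo

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(var_dir, arcname="var", filter=skip_excluded)
        for extra in extra_files:
            extra = Path(extra)
            # Hypothetical layout: extra files go under 'settings/'.
            tar.add(extra, arcname=f"settings/{extra.name}")
    return Path(archive_path)
```

A call might look like backup_var('/path/to/var', '/backups/mailman.tar.gz', ['/path/to/settings.py']). Write the archive somewhere outside var/ so it does not end up inside itself.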
Thanks for the great explanation, that helped a lot! I'll try to work on that issue if I find some time. In the meantime it would be neat to try to clean up the old messages right before the backup. How can you distinguish messages that have been moderated already? I couldn't really find that in the database or the models.
On 5/26/19 12:17 PM, Alain Kohli wrote:
Thanks for the great explanation, that helped a lot! I'll try to work on that issue if I find some time. In the meantime it would be neat to try to clean up the old messages right before the backup. How can you distinguish messages that have been moderated already? I couldn't really find that in the database or the models.
You could run the following in mailman shell:

$ /path/to/mailman shell
Welcome to the GNU Mailman shell
>>> lm = getUtility(IListManager)
>>> ms = getUtility(IMessageStore)
>>> for mlist in lm.mailing_lists:
...     requestdb = IListRequests(mlist)
...     for rq in requestdb.held_requests:
...         key, msgdata = requestdb.get_request(rq.id)
...         if msgdata['_request_type'] != 'held_message':
...             continue
...         msg = ms.get_message_by_id(key)
...         print(msg['Message-ID-Hash'])
...
U6QVO2GXMEQ5ZDQR2XCLDOH2FXPKJTWX
32HA6GKYE5FOVORLER7YAC3264BHG264
JM4NP3JRXBKKAL7UMM5XEXX644DQBAFJ
L7FX6XKOOBP3YFFXUHA6EEJCQXX5BCHX
DOJX55MRY6OYPZ4XFFTAQDD4I5MWJ3AD
What that prints is the names of all the files in the var/messages/ hierarchy which are currently held messages. Presumably the others are ones already handled.
You could modify the above to write the list to a file and then wrap the whole thing in a shell script which would make the list and then do something like

for f in $(find var/messages -type f); do
    n=$(cut -d / -f5 <<<"$f")
    k=0
    for keep in $(cat list); do
        if [ "$n" == "$keep" ]; then
            k=1
            break
        fi
    done
    if [ $k -eq 0 ]; then
        rm "$f"
    fi
done

In the above, list is the list of names to keep made by the first script, and -f5 in the cut command assumes the list from find is lines of the form var/messages/TJ/YF/TJYF5FETCRLOQZJ2H6Y4PMZP6M7GNLES, i.e. two subdirectories between messages/ and the file.
--
Mark Sapiro <mark@msapiro.net>
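
Before deleting anything, it may help to see which files the keep list would spare. The following is a dry-run sketch in Python rather than anything from the thread: the names unheld_files and read_keep_list are invented here, and it assumes the keep list is a text file with one Message-ID-Hash per line. It only reports candidates and removes nothing. Matching on the bare file name also avoids depending on how many subdirectories sit between messages/ and the file.

```python
import os


def read_keep_list(list_path):
    """Read one Message-ID-Hash per line, ignoring blank lines."""
    with open(list_path) as fh:
        return [line.strip() for line in fh if line.strip()]


def unheld_files(messages_dir, keep_names):
    """Return paths under messages_dir whose file name is not in keep_names."""
    keep = set(keep_names)  # set lookup replaces the inner shell loop
    stale = []
    for root, _dirs, files in os.walk(messages_dir):
        for name in files:
            if name not in keep:
                stale.append(os.path.join(root, name))
    return sorted(stale)
```

Printing each entry of unheld_files('var/messages', read_keep_list('list')) gives the files that the shell loop would remove.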
Actually, I thought about it a bit more, and I decided that arcane shell script was a bit obscure and it would be easier to just do it in Python. Plus, you can properly delete the message that way.
A complete script for mailman shell is like

import os
msg_ids = {}
lm = getUtility(IListManager)
ms = getUtility(IMessageStore)
for mlist in lm.mailing_lists:
    requestdb = IListRequests(mlist)
    for rq in requestdb.held_requests:
        key, msgdata = requestdb.get_request(rq.id)
        if msgdata['_request_type'] != 'held_message':
            continue
        msg = ms.get_message_by_id(key)
        msg_ids[(msg['Message-ID-Hash'])] = True

for root, dirs, files in os.walk('path/to/var/messages'):
    for file in files:
        if not msg_ids.get(file, False):
            msg = ms.get_message_by_hash(file)
            ms.delete_message(msg['Message-ID'])

--
Mark Sapiro <mark@msapiro.net>
Thank you, that's perfect! For anyone that might stumble upon this, I had to make two minor adjustments to automate it.

import os
msg_ids = {}
lm = getUtility(IListManager)
ms = getUtility(IMessageStore)
for mlist in lm.mailing_lists:
    requestdb = IListRequests(mlist)
    for rq in requestdb.held_requests:
        key, msgdata = requestdb.get_request(rq.id)
        if msgdata['_request_type'] != 'held_message':
            continue
        msg = ms.get_message_by_id(key)
        msg_ids[(msg['Message-ID-Hash'])] = True
for root, dirs, files in os.walk('/docker/var/messages'):
    for file in files:
        if not msg_ids.get(file, False):
            msg = ms.get_message_by_hash(file)
            # It happened to me that msg was 'None'
            if msg:
                ms.delete_message(msg['Message-ID'])
# I couldn't run it with '--run' and if you pipe it into 'mailman shell'
# I needed this or the loop above wouldn't execute.
pass

On 5/27/19 3:05 PM, Alain Kohli wrote:
Thank you, that's perfect! For anyone that might stumble upon this, I had to make two minor adjustments to automate it.
snipped because formatting was off.
# I couldn't run it with '--run' and if you pipe it into 'mailman shell'
# I needed this or the loop above wouldn't execute.
pass
It won't run with --run as I wrote it because mailman shell only predefines the various interface names and getUtility when entering interactive mode. To use --run, you need to import those.
I have attached msgs_del.py which will run with --run. It incorporates the test for msg not None (thank you), makes a minor variable name change (file -> fn) and adds a print of the number of files removed.
It has my path to var/messages hard coded, so that may need to be changed.
--
Mark Sapiro <mark@msapiro.net>
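
The attached msgs_del.py is not reproduced in this archive. The following Python sketch is therefore a reconstruction from the description above, not the attachment itself. The function names delete_unheld and msgs_del are chosen for illustration, and the import paths are my best understanding of where Mailman 3 keeps these interfaces, so check them against your installation. The path 'path/to/var/messages' is the same placeholder used earlier in the thread. The thread does not say whether a database commit is needed afterwards, so the sketch leaves that out.

```python
import os


def delete_unheld(messages_dir, held_hashes, store):
    """Delete stored messages whose file name is not in held_hashes.

    'store' needs get_message_by_hash() and delete_message(), as the
    message store in the scripts above does. Returns the number removed.
    """
    removed = 0
    for _root, _dirs, files in os.walk(messages_dir):
        for fn in files:
            if fn in held_hashes:
                continue
            msg = store.get_message_by_hash(fn)
            if msg is None:  # file with no message-store entry; skip it
                continue
            store.delete_message(msg['Message-ID'])
            removed += 1
    return removed


def msgs_del(messages_dir='path/to/var/messages'):
    """Entry point intended for 'mailman shell --run'."""
    # Assumed import paths; interactive mode predefines these names,
    # a --run script has to import them itself.
    from mailman.interfaces.listmanager import IListManager
    from mailman.interfaces.messages import IMessageStore
    from mailman.interfaces.requests import IListRequests
    from zope.component import getUtility

    store = getUtility(IMessageStore)
    held = set()
    for mlist in getUtility(IListManager).mailing_lists:
        requestdb = IListRequests(mlist)
        for rq in requestdb.held_requests:
            key, msgdata = requestdb.get_request(rq.id)
            if msgdata['_request_type'] != 'held_message':
                continue
            held.add(store.get_message_by_id(key)['Message-ID-Hash'])
    print(delete_unheld(messages_dir, held, store), 'files removed')
```

Keeping the file walk in delete_unheld, separate from the Mailman lookups, means that part can be tried against a stand-in store object before pointing it at a real installation.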
Participants (2): Alain Kohli, Mark Sapiro