Hyperkitty Recommended mySQL Database Setup & Archive Importing
Hi,
I've been working for quite a while to setup a production mailman3. Using the docker containers, I have been pretty successful getting it to work. Part of the challenge now is that there are a lot of issues when importing archives from mailman2. This of course is more common on lists that are over 10 years old.
From what I've been reading, it seems like Mailman3 expects
the mysql database to use UTF8 encoding. It appears, by default, that when using the docker-mailman containers, that the character set is set to latin1. This may explain why there are lots of import errors, and maybe also why there are a lot of errors when threads are deleted in hyperkitty's UI.
Is it the case that UTF8 is the preferred character set? I
also saw some threads indicating that maybe utf8mb4 is more preferred. I'd like to try to change the character_set on the database, if that is advisable.
Also, is Hyperkitty 1.3.4 officially released? I see that the
list.org instance is using it, but the latest announce I've seen is for 1.3.3.
Thanks.
- Matt Alberti
Hi
I cannot answer for mailman, but in general, utf8mb4 is definitely the way to go for MySql; the original utf8 implementation in MySQL supported only 3-byte characters, and that excludes quite a number of character codes that are commonly used, notably various emojis.
I would not suggest use of the latin1 (or indeed any of the others, eg cp1251) character codings for any purpose these days; UTF-8 is almost equivalent in one byte and extends to a far greater coverage. The only situation you should need the older options us to interoperate with old software.
On the basis of the above, if the docker containers is fixed with latin1, I would suggest that is a bug.
Regards,
Ruth
On 08/12/2020 17:05, matthew@alberti.us wrote:
Hi,
I've been working for quite a while to setup a production mailman3. Using the docker containers, I have been pretty successful getting it to work. Part of the challenge now is that there are a lot of issues when importing archives from mailman2. This of course is more common on lists that are over 10 years old.
From what I've been reading, it seems like Mailman3 expects
the mysql database to use UTF8 encoding. It appears, by default, that when using the docker-mailman containers, that the character set is set to latin1. This may explain why there are lots of import errors, and maybe also why there are a lot of errors when threads are deleted in hyperkitty's UI.
Is it the case that UTF8 is the preferred character set? I
also saw some threads indicating that maybe utf8mb4 is more preferred. I'd like to try to change the character_set on the database, if that is advisable.
Also, is Hyperkitty 1.3.4 officially released? I see that the
list.org instance is using it, but the latest announce I've seen is for 1.3.3.
Thanks.
- Matt Alberti
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
On 12/8/20 9:05 AM, matthew@alberti.us wrote:
From what I've been reading, it seems like Mailman3 expects
the mysql database to use UTF8 encoding. It appears, by default, that when using the docker-mailman containers, that the character set is set to latin1. This may explain why there are lots of import errors, and maybe also why there are a lot of errors when threads are deleted in hyperkitty's UI.
Is it the case that UTF8 is the preferred character set? I
also saw some threads indicating that maybe utf8mb4 is more preferred. I'd like to try to change the character_set on the database, if that is advisable.
You want the MySQL encoding for the Mailman database to be utf8mb4. If not, as Ruth says, you will have issues with 4-bute utf-8 encodings. See <https://gitlab.com/mailman/mailman-suite/-/blob/master/mailman-suite_project/settings.py#L156>
Also, is Hyperkitty 1.3.4 officially released? I see that the
list.org instance is using it, but the latest announce I've seen is for 1.3.3.
No, it is not. lists.mailman3.org and mail.python.org both run the HEADs of the GitLab branches so they are generally ahead of the latest releases.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mark/Ruth/The List, Thanks for the prompt responses. Before I try to figure out a way to correct this... is the "character_set_database" the only thing I should be looking at?
I also looked at the hyperkitty_email table, since that is
referenced in a lot of errors. That shows some collations are set to latin1_swedish_ci, and others are NULL.
If there suggestions about how to correct this on the database
server, I'm all ears. I'm definitely a little concerned about making such a big change now that we are moving a lot of messages/lists thru this instance.
Thanks for all your help.
- Matt Alberti
MariaDB [mailmandb]> SHOW FULL COLUMNS FROM hyperkitty_email; +-----------------+--------------+-------------------+------+-----+--------- +----------------+---------------------------------+---------+ | Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment | +-----------------+--------------+-------------------+------+-----+--------- +----------------+---------------------------------+---------+ | id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | | | message_id | varchar(255) | latin1_swedish_ci | NO | MUL | NULL | | select,insert,update,references | | | message_id_hash | varchar(255) | latin1_swedish_ci | NO | MUL | NULL | | select,insert,update,references | | | subject | varchar(512) | latin1_swedish_ci | NO | MUL | NULL | | select,insert,update,references | | | content | longtext | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | | | date | datetime(6) | NULL | NO | MUL | NULL | | select,insert,update,references | | | timezone | smallint(6) | NULL | NO | | NULL | | select,insert,update,references | | | in_reply_to | varchar(255) | latin1_swedish_ci | YES | MUL | NULL | | select,insert,update,references | | | archived_date | datetime(6) | NULL | NO | MUL | NULL | | select,insert,update,references | | | thread_depth | int(11) | NULL | NO | | NULL | | select,insert,update,references | | | thread_order | int(11) | NULL | YES | MUL | NULL | | select,insert,update,references | | | mailinglist_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | | | parent_id | int(11) | NULL | YES | MUL | NULL | | select,insert,update,references | | | sender_id | varchar(255) | latin1_swedish_ci | NO | MUL | NULL | | select,insert,update,references | | | thread_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | | | sender_name | varchar(255) | latin1_swedish_ci | YES | | NULL | | select,insert,update,references | | +-----------------+--------------+-------------------+------+-----+--------- +----------------+---------------------------------+---------+ 16 rows in set (0.001 sec)
MariaDB [mailmandb]>
-----Original Message----- From: Mark Sapiro <mark@msapiro.net> Sent: Tuesday, December 8, 2020 2:55 PM To: mailman-users@mailman3.org Subject: [MM3-users] Re: Hyperkitty Recommended mySQL Database Setup & Archive Importing
On 12/8/20 9:05 AM, matthew@alberti.us wrote:
From what I've been reading, it seems like Mailman3
expects the mysql database to use UTF8 encoding. It appears, by default, that when using the docker-mailman containers, that the character set is set to latin1. This may explain why there are lots of import errors, and maybe also why there are a lot of errors when threads
are deleted in hyperkitty's UI.
Is it the case that UTF8 is the preferred character set?
I also saw some threads indicating that maybe utf8mb4 is more preferred. I'd like to try to change the character_set on the database, if that is advisable.
You want the MySQL encoding for the Mailman database to be utf8mb4. If not, as Ruth says, you will have issues with 4-bute utf-8 encodings. See <https://gitlab.com/mailman/mailman-suite/-/blob/master/mailman-suite_projec t/settings.py#L156>
Also, is Hyperkitty 1.3.4 officially released? I see
that the list.org instance is using it, but the latest announce I've seen is for 1.3.3.
No, it is not. lists.mailman3.org and mail.python.org both run the HEADs of the GitLab branches so they are generally ahead of the latest releases.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
On 12/8/20 1:02 PM, matthew@alberti.us wrote:
Mark/Ruth/The List, Thanks for the prompt responses. Before I try to figure out a way to correct this... is the "character_set_database" the only thing I should be looking at?
You need to set
'OPTIONS': {'charset': 'utf8mb4'}
in the DATABASES['default'] definition in your django settings. I'm not sure if this overrides the various collation, etc. settings in the DB or not, but it wouldn't hurt to update the DB.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Thanks Mark. Also, just to confirm, should the database connection string also be updated? In both the core and web configuration, I have something like:
DATABASE_URL=mysql+pymysql://mailman:password@database/mailmandb
Note the trailing UTF-8 encoding stuff isn't there. Should we be specifying utf8mb4 there as well?
I'm planning to test this in a dev environment. If I can somewhat confirm the fix procedure won't break an existing install (like I have), I'll submit a PR(s) to the docker-mailman github.
- Matt Alberti
-----Original Message----- From: Mark Sapiro <mark@msapiro.net> Sent: Wednesday, December 9, 2020 1:29 PM To: mailman-users@mailman3.org Subject: [MM3-users] Re: Hyperkitty Recommended mySQL Database Setup & Archive Importing
On 12/8/20 1:02 PM, matthew@alberti.us wrote:
Mark/Ruth/The List, Thanks for the prompt responses. Before I try to figure out a way to
correct this... is the "character_set_database" the only thing I should be looking at?
You need to set
'OPTIONS': {'charset': 'utf8mb4'}
in the DATABASES['default'] definition in your django settings. I'm not sure if this overrides the various collation, etc. settings in the DB or not, but it wouldn't hurt to update the DB.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
On 12/9/20 6:06 PM, matthew@alberti.us wrote:
Thanks Mark. Also, just to confirm, should the database connection string also be updated? In both the core and web configuration, I have something like:
DATABASE_URL=mysql+pymysql://mailman:password@database/mailmandb
Note the trailing UTF-8 encoding stuff isn't there. Should we be specifying utf8mb4 there as well?
In the mailman.cfg database section, you probably want
url: mysql+pymysql://myuser:mypassword@mymysqlhost/mailman?charset=utf8mb4&use_unicode=1
I don't know why you would have a DATABASE_URL setting in the web configuration. Exactly where is this? I don't think it would be meaningful in any Django settings.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
I should clarify a bit. That connection string I mentioned in the previous post is in the docker-compose.yaml file for the docker-mailman containers. For web, it seems the settings.py file for the mailman-web container is configured to parse connection string, but there isn't (I don't think) any provision to set additional options, such as the encoding. So there doesn't appear to be a straightforward way, using the container yaml approach, to pass the 'OPTIONS' for the encoding. It does however look like it is possible to override the whole database connection settings using settings_local.py, but unsure. That would I think be the way to do it, setting the whole database configuration settings per https://gitlab.com/mailman/mailman-suite/-/blob/master/mailman-suite_project /settings.py#L156 in the settings_local.py file.
For core, the connection string seems to accept a tailing "?charset=utf8&use_unicode=1" per https://mailman.readthedocs.io/en/stable/src/mailman/docs/database.html So for core, maybe just updating the connection string to include "?charset=utfmb4&use_unicode=1" will just work? The link lists utf8 as the recommend encoding... maybe that should be updated?
I realize this may be drifting more towards concerning the docker container setup vs. non-containerized core and web. But either way, these seem to be mysql specific caveats that I've certainly missed. Hopefully this helps someone else.
Seems the takeaway is that, for mysql databases only, the use of utf8mb4 is recommended, and it needs to be explicitly identified in the database settings for BOTH core and web, and probably somehow at database creation time?
Now to see what the best procedure is to put these configuration changes in place, and also update the database?
- Matt Alberti
-----Original Message----- From: Mark Sapiro <mark@msapiro.net> Sent: Wednesday, December 9, 2020 9:35 PM To: mailman-users@mailman3.org Subject: [MM3-users] Re: Hyperkitty Recommended mySQL Database Setup & Archive Importing
On 12/9/20 6:06 PM, matthew@alberti.us wrote:
Thanks Mark. Also, just to confirm, should the database connection string also be updated? In both the core and web configuration, I have something like:
DATABASE_URL=mysql+pymysql://mailman:password@database/mailmandb
Note the trailing UTF-8 encoding stuff isn't there. Should we be specifying utf8mb4 there as well?
In the mailman.cfg database section, you probably want
url: mysql+pymysql://myuser:mypassword@mymysqlhost/mailman?charset=utf8mb4&us mysql+e_unicode=1
I don't know why you would have a DATABASE_URL setting in the web configuration. Exactly where is this? I don't think it would be meaningful in any Django settings.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
Mailman-users mailing list -- mailman-users@mailman3.org To unsubscribe send an email to mailman-users-leave@mailman3.org https://lists.mailman3.org/mailman3/lists/mailman-users.mailman3.org/
On 12/9/20 6:39 PM, matthew@alberti.us wrote:
For core, the connection string seems to accept a tailing "?charset=utf8&use_unicode=1" per https://mailman.readthedocs.io/en/stable/src/mailman/docs/database.html So for core, maybe just updating the connection string to include "?charset=utfmb4&use_unicode=1" will just work? The link lists utf8 as the recommend encoding... maybe that should be updated?
I'm in the process of doing that update.
Seems the takeaway is that, for mysql databases only, the use of utf8mb4 is recommended, and it needs to be explicitly identified in the database settings for BOTH core and web, and probably somehow at database creation time?
It's not as critical for core because barring things like 4-byte utf-8 unicode graphics in list configurations, core doesn't store anything like that in its tables. The major issue is with HyperKitty where messages have emojis and other 4-byte utf-8 encodings in message subjects and bodies.
-- Mark Sapiro <mark@msapiro.net> The highway is for gamblers, San Francisco Bay Area, California better use your sense - B. Dylan
participants (3)
-
Mark Sapiro
-
matthew@alberti.us
-
Ruth Ivimey-Cook