Archive search failing on one list
Hi,
I have several lists all happily running with private archives. They all use the one domain name.
A new list was added recently and although mail delivery and archiving are fine, this new list's search function is failing.
For example, on searching for the word "Derbyshire" (which is mentioned in one of the archived messages) the results page displays:
"Sorry no email could be found for this query."
The logs show:
+++++++++++++++++++++++++++++++++++++ /var/log/mailman3/web/mailman-web.log +++++++++++++++++++++++++++++++++++++
[pid: 1156|app: 0|req: 55/55] 2001:8004:5310:2405:ce66:5ff:41fc:60db () {76 vars in 1537 bytes} [Sat Sep 23 23:57:43 2023] GET /mailman3/hyperkitty/search?mlist=LISTNAME-1%40DOMAIN.COM&q=Derbyshire => generated 13752 bytes in 3188 msecs (HTTP/1.1 200) 5 headers in 164 bytes (1 switches on core 0)
23:57:51 [Q] INFO Process-1:1 processing [rebuild_mailinglist_cache_for_month]
23:57:51 [Q] INFO Process-1:2 processing [rebuild_mailinglist_cache_recent]
23:57:51 [Q] ERROR Failed [rebuild_mailinglist_cache_for_month] - MailingList matching query does not exist. : Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/django_q/cluster.py", line 421, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/usr/lib/python3/dist-packages/hyperkitty/tasks.py", line 79, in _rebuild_mailinglist_cache_for_month
    mlist = MailingList.objects.get(name=mlist_name)
  File "/usr/lib/python3/dist-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 406, in get
    raise self.model.DoesNotExist(
hyperkitty.models.mailinglist.MailingList.DoesNotExist: MailingList matching query does not exist.
+++++++++++++++++++++++++++++++++++++
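To pick out which background tasks are failing, the log can be scanned for the "[Q] ERROR Failed [...]" entries shown above. The following is a minimal sketch, assuming the log format matches this excerpt; the function name failed_tasks and the regular expression are illustrative and not part of HyperKitty or django-q.

```python
import re

# Pattern based only on the log excerpt above; other versions may
# format these lines differently (assumption).
FAILED_RE = re.compile(
    r"\[Q\] ERROR Failed \[(?P<task>[^\]]+)\] - (?P<reason>[^:]*)"
)


def failed_tasks(log_text):
    """Return (task, reason) pairs for every failed-task entry in log_text."""
    return [
        (m.group("task"), m.group("reason").strip())
        for m in FAILED_RE.finditer(log_text)
    ]


# Sample text copied from the log excerpt in this message.
sample = (
    "23:57:51 [Q] INFO Process-1:2 processing [rebuild_mailinglist_cache_recent] "
    "23:57:51 [Q] ERROR Failed [rebuild_mailinglist_cache_for_month] - "
    "MailingList matching query does not exist. : Traceback (most recent call last):"
)

for task, reason in failed_tasks(sample):
    print(f"{task}: {reason}")
```

Running this over the whole of mailman-web.log gives a quick list of which tasks fail and why, without reading every traceback.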
On all the other lists, which all return successful search results, the logs show entries like this (searching for "Farnborough") ...
[pid: 1156|app: 0|req: 486/486] 2001:8004:5310:2405:ce66:5ff:41fc:60db () {76 vars in 1531 bytes} [Sun Sep 24 00:38:24 2023] GET /mailman3/hyperkitty/search?mlist=LISTNAME-2%40DOMAIN.COM&q=Farnborough => generated 141638 bytes in 11039 msecs (HTTP/1.1 200) 6 headers in 333 bytes (1 switches on core 0)
... which seems to be pretty much the same but without the "MailingList.DoesNotExist" error.
Any clues on where I should be looking to fix this would be gratefully received.
Thanks, Mark
On 9/23/23 17:51, Mark wrote:
For example, on searching for the word "Derbyshire" (which is mentioned in one of the archived messages) the results page displays:
"Sorry no email could be found for this query."
The logs show:
+++++++++++++++++++++++++++++++++++++ /var/log/mailman3/web/mailman-web.log +++++++++++++++++++++++++++++++++++++
[pid: 1156|app: 0|req: 55/55] 2001:8004:5310:2405:ce66:5ff:41fc:60db () {76 vars in 1537 bytes} [Sat Sep 23 23:57:43 2023] GET /mailman3/hyperkitty/search?mlist=LISTNAME-1%40DOMAIN.COM&q=Derbyshire => generated 13752 bytes in 3188 msecs (HTTP/1.1 200) 5 headers in 164 bytes (1 switches on core 0)
...
    mlist = MailingList.objects.get(name=mlist_name)
  File "/usr/lib/python3/dist-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 406, in get
    raise self.model.DoesNotExist(
hyperkitty.models.mailinglist.MailingList.DoesNotExist: MailingList matching query does not exist.
What does this database query return?
select * from hyperkitty_mailinglist where name like 'LISTNAME-1@DOMAIN.COM';
If it returns a row, does case match for the name in the row and the query?
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
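The reason for asking about case: an exact-match lookup such as MailingList.objects.get(name=...) can miss a row that a looser comparison finds. Here is a minimal sketch of the idea, using an in-memory SQLite table as a stand-in. The real HyperKitty database, its schema, and its collation rules may well differ; the one-column table and both helper functions are hypothetical.

```python
import sqlite3

# In-memory stand-in table with a single column (assumption: the real
# hyperkitty_mailinglist table has many more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hyperkitty_mailinglist (name TEXT)")
conn.execute("INSERT INTO hyperkitty_mailinglist VALUES ('ListName-1@Domain.com')")


def find_exact(name):
    """Exact, case-sensitive comparison."""
    cur = conn.execute(
        "SELECT name FROM hyperkitty_mailinglist WHERE name = ?", (name,)
    )
    return [row[0] for row in cur.fetchall()]


def find_ignoring_case(name):
    """Comparison after lower-casing both sides."""
    cur = conn.execute(
        "SELECT name FROM hyperkitty_mailinglist WHERE lower(name) = lower(?)",
        (name,),
    )
    return [row[0] for row in cur.fetchall()]


print(find_exact("listname-1@domain.com"))
print(find_ignoring_case("listname-1@domain.com"))
```

If the two lookups disagree for a real list name, a case mismatch between the stored name and the requested name is worth investigating.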
On Sat, 2023-09-23 at 18:56 -0700, Mark Sapiro wrote:
On 9/23/23 17:51, Mark wrote:
For example, on searching for the word "Derbyshire" (which is mentioned in one of the archived messages) the results page displays:
"Sorry no email could be found for this query."
The logs show:
+++++++++++++++++++++++++++++++++++++ /var/log/mailman3/web/mailman-web.log +++++++++++++++++++++++++++++++++++++
[pid: 1156|app: 0|req: 55/55] 2001:8004:5310:2405:ce66:5ff:41fc:60db () {76 vars in 1537 bytes} [Sat Sep 23 23:57:43 2023] GET /mailman3/hyperkitty/search?mlist=LISTNAME-1%40DOMAIN.COM&q=Derbyshire => generated 13752 bytes in 3188 msecs (HTTP/1.1 200) 5 headers in 164 bytes (1 switches on core 0)
...
    mlist = MailingList.objects.get(name=mlist_name)
  File "/usr/lib/python3/dist-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/django/db/models/query.py", line 406, in get
    raise self.model.DoesNotExist(
hyperkitty.models.mailinglist.MailingList.DoesNotExist: MailingList matching query does not exist.
What does this database query return?
select * from hyperkitty_mailinglist where name like 'LISTNAME-1@DOMAIN.COM';
If it returns a row, does case match for the name in the row and the query?
Yes, the query returns a result and the case in "name" matches that in the query ...
select * from hyperkitty_mailinglist where name like 'listname@domain.com'
id             | 91
name           | listname@domain.com
display_name   | Listname
description    | ...
subject_prefix | [Listname]
archive_policy | 1
created_at     | 2023-09-21 08:55:36.385239+2
list_id        | listname@domain.com
On 9/23/23 19:19, Mark wrote:
Yes, the query returns a result and the case in "name" matches that in the query ...
select * from hyperkitty_mailinglist where name like 'listname@domain.com'
id             | 91
name           | listname@domain.com
display_name   | Listname
description    | ...
subject_prefix | [Listname]
archive_policy | 1
created_at     | 2023-09-21 08:55:36.385239+2
list_id        | listname@domain.com
Is that really what you get for list_id? It should have a dot, not an at sign, as in listname.domain.com.
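For reference, the list_id is the posting address with the at sign replaced by a dot. A minimal sketch of that relationship follows; list_id_from_address is a hypothetical helper for illustration, not a Mailman or HyperKitty function.

```python
def list_id_from_address(address):
    """Turn a posting address like listname@domain.com into listname.domain.com."""
    local_part, sep, domain = address.partition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not a valid posting address: {address!r}")
    return f"{local_part}.{domain}"


print(list_id_from_address("listname@domain.com"))
```

A list_id column that still contains an at sign would therefore be suspicious.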
What if you do something like the following, where django-admin stands for whatever Django admin command you use?
$ django-admin shell
Python 3.9.16 (main, Dec 11 2022, 12:49:23)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from hyperkitty.models.mailinglist import MailingList
>>> mlist_name = 'listname@domain.com'
>>> mlist = MailingList.objects.get(name=mlist_name)
>>> mlist
<MailingList listname@domain.com (listname.domain.com)>
>>>
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On 2023-09-24 13:28, Mark Sapiro wrote:
On 9/23/23 19:19, Mark wrote:
Yes, the query returns a result and the case in "name" matches that in the query ...
select * from hyperkitty_mailinglist where name like 'listname@domain.com'
id             | 91
name           | listname@domain.com
display_name   | Listname
description    | ...
subject_prefix | [Listname]
archive_policy | 1
created_at     | 2023-09-21 08:55:36.385239+2
list_id        | listname@domain.com
Is that really what you get for list_id? It should have a dot, not an at sign, as in listname.domain.com.
What if you do something like the following, where django-admin stands for whatever Django admin command you use?
$ django-admin shell
Python 3.9.16 (main, Dec 11 2022, 12:49:23)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from hyperkitty.models.mailinglist import MailingList
>>> mlist_name = 'listname@domain.com'
>>> mlist = MailingList.objects.get(name=mlist_name)
>>> mlist
<MailingList listname@domain.com (listname.domain.com)>
>>>
My bad. I was in too much of a hurry when redacting.
The list_id is as you say: listname.domain.com
On Sat, 2023-09-23 at 20:28 -0700, Mark Sapiro wrote:
What if you do something like the following, where django-admin stands for whatever Django admin command you use?
$ django-admin shell
Python 3.9.16 (main, Dec 11 2022, 12:49:23)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from hyperkitty.models.mailinglist import MailingList
>>> mlist_name = 'listname@domain.com'
>>> mlist = MailingList.objects.get(name=mlist_name)
>>> mlist
<MailingList listname@domain.com (listname.domain.com)>
>>>
Hi Mark,
I'm still on this problem of one list's archive-search not returning any results. I created a new list today and it also has the same problem.
In short, the older lists all seem fine -- their search results are up-to-date so that tells me the cron is okay. It's just these 2 newer lists that have the problem.
It's taken me a while to get my head around the Django admin command you suggested, as I kept getting an error when starting:
#django-admin shell
from hyperkitty.models.mailinglist import MailingList ... django.core.exceptions.ImproperlyConfigured ...
That sent me down a rabbit hole which I haven't emerged from yet.
But instead, starting with # python3 manage.py shell ...
# python3 manage.py shell
Python 3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from hyperkitty.models.mailinglist import MailingList
>>> mlist_name = 'thepost@example.com'
>>> mlist = MailingList.objects.get(name=mlist_name)
>>> mlist
<MailingList: <MailingList thepost@example.com>>
The format of my final line is different to yours ...
Mine: <MailingList: <MailingList thepost@example.com>>
Yours: <MailingList listname@domain.com (listname.domain.com)>
Does this (missing list_id) give any clue as to why the archive-search isn't working, or is it just that starting with "# python3 manage.py shell" is the wrong thing to do?
By the way, the list_id shows correctly in the SQL query result.
select * from hyperkitty_mailinglist where name like 'thepost@example.com';
...
name | thepost@example.com
|
list_id | thepost.example.com
...
Any ideas appreciated. Mark
Mark writes:
as I kept getting an error when starting:
#django-admin shell
from hyperkitty.models.mailinglist import MailingList ... django.core.exceptions.ImproperlyConfigured ...
This suggests to me that you have multiple versions of Python and/or Django installed. This is an advantage of using a venv -- once activated, you are much more likely to get a well-defined, consistent set of utilities and libraries.
By the way, does that # prompt mean you are logged in as root while you're working with Mailman? That is inadvisable, as if you change anything there is a chance that root will be the owner of the file, and the list user will not be able to work with it.
Also, if you have mailman-web installed, I recommend using that rather than django-admin or python3 manage.py. Each Django application may have its own copy of django-admin (not usually, but I've seen it happen) and definitely will have its own copy of manage.py. mailman-web knows how to work with both HyperKitty and Postorius without getting confused about that.
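One way to see whether two commands are really using the same Python and Django is to run a few lines like these inside each shell you are comparing and check that the output agrees. This is a minimal sketch using only the standard library; describe_environment is a hypothetical helper. DJANGO_SETTINGS_MODULE is the standard Django environment variable naming the settings module, and an unset or different value is one possible cause of ImproperlyConfigured, though the full error message is not shown in this thread.

```python
import importlib.util
import os
import sys


def describe_environment():
    """Collect facts that distinguish one Python/Django install from another."""
    # find_spec returns None when django is not importable from this interpreter.
    spec = importlib.util.find_spec("django")
    return {
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "django_location": spec.origin if spec else None,
        "settings_module": os.environ.get("DJANGO_SETTINGS_MODULE"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the interpreter path or the Django location differs between the two shells, they are not running the same installation.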
Does this (missing list_id) give any clue as to why the archive-search isn't working, or is it just that starting with "# python3 manage.py shell" is the wrong thing to do?
If all you're doing is querying the database, it doesn't really matter how you do this. If you get the right copies of python3, django-admin, and manage.py, it's all just going to work. Whichever utility you start with, it all goes through the same Django modules in the end.
The process of search index construction is both CPU and wall-clock time intensive. For this reason, the cron jobs that do regular index updating (I think once an hour?) are separate from the utility that constructs the index in the first place when you have existing archives (either migrated from another MLM such as Mailman 2, or if you're upgrading to better archive software).
Is it possible that you have most of a bunch of migrated archives indexed because the migration process did it, but that the periodic update process (like cron jobs, but actually run by the Django queue task manager IIRC) is not running, so no new posts are being archived? That fits the symptoms you described so far (that I recall).
Steve
On 10/9/23 22:50, Mark wrote:
I'm still on this problem of one list's archive-search not returning any results. I created a new list today and it also has the same problem.
In short, the older lists all seem fine -- their search results are up-to-date so that tells me the cron is okay. It's just these 2 newer lists that have the problem.
It's taken me a while to get my head around the Django admin command you suggested, as I kept getting an error when starting:
Steve has addressed your issues with multiple Django admin commands, some of which apparently don't reference the correct settings.
The format of my final line is different to yours ...
Mine: <MailingList: <MailingList thepost@example.com>>
Yours: <MailingList listname@domain.com (listname.domain.com)>
Does this (missing list_id) give any clue as to why the archive-search isn't working, or is it just that starting with "# python3 manage.py shell" is the wrong thing to do?
It is not a clue as to the problem. The format of that representation
changed in HyperKitty 1.3.5. What is your version?
What happens if you do, with the appropriate Django admin and list name,
python3 manage.py update_index_one_list thepost@example.com
--
Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
On 2023-10-11 04:38, Mark Sapiro wrote:
Steve has addressed your issues with multiple Django admin commands, some of which apparently don't reference the correct settings.
The format of my final line is different to yours ...
It is not a clue as to the problem. The format of that representation
changed in HyperKitty 1.3.5. What is your version?
HyperKitty version 1.3.4
What happens if you do, with the appropriate Django admin and list name,
python3 manage.py update_index_one_list thepost@example.com
Thank you Mark and Steve,
That worked a treat!
python3 manage.py update_index_one_list thepost@example.com
Running that command returned the message "Indexing 12 emails", then returned to the shell prompt when done. It successfully indexed the messages in the archive, and I could then do a search on the existing messages.
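Since two of my lists were affected, the same command can be repeated per list. Below is a minimal sketch of that loop. It assumes the admin command is "python3 manage.py" run from the directory containing manage.py, as above; the second list name is a made-up placeholder, and reindex_commands and reindex are hypothetical helpers. It defaults to a dry run that only prints the commands.

```python
import subprocess


def reindex_commands(list_names, admin_command=("python3", "manage.py")):
    """Build one update_index_one_list command line per list name."""
    return [[*admin_command, "update_index_one_list", name] for name in list_names]


def reindex(list_names, dry_run=True):
    """Print the commands, or run them one after another when dry_run is False."""
    commands = reindex_commands(list_names)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first command that exits with an error.
            subprocess.run(argv, check=True)
    return commands


# "secondlist@example.com" is a placeholder for the other affected list.
reindex(["thepost@example.com", "secondlist@example.com"])
```

Calling reindex(..., dry_run=False) would actually run the commands, so it should be done as the same user that normally runs Mailman.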
On 2023-10-11 00:54, Stephen J. Turnbull wrote:
The process of search index construction is both CPU and wall-clock time intensive. For this reason, the cron jobs that do regular index updating (I think once an hour?) are separate from the utility that constructs the index in the first place when you have existing archives (either migrated from another MLM such as Mailman 2, or if you're upgrading to better archive software).
The messages in the archive came directly from the list (ie. they weren't imported). It looks as though running that command flushed out a cobweb, as the hourly indexing is now happening.
Some time after I had run that indexing command, a new message was posted to the list and appeared in the archive. I searched (a) immediately, (b) 15 mins after, and (c) an hour and a bit after.
As Steve said, the newer messages are automatically indexed hourly. So it wasn't until after that hourly index that the search terms in the newer messages were showing up in the search results.
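The delay can be estimated ahead of time. This is a minimal sketch, assuming the index update runs once an hour at a fixed minute; minute 0 is an assumption here, so check the actual cron entry or job schedule on your own system. next_index_run is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_index_run(posted_at, minute=0):
    """Return the first hourly run strictly after posted_at."""
    candidate = posted_at.replace(minute=minute, second=0, microsecond=0)
    if candidate <= posted_at:
        candidate += timedelta(hours=1)
    return candidate


posted = datetime(2023, 10, 11, 9, 20)
run = next_index_run(posted)
print(f"posted {posted:%H:%M}, searchable after the {run:%H:%M} run")
```

A search made before that run would not be expected to find the new message, which matches what I saw at (a) and (b).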
I can't say I understand the inner workings of the process yet, but we have a working archive search and I'm very happy about that.
Thank you again, Mark
participants (3)
- Mark
- Mark Sapiro
- Stephen J. Turnbull