[MM3-users] Re: toward improving correctness when running on single core system ...

June 20, 2023 · *all*


Nelson Strother writes:
...
...
Presumably there's also a chance for the runner to detect and
log the expired lock before it dies.
This may require some cleverness, as the delays introduced by
recording a log message e.g. "I'm the nntp runner holding the lock,
and it has not expired yet." are likely to change the results of
the lock competition, maybe even allowing all runner processes to
continue as intended.
Unless the runner is killed with an uncatchable signal, there is
likely a point before exit to do this.  And no, *all* runner processes
won't continue, because in my suggestion this one is already dying.  I
understand there may be cases where you can't find a place to put the
log, but I don't think it's worth treating this as a recoverable error
if it happens that after the message is logged the initialization
somehow completes successfully -- once you detect an expired lock,
give up.
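
A minimal sketch of the "detect, log, give up" idea, under stated
assumptions: ToyLock, ExpiredLockError and check_lock_or_give_up are
hypothetical names made up for illustration, and ToyLock is only a
stand-in for a lock with a fixed lifetime.  It is not the flufl.lock
API and not Mailman's runner code.

```python
import logging
from datetime import datetime, timedelta

log = logging.getLogger("runner")


class ExpiredLockError(RuntimeError):
    """Raised when a runner notices that its own lock has expired."""


class ToyLock:
    """Hypothetical stand-in for a lock with a fixed lifetime."""

    def __init__(self, lifetime, now=datetime.now):
        self._now = now                      # injectable clock, for testing
        self.expiration = now() + lifetime   # fixed at acquisition time

    @property
    def expired(self):
        return self._now() >= self.expiration


def check_lock_or_give_up(name, lock):
    """Log and raise if the lock has expired; never try to recover."""
    if lock.expired:
        log.error("%s runner: lock expired at %s; giving up",
                  name, lock.expiration)
        raise ExpiredLockError(name)
```

The check only logs on the failure path, so a healthy runner pays no
logging delay, and the caller is expected to let the exception end the
process rather than retry.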
...
It is unclear to me whether a process would ever have an
opportunity to record a log message e.g. "I'm the nntp runner
holding the lock which has expired, but I am not dead yet." before
it dies.
Agreed.  Until it's clear that the opportunity doesn't exist, this
idea may provide a faster, simpler way to diagnose this kind of
problem.
...
If such an opportunity exists, it should extend the lock's lifetime
via e.g. Lock.refresh(timedelta(seconds=10)).
I disagree.  The point of setting an expiration is that you know the
process is misbehaving if it holds the lock that long.  Setting a
longer expiration is a reasonable workaround until we understand and
fix the misbehavior.  But allowing the misbehaving process to extend
possession of the lock is a bad idea.
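
To make the distinction concrete, here is a rough sketch, again under
stated assumptions: FixedLifetimeLock and LOCK_LIFETIME are
hypothetical names, and this toy class is not how flufl.lock is
implemented.  The lifetime is a tunable chosen up front, which is where
a longer workaround value would go, and a holder whose lock has already
expired is refused when it tries to extend it.

```python
from datetime import datetime, timedelta

# Hypothetical tunable: raise it as a stopgap while the slow start-up
# is being investigated.
LOCK_LIFETIME = timedelta(seconds=15)


class FixedLifetimeLock:
    """Hypothetical toy lock whose holder cannot extend it after expiry."""

    def __init__(self, lifetime=LOCK_LIFETIME, now=datetime.now):
        self._now = now                      # injectable clock, for testing
        self.lifetime = lifetime
        self.expiration = now() + lifetime

    def refresh(self):
        """Renew the lock only while it is still valid."""
        if self._now() >= self.expiration:
            # The holder overran its lifetime, so it is treated as
            # misbehaving and is not allowed to keep the lock.
            raise RuntimeError("lock already expired; refusing to extend it")
        self.expiration = self._now() + self.lifetime
```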
...
As I attempted to summarize at the end of [3] above, each time I
tried shorter sleep intervals (1, 10, 15 seconds) within
.../flufl/lock/_lockfile.py
It wasn't clear to me which experiment that referred to.  Thank you
for clarifying.

Stephen J. Turnbull