Nelson Strother writes:
Presumably there's also a chance for the runner to detect and log the expired lock before it dies.
This may require some cleverness, as the delays introduced by recording a log message e.g. "I'm the nntp runner holding the lock, and it has not expired yet." are likely to change the results of the lock competition, maybe even allowing all runner processes to continue as intended.
Unless the runner is killed with an uncatchable signal, there is likely a point before exit where it can do this. And no, *all* runner processes won't continue, because in my suggestion this one is already dying. I understand there may be cases where you can't find a place to put the log call, but if initialization somehow completes successfully after the message is logged, I don't think it's worth treating that as a recoverable error -- once you detect an expired lock, give up.
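A minimal sketch of the "detect, log, give up" idea, not Mailman's actual code. The function name, the logger name, and the way the expiration time reaches the function are all assumptions made for illustration. It logs only on the failure path, so a healthy runner pays no logging delay that could disturb the lock competition:

```python
import logging
import sys
from datetime import datetime, timedelta

# Hypothetical logger name; a real runner would use its own logging setup.
log = logging.getLogger("runner.lock")


def give_up_if_expired(expiration, now=None, runner="nntp"):
    """Log and exit if the lock's expiration time has already passed.

    `expiration` is assumed to be a datetime obtained from whatever lock
    object the runner holds. Nothing is logged while the lock is valid.
    """
    now = datetime.now() if now is None else now
    if now < expiration:
        return  # lock still valid; carry on without logging
    overdue = (now - expiration).total_seconds()
    log.error("%s runner: lock expired %.1fs ago; giving up", runner, overdue)
    # No attempt to recover: once expiry is detected, the process exits.
    sys.exit(1)
```

Calling this at a few checkpoints during initialization would leave one log line identifying which runner overran its lock, which is the diagnostic value being argued for here.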
It is unclear to me whether a process would ever have an opportunity to record a log message e.g. "I'm the nntp runner holding the lock which has expired, but I am not dead yet." before it dies.
Agreed. Until it's clear that the opportunity doesn't exist, this idea may provide a faster, simpler way to diagnose this kind of problem.
If such an opportunity exists, it should extend the lock's lifetime via e.g. Lock.refresh(timedelta(seconds=10)).
I disagree. The point of setting an expiration is that you know the process is misbehaving if it holds the lock that long. Setting a longer expiration is a reasonable workaround until we understand and fix the misbehavior. But allowing the misbehaving process to extend possession of the lock is a bad idea.
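To make the distinction concrete, here is a toy stand-in for an expiring lock. It is not flufl.lock, and the `Lease` and `LeaseExpired` names are hypothetical. It allows a refresh while the lease is still valid but refuses one after expiry, so the only way to get more time is to choose a longer lifetime up front:

```python
from datetime import datetime, timedelta


class LeaseExpired(Exception):
    """Raised when a holder tries to extend a lease that already ran out."""


class Lease:
    """Toy expiring lease; a stand-in for illustration only."""

    def __init__(self, lifetime, now):
        self.lifetime = lifetime
        self.expiration = now + lifetime

    def refresh(self, now):
        # A holder that is past its expiration is assumed to be
        # misbehaving, so it is not allowed to extend its own possession.
        if now >= self.expiration:
            raise LeaseExpired("lease expired at %s" % self.expiration)
        self.expiration = now + self.lifetime


# The workaround is a longer lifetime chosen at creation time,
# not an extension granted after the fact.
start = datetime(2024, 1, 1, 12, 0, 0)
workaround = Lease(timedelta(seconds=60), now=start)
```

Under this policy a process that detects its own expired lease has nothing left to do but log and exit.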
As I attempted to summarize at the end of [3] above, each time I tried shorter sleep intervals (1, 10, 15 seconds) within .../flufl/lock/_lockfile.py
It wasn't clear to me which experiment that referred to. Thank you for clarifying.