[cinder][tooz] Lock files are left behind
On 17/10, Herve Beraud wrote:
> Thanks Ben for your feedback.
> I already tried the `remove_external_lock_file` approach a few months
> ago, but unfortunately I don't think we can go that way with Cinder...
> As Gorka explained to me a few months ago:
> > Those are not the only type of locks we use in Cinder. Those are the
> > ones we call "Global locks" and use TooZ so the DLM can be configured
> > for Cinder Active-Active.
> > We also use Oslo's synchronized locks.
> > More information is available in the Cinder HA dev ref I wrote last
> > year. It has a section dedicated to the topic of mutual exclusion and
> > the 4 types we currently have in Cinder:
> > - Database locking using resource states.
> > - Process locks.
> > - Node locks.
> > - Global locks.
> > As for calling the remove_external_lock_file_with_prefix directly on
> > delete, I don't think that's something we can do, as the locks may still
> > be in use. Example:
> > - Start deleting volume -> get lock
> > - Try to clone volume -> wait for lock
> > - Finish deleting volume -> release and delete lock
> > - Cloning recreates the lock when acquiring it
> > - Cloning fails because the volume no longer exists but leaves the lock
> So the Cinder workflow and mechanisms seem to rule out any use of the
> removal features of oslo.concurrency...
> Also, as discussed on the review (https://review.opendev.org/#/c/688413),
> this issue can't be fixed in the underlying libraries, and I think that if
> we want to fix it on stable branches then Cinder needs to address it
> directly, by adding some code that is triggered when needed and in a
> safe manner; in other words, only Cinder can really address it and
> remove these files safely.
> See the discussion extract on the review (
I've given it some more thought, and I am now on the side of those who
argue that "something imperfect" is better than what we currently have,
so maybe we can reach some sort of compromise doing the following:
- Cleanup locks directory on node start
- Remove locks on delete volume/snapshot operation
- Remove locks on missing source on create volume (volume/snapshot)
It may not cover 100% of the cases, and in a long-running system with
few deletions it will not help much, but it should at least alleviate
the issue for some users.
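To make the first bullet concrete, the startup cleanup could be sketched roughly like this. This is a stdlib-only illustration, not actual Cinder code: the lock directory, prefixes, and function name are all hypothetical, and it is only safe before any worker can be holding or waiting on a lock.

```python
# Hypothetical sketch of "cleanup locks directory on node start".
import glob
import os

# Illustrative values only -- not Cinder's real lock dir or prefixes.
LOCK_DIR = "/var/lib/cinder/locks"
LOCK_PREFIXES = ("cinder-",)

def purge_stale_locks(lock_dir=LOCK_DIR, prefixes=LOCK_PREFIXES):
    """Remove leftover lock files before any worker starts.

    Only safe at service startup, when no operation can be holding
    or waiting on these files yet.
    """
    removed = []
    for prefix in prefixes:
        for path in glob.glob(os.path.join(lock_dir, prefix + "*")):
            try:
                os.unlink(path)
                removed.append(path)
            except OSError:
                # Another process may have removed it already; ignore.
                pass
    return removed
```

The try/except matters even at startup, since an operator or a packaging script could be touching the same directory concurrently.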
To illustrate the Cinder part of this approach I have written a WIP
patch (and an oslo.concurrency one that is not needed, but would
improve the user experience). I haven't bothered to test it yet, but
I'll do it if we agree this is a reasonable compromise we are willing
to accept.
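For what it's worth, the per-volume removal could look roughly like the sketch below. It is stdlib-only and not the actual WIP patch: the function name, the `volume_exists` stand-in for the Cinder DB check Gorka describes, and the lock-file naming glob are all hypothetical.

```python
# Hypothetical sketch of "remove locks on delete volume", gated on a
# DB check so we never delete a lock for a volume that still exists.
import glob
import os

def remove_volume_locks(lock_dir, volume_id, volume_exists):
    """Remove lock files for a volume, only once it is known to be gone.

    `volume_exists` is a callable standing in for a Cinder DB lookup:
    removing the files while the volume still exists could race with
    an operation (e.g. a clone) that is about to take the lock.
    """
    if volume_exists(volume_id):
        return []
    removed = []
    # Lock names such as "<volume_id>-delete_volume"; the glob pattern
    # is illustrative, not Cinder's actual naming scheme.
    for path in glob.glob(os.path.join(lock_dir, "*%s*" % volume_id)):
        try:
            os.unlink(path)
            removed.append(path)
        except OSError:
            pass
    return removed
```

The early return is the important part: it encodes Gorka's point that the only safe time to remove these files is after the DB confirms the volume is deleted or purged.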
> > Thanks Gorka for your feedback. In view of all the discussions
> > about this topic, I suppose only Cinder can really address it safely
> > on stable branches.
> > > It is not a safe assumption that *-delete_volume file locks can be
> > > removed just because they have not been used in a couple of days.
> > > A new volume clone could come in that would use it and then we
> > > could have a race condition if the cron job was running.
> > >
> > > The only way to be sure that it can be removed is checking in the
> > > Cinder DB and making sure that the volume has been deleted or it
> > > doesn't even exist (DB has been purged).
> > >
> > > Same thing with detach_volume, delete_snapshot, and those that are
> > > directly volume ids locks.
> > I definitely think that it can't be fixed in the underlying
> > libraries as Eric suggested; indeed, as you explained, only Cinder
> > can know whether a lock file can be removed safely.
> > > In my opinion the fix should be done in fasteners, or we should add
> > > code in Cinder that cleans up all locks related to a volume or
> > > snapshot when this one is deleted.
> > I agree the best solution is to fix the root cause, and therefore to
> > fix fasteners, but I don't think that can be backported to stable
> > branches, because we would need to bump a requirement version on a
> > stable branch and because it would introduce new features. So I
> > guess Cinder needs to add some code to remove these files and
> > possibly backport it to stable branches.
> > 
> The fasteners fix, IMHO, can only be used by future versions of
> OpenStack, due to the version bump and the new features added. I think
> it could only become available in Ussuri or a later cycle like V.
> The main goal of the cron approach was to avoid unearthing this topic
> every 6 months, to try to address it on stable branches, and to take
> care of the file system usage even if it's a theoretical issue, but
> given the feedback and warnings from the Cinder team I don't think
> this track can be followed any further.
> Definitely, this is not an oslo.concurrency bug. Anyway, your proposed
> "Administrator Guide" is a must-have, to track things in one place,
> inform users, and avoid spending time explaining the same things about
> this topic again and again... so it's worth it. I'll review it and
> contribute my related knowledge on this topic. oslo.concurrency can't
> address this safely because we would risk introducing race conditions
> and worse situations than the leftover lock files.
> So, given all these elements, only Cinder can address it for the
> moment, including the fix on stable branches.
> On Wed, Oct 16, 2019 at 00:15, Ben Nemec <openstack at nemebean.com> wrote:
> > In the interest of not having to start this discussion from scratch
> > every time, I've done a bit of a brain dump into
> > https://review.opendev.org/#/c/688825/ that covers why things are the
> > way they are and what we recommend people do about it. Please take a
> > look and let me know if you see any issues with it.
> > Thanks.
> > -Ben
> Hervé Beraud
> Senior Software Engineer
> Red Hat - Openstack Oslo
> irc: hberaud