[cinder][dev] Bug for deferred deletion in RBD
I tested again today after increasing EVENTLET_THREADPOOL_SIZE to 100. I
hoped for good results, but this time cinder-volume stopped responding
after 41 volumes had been removed. Setting this environment variable did
not fix the cinder-volume hang.
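For reference, here is roughly how I understand the variable to work (a
sketch based on eventlet reading it when tpool is imported, which is why
it has to be in the service environment before cinder-volume starts):

    import os

    # must be set before eventlet.tpool is imported: eventlet reads it
    # once to size its pool of native threads (the default is 20)
    os.environ['EVENTLET_THREADPOOL_SIZE'] = '100'

    import eventlet.tpool

    # every blocking call the RBD driver proxies through tpool.execute()
    # then occupies one of those native threads until it returns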
Restarting the stopped cinder-volume deletes all the volumes that are in
the 'deleting' state while it runs the clean_up function. Only one volume
remained stuck in 'deleting'; after I forced its state back to 'available'
and deleted it again, all volumes were gone (see the sketch below).
The result was the same three times in a row: after removing a few dozen
volumes, cinder-volume went down, and after a restart of the service 199
volumes were deleted while one volume had to be erased manually.
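The manual recovery step, sketched with python-cinderclient (credentials
and endpoint below are placeholders; the equivalent CLI is "cinder
reset-state --state available <uuid>" followed by "cinder delete <uuid>"):

    from cinderclient import client

    # placeholder credentials, for illustration only
    cinder = client.Client('3', 'admin', 'password', 'admin',
                           'http://controller:5000/v3')

    # find the volumes left behind in the 'deleting' state
    stuck = [v for v in cinder.volumes.list(search_opts={'all_tenants': 1})
             if v.status == 'deleting']
    for vol in stuck:
        cinder.volumes.reset_state(vol, 'available')  # force the state back
        cinder.volumes.delete(vol)                    # then delete again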
If you have a different approach to solving this problem, please let me
know.
On Mon, Feb 11, 2019 at 9:40 PM Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
> On 11 Feb 2019, at 11:39, Jae Sang Lee <hyangii at gmail.com> wrote:
> I saw messages like "moving volume to trash" in the cinder-volume
> logs, and the periodic task also reports
> lines like "Deleted <vol-uuid> from trash for backend '<backends-name>'".
> The patch worked well when clearing a small number of volumes. This
> happens only when I am deleting a large
> number of volumes.
> Hmm, from cinder's point of view, the deletion should be more or less
> instantaneous, so it should be able to 'delete'
> many more volumes before getting stuck.
> The periodic task, however, will go through the volumes one by one, so if
> you delete many at the same time,
> volumes may pile up in the trash (for some time) before the task gets
> round to deleting them. This should not affect
> c-vol, though.
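> Conceptually, the purge task does something like the sketch below (using
> the ceph 'rbd' python bindings, not the actual Cinder code; the pool
> name is an example):
>
>     import rados
>     import rbd
>
>     with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
>         with cluster.open_ioctx('volumes') as ioctx:
>             # one by one, hence the pile-up when many volumes
>             # are deleted at the same time
>             for entry in rbd.RBD().trash_list(ioctx):
>                 # the real task honours the deferment end time:
>                 # trash_remove() fails while the delay has not expired
>                 rbd.RBD().trash_remove(ioctx, entry['id'])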
> I will try to adjust the size of the thread pool via the environment
> variable, as you advised.
> Do you know why the cinder-volume hang does not occur when creating a
> volume, but only when deleting a volume?
> Deleting a volume ties up a thread for the duration of the deletion (which
> is synchronous and can hence take very
> long for large volumes). If you have too many deletions going on at the same time, you
> run out of threads and c-vol will eventually
> time out. FWIU, creation basically works the same way, but it is almost
> instantaneous, hence the risk of using up all
> threads is simply lower (Gorka may correct me here :-).
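> A toy illustration of that effect (assuming the default pool size of 20,
> with time.sleep() standing in for a long synchronous RBD removal):
>
>     import time
>     import eventlet
>     from eventlet import tpool
>
>     def slow_delete(i):
>         tpool.execute(time.sleep, 60)  # blocks one native thread for 60s
>         print('delete %d done' % i)
>
>     # the first 20 greenthreads grab all the native threads,
>     # the remaining 5 queue up behind them
>     for i in range(25):
>         eventlet.spawn_n(slow_delete, i)
>
>     eventlet.sleep(130)  # first batch done at ~60s, the queued ones at ~120s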
> On Mon, Feb 11, 2019 at 6:14 PM Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
>> To make sure deferred deletion is properly working: when you delete
>> individual large volumes
>> with data in them, do you see that
>> - the volume is fully "deleted" within a few seconds, i.e. not staying in
>> "deleting" for a long time?
>> - that the volume shows up in the trash (with "rbd trash ls")?
>> - the periodic task reports it is deleting volumes from the trash?
>> Another option to look at is "backend_native_threads_pool_size": this
>> will increase the number
>> of threads to work on deleting volumes. It is independent from deferred
>> deletion, but can also
>> help with situations where Cinder has more work to do than it can cope
>> with at the moment.
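>> Under the hood the option essentially resizes eventlet's native thread
>> pool, roughly like this (the exact wiring in Cinder's startup code may
>> differ):
>>
>>     import eventlet.tpool
>>
>>     # with backend_native_threads_pool_size = 100 in the backend section
>>     # of cinder.conf, Cinder effectively does this before the pool is
>>     # first used:
>>     eventlet.tpool.set_num_threads(100)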
>> On 11 Feb 2019, at 09:47, Jae Sang Lee <hyangii at gmail.com> wrote:
>> Yes, I added your code to the Pike release manually.
>> On Mon, Feb 11, 2019 at 4:39 PM Arne Wiebalck <Arne.Wiebalck at cern.ch> wrote:
>>> Hi Jae,
>>> You back ported the deferred deletion patch to Pike?
>>> > On 11 Feb 2019, at 07:54, Jae Sang Lee <hyangii at gmail.com> wrote:
>>> > Hello,
>>> > I recently ran a volume deletion test with deferred deletion enabled
>>> > on the Pike release.
>>> > We experienced a cinder-volume hang when we were deleting a large
>>> > number of volumes to which data had actually been written (I created
>>> > a 15GB file in every volume), and we thought deferred deletion would
>>> > solve it.
>>> > However, while deleting 200 volumes, cinder-volume went down as before
>>> > after about 50 volumes. In my opinion, the trash_move API (sketched
>>> > below) does not seem to work properly when removing multiple volumes,
>>> > just like the remove API.
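>>> > For reference, this is the call I mean (a sketch with the ceph 'rbd'
>>> > python bindings; pool and image names are placeholders):
>>> >
>>> >     import rados
>>> >     import rbd
>>> >
>>> >     with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
>>> >         with cluster.open_ioctx('volumes') as ioctx:
>>> >             # returns almost immediately, unlike a full remove();
>>> >             # 'delay' protects the image from an immediate purge
>>> >             rbd.RBD().trash_move(ioctx, 'volume-xxxx', delay=0)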
>>> > If these test results are due to a mistake on my side, please let me
>>> > know the correct test method.
>>> Arne Wiebalck
>>> CERN IT