OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Problem with CLOUDSTACK-10240 (Cannot migrate local volume to shared storage)


Hi Rafael,

Thanks for your time on this.

Here is an example where the new code deviates from the old code in a critical fashion (code right below is new):

    private Map<Volume, StoragePool> getDefaultMappingOfVolumesAndStoragePoolForMigration(VirtualMachineProfile profile, Host targetHost) {
        Map<Volume, StoragePool> volumeToPoolObjectMap = new HashMap<Volume, StoragePool>();
        List<VolumeVO> allVolumes = _volsDao.findUsableVolumesForInstance(profile.getId());
        for (VolumeVO volume : allVolumes) {
            StoragePoolVO currentPool = _storagePoolDao.findById(volume.getPoolId());
            if (ScopeType.HOST.equals(currentPool.getScope())) {
                createVolumeToStoragePoolMappingIfNeeded(profile, targetHost, volumeToPoolObjectMap, volume, currentPool);
            } else {
                volumeToPoolObjectMap.put(volume, currentPool);
            }
        }
        return volumeToPoolObjectMap;
    }

What happens in the new code (above) is if the user didn’t pass in a storage pool to migrate the virtual disk to (but the VM is being migrated to a new cluster), this code just assigns the virtual disk to its current storage pool (which is not going to be visible to any of the hosts in the new compute cluster).

In the old code (I’m looking at 4.11.3 here), you could look around line 2337 for the following code (in the VirtualMachineManagerImpl.getPoolListForVolumesForMigration method):

                    // Find a suitable pool for the volume. Call the storage pool allocator to find the list of pools.

                    final DiskProfile diskProfile = new DiskProfile(volume, diskOffering, profile.getHypervisorType());
                    final DataCenterDeployment plan = new DataCenterDeployment(host.getDataCenterId(), host.getPodId(), host.getClusterId(),
                            host.getId(), null, null);

                    final List<StoragePool> poolList = new ArrayList<>();
                    final ExcludeList avoid = new ExcludeList();

                    for (final StoragePoolAllocator allocator : _storagePoolAllocators) {
                        final List<StoragePool> poolListFromAllocator = allocator.allocateToPool(diskProfile, profile, plan, avoid, StoragePoolAllocator.RETURN_UPTO_ALL);

                        if (poolListFromAllocator != null && !poolListFromAllocator.isEmpty()) {
                            poolList.addAll(poolListFromAllocator);
                        }
                    }

This old code would find an applicable storage pool in the destination cluster (one that can be seen by the hosts in that compute cluster).

I think the main error in the new logic is the assumption that a VM can only be migrated to a host in the same computer cluster. For XenServer (perhaps for other hypervisor types?), we support cross-cluster VM migration.

The other issue I noticed is that there is no logic in the new code that checks for managed-storage use cases. If you look in the VirtualMachineManagerImpl.getPoolListForVolumesForMigration method in the old code, there is special handling for managed storage. I don’t see this reproduced in the new logic.

I sympathize with your point that all tests passed yet this issue was not uncovered. Unfortunately, I suspect we have a fairly low % coverage of automated tests on CloudStack. If we ever did get to a high % of automated test coverage, we might be able to spin up new releases more frequently. As the case stands today, however, there are probably many un-tested use cases when it comes to our automated suite of tests.

Thanks again!
Mike

On 7/15/18, 4:19 PM, "Rafael Weingärtner" <rafaelweingartner@xxxxxxxxx> wrote:

    Mike, are you able to pin-point in the old/replaced code the bit that was
    handling your use case?  I took the most care not to break anything.
    Also, your test case, isn't it in the ACS' integration test suite? In
    theory, all test passed when we merged the PR.
    
    I sure can take a look at it. Can you detail your use case? I mean, the
    high level execution flow. What API methods you do, what you expected to
    happen, and what is happening today.
    
    On Sun, Jul 15, 2018 at 3:25 AM, Tutkowski, Mike <Mike.Tutkowski@xxxxxxxxxx>
    wrote:
    
    > It looks like this is the problematic PR:
    >
    > https://github.com/apache/cloudstack/pull/2425/
    >
    > On 7/15/18, 12:20 AM, "Tutkowski, Mike" <Mike.Tutkowski@xxxxxxxxxx> wrote:
    >
    >     Hi,
    >
    >     While running managed-storage regression tests tonight, I noticed a
    > problem that is not related to managed storage.
    >
    >     CLOUDSTACK-10240 is a ticket asking that we allow the migration of a
    > virtual disk that’s on local storage to shared storage. In the process of
    > enabling this feature, the VirtualMachineManagerImpl.
    > getPoolListForVolumesForMigration method was re-written in a way that
    > completely breaks at least one use case: Migrating a VM across compute
    > clusters (at least supported in XenServer). If, say, a virtual disk resides
    > on shared storage in the source compute cluster, we must be able to copy
    > this virtual disk to shared storage in the destination compute cluster.
    >
    >     As the code is currently written, this is no longer possible. It also
    > seems that the managed-storage logic has been dropped for some reason in
    > the new implementation.
    >
    >     Rafael – It seems that you worked on this feature. Would you be able
    > to look into this and create a PR?
    >
    >     Thanks,
    >     Mike
    >
    >
    >
    
    
    -- 
    Rafael Weingärtner