A vSAN deployment usually involves moving large amounts of data into a new physical vSAN cluster, based on the customer's requirements and the vSAN design specifications. In practice, the cluster often becomes unavailable for a while during deployment and then recovers, for a variety of reasons. Below are some of the most common issues in a vSAN deployment.
- High disk usage;
- Data corruption;
- Out of memory errors;
- Slow performance;
- Bad network performance
Let's look at each of these vSAN issues and its corresponding fixes. Use them to avoid the same problems in your own deployment.
Table of contents:
- High disk usage
- Data corruption
- Out-of-memory errors
- Slow performance
- Bad network performance
- Final Say
1. High Disk Usage
This is the most common and most serious issue in a vSAN deployment. During deployment, disk space for logical volumes is allocated across vSAN clusters or datastores, and that allocation keeps growing over time. Snapshots, backups, and replicas accumulate along with it, so the datastore ends up using far more space than planned. Once the disks run out of space, you can no longer create new volumes and are forced to scale out the vSAN storage.
Start by analyzing the cause of the high disk usage. Confirm that the disks are not full and that the other vSAN datastores are healthy. If the usage has some other cause, identify it before choosing a resolution.
Here are some scenarios that can cause the issue:
- If the cluster configuration makes all volumes use the same vSAN datastore, the datastore can be overcommitted, because the same disks end up serving multiple clusters and space usage rises;
- If you rely on snapshots to recover data, they add to space usage. For large volumes, delete snapshots you no longer need once the disks approach their space quota;
- If you use the vSAN replication feature, dedicate the replicated volumes to replication and do not use them for anything else. Replication and migration involve a larger number of disks, which results in higher disk usage.
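The capacity check described above can be sketched as a small report over per-datastore figures. This is a minimal illustration, not a vSAN API client: the `Datastore` record, the `usage_report` function, and the 70% and 80% thresholds are assumptions made for the example, and the numbers would have to come from whatever capacity report your environment provides.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    """Hypothetical capacity record; fill it from your own capacity report."""
    name: str
    capacity_gb: float
    used_gb: float
    snapshot_gb: float  # part of used_gb that is held by snapshots


def usage_report(datastores, warn_pct=70.0, crit_pct=80.0):
    """Return (name, used %, snapshot %, status) for each datastore.

    The warn/crit thresholds are illustrative assumptions, not vendor limits.
    """
    rows = []
    for ds in datastores:
        used_pct = 100.0 * ds.used_gb / ds.capacity_gb
        snap_pct = 100.0 * ds.snapshot_gb / ds.capacity_gb
        if used_pct >= crit_pct:
            status = "critical"
        elif used_pct >= warn_pct:
            status = "warning"
        else:
            status = "ok"
        rows.append((ds.name, round(used_pct, 1), round(snap_pct, 1), status))
    return rows


if __name__ == "__main__":
    sample = [
        Datastore("vsan-a", 1000, 850, 200),
        Datastore("vsan-b", 1000, 500, 50),
    ]
    for row in usage_report(sample):
        print(row)
```

Reporting the snapshot share next to total usage makes it easier to tell whether deleting snapshots would actually free a meaningful amount of space.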
2. Data Corruption
This is another common issue during a vSAN deployment. It typically occurs when a snapshot is taken while the vSAN datastore is nearly full. Snapshots are meant to protect against data loss while a volume is in use, but as usage of the datastore climbs, the risk that data ends up corrupted grows. Here are some of the situations that can cause this:
- A snapshot is taken when the disk is almost full;
- A snapshot is triggered by data growth;
- The disk fills up while a snapshot is in progress;
- High disk usage causes too many snapshots.
If you face this issue during deployment, check whether the backup server is affected as well, since backups also consume space, and make sure that snapshots are not running on the affected cluster. If you see backup errors during the deployment, you can stop or delete the backup manually. You can also consider using the VMware vSAN Snapshot Throttle tool, or increasing the storage quota for the disks in your vSAN deployment.
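Since the risk comes from snapshots running on a nearly full datastore, one simple guard is to check free-space headroom before a snapshot starts. The sketch below is only an illustration: the `can_snapshot` function, the expected snapshot growth, and the 20% reserve are assumptions chosen for the example rather than documented vSAN values.

```python
def can_snapshot(capacity_gb, used_gb, expected_delta_gb, reserve_pct=20.0):
    """Return True if the datastore still keeps reserve_pct free
    after the snapshot has grown by expected_delta_gb.

    reserve_pct is an illustrative safety margin, not a vendor-defined limit.
    """
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    free_after_gb = capacity_gb - used_gb - expected_delta_gb
    return free_after_gb >= capacity_gb * reserve_pct / 100.0


if __name__ == "__main__":
    # 1000 GB datastore, 700 GB used, snapshot expected to grow by 50 GB
    print(can_snapshot(1000, 700, 50))
    # Same datastore at 780 GB used: the reserve would be violated
    print(can_snapshot(1000, 780, 50))
```

A scheduler could call a check like this and postpone the snapshot, or alert an operator, instead of letting it run into a full disk.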
3. Out-Of-Memory Errors
This is a common issue during the deployment of a vSAN cluster. vSAN is used for high-performance, multi-site vSphere deployments, so it allocates resources for the various components in your deployment and also needs a high-speed connection to the vSAN gateway for migration. All of this increases the memory usage of the deployment, which may result in out-of-memory errors.
Here are the areas to check when resolving this issue:
- Memory allocation;
- Slow connection to the gateway;
- Slow migration of the disks
It is best to address all of them. That way you reduce the chance of the same deployment issues recurring in the future.
4. Slow Performance
Performance is one of the most important attributes of a vSAN deployment. When performance is low, both the customer experience and the availability of the vSphere environment suffer, so you need to find what is holding performance back. Here are some of the typical reasons for low performance:
- The vSAN datastore has high disk space usage;
- A data migration or snapshot runs while the datastore is nearly full;
- A migration or snapshot targets a cluster that is already heavily loaded;
- High network traffic during migration and snapshots
There are different tools and metrics that you can use to check the performance of your deployment:
- VMDK size
- VMFS size
- vSAN inventory tool
- VMware vSAN toolkit (VSTK)
You can use the tools above to check the performance of the vSAN deployment. You can also run usage and other reports to review disk usage, migration activity, and overall performance.
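Whatever tool produces the numbers, a performance report is easier to act on when it is reduced to a few summary figures. The following sketch assumes you have exported a list of latency samples in milliseconds; the `latency_summary` function and the 5 ms limit are hypothetical and only illustrate the idea.

```python
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms, p95_limit_ms=5.0):
    """Summarize latency samples and flag a high 95th percentile.

    p95_limit_ms is an illustrative threshold; choose one that matches
    your own service-level targets.
    """
    p95 = percentile(samples_ms, 95)
    return {
        "mean_ms": mean(samples_ms),
        "p95_ms": p95,
        "max_ms": max(samples_ms),
        "over_limit": p95 > p95_limit_ms,
    }


if __name__ == "__main__":
    samples = [1.2, 1.4, 1.1, 9.8, 1.3, 1.5, 1.2, 1.6, 1.4, 1.3]
    print(latency_summary(samples))
```

Looking at the 95th percentile rather than only the mean helps surface the slow outliers that migrations and snapshots tend to cause.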
5. Bad Network Performance
Bad network performance can degrade the vSphere infrastructure and the availability of its services. It can also reduce the performance of the VM workloads. Common causes include:
- The network link has less capacity than required;
- Bad performance of the network card;
- High network traffic caused by VSAN migration
You need to understand the cause of the performance issue, because no single solution works in every case. We strongly recommend that you monitor and analyze the performance of the network, and that you make sure the network can sustain the load of the entire vSAN deployment.
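To judge whether a link has enough capacity for a migration, a rough back-of-the-envelope estimate is often enough. The sketch below converts a data volume and a link speed into an expected transfer time; the `transfer_hours` function and the 70% usable-bandwidth factor are assumptions for illustration, and real throughput depends on your hardware and competing traffic.

```python
def transfer_hours(data_gb, link_gbps, efficiency=0.7):
    """Estimate hours needed to move data_gb gigabytes over a link.

    link_gbps is the nominal link speed in gigabits per second.
    efficiency is the assumed usable fraction of that speed (illustrative).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be positive and efficiency in (0, 1]")
    usable_gbps = link_gbps * efficiency
    seconds = data_gb * 8 / usable_gbps  # 8 bits per byte
    return seconds / 3600


if __name__ == "__main__":
    # Compare a 1 Gbps and a 10 Gbps link for a 4500 GB migration
    for gbps in (1, 10):
        print(gbps, "Gbps:", round(transfer_hours(4500, gbps), 2), "hours")
```

If the estimate does not fit the migration window, that points to the first cause in the list above: the link has less capacity than the deployment requires.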
Final Say
Different deployment and configuration issues may occur with vSAN (VMware Virtual SAN) deployments across vendors, products, OS versions, and other environmental factors. To prevent major failures, it is important to know what the key issues are and how they can be fixed. This is where the information above comes in handy.