random writes A systems engineer's blog

A few VMware Virtual Volumes (VVol) tips

VMware Virtual Volumes (VVols) are a vSphere feature that allows storing VM virtual disks and other files natively on the storage system (instead of on VMFS datastores created on top of LUNs) and managing their placement on the storage system through vSphere storage policies. VVols have been available since vSphere 6.0 and require a vSphere Standard or Enterprise Plus license.

VVols are a pretty fascinating piece of technology, which I’m guessing is set to completely replace VMFS datastores in a few years’ time. Since this is a fairly new feature without many implementation tips available online, here are a few things to bear in mind when deploying VVols:

  • in order to use VVols, besides a VVol-certified storage system, the HBAs in your ESXi hosts need to support Secondary LUN IDs - to check whether this is the case, refer to the VMware Compatibility Guide for IO Devices and be sure to select Secondary LUNID (Enables VVols) in the Features section
    • if your hosts are running ESXi 6, you can check for HBA VVol compatibility from the command line by running esxcli storage core adapter list and noting whether the “Capabilities” column contains Second Level Lun ID for the vmhbas in question (see the sample output after this list)
  • although the maximum data VVol (= virtual disk) size in vSphere 6.0 is 62 TB, you should check the limits on the storage system side, because they could be much lower and force you to create multiple VVol datastores on the same system in order to consume its full available capacity; one such example is EMC Unity, which allows you to consume up to 16 TB of a single storage pool for a VVol datastore

  • migration of VMs from VMFS to VVol datastores is performed using Storage vMotion and should be initiated solely from the web client; the vSphere C# client doesn’t offer the option to select a storage policy in the Storage vMotion wizard, which results in user-created storage policies being ignored and migrated VMs always being assigned the default “VVol No Requirements Policy”

  • if you are using Veeam Backup and Replication, VVols are supported since VBR 8.0U2b but with a few caveats:
    • direct SAN transport mode is not supported
    • Virtual Appliance (Hot Add) processing mode requires that all VBR proxy VM disks are located on the same VVol as the processed VM
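
For reference, here is roughly what the HBA check from the second tip looks like on an ESXi 6 host - the output below is an illustrative sample, so adapter names, drivers, UIDs and descriptions will differ in your environment:

~ # esxcli storage core adapter list
HBA Name  Driver      Link State  UID                                   Capabilities         Description
--------  ----------  ----------  ------------------------------------  -------------------  -----------
vmhba0    lsi_mr3     link-n/a    sas.51866da071375100                                       (0000:03:00.0) Local storage controller
vmhba2    qlnativefc  link-up     fc.20000024ff543d0c:21000024ff543d0c  Second Level Lun ID  (0000:05:00.0) Fibre Channel HBA

In this example vmhba2 would be VVol-capable, while vmhba0 would not.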

Trying out Docker client for Windows

The Docker client for Windows was released a few months ago, and I recently installed it on a Windows 10 CTP machine and tested it against a CentOS 7 box serving as a Docker host.

For the installation I employed the Chocolatey package manager:

C:\> choco install docker -y

which will install the latest version of the client (1.7.0 at the time of writing).

By default, the Docker daemon listens only on the unix:///var/run/docker.sock socket and therefore accepts only local connections, so in order to access it from the outside I needed to bind it to a TCP port as well. I first stopped the Docker daemon on my container host:

$ sudo systemctl stop docker

and then ran it like this so it binds to TCP port 2375 in addition to the default socket:

$ sudo docker -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock -d &

Note: This was only for the purpose of a quick-and-dirty test, and you can read here why allowing the daemon to accept remote calls in this way isn’t such a great idea security-wise.

After this, everything was ready for connecting from my Windows client, or so I thought. I tried querying the Docker host for some basic info but was greeted with the following error:

C:\> docker -H 10.0.0.1:2375 info
Error response from daemon: client and server don't have same version (client : 1.19, server: 1.18)

After a quick docker -v, I realized that the daemon was running Docker version 1.6.2, while the client was running version 1.7.0.

I had no choice but to fire up Chocolatey again and install version 1.6.0 of the client:

C:\> choco uninstall docker -y
C:\> choco install docker -version 1.6.0 -y

and then everything worked as expected:

C:\> docker -H 10.0.0.1:2375 info
Containers: 0
Images: 33
Storage Driver: devicemapper
...
Kernel Version: 3.10.0-123.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 1
Total Memory: 458.4 MiB
Name: dockerhost
ID: DBST:ZQ5O:LOAJ:XG7D:FEBL:6FYO:2JZ3:CQN2:QPK6:6ARN:XBEZ:THVQ

If you go back to the Docker host, you’ll notice that your client commands result in calls to the Docker API:

...
INFO[0680] GET /v1.18/info                              
...

where v1.18 is the API version. When a client runs a higher version than the daemon, a mismatch occurs in the API calls (the client tries to access /v1.19/info instead of /v1.18/info, and so on). This explains the problem I originally had, although at the time it seemed kinda odd to me that you cannot use a higher-version client to access a lower-version daemon.

In order to avoid having to specify your Docker host parameters with every command, you can set a Windows environment variable DOCKER_HOST and assign it a value of tcp://<FQDN or IP of Docker host>:<TCP port>, e.g. tcp://10.0.0.1:2375 in my case. Afterwards, you can run the commands on your Windows client machine as if you’re located directly on the Docker host:

C:\> docker run -it ubuntu
root@825574d22c14:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
root@825574d22c14:/# exit
C:\>
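
In case you’re wondering how to set the variable itself, both of the commands below are standard Windows ones - set affects only the current command prompt session, while setx persists the value for future sessions (the address is simply the one from my setup):

C:\> set DOCKER_HOST=tcp://10.0.0.1:2375
C:\> setx DOCKER_HOST tcp://10.0.0.1:2375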

Pretty crazy, huh :) Kudos to the Microsoft Azure Linux team for porting the Docker client to Windows, and be sure to read more about their efforts on Ahmet Alp Balkan’s blog.

NetApp cDOT and ESXi SCSI UNMAP support

I have recently noticed that LUNs created on NetApp cDOT storage systems by default don’t support space reclamation (aka SCSI UNMAP), which is a part of the vSphere VAAI Thin Provisioning primitive:

~ # esxcli storage core device vaai status get
naa.600a0980424733322f5d444f76513730
   VAAI Plugin Name: VMW_VAAIP_NETAPP
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported

and when you try to reclaim space, you are greeted with the following message:

~ # esxcli storage vmfs unmap -l cdot_01
Devices backing volume 549a787c-f5a4021c-e59d-68b599cbd47c do not support UNMAP

Based on a short review of a few systems, this seems to be the case only for the cDOT platform (tested on Data ONTAP 8.2 and 8.3), while LUNs created on 7-mode systems show Delete Status: supported.

The explanation for this behavior can be found in the official NetApp cDOT documentation, namely the SAN Administration Guide, which states that in order for space reclamation to be supported on a LUN:

  1. the LUN needs to be thinly provisioned on the storage system
  2. the space-allocation option needs to be enabled on the LUN

Information on how to enable space-allocation on a LUN is available in the same guide:

lun modify -vserver <vs> -volume <vol> -lun <lun> -space-allocation enabled

What’s a bugger is that the LUN needs to be offline in order to run this command, so the best thing to do is to enable this prior to presenting the LUN to the hosts.
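
Putting it all together, enabling space reclamation on an existing LUN would look something like the sketch below - vs1, vol1 and lun1 are placeholder names, and you should double-check the lun offline / lun online syntax against your ONTAP version before taking anything offline in production:

lun offline -vserver vs1 -volume vol1 -lun lun1
lun modify -vserver vs1 -volume vol1 -lun lun1 -space-allocation enabled
lun online -vserver vs1 -volume vol1 -lun lun1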

One more thing to keep in mind is that the VMware Compatibility Guide explicitly states that “VAAI Thin Provisioning Space Reclamation is not supported” for ESXi >= 5.5 and Data ONTAP < 8.3 (both clustered and 7-mode). So, even if your NetApp LUNs are configured properly for space reclamation, be sure that you are running a supported configuration before initiating a SCSI UNMAP operation from an ESXi host.
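
Assuming the LUN is configured properly and your ESXi / Data ONTAP combination is supported, a rescan on the ESXi side should make the device report Delete Status: supported, after which the reclamation can be run (the device ID and volume label below are the ones from my earlier examples):

~ # esxcli storage core adapter rescan --all
~ # esxcli storage core device vaai status get -d naa.600a0980424733322f5d444f76513730
~ # esxcli storage vmfs unmap -l cdot_01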

Centralized ESXi syslog with ELK feat. SexiLog

The ELK stack is a commonly used open source solution for log collection and analysis. It consists of three components:

  • Elasticsearch - distributed search engine, responsible for storing and searching through the received data;

  • Logstash - log collector, parser and forwarder;

  • Kibana - Elasticsearch web frontend, the end user interface for searching and visualizing log data.

There are a lot of tutorials on setting up the ELK stack, but if you’re looking into implementing ELK as a centralized (sys)log server for your vSphere environment, you should probably look no further than SexiLog.

To quote the official site - SexiLog is a specific ELK virtual appliance designed for vSphere environment. It’s pre-configured and heavily tuned for VMware ESXi logs. This means that the nice folks from Hypervisor.fr and VMdude.fr have gone through the trouble of installing ELK, optimizing it for vSphere log collection, adding a bunch of other useful tools and packaging everything as an easy to deploy VMware virtual appliance.

The appliance is a Debian-based VM with 2 vCPUs, 4 GB RAM and 58 GB of hard drive space (an 8 GB system disk + a 50 GB disk for storing indexed logs), and is sized for collecting up to 1500 events per second. Default credentials for the appliance are root / Sex!Log, and after each login you will be greeted with a menu that can be used for basic configuration and operations. This “SexiMenu” is a bash script which can also be run manually from /root/seximenu/seximenu.sh, so if you want to skip it and log in directly to a shell each time, you can simply comment out the last three lines of /root/.bashrc.

After deploying the appliance, you’ll need to configure your ESXi hosts to start sending syslog data, which can be done manually through the vSphere client or ESXCLI (see the example below), or in an automated manner, e.g. with PowerCLI. Besides syslog, SexiLog also offers the possibility of collecting SNMP traps (and has Logstash SNMP grok patterns optimized for ESXi and Veeam Backup and Replication), as well as vCenter vpxd logs and Windows event logs with the help of the NXLog agent. For more info, RTFM :)
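
As an illustration, pointing a host at the appliance with ESXCLI could look like this - sexilog.example.local is a placeholder for your SexiLog FQDN or IP, and 514/udp is the default syslog port:

~ # esxcli system syslog config set --loghost='udp://sexilog.example.local:514'
~ # esxcli system syslog reload
~ # esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true

The last command opens the outgoing syslog port in the ESXi firewall, which is easy to forget and a common reason for logs not showing up.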

SexiLog goodies

SexiLog offers much more than just a vanilla ELK installation. Besides providing Logstash filters optimized for vCenter, ESXi and Veeam B&R log and SNMP trap collection, it also comes with a bunch of pre-configured Kibana dashboards.

Then, there is a notification service powered by Riemann, which receives warnings and alerts from Logstash and either sends them every 10 minutes (critical alerts) or aggregates them and e-mails them once per hour (all other alerts). You can check out which events are considered critical by looking at Logstash configuration files located at /etc/logstash/conf.d/ and searching for rules which add the achtung tag. In order for Riemann to do its job, it is necessary to configure SMTP parameters for the appliance, which can be done through the SexiMenu (option 7).

Another important part of SexiLog is Curator, which is used for purging Elasticsearch indices so that they don’t fill up the /dev/sdb1 partition used for storing logs. Curator runs once per hour, as defined in /etc/crontab:

5 * * * * root curator delete --disk-space 40

and takes the maximum allowed total size of the indices in GB as its input parameter. By default this is set to 40 GB and should be increased if you decide to extend SexiLog’s hard disk no. 2.
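
For example, if you grew the second disk so that the log partition is around 100 GB, you might raise the threshold to something like 90 GB (the exact figure is up to you, just leave some headroom on the partition) by adjusting the crontab entry accordingly:

5 * * * * root curator delete --disk-space 90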

Also, for more info about the health and performance of the Elasticsearch service, SexiLog provides two useful plugins - Head and Bigdesk, which can be accessed from http://<sexilog_IP_or_FQDN>/_plugin/head and http://<sexilog_IP_or_FQDN>/_plugin/bigdesk.

Feature requests

Two possible improvements have come to mind while using SexiLog. First, it would be nice if there were a way to provide user authentication for accessing the Kibana interface (maybe kibana-authentication-proxy could help with this, since Shield seems to be a commercial product?). Also, an option to export search results would be useful (e.g. when dealing with technical support), but this seems to be a current limitation of Elasticsearch/Kibana.

Further steps

For more information about SexiLog check out the site, and be sure to try out the demo. You can also contribute to this great project via GitHub.

Exploring VCSA embedded PostgreSQL database

Since vSphere 5.0U1, the VMware vCenter Server Appliance (VCSA) uses vPostgres, a VMware-flavored PostgreSQL, as its embedded database. This post describes how to connect to the VCSA vPostgres server locally and remotely, and how to perform database backups using native PostgreSQL tools.

Note: The following procedures are probably unsupported by VMware and are given here just for the fun of hacking around the VCSA. The instructions have been tested on VCSA 5.5 and 6.0 (VCSA 6.0 requires additional steps, which can be found at the bottom of the post).

Connecting to PostgreSQL server locally

After logging in to the VCSA over SSH or console, you can easily connect to the PostgreSQL server locally using psql:

/opt/vmware/vpostgres/current/bin/psql -U postgres

After connecting you can use psql meta-commands or regular SQL statements, e.g.

# /opt/vmware/vpostgres/current/bin/psql -U postgres
psql.bin (9.3.5 (VMware Postgres 9.3.5.2-2444648 release))
Type "help" for help.

postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of 
-----------+------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication | {}
 vc        | Create DB                                      | {}

Here we see that there are two users defined in the PostgreSQL server: the postgres superuser and the vc user, which vCenter uses to connect to its database.
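
From here you can poke around further - for example, listing the available databases, connecting to the vCenter database (named VCDB, as we’ll see below) and listing its schemas is just a couple of psql meta-commands away:

postgres=# \l
postgres=# \c VCDB
You are now connected to database "VCDB" as user "postgres".
VCDB=# \dn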

Enabling remote PostgreSQL server access

By default, only local connections to the database server are allowed. In order to allow remote access (e.g. so that you can connect via GUI-based administrative tools such as pgAdmin), first take a look at the following files:

/etc/vmware-vpx/embedded_db.cfg

/etc/vmware-vpx/vcdb.properties

The embedded_db.cfg file stores general PostgreSQL server information (as well as the password for the postgres superuser), while vcdb.properties stores connection information for the vCenter Server database VCDB, along with the password for the vc user. Take note of these passwords, since you’ll be required to supply them for remote access.

Then, edit the /storage/db/vpostgres/pg_hba.conf configuration file in order to allow your IP to connect to the PostgreSQL server by adding the following line:

host    all             all             1.2.3.4/24          md5

replacing 1.2.3.4/24 with the actual IP address or range of addresses for which you want to allow access (e.g. 192.168.1.0/24).

Next, edit the /storage/db/vpostgres/postgresql.conf file in order to configure the PostgreSQL server to listen on all available IP addresses by adding the following line:

listen_addresses = '*'

Finally, restart the PostgreSQL server by running

/etc/init.d/vmware-vpostgres restart
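
With that in place, you should be able to reach the server remotely - for example, with the psql client on a Linux machine (10.0.0.10 is a placeholder for your VCSA address; supply the vc password noted from vcdb.properties when prompted):

$ psql -h 10.0.0.10 -p 5432 -U vc -d VCDB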

Backing up the vCenter database

For information on using native PostgreSQL tools to perform VCDB backups and restores check out VMware KB 2034505. The requirement for the vCenter service to be stopped during the database backup seems kinda redundant, since pg_dump should perform consistent backups even if the database is in use.

Sample backup scripts and instructions on how to schedule them via cron can be found on Florian Bidabe’s and vNinja blogs. Since mount.nfs is available on the VCSA, it seems that you can even use an NFS share as a destination for your VCDB backups (haven’t tested it though).
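
If you’d rather roll your own, a minimal local dump of VCDB could look something like the line below - this is just a sketch (the KB article and the scripts linked above are the better starting point), and /tmp/VCDB.bak is simply an example destination:

# /opt/vmware/vpostgres/current/bin/pg_dump -U postgres -Fc -f /tmp/VCDB.bak VCDB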

VCSA 6.0 additional steps

VCSA 6.0 comes extra hardened compared to previous vSphere editions and additional steps are needed in order to allow remote access to the OS and then to the PostgreSQL server.

First, you need to enable SSH access to VCSA. This can be done during the deployment, or later, over VM console (similar interface to ESXi DCUI: F2 - Troubleshooting Mode Options - Enable SSH) or vSphere Web Client (Home - System Configuration - Nodes - Manage - Settings - Access).

After logging in as root over SSH, you will be greeted with a limited shell called appliancesh. To switch to bash temporarily, run:

shell.set --enabled True
shell

For switching to bash permanently you can follow the instructions from this virtuallyGhetto post.

The final step is to allow external access to the PostgreSQL server through the VCSA iptables-based firewall. This can be done by editing the /etc/vmware/appliance/firewall/vmware-vpostgres file so that it looks like this:

{
  "firewall": {
    "enable": true,
    "rules": [
      {
        "direction": "inbound",
        "protocol": "tcp",
        "porttype": "dst",
        "port": "5432",
        "portoffset": 0
      }
    ]
  },
  "internal-ports": {
    "rules": [
      {
        "name": "server_port",
        "port": 5432
      }
    ]
  }
}

Afterwards, reload the VCSA firewall by running

/usr/lib/applmgmt/networking/bin/firewall-reload

and the PostgreSQL server should be accessible from the outside world after configuring pg_hba.conf and postgresql.conf as described above.