random writes - A systems engineer's blog

Trying out Docker client for Windows

The Docker client for Windows was released a few months ago, and recently I installed it on a Windows 10 CTP machine and tested it against a CentOS 7 box serving as a Docker host.

For the installation I employed the Chocolatey package manager:

C:\> choco install docker -y

which will install the latest version of the client (1.7.0 at the time of writing).

By default, the Docker daemon listens only on the unix:///var/run/docker.sock socket and therefore accepts only local connections, so in order to access it from the outside I needed to bind it to a TCP port as well. I first stopped the Docker daemon on my container host:

$ sudo systemctl stop docker

and then ran it like this so it binds to TCP port 2375 in addition to the default socket:

$ sudo docker -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock -d &

Note: This was only for the purpose of a quick-and-dirty test, and you can read here why allowing the daemon to accept remote calls in this way isn’t such a great idea security-wise.
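
If you wanted the TCP binding to survive reboots instead of running the daemon by hand, the docker package on CentOS 7 keeps its daemon flags in /etc/sysconfig/docker (assuming the packaged systemd unit sources that file, which was the case on my host), so a line along these lines should do the trick before starting the service again with systemctl:

OPTIONS='-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock'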

After this, everything was ready for connecting from my Windows client, or so I thought. I tried listing images on the docker host but was greeted with the following error:

C:\> docker -H 10.0.0.1:2375 info
Error response from daemon: client and server don't have same version (client : 1.19, server: 1.18)

After a quick docker -v, I realized that the daemon was running Docker version 1.6.2, while the client was running version 1.7.0.

I had no choice but to fire up Chocolatey again and install version 1.6.0 of the client:

C:\> choco uninstall docker -y
C:\> choco install docker -version 1.6.0 -y

and then everything worked as expected:

C:\> docker -H 10.0.0.1:2375 info
Containers: 0
Images: 33
Storage Driver: devicemapper
...
Kernel Version: 3.10.0-123.el7.x86_64
Operating System: CentOS Linux 7 (Core)
CPUs: 1
Total Memory: 458.4 MiB
Name: dockerhost
ID: DBST:ZQ5O:LOAJ:XG7D:FEBL:6FYO:2JZ3:CQN2:QPK6:6ARN:XBEZ:THVQ

If you go back to the Docker host, you’ll notice that your client commands result in calls to the Docker API:

...
INFO[0680] GET /v1.18/info                              
...

where v1.18 is the API version. When the client runs a higher version than the daemon, the API calls no longer match (the client tries to hit /v1.19/info instead of /v1.18/info, and so on). This explains the problem I originally had, although at the time it seemed kind of odd that you cannot use a higher-version client against a lower-version daemon.

In order to avoid having to specify the Docker host parameters with every command, you can set a Windows environment variable DOCKER_HOST and assign it a value of tcp://<FQDN or IP of Docker host>:<TCP port>, e.g. tcp://10.0.0.1:2375 in my case. Afterwards, you can run commands on your Windows client machine as if you were sitting directly on the Docker host:

C:\> docker run -it ubuntu
root@825574d22c14:/# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
root@825574d22c14:/# exit
C:\>
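
By the way, setting the variable is a one-liner, either for the current session or persistently via setx (the address below is just my lab's example):

C:\> set DOCKER_HOST=tcp://10.0.0.1:2375
C:\> setx DOCKER_HOST tcp://10.0.0.1:2375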

Driving Linux containers from a Windows command prompt - pretty crazy, huh :) Kudos to the Microsoft Azure Linux team for porting the Docker client to Windows, and be sure to read more about their efforts on Ahmet Alp Balkan's blog.

NetApp cDOT and ESXi SCSI UNMAP support

I have recently noticed that, by default, LUNs created on NetApp cDOT storage systems don't support space reclamation (aka SCSI UNMAP), which is part of the vSphere VAAI Thin Provisioning primitive:

~ # esxcli storage core device vaai status get
naa.600a0980424733322f5d444f76513730
   VAAI Plugin Name: VMW_VAAIP_NETAPP
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: unsupported

and when you try to reclaim space, you are greeted with the following message:

~ # esxcli storage vmfs unmap -l cdot_01
Devices backing volume 549a787c-f5a4021c-e59d-68b599cbd47c do not support UNMAP

Based on a short review of a few systems, this seems to be the case only for the cDOT platform (tested on Data ONTAP 8.2 and 8.3), while LUNs created on 7-Mode systems show Delete Status: supported.

An explanation for this behavior can be found in the official NetApp cDOT documentation, namely the SAN Administration Guide. The document states that for space reclamation to be supported on a LUN:

  1. the LUN needs to be thinly provisioned on the storage system
  2. the space-allocation option needs to be enabled on the LUN

Information on how to enable space-allocation on a LUN is available in the same guide:

lun modify -vserver <vs> -volume <vol> -lun <lun> -space-allocation enabled

What's a bugger is that the LUN needs to be offline in order to run this command, so the best thing to do is to enable this option before presenting the LUN to the hosts.
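
If the LUN is already in use, the rough sequence is the following (just a sketch - the names are placeholders, the exact parameter forms vary slightly between ONTAP versions, and taking the LUN offline disrupts host access, so plan a maintenance window):

lun offline -vserver <vs> -volume <vol> -lun <lun>
lun modify -vserver <vs> -volume <vol> -lun <lun> -space-allocation enabled
lun online -vserver <vs> -volume <vol> -lun <lun>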

One more thing to keep in mind is that the VMware Compatibility Guide explicitly states that "VAAI Thin Provisioning Space Reclamation is not supported" for ESXi 5.5 and later with Data ONTAP releases older than 8.3 (both clustered and 7-Mode). So, even if your NetApp LUNs are configured properly for space reclamation, make sure that you are running a supported configuration before initiating a SCSI UNMAP operation from an ESXi host.

Centralized ESXi syslog with ELK feat. SexiLog

The ELK stack is a commonly used open source solution for log collection and analysis. It consists of three components:

  • Elasticsearch - distributed search engine, responsible for storing and searching through the received data;

  • Logstash - log collector, parser and forwarder;

  • Kibana - Elasticsearch web frontend, the end user interface for searching and visualizing log data.

There are a lot of tutorials on setting up the ELK stack, but if you’re looking into implementing ELK as a centralized (sys)log server for your vSphere environment, you should probably look no further than SexiLog.

To quote the official site: "SexiLog is a specific ELK virtual appliance designed for vSphere environment. It's pre-configured and heavily tuned for VMware ESXi logs." This means that the nice folks from Hypervisor.fr and VMdude.fr have gone through the trouble of installing ELK, optimizing it for vSphere log collection, adding a bunch of other useful tools and packaging everything as an easy-to-deploy VMware virtual appliance.

The appliance is a Debian-based VM with 2 vCPUs, 4 GB RAM and 58 GB of disk space (an 8 GB system disk plus a 50 GB disk for storing indexed logs), and is sized for collecting up to 1500 events per second. Default credentials for the appliance are root / Sex!Log, and after each login you will be greeted with a menu that can be used for basic configuration and operations. This "SexiMenu" is just a bash script that can be run manually from /root/seximenu/seximenu.sh, so if you would rather land directly in a shell each time you log in, simply comment out the last three lines of /root/.bashrc.

After deploying the appliance, you'll need to configure your ESXi hosts to start sending syslog data, which can be done manually through the vSphere client or ESXCLI (a quick sketch follows), or in an automated manner, e.g. with PowerCLI. Besides syslog, SexiLog also offers the possibility of collecting SNMP traps (and has Logstash SNMP grok patterns optimized for ESXi and Veeam Backup & Replication), as well as vCenter vpxd logs and Windows event logs with the help of the NXLog agent. For more info, RTFM :)
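
The manual ESXCLI route boils down to something like this per host (assuming the appliance listens on the default syslog port, UDP 514):

~ # esxcli system syslog config set --loghost='udp://<sexilog_IP_or_FQDN>:514'
~ # esxcli system syslog reload
~ # esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true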

SexiLog goodies

SexiLog offers much more than just a vanilla ELK installation. Besides providing Logstash filters optimized for vCenter, ESXi and Veeam B&R log and SNMP trap collection, it also comes with a bunch of pre-configured Kibana dashboards.

Then, there is a notification service powered by Riemann, which receives warnings and alerts from Logstash and either sends them every 10 minutes (critical alerts) or aggregates them and e-mails them once per hour (all other alerts). You can check out which events are considered critical by looking at Logstash configuration files located at /etc/logstash/conf.d/ and searching for rules which add the achtung tag. In order for Riemann to do its job, it is necessary to configure SMTP parameters for the appliance, which can be done through the SexiMenu (option 7).
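
Speaking of the achtung rules, a quick grep lists the files that define them:

grep -rl achtung /etc/logstash/conf.d/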

Another important part of SexiLog is Curator, which is used for purging Elasticsearch indices so that they don't fill up the /dev/sdb1 partition used for storing logs. Curator runs once per hour, as defined in /etc/crontab:

5 * * * * root curator delete --disk-space 40

and takes the maximum allowed total size of the indices, in GB, as its input parameter. By default this is set to 40 GB and should be changed if you decide to extend SexiLog's second hard disk.
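
For example, after growing the second disk you would simply bump the value in /etc/crontab accordingly (the 90 below is only an illustration):

5 * * * * root curator delete --disk-space 90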

Also, for more info about the health and performance of the Elasticsearch service, SexiLog provides two useful plugins - Head and Bigdesk - which can be accessed at http://<sexilog_IP_or_FQDN>/_plugin/head and http://<sexilog_IP_or_FQDN>/_plugin/bigdesk.

Feature requests

Two possible improvements came to mind while using SexiLog. First, it would be nice if there were a way to provide user authentication for accessing the Kibana interface (maybe kibana-authentication-proxy could help with this, since Shield seems to be a commercial product?). Also, an option to export search results would be useful (e.g. when dealing with technical support), but this seems to be a current limitation of Elasticsearch/Kibana.

Further steps

For more information about SexiLog check out the site, and be sure to try out the demo. You can also contribute to this great project via GitHub.

Exploring VCSA embedded PostgreSQL database

Since vSphere 5.0 U1, the VMware vCenter Server Appliance (VCSA) uses vPostgres, a VMware-flavored PostgreSQL, as its embedded database. This post describes how to connect to the VCSA vPostgres server locally and remotely, and how to perform database backups using native PostgreSQL tools.

Note: The following procedures are probably unsupported by VMware and are given here just for the fun of hacking around the VCSA. The instructions have been tested on VCSA 5.5 and 6.0 (VCSA 6.0 requires additional steps, which can be found at the bottom of the post).

Connecting to PostgreSQL server locally

After logging in to the VCSA over SSH or the console, you can easily connect to the PostgreSQL server locally using psql:

/opt/vmware/vpostgres/current/bin/psql -U postgres

After connecting, you can use psql meta-commands or regular SQL, e.g.

# /opt/vmware/vpostgres/current/bin/psql -U postgres
psql.bin (9.3.5 (VMware Postgres 9.3.5.2-2444648 release))
Type "help" for help.

postgres=# \du
                             List of roles
 Role name |                   Attributes                   | Member of 
-----------+------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication | {}
 vc        | Create DB                                      | {}

Here we see that there are two users defined on the PostgreSQL server: the postgres superuser and the vc user, which vCenter uses to connect to its database.

Enabling remote PostgreSQL server access

By default, only local connections to the database server are allowed. In order to allow remote access (e.g. so that you can connect via GUI-based administrative tools such as pgAdmin), first take a look at the following files:

/etc/vmware-vpx/embedded_db.cfg

/etc/vmware-vpx/vcdb.properties

The embedded_db.cfg file stores general PostgreSQL server information (as well as the password for the postgres superuser), while vcdb.properties stores connection information for the vCenter database VCDB, along with the password for the vc user. Take note of these passwords, since you'll need to supply them for remote access.
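
A quick way to eyeball both files:

cat /etc/vmware-vpx/embedded_db.cfg /etc/vmware-vpx/vcdb.properties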

Then, edit the /storage/db/vpostgres/pg_hba.conf configuration file to allow your IP to connect to the PostgreSQL server by adding the following line:

host    all             all             1.2.3.4/24          md5

replacing 1.2.3.4/24 with the actual IP address or range of addresses for which you want to allow access (e.g. 192.168.1.0/24).

Next, edit /storage/db/vpostgres/postgresql.conf to configure the PostgreSQL server to listen on all available IP addresses by adding the following line:

listen_addresses = '*'

Finally, restart the PostgreSQL server by running

/etc/init.d/vmware-vpostgres restart
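
Once the service is back up, remote access can be verified from any machine with the PostgreSQL client installed, using the vc password noted earlier (the VCSA address below is just an example):

psql -h vcsa.example.com -p 5432 -U vc -d VCDB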

Backing up the vCenter database

For information on using native PostgreSQL tools to perform VCDB backups and restores, check out VMware KB 2034505. The requirement for the vCenter service to be stopped during the database backup seems kinda redundant, since pg_dump should produce consistent backups even while the database is in use.
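
Just for illustration, a manual dump of VCDB with pg_dump would look something like this (a sketch only - the output path is arbitrary and the KB's own scripts remain the documented approach):

/opt/vmware/vpostgres/current/bin/pg_dump -U vc VCDB > /tmp/VCDB_dump.sql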

Sample backup scripts and instructions on how to schedule them via cron can be found on Florian Bidabe's and vNinja blogs. Since mount.nfs is available on the VCSA, it seems you could even use an NFS share as a destination for your VCDB backups (I haven't tested it, though).

VCSA 6.0 additional steps

VCSA 6.0 comes extra hardened compared to previous vSphere editions, and additional steps are needed to allow remote access to the OS and then to the PostgreSQL server.

First, you need to enable SSH access to the VCSA. This can be done during deployment, or later through the VM console (an interface similar to the ESXi DCUI: F2 - Troubleshooting Mode Options - Enable SSH) or the vSphere Web Client (Home - System Configuration - Nodes - Manage - Settings - Access).

After logging in as root over SSH, you will be greeted with a limited shell called appliancesh. To switch to bash temporarily, run:

shell.set --enabled True
shell

For switching to bash permanently you can follow the instructions from this virtuallyGhetto post.
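
From what I remember, that essentially boils down to changing root's default shell, something along the lines of:

chsh -s /bin/bash root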

The final step is to allow external access to PostgreSQL through the VCSA's iptables-based firewall. This can be done by editing the /etc/vmware/appliance/firewall/vmware-vpostgres file so that it looks like this:

{
  "firewall": {
    "enable": true,
    "rules": [
      {
        "direction": "inbound",
        "protocol": "tcp",
        "porttype": "dst",
        "port": "5432",
        "portoffset": 0
      }
    ]
  },
  "internal-ports": {
    "rules": [
      {
        "name": "server_port",
        "port": 5432
      }
    ]
  }
}

Afterwards, reload the VCSA firewall by running

/usr/lib/applmgmt/networking/bin/firewall-reload

and the PostgreSQL server should be accessible from the outside world, provided pg_hba.conf and postgresql.conf have been configured as described above.
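
As a quick sanity check that the rule was picked up, you can look for port 5432 in the active iptables rules (exact output will vary):

iptables -nL | grep 5432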

Trying out CoreOS on vSphere

CoreOS is a minimal Linux distribution created to serve as a host for running Docker containers. Here’s some info on how to try it out on vSphere.

Since stable release 557, CoreOS comes packaged as an OVA and is supported as a guest OS on vSphere 5.5 and above. Deployment instructions can be found in VMware KB 2109161 and are pretty much straightforward, with the exception of having to modify the boot parameters in order to change the password for the core user before logging in for the first time.

vCenter customization of CoreOS is currently not supported, and customization can be done only through coreos-cloudinit. This is the CoreOS implementation of cloud-init, a mechanism for boot-time customization of Linux instances, which can be used on vSphere with some manual effort. More info on coreos-cloudinit and its configuration files, called cloud-configs, can be found on the CoreOS Cloud-Config documentation page, while for a tutorial on how to make cloud-config work on vSphere take a look at CoreOS with Cloud-Config on VMware ESXi.

CoreOS comes with integrated open-vm-tools, an open source implementation of VMware Tools, which is quite handy since CoreOS doesn't offer a package manager or Perl, so there is no way to manually install VMware Tools after deployment. According to the VMware Compatibility Guide, the open-vm-tools package has recently become the recommended way of running VMware Tools on newer Linux distros and vSphere editions (e.g. RHEL/CentOS 7.x and Ubuntu 14.04 and later on vSphere 5.5 and above). For more info on open-vm-tools head over to VMware KB 2073803.

As for CoreOS stable releases prior to 557, releases 522 and 494 are supported on vSphere as a Technical Preview. Since they don’t come prepackaged in a format that can be used directly on vSphere, their deployment involves one additional step - converting the downloaded .vmx file to OVF via OVF Tool. Check out VMware KB 2104303 for detailed installation instructions.
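
The conversion step itself is a single OVF Tool invocation, roughly like this (the filename is just an example - use the one from the image you actually downloaded):

ovftool coreos_production_vmware.vmx coreos_production_vmware.ovf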

CoreOS quickstart

After logging in for the first time, you'll probably want to start building and running containers. Obviously, Docker comes preinstalled, and CoreOS currently uses Btrfs as the filesystem for storing images and containers:

$ docker -v
Docker version 1.4.1, build 5bc2ff8-dirty
$ docker info
Containers: 0
Images: 23
Storage Driver: btrfs
 Build Version: Btrfs v3.17.1
 Library Version: 101
Execution Driver: native-0.2
Kernel Version: 3.18.1
Operating System: CoreOS 557.2.0
CPUs: 2
Total Memory: 5.833 GiB
Name: core557
ID: 2BUV:642W:WTZQ:3L4O:FFIY:JOC5:XKO2:3QPC:ADEJ:LSCS:QS5K:XHKB

As mentioned before, CoreOS doesn't offer a way to install additional packages directly in the OS, but it provides Toolbox, by default a stock Fedora Docker container that can be used for installing sysadmin tools. Toolbox is started with /usr/bin/toolbox; the first run pulls the image and spawns the container:

$ toolbox 
fedora:latest: The image you are pulling has been verified
00a0c78eeb6d: Pull complete 
834629358fe2: Pull complete 
834629358fe2: Pulling fs layer 
Status: Downloaded newer image for fedora:latest
core-fedora-latest
Spawning container core-fedora-latest on /var/lib/toolbox/core-fedora-latest.
Press ^] three times within 1s to kill container.
-bash-4.3#

After that, packages are a yum install <package> away, while the CoreOS filesystem is mounted inside the container at /media/root.
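
For example, inside the Toolbox shell (the package choice is just an illustration):

-bash-4.3# yum install -y tcpdump lsof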

Further reading

Compared to traditional Linux distributions, CoreOS has a fundamentally different approach to performing updates, which involves automatic upgrades of the complete OS as soon as a new release is available. Therefore, take a look at the CoreOS Update Philosophy in order not to be surprised when your container hosts start automatically upgrading and rebooting themselves.

For running containers at scale, check out CoreOS documentation pages on etcd and fleet.