Training neural networks (deep learning) is very compute-intensive. Fast GPUs can make training runs, which sometimes
take hours, days or weeks, go orders of magnitude faster. However, laptops usually don't come with the fastest GPUs, and having
to maintain a desktop machine only to occasionally run deep learning tasks is extra hassle.
Cloud providers now offer virtual machines (VMs) with GPUs that run in data centers and can be rented by anybody on an hourly basis.
Below is a quick tutorial that walks through setting up a VM in Microsoft Azure with the necessary drivers
to train neural networks using TensorFlow.
First, if you haven't done so already, create an Azure account, install the Azure command-line interface (CLI) 2.0...
sudo pip install azure-cli
... and follow the login procedure:
az login
Azure manages resources (virtual machines, storage etc.) via resource groups.
GPU virtual machine instances are currently available in the East US region. If you already have a group for that region, feel free
to use it; otherwise, create a new resource group:
az group create --name tensorflow --location EastUS
We will connect to the machine via SSH and need to create a key pair:
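A minimal sketch using ssh-keygen, assuming a 2048-bit RSA key with an empty passphrase stored at ~/.ssh/tensorflow_id_rsa (the path is chosen to match the --ssh-key-value argument passed to az vm create; the key size, comment and empty passphrase are assumptions, not requirements):

```shell
# Assumed key location; the matching public key ends up at "$KEY_PATH.pub".
KEY_PATH="$HOME/.ssh/tensorflow_id_rsa"

# Make sure the .ssh directory exists with restrictive permissions.
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"

# Generate the key pair only if it does not exist yet, so an existing key is never overwritten.
# -N '' sets an empty passphrase; omit it to be prompted for one instead.
if [ ! -f "$KEY_PATH" ]; then
    ssh-keygen -t rsa -b 2048 -f "$KEY_PATH" -N '' -C 'tensorflow'
fi
```

A passphrase-protected key is the safer choice for anything beyond a short-lived experiment.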
Next, we create the actual virtual machine running Ubuntu 16.04.
We choose the cheapest and least powerful GPU size (NC6) and use standard (HDD) rather than premium (SSD) storage, as premium storage is not yet supported for NC instances.
az vm create \
    --resource-group tensorflow \
    --name tensorflow \
    --image Canonical:UbuntuServer:16.04-LTS:latest \
    --size Standard_NC6 \
    --storage-type Standard_LRS \
    --admin-username tensorflow \
    --ssh-key-value ~/.ssh/tensorflow_id_rsa.pub
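Once completed, the command prints a summary of the newly created machine, including its public IP address. Use that address to log in over SSH as the tensorflow user with the private key from the key pair (ssh -i ~/.ssh/tensorflow_id_rsa tensorflow@<IP address>).
Once the NVIDIA drivers and TensorFlow are installed on the VM, we can start a Python console and create a TensorFlow session: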
python
>>> import tensorflow as tf
>>> session = tf.Session()
If everything went well, it will recognize the Tesla K80 GPU:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID b0b5:00:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
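Remember to deallocate the VM when done so that it stops incurring compute charges (az vm stop only powers the machine off, and a stopped but still allocated VM continues to be billed):
az vm deallocate --resource-group tensorflow --name tensorflow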
Once no longer needed, you can delete the virtual machine by running:
az vm delete --resource-group tensorflow --name tensorflow
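Deleting the VM on its own typically leaves its disk, network interface and public IP address behind in the resource group. If the group was created only for this tutorial, a sketch like the following removes everything in it at once. It assumes the group name tensorflow used above and relies on az group exists and az group delete; the guard around the az calls is only there so the snippet does nothing on a machine without the CLI.

```shell
# Resource group used throughout this tutorial (adjust if you reused an existing group).
RESOURCE_GROUP="tensorflow"

if ! command -v az >/dev/null 2>&1; then
    echo "Azure CLI (az) not found; nothing deleted." >&2
elif [ "$(az group exists --name "$RESOURCE_GROUP")" = "true" ]; then
    # The CLI asks for confirmation before deleting; everything in the group is removed.
    az group delete --name "$RESOURCE_GROUP"
else
    echo "Resource group $RESOURCE_GROUP does not exist." >&2
fi
```

Do not run this against a group that also holds resources you want to keep.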