UCSD Datahub
Overview
Datahub is a resource provided by UCSD that lets us take advantage of UCSD's powerful GPUs to speed up our machine learning and computer vision tasks. We can connect to these machines remotely and access their power no matter where we are.
Setup
Fill out this form to request access to use Datahub for Triton UAS. If you have any questions about the form fields, contact a software lead.
Usage
Connecting and accessing
There are 3 ways of connecting to and accessing Datahub:
- Web Interface
    - Navigate to https://datahub.ucsd.edu and log in with your UCSD Active Directory credentials
    - Select an environment with a GPU
    - Click "New" in the top right to launch a terminal to run your code
    - Create/upload/edit files in the graphical file manager and text editor
- VS Code SSH Extension
    - Install VS Code
    - Install the Remote - SSH VS Code extension
    - Generate an SSH key locally if you haven't already:
        - Use the command `ssh-keygen -t rsa` to generate one
    - Copy the contents of your local public key
        - If you used the `ssh-keygen -t rsa` command then your key will be located at `~/.ssh/id_rsa.pub` on your local machine
        - If you used a different key generation algorithm then your public key will be named something else. Just make sure the file ends with the `.pub` extension
    - Open a terminal and SSH into Datahub with the command `ssh UCSD-USERNAME@dsmlp-login.ucsd.edu`
    - Paste your public key into the file at `~/.ssh/authorized_keys` on Datahub
    - Open the `~/.ssh/config` file on your local machine
    - Add a Host entry to the `~/.ssh/config` file that will tell VS Code to spawn a new container on Datahub and run the VS Code Remote Server inside it. (Note that this config does not request a GPU)

      ```
      Host datahub
          User UCSD-USERNAME
          ProxyCommand ssh -i ~/.ssh/id_rsa UCSD-USERNAME@dsmlp-login.ucsd.edu /opt/launch-sh/bin/launch-scipy-ml.sh -P Always -i tritonuas/cv-docker:master -H -N datahub-vscode
      ```

        - Note that the path to your local SSH private key may vary depending on which key generation algorithm you used
    - Add a similar Host entry that will request a GPU. If you're not training a model with a GPU, fall back to the previous host entry.

      ```
      Host datahub-gpu
          User UCSD-USERNAME
          ProxyCommand ssh -i ~/.ssh/id_rsa UCSD-USERNAME@dsmlp-login.ucsd.edu /opt/launch-sh/bin/launch-scipy-ml.sh -P Always -g 1 -i tritonuas/cv-docker:master -H -N datahub-vscode
      ```
    - In the menubar click View -> Command Palette
    - Select "Connect to Host"
    - Select one of the host entries you created
    - Open your project folder and use the integrated VS Code terminal to run commands
        - You should be inside our container and be able to use `pipenv`, for example
- SSH (Terminal)
    - Enter the command: `ssh UCSD-USERNAME@dsmlp-login.ucsd.edu`
    - Use a prebuilt docker container or a custom one you've created (more info here)
Long-running tasks/training (background containers)
Accessing background containers via SSH ProxyCommand currently doesn't work, so you won't be able to use VS Code to work in a background container. To get around this, after finishing development through VS Code, exit and shut down the container, then SSH in directly from the terminal to start and access a background container. You'll run the training commands directly from that terminal session.
Before you train your model properly on the full dataset with a higher epoch count, make sure the code as a whole works on a smaller dataset and fewer epochs, so an exception doesn't crash the run after several hours of training and your metrics and results are generated properly at the end.
Simply add the `-b` flag to the command you used to launch the container. This allows your container to run in the background for up to 6 hours (the default is much lower).
To enter the shell of a running container:
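Datahub containers run as Kubernetes pods, so attaching a shell likely goes through DSMLP's `kubesh` helper on the login node; treat the helper name as an assumption and check the DSMLP docs, and note that `POD_NAME` is a placeholder:

```shell
# From dsmlp-login: attach an interactive shell to your running pod.
# POD_NAME comes from the output of `kubectl get pods`.
kubesh POD_NAME
```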
Note that once you run a training command in the terminal, the shell will lock up. To avoid this (and to avoid accidentally killing the training from the terminal), you can use `nohup` to send the training task to the background and free up the shell. The way to use `nohup` is:
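A minimal sketch of the pattern (`train.py` is an illustrative script name, not necessarily one from our repo):

```shell
# nohup detaches the command from the terminal's hangup signal;
# the trailing "&" backgrounds it, and output goes to nohup.out.
nohup python train.py &
```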
The "&" at the end is crucial, as it actually frees the shell; all output from the command is piped into a file called `nohup.out`. Make sure to gitignore this file!
For example, if you were running it with a pipenv command:
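A sketch of the same pattern under pipenv (the script name is illustrative):

```shell
# pipenv run executes the command inside the project's virtualenv;
# nohup + "&" keep it running after the shell closes.
nohup pipenv run python train.py &
```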
Now the shell is safe to close, and the model will train in the background!
Datahub Containers
While accessing Datahub from the terminal, you will probably need to use a Docker container to run the software you're interested in. By default, Datahub doesn't have things like Python 3, PyTorch, CUDA, etc. enabled. We can use Docker images to access this software and make our own.
Premade Container
There are two ways we recommend to pull and launch premade containers:
1. Use our premade container
Run this command (recommended to place this command in a script):
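Based on the launch script and image referenced in the SSH config entries above, the command is likely along these lines (verify the flags against the DSMLP docs):

```shell
# Launch our cv-docker image on Datahub with one GPU.
#   -i  image to run
#   -g  number of GPUs to request
#   -P Always  always pull the latest image
/opt/launch-sh/bin/launch-scipy-ml.sh -i tritonuas/cv-docker:master -g 1 -P Always
```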
If you're feeling spicy, you can actually specify which GPU model on Datahub you want to use (make sure it's actually available on the status page). For example, if you want to request a 2080 Ti:
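The DSMLP launch scripts accept a GPU-type selector; the flag and value below (`-v 2080ti`) are an assumption, so check the DSMLP documentation for the exact syntax:

```shell
# Request one GPU, pinned to a specific model (flag/value assumed).
/opt/launch-sh/bin/launch-scipy-ml.sh -i tritonuas/cv-docker:master -g 1 -v 2080ti -P Always
```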
2. Select one of the prebuilt UCSD ETS containers
Select a container here and launch the container on datahub using the instructions here
Background Containers
If you would like your processes to not be killed when your host machine loses connection to the remote Datahub server, you can launch Docker containers that stay alive in the background.
Simply add the `-b` flag to the command you used to launch the container. This allows your container to run in the background for up to 6 hours.
If you are using our premade container the command will look something like this:
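A sketch, reusing the launch command from above with `-b` added (verify flags against the DSMLP docs):

```shell
# Launch our premade container in the background (-b) with one GPU.
/opt/launch-sh/bin/launch-scipy-ml.sh -i tritonuas/cv-docker:master -g 1 -P Always -b
```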
Here are a few helpful commands for interfacing with background containers:
Get a list of all running containers:
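Since Datahub containers are Kubernetes pods, listing them is likely done with `kubectl` from the login node:

```shell
# List your running pods; the NAME column is the POD_NAME used below.
kubectl get pods
```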
Enter the shell of a running container (`POD_NAME` is from the output of the previous command):
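Assuming DSMLP's `kubesh` helper is available on the login node:

```shell
# Attach an interactive shell to the named pod.
kubesh POD_NAME
```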
Stop and delete a running container:
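Likely via `kubectl` (note this terminates anything still running inside the pod):

```shell
# Stop and delete the pod; any processes inside are killed.
kubectl delete pod POD_NAME
```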