Computer Vision Reference

Resources

Fundamental Math

Computer Vision Fundamentals

  • CS131 : Computer Vision - Foundations and Applications

Deep Neural Networks

  • The Deep Learning Book
  • Gradient Descent cheatsheet
  • CS230 : Deep Learning
  • CS231n : Convolutional Neural Networks for Visual Recognition

Coursera

PyTorch

We mainly use PyTorch because it has a gentle learning curve and handles much of the infrastructure needed to build neural networks. It is also taught in ML courses at UCSD, so people can apply skills learned in class to a project.

  • PyTorch has some very good tutorials to learn from
  • 60 minute intro to ML: A good starting point. It gives an overview of how PyTorch tensors work, how autograd does all the heavy lifting, and the training process for a classifier.
  • Transfer Learning: Networks taking too long to train? Not enough data? Use someone else's network!

RAM

Computer Vision can be very RAM heavy, so just download more RAM

Coding cheat sheet

Below are common functions, used throughout computer vision work, for reading, writing, and converting images.

Here are the imports for the shorthand we use:

import cv2  # OpenCV
import numpy as np  # numpy
from PIL import Image  # PIL
import torch  # PyTorch
import torchvision.transforms as transforms  # PyTorch transforms

Primary image objects

numpy object (OpenCV)

  • IMPORTANT NOTE: OpenCV reads images as BGR!
  • Channel values are unsigned 8-bit ints (UINT8)
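As a small illustration of the BGR note above, the sketch below builds a synthetic array standing in for what cv2.imread returns (the array and its size are made up for the example) and reverses the channel axis with plain numpy slicing, which is one way to get RGB order.

```python
import numpy as np

# Hypothetical 2x2 image in OpenCV's layout: height x width x channels, uint8, BGR order.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR, so this is a pure blue image

# Reversing the last axis swaps BGR <-> RGB without calling OpenCV.
rgb = bgr[..., ::-1]

print(bgr.dtype, bgr.shape)  # dtype and (height, width, channels)
print(rgb[0, 0])             # blue now sits in the last channel
```

The sliced result is a view with a negative stride, so call .copy() on it if a later function needs a contiguous array.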

Image object (PIL)

  • Channel order is RGB
  • Channel values are unsigned 8-bit ints in [0, 255] for the common modes (RGB, L)

Tensor object (PyTorch)

  • Layout is channels-first (C x H x W)
  • Channel values are floats in [0,1] when the tensor comes from transforms.ToTensor()

Reading/opening images

Open as an OpenCV (numpy) image (NOTE: this opens the image as BGR)

cv2.imread(path)

Open as a PIL image

Image.open(path)

Writing images to disk

Write OpenCV image to disk

assert isinstance(img, np.ndarray)  # img is an OpenCV (BGR) image
cv2.imwrite(path, img)

Write Tensor to disk as image

assert isinstance(img, torch.Tensor)  # img is a C x H x W tensor
transforms.ToPILImage()(img).save(path)

Converting between image objects

Convert from PIL to OpenCV

assert isinstance(img, Image.Image)  # img is a PIL image in RGB mode
cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)

Convert from OpenCV to PIL

assert isinstance(img, np.ndarray)  # img is an OpenCV (BGR) image
Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

Convert from Tensor to PIL

assert isinstance(img, torch.Tensor)  # img is a C x H x W tensor
transforms.ToPILImage()(img)

Convert from PIL to Tensor

assert isinstance(img, Image.Image)  # img is a PIL image
transforms.ToTensor()(img)

Converting between colorspaces

OpenCV

OpenCV has several constants that specify which colorspace to convert from and to.

cv2.COLOR_RGB2BGR  # RGB -> BGR
cv2.COLOR_BGR2GRAY  # BGR -> grayscale
# etc... you can swap the order of the colorspaces too

Many other colorspaces may be helpful (CIE, HSV, etc.); the full list of conversions is in the OpenCV docs.

To actually convert the image

# replace cv2.COLOR_RGB2BGR with whatever you need
cv2.cvtColor(img, cv2.COLOR_RGB2BGR)