Classification
Background
Classification in computer vision is the task of assigning an image (or a region of an image) to one of a set of predetermined classes.
Overview
We have three main classification tasks: shape, character, and color. Classification as a task will eventually be consolidated into the Taxonomy-101 repository, but it is currently split up.
Current Implementation
Both shape and character classification are implemented in transfer using the PyTorch framework.
The classification network is a one-stop shop for all our classification needs: instead of using separate networks for shape, character, and rotation, we combine all three into a single network with three output layers.
There are two benefits to this:
- Reduced training time: separate networks would each need to learn the same basic feature-detecting layers from scratch, but combining them lets all three tasks share that training.
- Consistency between character and rotation: there are situations where we need these two outputs to agree, even if both are wrong. The competition gives us some leeway on character orientation, so any easily mistaken combination is accepted. For example, if the real target is a 6 facing north, we could submit a 9 facing south and still get full points. Two separately trained networks could easily predict a mismatched pair, since neither knows what the other is predicting, which would cost us points. Training them together means the character and orientation halves of the network produce predictions that are consistent with each other.
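The 6-vs-9 leeway above can be made concrete. The snippet below is a hypothetical sketch (the alias table and function name are illustrative, not from our codebase) of checking whether a predicted character/orientation pair would be accepted given the ground truth:

```python
# Hypothetical scoring check: a prediction is accepted if it matches the
# ground truth exactly, or if it is a known look-alike character rotated
# by 180 degrees (e.g. a 6 facing north looks like a 9 facing south).
ALIASES = {("6", "9"), ("9", "6"), ("M", "W"), ("W", "M")}  # illustrative pairs

def is_accepted(pred_char, pred_deg, true_char, true_deg):
    """Return True if (pred_char, pred_deg) would earn full points."""
    if pred_char == true_char and pred_deg == true_deg:
        return True
    rotated = (pred_deg + 180) % 360
    return (pred_char, true_char) in ALIASES and rotated == true_deg

# A 9 facing south (180 deg) is accepted for a 6 facing north (0 deg)...
assert is_accepted("9", 180, "6", 0)
# ...but a 9 facing north for a 6 facing north is a mismatch.
assert not is_accepted("9", 0, "6", 0)
```

This is exactly the failure mode joint training avoids: a consistent (character, rotation) pair scores, an inconsistent one does not.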
The network itself is a fairly standard convolutional network. To see the full architecture, take a look here. The layer sizes were chosen based on a few test training runs to see which sizes gave high accuracy, but they may need to be tweaked.
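As a rough illustration of the shared-trunk, three-head idea, here is a minimal PyTorch sketch. The layer sizes and class counts are placeholders, not the actual architecture:

```python
import torch
import torch.nn as nn

class MultiHeadClassifier(nn.Module):
    """Shared convolutional trunk with separate shape/character/rotation heads.

    Layer sizes and class counts below are placeholders; see the real
    architecture for the values actually used.
    """

    def __init__(self, n_shapes=13, n_chars=36, n_rotations=8):
        super().__init__()
        # Shared feature extractor, trained jointly by all three tasks.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One linear head per task, all reading the same shared features.
        self.shape_head = nn.Linear(64, n_shapes)
        self.char_head = nn.Linear(64, n_chars)
        self.rot_head = nn.Linear(64, n_rotations)

    def forward(self, x):
        features = self.trunk(x)
        return self.shape_head(features), self.char_head(features), self.rot_head(features)
```

Training would sum a cross-entropy loss per head, so gradients from all three tasks flow through and shape the shared trunk.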
Color
Color classification is currently implemented with a simple k-nearest-neighbors (KNN) classifier.
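A minimal sketch of what KNN color classification looks like: a query RGB value is labeled by majority vote among its k nearest labeled reference colors. The reference table and k value here are illustrative, not the ones used in practice:

```python
# Toy KNN color classifier. REFERENCE and k=3 are illustrative values,
# not the actual labeled data or hyperparameters used by the system.
from collections import Counter

REFERENCE = [
    ((255, 0, 0), "red"), ((200, 30, 30), "red"),
    ((0, 0, 255), "blue"), ((30, 30, 200), "blue"),
    ((255, 255, 255), "white"), ((220, 220, 220), "white"),
]

def classify_color(rgb, k=3):
    """Return the majority label among the k nearest reference colors."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(REFERENCE, key=lambda rc: dist(rc[0], rgb))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

assert classify_color((250, 10, 10)) == "red"
```

KNN is a reasonable fit here because the class count is small and the decision only depends on distance in color space, so no training step is needed.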