CNN
- CNN = Convolutional Neural Network
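- Reduces the number of inputs that reach the dense layers, and the number of weights, because each filter's weights are shared across all image positions
- Tolerates small shifts in the image (because pooling is used, a small shift often produces the same pooled output)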
- Takes advantage of local context, since each filter gathers information from a small neighborhood of pixels
- The matrix produced by a convolution operation is called an activation map (also known as a feature map)
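- Typically, the ReLU activation function is used
- Padding is usually used, so that the filter can cover border pixels and the output does not shrink at every layer
- Stride is the step size (in pixels) by which the filter moves as it scans the image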
- Typical case:
- Filter Size of 2 or 3
- Stride size of 2
- Max pooling
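How filter size, stride, and padding together determine the size of the output can be sketched with the standard output-size formula. This is a minimal illustration, not tied to any particular library; the function name `conv_output_size` and the example values are assumptions chosen for this note.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output length along one dimension of a convolution or pooling layer."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# 3x3 filter, stride 1, padding 1: the size is preserved
print(conv_output_size(224, filter_size=3, stride=1, padding=1))  # 224

# Same filter without padding: the output shrinks by 2
print(conv_output_size(224, filter_size=3, stride=1, padding=0))  # 222

# 2x2 window with stride 2 (e.g. max pooling): the size is halved
print(conv_output_size(224, filter_size=2, stride=2, padding=0))  # 112
```

The same formula applies independently to height and width.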
Steps:
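- The filter scans the image from left to right, top to bottom
- At each position, take the dot product of the filter weights and the pixel values under the filter (element-wise multiplication, then sum)
- Use pooling to summarize each small region (for example, by keeping its maximum), which shrinks the map while keeping the strongest responses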
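The steps above can be sketched in plain Python. This is a minimal, unoptimized illustration with no padding; the helper names (`conv2d`, `relu`, `max_pool`), the toy 5 x 5 image, and the 2 x 2 vertical-edge filter are all assumptions made up for this example. Real frameworks learn the filter weights rather than fixing them by hand.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image (no padding) and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):              # top to bottom
        row = []
        for j in range(out_w):          # left to right
            total = 0
            for a in range(kh):
                for b in range(kw):
                    # element-wise multiplication, accumulated into a sum
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


def relu(fmap):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2, stride=2):
    """Keep the maximum of each size x size window."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [
        [
            max(fmap[i * stride + a][j * stride + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy image: a bright band on the left, dark on the right (hypothetical data)
image = [[3, 3, 0, 0, 0] for _ in range(5)]
# Same image with the edge shifted one pixel to the left
shifted = [[3, 0, 0, 0, 0] for _ in range(5)]
# Hand-picked filter that responds to a bright-to-dark vertical edge
kernel = [[1, -1],
          [1, -1]]

print(max_pool(relu(conv2d(image, kernel))))    # [[6, 0], [6, 0]]
print(max_pool(relu(conv2d(shifted, kernel))))  # [[6, 0], [6, 0]]
```

Both images give the same pooled output because the edge response stays inside the same 2 x 2 pooling window. The tolerance is limited: a shift that moves the response across a window boundary does change the pooled result.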
Common Structure of Vision Models
(Filter -> Pooling) x N -> (Dense Network) x M -> Output Layer
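To see how this structure shrinks the data before it reaches the dense layers, the shapes can be traced block by block. This is a rough sketch under stated assumptions: 3 x 3 filters with stride 1 and padding 1, 2 x 2 non-overlapping pooling, and a made-up list of filter counts per block; `trace_shapes` is a hypothetical helper, not a library function.

```python
def trace_shapes(height, width, channels_per_block, kernel=3, padding=1, pool=2):
    """Return the (height, width, channels) shape after each Filter -> Pooling block."""
    shapes = []
    for channels in channels_per_block:
        # Convolution with stride 1: each side changes by 2 * padding - kernel + 1
        height = height + 2 * padding - kernel + 1
        width = width + 2 * padding - kernel + 1
        # Non-overlapping pooling divides each spatial side by the pool size
        height //= pool
        width //= pool
        shapes.append((height, width, channels))
    return shapes


# N = 3 blocks with 8, 16, and 32 filters (assumed values)
shapes = trace_shapes(224, 224, channels_per_block=[8, 16, 32])
print(shapes)  # [(112, 112, 8), (56, 56, 16), (28, 28, 32)]

# Number of values handed to the first dense layer after flattening
h, w, c = shapes[-1]
print(h * w * c)  # 25088
```

Each block halves the height and width, so adding blocks (a larger N) quickly reduces the number of values the dense layers must process.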
Why CNN over a Fully Connected Neural Network?
Theoretically, a fully connected NN can achieve the same or better results than a CNN. The issue is that even one low-resolution image (224 x 224) has about 50k features (224 x 224 = 50,176 pixels for a single channel, three times that for RGB), and with dense connections the computation becomes so expensive that it is not feasible.
So we use CNN filters and pooling to reduce the dimensions first, and then use the dense layers of an NN on the reduced features to build the model.
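The cost difference can be made concrete with a quick parameter count. The layer sizes below (1,000 hidden units, 32 filters of size 3 x 3, a single input channel) are assumptions picked only to illustrate the scale, and the helper names are hypothetical.

```python
def dense_layer_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_layer_params(filter_size, in_channels, n_filters):
    """Weights plus biases of a convolutional layer (weights shared across positions)."""
    return n_filters * (filter_size * filter_size * in_channels) + n_filters


pixels = 224 * 224  # 50,176 inputs for one grayscale channel

# Dense layer connected directly to every pixel
print(dense_layer_params(pixels, 1000))   # 50177000

# Convolutional layer applied to the same image
print(conv_layer_params(3, 1, 32))        # 320
```

The convolutional layer's parameter count does not depend on the image size at all, which is why it stays small even for large inputs.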