CNN

Steps:

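  1. The filter (kernel) slides across the image left to right, top to bottom
  2. At each position, take the dot product of the filter weights and the image pixel values under the filter (element-wise multiplication, then sum); each result is one value of the feature map
  3. Use pooling to downsample the feature map, keeping the strongest responses while shrinking its size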
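
The three steps can be sketched in plain Python. This is a minimal illustration, not an efficient implementation: it assumes a single-channel image stored as a list of lists, stride 1, no padding, and non-overlapping 2 x 2 max pooling. The names conv2d and max_pool2d and the sample values are made up for this note. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):          # top to bottom
        row = []
        for c in range(iw - kw + 1):      # left to right
            # Element-wise multiply the patch by the filter, then sum.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        out.append(row)
    return out


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; edge rows/columns that do not
    fill a whole window are dropped."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, w - size + 1, size)
        ]
        for r in range(0, h - size + 1, size)
    ]


# Hypothetical 5 x 5 image and 2 x 2 filter, chosen only for illustration.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [
    [1, 0],
    [0, -1],
]

feature_map = conv2d(image, kernel)   # 4 x 4 feature map
pooled = max_pool2d(feature_map)      # 2 x 2 after pooling
print(feature_map)
print(pooled)
```

The 5 x 5 input becomes a 4 x 4 feature map and then a 2 x 2 pooled map, which is the size reduction that step 3 is after.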

Common Structure of Vision Models

(Filter -> Pooling) x N -> (Dense Network) x M -> Output Layer
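
To see how this structure changes the data, the sketch below only traces tensor shapes through the stack; it does no actual computation. Everything specific here is an assumption for illustration: 3 x 3 filters with stride 1 and no padding, 2 x 2 pooling, N = 3 filter/pooling blocks with 16, 32 and 64 filters, M = 1 dense layer of 128 units, and 10 output classes. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output side length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output side length of non-overlapping pooling."""
    return size // window


def trace_shapes(side, channels, conv_channels, dense_units, num_classes):
    """Return (layer name, shape) pairs for
    (Filter -> Pooling) x N -> (Dense) x M -> Output."""
    shapes = [("input", (side, side, channels))]
    for n, out_ch in enumerate(conv_channels, start=1):
        side = conv_out(side)
        shapes.append((f"filter {n}", (side, side, out_ch)))
        side = pool_out(side)
        shapes.append((f"pooling {n}", (side, side, out_ch)))
        channels = out_ch
    # Flatten the last feature maps into one vector for the dense layers.
    shapes.append(("flatten", (side * side * channels,)))
    for m, units in enumerate(dense_units, start=1):
        shapes.append((f"dense {m}", (units,)))
    shapes.append(("output", (num_classes,)))
    return shapes


shapes = trace_shapes(
    side=224, channels=1,
    conv_channels=[16, 32, 64],   # N = 3 (assumed)
    dense_units=[128],            # M = 1 (assumed)
    num_classes=10,               # assumed
)
for name, shape in shapes:
    print(f"{name:10s} {shape}")
```

Under these assumptions the spatial side shrinks 224 -> 222 -> 111 -> 109 -> 54 -> 52 -> 26 before the dense layers take over.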

Why CNN over a Fully Connected Neural Network?

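Theoretically, a fully connected (dense) NN can get the same or better results than a CNN, since a convolution is a special case of a dense layer with shared and zeroed weights. The only issue is cost: a single low-resolution grayscale image (224 x 224) has about 50k input features (about 150k with three color channels), and with dense connections the model becomes so computationally expensive that it won't be feasible.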

So we use CNN filters and pooling to reduce the dimensionality first, and then use the dense layers of an NN on the smaller representation to produce the final prediction.
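
A quick back-of-the-envelope count makes the cost difference concrete. The layer sizes are assumptions picked for illustration (a first dense layer of 1000 units versus a first convolutional layer of 32 filters of size 3 x 3 on a grayscale image); the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a convolutional layer. The count does not
    depend on the image size because the filter weights are shared."""
    return kernel * kernel * in_channels * out_channels + out_channels


features = 224 * 224                      # input features, grayscale
dense = dense_params(features, 1000)      # dense layer, 1000 units (assumed)
conv = conv_params(3, 1, 32)              # 32 filters of 3 x 3 (assumed)

print(f"input features:    {features:,}")
print(f"dense layer params: {dense:,}")
print(f"conv layer params:  {conv:,}")
```

With these numbers the dense layer needs roughly 50 million parameters, while the convolutional layer needs 320.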


Related Notes