Convolutional Neural Networks work like photo filters on your phone. Imagine you have a photo and want to detect edges or patterns. A filter is like a small window that slides across the entire image, looking for specific features. Much as an Instagram filter transforms your photo, a CNN filter scans the image and responds to the patterns it is trained to detect.
The convolution operation is the core of a CNN. We take a small filter and slide it across the input image; at each position, we multiply corresponding values and sum them into a single output cell. The result is a feature map that highlights where the filter's pattern appears. Positive filter weights respond to one kind of intensity pattern, while negative weights respond to its opposite.
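This multiply-and-sum is easy to see in code. Below is a minimal sketch using PyTorch's F.conv2d, which performs exactly this sliding operation; the toy image and the hand-made vertical-edge filter are illustrative examples, not taken from the original text.

```python
import torch
import torch.nn.functional as F

# A toy 6x6 "image": bright (1.0) on the left half, dark (0.0) on the right,
# so there is a vertical edge down the middle. Shape: (batch, channels, H, W).
image = torch.zeros(1, 1, 6, 6)
image[:, :, :, :3] = 1.0

# A 3x3 vertical-edge filter: positive weights in the left column,
# negative weights in the right column.
kernel = torch.tensor([[[[1.0, 0.0, -1.0],
                         [1.0, 0.0, -1.0],
                         [1.0, 0.0, -1.0]]]])

# Slide the filter across the image: at each position, multiply
# corresponding values and sum them into one cell of the feature map.
feature_map = F.conv2d(image, kernel)
print(feature_map.squeeze())
# Positions spanning the edge give strong positive responses (3.0);
# flat regions give 0.0.
```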
After convolution, we apply pooling to reduce the size of the feature maps. Max pooling takes the maximum value from each region, preserving the strongest responses. Average pooling takes the mean, giving a smoother representation. Both shrink the feature maps, which cuts computation and the number of parameters in the layers that follow, and the downsampling also makes the network somewhat tolerant to small shifts in the input, which helps against overfitting.
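The two behave differently on the same input; here is a short sketch using PyTorch's built-in pooling layers on a made-up 4x4 feature map.

```python
import torch
import torch.nn as nn

# A 4x4 feature map (batch=1, channel=1), pooled down to 2x2.
fmap = torch.tensor([[[[1.0, 3.0, 2.0, 0.0],
                       [5.0, 6.0, 1.0, 2.0],
                       [0.0, 2.0, 4.0, 4.0],
                       [1.0, 1.0, 3.0, 5.0]]]])

max_pool = nn.MaxPool2d(kernel_size=2)  # stride defaults to kernel_size
avg_pool = nn.AvgPool2d(kernel_size=2)

print(max_pool(fmap).squeeze())
# tensor([[6., 2.],
#         [2., 5.]])  <- strongest value in each 2x2 region
print(avg_pool(fmap).squeeze())
# tensor([[3.7500, 1.2500],
#         [1.0000, 4.0000]])  <- mean of each region, a smoother summary
```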
A complete CNN architecture combines all these layers. The input image passes through convolution layers that detect features, pooling layers that reduce size, and finally fully connected layers for classification. Each convolution layer learns different patterns: early layers detect edges and textures, while deeper layers recognize complex shapes and objects. The final output gives probabilities for each digit class.
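Putting the pieces together, here is a minimal LeNet-style model in PyTorch. This is a sketch of a common modern variant (ReLU activations and max pooling rather than the original tanh and average pooling); the shape comments assume 28x28 MNIST input.

```python
import torch
import torch.nn as nn

class LeNet(nn.Module):
    """A LeNet-style CNN for 28x28 grayscale digit images."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                            # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                            # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),                 # logits for 10 digits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Sanity check on one fake MNIST batch.
logits = LeNet()(torch.randn(1, 1, 28, 28))
probs = logits.softmax(dim=1)  # probabilities for each digit class
print(probs.shape)             # torch.Size([1, 10])
```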
Here we see the training results of LeNet on the MNIST dataset. The accuracy curve climbs rapidly from about 10% (chance level for ten classes) to over 98% within 10 epochs, and the loss curve decreases smoothly, indicating stable training. This demonstrates the power of CNNs for image classification. With PyTorch, implementing and training such a network is straightforward, making deep learning accessible for practical applications.
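A training loop that produces curves like these is short in PyTorch. The sketch below reuses the LeNet class from the previous snippet and the standard torchvision MNIST dataset; the hyperparameters (Adam, learning rate 1e-3, batch size 64) are illustrative choices, not necessarily the ones behind the reported numbers.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Standard MNIST pipeline; LeNet is the class from the previous snippet.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = LeNet().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    correct, total, running_loss = 0, 0, 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        loss = criterion(logits, labels)
        loss.backward()        # backpropagate the classification error
        optimizer.step()       # update filters and weights
        running_loss += loss.item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    print(f"epoch {epoch + 1}: loss={running_loss / total:.4f} "
          f"acc={correct / total:.4f}")
```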