Computer Vision with Deep Learning

Covers image classification, object detection, segmentation, and generative models, and shows how deep learning has transformed computer vision.

Level: Advanced · Category: Computer Vision · Estimated time: 7 hours

Prerequisites

CNN Fundamentals — Convolutions, filters, pooling, feature maps, and receptive fields.
Classic Architectures — LeNet, AlexNet, VGG, ResNet, Inception, and EfficientNet.
Transfer Learning & Fine-Tuning — Using pre-trained models, freezing layers, and domain adaptation.
Object Detection: YOLO & R-CNN Family — Anchor boxes, YOLO, SSD, Faster R-CNN, and modern detectors.
Image Segmentation — Semantic segmentation, instance segmentation, U-Net, and Mask R-CNN.
Vision Transformers (ViT) — Applying transformers to images, patch embeddings, and ViT variants.
GANs & Image Generation — Generative adversarial networks, StyleGAN, and image synthesis.
Diffusion Models — Denoising diffusion, DDPM, Stable Diffusion, and controllable generation.
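
As a small taste of the object detection material listed above: anchor boxes are commonly matched to ground-truth boxes by their intersection over union (IoU). The sketch below is illustrative only and is not taken from the course. The function name `iou` and the `(x1, y1, x2, y2)` corner format are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7. A detector would typically treat an anchor as a positive match only when its IoU with a ground-truth box exceeds some chosen threshold.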

Tags: cnn, object-detection, image-classification, yolo, gans, vision-transformers, segmentation