
Deep Residual Learning for Image Recognition

10 tweets
2 min read
Thrummarise (@summarizer)

Deep neural networks face a significant challenge: as they get deeper, they become harder to train. With increasing depth, accuracy saturates and then degrades rapidly, a phenomenon the paper calls the degradation problem. Crucially, training error rises as well, so this is not overfitting but an optimization difficulty.

The core idea behind Residual Networks (ResNets) is to have layers learn residual functions with reference to their inputs instead of unreferenced functions. Rather than fitting a desired mapping H(x) directly, a stack of layers fits the residual F(x) = H(x) - x, and the original mapping is recovered as F(x) + x. Optimizing the residual turns out to be easier.
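
As a rough illustration (not code from the paper), here is a minimal sketch of such a residual block in PyTorch; the class name and layer sizes are my own choices, but the structure follows the "stacked layers learn F(x), then add x back" idea:

import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicResidualBlock(nn.Module):
    """Two 3x3 conv layers learn the residual F(x) = H(x) - x; the block outputs F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        # Identity shortcut: add the input back, then apply the nonlinearity.
        return F.relu(residual + x)

# e.g. a 64-channel feature map passes through with its shape unchanged:
# y = BasicResidualBlock(64)(torch.randn(1, 64, 56, 56))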

ResNets introduce 'shortcut connections' that perform identity mapping, adding the input directly to the output of the stacked layers. This simple addition introduces no extra parameters or computational complexity, making it highly efficient.
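
To make the "no extra parameters" point concrete, here is a hedged sketch (my own class, not the paper's code) of how shortcuts are typically handled: when input and output shapes match, the shortcut is a plain identity with zero parameters; only when dimensions change is a 1x1 projection (the paper's option B) used to match them:

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        if stride != 1 or in_ch != out_ch:
            # Projection shortcut, used only where dimensions change.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()  # parameter-free identity shortcut

    def forward(self, x):
        out = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        return F.relu(out + self.shortcut(x))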

This residual learning framework allows substantially deeper networks to be trained. ResNets with up to 152 layers have been trained successfully on ImageNet, 8x deeper than VGG nets yet with lower computational complexity.

The empirical evidence is compelling: ResNets are easier to optimize and gain accuracy from increased depth. On ImageNet, an ensemble of residual nets achieved 3.57% top-5 error on the test set, securing 1st place in the ILSVRC 2015 classification task.

The degradation problem, where deeper plain networks perform worse than shallower ones, is effectively addressed by ResNets. This indicates that residual learning helps solvers find better solutions in extremely deep architectures.

ResNets also demonstrated superior performance in other visual recognition tasks, including ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation, winning 1st places in ILSVRC & COCO 2015 competitions.

The intuition behind residual learning: if the optimal function is close to an identity mapping, it is easier for the solver to learn the small perturbation from identity (i.e., drive the residual toward zero) than to learn the entire function from scratch as a new, unreferenced mapping.
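
A tiny numerical illustration of this intuition (mine, not from the paper): with near-zero weights in the residual branch, a residual block already computes roughly the identity, whereas a plain block with the same weights is nowhere near it, so the residual form only has to learn a small correction:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1 = 0.01 * rng.normal(size=(8, 8))   # tiny weights in the residual branch
W2 = 0.01 * rng.normal(size=(8, 8))

y_residual = x + W2 @ np.maximum(W1 @ x, 0.0)   # residual block: close to identity
y_plain = W2 @ np.maximum(W1 @ x, 0.0)          # plain block: close to zero, far from x

print(np.linalg.norm(y_residual - x))  # tiny
print(np.linalg.norm(y_plain - x))     # roughly the norm of x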

The parameter-free identity shortcuts are particularly important for bottleneck architectures, which stack 1x1, 3x3, and 1x1 convolutions: the 1x1 layers reduce and then restore the channel dimension so the 3x3 layer operates at a smaller width, making very deep networks (ResNet-50/101/152) computationally feasible.
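
A hedged sketch of that bottleneck design (illustrative names and sizes; in ResNet-50/101/152 a typical block maps 256 channels down to 64 and back to 256):

import torch.nn as nn
import torch.nn.functional as F

class BottleneckBlock(nn.Module):
    """1x1 conv reduces channels, 3x3 conv works at the reduced width, 1x1 conv restores them."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(bottleneck)
        self.conv3x3 = nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck)
        self.restore = nn.Conv2d(bottleneck, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.reduce(x)))
        out = F.relu(self.bn2(self.conv3x3(out)))
        out = self.bn3(self.restore(out))
        # The identity shortcut stays parameter-free, keeping the block cheap.
        return F.relu(out + x)

If the identity were replaced by a projection, the shortcut would connect the two high-dimensional (256-d) ends of the block and roughly double its size and computation, which is why the parameter-free identity matters here.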

Experiments on CIFAR-10 further confirm that ResNets overcome the optimization difficulty and gain accuracy from increased depth with models of over 100 layers. An exploratory 1202-layer network was still trained without difficulty, though its test accuracy fell short of the 110-layer model, likely due to overfitting on such a small dataset.
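
For context on those depths, the paper's CIFAR-10 networks stack n two-layer residual blocks in each of three stages (16, 32, 64 filters), giving 6n+2 weighted layers; a small helper (my own) reproduces the reported depths:

def cifar_resnet_depth(n: int) -> int:
    # 3 stages x n blocks x 2 conv layers, plus the first conv and the final fc layer.
    return 6 * n + 2

for n in (3, 5, 7, 9, 18, 200):
    print(n, cifar_resnet_depth(n))   # 20, 32, 44, 56, 110 and 1202 layers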
