#839

Why Deep Sigmoids Stop Learning

Easy · Machine Learning

Problem

Compute the maximum of $\sigma'(x)$ for the sigmoid $\sigma(x) = \frac{1}{1+e^{-x}}$. Use it to show why stacking sigmoid layers starves early layers of gradient, explain what ReLU changes, and name ReLU's own failure mode.
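One way to sanity-check an answer is a small numerical sketch like the one below. It is not part of the original problem: the function names (`sigmoid_grad`, `relu_grad`, `activation_factor_bound`), the grid range, and the step size are all illustrative assumptions. It relies on the standard identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ and only tracks the activation-derivative factor in the chain rule; the weight matrices, which also scale the gradient, are deliberately left out.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid, via sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# Coarse grid search over [-10, 10] with step 0.001 (assumed range and step).
xs = [i / 1000.0 for i in range(-10000, 10001)]
best_x = max(xs, key=sigmoid_grad)
max_grad = sigmoid_grad(best_x)


def activation_factor_bound(depth: int) -> float:
    """Upper bound on the product of `depth` sigmoid derivatives.

    Each factor is at most max_grad, so the product is at most
    max_grad ** depth. Weights are ignored in this sketch.
    """
    return max_grad ** depth


if __name__ == "__main__":
    print("argmax of sigmoid'(x) on the grid:", best_x)
    print("max of sigmoid'(x) on the grid:", max_grad)
    for depth in (1, 5, 10, 20):
        print(depth, "layers -> activation factor at most",
              activation_factor_bound(depth))
    # ReLU passes gradient unchanged for active units, but gives exactly
    # zero for units whose pre-activation is negative.
    print("relu'(2.0) =", relu_grad(2.0), " relu'(-2.0) =", relu_grad(-2.0))
```

The bound shrinks geometrically with depth, which is the effect the problem asks about. The last line hints at both halves of the ReLU question: an active unit contributes a factor of 1, while a unit whose pre-activation stays negative contributes 0 and receives no gradient.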

Hints