Training Dynamics of a 2-Layer MLP

Two-layer perceptron: $z \mapsto \sum_{k=1}^{n} u_k\, \sigma(\langle z, v_k \rangle)$, with activation $\sigma$, output weights $(u_k)_k$, and hidden weights $(v_k)_k$.

Training: gradient flow on the weights, fitting the data $(z_i, y_i)$ by least squares:
$$
\frac{d(u_k, v_k)}{dt} = -\nabla F(u_1, v_1, \dots), \qquad
F(u_1, v_1, \dots) := \sum_i \Big( \sum_{k=1}^{n} u_k\, \sigma(\langle z_i, v_k \rangle) - y_i \Big)^2 .
$$

Mean-field reformulation: encode the weights $(u_k, v_k)$ as the measure $\alpha = \sum_k \delta_{(u_k, v_k)}$ and minimize over measures:
$$
\min_\alpha f(\alpha), \qquad
f(\alpha) = \int k \, d(\alpha \otimes \alpha) + \int h \, d\alpha
:= \sum_i \Big( \int u\, \sigma(\langle z, v \rangle)\, d\alpha(u, v) - y_i \Big)^2 ,
$$
with kernel
$$
k(u, v, u', v') := \sum_i u u'\, \sigma(\langle z_i, v \rangle)\, \sigma(\langle z_i, v' \rangle) .
$$

The gradient flow on the weights then transports $\alpha_{t=0}$ to $\alpha_t$ along a Wasserstein gradient flow:
$$
\frac{\partial \alpha_t}{\partial t} - \operatorname{div}\!\big( \nabla_W f(\alpha_t)\, \alpha_t \big) = 0 .
$$

Theorem (Lenaic Chizat, Francis Bach): for perceptrons, if the network has "enough neurons", the gradient flow can only converge to a global minimum of $F$.

"Global" convergence, despite $F$ not being convex.
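The slide names the kernel $k$ but leaves the linear term $h$ implicit; expanding the square makes it explicit. The derivation below is a reconstruction: it also produces the additive constant $\sum_i y_i^2$, which the slide's expression $f(\alpha) = \int k\, d(\alpha\otimes\alpha) + \int h\, d\alpha$ drops since it does not affect the minimizers.

```latex
% Expanding the square in f(\alpha); the explicit form of h is reconstructed
% here (only k is named on the slide).
\begin{align*}
f(\alpha)
  &= \sum_i \Big( \int u\,\sigma(\langle z_i, v\rangle)\,d\alpha(u,v) - y_i \Big)^2 \\
  &= \iint \underbrace{\sum_i u u'\, \sigma(\langle z_i, v\rangle)\,
       \sigma(\langle z_i, v'\rangle)}_{k(u,v,u',v')}
     \, d\alpha(u,v)\, d\alpha(u',v') \\
  &\quad + \int \underbrace{\Big( -2 \sum_i y_i\, u\,
       \sigma(\langle z_i, v\rangle) \Big)}_{h(u,v)} \, d\alpha(u,v)
     \;+\; \sum_i y_i^2 .
\end{align*}
```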
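To make the training dynamics concrete, here is a minimal NumPy sketch of the gradient flow, discretized by explicit Euler steps (i.e., plain gradient descent with a small step size). The data, the width `n`, the choice of $\sigma = \tanh$, the step size, and the iteration count are all illustrative assumptions, not taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, m = 5, 200, 50                      # input dim, width n, number of samples
Z = rng.standard_normal((m, d))           # inputs z_i (synthetic, for illustration)
y = np.tanh(Z @ rng.standard_normal(d))   # synthetic targets y_i

u = rng.standard_normal(n) / n            # output weights u_k
V = rng.standard_normal((n, d))           # hidden weights v_k

sigma = np.tanh                           # activation sigma (an assumption)
dsigma = lambda s: 1.0 - np.tanh(s) ** 2  # its derivative

eta = 1e-2                                # Euler step discretizing d/dt
for _ in range(2000):
    S = Z @ V.T                           # S[i, k] = <z_i, v_k>
    pred = sigma(S) @ u                   # sum_k u_k sigma(<z_i, v_k>)
    r = pred - y                          # residuals
    # gradients of F = sum_i r_i^2 w.r.t. u_k and v_k
    grad_u = 2.0 * sigma(S).T @ r
    A = r[:, None] * u[None, :] * dsigma(S)
    grad_V = 2.0 * A.T @ Z
    u -= eta * grad_u                     # Euler step for du_k/dt = -dF/du_k
    V -= eta * grad_V                     # Euler step for dv_k/dt = -dF/dv_k

print("final loss F:", np.sum(r ** 2))
```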
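A quick sanity check of the mean-field lift, reusing `u`, `V`, `Z`, `y`, and `sigma` from the sketch above: evaluating $f$ at the empirical measure $\alpha = \sum_k \delta_{(u_k, v_k)}$, via the reconstructed $k$ and $h$ terms plus the constant $\sum_i y_i^2$, reproduces $F(u_1, v_1, \dots)$ exactly. The helper names below are hypothetical.

```python
def F(u, V, Z, y, sigma=np.tanh):
    # F(u_1, v_1, ...) = sum_i ( sum_k u_k sigma(<z_i, v_k>) - y_i )^2
    return np.sum((sigma(Z @ V.T) @ u - y) ** 2)

def f_empirical(u, V, Z, y, sigma=np.tanh):
    # f(alpha) at alpha = sum_k delta_{(u_k, v_k)}, term by term
    act = sigma(Z @ V.T)        # act[i, k] = sigma(<z_i, v_k>)
    phi = act @ u               # phi[i] = int u sigma(<z_i, v>) d alpha
    K = phi @ phi               # iint k d(alpha x alpha)
    H = -2.0 * (y @ phi)        # int h d alpha, with reconstructed h
    return K + H + y @ y        # plus the constant sum_i y_i^2

assert np.isclose(F(u, V, Z, y), f_empirical(u, V, Z, y))
```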