









![High dimensionality discussion: the curse of dimensionality in [0,1]^d …](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-11-2048.jpg&f=jpg&w=240)



















![A 2-layer neural network: one hidden layer of 4 neurons (or units), one output layer with 2 neurons, and three inputs. The network has 4 + 2 = 6 neurons (not counting the inputs), [3 x 4] + [4 x 2] = 20 weights and 4 + 2 = 6 biases, for a total of 26 learnable parameters.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-31-2048.jpg&f=jpg&w=240)
![A 3-layer neural network with three inputs, two hidden layers of 4 neurons each, and one output layer. Notice that in both cases there are connections (synapses) between neurons across layers, but not within a layer. The network has 4 + 4 + 1 = 9 neurons, [3 x 4] + [4 x 4] + [4 x 1] = 12 + 16 + 4 = 32 weights and 4 + 4 + 1 = 9 biases, for a total of 41 learnable parameters.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-32-2048.jpg&f=jpg&w=240)
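The parameter counts in the two figures above follow one rule: weights between every pair of consecutive layers, plus one bias per non-input neuron. A minimal sketch that checks both figures (the function name is illustrative):

```python
def count_params(layer_sizes):
    """Count learnable parameters of a fully connected network.

    layer_sizes lists the width of each layer, inputs first,
    e.g. [3, 4, 2] for three inputs, one hidden layer of 4,
    and an output layer of 2.
    """
    # One weight matrix per pair of consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per neuron, not counting the inputs.
    biases = sum(layer_sizes[1:])
    return weights + biases

print(count_params([3, 4, 2]))     # 2-layer net from the figure: 26
print(count_params([3, 4, 4, 1]))  # 3-layer net from the figure: 41
```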





![Cost or loss functions. Usually the parametric model f(x, theta) defines a distribution p(y | x; theta) and we use maximum likelihood, that is, the cross-entropy between the training data y and the model's predictions f(x, theta), as the loss or cost function. The cross-entropy between a 'true' distribution p and an estimated distribution q is H(p, q) = -sum_x p(x) log q(x). The cost J(theta) = -E log p(y | x); if p is normal, we get the mean squared error cost J = 1/2 E ||y - f(x; theta)||^2 + const. The cost can also be viewed as a functional, mapping functions to real numbers; we are learning functions f parameterized by theta. By calculus of variations, f(x) = E_* [y]. The SVM loss is carefully designed and special: hinge loss, max-margin loss. The softmax is the cross-entropy between the estimated class probabilities e^y_i / sum e and the true class labels, also the negative log-likelihood loss L_i = -log(e^y_i / sum e).](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-38-2048.jpg&f=jpg&w=240)
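The softmax and negative log-likelihood loss L_i = -log(e^y_i / sum_j e^y_j) from the slide above can be sketched in a few lines of pure Python (the score values are made-up examples):

```python
import math

def softmax(scores):
    # Subtract the max score first for numerical stability; the
    # result is unchanged because softmax is shift-invariant.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(scores, true_class):
    # L_i = -log(e^{y_i} / sum_j e^{y_j}), as on the slide.
    return -math.log(softmax(scores)[true_class])

probs = softmax([2.0, 1.0, 0.1])  # sums to 1, ordered like the scores
```

Note that the loss is small when the true class already has the highest score and grows as probability mass shifts to the wrong classes.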





















![LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [LeNet] Convolutional Neural Networks: 1998. Input 32*32. CPU.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-60-2048.jpg&f=jpg&w=240)
![AlexNet: a layered model composed of convolution, subsampling, and further operations followed by a holistic representation and all-in-all a landmark classifier on ILSVRC12. [AlexNet] + data + gpu + non-saturating nonlinearity + regularization. Convolutional Neural Networks: 2012. Input 224*224*3. GPU.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-61-2048.jpg&f=jpg&w=240)






![Pooling layer down-samples the volume spatially, independently in each depth slice of the input volume. Left: the input volume of size [224x224x64] is pooled with filter size 2, stride 2 into an output volume of size [112x112x64]. Notice that the volume depth is preserved. Right: the most common down-sampling operation is max, giving rise to max pooling, here shown with a stride of 2. That is, each max is taken over 4 numbers (a little 2x2 square).](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fcnn-250420180715-645672fd%2f75%2fIntroduction-to-Convolutional-Neural-Network-pptx-68-2048.jpg&f=jpg&w=240)
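The 2x2, stride-2 max pooling described above can be sketched on a single depth slice; a small 4x4 slice with made-up values stands in for one of the 64 slices of the [224x224x64] volume:

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on one 2D depth slice.

    Each output value is the max over a non-overlapping 2x2 window,
    so an H x W slice becomes H/2 x W/2; depth is untouched because
    each slice is pooled independently.
    """
    h, w = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

slice_ = [[1, 1, 2, 4],
          [5, 6, 7, 8],
          [3, 2, 1, 0],
          [1, 2, 3, 4]]
print(max_pool_2x2(slice_))  # [[6, 8], [3, 4]]
```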

























































1. General introduction

Convolutional Neural Networks (CNNs) represent a major advance in artificial intelligence, and more specifically in supervised machine learning. Inspired by the workings of the human visual cortex, CNNs are now ubiquitous in image and video recognition, natural language processing, medical diagnosis, and much more.

The goal of this presentation is to provide a thorough yet accessible understanding of CNNs. We will explore their structure, how they work, why they are effective, and examples of real-world applications, supported throughout by illustrations and, where relevant, practical demonstrations.

2. Context and motivation

a. Origins
CNNs trace their roots to the 1980s and the work of Yann LeCun, who introduced the first convolutional networks for handwritten digit recognition. This system was notably deployed by American banks for the automatic reading of checks.

b. Why CNNs?
Before CNNs, traditional models required a manual feature-engineering step: humans had to extract image characteristics by hand (e.g., edges, corners, shapes). CNNs allow the machine to learn these features automatically from the raw data.

c. Applications
Computer vision: object detection, face recognition, image segmentation.
Medicine: tumor detection in MRI scans.
Security: biometric recognition, intelligent video surveillance.
Autonomous vehicles: reading road signs, identifying pedestrians.
Art and creative work: style transfer, automatic colorization.

3. Anatomy of a CNN

A CNN is composed of several layers, each playing a specific role in processing and analyzing images.

a. Convolution layer
The heart of the CNN. It applies a filter (or kernel) across the image to extract local features. For example, a filter can detect vertical or horizontal edges.

b. ReLU (Rectified Linear Unit)
A non-linear activation function. It applies f(x) = max(0, x) to each value, zeroing out negatives, and thereby introduces non-linearity into the model.

c. Pooling layer (subsampling)
Reduces the size of the feature maps. The most common variants are max pooling and average pooling. This lowers computational complexity and improves robustness.

d. Flatten + fully connected layers
At the end of the CNN, the data is flattened and then processed by fully connected layers. This is where the final classification is made (e.g., cat or dog).

4. Overall operation of a CNN

a. Forward propagation
The image passes from layer to layer, being transformed at each step.
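The convolution and ReLU steps described above can be sketched in pure Python; the 4x4 image, the vertical-edge kernel values, and the function names are illustrative choices, not from the slides (no padding, stride 1):

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of a grayscale image with a kernel,
    the core operation of a convolution layer (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(kh) for v in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def relu(fmap):
    # f(x) = max(0, x), applied element-wise to the feature map.
    return [[max(0.0, v) for v in row] for row in fmap]

# A vertical-edge filter responds where bright pixels sit left of dark ones.
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]
img = [[9, 9, 0, 0],
       [9, 9, 0, 0],
       [9, 9, 0, 0],
       [9, 9, 0, 0]]
fmap = relu(conv2d_valid(img, edge_kernel))  # [[27, 27], [27, 27]]
```

A real convolution layer learns the kernel values during training rather than hard-coding them, and slides many kernels over the input to produce one feature map each.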





















































































































