
have long been used to solve optimization problems in closed form, but gradient descent was not introduced as a technique for iteratively approximating the solution to optimization problems until the nineteenth century (Cauchy, 1847).
Beginning in the 1940s, these function approximation techniques were used to motivate machine learning models such as the perceptron. However, the earliest of these models were linear. Critics, including Marvin Minsky, pointed out several flaws of the linear model family, such as its inability to learn the XOR function, which led to a backlash against the entire neural network approach.
Learning nonlinear functions required the development of a multilayer perceptron and a means of computing the gradient through such a model. Efficient applications of the chain rule based on dynamic programming began to appear in the 1960s and 1970s, mostly for control applications (Kelley, 1960; Bryson and Denham, 1961; Dreyfus, 1962; Bryson and Ho, 1969; Dreyfus, 1973) but also for sensitivity analysis (Linnainmaa, 1976). Werbos (1981) proposed applying these techniques to training artificial neural networks. The idea was finally developed in practice after being independently rediscovered in different ways (LeCun, 1985; Parker, 1985; Rumelhart et al., 1986a). The book Parallel Distributed Processing presented the results of some of the first successful experiments with back-propagation in a chapter (Rumelhart et al., 1986b) that contributed greatly to the popularization of back-propagation and initiated a very active period of research in multilayer neural networks. The ideas put forward by the authors of that book, particularly by Rumelhart and Hinton, go far beyond back-propagation. They include crucial ideas about the possible computational implementation of several central aspects of cognition and learning, which came under the name “connectionism” because of the importance this school of thought places on the connections between neurons as the locus of learning and memory. In particular, these ideas include the notion of distributed representation (Hinton et al., 1986).
Following the success of back-propagation, neural network research gained popularity and reached a peak in the early 1990s. Afterwards, other machine learning techniques became more popular until the modern deep learning renaissance that began in 2006.
The core ideas behind modern feedforward networks have not changed substantially since the 1980s. The same back-propagation algorithm and the same approaches to gradient descent are still in use. Most of the improvement in neural network performance from 1986 to 2015 can be attributed to two factors. First, larger datasets have reduced the degree to which statistical generalization is a challenge for neural networks. Second, neural networks have become much larger, because of more powerful computers and better software infrastructure. A small