Movatterモバイル変換

[0]ホーム

Jump to content

Human image synthesis

Edit links

From Wikipedia, the free encyclopedia

Computer generation of human images

In this morph target animation system four "expressions" have been defined as deformations of the geometry of the model. Any combination of these four expressions can be used to animate the mouth shape. Similar controls can be applied to animate an entire human-like model.

Human image synthesis is technology that can be applied to make believable and evenphotorealistic renditions^[1]^[2] of human-likenesses, moving or still. It has effectively existed since the early 2000s. Many films usingcomputer generated imagery have featured synthetic images of human-like charactersdigitally composited onto the real or other simulated film material. Towards the end of the 2010sdeep learning artificial intelligence has been applied tosynthesize images and video that look like humans, without need for human assistance, once the training phase has been completed, whereas the old school 7D-route required massive amounts of human work.

Timeline of human image synthesis

[edit]

Main articles:History of computer animation andTimeline of computer animation in film and television

In 1971Henri Gouraud made the firstCG geometry capture and representation of a human face. Modeling was his wife Sylvie Gouraud. The 3D model was a simplewire-frame model and he appliedthe Gouraud shader he is most known for to produce the first known representation of human-likeness on computer.^[3]^[4]
The 1972 short filmA Computer Animated Hand byEdwin Catmull andFred Parke was the first time thatcomputer-generated imagery was used in film to simulate moving human appearance. The film featured a computer simulated hand and face(watch film here).
The 1976 filmFutureworld reused parts ofA Computer Animated Hand on the big screen.
The 1983music video for song Musique Non-Stop by German bandKraftwerk aired in 1986. Created by the artistRebecca Allen, it features non-realistic looking, but clearly recognizable computer simulations of the band members.
The 1994 filmThe Crow was the first film production to make use of digital compositing of a computer simulated representation of a face onto scenes filmed using abody double. Necessity was the muse as the actorBrandon Lee portraying the protagonist was tragically killed accidentally on-stage.
In 1999Paul Debevec et al. ofUSC captured the reflectance field of a human face with their first version of alight stage. They presented their method at theSIGGRAPH 2000^[5]

In 2003audience debut of photo realistic human-likenesses in the 2003 filmsThe Matrix Reloaded inthe burly brawl sequence where up-to-100Agent Smiths fightNeo and inThe Matrix Revolutions where at the start of the end showdown Agent Smith'scheekbone gets punched in by Neo leaving the digital look-alike unnaturally unhurt. The Matrix Revolutions bonus DVD documents and depicts the process in some detail and the techniques used, includingfacial motion capture andlimbal motion capture, andprojection onto models.
In 2003The Animatrix: Final Flight of the Osiris astate-of-the-art want-to-be human likenesses not quite fooling the watcher made bySquare Pictures.
In 2003 digital likeness ofTobey Maguire was made for moviesSpider-man 2 andSpider-man 3 bySony Pictures Imageworks.^[6]
In 2005 theFace of the Future project was an established.^[7] by theUniversity of St Andrews and Perception Lab, funded by theEPSRC.^[8] The website contains a "Face Transformer", which enables users to transform their face into anyethnicity andage as well as the ability to transform their face into a painting (in the style of eitherSandro Botticelli orAmedeo Modigliani).^[9] This process is achieved by combining the user's photograph with anaverage face.^[8]
In 2009 Debevec et al. presented new digital likenesses, made byImage Metrics, this time of actressEmily O'Brien whose reflectance was captured with the USC light stage 5^[10] Motion looks fairly convincing contrasted to the clunky run in theAnimatrix: Final Flight of the Osiris which wasstate-of-the-art in 2003 if photorealism was the intention of the animators.
In 2009 a digital look-alike of a youngerArnold Schwarzenegger was made for the movieTerminator Salvation though the end result was critiqued as unconvincing. Facial geometry was acquired from a 1984 mold of Schwarzenegger.
In 2010Walt Disney Pictures released a sci-fi sequel entitledTron: Legacy with a digitally rejuvenated digital look-alike of actorJeff Bridges playing theantagonist CLU.
In SIGGGRAPH 2013Activision and USC presented areal time "Digital Ira" a digital face look-alike of Ari Shapiro, an ICT USC research scientist,^[11] utilizing the USC light stage X by Ghosh et al. for both reflectance field and motion capture.^[12] The end result both precomputed and real-time rendering with the modernest gameGPU shown here and looks fairly realistic.
In 2014 The Presidential Portrait by USCInstitute for Creative Technologies in conjunction with theSmithsonian Institution was made using the latest USC mobile light stage wherein PresidentBarack Obama had his geometry, textures and reflectance captured.^[13]

In 2014Ian Goodfellow et al. presented the principles of agenerative adversarial network. GANs made the headlines in early 2018 with thedeepfakes controversies.
For the 2015 filmFurious 7 a digital look-alike of actorPaul Walker who died in an accident during the filming was done byWeta Digital to enable the completion of the film.^[14]
In 2016 techniques which allownear real-time counterfeiting offacial expressions in existing 2D video have been believably demonstrated.^[15]
In 2016 a digital look-alike ofPeter Cushing was made for theRogue One film where its appearance would appear to be of same age as the actor was during the filming of the original 1977Star Wars film.
In SIGGRAPH 2017 an audio driven digital look-alike of upper torso of Barack Obama was presented by researchers fromUniversity of Washington.^[16] It was driven only by a voice track as source data for the animation after the training phase to acquirelip sync and wider facial information fromtraining material consisting 2D videos with audio had been completed.^[17]

Late 2017^[18] and early 2018 saw the surfacing of thedeepfakes controversy whereporn videos were doctored usingdeep machine learning so that the face of the actress was replaced by the software's opinion of what another persons face would look like in the same pose and lighting.
In 2018Game Developers Conference Epic Games andTencent Games demonstrated "Siren", a digital look-alike of the actressBingjie Jiang. It was made possible with the following technologies:CubicMotion'scomputer vision system,3Lateral's facial rigging system andVicon's motion capture system. The demonstration ran in near real time at 60 frames per second in theUnreal Engine 4.^[19]
In 2018 at theWorld Internet Conference inWuzhen theXinhua News Agency presented two digital look-alikes made to the resemblance of its real news anchors Qiu Hao (Chinese language)^[20] and Zhang Zhao (English language). The digital look-alikes were made in conjunction withSogou.^[21] Neither thespeech synthesis used nor the gesturing of the digital look-alike anchors were good enough to deceive the watcher to mistake them for real humans imaged with a TV camera.
In September 2018 Google added "involuntary synthetic pornographic imagery" to its ban list, allowing anyone to request the search engine block results that falsely depict them as "nude or in a sexually explicit situation."^[22]

In February 2019Nvidia open sources StyleGAN, a novelgenerative adversarial network.^[23] Right after this Phillip Wang made the website ThisPersonDoesNotExist.com with StyleGAN to demonstrate that unlimited amounts of often photo-realistic looking facial portraits of no-one can be made automatically using a GAN.^[24] Nvidia's StyleGAN was presented in a not yetpeer reviewed paper in late 2018.^[24]

At the June 2019CVPR theMIT CSAIL presented a system titled"Speech2Face: Learning the Face Behind a Voice" that synthesizes likely faces based on just a recording of a voice. It was trained with massive amounts of video of people speaking.
Since 1 July 2019^[25]Virginia has criminalized the sale and dissemination of unauthorized synthetic pornography, but not the manufacture.,^[26] as§ 18.2–386.2 titled 'Unlawful dissemination or sale of images of another; penalty.' became part of theCode of Virginia. The law text states: "Any person who, with theintent tocoerce,harass, orintimidate,maliciously disseminates or sells any videographic or still image created by any means whatsoever that depicts another person who is totally nude, or in a state of undress so as to expose thegenitals, pubic area,buttocks, or femalebreast, where such person knows or has reason to know that he is notlicensed orauthorized to disseminate or sell such videographic or still image is guilty of a Class 1misdemeanor.".^[26] The identical bills were House Bill 2678 presented byDelegate Marcus Simon to theVirginia House of Delegates on 14 January 2019 and three-day later an identical Senate bill 1736 was introduced to theSenate of Virginia by SenatorAdam Ebbin.

Since 1 September 2019Texas senate bill SB 751amendments to the election code came into effect, givingcandidates inelections a 30-day protection period to the elections during which making and distributing digital look-alikes or synthetic fakes of the candidates is an offense. The law text defines the subject of the law as "a video, created with the intent to deceive, that appears to depict a real person performing an action that did not occur in reality"^[27]
In September 2019Yle, the Finnishpublic broadcasting company, aired a result of experimental journalism, a deepfake of the President in officeSauli Niinistö in its main news broadcast for the purpose of highlighting the advancing disinformation technology and problems that arise from it.
1 January 2020^[28] California thestate law AB-602 came into effect banning the manufacturing anddistribution of synthetic pornography without theconsent of the people depicted. AB-602 provides victims of synthetic pornography withinjunctive relief and poses legal threats ofstatutory andpunitive damages on criminals making or distributing synthetic pornography without consent. The bill AB-602 was signed into law by CaliforniaGovernor Gavin Newsom on 3 October 2019 and was authored byCalifornia State Assembly memberMarc Berman.^[29]
1 January 2020, Chinese law requiring that synthetically faked footage should bear a clear notice about its fakeness came into effect. Failure to comply could be considered a crime theCyberspace Administration of China stated on its website. China announced this new law in November 2019.^[30] The Chinese government seems to be reserving the right to prosecute both users andonline video platforms failing to abide by the rules.^[31]12 November [deepfake]

Key breakthrough to photorealism: reflectance capture

[edit]

ESPER LightCage is an example of aspherical light stage withmulti-camera setup around the sphere suitable for capturing into a 7D reflectance model.

In 1999Paul Debevec et al. of USC did the first knownreflectance capture over the human face with their extremely simplelight stage. They presented their method and results inSIGGRAPH 2000.^[5]

Bidirectional scattering distribution function (BSDF) for human skin likeness requires bothBRDF and special case of BTDF wherelight enters the skin, is transmitted and exits the skin.

The scientific breakthrough required finding thesubsurface light component (the simulation models are glowing from within slightly) which can be found using knowledge that light that is reflected from the oil-to-air layer retains itspolarization and the subsurface light loses its polarization. So equipped only with a movable light source, movable video camera, 2 polarizers and a computer program doing extremely simple math and the last piece required to reach photorealism was acquired.^[5]

For a believable result both lightreflected from skin (BRDF) and within the skin (a special case ofBTDF) which together make up theBSDF must be captured and simulated.

Capturing

[edit]

The 3Dgeometry andtextures are captured onto a3D model by a3D reconstruction method, such assampling the target by means of3D scanning with anRGB XYZ scanner such asArius3d orCyberware (textures from photos, not pure RGB XYZ scanner),stereophotogrammetrically fromsynchronized photos or even from enough repeated non-simultaneousphotos.Digital sculpting can be used to make up models of the body parts for which data cannot be acquired e.g. parts of the body covered by clothing.
For believable results also thereflectance field must be captured or an approximation must be picked from the libraries to form a 7D reflectance model of the target.

Synthesis

[edit]

The whole process of making digital look-alikes i.e. characters so lifelike and realistic that they can be passed off as pictures of humans is a very complex task as it requires photorealisticallymodeling, animating,cross-mapping, andrendering thesoft body dynamics of the human appearance.

Synthesis with an actor and suitablealgorithms is applied using powerful computers. The actor's part in the synthesis is to take care of mimicking humanexpressions in still picture synthesizing and also human movement in motion picture synthesizing. Algorithms are needed to simulate laws ofphysics andphysiology and to map the models and their appearance, movements and interaction accordingly.

Often bothphysics/physiology based (i.e.skeletal animation) andimage-based modeling and rendering are employed in the synthesis part. Hybrid models employing both approaches have shown best results in realism and ease-of-use.Morph target animation reduces the workload by giving higher level control, where different facial expressions are defined as deformations of the model, which facial allows expressions to be tuned intuitively. Morph target animation can then morph the model between different defined facial expressions or body poses without much need for human intervention.

Usingdisplacement mapping plays an important part in getting a realistic result with fine detail of skin such aspores andwrinkles as small as 100μm.

Machine learning approach

[edit]

In the late 2010s,machine learning, and more preciselygenerative adversarial networks (GAN), were used byNVIDIA to produce random yet photorealistic human-like portraits. The system, namedStyleGAN, was trained on a database of 70,000 images from the images depository websiteFlickr. The source code was made public onGitHub in 2019.^[32] Outputs of the generator network from random input were made publicly available on a number of websites.^[33]^[34]

Similarly, since 2018,deepfake technology has allowed GANs to swap faces between actors; combined with the ability to fake voices, GANs can thus generate fake videos that seem convincing.^[35]

Applications

[edit]

Main applications fall within the domains ofstock photography,synthetic datasets,virtual cinematography, computer andvideo games andcovert disinformation attacks.^[36]^[34] Some facial-recognition AI use images generated by other AI assynthetic data for training.^[37]

Furthermore, some research suggests that it can havetherapeutic effects as "psychologists andcounselors have also begun usingavatars to deliver therapy to clients who havephobias, a history oftrauma, addictions,Asperger’s syndrome orsocial anxiety."^[38] The strong memory imprint and brain activation effects caused by watching a digital look-alike avatar of yourself is dubbed theDoppelgänger effect.^[38] The doppelgänger effect can heal when covert disinformation attack is exposed as such to the targets of the attack.

Related issues

[edit]

Thespeech synthesis has been verging on being completely indistinguishable from a recording of a real human's voice since the 2016 introduction of the voice editing and generation softwareAdobe Voco, a prototype slated to be a part of theAdobe Creative Suite andDeepMind WaveNet, a prototype from Google.^[39]Ability to steal and manipulate other peoples voices raises obvious ethical concerns.^[40]

At the 2018Conference on Neural Information Processing Systems (NeurIPS) researchers from Google presented the work 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis', whichtransfers learning fromspeaker verification to achieve text-to-speech synthesis, that can be made to sound almost like anybody from a speech sample of only 5 seconds(listen).^[41]

Sourcing images for AI training raises a question of privacy as people who are used for training didn't consent.^[42]

Digital sound-alikes technology found its way to the hands of criminals as in 2019Symantec researchers knew of 3 cases where technology has been used for crime.^[43]^[44]

This coupled with the fact that (as of 2016) techniques which allownear real-time counterfeiting offacial expressions in existing 2D video have been believably demonstrated increases the stress on the disinformation situation.^[15]

References

[edit]

^Physics-based muscle model for mouth shape control onIEEE Explore (requires membership)
^Realistic 3D facial animation in virtual space teleconferencing onIEEE Explore (requires membership)
^Berlin, Isabelle (14 September 2008)."Images de synthèse : palme de la longévité pour l'ombrage de Gouraud".Interstices (in French). Retrieved3 October 2024.
^"Images de synthèse : palme de la longévité pour l'ombrage de Gouraud". 14 September 2008.
^^a ^b ^cDebevec, Paul (2000)."Acquiring the reflectance field of a human face".Proceedings of the 27th annual conference on Computer graphics and interactive techniques - SIGGRAPH '00. ACM. pp. 145–156.doi:10.1145/344779.344855.ISBN 978-1581132083.S2CID 2860203. Retrieved24 May 2017.
^Pighin, Frédéric."Siggraph 2005 Digital Face Cloning Course Notes"(PDF). Retrieved24 May 2017.
^"St. Andrews Face Transformer".Futility Closet. 30 January 2005. Retrieved7 December 2020.
^^a ^bWest, Marc (4 December 2007)."Changing the face of science".Plus Magazine. Retrieved7 December 2020.
^Goddard, John (27 January 2010)."The many faces of race research".thestar.com. Retrieved7 December 2020.
^In this TED talk video at 00:04:59 you can seetwo clips, one with the real Emily shot with a real camera and one with a digital look-alike of Emily, shot with a simulation of a camera –Which is which is difficult to tell. Bruce Lawmen was scanned using USC light stage 6 in still position and also recorded running there on atreadmill. Many, many digital look-alikes of Bruce are seen running fluently and natural looking at the ending sequence of the TED talk video.
^ReForm – Hollywood's Creating Digital Clones (youtube). The Creators Project. 24 May 2017.
^Debevec, Paul."Digital Ira SIGGRAPH 2013 Real-Time Live". Archived fromthe original on 21 February 2015. Retrieved24 May 2017.
^"Scanning and printing a 3D portrait of President Barack Obama". University of Southern California. 2013. Archived fromthe original on 17 September 2015. Retrieved24 May 2017.
^Giardina, Carolyn (25 March 2015)."'Furious 7' and How Peter Jackson's Weta Created Digital Paul Walker".The Hollywood Reporter. Retrieved24 May 2017.
^^a ^bThies, Justus (2016)."Face2Face: Real-time Face Capture and Reenactment of RGB Videos". Proc. Computer Vision and Pattern Recognition (CVPR), IEEE. Retrieved24 May 2017.
^"Synthesizing Obama: Learning Lip Sync from Audio".grail.cs.washington.edu. Retrieved3 October 2024.
^Suwajanakorn, Supasorn; Seitz, Steven; Kemelmacher-Shlizerman, Ira (2017),Synthesizing Obama: Learning Lip Sync from Audio,University of Washington, retrieved2 March 2018
^Roettgers, Janko (21 February 2018)."Porn Producers Offer to Help Hollywood Take Down Deepfake Videos".Variety. Retrieved28 February 2018.
^Takahashi, Dean (21 March 2018)."Epic Games shows off amazing real-time digital human with Siren demo".VentureBeat. Retrieved10 September 2018.
^Kuo, Lily (9 November 2018)."World's first AI news anchor unveiled in China".TheGuardian.com. Retrieved9 November 2018.
^Hamilton, Isobel Asher (9 November 2018)."China created what it claims is the first AI news anchor — watch it in action here".Business Insider. Retrieved9 November 2018.
^Harwell, Drew (30 December 2018)."Fake-porn videos are being weaponized to harass and humiliate women: 'Everybody is a potential target'".The Washington Post. Retrieved14 March 2019.In September [of 2018], Google added "involuntary synthetic pornographic imagery" to its ban list
^"NVIDIA Open-Sources Hyper-Realistic Face Generator StyleGAN".Medium.com. 9 February 2019. Retrieved3 October 2019.
^^a ^bPaez, Danny (13 February 2019)."This Person Does Not Exist Is the Best One-Off Website of 2019".Inverse. Retrieved5 March 2018.
^"New state laws go into effect July 1". 24 June 2019.
^^a ^b"§ 18.2–386.2. Unlawful dissemination or sale of images of another; penalty".Virginia. Retrieved1 January 2020.
^"Relating to the creation of a criminal offense for fabricating a deceptive video with intent to influence the outcome of an election".Texas. 14 June 2019. Retrieved2 January 2020.In this section, "deep fake video" means a video, created with the intent to deceive, that appears to depict a real person performing an action that did not occur in reality
^Johnson, R.J. (30 December 2019)."Here Are the New California Laws Going Into Effect in 2020".KFI.iHeartMedia. Retrieved1 January 2020.
^Mihalcik, Carrie (4 October 2019)."California laws seek to crack down on deepfakes in politics and porn".cnet.com.CNET. Retrieved14 October 2019.
^"China seeks to root out fake news and deepfakes with new online content rules".Reuters.com.Reuters. 29 November 2019. Retrieved8 December 2019.
^Statt, Nick (29 November 2019)."China makes it a criminal offense to publish deepfakes or fake news without disclosure".The Verge. Retrieved8 December 2019.
^Synced (9 February 2019)."NVIDIA Open-Sources Hyper-Realistic Face Generator StyleGAN".Synced. Retrieved4 August 2020.
^StyleGAN public showcase website
^^a ^bPorter, Jon (20 September 2019)."100,000 free AI-generated headshots put stock photo companies on notice".The Verge. Retrieved7 August 2020.
^"What Is a Deepfake?".PCMAG.com. March 2020. Retrieved8 June 2020.
^Harwell, Drew."Dating apps need women. Advertisers need diversity. AI companies offer a solution: Fake people".Washington Post. Retrieved4 August 2020.
^"Neural Networks Need Data to Learn. Even If It's Fake".Quanta Magazine. 11 December 2023. Retrieved18 June 2023.
^^a ^bMurphy, Samantha (2023)."Scientific American: Your Avatar, Your Guide"(.pdf). Scientific American / Uni of Stanford. Retrieved11 December 2023.
^"WaveNet: A Generative Model for Raw Audio".Deepmind.com. 8 September 2016. Archived fromthe original on 27 May 2017. Retrieved24 May 2017.
^"Adobe Voco 'Photoshop-for-voice' causes concern".BBC.com.BBC. 7 November 2016. Retrieved5 July 2016.
^Jia, Ye; Zhang, Yu; Weiss, Ron J. (12 June 2018), "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis",Advances in Neural Information Processing Systems,31:4485–4495,arXiv:1806.04558,Bibcode:2018arXiv180604558J
^Rachel Metz (19 April 2019)."If your image is online, it might be training facial-recognition AI".CNN. Retrieved4 August 2020.
^"Fake voices 'help cyber-crooks steal cash'".bbc.com.BBC. 8 July 2019. Retrieved16 April 2020.
^Drew, Harwell (16 April 2020)."An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft".Washington Post. Retrieved8 September 2019.

Artificial intelligence (AI)

Concepts

Applications

Implementations

Audio–visual	AlexNet WaveNet Human image synthesis HWR OCR Computer vision Speech synthesis 15.ai ElevenLabs Speech recognition Whisper Facial recognition AlphaFold Text-to-image models Aurora DALL-E Firefly Flux Ideogram Imagen Midjourney Recraft Stable Diffusion Text-to-video models Dream Machine Runway Gen Hailuo AI Kling Sora Veo Music generation Riffusion Suno AI Udio
Text	Word2vec Seq2seq GloVe BERT T5 Llama Chinchilla AI PaLM GPT 1 2 3 J ChatGPT 4 4o o1 o3 4.5 4.1 o4-mini 5 5.1 Claude Gemini Gemini (language model) Gemma Grok LaMDA BLOOM DBRX Project Debater IBM Watson IBM Watsonx Granite PanGu-Σ DeepSeek Qwen
Decisional	AlphaGo AlphaZero OpenAI Five Self-driving car MuZero Action selection AutoGPT Robot control