Friendly artificial intelligence

From Wikipedia, the free encyclopedia

AI to benefit humanity

Friendly artificial intelligence (friendly AI or FAI) is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests such as fostering the improvement of the human species. It is a part of the ethics of artificial intelligence and is closely related to machine ethics. While machine ethics is concerned with how an artificially intelligent agent should behave, friendly artificial intelligence research is focused on how to practically bring about this behavior and ensure that it is adequately constrained.

Etymology and usage

Eliezer Yudkowsky, AI researcher and creator of the term

The term was coined by Eliezer Yudkowsky,^[1] who is best known for popularizing the idea,^[2]^[3] to discuss superintelligent artificial agents that reliably implement human values. Stuart J. Russell and Peter Norvig's leading artificial intelligence textbook, Artificial Intelligence: A Modern Approach, describes the idea:^[2]

Yudkowsky (2008) goes into more detail about how to design a Friendly AI. He asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design—to define a mechanism for evolving AI systems under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.

"Friendly" is used in this context as technical terminology, and picks out agents that are safe and useful, not necessarily ones that are "friendly" in the colloquial sense. The concept is primarily invoked in the context of discussions of recursively self-improving artificial agents that rapidly explode in intelligence, on the grounds that this hypothetical technology would have a large, rapid, and difficult-to-control impact on human society.^[4]

Risks of unfriendly AI

Main article: Existential risk from artificial general intelligence

The roots of concern about artificial intelligence are very old. Kevin LaGrandeur showed that the dangers specific to AI can be seen in ancient literature concerning artificial humanoid servants such as the golem, or the proto-robots of Gerbert of Aurillac and Roger Bacon. In those stories, the extreme intelligence and power of these humanoid creations clash with their status as slaves (which by nature are seen as sub-human), and cause disastrous conflict.^[5] By 1942 these themes prompted Isaac Asimov to create the "Three Laws of Robotics": principles hard-wired into all the robots in his fiction, intended to prevent them from turning on their creators or allowing their creators to come to harm.^[6]

In modern times, as the prospect of superintelligent AI looms nearer, philosopher Nick Bostrom has said that superintelligent AI systems with goals that are not aligned with human ethics are intrinsically dangerous unless extreme measures are taken to ensure the safety of humanity. He put it this way:

Basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has. Therefore, it is extremely important that the goals we endow it with, and its entire motivation system, is 'human friendly.'

In 2008, Eliezer Yudkowsky called for the creation of "friendly AI" to mitigate existential risk from advanced artificial intelligence. He explains: "The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."^[7]

Steve Omohundro says that a sufficiently advanced AI system will, unless explicitly counteracted, exhibit a number of basic "drives", such as resource acquisition, self-preservation, and continuous self-improvement, because of the intrinsic nature of any goal-driven system, and that these drives will, "without special precautions", cause the AI to exhibit undesired behavior.^[8]^[9]

Alexander Wissner-Gross says that AIs driven to maximize their future freedom of action (or causal path entropy) might be considered friendly if their planning horizon is longer than a certain threshold, and unfriendly if their planning horizon is shorter than that threshold.^[10]^[11]
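
The role of the planning horizon can be illustrated with a toy model. The sketch below is not the causal entropic force formulation of the cited paper; it uses a much cruder stand-in, the Shannon entropy of an agent's end-state distribution under uniformly random actions, as a proxy for "future freedom of action". The corridor world and the names `step`, `end_state_entropy`, and `choose_action` are hypothetical and introduced only for illustration.

```python
import itertools
import math
from collections import Counter

ACTIONS = (-1, 0, 1)  # move left, stay, move right (assumed toy action set)


def step(pos, action, size):
    """Move within a corridor of `size` cells; walls clip the move."""
    return min(max(pos + action, 0), size - 1)


def end_state_entropy(pos, horizon, size):
    """Entropy (in nats) of the final position after `horizon` uniformly
    random actions. A crude proxy for causal path entropy."""
    counts = Counter()
    for actions in itertools.product(ACTIONS, repeat=horizon):
        p = pos
        for a in actions:
            p = step(p, a, size)
        counts[p] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def choose_action(pos, horizon, size):
    """Pick the move whose successor state keeps the most options open
    over the next `horizon` steps (ties go to the earlier action)."""
    return max(
        ACTIONS,
        key=lambda a: end_state_entropy(step(pos, a, size), horizon, size),
    )


if __name__ == "__main__":
    # Near the left wall, a longer horizon pushes the agent toward open space.
    print(choose_action(1, horizon=3, size=7))
```

In this toy setting, an agent next to a wall that looks three steps ahead moves away from the wall, because the wall reduces the spread of states it can still reach. The sketch says nothing about where a friendly or unfriendly threshold would lie; it only shows how the chosen action can depend on the horizon.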

Luke Muehlhauser, writing for the Machine Intelligence Research Institute, recommends that machine ethics researchers adopt what Bruce Schneier has called the "security mindset": Rather than thinking about how a system will work, imagine how it could fail. For instance, he suggests even an AI that only makes accurate predictions and communicates via a text interface might cause unintended harm.^[12]

In 2014, Luke Muehlhauser and Nick Bostrom underlined the need for 'friendly AI';^[13] nonetheless, the difficulties in designing a 'friendly' superintelligence, for instance via programming counterfactual moral thinking, are considerable.^[14]^[15]

Coherent extrapolated volition

Main article: Coherent extrapolated volition

Yudkowsky advances the Coherent Extrapolated Volition (CEV) model. According to him, our coherent extrapolated volition is "our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted".^[16]

Rather than a Friendly AI being designed directly by human programmers, it is to be designed by a "seed AI" programmed to first study human nature and then produce the AI that humanity would want, given sufficient time and insight, to arrive at a satisfactory answer.^[16] The appeal to an objective through contingent human nature (perhaps expressed, for mathematical purposes, in the form of a utility function or other decision-theoretic formalism), as providing the ultimate criterion of "Friendliness", is an answer to the meta-ethical problem of defining an objective morality; extrapolated volition is intended to be what humanity objectively would want, all things considered, but it can only be defined relative to the psychological and cognitive qualities of present-day, unextrapolated humanity.

Other approaches

Steve Omohundro has proposed a "scaffolding" approach to AI safety, in which one provably safe AI generation helps build the next provably safe generation.^[17]

Seth Baum argues that the development of safe, socially beneficial artificial intelligence or artificial general intelligence is a function of the social psychology of AI research communities and so can be constrained by extrinsic measures and motivated by intrinsic measures. Intrinsic motivations can be strengthened when messages resonate with AI developers; Baum argues that, in contrast, "existing messages about beneficial AI are not always framed well". Baum advocates for "cooperative relationships, and positive framing of AI researchers" and cautions against characterizing AI researchers as "not want(ing) to pursue beneficial designs".^[18]

In his book Human Compatible, AI researcher Stuart J. Russell lists three principles to guide the development of beneficial machines. He emphasizes that these principles are not meant to be explicitly coded into the machines; rather, they are intended for the human developers. The principles are as follows:^[19]^: 173

The machine's only objective is to maximize the realization of human preferences.
The machine is initially uncertain about what those preferences are.
The ultimate source of information about human preferences is human behavior.

The "preferences" Russell refers to "are all-encompassing; they cover everything you might care about, arbitrarily far into the future."^[19]^: 173 Similarly, "behavior" includes any choice between options,^[19]^: 177 and the uncertainty is such that some probability, which may be quite small, must be assigned to every logically possible human preference.^[19]^: 201
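
As an illustration only, the second and third principles can be read as a Bayesian updating problem: the machine starts out uncertain between candidate preferences and revises its belief from observed choices. The sketch below is not taken from Human Compatible. The three candidate hypotheses, the softmax choice model, and the `rationality` parameter are all assumptions made for this example, and a real system would have to spread probability over a far larger space of possible preferences.

```python
import math

# Hypothetical candidate preferences: each maps an option to a utility.
HYPOTHESES = {
    "likes_tea": {"tea": 1.0, "coffee": 0.0},
    "likes_coffee": {"tea": 0.0, "coffee": 1.0},
    "indifferent": {"tea": 0.5, "coffee": 0.5},
}


def uniform_prior():
    """Initial uncertainty: every hypothesis gets the same probability."""
    n = len(HYPOTHESES)
    return {name: 1.0 / n for name in HYPOTHESES}


def choice_likelihood(utilities, chosen, options, rationality=3.0):
    """Probability of picking `chosen` under an assumed softmax choice model:
    higher-utility options are more likely, but mistakes remain possible."""
    weights = {o: math.exp(rationality * utilities[o]) for o in options}
    return weights[chosen] / sum(weights.values())


def update(belief, chosen, options):
    """One Bayesian update of the belief after observing a single choice."""
    unnormalized = {
        name: p * choice_likelihood(HYPOTHESES[name], chosen, options)
        for name, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {name: p / total for name, p in unnormalized.items()}


def expected_utility(belief, option):
    """Utility of an option averaged over the current belief."""
    return sum(p * HYPOTHESES[name][option] for name, p in belief.items())


belief = uniform_prior()
for observed in ["tea", "tea", "coffee", "tea"]:  # observed human behavior
    belief = update(belief, observed, ["tea", "coffee"])

if __name__ == "__main__":
    for name, p in belief.items():
        print(f"{name}: {p:.3f}")
```

Because the softmax likelihood is never zero, no hypothesis is ever ruled out completely, which loosely mirrors the requirement that every logically possible preference keep some probability.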

Public policy

James Barrat, author of Our Final Invention, suggested that "a public-private partnership has to be created to bring A.I.-makers together to share ideas about security—something like the International Atomic Energy Agency, but in partnership with corporations." He urges AI researchers to convene a meeting similar to the Asilomar Conference on Recombinant DNA, which discussed risks of biotechnology.^[17]

John McGinnis encourages governments to accelerate friendly AI research. Because the goalposts of friendly AI are not necessarily eminent, he suggests a model similar to the National Institutes of Health, where "Peer review panels of computer and cognitive scientists would sift through projects and choose those that are designed both to advance AI and assure that such advances would be accompanied by appropriate safeguards." McGinnis feels that peer review is better "than regulation to address technical issues that are not possible to capture through bureaucratic mandates". McGinnis notes that his proposal stands in contrast to that of the Machine Intelligence Research Institute, which generally aims to avoid government involvement in friendly AI.^[20]

Criticism

Some critics believe that both human-level AI and superintelligence are unlikely and that, therefore, friendly AI is unlikely. Writing in The Guardian, Alan Winfield compares human-level artificial intelligence with faster-than-light travel in terms of difficulty and states that while we need to be "cautious and prepared" given the stakes involved, we "don't need to be obsessing" about the risks of superintelligence.^[21] Boyles and Joaquin, on the other hand, argue that Luke Muehlhauser and Nick Bostrom's proposal to create friendly AIs appears to be bleak. This is because Muehlhauser and Bostrom seem to hold the idea that intelligent machines could be programmed to think counterfactually about the moral values that human beings would have had.^[13] In an article in AI & Society, Boyles and Joaquin maintain that such AIs would not be that friendly considering the following: the infinite amount of antecedent counterfactual conditions that would have to be programmed into a machine; the difficulty of cashing out the set of moral values, that is, those that are more ideal than the ones human beings possess at present; and the apparent disconnect between counterfactual antecedents and ideal value consequent.^[14]

Some philosophers claim that any truly "rational" agent, whether artificial or human, will naturally be benevolent; in this view, deliberate safeguards designed to produce a friendly AI could be unnecessary or even harmful.^[22] Other critics question whether artificial intelligence can be friendly. Adam Keiper and Ari N. Schulman, editors of the technology journal The New Atlantis, say that it will be impossible ever to guarantee "friendly" behavior in AIs because problems of ethical complexity will not yield to software advances or increases in computing power. They write that the criteria upon which friendly AI theories are based work "only when one has not only great powers of prediction about the likelihood of myriad possible outcomes but certainty and consensus on how one values the different outcomes."^[23]

The inner workings of advanced AI systems may be complex and difficult to interpret, leading to concerns about transparency and accountability.^[24]

References

1. Tegmark, Max (2014). "Life, Our Universe and Everything". Our Mathematical Universe: My Quest for the Ultimate Nature of Reality (First ed.). Knopf Doubleday Publishing. ISBN 978-0-307-74425-8. Its owner may cede control to what Eliezer Yudkowsky terms a "Friendly AI,"...
2. Russell, Stuart; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach. Prentice Hall. ISBN 978-0-13-604259-4.
3. Leighton, Jonathan (2011). The Battle for Compassion: Ethics in an Apathetic Universe. Algora. ISBN 978-0-87586-870-7.
4. Wallach, Wendell; Allen, Colin (2009). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press, Inc. ISBN 978-0-19-537404-9.
5. Kevin LaGrandeur (2011). "The Persistent Peril of the Artificial Slave". Science Fiction Studies. 38 (2): 232. doi:10.5621/sciefictstud.38.2.0232. Archived from the original on January 13, 2023. Retrieved May 6, 2013.
6. Isaac Asimov (1964). "Introduction". The Rest of the Robots. Doubleday. ISBN 0-385-09041-2.
7. Eliezer Yudkowsky (2008). "Artificial Intelligence as a Positive and Negative Factor in Global Risk" (PDF). In Nick Bostrom; Milan M. Ćirković (eds.). Global Catastrophic Risks. pp. 308–345. Archived (PDF) from the original on October 19, 2013. Retrieved October 19, 2013.
8. Omohundro, S. M. (February 2008). "The basic AI drives". Artificial General Intelligence. 171: 483–492. CiteSeerX 10.1.1.393.8356.
9. Bostrom, Nick (2014). "Chapter 7: The Superintelligent Will". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 978-0-19-967811-2.
10. Dvorsky, George (April 26, 2013). "How Skynet Might Emerge From Simple Physics". Gizmodo. Archived from the original on October 8, 2021. Retrieved December 23, 2021.
11. Wissner-Gross, A. D.; Freer, C. E. (2013). "Causal entropic forces". Physical Review Letters. 110 (16) 168702. Bibcode:2013PhRvL.110p8702W. doi:10.1103/PhysRevLett.110.168702. hdl:1721.1/79750. PMID 23679649.
12. Muehlhauser, Luke (July 31, 2013). "AI Risk and the Security Mindset". Machine Intelligence Research Institute. Archived from the original on July 19, 2014. Retrieved July 15, 2014.
13. Muehlhauser, Luke; Bostrom, Nick (December 17, 2013). "Why We Need Friendly AI". Think. 13 (36): 41–47. doi:10.1017/s1477175613000316. ISSN 1477-1756. S2CID 143657841.
14. Boyles, Robert James M.; Joaquin, Jeremiah Joven (July 23, 2019). "Why friendly AIs won't be that friendly: a friendly reply to Muehlhauser and Bostrom". AI & Society. 35 (2): 505–507. doi:10.1007/s00146-019-00903-0. ISSN 0951-5666. S2CID 198190745.
15. Chan, Berman (March 4, 2020). "The rise of artificial intelligence and the crisis of moral passivity". AI & Society. 35 (4): 991–993. doi:10.1007/s00146-020-00953-9. ISSN 1435-5655. S2CID 212407078. Archived from the original on February 10, 2023. Retrieved January 21, 2023.
16. Eliezer Yudkowsky (2004). "Coherent Extrapolated Volition" (PDF). Singularity Institute for Artificial Intelligence. Archived (PDF) from the original on September 30, 2015. Retrieved September 12, 2015.
17. Hendry, Erica R. (January 21, 2014). "What Happens When Artificial Intelligence Turns On Us?". Smithsonian Magazine. Archived from the original on July 19, 2014. Retrieved July 15, 2014.
18. Baum, Seth D. (September 28, 2016). "On the promotion of safe and socially beneficial artificial intelligence". AI & Society. 32 (4): 543–551. doi:10.1007/s00146-016-0677-0. ISSN 0951-5666. S2CID 29012168.
19. Russell, Stuart (October 8, 2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-525-55861-3. OCLC 1083694322.
20. McGinnis, John O. (Summer 2010). "Accelerating AI". Northwestern University Law Review. 104 (3): 1253–1270. Archived from the original on December 1, 2014. Retrieved July 16, 2014.
21. Winfield, Alan (August 9, 2014). "Artificial intelligence will not turn into a Frankenstein's monster". The Guardian. Archived from the original on September 17, 2014. Retrieved September 17, 2014.
22. Kornai, András (May 15, 2014). "Bounding the impact of AGI". Journal of Experimental & Theoretical Artificial Intelligence. 26 (3). Informa UK Limited: 417–438. doi:10.1080/0952813x.2014.895109. ISSN 0952-813X. S2CID 7067517. ...the essence of AGIs is their reasoning facilities, and it is the very logic of their being that will compel them to behave in a moral fashion... The real nightmare scenario (is one where) humans find it advantageous to strongly couple themselves to AGIs, with no guarantees against self-deception.
23. Keiper, Adam; Schulman, Ari N. (Summer 2011). "The Problem with 'Friendly' Artificial Intelligence". The New Atlantis. No. 32. pp. 80–89. Archived from the original on January 15, 2012. Retrieved January 16, 2012.
24. Norvig, Peter; Russell, Stuart (2010). Artificial Intelligence: A Modern Approach (3rd ed.). Pearson. ISBN 978-0-13-604259-4.

External links

Ethical Issues in Advanced Artificial Intelligence by Nick Bostrom
What is Friendly AI? — A brief description of Friendly AI by the Machine Intelligence Research Institute.
Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures — A near book-length description from the MIRI
Critique of the MIRI Guidelines on Friendly AI — by Bill Hibbard
Commentary on MIRI's Guidelines on Friendly AI — by Peter Voss.
The Problem with 'Friendly' Artificial Intelligence — On the motives for and impossibility of FAI; by Adam Keiper and Ari N. Schulman.
