Book Chapter, DOI: 10.1007/978-3-662-54033-6_5
Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda
Nate Soares, Benya Fallenstein +1 more
- 01 Jan 2017
- pp 103-125
70
TL;DR: In this chapter, a host of technical problems that AI scientists could work on to ensure that the creation of smarter-than-human machine intelligence has a positive impact are discussed.
Abstract: In this chapter, we discuss a host of technical problems that we think AI scientists could work on to ensure that the creation of smarter-than-human machine intelligence has a positive impact. Although such systems may be decades away, it is prudent to begin research early: the technical challenges involved in safety and reliability work appear formidable and uniquely consequential. Our technical agenda discusses three broad categories of research where we think foundational research today could make it easier in the future to develop superintelligent systems that are reliably aligned with human interests:
1. Highly reliable agent designs: how to ensure that we build the right system.
2. Error tolerance: how to ensure that the inevitable flaws are manageable and correctable.
3. Value specification: how to ensure that the system is pursuing the right sorts of objectives.
Citations
Superintelligence: Paths, Dangers, Strategies
TL;DR: The first ultraintelligent machine is the last invention that man need ever make, provided that the machine i... as mentioned in this paper, 2014. Hardcover: 352 pages. Year: 2014. Publisher: Oxford University Press. ISBN-13: 9780199678112
790
The logic of decision
Franco Taroni, Alex Biedermann, Silvia Bozza, Paolo Garbolino, Colin Aitken +4 more
- 18 Jul 2014
449
•Posted Content
Scalable agent alignment via reward modeling: a research direction.
TL;DR: This work outlines a high-level research direction to solve the agent alignment problem centered around reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning.
303
Artificial intelligence and the limits of legal personality
TL;DR: In this article, the authors argue that although most legal systems could create a novel category of legal persons, such arguments are insufficient to show that they should, and they argue that such categories should be replaced by natural persons.
AGI Safety Literature Review
Tom Everitt, Gary Lea, Marcus Hutter +2 more
- 03 May 2018
TL;DR: In this paper, the authors provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety, and review the current public policy on AGI.
References
Causality: models, reasoning, and inference
TL;DR: The art and science of cause and effect have been studied in the social sciences for a long time as mentioned in this paper, see, e.g., the theory of inferred causation, causal diagrams and the identification of causal effects.
14.9K
A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955
TL;DR: The 1956 Dartmouth summer research project on artificial intelligence was initiated by this August 31, 1955 proposal, authored by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, along with the short autobiographical statements of the proposers.
•Book
Superintelligence: Paths, Dangers, Strategies
Nick Bostrom
- 03 Jul 2014
TL;DR: In this paper, Bostrom's work picks its way carefully through a vast tract of forbiddingly difficult intellectual terrain, and the writing is so lucid that it somehow makes it all seem easy.
1.5K
•Journal Article
Programming a computer for playing chess
TL;DR: This paper is concerned with the problem of constructing a computing routine or “program” for a modern general purpose computer which will enable it to play chess.
940