Terminal value

From LessWrong
Jump to navigation Jump to search
Wikipedia has an article about

A terminal value (also known as an intrinsic value) is an ultimate goal, an end-in-itself. The non-standard term "supergoal" is used for this concept in Eliezer Yudkowsky's earlier writings.

In an artificial general intelligence with a utility or reward function, the terminal value is the maximization of that function. The concept is not usefully applicable to all Als, and it is not known how applicable it is to organic entities.

Terminal vs. instrumental values

Terminal values stand in contrast to instrumental values (also known as extrinsic values), which are means-to-an-end, mere tools in achieving terminal values. For example, if a given university student studies merely as a professional qualification, his terminal value is getting a job, while getting good grades is an instrument to that end. If a (simple) chess program tries to maximize piece value three turns into the future, that is an instrumental value to its implicit terminal value of winning the game.

Some values may be called "terminal" merely in relation to an instrumental goal, yet themselves serve instrumentally towards a higher goal. However, in considering future artificial general intelligence, the phrase "terminal value" is generally used only for the top level of the goal hierarchy of the AGI itself: the true ultimate goals of the system; but excluding goals inside the AGI in service of other goals, and excluding the purpose of the AGI's makers, the goal for which they built the system.

Human terminal values

It is not known whether humans have terminal values that are clearly distinct from another set of instrumental values. Humans appear to adopt different values at different points in life. Nonetheless, if the theory of terminal values applies to humans', then their system of terminal values is quite complex. The values were forged by evolution in the ancestral environment to maximize inclusive genetic fitness. These values include survival, health, friendship, social status, love, joy, aesthetic pleasure, curiosity, and much more. Evolution's implicit goal is inclusive genetic fitness, but humans do not have inclusive genetic fitness as a goal. Rather, these values, which were instrumental to inclusive genetic fitness, have become humans' terminal values (an example of subgoal stomp).

Humans cannot fully introspect their terminal values. Humans' terminal values are often mutually contradictory, inconsistent, and changeable.

Non-human terminal values

Future artificial general intelligences may have the maximization of a utility function or of a reward function (reinforcement learning) as their terminal value. The function will likely be set by the AGI's designers.

Since people make tools instrumentally, to serve specific human values, the assigned value system of the artificial general intelligence may be much simpler than humans'. This will pose a danger, as an AI must seek to protect all human values if a positive human future is to be achieved. The paperclip maximizer is a thought experiment about an artificial general intelligence with consequences disastrous to humanity, with the the apparently innocuous terminal value of maximizing the number of paperclips in its collection,

An intelligence can work towards any terminal value, not just human-like ones. AIXI is a mathematical formalism for modeling intelligence. It illustrates that the arbitrariness of terminal values may be optimized by an intelligence: AIXI is provably more intelligent than any other agent for any computable reward function.

In a Friendly AI

For an artificial general intelligence to have a positive and not a negative effect on humanity, its terminal value must be benevolent to humans. It must seek the maximization of the full set of human values (for the humans' benefit, not for itself).


Eliezer Yudkowsky, Terminal Values and Instrumental Values