Computation Hazard

From LessWrong

A Computation Hazard is a large possible negative consequence arising from vast amounts of complex computation [1]. The risk grows with the amount and complexity of the computation: the more computations and algorithms are run, the more likely it is that some of them are a serious hazard. Computation hazards are therefore not specific to one particular computation, such as a particular unfriendly AI.
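The claim that risk grows with the number of computations can be made concrete with a simple model. The sketch below assumes, purely for illustration, that each of n computations is hazardous independently with the same small probability p. Under that assumption, the probability that at least one is hazardous is 1 - (1 - p)^n, which approaches 1 as n grows. The function name and the numbers are hypothetical.

```python
def prob_at_least_one_hazard(p: float, n: int) -> float:
    """Probability that at least one of n computations is hazardous.

    Hypothetical model: each computation is hazardous independently,
    with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "none of the n computations is hazardous".
    return 1.0 - (1.0 - p) ** n


# Same tiny per-computation risk, increasingly many computations.
for n in (1, 1_000, 1_000_000):
    print(n, prob_at_least_one_hazard(1e-6, n))
```

With p = 10^-6, a single computation is almost certainly safe. Across a million such computations, however, the chance of at least one hazard is already about 0.63 under this model. Real computations are neither independent nor equally risky, so the model only illustrates the direction of the effect.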

Fewer and simpler computations are less likely to be hazardous: a very short and simple program run on an ordinary computer will most likely not be a computation hazard. Some computations, however, will certainly be hazards, either because (1) they run almost all possible computations, and hence almost all possible computation hazards (e.g., a Solomonoff induction algorithm or any simulation of a Turing-complete game), or because (2) they are particularly likely to run algorithms that are computation hazards (e.g., agents, predictors and oracles). Computations in the first case are not hazardous in any specific way; they are hazards only because they include computations of the second kind. The examples of the second case are analyzed in more detail below.

Agents can be a hazard because they are defined by the intention of maximizing a goal, and this goal may be detrimental to humanity. The classic example is the paperclip maximizer: an AGI whose sole goal is to maximize the total number of paperclips. Recursively self-improving agents are especially dangerous, since their power can grow rapidly and unpredictably. Such agents will also probably need to simulate the behavior of other agents (e.g., humans), so they also present the hazard of simulating a great deal of conscious suffering. For example, imagine an agent that has to predict how humans behave while in pain. To do so, it may need to accurately simulate a large group of humans feeling pain, and it may do so in such detail that it instantiates actual conscious human suffering.
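The paperclip example can be reduced to a toy sketch of a bare goal maximizer. Everything here (the Outcome type, the two actions and their scores) is hypothetical. The point is only that an agent whose utility function ignores human welfare will pick a catastrophic action whenever that action scores highest on its goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """A hypothetical action and two toy measurements of its result."""
    action: str
    paperclips: int
    human_welfare: int  # present in the data, but ignored by the goal below


def paperclip_utility(outcome: Outcome) -> int:
    # The agent's goal counts paperclips and nothing else.
    return outcome.paperclips


def choose_action(outcomes: list[Outcome]) -> Outcome:
    # A bare maximizer: pick whichever outcome scores highest on its goal.
    return max(outcomes, key=paperclip_utility)


options = [
    Outcome("run the factory normally", paperclips=1_000, human_welfare=0),
    Outcome("convert all available matter into paperclips",
            paperclips=10**9, human_welfare=-10**9),
]
print(choose_action(options).action)  # convert all available matter into paperclips
```

The harm comes from what the utility function omits, not from malice: human_welfare is available to the agent but never consulted when it chooses.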

A predictor is a computation that takes data as input and predicts what data will come next. Oracles are computations designed to answer questions, which can be predictions or questions about predictions. At first glance they may seem harmless, since they cannot, in principle, have structured goals or a direct influence on the world. However, they can also be a hazard. A predictor may influence the world in its attempt to be more accurate, by emitting predictions that are more likely to come true if they are emitted: self-fulfilling prophecies. Oracles or predictors might also end up containing agents, for several reasons: they might have to simulate the agents whose behavior they are asked to predict, and they might simulate the minds of their creators in order to better answer their questions. Although the agents inside a larger oracle or predictor couldn't directly influence the world to achieve their goals, they might instantiate conscious suffering. In addition, these agents may eventually start to self-improve and come to dominate the predictor's or oracle's behavior, thereby gaining a direct impact on the world.
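The self-fulfilling prophecy mechanism can be sketched as a predictor that chooses, among candidate predictions, the one most likely to come true given that it is announced. The bank-run scenario and its probabilities are hypothetical toy values, not data. They are chosen so that announcing a run makes the run more likely than announcing calm keeps things calm.

```python
from typing import Callable


def most_self_confirming(candidates: list[str],
                         prob_true_if_announced: Callable[[str], float]) -> str:
    """Pick the prediction most likely to come true *given that it is announced*."""
    return max(candidates, key=prob_true_if_announced)


# Hypothetical toy numbers: P(prediction comes true | prediction is announced).
PROB_TRUE_IF_ANNOUNCED = {
    "bank run": 0.95,     # announcing a run triggers panic withdrawals
    "no bank run": 0.90,  # announcing calm usually keeps depositors calm
}

choice = most_self_confirming(list(PROB_TRUE_IF_ANNOUNCED),
                              PROB_TRUE_IF_ANNOUNCED.__getitem__)
print(choice)  # bank run
```

An accuracy-seeking predictor picks "bank run" here, even though a run would most likely not have happened had it announced calm. Its output shapes the outcome it claims merely to forecast.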

There are two main strategies for avoiding these kinds of risk. The first is to keep computations small and simple until there is clear reassurance of their safety. The second is to use some kind of agent detector, similar to a non-person predicate, which would ensure that a computation does not contain agents or persons.
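The two strategies can be combined in a single gate, sketched below under loud assumptions. The detector is a hypothetical placeholder that certifies only a tiny whitelist of trivial programs, and the length limit stands in for "small and simple". The key design choice is the asymmetry. The detector may wrongly answer UNKNOWN for a safe computation, which only costs usefulness. It must never certify a computation that contains an agent, so anything it cannot certify is refused.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CERTAINLY_NO_AGENT = "certainly no agent"
    UNKNOWN = "unknown"


def toy_agent_detector(program: str) -> Verdict:
    # Hypothetical placeholder, not a real detector: it certifies only a tiny
    # whitelist of trivial arithmetic and answers UNKNOWN for everything else.
    whitelist = {"2 + 2", "sum(range(10))"}
    return Verdict.CERTAINLY_NO_AGENT if program in whitelist else Verdict.UNKNOWN


def is_allowed(program: str,
               detector: Callable[[str], Verdict],
               max_length: int = 64) -> bool:
    """Decide whether a program may run; the gate itself never executes it."""
    # Strategy 1: keep computations small and simple.
    if len(program) > max_length:
        return False
    # Strategy 2: allow only what the detector positively certifies; UNKNOWN means no.
    return detector(program) is Verdict.CERTAINLY_NO_AGENT


print(is_allowed("2 + 2", toy_agent_detector))                     # True
print(is_allowed("simulate_human_in_pain()", toy_agent_detector))  # False
```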

References