# Solomonoff induction

Ray Solomonoff defined an inference
system that will learn to correctly predict any computable
sequence with only the absolute minimum amount of data. This
system, in a certain sense, is the perfect universal prediction
algorithm. To summarize it very informally, **Solomonoff induction**
works by:

- Starting with all possible hypotheses (sequences), as represented by computer programs (that generate those sequences), weighted by their simplicity (2^{-n}, where **n** is the program length);
- Discarding those hypotheses that are inconsistent with the data.

By weighting hypotheses according to their simplicity, the system
automatically incorporates a form of Occam's razor, which is why it has been
playfully referred to as *Solomonoff's lightsaber*.
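The weight-and-discard procedure above can be sketched on a deliberately tiny hypothesis class. Everything here (the class of "programs" that repeat a fixed bit pattern forever, and the function names) is an assumption for illustration; Solomonoff induction proper ranges over all programs for a universal machine and is uncomputable.

```python
from fractions import Fraction

# Toy hypothesis class, assumed for illustration only: each "program" is a
# bit pattern that is repeated forever to generate a sequence.

def generates(program, data):
    """True if repeating `program` forever is consistent with `data`."""
    return all(data[i] == program[i % len(program)] for i in range(len(data)))

def predict_next(data, max_len=8):
    """Weight each pattern by 2^-length, discard those inconsistent with the
    observed bits, and return the posterior probability the next bit is 1."""
    weight_one = total = Fraction(0)
    for n in range(1, max_len + 1):
        for k in range(2 ** n):
            program = [(k >> i) & 1 for i in range(n)]
            if generates(program, data):
                w = Fraction(1, 2 ** n)
                total += w
                if program[len(data) % n] == 1:
                    weight_one += w
    return weight_one / total

print(predict_next([0, 1, 0, 1, 0, 1]))  # 1/23: the short alternating patterns dominate and predict 0
```

Note how the shortest surviving pattern (`01`, weight 1/4) dominates the posterior, which is the Occam's-razor effect in miniature.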

Solomonoff induction gets off the ground with a solution to the
"problem of the priors". Suppose that you stand before a
universal prefix Turing machine
[math]\displaystyle{ U }[/math]. You are interested in a certain finite output
string [math]\displaystyle{ y_{0} }[/math]. In particular, you want to know the
probability that [math]\displaystyle{ U }[/math] will produce the output
[math]\displaystyle{ y_{0} }[/math] given a random input tape. This probability is
the **Solomonoff a priori probability** of
[math]\displaystyle{ y_{0} }[/math].

More precisely, suppose that a particular infinite input string
[math]\displaystyle{ x_{0} }[/math] is about to be fed into
[math]\displaystyle{ U }[/math]. However, you know nothing about
[math]\displaystyle{ x_{0} }[/math] other than that each term of the string is either
[math]\displaystyle{ 0 }[/math] or [math]\displaystyle{ 1 }[/math]. As far as your
state of knowledge is concerned, the [math]\displaystyle{ i }[/math]th digit of
[math]\displaystyle{ x_{0} }[/math] is as likely to be [math]\displaystyle{ 0 }[/math] as it is to
be [math]\displaystyle{ 1 }[/math], for all [math]\displaystyle{ i = 1, 2, \ldots }[/math]. You
want to find the *a priori* probability [math]\displaystyle{ m(y_{0}) }[/math] of
the following proposition:

(*) If [math]\displaystyle{ U }[/math] takes in [math]\displaystyle{ x_{0} }[/math] as input, then [math]\displaystyle{ U }[/math] will produce output [math]\displaystyle{ y_{0} }[/math] and then halt.

Unfortunately, computing the exact value of [math]\displaystyle{ m(y_{0}) }[/math] would require solving the halting problem, which is undecidable. Nonetheless, it is easy to derive an expression for [math]\displaystyle{ m(y_{0}) }[/math]. If [math]\displaystyle{ U }[/math] halts on an infinite input string [math]\displaystyle{ x }[/math], then [math]\displaystyle{ U }[/math] must read only a finite initial segment of [math]\displaystyle{ x }[/math], after which [math]\displaystyle{ U }[/math] immediately halts. We call a finite string [math]\displaystyle{ p }[/math] a *self-delimiting program* if and only if there exists an infinite input string [math]\displaystyle{ x }[/math] beginning with [math]\displaystyle{ p }[/math] such that [math]\displaystyle{ U }[/math] halts on [math]\displaystyle{ x }[/math] immediately after reading to the end of [math]\displaystyle{ p }[/math]. The set [math]\displaystyle{ \mathcal{P} }[/math] of self-delimiting programs is the *prefix code* for [math]\displaystyle{ U }[/math]. It is the determination of the elements of [math]\displaystyle{ \mathcal{P} }[/math] that requires a solution to the halting problem.
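A useful consequence of prefix-freeness, worth noting before the sum below: since no self-delimiting program is a proper initial segment of another, Kraft's inequality guarantees that [math]\displaystyle{ \sum_{p \in \mathcal{P}} 2^{-\ell(p)} \le 1 }[/math], so the weights behave like probabilities. A quick numeric check on a small, hand-picked prefix code (the code itself is an assumption for illustration):

```python
from fractions import Fraction

# A small prefix-free code, hand-picked for illustration.
code = ["0", "10", "110", "111"]

# No codeword is a proper prefix of another, so any infinite bit string
# begins with at most one codeword.
assert not any(a != b and b.startswith(a) for a in code for b in code)

# Kraft's inequality: the 2^-length weights sum to at most 1.
kraft = sum(Fraction(1, 2 ** len(w)) for w in code)
print(kraft)  # 1/2 + 1/4 + 1/8 + 1/8 = 1
```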

Given [math]\displaystyle{ p \in \mathcal{P} }[/math], we write "[math]\displaystyle{ \operatorname{prog}(x_{0}) = p }[/math]" to express the proposition that [math]\displaystyle{ x_{0} }[/math] begins with [math]\displaystyle{ p }[/math], and we write "[math]\displaystyle{ U(p) = y_{0} }[/math]" to express the proposition that [math]\displaystyle{ U }[/math] produces output [math]\displaystyle{ y_{0} }[/math], and then halts, when fed any input beginning with [math]\displaystyle{ p }[/math]. Proposition (*) is then equivalent to the exclusive disjunction

- [math]\displaystyle{ \bigvee_{p \in \mathcal{P} \colon U(p) = y_{0}} (\operatorname{prog}(x_{0}) = p). }[/math]

Since [math]\displaystyle{ x_{0} }[/math] was chosen at random from [math]\displaystyle{ \{0, 1\}^{\omega} }[/math], we take the probability of [math]\displaystyle{ \operatorname{prog}(x_{0}) = p }[/math] to be [math]\displaystyle{ 2^{-\ell(p)} }[/math], where [math]\displaystyle{ \ell(p) }[/math] is the length of [math]\displaystyle{ p }[/math] as a bit string. Hence, the probability of (*) is

- [math]\displaystyle{ m(y_{0}) := \sum_{p \in \mathcal{P} \colon U(p) = y_{0}} 2^{-\ell(p)}. }[/math]
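While this sum cannot be evaluated exactly for a universal machine, it can be for a simple enough one. Here is a sketch under a toy prefix machine that is entirely an assumption for illustration: a program is *n* zeros, then a 1, then *n* literal output bits, so the program set is prefix-free by construction.

```python
from fractions import Fraction
from itertools import product

def run(bits):
    """Toy prefix machine, assumed for illustration: a program is n zeros,
    then a 1, then n literal output bits. Returns (output, bits consumed)
    if `bits` starts with a complete program, else None."""
    n = 0
    while n < len(bits) and bits[n] == 0:
        n += 1
    if n == len(bits) or len(bits) < 2 * n + 1:
        return None  # terminator or literal payload not yet read
    return bits[n + 1 : 2 * n + 1], 2 * n + 1

def m_lower_bound(y, max_len=12):
    """Sum 2^-l(p) over self-delimiting programs p with U(p) = y, up to
    length max_len: a computable lower bound on the a priori probability."""
    total = Fraction(0)
    for length in range(1, max_len + 1):
        for bits in product((0, 1), repeat=length):
            result = run(bits)
            if result is not None and result[1] == length and result[0] == y:
                total += Fraction(1, 2 ** length)
    return total

print(m_lower_bound((0, 1)))  # 1/32: the only program producing 01 is 00101
```

For a universal machine this enumeration can never be completed, which is exactly where the halting problem enters; bounded-length enumeration like this only ever yields lower bounds on [math]\displaystyle{ m(y_{0}) }[/math].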

## Blog posts

- An Intuitive Explanation of Solomonoff Induction by lukeprog and Alex_Altair

## References

- Algorithmic probability on Scholarpedia