# Solomonoff induction

Ray Solomonoff defined an inference system that will learn to correctly predict any computable sequence with only the absolute minimum amount of data. This system, in a certain sense, is the perfect universal prediction algorithm. To summarize it very informally, Solomonoff induction works by:

• Starting with all possible hypotheses (sequences), represented by computer programs that generate those sequences, weighted by their simplicity (${\displaystyle 2^{-n}}$, where ${\displaystyle n}$ is the program length);
• Discarding those hypotheses that are inconsistent with the data.
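The two steps above can be sketched in code. The toy "machine" below is an assumption made purely for illustration: a program is a bitstring, and it "generates" that bitstring repeated forever. A real universal machine cannot be enumerated this way, so this is a simplified stand-in, not Solomonoff induction proper.

```python
from itertools import product

# Toy "machine" (an illustrative assumption, not a universal machine):
# a program p, given as a tuple of bits, generates p repeated forever.
def generate(program, n):
    """Return the first n bits of the sequence produced by `program`."""
    return [program[i % len(program)] for i in range(n)]

def predict_next(observed, max_len=8):
    """Predict the next bit of `observed` by a weighted vote over all
    programs of up to max_len bits that are consistent with the data.
    Each surviving program contributes weight 2**-len(p); programs
    inconsistent with the observed prefix are discarded."""
    weights = {0: 0.0, 1: 0.0}
    for length in range(1, max_len + 1):
        for program in product((0, 1), repeat=length):
            prefix = generate(program, len(observed) + 1)
            if prefix[:len(observed)] == list(observed):
                weights[prefix[-1]] += 2.0 ** -length
    return max(weights, key=weights.get)

print(predict_next([0, 1, 0, 1, 0]))  # the short program (0, 1) dominates, so 1
```

Note how the simplicity weighting does the work: the two-bit program `(0, 1)` fits the data and outweighs every longer consistent program combined, which is Occam's razor in miniature.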

Because it weights hypotheses by simplicity, the system automatically incorporates a form of Occam's razor, which is why it has been playfully referred to as Solomonoff's lightsaber.

Solomonoff induction gets off the ground with a solution to the "problem of the priors". Suppose that you stand before a universal prefix Turing machine ${\displaystyle U}$. You are interested in a certain finite output string ${\displaystyle y_{0}}$. In particular, you want to know the probability that ${\displaystyle U}$ will produce the output ${\displaystyle y_{0}}$ given a random input tape. This probability is the Solomonoff a priori probability of ${\displaystyle y_{0}}$.

More precisely, suppose that a particular infinite input string ${\displaystyle x_{0}}$ is about to be fed into ${\displaystyle U}$. However, you know nothing about ${\displaystyle x_{0}}$ other than that each term of the string is either ${\displaystyle 0}$ or ${\displaystyle 1}$. As far as your state of knowledge is concerned, the ${\displaystyle i}$th digit of ${\displaystyle x_{0}}$ is as likely to be ${\displaystyle 0}$ as it is to be ${\displaystyle 1}$, for all ${\displaystyle i=1,2,\ldots }$. You want to find the a priori probability ${\displaystyle m(y_{0})}$ of the following proposition:

(*) If ${\displaystyle U}$ takes in ${\displaystyle x_{0}}$ as input, then ${\displaystyle U}$ will produce output ${\displaystyle y_{0}}$ and then halt.

Unfortunately, computing the exact value of ${\displaystyle m(y_{0})}$ would require solving the halting problem, which is undecidable. Nonetheless, it is easy to derive an expression for ${\displaystyle m(y_{0})}$. If ${\displaystyle U}$ halts on an infinite input string ${\displaystyle x}$, then ${\displaystyle U}$ must read only a finite initial segment of ${\displaystyle x}$, after which ${\displaystyle U}$ immediately halts. We call a finite string ${\displaystyle p}$ a self-delimiting program if and only if there exists an infinite input string ${\displaystyle x}$ beginning with ${\displaystyle p}$ such that ${\displaystyle U}$ halts on ${\displaystyle x}$ immediately after reading to the end of ${\displaystyle p}$. The set ${\displaystyle {\mathcal {P}}}$ of self-delimiting programs is the prefix code for ${\displaystyle U}$. It is the determination of the elements of ${\displaystyle {\mathcal {P}}}$ that requires a solution to the halting problem.
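The defining property of ${\displaystyle {\mathcal {P}}}$ — that it is a prefix code — can be checked mechanically for any candidate set of programs. The small set below is a hypothetical example, not the prefix code of any actual machine; the sketch verifies that no program is a proper prefix of another, and computes the Kraft sum ${\displaystyle \sum _{p}2^{-\ell (p)}}$, which for a prefix-free set is at most ${\displaystyle 1}$.

```python
# A hypothetical prefix code (assumed for illustration): no program is a
# proper prefix of another, so a machine reading the input tape can tell
# exactly where each program ends.
programs = ["0", "10", "110", "1110"]

def is_prefix_free(codes):
    """True iff no code in `codes` is a proper prefix of another."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

# Kraft's inequality: a prefix-free set of bitstrings satisfies
# sum over p of 2**-len(p) <= 1.
kraft_sum = sum(2.0 ** -len(p) for p in programs)
print(is_prefix_free(programs), kraft_sum)  # True 0.9375
```

The Kraft inequality is what keeps the a priori probabilities coherent: since the weights of all self-delimiting programs sum to at most ${\displaystyle 1}$, the values ${\displaystyle m(y)}$ over all outputs ${\displaystyle y}$ form a (sub)probability distribution.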

Given ${\displaystyle p\in {\mathcal {P}}}$, we write "${\displaystyle \operatorname {prog} (x_{0})=p}$" to express the proposition that ${\displaystyle x_{0}}$ begins with ${\displaystyle p}$, and we write "${\displaystyle U(p)=y_{0}}$" to express the proposition that ${\displaystyle U}$ produces output ${\displaystyle y_{0}}$, and then halts, when fed any input beginning with ${\displaystyle p}$. Proposition (*) is then equivalent to the exclusive disjunction

${\displaystyle \bigvee _{p\in {\mathcal {P}}\colon U(p)=y_{0}}(\operatorname {prog} (x_{0})=p).}$

Since ${\displaystyle x_{0}}$ was chosen at random from ${\displaystyle \{0,1\}^{\omega }}$, we take the probability of ${\displaystyle \operatorname {prog} (x_{0})=p}$ to be ${\displaystyle 2^{-\ell (p)}}$, where ${\displaystyle \ell (p)}$ is the length of ${\displaystyle p}$ as a bit string. Hence, the probability of (*) is

${\displaystyle m(y_{0}):=\sum _{p\in {\mathcal {P}}\colon U(p)=y_{0}}2^{-\ell (p)}.}$
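Given a table of self-delimiting programs and their outputs, this sum is straightforward to evaluate. The table below is purely hypothetical — for a real universal machine, determining which programs halt with which outputs is exactly the uncomputable part — but it shows how ${\displaystyle m(y_{0})}$ aggregates the weight of every program producing ${\displaystyle y_{0}}$.

```python
# Hypothetical prefix-free programs and the outputs they produce
# (an illustrative assumption; the true table is uncomputable).
outputs = {
    "0": "a",
    "10": "b",
    "110": "a",
    "1110": "c",
}

def m(y0):
    """Solomonoff a priori probability of y0 under the toy table:
    the sum of 2**-len(p) over programs p with output y0."""
    return sum(2.0 ** -len(p) for p, y in outputs.items() if y == y0)

print(m("a"))  # 2**-1 + 2**-3 = 0.625
```

Note that shorter programs dominate the sum: the one-bit program `"0"` alone contributes ${\displaystyle 2^{-1}}$ of the ${\displaystyle 0.625}$ total for `"a"`, which is the sense in which simpler explanations carry more prior weight.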