# User:Cousin it/AI cooperation in practice

## Problem 1: A program that proves statements about its own output

Consider a program that knows its own source code (by quining) and contains a strong enough proof checker. The program's algorithm is as follows: try all proofs of length up to nmax, looking for a proof that the program itself returns 1. If such a proof is found, return 1. Otherwise do the same for 2, then 3, then return 0.

The question is: what will the program return?

## Solution to problem 1

I believe that the program will in fact return 1. Because the proof is quite involved, I will first write out the program itself, using a JavaScript-like syntax, with large parts snipped.

   function eval(code)
{
// Runs code and returns whatever it evaluates to.
// Uses the definitions of other functions directly.
...
}

   function proves(code, maxsteps)
{
// Enumerates all proofs up to maxsteps, searching for a
// proof that code evaluates to true. Returns true if found.
// Uses the definitions of other functions known by quining.
...
}

   function outputs(x)
{
// Returns a string of code claiming that main returns x.
return "main()==" + x;
}

   function implies(bool1, bool2)
{
// Returns true if one boolean value implies another.
if (bool1)
return bool2;
else
return true;
}

   function main()
{
var nmax = ...; // A big number
if (proves(outputs(1), nmax))
return 1;
if (proves(outputs(2), nmax))
return 2;
if (proves(outputs(3), nmax))
return 3;
return 0;
}


Now for the actual proof. It will closely follow the proof of Lob's theorem, but add length limits where needed. Let n1, ... ,n6 be a suitably fast-growing sequence of numbers smaller than nmax. The number n1 should also be large enough to start with; the proof will show exactly how large. I will list twelve code snippets that are proved to return true by the program:

1. By the Diagonal Lemma, there exists a code string "box" such that the following will be proved:

  eval(box) == implies(
proves(box, n1),
eval(outputs(1)))


2. The program can inspect the source code of main() and notice that, if the first conditional happens to be true, the program will output 1:

  implies(
proves(outputs(1), n6),
eval(outputs(1)))


3. An obvious consequence of 1:

  implies(
eval(box),
implies(
proves(box, n1),
eval(outputs(1))))


4. If we can prove 3, we can prove that we can prove it:

  proves(
"implies(
eval(box),
implies(
proves(box, n1),
eval(outputs(1))))",
n2)


5. Pull "implies" outside of "proves", adding some length limits:

  implies(
proves(box, n1),
proves(
"implies(
proves(box, n1),
eval(outputs(1)))",
n3))


6. Push "proves" inside "implies", adding more limits:

  implies(
proves(box, n1),
implies(
proves("proves(box, n1)", n4),
proves(outputs(1), n5)))


7. If we know box has a short proof, that fact has a slightly longer proof:

  implies(
proves(box, n1),
proves("proves(box, n1)", n2))


8. Logic on 6 and 7, using the fact that n2 < n4:

  implies(
proves(box, n1),
proves(outputs(1), n5))


9. Logic on 8 and 2, using the fact that n5 < n6:

  implies(
proves(box, n1),
eval(outputs(1)))


10. By luck, 9 is exactly the definition of box from 1:

   eval(box)


11. If we can prove 10, we can prove that we can prove it. This is the crucial step that tells us how large n1 must be: large enough that 11 follows from 10.

   proves(box, n1)


12. Logic on 9 and 11:

   eval(outputs(1))


Now, If the proof checker can prove only true statements, this means the program will in fact output 1. QED.

## Problem 2: Two programs that prove statements about each other

Consider two programs, each of which knows its own source code via quining. The programs are playing a variant of the Prisoner's Dilemma: each receives as input the source code of the other, and outputs true for Cooperate or false for Defect. The algorithm of each program is as follows: try all proofs up to a certain length (the proof system and the max length depending on the program), looking for a proof that both programs output the same value (i.e. that I cooperate if and only if the other guy cooperates). If a proof is found, the program outputs true, otherwise false.

The question is: will the two programs cooperate?

## Solution to problem 2

Like in Problem 1, assume that the two programs A and B have parts named mainA and mainB, provesA and provesB, etc., accessible to both programs. Define a new function "proves" as follows:

  function proves(code, maxsteps)
{
return provesA(code, maxsteps) && provesB(code, maxsteps);
}


Now use the same solution as in Problem 1, except in the beginning:

1. By the Diagonal Lemma, construct a code string "box" so that the following will be proved by both programs:

  eval(box) == implies(
proves(box, n1),
(mainA() == mainB()))


2. Both programs can prove the following by inspecting the source of A and B (recall that "proves" implies provesA and provesB):

  implies(
proves("mainA() == mainB()", n6),
(mainA() == mainB()))


Steps 3 through 12 are the same as in the solution to Problem 1. They use properties of the "proves" function that hold if both proof systems are strong enough - they don't have to be equivalent. Finally we see both programs will prove that mainA() == mainB(), which implies that they will both return true. QED.