# User:Cousin it/AI cooperation in practice

## Problem 1: A program that proves statements about its own output

Consider a program that knows its own source code (by quining) and contains a strong enough proof checker. The program's algorithm is as follows: try all proofs of length up to nmax, looking for a proof that the program itself returns 1. If such a proof is found, return 1. Otherwise do the same for 2, then 3, then return 0.

The question is: what will the program return?

## Solution to problem 1

I believe that the program will in fact return 1. Because the proof is quite involved, I will first write out the program itself, using a JavaScript-like syntax, with large parts snipped.

function eval(code) { // Runs code and returns whatever it evaluates to. // Uses the definitions of other functions directly. ... }

function proves(code, maxsteps) { // Enumerates all proofs up to maxsteps, searching for a // proof that code evaluates to true. Returns true if found. // Uses the definitions of other functions known by quining. ... }

function outputs(x) { // Returns a string of code claiming that main returns x. return "main()==" + x; }

function implies(bool1, bool2) { // Returns true if one boolean value implies another. if (bool1) return bool2; else return true; }

function main() { var nmax = ...; // A big number if (proves(outputs(1), nmax)) return 1; if (proves(outputs(2), nmax)) return 2; if (proves(outputs(3), nmax)) return 3; return 0; }

Now for the actual proof. It will closely follow the proof of Lob's theorem, but add length limits where needed. Let n1, ... ,n6 be a suitably fast-growing sequence of numbers smaller than nmax. The number n1 should also be large enough to start with; the proof will show exactly how large. I will list twelve code snippets that are proved to return true by the program:

1. By the Diagonal Lemma, there exists a code string "box" such that the following will be proved:

eval(box) == implies( proves(box, n1), eval(outputs(1)))

2. The program can inspect the source code of main() and notice that, if the first conditional happens to be true, the program will output 1:

implies( proves(outputs(1), n6), eval(outputs(1)))

3. An obvious consequence of 1:

implies( eval(box), implies( proves(box, n1), eval(outputs(1))))

4. If we can prove 3, we can prove that we can prove it:

proves( "implies( eval(box), implies( proves(box, n1), eval(outputs(1))))", n2)

5. Pull "implies" outside of "proves", adding some length limits:

implies( proves(box, n1), proves( "implies( proves(box, n1), eval(outputs(1)))", n3))

6. Push "proves" inside "implies", adding more limits:

implies( proves(box, n1), implies( proves("proves(box, n1)", n4), proves(outputs(1), n5)))

7. If we know box has a short proof, that fact has a slightly longer proof:

implies( proves(box, n1), proves("proves(box, n1)", n2))

8. Logic on 6 and 7, using the fact that n2 < n4:

implies( proves(box, n1), proves(outputs(1), n5))

9. Logic on 8 and 2, using the fact that n5 < n6:

implies( proves(box, n1), eval(outputs(1)))

10. By luck, 9 is exactly the definition of box from 1:

eval(box)

11. If we can prove 10, we can prove that we can prove it. This is the crucial step that tells us how large n1 must be: large enough that 11 follows from 10.

proves(box, n1)

12. Logic on 9 and 11:

eval(outputs(1))

Now, If the proof checker can prove only true statements, this means the program will in fact output 1. QED.

## Problem 2: Two programs that prove statements about each other

Consider two programs, each of which knows its own source code via quining. The programs are playing a variant of the Prisoner's Dilemma: each receives as input the source code of the other, and outputs true for Cooperate or false for Defect. The algorithm of each program is as follows: try all proofs up to a certain length (the proof system and the max length depending on the program), looking for a proof that both programs output the same value (i.e. that I cooperate if and only if the other guy cooperates). If a proof is found, the program outputs true, otherwise false.

The question is: will the two programs cooperate?

## Solution to problem 2

Like in Problem 1, assume that the two programs A and B have parts named mainA and mainB, provesA and provesB, etc., accessible to both programs. Define a new function "proves" as follows:

function proves(code, maxsteps) { return provesA(code, maxsteps) && provesB(code, maxsteps); }

Now use the same solution as in Problem 1, except in the beginning:

1. By the Diagonal Lemma, construct a code string "box" so that the following will be proved by both programs:

eval(box) == implies( proves(box, n1), (mainA() == mainB()))

2. Both programs can prove the following by inspecting the source of A and B (recall that "proves" implies provesA and provesB):

implies( proves("mainA() == mainB()", n6), (mainA() == mainB()))

Steps 3 through 12 are the same as in the solution to Problem 1. They use properties of the "proves" function that hold if both proof systems are strong enough - they don't have to be equivalent. Finally we see both programs will prove that mainA() == mainB(), which implies that they will both return true. QED.