User:Cousin it/AI cooperation in practice
Problem 1: A program that proves statements about its own output
Consider a program that knows its own source code (by quining) and contains a strong enough proof checker. The program's algorithm is as follows: try all proofs of length up to nmax, looking for a proof that the program itself returns 1. If such a proof is found, return 1. Otherwise do the same for 2, then 3, then return 0.
The question is: what will the program return?
Solution to problem 1
I believe that the program will in fact return 1. Because the proof is quite involved, I will first write out the program itself, using a JavaScript-like syntax, with large parts snipped.
function eval(code) { // Runs code and returns whatever it evaluates to. // Uses the definitions of other functions directly. ... }
function proves(code, maxsteps) { // Enumerates all proofs up to maxsteps, searching for a // proof that code evaluates to true. Returns true if found. // Uses the definitions of other functions known by quining. ... }
function outputs(x) { // Returns a string of code claiming that main returns x. return "main()==" + x; }
function implies(bool1, bool2) { // Returns true if one boolean value implies another. if (bool1) return bool2; else return true; }
function main() { var nmax = ...; // A big number if (proves(outputs(1), nmax)) return 1; if (proves(outputs(2), nmax)) return 2; if (proves(outputs(3), nmax)) return 3; return 0; }
Now for the actual proof. It will closely follow the proof of Lob's theorem, but add length limits where needed. Let n1, ... ,n6 be a suitably fast-growing sequence of numbers smaller than nmax. The number n1 should also be large enough to start with; the proof will show exactly how large. I will list twelve code snippets that are proved to return true by the program:
1. By the Diagonal Lemma, there exists a code string "box" such that the following will be proved:
eval(box) == implies( proves(box, n1), eval(outputs(1)))
2. The program can inspect the source code of main() and notice that, if the first conditional happens to be true, the program will output 1:
implies( proves(outputs(1), n6), eval(outputs(1)))
3. An obvious consequence of 1:
implies( eval(box), implies( proves(box, n1), eval(outputs(1))))
4. If we can prove 3, we can prove that we can prove it:
proves( "implies( eval(box), implies( proves(box, n1), eval(outputs(1))))", n2)
5. Pull "implies" outside of "proves", adding some length limits:
implies( proves(box, n1), proves( "implies( proves(box, n1), eval(outputs(1)))", n3))
6. Push "proves" inside "implies", adding more limits:
implies( proves(box, n1), implies( proves("proves(box, n1)", n4), proves(outputs(1), n5)))
7. If we know box has a short proof, that fact has a slightly longer proof:
implies( proves(box, n1), proves("proves(box, n1)", n2))
8. Logic on 6 and 7, using the fact that n2 < n4:
implies( proves(box, n1), proves(outputs(1), n5))
9. Logic on 8 and 2, using the fact that n5 < n6:
implies( proves(box, n1), eval(outputs(1)))
10. By luck, 9 is exactly the definition of box from 1:
eval(box)
11. If we can prove 10, we can prove that we can prove it. This is the crucial step that tells us how large n1 must be: large enough that 11 follows from 10.
proves(box, n1)
12. Logic on 9 and 11:
eval(outputs(1))
Now, If the proof checker can prove only true statements, this means the program will in fact output 1. QED.
Problem 2: Two programs that prove statements about each other
Consider two programs, each of which knows its own source code via quining. The programs are playing a variant of the Prisoner's Dilemma: each receives as input the source code of the other, and outputs true for Cooperate or false for Defect. The algorithm of each program is as follows: try all proofs up to a certain length (the proof system and the max length depending on the program), looking for a proof that both programs output the same value (i.e. that I cooperate if and only if the other guy cooperates). If a proof is found, the program outputs true, otherwise false.
The question is: will the two programs cooperate?
Solution to problem 2
Like in Problem 1, assume that the two programs A and B have parts named mainA and mainB, provesA and provesB, etc., accessible to both programs. Define a new function "proves" as follows:
function proves(code, maxsteps) { return provesA(code, maxsteps) && provesB(code, maxsteps); }
Now use the same solution as in Problem 1, except in the beginning:
1. By the Diagonal Lemma, construct a code string "box" so that the following will be proved by both programs:
eval(box) == implies( proves(box, n1), (mainA() == mainB()))
2. Both programs can prove the following by inspecting the source of A and B (recall that "proves" implies provesA and provesB):
implies( proves("mainA() == mainB()", n6), (mainA() == mainB()))
Steps 3 through 12 are the same as in the solution to Problem 1. They use properties of the "proves" function that hold if both proof systems are strong enough - they don't have to be equivalent. Finally we see both programs will prove that mainA() == mainB(), which implies that they will both return true. QED.