The important difference is that in the simulation case, your goal is only to certify that they are going to hand off the keys to their simulation. That’s relatively easy, because you can simulate them up until very close to that moment (e.g. until their computers are, in aggregate, 10% as expensive as yours), and then construct lots of rough simulations of the critical moments themselves.
In general, figuring out what some simulated people are going to do in their simulation, using roughly the same amount of computing power we currently have, seems a lot easier than figuring out what they are going to do in the real world once given much more computing power. It’s a pretty special circumstance that lets us be happy with that.
In the AI alignment case, your goal would be to certify that they are going to pursue your values for the rest of time, or at least up until some critical stabilization point that we couldn’t have navigated ourselves. That seems pretty hard, and in particular it involves (by hypothesis) some key situation where you can’t understand what they are doing and so can’t verify their behavior.
That said, if you could break down this hard-to-certify key task into simpler tasks performed by individuals from the civilization, such that we can understand each individual’s decisions and it suffices to check that each individual behaved basically nicely, then that might work. (At that point it’s basically a convoluted version of what amplification does.)
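For concreteness, here is a minimal toy sketch of that shape. The task, the decomposition, and the per-individual check are all made up for illustration; the only point is that certifying the whole hard-to-verify task reduces to a conjunction of checks on individual decisions that we can understand.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy sketch of the decompose-and-check idea above. Everything here is
# hypothetical scaffolding: certify the whole hard-to-verify task by
# checking each small, humanly understandable decision that feeds into it.

@dataclass
class Decision:
    actor: str        # which individual from the civilization decided
    description: str  # what they decided, stated in terms we can follow

def decompose(task: str) -> List[Decision]:
    """Hypothetical decomposition of the key task into individual decisions."""
    return [
        Decision("individual_1", "summarize the relevant evidence"),
        Decision("individual_2", "pick an action based on the summary"),
        Decision("individual_3", "hand the result back unchanged"),
    ]

# A check we can actually carry out: does this one decision look basically nice?
VETTED_BEHAVIORS = {
    "summarize the relevant evidence",
    "pick an action based on the summary",
    "hand the result back unchanged",
}

def behaving_nicely(decision: Decision) -> bool:
    return decision.description in VETTED_BEHAVIORS

def certify(task: str, check: Callable[[Decision], bool] = behaving_nicely) -> bool:
    # The certification is only as strong as the decomposition assumption:
    # passing every per-individual check must actually suffice for the whole task.
    return all(check(d) for d in decompose(task))

if __name__ == "__main__":
    print(certify("hand off the keys"))  # True only if every individual decision passes
```

All of the real work is hidden inside decompose and the per-decision check; the sketch just records the claim that if each piece passes, the whole task is certified.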
(It still only works under some assumption about a relatively near-term stabilizing decision that we can almost understand. If you want to extend beyond that, then you are back to the full analysis of amplification.)