Paul Christiano
1 min read · Dec 17, 2017

I agree that calling this problem statistically possible, without further assumptions, is wrong. (I edited the post to correct this.) I think I wrote the post this way because (a) I was imagining the environment interaction as limited, and treating queries to the QA checker as a computational rather than a statistical limitation, since we can run them as often as we like, and (b) I was comparing to the usual formulation of safe exploration, which definitely can’t be solved under any kind of “normal” statistical assumptions.

I’m imagining an assumption on the QA checker that makes the adversary’s problem tractable. I agree that if the human is completely unlearnable, then the adversary is reduced to an exponential search and you are out of luck. Similarly, supervised learning isn’t statistically possible without a carefully chosen benchmark or a statistical assumption. Articulating the analogous assumptions here is more subtle; it’s not obvious that it can work, and this post doesn’t do it.
