Paul Christiano
1 min readApr 24, 2018

--

I mean: we don’t care if our model has some pathological behavior that is only activated on a particular input, we only care about the actual computations that the model is doing on the training set. Intuitively, this might make it a lot easier to notice if something is wrong.

--

--

Responses (1)