I don’t mean to include “leverage Alice’s translation abilities by training an AI to imitate her” as part of the alignment problem. It’s part of the broader scope I’m not intending to solve.
It’s clear that problems of this form are insoluble in general (consider the alien message case). In practice, I don’t expect them to have a clean solution.
With respect to the particular case of humans pursuing strange goals under strange conditions, it’s not actually clear what the “right” answer is. If I pursue goals X under conditions Y, and rational behavior does in fact expose me to conditions Y from time to time, then a naive accounting would normally include X in the basket of goals that humanity is pursuing (with a weight determined by its influence). New technology can change the relative influence of different parts of human value, and such changes will be disagreeable to the incumbent distribution of value.
I expect that in the long run we’ll have a clearer understanding of these issues (in addition to better coordination mechanisms), and that a future society would have no trouble resolving issues like this (presumably by gradually entrenching whatever distribution of values exists as our technology/understanding improves, though I don’t know if that’s really what would happen, and the decision theory is complicated if nothing else). I don’t think we can resolve these issues with clever tricks, and I don’t think this problem destroys a large amount of value per year.
With respect to competition amongst AI systems in the relatively near term, some of which learn recklessly from insecure human abilities, I think (a) this will mostly be handled by aligned AI systems better understanding psychology and being careful about how they query humans, rather than by any unified understanding of or approach to AI design, and (b) this is not a big issue: (i) most human abilities will quickly become obsolete, so this only matters during a transient period; (ii) the insecurity is not that large or complicated, so it can be handled by crude means; (iii) I suspect these holes don’t (mostly) systematically favor any particular values, and so just impose a tax on everyone (like introducing a new class of cybersecurity vulnerability) rather than systematically shifting the distribution of values.
I could be convinced that this (or any other particular question) is much more important than I currently estimate. I don’t see all of these questions as likely to be addressed by the same technology, so I see this mostly as a big constellation of claims rather than a single claim.
I do agree that there is a common ingredient, a certain kind of philosophical progress, which we’d need in order to (a) prioritize these problems and (b) make progress on them and understand their shape.