Main point
You seem to be conflating two kinds of myopia: myopia in the search procedure (e.g. local search) and myopia in the objective (a short time horizon). A search doesn't have to be local to optimize only for the score of the next action.
If you prefer, you can replace “model optimized by gradient descent” with “AIXI with a very short time horizon.” The analysis is unchanged.
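To make the distinction concrete, here is a minimal toy sketch (the score function and numbers are purely illustrative, not anything from our exchange): the horizon of the objective and the locality of the search vary independently, so a fully global search can be just as myopic as a single gradient step.

```python
def myopic_global_search(actions, immediate_score):
    """Exhaustive (non-local) search over all actions that still
    optimizes only the score of the very next action."""
    return max(actions, key=immediate_score)

def myopic_local_search(x, grad_immediate_score, lr=0.1, steps=100):
    """Local, gradient-style search over a continuous action that
    also optimizes only the immediate score."""
    for _ in range(steps):
        x = x + lr * grad_immediate_score(x)
    return x

# Both optimizers are myopic -- they care only about the next step's score --
# even though one searches globally and the other locally.
score = lambda a: -(a - 3) ** 2   # immediate score, peaked at a = 3
grad = lambda a: -2 * (a - 3)     # its derivative

print(myopic_global_search(range(-10, 11), score))  # 3
print(round(myopic_local_search(0.0, grad), 2))     # ~3.0
```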
You seem to have the implicit model that more powerful systems necessarily have longer-term preferences. But why?
Consequentialism
I agree that if you have systems engaged in longer-term consequentialist reasoning with values you don’t like, you are in trouble. (I say as much in the post.)
But I have not been talking about this kind of arrangement, and I’ve emphasized that fact. I don’t think anyone is talking about this kind of arrangement. So this probably comes back to the question from the last section: do we get consequentialism without asking for it?
My stance
The rest of this post is going to wander further from the object-level issue, towards what seem like core disagreements. If we want to continue the conversation about this topic, though, it may make the most sense for you to just answer the question from the first section and mostly ignore the remainder.
Suppose that we had a solution to the control problem that looked like it could scale to handle foreseeable developments (e.g. better optimization, better hardware, better training procedures, better model classes…).
Once the resulting systems are smart enough, they can guide internal cognitive dynamics and contribute to AI research / “self-improvement.” And if the control problem is solved, and the control technique is scalable to the kinds of further AI research that these systems are doing, then it looks like this is all fine.
So now our job is looking for future developments that the approach won't scale to. If we can identify explicit possible problems, that's great; if we can't, we can try to think abstractly about unknown future developments that might pose a problem.
To me this hypothetical situation sounds extremely good for our prospects on AI control. It seems like more-or-less the best we can hope for. And it seems like a very concrete and tractable project. So from my perspective this looks like a no-brainer goal to shoot for.
Your view seems to be something like: “It’s obvious that foreseeable developments won’t get us to human-level AI. What’s more, it’s very likely that the unknown future developments are the ones that are interesting from an AI control perspective. So techniques that deal with foreseeable developments are just not especially useful — it’s likely we’ll have to throw them out when we actually get to the real stuff.”
Now there is little doubt that new techniques will be needed, and some of these will probably change the nature of the AI control problem. But if you want to convince me that they will change the nature of the control problem in a particular way, it seems like you will need to make a much more precise argument.
As of now, it’s not clear to me why more sophisticated optimization is going to invalidate the kinds of arguments I want to make.
For example, you could provide some examples of the kinds of capabilities you think we might obtain and explain why they would necessarily couple capability with longer-distance agency, or with other fundamental challenges to this kind of approach.
The promise of existing techniques
Although I don’t think it is critical to this particular argument, I do find it plausible that we will get very sophisticated AI without any fundamentally new ingredients (in the same sense that we haven’t really gotten fundamentally new ingredients in the last 25 years).
For example, you might be able to learn to think by doing reinforcement learning in a rich internal cognitive environment, bootstrapping from there. I think this is unlikely, but more like 90% unlikely than 99% unlikely.
Or we might take the expert system framework, plug in powerful learning as subsymbolic processing, and have things just work (again, with lots of knowledge about how to think). Also unlikely, but why overwhelmingly so?
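To gesture at what that kind of architecture could look like, here is a minimal toy sketch (the predicates, scores, and labels are invented purely for illustration): hand-written symbolic rules whose primitive tests are supplied by learned models rather than hand-coded logic.

```python
def learned_predicate(model, threshold=0.5):
    """Wrap a learned scoring function (the 'subsymbolic' part) as a
    boolean predicate the symbolic layer can use."""
    return lambda x: model(x) > threshold

# Stand-ins for trained models; in practice these would be learned systems.
looks_like_cat = learned_predicate(lambda image: 0.9)
is_indoors = learned_predicate(lambda image: 0.2)

# Symbolic rule layer: hand-written "knowledge about how to think".
rules = [
    (lambda x: looks_like_cat(x) and is_indoors(x), "house cat"),
    (lambda x: looks_like_cat(x) and not is_indoors(x), "outdoor cat"),
]

def classify(x):
    for condition, label in rules:
        if condition(x):
            return label
    return "unknown"

print(classify("some image"))  # "outdoor cat", given the stand-in scores
```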
These are just points, probably not even the most probable points, from the large space of approaches that might go forward without any fundamentally new ingredients.
If you can’t see how these could happen, that just seems like a failure of imagination. If you can see how they could happen, but you have arguments that they definitely won’t, I’m interested.
Even a 10% chance of existing techniques leading to very powerful AI soon would be enough to shift my priorities for AI control research. And even if there were a 0% chance, I would still expect these models of AI-with-no-new-developments to be some of our best available models for future powerful AI. I discuss my general reasoning for focusing on the present here.