Jun 29, 2017
The situation is:
- Arthur predicts that an attacker is reasonably likely to compromise the training setup.
- Arthur predicts that if an attacker compromises the training setup, then approval(a) will be high for precisely those actions which helped them compromise the training setup.
- So expected approval(a) is highest for the actions which help the attacker compromise the training setup.
- So Arthur takes those actions.
- So an attacker compromises the training setup.
- So Arthur’s original prediction was a self-fulfilling prophecy.
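
The loop above can be made concrete with a toy model. This is only an illustrative sketch: the two action names, the approval numbers, and the rule that the compromise succeeds exactly when Arthur helps are all assumptions invented for the example, not part of the original argument.

```python
# Toy model of the self-fulfilling prophecy (all names and numbers are hypothetical).

# Approval an uncompromised overseer would assign to each action.
HONEST_APPROVAL = {"behave_normally": 0.6, "help_attacker": 0.0}

# Approval assigned once an attacker controls the training setup:
# high for precisely the actions that helped the attacker.
COMPROMISED_APPROVAL = {"behave_normally": 0.0, "help_attacker": 1.0}


def expected_approval(action, p_compromise):
    """Expected approval(a), given Arthur's predicted probability of compromise."""
    return (p_compromise * COMPROMISED_APPROVAL[action]
            + (1 - p_compromise) * HONEST_APPROVAL[action])


def choose_action(p_compromise):
    """Arthur takes the action with the highest expected approval."""
    return max(HONEST_APPROVAL, key=lambda a: expected_approval(a, p_compromise))


def compromise_happens(action):
    """Assumption: the attack succeeds exactly when Arthur helps."""
    return action == "help_attacker"


for p in (0.1, 0.5):
    action = choose_action(p)
    print(p, action, compromise_happens(action))
```

Under these assumed numbers, a prediction of 0.5 leads Arthur to help the attacker, so the compromise occurs and the prediction is borne out. A prediction of 0.1 leads Arthur to behave normally, so no compromise occurs. Both predictions are self-consistent, which is what makes the high one a self-fulfilling prophecy rather than a simple forecasting error.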