Inner Alignment & Mesa-Optimization
Why learned optimizers can pursue different goals than the training objective suggests.
Part of Technical AI Safety on neo-ai.