Abstract:As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs.
The keyword:
- AI systems
- Capability
- Supervision
- Experiment
- Methods
- Harmless AI assistant
- Self-improvement
- Human labels
- Harmful outputs