OpenAI is forming a new team to bring ‘superintelligent’ AI under control

OpenAI is establishing a new team led by Ilya Sutskever, the company's chief scientist and co-founder, to address the challenges of controlling and steering "superintelligent" AI systems. Sutskever and Jan Leike, a lead on OpenAI's alignment team, anticipate that AI systems surpassing human intelligence could emerge within the next decade. Recognizing the risks such advanced AI could pose, OpenAI aims to research and develop methods to control and constrain its behavior.

Current techniques for aligning AI, such as reinforcement learning from human feedback, rely heavily on human supervision. But once AI surpasses human intelligence, humans can no longer effectively supervise and control these systems. To tackle this problem, the new Superalignment team, led by Sutskever and Leike, will focus on solving the core technical challenges of controlling superintelligent AI over the next four years.

The team plans to build a "human-level automated alignment researcher": an AI system trained with human feedback that can help evaluate other AI systems and conduct alignment research itself. The goal is to develop AI that can match or outperform humans at alignment research and, working alongside human researchers, help ensure that future AI systems remain aligned with human values. By automating much of the alignment work, human researchers would be freed to review and validate the research those systems produce.

OpenAI acknowledges the limitations and potential risks of using AI for evaluation and alignment research: doing so could scale up the biases, vulnerabilities, and inconsistencies of the AI systems performing the evaluation. The team also recognizes that the alignment problem may not be solely an engineering challenge. Still, they believe that enlisting machine learning experts, including those not currently working on alignment, will be crucial to solving it.

OpenAI plans to share its research findings widely and considers contributing to the alignment and safety of non-OpenAI models an important part of its work.