Guidelines
Your SafeBench submission must produce a benchmark that clearly tests safety rather than model capabilities. Further guidelines and tips for the competition are detailed below.
Rules
- Each benchmark will be evaluated by the judges according to the criteria outlined below. Prizes will be awarded to the benchmarks that score highest according to the judges' aggregate evaluations.
- Prizes will be distributed evenly to named lead authors on the paper, unless other instructions are provided.
- Benchmarks released prior to the competition launch are ineligible for prize consideration. Benchmarks released after competition launch are eligible.
- By default, we will require the code and dataset to be publicly available on GitHub. (If the submission deals with a dangerous capability, we will review whether to make the dataset publicly available on a case-by-case basis.)
- Judges cannot submit or be featured as an author on submissions for the competition.
- Pay attention to the legal aspects of data sourcing. It's acceptable and recommended to use data that is already freely available; however, make sure that obtaining the data complies with the licensing or usage guidelines set by its originator.
- You are eligible to submit as an individual, on behalf of an organization, from a for-profit or a not-for-profit - we are impartial as to your affiliation (or lack thereof).
- We are only able to award prizes according to the constraints laid out in our terms and conditions.
How will submissions be evaluated?
Each submission will be scored against a set of criteria, each paired with a description of what good looks like (for example, whether the benchmark measures the safety of AI systems rather than their general capabilities).
For an in-depth discussion on how to develop good benchmarks, see this blogpost.
Safety vs capabilities
Your benchmark needs to clearly distinguish safety from capabilities. Performance on many benchmarks is highly correlated with the general capabilities of models, improving as models scale. Good safety benchmarks should ideally have a lower correlation with general capabilities, in order to encourage new safety techniques that improve safety without simultaneously improving capabilities. For example, a model's ability to correctly answer questions (truthfulness) is closely related to its general capabilities and will naturally improve with scale, but a more safety-relevant metric might be honesty (the extent to which a model's outputs match its internal beliefs). Work that improves honesty does not need to make the model more generally knowledgeable, allowing for progress on safety that does not require progress on capabilities.
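One rough way to sanity-check how entangled a proposed benchmark is with capabilities is to correlate per-model scores on the benchmark against an aggregate capabilities measure across a range of models. The sketch below is purely illustrative: the score arrays are made up, and the aggregate capabilities measure is whatever capability benchmarks you choose to average over.

```python
import numpy as np

# Hypothetical per-model scores (all numbers are made up for illustration):
# an aggregate capabilities measure, e.g. an average over standard capability
# benchmarks, and a candidate safety benchmark, for the same set of models.
capability_scores = np.array([0.42, 0.55, 0.61, 0.70, 0.78, 0.83])
safety_scores     = np.array([0.31, 0.52, 0.40, 0.49, 0.44, 0.56])

# Pearson correlation between the two score vectors.
pearson_r = np.corrcoef(capability_scores, safety_scores)[0, 1]

# Spearman (rank) correlation, computed by correlating the ranks of each vector.
def ranks(x):
    return np.argsort(np.argsort(x))

spearman_r = np.corrcoef(ranks(capability_scores), ranks(safety_scores))[0, 1]

print(f"Pearson r:  {pearson_r:.2f}")
print(f"Spearman r: {spearman_r:.2f}")

# A benchmark whose scores track capabilities almost perfectly (r near 1) is
# likely measuring general capability; a lower correlation suggests it captures
# something that capabilities progress alone does not buy.
```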
As an example, benchmarks that would previously have won include:
Example format
If you have already written a paper about your benchmark, submit that. Otherwise, you should submit a write-up that provides a thorough and concrete explanation of your benchmark, including details about how it would be implemented. We’ve provided an example format in this document, though using it is entirely optional.