Scale Safety Research

Enterprise

community

AI & ML interests

None defined yet.

Recent Activity

dpaleka authored a paper 2 days ago

Pitfalls in Evaluating Language Model Forecasters

abhayesian updated a collection 8 days ago

Alignment Faking Datasets

scale-safety-research's activity

dpaleka authored a paper 2 days ago

Pitfalls in Evaluating Language Model Forecasters

Paper • 2506.00723 • Published 5 days ago • 3

abhayesian updated a collection 8 days ago

Alignment Faking Datasets

11 items • Updated 8 days ago

abhayesian updated a collection about 2 months ago

Gemma 2 9b Emergent Misalignment

6 items • Updated Apr 16

abhayesian updated a dataset 2 months ago

scale-safety-research/new_rlhf_not_purely_good_docs

Updated Mar 27 • 13.6k • 23