Hugging Face

wang

xinpeng

AI & ML interests

None yet

Recent Activity

updated a model 2 days ago
xinpeng/big-math-hard-tiny-qwen2.5-3b-instruct-og-rloo-implicit-cheat-direct-mixed-step50
published a model 2 days ago
xinpeng/big-math-hard-tiny-qwen2.5-3b-instruct-og-rloo-implicit-cheat-direct-mixed-step50
upvoted a paper 3 months ago
Refusal Direction is Universal Across Safety-Aligned Languages

Organizations

CIS, LMU Munich
MaiNLP
safety-by-imitation
RewardHacking

upvoted a paper 3 months ago

Refusal Direction is Universal Across Safety-Aligned Languages

Paper • 2505.17306 • Published May 22