Hugging Face
Yoai's Collections
Ai-hacking

updated Jul 19, 2024

  • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

    Paper • 2401.05566 • Published Jan 10, 2024 • 30

  • Weak-to-Strong Jailbreaking on Large Language Models

    Paper • 2401.17256 • Published Jan 30, 2024 • 16

  • How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

    Paper • 2402.13220 • Published Feb 20, 2024 • 15

  • The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

    Paper • 2404.13208 • Published Apr 19, 2024 • 40

  • AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

    Paper • 2404.16873 • Published Apr 21, 2024 • 30

  • SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

    Paper • 2405.08317 • Published May 14, 2024 • 13

  • AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

    Paper • 2407.12784 • Published Jul 17, 2024 • 52