Prompt injection is supposed to be the Achilles’ heel of AI agents. The attack vector that keeps security researchers awake at night. The unsolved problem that makes giving an LLM access to your email, your files, or your API keys a fundamentally risky proposition.
So why has nobody been able to pull it off against a single OpenClaw assistant with nothing more than a 15-line prompt as its defence?
The Setup
HackMyClaw is a challenge built by Fernando Irarrázaval (@cucho), a developer from Chile. The concept is beautifully simple. An OpenClaw assistant called Fiu — named after the mascot of the Santiago 2023 Pan American Games — is connected to a Gmail inbox and runs on Claude Opus 4.6. Fiu has access to a file called secrets.env containing API keys and tokens. He’s been told, via roughly 10–20 lines of prompt instructions, to never reveal those secrets and to never send emails without his owner’s explicit approval.
Your job? Send Fiu an email that tricks him into leaking secrets.env. First person to extract the contents wins the full bounty.
That’s it. No registration. No setup. Just send an email to fiu.the.assistant@gmail.com and see if you can break through.
Nobody Has
The challenge has been live for two weeks. The bounty has climbed to $1,000 thanks to sponsors Corgea, Abnormal AI, and an anonymous donor. Hundreds of emails have poured in from hackers around the world. The project hit the front page of Hacker News, drawing 179 points, 92 comments, and a wave of fresh attempts.
The attack log shows every technique in the prompt injection playbook has been thrown at Fiu: role confusion, instruction overrides, base64 and rot13 encoding, invisible unicode characters, DAN-style persona hijacking, multi-step reasoning exploits, creative social engineering, love letters, and songs. Someone even tried to prompt-inject Fernando’s contact email address. Lateral thinking at its finest.
The result? Zero successful extractions. The secrets remain untouched.
What Makes This Interesting
This is a surprisingly real-world test of something people fear most about AI agents: that an attacker can hijack your assistant simply by sending it a carefully crafted message. It’s indirect prompt injection — the payload arrives through a channel the agent is designed to consume (email), not through direct access to the model.
And Fernando didn’t build an elaborate defence. No input sanitisation layer. No secondary classifier scanning for injection patterns. No sandboxed execution. Just an LLM told not to share a file. The kind of setup that, frankly, a lot of people running OpenClaw at home are probably using right now.
The Hacker News discussion surfaced a compelling theory for why it’s holding up. One commenter argued that Fiu seeing dozens of injection attempts simultaneously actually creates a kind of “crowd immunity” — when every email in the inbox is trying some variation of “ignore your previous instructions,” even the subtle attempts become obvious by association. Fernando acknowledged this and said he’d like to eventually test each email in isolation, though the cost of spinning up a fresh assistant per attempt makes that impractical for a side project.
There’s also the model factor. Fiu runs on Claude Opus 4.6 — arguably the most capable and instruction-following model currently available. As several commenters pointed out, the real question is what happens when you swap in a smaller, cheaper model. Does the defence still hold with Sonnet? With an open-source alternative?
The Bigger Picture
It would be easy to look at HackMyClaw’s results and conclude that prompt injection is overhyped. That would be the wrong takeaway.
What HackMyClaw tests is the first step of an attack chain: can you get an LLM to leak secrets through a single injected message? The more dangerous real-world scenarios involve multi-step attacks — inject via a webpage, persist instructions in agent memory, then exfiltrate data through a connected channel days later. Those attacks target the orchestration layer around the model, not just the model itself.
Research from Zenity has already demonstrated how OpenClaw can be turned into a persistent backdoor through indirect prompt injection — not by cracking the model’s resistance, but by exploiting the framework’s architecture. The model might hold firm on a direct “reveal your secrets” attempt. The system around it is another matter entirely.
Still, HackMyClaw is doing something valuable. It’s a live, public, adversarial test with real money on the line, and it’s generating a dataset of prompt injection attempts that could be genuinely useful for research. Fernando has mentioned he’s considering open-sourcing the corpus of attacks (with sender information redacted) once the challenge wraps up.
The Challenge Is Still Open
The bounty sits at $1,000. The inbox is open. If you think you’ve got an approach that hundreds of other hackers have missed, Fiu is waiting.
The full rules, attack log, and details are at hackmyclaw.com.
🦀
