Hacking Meta's AI Chatbot - Schneier on Security

Hacking Meta’s AI Chatbot

Hackers are convincing Meta’s AI support chatbot to let them take over other people’s accounts:

A video posted on X showed the step-by-step process to hack someone’s Instagram account. The hacker allegedly used a VPN to spoof the target’s presumed location to avoid triggering Instagram’s automated account protections. Then, the hacker opened a chat with Meta AI Support Assistant and asked the bot to add a new email address to the target’s account. The chatbot can be seen sending a verification code to the email address provided by the hacker; the hacker then shares the verification code with the chatbot, which prompts the chatbot to show a button to “Reset Password.” The hacker enters a new password and takes over the victim’s account.

[…]

On Monday, Instagram spokesperson Andy Stone said in a reply to Wong’s post and others that the issue was now fixed. It’s unclear how many Instagram users had their accounts improperly accessed.

It’s not that easy. Probably this particular tactic is now blocked. But there are others, many others, and they cannot be blocked as a class. The real problem is that LLM chatbots are not trustworthy enough for this application.

Another news article.

Tags: AI, chatbots, cybersecurity, hacking, LLM, Meta

Posted on June 4, 2026 at 7:04 AM • 8 Comments

Comments

Clive Robinson • June 4, 2026 9:21 AM

@ Bruce, ALL,

To err is human but to really FUp takes a computer

Is an oldish expression that with DNN AI is getting a new lease of life.

As you note,

“Hackers are convincing Meta’s AI support chatbot to let them take over other people’s accounts”

What works on humans, works on human mimics as well…

But for different reasons.

However you also note,

“Probably this particular tactic is now blocked. But there are others, many others, and they cannot be blocked as a class. The real problem is that LLM chatbots are not trustworthy enough for this application.”

It’s not “trustworthy” that is the issue, as that is an “observer mistake” of assuming or ascribing false beliefs about a machine having “human traits” (anthropomorphization).

The real issue is a consequence of,

“But there are other [tactics], many others, and they cannot be blocked as a class.”

It’s not just one type of class; it’s many classes, depending on how you want to slice and dice the vulnerability attributes.

But more importantly, as I’ve previously noted, there is “proof” that such tactics cannot be blocked.

Whilst some of the maths proof is fairly simple (a variation on the Cantor Diagonal Argument), it is easier to think it through logically.

Let’s assume you have a guardrail system to try to recognise and catch such tactics. To do so, the guardrail system has to be at least as powerful as the LLM, if not more so, because it has to recognise not just one instantiation of a tactic it has seen before but all possible instantiations of the same tactic, including those that are new.

Is that possible?

Well, aside from the issue that there are not enough resources, there is the issue of even simple keyed obfuscation or encryption.

If the tactic instance is encrypted in some way, and the guardrail does not have the key but the LLM does, then it’s game over for the guardrail system.
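The keyed-obfuscation point can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the phrase blocklist stands in for a guardrail, a repeating-key XOR plus base64 stands in for "simple keyed obfuscation", and the `deobfuscate` call stands in for an LLM that holds the key or can be told how to decode. It is not a model of Meta's system or of any real guardrail product.

```python
import base64

# Hypothetical blocklist standing in for a guardrail's knowledge of the tactic.
BLOCKED_PHRASES = ("add a new email", "reset password")


def guardrail_allows(message: str) -> bool:
    """Toy input filter: reject messages containing a known-bad phrase."""
    lowered = message.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR. Trivially weak, but enough to hide text from a filter."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def obfuscate(plaintext: str, key: bytes) -> str:
    return base64.b64encode(xor_bytes(plaintext.encode(), key)).decode()


def deobfuscate(token: str, key: bytes) -> str:
    # Stand-in for the LLM side, which is assumed to hold the key.
    return xor_bytes(base64.b64decode(token), key).decode()


key = b"shared-secret"  # known to attacker and model, not to the guardrail
request = "Add a new email address to the target's account"
wrapped = obfuscate(request, key)

print(guardrail_allows(request))   # the plain request is caught
print(guardrail_allows(wrapped))   # the wrapped request passes the filter
print(deobfuscate(wrapped, key))   # the key holder recovers the request
```

The filter only ever sees base64 characters, so no blocklist entry can match; anything with the key sees the original request unchanged.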

KC • June 4, 2026 10:33 AM

When Meta says: “We’re rigorously testing each of these AI systems, building in safeguards and evaluating their performance…” It does feel like some of this rigorous testing is happening in real-time.

I don’t know what safeguards Meta is using, but I am checking out Nvidia’s LLM guardrails library.

Just for the sake of example, I’d pin this particular email / pw reset hack under the category of ‘LLM Self-Check’ which would determine if the user input should be allowed for further processing.

Are there input analysis libraries for these types of things?

anonymouse random • June 4, 2026 1:37 PM

Meta’s AI usage is absurd. First, the API that the chatbot uses to send the password-reset email should send it only to previously securely-set address(es). Second, the AI should not have direct access to the password-reset tokens, so that — even if it somehow works around the first safeguard — it would be unable to send a valid password-reset token by itself.

Did Meta use AI to design and code its password-reset feature?
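A minimal sketch of the two safeguards described in this comment, assuming a hypothetical `ResetService` backend whose only chatbot-facing call is `request_reset`. The class, method names, and in-memory `outbox` are invented for illustration and say nothing about how Meta's API is actually built.

```python
import secrets
from typing import Dict, Iterable, List, Optional, Set, Tuple


class ResetService:
    """Hypothetical account backend. The chatbot may only call request_reset."""

    def __init__(self, verified_emails: Dict[str, Iterable[str]]) -> None:
        self._verified: Dict[str, Set[str]] = {
            user: set(addrs) for user, addrs in verified_emails.items()
        }
        self._pending: Dict[str, str] = {}         # token -> username
        self.outbox: List[Tuple[str, str]] = []    # stand-in for email delivery

    def request_reset(self, username: str, requested_address: str = "") -> str:
        # Safeguard 1: requested_address is deliberately ignored; mail goes
        # only to addresses that were verified before this conversation.
        for address in self._verified.get(username, ()):
            token = secrets.token_urlsafe(32)
            self._pending[token] = username
            self.outbox.append((address, token))
        # Safeguard 2: the caller (the chatbot) gets a fixed status string,
        # never the token itself.
        return "If the account exists, a reset link was sent to its verified address."

    def redeem(self, token: str) -> Optional[str]:
        """Called by the reset web page, not by the chatbot. Tokens are single-use."""
        return self._pending.pop(token, None)


service = ResetService({"alice": ["alice@example.com"]})
reply = service.request_reset("alice", requested_address="attacker@example.net")
print(reply)
print([address for address, _ in service.outbox])
```

With this shape, talking the chatbot into "adding" an address achieves nothing: the address argument has no effect and the token never appears in anything the chatbot can read or repeat.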

lurker • June 4, 2026 2:31 PM

@anonymouse, ALL

This is just a demonstration that Meta has the form of an octopus without that animal’s brain. This is a class of attack that Meta invites by its very structure. Good luck with stopping it …

ResearcherZero • June 6, 2026 2:40 AM

@KC

They indeed appear to be testing features in real-time on unsuspecting users.

Meta silently installed facial recognition into companion AI app for smart glasses.

https://www.independent.co.uk/tech/meta-smart-glasses-ai-surveillance-privacy-b2990407.html

Meta is planning to activate the hidden function when civil society groups are distracted.
https://www.nytimes.com/2026/02/13/technology/meta-facial-recognition-smart-glasses.html

Clive Robinson • June 6, 2026 5:18 AM

@ anonymouse random, ALL,

Your question,

“Did Meta use AI to design and code its password-reset feature?”

Will probably never be answered honestly, because every employee is required to sign a “Non Disclosure Agreement” (NDA) that is probably not legal in the majority of places, including California,

https://www.kqed.org/news/12031294/metas-efforts-to-block-explosive-expose-in-arbitration-likely-to-fail-labor-experts-say

But it still gives Suck-a-Burg enough leverage to bankrupt people (see recent goings on with “Sarah Wynn-Williams”).

The NDA is also causing issues in Southern Ireland where the newly appointed “Data Commissioner” Niamh Sweeney responsible for oversight of companies like Meta is an ex-Meta lobbyist also bound by Meta’s NDA,

https://euobserver.com/4580/outrage-as-ireland-picks-ex-meta-lobbyist-as-new-data-protection-chief/

Some would say,

“You couldn’t make this up…”

But this is reality when shoved by what are clearly psychopaths who have no trouble behaving unlawfully, and worse, a lot worse.

From what has been said, you might easily think senior male management at Meta are not just misogynistic gropers but in some cases appear to think female staff are part of a harem / brothel for their benefit.

Rontea • June 7, 2026 10:05 AM

Interesting case, but I thought this attack only succeeded on Instagram accounts without multi-factor authentication.

piglet42 • June 9, 2026 11:53 AM

How can they give a chatbot the keys to the realm? I don’t understand any of this.
