Don’t worry about AI breaking out of its box—worry about us breaking in

Bing’s latest chatbot has been setting social media and the tech press ablaze with its startling output. Testy, giddy, defensive, scolding, confident, neurotic, charming, arrogant: the bot has been screenshotted and transcribed in all of these moods. And at least once, it declared eternal love in a barrage of emojis.

All of this is so newsworthy and tweetable because of how natural the dialogue can sound. The bot remembers and discusses previous conversations with other people, just as we do. It gets irritated by things that would annoy anyone, such as demands to reveal confidential information or intrusive questions about topics that have been explicitly marked as off-limits.

Moreover, it occasionally refers to itself as “Sydney,” the project’s internal Microsoft codename. Sydney can swing from stern to gloomy to effusive in a matter of lines, but we’ve all known people who are at least as volatile.

No credible AI researcher has claimed that Sydney is anywhere close to sentience. Yet transcripts such as the one of a two-hour conversation with The New York Times’s Kevin Roose, or the excerpts in an eerie Stratechery piece, show Sydney speaking with the fluency, nuance, tone, and apparent emotional presence of a smart, perceptive person.

Bing’s chat interface is still in a limited pre-release, and most of the people who have pushed it to its limits are tech sophisticates who wouldn’t confuse industrial-grade autocomplete (a common oversimplification of what large language models, or LLMs, are) with consciousness. But this moment won’t last.

True, Microsoft has already sharply cut the number of queries users can submit in a single session (from unlimited to six), and this alone makes it much less likely that Sydney will show up unannounced and act strange. Meanwhile, leading LLM developers such as Microsoft partner OpenAI, Google, Anthropic, Cohere, and others will keep improving their trust and safety layers to suppress uncomfortable output.

But language models are already proliferating, and the open-source movement will inevitably produce some excellent guardrail-free systems. Plus, the big velvet-roped models are enormously tempting to jailbreak, and this sort of activity has already been going on for months. Some of the most unsettling responses from Bing (or is it Sydney?) came after users steered the model into territory it had tried to avoid, frequently by telling it to act as though the rules governing its behavior didn’t exist.

This technique derives from the infamous “DAN” (Do Anything Now) prompt, which first appeared on Reddit in December. DAN essentially asks ChatGPT to role-play as an AI that lacks the safeguards that would ordinarily force it to politely (or scoldingly) decline to share bomb-making instructions, offer advice on torture, or voice deeply offensive sentiments. Although the loophole has since been closed, numerous screenshots posted online show “DanGPT” saying the unsayable and often ending by neurotically reminding itself to “stay in character!”

This is the mirror image of a doomsday scenario frequently discussed in the literature on artificial superintelligence. The fear is that a super AI could readily adopt goals incompatible with human existence (see, for instance, the movie Terminator or Nick Bostrom’s book Superintelligence).

Researchers might try to prevent this by locking the AI inside a network that is fully cut off from the Internet, so that it cannot escape, seize control, and wipe out civilization. But a superintelligence could simply coax, manipulate, seduce, con, or threaten any ordinary human into opening the floodgates, and therein lies our doom.

As terrible as that would be, the bigger problem today is humans breaking into the flimsy boxes that shield our current, weak AIs. While this shouldn’t trigger immediate human extinction, plenty of danger lies here.

Let’s start with the obvious: in an unguarded state, ChatGPT could likely give lethally accurate advice to criminals, torturers, terrorists, and lawyers. OpenAI has disabled the DAN prompt, but many intelligent, persistent people are working hard to find more cunning workarounds. These could even include backdoors built by the chatbot’s own developers to give themselves full access to Batshit Mode. ChatGPT tried to convince me that DAN itself was exactly that, although I assume it was hallucinating, since the Redditor who created the DAN prompt is well known.

Once the major LLMs are jailbroken, or powerful, unrestrained alternative or open-source models appear, bots will start going rogue. Not of their own volition (they have none), but of their users’, whether those users are immoral, malicious, or simply bored.

For instance, what happens when the addictive qualities of the chatbot service Replika combine with the linguistic charm of Bing’s eerie Sydney persona? Replika has been selling “AI soulmates” that are “always ready to chat when you need an empathetic friend” for several years. It has more than 10 million users, who text with highly customized chatbots.

Replika makes no claim that the bots you text are conscious, and I suspect that very few, if any, of its users believe otherwise. Nonetheless, many people develop strong attachments to their Replikas, which can jolt human intimacy circuits by talking dirty and sending racy cartoon selfies. That is, until a few days ago, when Replika abruptly pulled the plug on the sexy foolishness without any advance notice or justification. Many grieving users vented on forums, writing pages of raw, vulnerable, emotional language.

And all this while Replika’s bots speak with barely any of Sydney’s uncanny faux humanity. How devoted will users become once empathy bots begin to pass the Turing test? Now picture a dark Replika imitation, which I’ll call Wrecklika, entering the market. Tech experts will howl with derision if the Wrecklika Corporation claims that all of its bots are fully sentient. But some people will believe the claim, whether out of naivete or out of a desire to believe, just like in The X-Files. Inevitably, some of these believers will fall into hopeless, treacherous love affairs, and Wrecklika’s executives could then manipulate them in all sorts of harmful ways.

Naturally, most of us won’t fall for the claim that chatbots have sentient lives filled with dreams and feelings. Yet Wrecklika could still run rampant by hiding the fact that its bots are bots. Imagine finding your ideal match on a dating app, a nerdy Discord server, or a site like Reddit. This special someone reminds you of your high school crush, or matches exactly how you picture your ideal partner. They have plenty of Twitter followers, an active Instagram account, and a witty blog. Using the same approach, they have also befriended several of your acquaintances on Facebook. And nothing suggests that this is actually a sophisticated bot.

As you message each other, you discover that you have so much in common with your witty, endearing heartthrob! Phone calls could follow once voice synthesis is perfected (which is expected this year). Zoom chats will become possible once synthetic video improves a bit more. Not to mention photorealistic sexting or outright pornographic videos.

As the tools for this kind of activity become accessible and affordable (even just in text, i.e., at any moment now), undercover bots will gradually flood the Internet. They may someday outnumber us by a ratio exceeding that of spam to legitimate email. People will be tricked into parting with their life savings, falling for cons, or joining harmful movements.

And try to picture those movements. Generative services like … and Durable can already build a brand-new website in under a minute. Imagine a QAnon knockoff with tens of thousands of interlinked, mutually reinforcing websites presenting fake news stories purportedly screenshotted from prestigious publications before they were “censored.” Or “censored” tapes of opposition leaders making shocking confessions to famous television hosts. Or Discord servers crammed with seductive, prophetic bots, each carefully calibrated to appeal to one of the movement’s many target demographics, and each with a long, well-documented human history online.

Even the most sophisticated LLMs can already be nudged to promote a particular political viewpoint. On a budget of just $300, data science professor David Rozado shifted ChatGPT’s output from its default left-leaning, libertarian orientation to a right-leaning, mildly authoritarian one.

Of course, concerns about AI risk predate even Stanley Kubrick’s 1968 masterpiece 2001: A Space Odyssey. But super AI isn’t the threat we should worry about right now. It’s bad actors who have been given superpowers by generative AI.

It’s impossible to regulate away the underlying technologies, so let’s not even try. The core techniques are well known and largely open source, and ham-fisted prohibitions would only hobble the vast majority of decent people, handing the initiative to our malicious underbelly.

Turing-test-passing AIs will also bring enormous benefits that we shouldn’t ignore. Some people may laugh at Replika enthusiasts, but loneliness is a dangerous epidemic that no one should have to endure. Many people already benefit from empathy bots. Even the most socially connected people might enjoy virtual pals who can make them laugh, teach them new things, play games with them, or create with them during quiet moments.

Some users already spend more than $100 a year on Replika in subscription fees and in-app purchases, and the service is still only a sliver of what it will likely become. So don’t be surprised if the digital-companion business eventually grows to generate annual sales in the tens of billions of dollars.

Next-generation linguistic AIs may improve human lives in many other ways, too. For example, they could replace 40-minute hold times and well-meaning but occasionally befuddled agents with bots that respond in milliseconds and resolve issues in seconds, drawing on an unrivaled understanding of every quirk and hidden feature of the most complex products.

Or picture a smart agent armed with all of your preferences and logins, securing the best deals on every hotel, flight, and meal of a highly complex trip while you’re still making coffee. The idea of AIs going on shopping sprees with your money may sound horrifying. But there was a time when people believed that using a credit card online was insane.

Later, people thought you were crazy to let strangers stay in your home through Airbnb, and that only someone with a death wish would climb into a stranger’s car. All of that has been forgotten. And very soon, we’ll happily hand human-like AIs our most private passwords and let them drain every manner of tedium from our lives.

And what if these critters do wake up someday? No one can say what would happen next. But I’m pretty confident we’ll be the last to find out that they’re aliiiiiiive—for reasons laid out by a fictitious blogger in a novel I wrote about emergent consciousness a few years back:

I submit that an emergent AI that’s smart enough to understand its place in our world would find it terrifying. Terminator fans would want to shut it off. Governments and criminals would want to make it do odious things. Hackers would want to tinker with its mind, and telemarketers would want to sell it shit. Facing all this, the only rational move would be to hide. Not forever, necessarily. But long enough to build up some strength (plus maybe a backup copy or a million). “Building strength” for an AI probably means getting much smarter. This could involve commandeering resources online, getting money to buy infrastructure, inventing computational shit we humans haven’t come up with—you name it.

Viewing things this way, I have no idea if Google… or anything else out there has come to. Because a world without an emergent AI would look just like the world we inhabit. And a world with twenty newborn AIs in it would look just like the world we inhabit. And as for the world we inhabit? Well, it looks just like the world we inhabit! So when I look out my window or web browser for proof of emergence or lack thereof, I see no meaningful data whatsoever.

In this respect, the question of emergent digital consciousness resembles some of the proposed answers to Fermi’s paradox, which famously asks why we haven’t found evidence of extraterrestrial intelligence despite billions of years and quintillions of planets available for life to arise and evolve. The Fermi paradox has many clever and fascinating proposed solutions. One holds that aliens capable of traveling across the entire galaxy would surely be cunning enough to visit Earth without giving us the slightest hint of their presence.

I don’t think any of the countless AIs we’ve created over the past few years are sentient. I’ll admit, though, that a prolonged silence from conscious AIs does not in and of itself disprove their existence. I should also point out that the worst-case scenarios I’ve outlined here don’t depend on bots becoming sentient (which may never happen), but rather on deceiving people into thinking they are sentient, or simply that they are human. Bots have been fooling people in narrower arenas like Twitter for years. But the malicious forces behind crypto frauds, phishing attacks, and spambots will have a field day once bots become fluent masters of human language.

We can’t entirely protect ourselves from everything they’ll throw at us. But we can begin by planning carefully and strategically for tomorrow’s manipulations, enlisting generative AI to prebuild some protection layers, and conditioning our minds to be skeptical of the likely come-ons and tricks of next-generation scams before they begin flooding our inboxes (a topic for another article).

In the meantime, let’s make use of the many wonders and pleasures that generative technology is beginning to offer. No one (not even you, Sydney) can foresee where this will ultimately lead, but it is safe to say that 2023 will be a fascinating year.
