GPT2, Counting Consciousness and the Curious Hacker

Disclaimer: I would like it to be made very clear that I am absolutely 100% open to the idea that I am wrong about anything in this post. I don’t only accept but explicitly request arguments that could convince me I am wrong on any of these issues. If you think I am wrong about anything here, and have an argument that might convince me, please get in touch and present your argument. I am happy to say “oops” and retract any opinions presented here and change my course of action.

As the saying goes: “When the facts change, I change my mind. What do you do?”

TL;DR: I’m a student that replicated OpenAI’s GPT2–1.5B. I plan on releasing it on the 1st of July. Before criticizing my decision to do so, please read my arguments below. If you still think I’m wrong, contact me on Twitter @NPCollapse or by email ([email protected]) and convince me. For code and technical details, see this post.

This essay is split into three parts. The first lays out the background of what GPT2 is and why it is noteworthy. In the second (long) part I lay out my thoughts about what GPT2 means for our society. In part three, I talk about a bit of a tangential topic that I think is valuable to our general discussion about AI safety, and was the ultimate reason I started this project: The mind of the curious hacker.

I apologize for the length, but I had so much to say, and could have easily written twice as much. I appeal to you to please withhold judgement as you read and follow me through to the end. I often branch off into seemingly unrelated tangents, but if you stick to it, I promise I get back to an interesting point (usually…hopefully…).

Not too long ago, OpenAI revealed details about their latest experiment in language generation by an AI system, GPT2. This wasn’t the first time such a system had been built, and even though its results far outshone any other attempt to date, that arguably wasn’t what drew the most attention to this project. OpenAI decided not to release the full scale version of their model (1.5B), because they were afraid of its potential security implications, in particular its use in generating fake news. They did release a smaller version (117M) and later a middle sized version (345M). In theory, there was no significant barrier to scaling up 117M or 345M to 1.5B, but there was a practical one: Compute. The estimated cost to create 1.5B from scratch is around $40k in cloud compute. This follows a long trend of state of the art AI becoming more and more computationally expensive.

The decision to not release their model into the open, despite the “Open” in their name, garnered a wide range of responses. But to this day, 1.5B remains unreleased (except to a few research partners) and, to my knowledge, unreplicated. No individual or reasonable academic research group would have access to enough resources to create 1.5B from scratch.

Well, I replicated 1.5B.

I’m not working under the direction of any government, or university, or large corporation (though I feel like I owe Google my firstborn child or something for the amount of free support they’ve given me). I’m just a curious undergrad student that spends his free time experimenting with AIs instead of going outside and talking to girls.

In this essay, I would like to explain not the technical details of how I did it (I have done so in another post), but rather I’d like to take this opportunity when I have some eyes on me to discuss my views on the whole GPT2 situation and what it means for AI and AI safety.

Why does this matter?

If you take a look at the samples of GPT2 at work, you should pretty quickly get why this might be a big deal. They are at times eerily coherent, in style if not content. This example in particular is somewhat hilarious yet also worrying.

What differentiates GPT2 from many other models is how fully generic it is. It was trained on all websites linked to from Reddit with at least three karma (a kind of human prefiltering for quality). I’ve experimented quite extensively with the smaller models, and have had great fun with them. My friends and I have spent literal hours trying to get the thing to generate funny texts to read aloud to each other (we’ve found it makes especially hilarious religious rants if you feed it biblical quotes). A friend and I have been developing a video game incorporating GPT2, and once, during a party, we sat down around a piano and prompted the AI until it spit out something resembling lyrics, which my buddy Sebastian (@shyteagames) turned into an impromptu space opera musical (not something I was expecting to be doing that evening, but hell was it fun).

Now of course, just like most of my diet and exercise habits, anything that is fun has to be bad for you. The same generality that lets GPT2 create hilariously dadaist erotic fanfics about your friends also means it can do lots of less frivolous things: Fake reviews, inflammatory political messages (it loves to talk about Brexit) and other things the internet already has way too much of. For these reasons, OpenAI decided to not release the full model (which generates far more convincing text than the small ones). But did it work? Was it the right choice? What does it mean that someone like me has replicated this “dangerous” technology?

A watershed moment?

“Morality is how the world ought to work, economics is how the world actually works.”

Let’s take a step back. Or lots of steps. What is it that makes GPT2 potentially dangerous? What properties does it have that we need to be wary of? How do those properties affect the real world?

The main concern people have is that GPT2 could royally mess up the online landscape even more than it’s already messed up. Ok, well how would it do that? The argument goes that using GPT2, malicious actors could produce a veritable flood of fake content in order to push a narrative, manipulate reviews for products or just be extremely annoying. But how does GPT2 enable that? GPT2 produces text. Humans can produce text. So that isn’t anything new. No, GPT2 (and AI and technology in general) brings us back to one of the economist’s favorite tools: Cost.

The argument is that, using a technology such as GPT2, we aren’t engaging in a wholly new activity per se, but we are lowering the cost to generate convincing text. If you are thinking “Well, that sounds incredibly boring”, you are sorely missing out on how amazing economics is (I used to think economics was just business and money and stuff. Boy was I wrong, economics is among the coolest sciences there is). By lowering the cost of a product, you can sometimes make completely new industries and applications feasible. Take the obvious example, computers. You could, in principle, have done basically any computation you can nowadays on a 1980s computer, if you built it large enough and waited long enough. But in practice the cost (in both money and time) would have been astronomical for many of the applications we now consider routine. There is a trend that software growth seems to track hardware growth; as hardware becomes faster, new kinds of algorithms become possible. But an economist doesn’t hear “faster”, he hears “cheaper”.

So, hypothetically, GPT2 could represent a kind of watershed moment, in which it lowers the cost of producing text so much that it enables some kind of novel malicious activity. And I can see the point to this argument, but I think the devil is really in the details here.

And to explain my reasoning, we’re going to have to take even more steps back. Please tolerate my extremely nitpicky deconstruction of the problem, I think it leads us to an interesting conclusion.

Why “defense” must fail and may be more dangerous than helpful

I want to make something very clear: I take AI safety extremely seriously (though with a few caveats I’ll discuss in Part 3). I’m from the MIRI school of thought that AI and its safe use is the most important issue of our time. But, similar to MIRI, my thoughts on the topic are a bit different from many mainstream views. (Just to be clear: I am in no way affiliated with MIRI, though I wish I was)

With that said, I think a lot of the thoughts around GPT2 are, unfortunately, poisoned by our current political meme-sphere. This isn’t a political post, I promise, but I do believe that the term “fake news” has become, at this point, one hell of a meme. I’m not dismissing the existence or danger of fake news, but I feel like the term has become so vague and politicized that people aren’t always putting in enough thought about what fake news actually is, how much of a threat it is, and how to effectively combat it.

I can really recommend this series of mini documentaries on the topic. They’re really well made, and the fact Lithuania has an army of “elves” that counter-troll trolls is the single most amazing thing I’ve heard in a long time.

Fake news is real, pervasive and potentially very harmful. What fake news most definitely is not is new.

There is a tendency in every generation to think everything is so much worse than it used to be and, if one actually does a bit of reading on history, this quickly turns out to be one of the most ludicrous beliefs a majority of people hold. I highly recommend the books “The Better Angels of Our Nature” and “Enlightenment Now” by Steven Pinker and “Factfulness” by Hans Rosling to get a bit of a taste of how amazingly better things are nowadays, and how incomprehensibly hellish the world used to be in so many ways.

The views many people have about fake news also fall into this category. Yes, there are people that think vaccines cause autism. Yes, there are people that think the world is flat. Yes, there are people that hold even more dangerous views. And yes, often they gain these bizarre and sometimes dangerous ideas from the internet.

But have you SEEN the premodern era?

Yes, getting all of your news about the world off of your politicized Facebook feed isn’t good. You know what’s worse? Getting your views on not just politics, but also physics, biology, economics and who-to-burn-at-the-stake from your local religious official. This was the default for most of human history. I’d much rather have flat earthers than the Spanish inquisition, thank you very much (Didn’t expect to see them in this post, did you?).

“Ok”, you might be thinking, “Sure, fine. But we can do better than this! We still need to oppose falsehood, even if it’s less bad than it used to be.” And this I fully agree with. I’m not saying we shouldn’t oppose falsehood, actually the exact opposite.

Recently, OpenAI released a dataset of sample outputs from GPT2, with the express purpose of helping develop countermeasures to detect such fake text. I think this is not only misguided, but downright dangerous. The reason is simple:

The truth isn’t free.

Not only is the truth not free, it is costly. Finding out the truth about the world takes work, often a lot of work. If you want to know the truth about something, you will have to invest the work to find it; there’s no way around it.

Well, there is one possible way around it, the ultimate reason why we humans are so much more successful than every other creature on this planet in our achievements.

The most valuable commodity

Riddle me this: What is the most valuable commodity in a human economy?

If you thought of “money” or “oil”, I’d argue you aren’t thinking abstract enough. No, I’d argue the correct answer is “trust”.

What makes humans different from other animals? Why can we send spaceships to the moon and talk to people on the opposite side of the planet by literally shooting laser beams underneath the oceans? The obvious answer would be “intelligence”, but I think intelligence is only a part of the story. Take the smartest human to have ever lived, or let’s say a newborn clone of them, and put them in a cave. No education, no language, nothing. And now have them try to invent industrial technology. You’re going to be waiting a long time.

No, intelligence is an enabling factor for something even more powerful: Cooperation. By working together, humans are capable of achieving feats of physical and intellectual prowess that would be completely impossible for an individual. Imagine a world where humans literally never worked together, ever. There would be no accumulation of knowledge over the generations. Every time anyone figured out something useful, the next generation would forget it and have to start from scratch. Some generations and individuals would be better off, some wouldn’t, but overall absolutely nothing would change, there would be no progress. This is basically the state Chimpanzees are in.

But humans invented something powerful, the ability to trust and cooperate with complete strangers. Ultimately our success as humans has been in accumulating more and more useful cultural knowledge through these wide ranging interactions. I could write about this at length, but you should instead read “Sapiens” and “Homo Deus” by Yuval Harari, “From Bacteria to Bach and Back” by Daniel Dennett and other, much better treatises on the topic.

What matters for our discussion here is this: How dangerous and/or useful a piece of information is depends on how much we trust it.

If everyone immediately recognized fake news on sight and just didn’t trust it, there would be no threat from it other than it clogging up bandwidth. But we’ve already established that this can’t possibly be so easy. If you could immediately spot any lie, that would mean you were omniscient, which you (probably) aren’t.

And so we come to my point: I think methods to detect fake news, or hate speech or whatever, are dangerous, at least in the way they are currently handled. And the reason for that is both subtle and obvious, I think: If we have a system charged with detecting what we can and can’t trust, we aren’t removing our need to invest trust, we are only moving our trust from our own faculties to those of the machine. (I’d like to call out this phenomenal essay by Bruce Schneier that helped me come to this conclusion)

If I give you an article, and tell you absolutely nothing about where it’s from, how do you know whether it’s trustworthy? It’s tricky, you’ll have to read it, consider how it fits with your existing knowledge, how you might find out more to verify these claims etc. Your assignment of trust into what to believe here lies in your own bullshit-detection algorithms built into your brain and experiences.

But say I give you an article that was by the New York Times. Well, that’s a different story, right? The New York Times is a very reputable source of information, so even if it includes some information you’d normally be skeptical of, you’re probably a lot more inclined to update your beliefs, since you know you can trust this source.

But why do we trust the New York Times? Because the New York Times is composed of humans using their brains to do exactly what these detection algorithms try to do: Detect bad stories and find truth. You can see the New York Times as a great big, hybrid biological-synthetic, filtering algorithm, taking in information about the world, most of it noise, some of it deliberately misleading, and extracting valuable, (hopefully) true information for you to consume.

In theory, I see no problem with the existence of algorithms that do this without human intervention. Humans are machines like any others and have biases, so there’s no reason there couldn’t exist an AI algorithm that is even better at generating truth. But we better be damn sure it actually is better. And I think no currently existing technique can even scratch the capabilities humans have in this area, even with all of their biases. Current AI can literally still not tell correlation from causation, and we want that to be a judge over what information we see? This can’t end well.

We have to keep track of where our trust is. Because we cannot not trust, we have to trust something, we just have to do our best to trust the right things. If we use current types of algorithms to try and “defend” against “threats” such as GPT2, I think we will cause more harm than good. Because not only will they be incapable of recognizing and curating the truth, there is an even more profound, and uncomfortable, problem…

The uncomfortable truth: Babblers

I’ve spent a fair amount of time playing with AIs. Probing their outputs, looking at their datasets, tweaking parameters, all kinds of things. Since GPT2 has been out, I’ve spent a lot of time with text generating AIs. I remember one afternoon, I was hanging out with a friend and we were looking at some datasets we had found on the internet that we could maybe train our next AI on. We stumbled upon a dataset of “toxic comments” and took a look inside. After reading for a little bit, my friend asked a question much more profound than I think either of us realized at the time:

“Wait, these are the ones generated by the AI, right?”

No, they were not. These were real, human comments. But suddenly it clicked in my head as I looked at them. You know how you sometimes just feel like there’s nothing going on in someone’s head? That words come out of their mouth, but there is no comprehension? They form full sentences, express thoughts and wishes, but they just seem…like a zombie, impervious to reason. Probably you’re thinking of some political rival right now, because of your own human biases. Now don’t feel too high and mighty because I can guarantee that you have been that empty person, too. But seeing the texts, and GPT2, I finally had a concrete example to point to when describing that feeling. These comments, these people, they sounded like GPT2! Like a mere statistical concatenation of words and sentences!

As always way ahead of me, Robin Hanson, I found out, had already written about this exact topic, calling the phenomenon “babbling”. I strongly recommend you pause here and read his blog post, it’s not very long. Done? Good. (Sarah Constantin expands on this exact matter in her very interesting blog post)

So what does this mean? What it means is that, as far as I can tell, AI is not capable of generating coherent, true text. But what AI is capable of doing now, is babbling. And babbling is not just some minor part of human communication. I think there’s a strong case to be made that the majority of communicating we do is babbling. Just think about small talk or school essays. Though I think it goes much, much deeper than that.

And what does that mean for our conversation? It means that even if we had a system that can perfectly detect AI generated babbling and deployed it, it would censor not just AI, but a good chunk of real, human communication as well. We like to think we’re somehow fundamentally different from machines and computers, but we’re not. And GPT2 shows it in yet another new way.

What do we do with babbling?

So, ok, we’ve set up a framework for understanding our situation now.

  • We have AIs that are capable of doing something very similar or identical to the human behavior of “babbling”, and they’ll only get better from here. The genie’s out of the bottle.
  • We want to improve, or at least preserve, the quality of our online interactions in a world with these artificial babblers.

Obviously, we want to banish low quality babbling from our platforms. The way I see things, there are two axes along which one could identify such babble: The content of the babble itself, or its source.

Let’s take a look at the content of the babble. In theory, we could develop methods to detect babble and then just filter it out, like we do for spam email. But the point I’ve been trying to make in this part is that this cannot work, because, unlike spam, what differentiates AI babble from human babble isn’t just the style, but the content and its truth value. And as long as we don’t have systems that can automatically detect whether a political claim is true, this won’t work. We could filter anyway, and, you know, maybe that wouldn’t be all that bad, if all it filtered was low quality interactions, whether they originated from AIs or humans. But I doubt this. I think that the line between babbling and high level thought is blurry at best, and often plainly impossible to detect from a piece of text alone. So if we filter babbling, we’re filtering a lot of genuine, high quality human interaction, creating a weird, artificial environment designed by an opaque algorithm’s inscrutable preferences. Yikes.

What about the other side, the source? Well, if a new post arrived at our webserver, and we immediately knew it was generated by an AI, we could just simply deny it and we’re good! But of course, this doesn’t work, because we can’t actually know the source of a given piece of text. It’s just a floating blob of data, no author identity attached (usually). Even if it was, how would we know the author data is real?

One of the big reasons that fake news can confuse us so easily is that our minds evolved in an environment where the only source of high level, language-based information was other humans. Despite claims to the contrary, there were no talking animals, spirits or rocks (unfortunately). This means we have a kind of instinct wired into our mind that when we see text or speech, we know it’s made by a human, and that shapes how we trust that information.

But we’re not idiots, we know humans can lie and manipulate, so it’s not as if something being created by another human immediately makes us trust it. The reason you don’t trust some stranger coming up to you and making a wild claim is because it’s cheap (in the economics sense of the word) to tell a lie. It costs me basically nothing to claim the moon doesn’t exist (hah, those moon landing skeptics, actually believing the moon exists, what a bunch of sheeple!). Is that claim true? I leave this as an exercise to the reader.

But there are some things about the very nature of how humans work that are not cheap. For example, while it’s easy for me to claim the moon isn’t real, it’d be much harder for me to get 1000 people to claim the moon isn’t real (though not as hard as I wish it was). So if you heard just me claiming the moon isn’t real, you wouldn’t spare me much thought. But what if 1000 people claimed it? What if your whole family claimed it? What if the entire world and all its scientists claimed it? If this was truly the case, believing the moon was real would be the silly thing to believe!

Conformity gets a lot of hate, and rightfully so. But it exists for a reason. Because what makes us humans so powerful is cooperation, and part of cooperation is admitting you might be wrong. And if a huge amount of people, many of them probably a lot more qualified than you or me, claim something, it would be irrational not to change your opinion, unless you have some really damn extraordinary evidence.

The Biological Blockchain

Now here’s where things get weird (more weird than I’ve already made them, at least).

This whole idea works basically for one reason: Making more humans and convincing them of something is costly. In a way, human reproduction and maturation time fulfills the same purpose as Bitcoin’s proof of work algorithm.

Let me go on (yet another) little tangent to explain a few things about blockchains and distributed trust.

Say one day a friend comes up to me and says “I sent you $100 on PayPal”. How do I find out if I actually received money? Well, easy, I check PayPal and trust what they tell me. PayPal is an example of centralized trust. I trust PayPal to handle everything to make sure the transaction exists and was indeed valid (whether my friend had enough money to send to me, whether they actually sent it, various anti-fraud measures etc). So when I log into my PayPal account and see $100 sent by my friend, we can agree that it was a legitimate transaction that really happened.

But this system has a central point of failure: PayPal itself. There’s no technical reason one rogue employee (maybe said friend themselves) couldn’t have edited some lines in a database at PayPal, giving me 100 more dollars even though they never sent them. This means the database is now in an “inconsistent” state, because there’s now more money in the system than was put in, which probably isn’t something we want. But in theory PayPal could do anything they damn well please with their database, inconsistent or not (at least until the Feds show up, or customers lose all trust).

So is there some way to avoid having one central, fallible entity you have to trust? I want to make clear that there is no such thing as a fully trustless system (as people such as Bruce Schneier have been saying for a long time). But there is something called distributed trust, of which cryptocurrencies such as Bitcoin form the most notable examples.

The way it works is basically this: We have (a) some kind of commonly shared repository of information (the “blockchain”) that everyone can see. Now this isn’t too different from standing in a public square and talking to people. Information can be exchanged, but it’s still cheap to create false information. What makes blockchains special is that (b) modifying them in undesirable ways is very, very expensive (or impossible). This is usually accomplished by a method called “Proof of Work”. Basically, cryptocurrencies like Bitcoin use very clever algorithms to force users to create a “proof” that they have expended a certain amount of computation. Only with such a (one time use) proof can you then modify the blockchain in a limited way. And even this limited edit is only accepted if other people check it and verify it as true. This makes it so that reliably editing the blockchain in ways that would benefit you unfairly (such as by giving yourself more money than should exist) is extremely expensive, to the point it isn’t worth it or is practically impossible. Not actually impossible, just close enough (you’d need more than 50% of the computing power of all computers running Bitcoin, which is not impossible, but still a barely comprehensibly expensive endeavor).
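The “expend computation, then show a cheap-to-check proof” asymmetry described above can be sketched in a few lines of Python. This is a toy illustration only, not Bitcoin’s actual protocol (Bitcoin hashes block headers with double SHA-256 against a dynamically adjusted difficulty target; the function names here are my own):

```python
import hashlib

def mine(data: str, difficulty: int) -> int:
    """Brute-force search for a nonce whose SHA-256 hash starts with
    `difficulty` leading zero hex digits. Finding the nonce is expensive;
    checking it takes a single hash -- that asymmetry is the whole point."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(data: str, nonce: int, difficulty: int) -> bool:
    """Anyone can re-run one hash to confirm the work was done."""
    digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)

# Mining takes thousands of hash attempts; verifying takes one.
nonce = mine("send 1 coin to Alice", difficulty=4)
assert verify("send 1 coin to Alice", nonce, difficulty=4)
```

Note that the proof is bound to the data: change the message and the old nonce almost certainly stops verifying, which is why tampering with a committed record means redoing the work.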

The result of this is that you can trust the Bitcoin blockchain, as long as you don’t believe anyone has the super powerful compute necessary to tamper with it (and that the code doesn’t have any fatal bugs). Which is, in general, a very reasonable assumption. So trusting in the Bitcoin blockchain makes sense.

The reason this matters is that the human species forms a kind of biological blockchain. Trusted information is like transactions on the blockchain. They only get accepted as true if a significant portion of the people running the blockchain agree they are valid. Humans work the same way. We accept things as “common knowledge” and “true” once a critical amount of people accept the new information (this isn’t the only way we find truth, but it is an important way). Of course, it’s infinitely more messy and complicated than a nice, clean, mathematically precise blockchain, but the principle is the same.

We have a) a common data repository in the form of our common beliefs, and b) a method to make it hard to tamper with: You need to get a lot of people to accept and spread your new info. And the reason this is hard is because there’s an inherent limitation to how many humans there are and how easily you can access and convince them. Remember:

“You can fool all the people some of the time, and some of the people all the time, but you cannot fool all the people all the time.” — Abraham Lincoln

Even Lincoln understood the concept of the biological blockchain! It made perfect biological sense for our brains to associate “lots of people believe X” with “X must be true”. If I wanted to convince my stone age tribe of my newfangled idea, it was hard, it was costly. Not like just making a wild claim at all! And if I succeeded, it was a reliable sign that my idea had been vetted and passed at least some levels of reasonable scrutiny. Again, this method is by no means perfect, but it’s a hell of a lot better than nothing (recall the Chimpanzees).

Undermining the Biological Blockchain

And finally this gets me to my point: This is why fake news is dangerous.

I catch myself doing this all the time: I read some reddit headline I don’t know much about and read the first comment or two below it and, just instinctively, without thinking, I believe what those comments say is true. I’ve genuinely trained myself to notice when this happens and try to snap out of it, but it’s hard. It’s an inbuilt system of our brain, it’s how we humans think. And it makes sense to think like this. Looking at those top comments, my brain sees speech, which means it was created by real humans, with real human brains, and not only that, they have a ton of upvotes too, meaning a lot of other, real humans believe this! And to add to that, as Daniel Kahneman (the father of cognitive biases research) says, “What You See Is All There Is”. My brain sees a large amount of my fellow humans committing a piece of information to the biological blockchain, and nothing else, so I rate it as trustworthy.

And generally, even on the internet, this isn’t half bad. The top voted comments usually are better (though by far not always). But this system breaks down in several ways, many of which I’m sure you can think of (such as how infuriating messages can spread more than their quality merits), but for now let’s just focus on the one most relevant to our discussion.

The failure mode I’m interested in is the one that the internet has enabled to whole new degrees: This whole heuristic of believing “popular” things works if, and only if, our “proof of work” holds, meaning that “making” more humans that believe our thing is expensive. But with the advent of mass media, at least as far back as the invention of print and broadening much further with the internet, this no longer holds quite so cleanly.

The reason for that is, as we discussed previously, we don’t (usually) see the source of a piece of information at all. So instead, if we want to judge whether to apply our biological blockchain heuristic to this new piece of information, we first use some methods to try and determine whether the source is a real human (which is the requirement for us adding this new piece of information to the biological blockchain). Our oldest method was simple: “If it’s speech/writing, it’s human”. And for most of history this worked fantastically well. If I heard you tell me something, I could be sure of its source. And even after writing was invented, I knew writing could have only been produced by another human (though a biased sample of humans, since so few could write).

This basic heuristic has been falling apart ever since the printing press was invented (and was already damaged after the invention of writing), since now a small number of humans could produce amounts of text that, if made by actual humans by hand, would have required an army. As the cost of producing text fell, it became a less and less reliable sign of the text’s origin. And this trend has only continued. Audio recordings lowered the cost and value of speech; video recordings, of sight. And the internet brought these costs crashing down even further.

Take simple text. If it was the middle ages and I got 100 handwritten signatures from different people supporting my point, that meant quite a lot. But how much would you trust my point if I presented you with an unmarked webpage of 100 names? Not at all, hopefully. Ever since the earliest days of the internet, people have figured out that generating text is extremely cheap.

So, being the clever apes that we are, we started developing countermeasures. Some of these, like spam filters, were based on filtering the content, because while producing simple text was cheap, producing high quality (even babbler quality) text was still expensive. Other methods instead tried to identify whether the source was human, such as CAPTCHA, which works reasonably well in many situations, though there are plenty of ways for robots to evade it, and sometimes it has some other kinds of problems as well.
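To make the content-filtering approach concrete, here is a minimal sketch of the classic technique behind early spam filters, a word-frequency Naive Bayes classifier. The tiny training phrases are invented for illustration and nothing here resembles a production filter; the point is that it scores surface word statistics, which works against crude spam but says nothing about whether a claim is true:

```python
import math
from collections import Counter

# Hand-made toy corpora -- purely illustrative.
spam = ["buy cheap pills now", "cheap pills cheap deals", "win money now"]
ham = ["let us meet for lunch", "the meeting is moved to friday", "see you at lunch"]

def train(docs):
    """Count word occurrences across a corpus."""
    counts = Counter(word for doc in docs for word in doc.split())
    return counts, sum(counts.values())

spam_counts, spam_total = train(spam)
ham_counts, ham_total = train(ham)
vocab = set(spam_counts) | set(ham_counts)

def log_likelihood(text, counts, total):
    # Laplace smoothing so unseen words don't zero out the score.
    return sum(
        math.log((counts[w] + 1) / (total + len(vocab)))
        for w in text.split()
    )

def looks_spammy(text):
    """Classify by whichever corpus makes the text more probable."""
    return log_likelihood(text, spam_counts, spam_total) > \
           log_likelihood(text, ham_counts, ham_total)

print(looks_spammy("cheap pills"))    # → True
print(looks_spammy("lunch friday"))   # → False
```

Notice what this filter keys on: vocabulary, not meaning. A babbler that uses ordinary words in fluent sentences sails right through, which is exactly why style-level filtering stops helping once the cheap text is no longer stylistically cheap.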

This meant that, for the most part, if we saw a somewhat coherent, complex piece of text, it was still reasonable to assume it was generated by a human and let its content affect our biological blockchain.

And this is my core thesis: The bar for what we consider acceptable proof to influence our biological blockchain has steadily risen over the years, and GPT2 represents the latest rise in that bar, marking the point where we can no longer trust “babble”-level text.

Now I think there’s a strong argument to be made that we passed that point a long time ago, basically since the concept of written propaganda was invented. But GPT2 is the moment where the cost of babbler-level text generation has fallen to basically nothing, just as the internet did for cheap, low-quality spam email a decade or two earlier. (For those curious, here’s a very interesting Defcon presentation about email spam)
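To make “basically nothing” concrete: even a first-order Markov chain, the crudest babbler there is, takes about a dozen lines. (The corpus below is a made-up toy; the point is the economics, not the quality — GPT2 is vastly better, but no more expensive per word once trained.)

```python
import random
from collections import defaultdict

# Build a first-order Markov chain from a toy corpus: for each word,
# record every word that has followed it.
corpus = (
    "the cost of text has fallen and the cost of speech has fallen "
    "and the price of trust has risen with the cost of text"
).split()

chain = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    chain[prev].append(nxt)

def babble(start, length, seed=0):
    random.seed(seed)  # fixed seed so the output is reproducible
    words = [start]
    while len(words) < length:
        followers = chain.get(words[-1])
        if not followers:
            break  # dead end: no recorded follower for this word
        words.append(random.choice(followers))
    return " ".join(words)

print(babble("the", 12))  # superficially fluent, semantically empty text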

Is there anything we can do? I think there is no easy solution. I think we have reached a point where it is no longer, in general, possible to determine whether a given text is human generated or not. But I think it’s not a single “point” at all, but a spectrum, stretching from the simplest spam emails of the 90s to GPT2 today, to who knows what tomorrow.

I think this is an intrinsically hard problem that cannot have an easy solution. What do we want to do, give every human a “consciousness license” that they have to show when writing a comment? Honestly, I think it would improve the quality of our biological blockchain, but I shouldn’t have to spell out the cyberpunk-dystopian undertones. The best tool for this we currently have remains the human brain. We should focus on educating humans on the realities of their own biases and how information is created on the internet and elsewhere.
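To illustrate why I don’t put my trust in detection algorithms, here is a deliberately naive, entirely hypothetical detector that flags text with an unusually repetitive vocabulary:

```python
def looks_generated(text, threshold=0.6):
    """Flag text whose type-token ratio (unique words / total words)
    falls below a threshold -- a crude, easily fooled heuristic."""
    words = text.lower().split()
    if not words:
        return False
    return len(set(words)) / len(words) < threshold

repetitive = "the cat sat on the mat and the cat sat on the mat again"
varied = "yesterday my neighbor repainted her fence a startling shade of green"

print(looks_generated(repetitive))  # True: flagged as "generated"
print(looks_generated(varied))      # False: passes as "human"
# Any generator that varies its vocabulary slips straight through, and
# terse, repetitive human writing gets falsely flagged: the arms race
# in miniature.
```

Real detectors use far more sophisticated statistics, but they sit on the same spectrum: every published detection signal is a training target for the next generator.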

I really do think that if people just knew that something like GPT2 exists and is out there in the wild, it would force them to improve their standards for what information they trust. It was the same way with spam email. When the Nigerian prince and other similar laughably primitive scams first came into being, they worked. It was the same with computer viruses. Many of the first viruses worked because no one even thought that a file could be malicious. Why do security researchers hack into systems and reveal their flaws? Because it’s the only way to get people to listen: you have to prove that your threat actually exists (and even then, any security expert, or climate activist, could tell you plenty of stories of how many people still don’t take the risks seriously enough). People only develop resistance when they know a threat is real.

And this is why I am planning on releasing 1.5B to the public, on the 1st of July. I’m waiting because I want to give people time to convince me I’m wrong if I missed something. Because pretending these kinds of things don’t exist or are “only in the hands of good people like OpenAI” is a recipe for abuse by secretive parties that are much less benign than OpenAI or me. I think we should make it as widely known as possible that GPT2 and other babbling methods exist and are in the hands of many, and, like with email spam and viruses, we need to adapt.

You can’t put the genie back in the bottle.

Bringing it all together: GPT2, babbling and trust

So, finally, here we are, with my main arguments laid out. To recap:

  1. Current AIs can’t generate valuable truthful text, but they can babble.
  2. A large portion of real human interaction is babbling.
  3. Current techniques cannot detect or generate truth.

The consequences I draw from these arguments are:

  1. GPT2 is not uniquely dangerous, but is rather the latest step in an inevitable trend. It may or may not lower the cost to generate believable text, but powerful organizations can already produce massive amounts of much better content with humans and other algorithms, and GPT2 is not out of their reach anyway.
  2. Fighting low quality babble, AI generated or not, is and will remain critical, but I think moving trust into demonstrably flawed “anti babble” algorithms is the wrong way to go. Someday, systems worthy of our trust will exist. Today is not that day.
  3. Instead of training people to rely on flawed algorithms, we should help them find the correct places to place their trust, first and foremost by training their own critical thinking skills. The human brain is currently still by far the most powerful truth generating machine we know.
  4. To catalyze that adaptation of humans, I think these methods should be spread as widely as possible, so no one can hide behind a false sense of security. To further this (and put my money where my mouth is), I am planning to release 1.5B to the public.

This is the core of my arguments and this essay. This is the part directly related to GPT2 and its consequences, but I have one more thing to discuss that I think is worth talking about. This is basically a separate topic but I think it adds something to our current discussion.

HACKER [originally, someone who makes furniture with an axe] n.
1. A person who enjoys learning the details of programming systems and how to stretch their capabilities, as opposed to most users who prefer to learn only the minimum necessary.
2. One who programs enthusiastically, or who enjoys programming rather than just theorizing about programming.
3. A person capable of appreciating hack value (q.v.).
4. A person who is good at programming quickly. Not everything a hacker produces is a hack.
5. An expert at a particular program, or one who frequently does work using it or on it; example: “A SAIL hacker”. (Definitions 1 to 5 are correlated, and people who fit them congregate.)
6. A malicious or inquisitive meddler who tries to discover information by poking around. Hence “password hacker”, “network hacker”.

– The Hacker’s Dictionary

In this essay, I’ve tried to keep myself out of it, because I think who I am doesn’t matter for my core points (which, hopefully, rest on their objective arguments alone). But for this third part, I’d like to talk about a topic that might seem a bit unrelated: The psychology of what I term “The Curious Hacker” (I would have just called it “The Hacker”, but that term has unfortunately gained many negative connotations).

For some people, the fact that I did all of this, created a “dangerous” AI at no financial gain to myself (quite the opposite: this cost me a lot of time and nerves that could have been spent on more directly productive things), and then want to release it to the public, is baffling. Many of the people I expect to be reading this essay will find my reasons for acting this way blatantly obvious and understandable. And that’s because they’re like me: they’re curious hackers.

But to many people, the minds of people like me are quite strange, and they don’t understand why we do these kinds of things. This isn’t because they can’t or don’t want to. It’s just hard to understand people who think differently from you and those around you. But I think curious hackers are an incredibly important factor in our lives, even if we might not realize it, and understanding their psychology is necessary to understanding our world, and how to approach arising problems.

So I’d like to invite you into my mind for a bit to understand why I do the things I do, why I think there are other people like me that are much more important, and why understanding those people will be critical.

What makes the Curious Hacker?

200 hours of work. That’s around what I’d estimate I’ve invested into this project. Now, not all of this was the highest-quality work, a lot of it was stuff like staring at a black screen full of numbers waiting until something happened, but a lot of it was hardcore, 8+ hours a day coding and reading scientific papers. And I wasn’t paid a cent for any of it. No one told me to do this, no one supported me in doing this (other than Google, which provided the hardware), I could have spent that time with my friends or studying for university or doing literally anything else.

So why did I do it, if not for profit or because I was told to? I’d like to say it was for all the reasons mentioned in Part 2, that I had this grand philosophical/moral code I followed unerringly to this final goal of advancing what I believe to be true and right. And I think many people similar to me try to pass that story off for their own projects: that they acted with well-thought-out, responsible judgement ahead of time. But that’s not true (at least for me, usually).

I did it because, dammit, it was cool!

And this is I think the defining characteristic of the curious hacker. I do crazy things, often hard and with no clear payout, because they’re cool (and, in my personal case even more often, because they’re funny). The curious hacker does things because they can. There’s an intrinsic joy for me (and people like me) in doing hard things, not because they are easy, but precisely because they are hard (though some go too far).

GPT2 was not only incredibly cool, but also incredibly funny, which is just irresistible to me. Add to that the mystique of something you’re not supposed to do, and you have the curious hacker hooked like a fish. It’s the perfect storm of factors to make someone like me obsessed with it.

Curious Hackers matter, a lot

The curious hacker likes to play. They like to take hard, complicated things and tinker with them, break them, reassemble them. Someone like me with a proper science or math problem is like a cat with a ball of yarn. We like doing this. (Why do you think scientists accept the terrible working conditions they all too often endure?)

And I’m not alone here. This very specific blend of curiosity, playfulness and obsessive love of solving problems is a potent cocktail of character traits that you see over and over in a very specific kind of person: Scientists.

I’m just some student nobody, but what about people like Linus Torvalds, Albert Einstein and Richard Feynman? (Or, maybe more directly, people like Bill Gates, Larry Page and Steve Wozniak) If you read anything about their personalities, you will quickly be struck by just how much of the curious hacker they have in them. How did Einstein come up with his famous theory of special relativity? When he was 16, he tried to imagine what it would be like to ride along with a beam of light. No one was paying him, he was just playing around with a neat idea.

Some people outside of academia imagine scientists to be very rigid, formal people, and imagine that research happens along a very formal schedule. (Yes, to my academic and techie readers, I know this thought is weird to you, but many do believe it; research is a foreign world to many people.) Sometimes this is true, but, usually, nothing could be further from the truth. The best scientists are creative, curious and, more often than not, weird. You see it time and time again when you hear scientists explain why they got into science, or how they came up with their greatest ideas: They follow their intrinsic curiosity more than any kind of extrinsic motivators. It’s crazy how many absolutely vital technologies of our time exist just because some guy was playing around with something in a lab that one time. Remember:

“The only difference between screwing around and science is writing it down”
Alex Jason

We have curious hackers to thank for modern physics, computers, the internet and who knows how much more of our modern world. These people matter, they make a difference, which is why I think understanding their psychology is important, the same way understanding the personalities of political leaders is important (which tend to be very different).

But, I also think it’s important for another reason: For when these, very positive and benign seeming, personalities become dangerous.

The Dark Side of the Hack

Curious hackers made the internet, curious hackers made lots of great things. Now I hate to be the guy that always goes to this place when talking about AI, but do you know what was definitely made by curious hackers? The nuclear bomb.

Feynman’s account of his experiences of the Manhattan Project in his book “Surely You’re Joking, Mr. Feynman!” reads like a National Lampoon movie script. He plays pranks on his fellow scientists, gets into hilarious hijinks with administration and experiences all kinds of awesome stuff. I remember first reading that as a teenager, and all I could think was “Holy crap, this sounds fun!”. And it wasn’t just Feynman either, a lot of scientists have similar (though less borderline cartoonish) stories of the Manhattan Project. And I totally get why. You are allowed to work on the coolest, craziest tech of your era, with basically no restrictions, no budget limitations, surrounded by the smartest and coolest people of your generation. That sounds like heaven to me.

But then came July 16th, 1945: the Trinity test. There are multiple accounts of how this affected the scientists witnessing it, and for many, it was a sudden, dawning horror.

“What have we done?”

A chilling quote from Robert Oppenheimer, a seemingly endless source of chilling quotes, illustrates this perfectly:

“When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success. That is the way it was with the atomic bomb.”
– Robert Oppenheimer

And this is why understanding how the curious hackers work, even if (or especially if) you are one, is so important.

Curious hackers are, for the most part, some of the nicest, most benign people you will ever meet. There were plenty of hawks at the Manhattan Project for sure, but you just need to look at how many ex-Manhattan scientists ended up as some of the most vocal anti-nuclear activists to understand how terribly wrong things went.

I’m probably one of the most benign and best intentioned people you could imagine. I don’t want to hurt anybody, I never have. And that lulls you into a false sense of security. “I don’t want to hurt anybody, so I just won’t do it, easy! I’m a good person, I would never do bad things. My actions could never be a danger to others.”

But many curious hackers, including myself, often are, for better or worse, slaves to their curiosity. Sometimes, while feverishly hacking away at my latest AI abomination (GPT2 is not my weirdest project, believe me), I pause for a moment, cyberpunk synthwave music blaring from my headphones, and wonder to myself: “Am I a Black Mirror character?”

I’m as anti-war and pro-human as you could imagine, but if I had been alive in the 40s, and the US had offered me work on the Bomb…I don’t know if I could have said no.

Not because I wanted to hurt people, this is the important thing to understand about the curious hacker, but because it was just so damn cool. Splitting the literal building blocks of matter itself to make a giant explosion? That’s fucking awesome!

If curiosity really does kill the cat, then, well…meow

Embracing the Curious Hacker

Now there will be plenty of Very Serious People that read this and scoff in patronizing dismissal, saying something like: “So what you’re saying is you’re rationalizing your cruel, selfish nature with a cute label? Just don’t do those bad things!”

This is nothing more than an applause light, something that’s meant to sound wise and great and make everyone nod in agreement and clap, but that lacks any kind of actual useful contribution. My entire point is that, yes, there are bad people that do bad things for bad reasons, and plenty has been said about those kinds of dangerous behaviors. But the curious hacker is not that archetype; they do dangerous things for different reasons. We see plenty of media portrayals of the “traditional” personality traits that lead to dangerous behavior (greed, sadism etc.), but I think too little attention is paid to the source of danger that is the curious hacker archetype. (Tony Stark from the Marvel Cinematic Universe is a notable exception. I often get weird stares when I express that I completely understand why Tony built Ultron in Avengers 2. Many people find his behavior in that movie completely stupid and incomprehensible.) Everyone understands that bad people do bad things, but it’s hard for us to grasp how good people can be led to do bad things (even though it’s arguably the default case in the real world).

We shun the curious hacker at our own risk. Demonizing people, especially smart and well meaning people, for mistakes they make is terribly short sighted. Curious hackers have given us many of the best things the modern world has to offer, and they will continue to do so. Because they can’t stop, they know they can’t.

Instead of taking this as a criticism of the curious hacker, I want this to be a call for understanding and self improvement. We, both the curious hackers and not, need to understand the light and dark sides of the archetype. So many of the problems we see coming out of Silicon Valley exist because it’s basically become an enclave of curious hackers, a digital 21st century Manhattan Project (which, again, sounds like heaven to me).

For those that are not curious hackers, I want you to understand the way they think. Of course there are exceptions, but in general, curious hackers are not bad people. They are often some of the most idealistic and well meaning people there are. As Yuval Harari (a notable Silicon Valley critic) puts it:

“I’ve met a number of these high-tech giants, and generally they’re good people. They’re not Attila the Hun. In the lottery of human leaders, you could get far worse.”
Yuval Harari

Curious hackers do bad, dangerous things, but so do we all. And by understanding each other, we can forgive, and find ways to improve.

And to those of you that are curious hackers, the message from me is a simple one: Be aware of your dark side. You might not feel like you’re doing anything bad, and I believe you. I believe that you are only acting out of playful curiosity and optimistic hope for mankind. How could there be anything more pure than that? But history teaches a clear lesson. Until 2014, almost no one even took AI safety seriously. Many still don’t, even though all it takes to knock down most counterarguments is one Stuart Russell lecture. We need to be better.

We need to not shun the curious hackers, but embrace them. We need to use all the good they are capable of and find responsible ways of managing the risk.

And so I too am trying to improve, in what little ways I can. I could have dumped 1.5B directly onto the net. I wanted to. When I first read OpenAI wasn’t releasing 1.5B, I was pretty annoyed. Information wants to be free, dammit, I want to see! But once I had the genie, I took my time. I thought about what I wanted to do before letting it out of the bottle. I still came to the conclusion that I wanted to let it out, but now I did so with a reason, not just because it was cool. And that’s also why I’m waiting a bit longer to release, because I’m humble enough to realize that maybe I made a terrible mistake in my logic, and I shouldn’t release it after all. I’ve accepted my dark side and am doing what I can to handle it responsibly. And I think that’s good, because this is still low stakes, no one is going to die from GPT2 (probably). But what happens, in a decade or two, when I have my PhD, and the new Manhattan Project comes knocking?

I hope I’ll have learned my lesson.