This AI Can Generate Convincing Text—and Anyone Can Use It

The makers of Eleuther hope it will be an open source alternative to GPT-3, the well-known language program from OpenAI.

Some of the most dazzling recent advances in artificial intelligence have come thanks to resources only available at big tech companies, where thousands of powerful computers and terabytes of data can be as copious as free granola bars and nap pods.

A new project aims to show this needn’t be the case, by cobbling together the code, data, and computer power needed to reproduce one of the most epic—and potentially useful—AI algorithms developed in recent years.

Eleuther is an open source effort to match GPT-3, a powerful language algorithm that the company OpenAI released in 2020 and that is sometimes capable of writing strikingly coherent English articles when given a text prompt.

Eleuther is still some way from matching the full capabilities of GPT-3, but last week the researchers released a new version of their model, called GPT-Neo, which is about as powerful as the least sophisticated version of GPT-3.
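
For the technically inclined, the released GPT-Neo weights can be loaded through the open source Hugging Face transformers library. The snippet below is a minimal sketch, assuming the transformers package is installed and using the publicly released EleutherAI/gpt-neo-1.3B checkpoint; the prompt and sampling settings are illustrative, not prescribed.

```python
# Minimal sketch: generating text with GPT-Neo via Hugging Face transformers.
# Assumes `pip install transformers torch`; the model checkpoint is downloaded
# from the Hugging Face model hub on first use.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")
result = generator(
    "Open source AI research matters because",
    max_length=60,    # total length in tokens, prompt included
    do_sample=True,   # sample rather than greedy-decode, for varied output
    temperature=0.9,  # higher values produce more adventurous text
)
print(result[0]["generated_text"])
```

Swapping in a larger checkpoint name trades memory and speed for output quality, which is the same trade-off that separates the small and large versions of GPT-3 itself.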

Open sourcing big AI projects could make the technology more accessible and widespread at a time when it has become increasingly entrenched at big tech firms. It also could affect efforts to make money on the back of key AI advances and could increase the likelihood that AI tools will misbehave or be misused.

“There is tremendous excitement right now for open source NLP and for producing useful models outside of big tech companies,” says Alexander Rush, a computer science professor at Cornell University, referring to a subfield of AI known as natural language processing that’s focused on helping machines use language. “There is something akin to an NLP space race going on.”

If that’s the case, then GPT-3 might be considered the field’s Sputnik. GPT-3 consists of an enormous artificial neural network that was fed many billions of words of text scraped from the web. It can be startlingly eloquent, although it can also spout gibberish and offensive statements. Dozens of research groups and companies are seeking ways to make use of the technology.

The code for GPT-3 has not been released, but the few dozen researchers behind Eleuther, who come from across academia and industry, are drawing on papers that describe how it works.

Rush, who isn’t affiliated with Eleuther, says the project is one of the most impressive of a growing number of open source efforts in NLP. Besides releasing powerful language algorithms modeled after GPT-3, he says, the Eleuther team has curated and released the Pile, a high-quality text data set for training NLP algorithms.

Mohit Iyyer, a computer science professor at the University of Massachusetts Amherst, is using data and models from Eleuther to mine literary criticism for insights on famous texts, among other projects. This includes training an algorithm to predict which sections of a book such as Jane Eyre will be cited in a particular piece of criticism. Iyyer says this might help produce a program with a more subtle grasp of language. “We are definitely thankful that they aggregated all this data into one resource,” Iyyer says.

Perhaps the biggest challenge for any open source AI project is the large amount of computing power required. Training GPT-3 required the equivalent of several million dollars’ worth of cloud computing resources. OpenAI recently said the computing power required for cutting-edge AI projects had increased about 300,000-fold between 2012 and 2018, which works out to a doubling roughly every three and a half months.

The Eleuther project makes use of distributed computing resources donated by the cloud company CoreWeave as well as by Google, through the TensorFlow Research Cloud, an initiative that makes spare computing power available, according to members of the project. To make that patchwork of hardware usable, the Eleuther team created a way to split AI computations across multiple machines. But it isn’t clear how the computational requirements might be met if the project continues to grow.
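
Eleuther’s actual training stack is far more elaborate (the GPT-Neo codebase is built on Google’s mesh-tensorflow library for model parallelism), but the core idea of splitting one computation across machines can be sketched in a few lines. The toy example below, with NumPy arrays standing in for per-device shards, is purely illustrative and is not Eleuther’s code.

```python
# Toy illustration of model parallelism: one layer's weight matrix is split
# column-wise across "devices", each device computes its own slice, and the
# slices are concatenated afterward. Shapes and names are illustrative.
import numpy as np

n_devices = 4
x = np.random.randn(8, 512)     # a batch of activations
W = np.random.randn(512, 2048)  # a weight matrix too big for one device

shards = np.split(W, n_devices, axis=1)     # column-wise weight shards
partials = [x @ shard for shard in shards]  # each device's local matmul
y = np.concatenate(partials, axis=1)        # gather the partial results

assert np.allclose(y, x @ W)  # identical to the single-device computation
```

In a real system, each shard lives on separate hardware and the gather step is a network operation, which is why access to donated clusters matters so much.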

OpenAI is betting that GPT-3 can be commercialized. In July 2019, OpenAI received a $1 billion investment from Microsoft, which a year later got exclusive rights to license GPT-3. OpenAI says that over 300 GPT-3 projects are in the works, using a limited-access API. These include a tool for drawing insights from customer feedback, a system that auto-generates emails from bullet points, and never-ending text-based adventure games. Eleuther might make it easier to build similar tools without access to the GPT-3 API.
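
Those commercial projects reach GPT-3 through OpenAI’s hosted API rather than by running the model themselves. The sketch below shows roughly what a call looked like with the openai Python client of that era; the engine name, prompt, and key are placeholders, and access required an invitation to OpenAI’s limited-access program.

```python
# Minimal sketch of querying GPT-3 through OpenAI's hosted API, as the
# API looked at the time (openai Python package, Completion endpoint).
import openai

openai.api_key = "sk-..."  # placeholder; real keys are issued by OpenAI

response = openai.Completion.create(
    engine="davinci",  # a GPT-3 engine name from that era; illustrative
    prompt="Summarize this customer feedback:\n...",
    max_tokens=64,
)
print(response.choices[0].text)
```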

OpenAI declined to comment on the Eleuther project.


The project highlights another challenge with opening access to powerful AI systems. Because GPT-3 and similar large language models learn from vast amounts of text scraped indiscriminately from the web, they can reproduce biases or produce abusive or discriminatory speech. It’s also conceivable that a tool like GPT-3 could be used to generate fake news or fraudulent messages. This is one reason OpenAI has given for not releasing the full version of GPT-3.

The data set that Eleuther is using is more diverse than the one used to train GPT-3, and it avoids some sources, such as Reddit, that are more likely to include dubious material. Connor Leahy, an independent AI researcher and cofounder of Eleuther, says the project has “gone to great lengths over months to curate this data set, make sure that it was both well filtered and diverse, and document its shortcomings and biases.”

Rush, of Cornell, believes it’s better for such tools to be developed openly. “I find the closed-source argument in the exact wrong direction,” he says, noting that many academics are interested in studying the ways language models can misbehave and in finding solutions to the problem. “Open source efforts have been and will be essential to these efforts and progress,” he says.

