What Algorithms Can (and Can’t) Tell Us About Gender and Literature

M. Lynx Qualey

Staff Writer

M. Lynx Qualey is the founder of a website that brings together translators, authors, publishers, critics, academics, and readers around discussions of Arabic literature in translation. She works as a book critic, reader, editor, and ghostwriter. You can follow her at @arablit.

Last month, a story came out about five scholars who’d set up an algorithm to read 3.5 million books. The five co-authors were looking for adjectives and gender. They mapped those more commonly used to describe women, and those more commonly used to describe men.

The resulting paper, “Unsupervised Discovery of Gendered Language through Latent-Variable Modeling,” is hardly earth-shaking.

In a nutshell, co-authors Alexander Hoyle, Lawrence Wolf-Sonkin, Hanna Wallach, Isabelle Augenstein, and Ryan Cotterell write: “Positive adjectives used to describe women are more often related to their bodies than adjectives used to describe men.” Women are more likely to be called pretty. Men are more likely to be called interesting.

Study authors looked for the most common adjectives across 3.5 million books. Among the words most commonly used to describe women: beautiful, lovely, chaste, fertile. Among those most commonly used to describe men: just, sound, righteous, and rational.
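For a rough sense of what counting gendered adjectives involves, here is a minimal Python sketch. It is not the paper's method, which relies on latent-variable modeling; it simply tallies adjectives that sit directly before a gendered noun. The noun lists and the sample sentence are invented for illustration, and the adjective list borrows a few of the words quoted above.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only; the nouns are not from the paper.
FEMALE_NOUNS = {"woman", "girl", "mother", "daughter"}
MALE_NOUNS = {"man", "boy", "father", "son"}
ADJECTIVES = {"beautiful", "lovely", "chaste", "just", "sound", "rational"}


def count_gendered_adjectives(text):
    """Count listed adjectives that appear directly before a gendered noun."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = {"female": Counter(), "male": Counter()}
    # Walk over adjacent word pairs: (word, following word).
    for adjective, noun in zip(words, words[1:]):
        if adjective not in ADJECTIVES:
            continue
        if noun in FEMALE_NOUNS:
            counts["female"][adjective] += 1
        elif noun in MALE_NOUNS:
            counts["male"][adjective] += 1
    return counts


# Invented sample text, not drawn from any corpus.
sample = (
    "She was a beautiful woman and he was a rational man. "
    "The lovely girl met a just man."
)
result = count_gendered_adjectives(sample)
print(result["female"].most_common())
print(result["male"].most_common())
```

A count like this misses adjectives that are not adjacent to the noun ("the woman was beautiful"), which is one reason real studies lean on parsers and statistical models rather than word pairs.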

Lest you think there is some “rational” basis for these choices, the authors note: “Even within 24 hours of birth, parents describe their daughters as beautiful, pretty, and cute far more often than their sons.”

Fewer Women Characters?

Another recent paper to use large-scale data to look at gender in literature was “The Transformation of Gender in English-Language Fiction,” published in the Journal of Cultural Analytics in 2018. Co-authors Ted Underwood, Sabrina Lee, and David Bamman set up a series of machine-learning models to look at 104,000 works of fiction written between 1700 and 2010. According to The Economist, this body of 104,000 books “contains almost all classic novels, but only about half of the books that have been listed in Publishers Weekly[.]”

What did they find? For one thing, women apparently used to write more. The percentage of books written by women fell from around 50% at the start of the 19th century to less than a quarter by the 1960s. It rebounded to around 40% in the 21st.

They found that the share of women characters also plummeted. Their model, which promises to guess the gender of characters from names and pronouns “with more than 90% accuracy,” found that the “share of the narrative given to fictional women declined over 150 years, before recovering slightly.”
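As a very crude stand-in for that kind of measurement, one could count gendered pronouns in a text. The Python sketch below is an assumption-laden toy, not the study's model, which attributes text to named characters; the pronoun lists, function name, and sample sentence are all made up for illustration.

```python
import re

# Pronoun lists chosen for this illustration; the study's model also uses names.
FEMININE = {"she", "her", "hers", "herself"}
MASCULINE = {"he", "him", "his", "himself"}


def feminine_pronoun_share(text):
    """Return feminine pronouns as a fraction of all gendered pronouns."""
    words = re.findall(r"[a-z]+", text.lower())
    feminine = sum(word in FEMININE for word in words)
    masculine = sum(word in MASCULINE for word in words)
    total = feminine + masculine
    # No gendered pronouns at all: there is no share to report.
    return feminine / total if total else None


# Invented sample text, not drawn from any corpus.
sample = "She opened her book. He closed his door and she left him."
share = feminine_pronoun_share(sample)
print(share)
```

Pronoun counts say nothing about who is speaking or how much of the story a character carries, so this is at best a first-pass signal.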

The Limits of Algorithm Criticism

Of course, as an essay in Aeon notes, AI criticism is limited by its human trainers. And some number crunching seems ripe for overreach, as when Matthew Jockers declared there were only six (or sometimes seven) essential plots.

In a critique in the Chronicle of Higher Education, Nan Z. Da comes down hard on these big-data studies of literature. Da accuses this branch of the digital humanities not only of generating bad literary criticism but also of lacking rigor: “Its findings are either banal or, if interesting, not statistically robust.”

Da cites a 2019 book by Ted Underwood, Distant Horizons, that makes a range of assertions on the back of big data. These include, according to Da, that “gender becomes more blurred after 1840.” Da quite reasonably suggests that a “blurring” of gender descriptions might mean absolutely nothing. And indeed, it seems to contradict what the authors of “Unsupervised Discovery of Gendered Language through Latent-Variable Modeling” have found.

In general, algorithmic findings seem to hold up better, at least thus far, when they address clear-cut patterns (number of women characters) rather than messier ones (“essential plot types”).

In “Unsupervised Discovery of Gendered Language through Latent-Variable Modeling,” the co-authors acknowledge that their study has several limitations. First, their search ignores the demographics of the speaker. Second, it ignores genre, so that romance and cowboy fiction are all thrown into the same basket. And third, it ignores the time when a work was published: They looked through roughly 3.5 million books published between 1900 and 2008. Hopefully, books published in 2006 were at least edging away from “chaste,” “barren,” and “vivacious” as ways of describing women.

Insights with Google Ngrams

Unscientifically, it’s always interesting to check out the Google Ngram to see when words fall in and out of use in the Google Books corpus.

Use of the word girlish, for example, peaked around 1900. Use of the word slut zigzags, going up and down. It peaks in 1900, goes down, heads up around 1930, and then takes off in 1980. After that, slut continues its rise. Whore, on the other hand, was more popular in 1810—at least according to Google Ngram—than it is today.

Interestingly, rape is something we didn’t talk about much before 1970, and that goes double for sexual assault.

Surely, there are things we can learn from algorithms about the way we talk about gender over time. But also, as with any bad use of data, much fluff and nonsense.