Breaking News: Gender inequality in the Press

A study of how different genders are represented in newspapers: their proportions, the themes of their quotes, and a comparison between English-speaking countries over the years 2015-2020.

Most work-related domains have long been dominated by men. In recent decades, however, the rights and voices of women and gender-diverse individuals have gained awareness and freedom. How does this evolution translate to newspapers? How differently are women and gender minorities represented in English-language newspapers compared to men?

This exploration was conducted using the Quotebank dataset, a corpus of speaker-attributed quotations extracted from millions of English news articles published on the web between August 2008 and April 2020. To obtain more information about the speakers of the quotations, a Wikidata set of speaker attributes was used alongside it.

Gender Representation

Distribution of genders in the most read newspapers in the United Kingdom

In the United Kingdom, women and men each make up roughly half of the population. This balance is not reflected in most newspapers, however. Across a selection of 14 newspapers, ranging from tabloids to more serious dailies, a significant disparity in the gender of quoted speakers can be seen without exception: women are underrepresented.

The disparity remains more or less constant throughout the 5-year span, with the average share of women ranging between 13% and 37% depending on the newspaper. Women are best represented in the Daily Mail and the Daily Star, which are both tabloids.
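The share of women per newspaper and year can be computed with a simple count. The following is a minimal sketch, assuming each quotation has been reduced to a (newspaper, year, speaker gender) record; the records and the function name `gender_share` are made up for illustration and do not come from the project notebook.

```python
from collections import defaultdict

# Hypothetical records: one (newspaper, year, speaker_gender) tuple per quotation.
quotes = [
    ("Daily Mail", 2015, "female"),
    ("Daily Mail", 2015, "male"),
    ("Daily Mail", 2015, "male"),
    ("Financial Times", 2015, "male"),
    ("Financial Times", 2015, "male"),
    ("Financial Times", 2015, "male"),
    ("Financial Times", 2015, "female"),
    ("Daily Mail", 2016, "female"),
    ("Daily Mail", 2016, "male"),
]


def gender_share(records, gender):
    """Return {(newspaper, year): share of quotations whose speaker has `gender`}."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for newspaper, year, speaker_gender in records:
        totals[(newspaper, year)] += 1
        if speaker_gender == gender:
            matches[(newspaper, year)] += 1
    return {key: matches[key] / totals[key] for key in totals}


shares = gender_share(quotes, "female")
for (newspaper, year), share in sorted(shares.items()):
    print(f"{newspaper} {year}: {share:.1%}")
```

Averaging these yearly shares per newspaper would give one figure per publication, comparable to the averages quoted above.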

Statistical hypothesis testing on the UK data (more information in the notebook) shows a correlation between time and the percentage of women in the following newspapers: Daily Mirror, Daily Express, i, Financial Times and The Sun.
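The exact test used is described in the notebook and is not reproduced here. As one possible illustration, the sketch below runs an exact permutation test on the Pearson correlation between year and share of women for a single newspaper. The yearly values are made up, and both the choice of test and the usual 0.05 threshold are assumptions rather than the authors' method.

```python
from itertools import permutations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys):
    """Two-sided exact permutation test of H0: xs and ys are not associated."""
    observed = abs(pearson(xs, ys))
    at_least_as_extreme = 0
    total = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so that floating-point ties count as equal.
        if abs(pearson(xs, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


years = [2015, 2016, 2017, 2018, 2019, 2020]
share_women = [0.18, 0.19, 0.21, 0.22, 0.24, 0.25]  # made-up yearly shares

r = pearson(years, share_women)
p = permutation_p_value(years, share_women)
print(f"r = {r:.3f}, p = {p:.4f}")
```

With only six yearly points there are 720 orderings, so every permutation can be enumerated rather than sampled. A small p-value would suggest that the share of women in that newspaper changes with time.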

Overall, it is difficult to establish a link between the type of newspaper and how it represents women. Each publication seems to follow its own general tendency, which is not necessarily related to its content.

But are women the only gender underrepresented?

Detailed view on gender minorities

Quotebank also includes gender-diverse individuals, such as non-binary, transgender or genderfluid people (see Lexicon for more details). Statistics on the representation of these gender minorities are extremely hard to obtain. They either do not exist yet or are biased because part of society is still sceptical about the existence of genders other than cisgender male and female. A survey conducted in 2011 by the Equality and Human Rights Commission estimates that they make up between “0.1-2% of the general population depending on the inclusion criteria and geographic location.”

Given that these minorities make up between 0.1% and 2% of the population, it is harder to assess whether they are really underrepresented in the press. For the selected newspapers, their representation stays constant throughout the 5-year span, and the average fluctuates between 0.05% and 0.8%, which is broadly consistent with that estimate.

Can these first conclusions be extended to some other countries with a completely different cultural background?

Comparison with other countries

The United States, India and Nigeria are all English-speaking countries located on different continents.

As before, no dependence on time is observed, and no conclusion can be drawn about the link between gender representation and the type of newspaper. However, an interesting new observation can be made: the averages over the 5-year span, whether for women or for gender minorities, differ among countries.

This may be explained by each country's history and its consideration for women and gender-diverse individuals. Nigeria still does not recognize LGBT rights, and women there face numerous inequalities and difficulties compared to men. This is reflected here: Nigeria has the lowest proportion of women (<10%) and of gender-diverse individuals (<0.1%). The disparity is thus even greater than in a European country like the United Kingdom, where the average is above 30% for some newspapers and where people can be seen as more open-minded and understanding on the question of gender diversity. That being said, inequalities are still present everywhere. A similar analysis can be made for India, where the situation for women and gender-diverse individuals can also be full of difficulties and unfairness.

For the United Kingdom, we can see an overall evolution over time. As we saw earlier, this effect is caused by only a few newspapers, and these newspapers do not seem to share a common content type.

This supports the hypothesis that, in recent decades, the rights and voices of women and gender-diverse individuals have gained awareness and freedom in some countries, and that this progression is also captured in the newspapers. On the other hand, the process is still very slow, and its pace depends on the country and its society.

In the end, it boils down to whom journalists decide to quote. This could depend on women's roles in society, that is, on whether women are considered relevant enough to be cited.

Topics analysis

To further analyze the gender distribution in the press, it is interesting to study which topics each gender talks about the most. This more in-depth analysis focuses solely on the UK newspapers. To perform the topic analysis, we used Empath, a tool created by researchers at Stanford University in 2016. It uses a combination of deep learning and crowdsourcing to analyze text across its roughly 200 pre-existing categories and outputs the most probable topics of the text.
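At analysis time, this kind of tool essentially counts how many words of a text fall into each category's word list. The sketch below illustrates the idea in plain Python with a tiny hypothetical lexicon; `TOY_LEXICON`, `score_text` and `top_topics` are illustrative names and are not part of Empath, whose real categories and word lists are far larger. With the Empath Python client itself, the equivalent call is `Empath().analyze(text, normalize=True)`.

```python
import re
from collections import Counter

# Tiny hypothetical lexicon for illustration; Empath ships its own word lists.
TOY_LEXICON = {
    "business": {"market", "company", "profit", "investors"},
    "positive_emotion": {"proud", "happy", "love", "grateful"},
    "family": {"mother", "father", "children", "family"},
}


def score_text(text, lexicon, normalize=True):
    """Count lexicon hits per category, optionally divided by the token count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in lexicon.items():
            if token in words:
                counts[category] += 1
    scores = {category: float(counts[category]) for category in lexicon}
    if normalize and tokens:
        scores = {category: value / len(tokens) for category, value in scores.items()}
    return scores


def top_topics(scores, k=2):
    """Return up to k categories with the highest non-zero score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [category for category, score in ranked[:k] if score > 0]


quote = "I am immensely proud of my family and grateful for their love."
scores = score_text(quote, TOY_LEXICON)
print(top_topics(scores))
```

Normalizing by the number of tokens keeps long and short quotes comparable, which matters when scores are later averaged over many quotations.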

What is the distribution of genders in topics?

In the following analysis, keep in mind that men are overrepresented, so the topic analysis is more precise for them. By contrast, there are very few quotes from gender-diverse speakers, so the analysis for this group cannot be generalized to other cases.

What does each gender talk about?

Here are the most cited quotes of early 2020 per gender:

“[Trump’s tweets] make it impossible for me to do my job.” William Barr (US politician), male, 07/02/2020.

“[…] work to become financially independent.” Duchess of Sussex, female, 08/01/2020.

“They are standing for us, and I am immensely proud of them,” Rose McGowan (actress and activist), non-binary, 05/01/2020.

The 10 most popular topics per gender:

The most popular topics are highly similar between genders, but their relative importance differs: the highest topic score for men is business, while for women it is positive emotion and for gender-diverse individuals it is optimism. This result is in line with the tendency of our society to put pressure on women to be likeable and to come across as nice and non-aggressive, while men are not subject to the same social pressure and tend to be more direct in their speech. For gender-diverse individuals, we can notice the predominance of an emotional lexicon, which suggests that they talk more about personal experiences than about facts in the general news.
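A ranking like this can be obtained by averaging the per-quote topic scores within each gender and sorting the result. The following is a minimal sketch under that assumption; the scored quotes are made up and the function names are hypothetical, not taken from the notebook.

```python
from collections import defaultdict

# Hypothetical per-quote topic scores, one (gender, scores) pair per quotation.
scored_quotes = [
    ("male", {"business": 0.20, "positive_emotion": 0.05}),
    ("male", {"business": 0.10, "positive_emotion": 0.10}),
    ("female", {"business": 0.05, "positive_emotion": 0.25}),
    ("female", {"business": 0.05, "positive_emotion": 0.15}),
    ("gender-diverse", {"optimism": 0.30, "business": 0.00}),
]


def mean_topic_scores(records):
    """Average each topic score over all quotations of the same gender."""
    sums = defaultdict(lambda: defaultdict(float))
    n_quotes = defaultdict(int)
    for gender, scores in records:
        n_quotes[gender] += 1
        for topic, score in scores.items():
            sums[gender][topic] += score
    return {
        gender: {topic: total / n_quotes[gender] for topic, total in topics.items()}
        for gender, topics in sums.items()
    }


def top_topics_per_gender(records, k=10):
    """Return the k topics with the highest mean score for each gender."""
    means = mean_topic_scores(records)
    return {
        gender: sorted(topics, key=topics.get, reverse=True)[:k]
        for gender, topics in means.items()
    }


ranking = top_topics_per_gender(scored_quotes, k=10)
for gender, topics in ranking.items():
    print(gender, topics)
```

Dividing by the number of quotations per gender, rather than comparing raw sums, keeps the groups comparable even though men are quoted far more often than women or gender-diverse individuals.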

What’s the evolution of topics per gender throughout the past 5 years?

For men, topic percentages are generally constant, with a majority of positive sentiments related to actions and achievements.

As for men, no evolution can be detected in the distribution of women’s topics. The emotional lexicon is prevalent here, with topics related to family, friends and communication.

For gender-diverse individuals, the lack of data explains the rougher edges of the graph. However, we can still notice that negative emotions are very high compared to positive ones, which might be a consequence of the fact that “LGBT respondents are less satisfied with their life than the general UK population (rating satisfaction 6.5 on average out of 10 compared with 7.7). Trans respondents had particularly low scores (around 5.4 out of 10)”, according to a UK survey of the LGBT community in 2017.


As expected, men still represent the majority of speakers in every type of newspaper. Every country studied displays this inequality, to a degree that depends on its cultural background, and the developing countries show a larger gap. However, the time span of the analysis is too short to show a significant progression in the representation of women and gender-diverse individuals. To really show how their voices gained awareness, data covering multiple decades would have been necessary. It has been a slow process, and 5 years is not sufficient to capture enough important events capable of changing people’s minds.

Throughout this analysis, no real evolution was found in the topics that each gender talked about between 2015 and early 2020. It is, however, possible to see a different lexicon for each gender: men talk about action and business, women about emotions and caring, while gender minorities share their possibly difficult life experiences.


The results of this analysis need to be taken with a grain of salt, as we encountered a few limitations and had to make some assumptions.


Thank you for your attention! This story was presented to you by Moritz Waldleben, Marie Knoepfel, Lorena Egger and Paloma Cito.