
The Future of Artificial Intelligence: Language, Gender, Technology

Second workshop

17 May 2019

Centre for Research in the Arts, Social Sciences and Humanities (CRASSH), Cambridge

The Giving Voice to Digital Democracies project hosted its second workshop on 17th May 2019 at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH). The project is part of the Centre for the Humanities and Social Change, Cambridge, funded by the Humanities and Social Change International Foundation.

The project investigates the social impact of Artificially Intelligent Communications Technology (AICT). The talks and discussions of this second workshop focused specifically on different aspects of the complex relationships between language, gender, and technology. The one-day event brought together experts and researchers from various academic disciplines, looking at questions of gender in the context of language-based AI from linguistic, philosophical, sociological, and technical perspectives.


Professor Alison Adam (Sheffield Hallam University) was the first speaker of the day, and she asked the very pertinent question of how relevant traditional feminist arguments and philosophical critiques of AI are nowadays in relation to gender, knowledge, and language. She pointed out that while older critiques focused on the distinction between machines and humans, newer critical approaches address concerns about fairness and bias. Despite these shifts, Adam stressed the abiding need to allow feminist arguments to inform the discussion.


Taking an approach based on formal semantics and computational psycholinguistics, Dr Heather Burnett (CNRS – Université Paris Diderot) presented the results of a study investigating the overuse of masculine pronouns in English and French. The talk ranged across numerous topics, including the way in which dominance relations can affect similarity judgments, making them no longer commutative. For instance, in male-dominated professions women are likely to be considered similar to men, but in female-dominated professions, men are unlikely to be considered similar to women. These asymmetries have implications for language use.
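The formal details of the talk are beyond the scope of this summary, but the non-commutativity point can be made concrete. The sketch below uses Tversky’s (1977) contrast model, a classic formalisation of asymmetric similarity (not necessarily the model Burnett used); the feature sets and weights are invented for illustration.

```python
# Illustrative sketch only: Tversky's contrast model scores the
# similarity of a to b as shared features minus weighted penalties for
# each side's distinctive features. With unequal weights, sim(a, b)
# need not equal sim(b, a): similarity stops being commutative.

def tversky_sim(a: set, b: set, alpha: float = 0.8, beta: float = 0.2) -> float:
    """Similarity of subject a to referent b; alpha weights a's own features."""
    return len(a & b) - alpha * len(a - b) - beta * len(b - a)

# Invented feature bundles: a prototype rich in salient features (the
# dominant group in a profession) and a variant sharing only some of them.
prototype = {"surgeon", "senior", "prestigious", "prototypical"}
variant = {"surgeon", "senior"}

print(tversky_sim(variant, prototype))  # 2 - 0.8*0 - 0.2*2 = 1.6
print(tversky_sim(prototype, variant))  # 2 - 0.8*2 - 0.2*0 = 0.4
```

On this toy model, the variant is judged quite similar to the prototype, but not the other way around, mirroring the profession example above.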


In his talk, Dr Dirk Hovy (Bocconi University) focused on the relation between gender and syntactic choices. He concluded, for instance, that how people identify in terms of gender (subconsciously) shapes the syntactic constructions they use: women use intensifiers (e.g., ‘very’) more often, while men use downtoners (e.g., ‘a bit’) more often. On the basis of a study of Wall Street Journal articles, he also noted that training data consisting mostly of female writing would in fact be beneficial for both men and women, as women’s writing has been shown to be more diverse overall. The importance of the corpora used for AICT research was emphasised repeatedly: any linguistic corpus is a sample of a language, but it is also a sample of a particular demographic (or set of demographics).
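As a rough illustration of the kind of frequency comparison involved (not Hovy’s actual method), one can count intensifiers and downtoners per token and compare the rates across groups of writers; the word lists below are placeholders, not a validated lexicon.

```python
import re

# Placeholder marker lists for illustration; a real study would use a
# validated lexicon and handle ambiguous items (e.g., conjunction 'so').
INTENSIFIERS = {"very", "really", "so", "extremely"}
DOWNTONERS = {"a bit", "slightly", "somewhat", "rather"}

def marker_rates(text: str) -> dict:
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens) or 1
    joined = " ".join(tokens)  # lets us match multiword downtoners
    intensifiers = sum(tokens.count(w) for w in INTENSIFIERS)
    downtoners = sum(joined.count(p) for p in DOWNTONERS)
    return {"intensifiers_per_token": intensifiers / n,
            "downtoners_per_token": downtoners / n}

print(marker_rates("I was very, very happy, really."))
print(marker_rates("It was a bit odd and slightly slow."))
```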


Dr Ruth Page (University of Birmingham) discussed ‘ugliness’ on Instagram and how the perception and representation of ‘ugly’ images on social media relate to identity and gender. She took a multimodal approach combining image and discourse analysis. Her research indicates that perceptions and discourses of ugliness are shifting on social media and, in particular, that users distinguish between playful, ironic depictions of ugliness (tagged #uglyselfie) and painful, negative posts (tagged #ugly). While ‘ugly’ is used far more frequently of girls than of boys, the opposite holds for ‘man’ and ‘woman’. She also showed that men favour self-deprecation, whereas women are more likely to use self-mockery.


In her talk, Dr Stefanie Ullmann (University of Cambridge) presented a corpus study of representations of gender in the OPUS English-German parallel data. She showed that the data sets are strongly biased towards male forms, particularly in German occupation words. The results of her study also indicate that representations of men and women reflect traditional gender stereotypes, such as the assumptions that doctors are male and nurses female, or that women are caretakers while men are dominant and powerful. Using such clearly skewed texts as training data for machine translation inevitably leads to biased results and errors in translation.
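A sketch of the sort of count that underlies such a study is given below; this is illustrative, not Ullmann’s actual code, and the occupation pairs are a tiny placeholder sample.

```python
# Tally masculine vs. feminine German occupation forms on one side of a
# parallel corpus. The lexicon below is a placeholder sample.
OCCUPATIONS = {
    "Arzt": "m", "Ärztin": "f",      # doctor
    "Lehrer": "m", "Lehrerin": "f",  # teacher
    "Anwalt": "m", "Anwältin": "f",  # lawyer
}

def gender_counts(lines):
    counts = {"m": 0, "f": 0}
    for line in lines:
        for token in line.split():
            word = token.strip(".,;:!?\"'")
            if word in OCCUPATIONS:
                counts[OCCUPATIONS[word]] += 1
    return counts

corpus = ["Der Arzt sprach mit der Lehrerin.", "Ein Anwalt und ein Arzt."]
print(gender_counts(corpus))  # {'m': 3, 'f': 1}
```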


Finally, Dr Dong Nguyen (Alan Turing Institute, Utrecht University) took a computational sociolinguistic perspective on the relation between language and gender. She presented the results of an experimental study in which a system (TweetGenie) had been trained to predict the gender and age of Twitter users from the tweets they had written. She showed that speakers construct their own identity linguistically, and that this process involves the gendered aspects of their language. Consequently, gender as manifested in written texts is fluid and variable, rather than something biological and fixed.
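TweetGenie’s actual model is not reproduced here, but the general supervised approach can be sketched as follows: train a classifier on tweets paired with self-reported labels, then predict labels for unseen text. The tiny training set below is invented, and, as the roundtable discussion noted, such classifiers inevitably impose discrete categories on something far more fluid.

```python
# Minimal sketch of tweet-based gender prediction with scikit-learn.
# The training data and labels are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["omg soooo excited!!", "match starts at 8, lads",
          "loved the new episode <3", "fixed the engine myself"]
labels = ["f", "m", "f", "m"]  # invented self-reported labels

# Character n-grams pick up spelling and style cues (e.g., 'soooo').
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(tweets, labels)
print(model.predict(["cannot wait, soooo happy!!"]))
```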


The workshop ended with a roundtable discussion involving all the speakers, which gave the very engaged and interested audience a final chance to ask questions. It also provided an opportunity for the speakers, after hearing each other’s talks, to reconsider some core issues and discuss overarching themes in more detail. One notable conclusion from the discussion was that all the participants had encountered the same difficulty in addressing and representing non-binary notions of gender in their research. It was observed that technology tends to impose binary gender categories, with very little to no data available for analysing other forms of gender identification.

The workshop demonstrated the acute contemporary relevance of the topic of gender in relation to language-based AI. The engaged participation of the audience, which included representatives from several tech companies, underlined the importance of this issue when seeking to understand the social impact of language-based AI systems.

Eugen Bär (left, Humanities and Social Change International Foundation) and Una Yeung (right, Giving Voice to Digital Democracies Project Administrator)

Text by Stefanie Ullmann and Marcus Tomalin

Pictures taken by Imke van Heerden and Stefanie Ullmann

Videos filmed and edited by Glenn Jobson


The Future of Artificial Intelligence: Language, Ethics, Technology

Inaugural workshop

25 March 2019

Centre for Research in the Arts, Social Sciences and Humanities (CRASSH), Cambridge

The Giving Voice to Digital Democracies project hosted its inaugural workshop on 25th March 2019 at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH). The project is part of the Centre for the Humanities and Social Change, Cambridge, funded by the Humanities and Social Change International Foundation.

This significantly oversubscribed event brought together experts from academia, government, and industry, enabling a diverse conversation and discussion with an engaged audience. The one-day workshop was opened by the Project Manager, Dr Marcus Tomalin, who summarised the main purposes of the project and the workshop.


The focus was specifically on the ethical implications of Artificially Intelligent Communications Technology (AICT). While discussions about ethics often revolve around issues such as data protection and privacy, transparency and accountability (all of which are important concerns), the impact that language-based AI systems have upon our daily lives is a topic that has previously received comparatively little (academic) attention. Some of the central issues that merit careful consideration are:

  • What might more ethical AICT systems look like? 
  • How could users be better protected against hate speech on social media? 
  • (How) can we get rid of data bias inherent in AICT systems? 

These and other questions were not only key on the agenda for the workshop, but will continue to be central research objectives for the project over the next 3½ years. 

Olly Grender (House of Lords Select Committee on Artificial Intelligence) was the first main speaker, and she argued that we need to put ethics at the centre of AI development. This is something to which the UK is particularly well placed to contribute. She emphasised the need to equip people with a sufficiently deep understanding not only of AI, but also of the fundamentals of ethics. This will help to ensure that the prejudices of the past are not built into automated systems. She also emphasised the extent to which the government is focusing on these issues. The creation of the Centre for Data Ethics and Innovation is a conspicuous recent development, and numerous white papers about such matters have been, and will be, published. The forthcoming white paper on ‘online harms’ will be especially influential, and the Giving Voice to Digital Democracies project has been involved in preparing that paper.


In her talk, Dr Melanie Smallman (University College London, Alan Turing Institute) proposed a multi-scale ethical framework to combat social inequality caused and magnified by technology. In essence, she suggested that the ethical contexts at different levels of the hierarchy, from individual members of society to vast corporations, can differ greatly. Something that seems ethically justifiable at one level may not be at another. These different scales need to be factored into the process of developing language-based AI systems. As Smallman reminded us, “we need to make sure that technology does something good”.


Dr Adrian Weller (University of Cambridge, Alan Turing Institute, The Centre for Data Ethics and Innovation) gave an overview of various ethical issues that arise in relation to cutting-edge AI systems. He emphasised that we must take measures to ensure that we can trust the AI systems we create. He argued that we need to make sure people have a better understanding of when AI systems are likely to perform well, and when they are likely to go awry. While such systems are extremely powerful and effective in many respects, they can also be alarmingly brittle, and can make mistakes (e.g., classificatory errors) of a kind that no human would make. 

In his talk, Dr Marcus Tomalin (University of Cambridge) stressed that traditional thinking about ethics is inadequate for discussions of AICT systems. A more appropriate ethical framework would be ontocentric rather than predominantly anthropocentric, and patient-oriented rather than merely agent-oriented. He also argued that algorithmic decision-making can be hard to analyse in relation to AICT systems. For instance, it is not at all simple to determine where and when a machine translation system makes the decision to translate a specific word in a specific way. Yet such ‘decisions’ can have serious ethical consequences. 


Professor Emily M. Bender (University of Washington) presented a typology of ethical risks in language technology and asked how we can make the processes underlying NLP technologies more transparent. Her work centres on foregrounding the characteristics of data sets in so-called ‘data statements’, which keep key information about the data (e.g., its nature, whose language it contains, the speech situation, etc.) available at all times. The underlying conviction is that such statements would help system designers to appreciate in advance the impact that a specific data set may have on the system being constructed (e.g., whether or not it would reinforce an existing bias).
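In the spirit of that proposal, a data statement can be thought of as a structured record attached to each data set. The sketch below paraphrases the main fields of Bender and Friedman’s published schema; the values are placeholders describing what a real entry would contain.

```python
# Skeleton of a data statement; field names paraphrase the published
# schema, and each value describes what a real entry would contain.
data_statement = {
    "curation_rationale": "why these texts were selected and included",
    "language_variety": "e.g., en-GB; informal social-media English",
    "speaker_demographic": "age, gender, region of the authors, as known",
    "annotator_demographic": "who labelled the data and their background",
    "speech_situation": "time, place, modality, intended audience",
    "text_characteristics": "genre, topic, register",
}
```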


Dr Margaret Mitchell (Google Research and Machine Intelligence) also discussed the problem of data bias. She showed that such biases are manifold and that they interact with machine learning processes at various stages and levels, a phenomenon sometimes referred to as the ‘bias network effect’ or ‘bias laundering’. Adopting an approach similar in spirit to the aforementioned ‘data statements’, she proposed the implementation of ‘model cards’ at the processing level.
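A model card can be sketched in the same way; the section names below follow the published proposal by Mitchell and colleagues, again with placeholder values.

```python
# Skeleton of a model card; sections follow the published proposal,
# and each value describes what a real entry would contain.
model_card = {
    "model_details": "architecture, version, training date, developers",
    "intended_use": "in-scope and out-of-scope applications",
    "factors": "demographic or environmental groups results are sliced by",
    "metrics": "how performance is measured, incl. decision thresholds",
    "evaluation_data": "data sets used for testing, and why",
    "training_data": "provenance and known skews of the training data",
    "quantitative_analyses": "results disaggregated across the factors",
    "ethical_considerations": "risks, sensitive uses, potential harms",
    "caveats_and_recommendations": "known limitations and user advice",
}
```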


The workshop ended with a roundtable discussion involving the various speakers, with many of the questions coming from the audience. This provided an opportunity to consider some of the core ideas in greater detail and to compare and contrast some of the ideas and approaches that had been presented earlier in the day.

The considerable interest that this inaugural workshop generated confirms once again the great need for genuinely interdisciplinary events of this kind, which bring together researchers and experts from technology, the humanities, and politics to reflect upon the social impact of the current generation of AI systems – and especially those systems that interact with us using language.

Text by Stefanie Ullmann and Marcus Tomalin

Pictures taken by Imke van Heerden and Stefanie Ullmann

Videos filmed and edited by Glenn Jobson