Posts by: Stefanie Ullmann

Quarantining Online Hate Speech

Research Publication

“Quarantining online hate speech”

Ethics and Information Technology

10 October 2019

Press Release by Cambridge University

https://www.cam.ac.uk/research/news/online-hate-speech-could-be-contained-like-a-computer-virus-say-researchers

17 December 2019

Artificial intelligence is being developed that will allow advisory ‘quarantining’ of hate speech in a manner akin to malware filters – offering users a way to control exposure to ‘hateful content’ without resorting to censorship.

We can empower those at the receiving end of the hate speech poisoning our online discourses

Marcus Tomalin

The spread of hate speech via social media could be tackled using the same ‘quarantine’ approach deployed to combat malicious software, according to University of Cambridge researchers.

Definitions of hate speech vary depending on nation, law and platform, and just blocking keywords is ineffectual: graphic descriptions of violence need not contain obvious ethnic slurs to constitute racist death threats, for example.

As such, hate speech is difficult to detect automatically. It has to be reported by those exposed to it, after the intended “psychological harm” is inflicted, with armies of moderators required to judge every case.

This is the new front line of an ancient debate: freedom of speech versus poisonous language.

Now, an engineer and a linguist have published a proposal in the journal Ethics and Information Technology that harnesses cyber security techniques to give control to those targeted, without resorting to censorship.

Cambridge language and machine learning experts are using databases of threats and violent insults to build algorithms that can provide a score for the likelihood of an online message containing forms of hate speech.

As these algorithms get refined, potential hate speech could be identified and “quarantined”. Users would receive a warning alert with a “Hate O’Meter” – the hate speech severity score – the sender’s name, and an option to view the content or delete unseen.

This approach is akin to spam and malware filters, and researchers from the ‘Giving Voice to Digital Democracies’ project believe it could dramatically reduce the amount of hate speech people are forced to experience. They are aiming to have a prototype ready in early 2020.

“Hate speech is a form of intentional online harm, like malware, and can therefore be handled by means of quarantining,” said co-author and linguist Dr Stefanie Ullmann. “In fact, a lot of hate speech is actually generated by software such as Twitter bots.”

“Companies like Facebook, Twitter and Google generally respond reactively to hate speech,” said co-author and engineer Dr Marcus Tomalin. “This may be okay for those who don’t encounter it often. For others it’s too little, too late.”

“Many women and people from minority groups in the public eye receive anonymous hate speech for daring to have an online presence. We are seeing this deter people from entering or continuing in public life, often those from groups in need of greater representation,” he said.

Former US Secretary of State Hillary Clinton recently told a UK audience that hate speech posed a “threat to democracies”, in the wake of many women MPs citing online abuse as part of the reason they will no longer stand for election.

Meanwhile, in a Georgetown University address, Facebook CEO Mark Zuckerberg spoke of “broad disagreements over what qualifies as hate” and argued: “we should err on the side of greater expression”.

The researchers say their proposal is not a magic bullet, but it does sit between the “extreme libertarian and authoritarian approaches” of either entirely permitting or prohibiting certain language online.

Importantly, the user becomes the arbiter. “Many people don’t like the idea of an unelected corporation or micromanaging government deciding what we can and can’t say to each other,” said Tomalin.

“Our system will flag when you should be careful, but it’s always your call. It doesn’t stop people posting or viewing what they like, but it gives much needed control to those being inundated with hate.”

In the paper, the researchers refer to detection algorithms achieving 60% accuracy – not much better than chance. Tomalin’s machine learning lab has now got this up to 80%, and he anticipates continued improvement of the mathematical modeling.

Meanwhile, Ullmann gathers more ‘training data’: verified hate speech from which the algorithms can learn. This helps refine the ‘confidence scores’ that determine a quarantine and subsequent Hate O’Meter read-out, which could be set like a sensitivity dial depending on user preference.
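The sensitivity dial can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the names `Message` and `quarantine_decision`, the 0 to 1 score range and the example values are hypothetical, the score would come from a trained classifier that is not shown, and none of it reflects the project's actual prototype.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    text: str
    hate_score: float  # hypothetical classifier output, assumed to lie in [0, 1]


def quarantine_decision(message: Message, threshold: float) -> dict:
    """Return what the recipient would see for a user-chosen threshold.

    A lower threshold quarantines more messages; a higher one lets more through.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if message.hate_score >= threshold:
        # Quarantined: show the score and sender only, never the content itself.
        return {
            "quarantined": True,
            "sender": message.sender,
            "score": message.hate_score,
            "options": ["view", "delete unseen"],
        }
    # Below the threshold the message is delivered as normal.
    return {"quarantined": False, "sender": message.sender, "text": message.text}


example = Message(sender="unknown_account", text="(message body)", hate_score=0.7)
cautious = quarantine_decision(example, threshold=0.5)  # quarantined
relaxed = quarantine_decision(example, threshold=0.9)   # delivered
```

The same message is held back for a cautious user and delivered to a more relaxed one, which is the sense in which the user, not the platform, remains the arbiter.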

A basic example might involve a word like ‘bitch’: a misogynistic slur, but also a legitimate term in contexts such as dog breeding. It’s the algorithmic analysis of where such a word sits syntactically – the types of surrounding words and semantic relations between them – that informs the hate speech score.

“Identifying individual keywords isn’t enough, we are looking at entire sentence structures and far beyond. Sociolinguistic information in user profiles and posting histories can all help improve the classification process,” said Ullmann.

Added Tomalin: “Through automated quarantines that provide guidance on the strength of hateful content, we can empower those at the receiving end of the hate speech poisoning our online discourses.”

However, the researchers, who work in Cambridge’s Centre for Research in the Arts, Social Sciences and Humanities (CRASSH), say that – as with computer viruses – there will always be an arms race between hate speech and systems for limiting it.

The project has also begun to look at “counter-speech”: the ways people respond to hate speech. The researchers intend to feed into debates around how virtual assistants such as ‘Siri’ should respond to threats and intimidation.

Text by Fred Lewsey


Deutschlandfunk – Computer und Kommunikation (Computer and Communication)

11 January 2020

Interview with Dr Stefanie Ullmann

Listen to the interview here (in German)


BBC World Service – Digital Planet

25 November 2019

Interview with Dr Stefanie Ullmann

Listen to the interview here (relevant section starts at 13:35)


Text by Stefanie Ullmann

Fact-Checking Hackathon

10 January 2020, 10:00 – 12 January 2020, 16:00

Room LR4, Baker Building, Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ


Overview

Fake news, misinformation and disinformation are being created and circulated online with unprecedented speed and scale. There are concerns that this poses a serious threat to our modern digital societies by skewing public opinion about important issues and maliciously interfering with national election campaigns.

Fact-checking is an increasingly vital approach for tackling the rapid spread of false claims online. Specifically, there is an urgent need for automated systems that detect, extract and classify incorrect information in real time; and linguistic analyses of argument structure, entailment, stance marking, and evidentiality can assist the development of such systems.

We want to bring together people with different kinds of expertise to develop new approaches for tackling the problems posed by fake news, misinformation and disinformation. Taking an existing automated fact-checking system as a baseline, the main hackathon task will be to find ways of improving its performance. The experimental framework will be that used for the FEVER: Fact Extraction and VERification challenge (http://fever.ai). 
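To make "improving on a baseline" concrete, here is a minimal Python sketch that compares two systems by label accuracy. It assumes the three claim labels used in the FEVER data (SUPPORTS, REFUTES, NOT ENOUGH INFO); the function name `label_accuracy` and the toy predictions are made up for illustration. It is also a simplification: the full FEVER evaluation additionally checks the retrieved evidence, which is not modelled here.

```python
FEVER_LABELS = {"SUPPORTS", "REFUTES", "NOT ENOUGH INFO"}


def label_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of claims whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("no claims to score")
    unknown = (set(gold) | set(predicted)) - FEVER_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Toy, invented labels for four claims (not real FEVER data).
gold = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO", "SUPPORTS"]
baseline = ["SUPPORTS", "SUPPORTS", "NOT ENOUGH INFO", "REFUTES"]
improved = ["SUPPORTS", "REFUTES", "NOT ENOUGH INFO", "REFUTES"]

print(label_accuracy(gold, baseline))  # 0.5
print(label_accuracy(gold, improved))  # 0.75
```

A team's change counts as an improvement in this simplified setting if its accuracy on held-out claims exceeds the baseline's.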

 

Is it for me?

The task of dealing with false claims online is necessarily an interdisciplinary task. Therefore, this hackathon will create a collaborative environment for participants from a variety of backgrounds to come together to work in teams. Whether you already have strong coding skills, a specific interest in disciplines such as information engineering or natural language processing, a familiarity with linguistic theory, or even an interest in the philosophy of language, you will certainly be able to make valuable contributions during the hackathon!

In particular we encourage undergraduates and postgraduates:

  • in Engineering / Computer Science, with good programming skills (esp. Python) 
  • in Linguistics / Philosophy / Psychology / Sociology
  • with an interest in language-based AI technologies 

 

Do I need to be able to code?

There will be a variety of ways to get involved and contribute during the hackathon, so coding experience is not essential. For instance, participants with a background in linguistics can analyse the linguistic data in detail, and then work together with coders so that their insights can improve the baseline system.

For those participants who would like to learn more about coding, there will be introductory sessions on Python during the hackathon – so this will be a good opportunity to dip your toe in the water!

 

Why should I attend?

  • A chance to collaborate in interdisciplinary teams to address a language-based technology problem that has huge contemporary importance.
  • An opportunity to learn about the challenges of developing an automated fact-checking system, and benefit from advice and insights from fact-checking experts.
  • A chance to learn Python, if you are new to coding.

 

Further details

The event runs from Friday to Sunday and attendees are expected to participate throughout.

Lunch will be provided on all three days, and there will be coffee and snacks throughout the hackathon, to keep you going!

If you have any questions about the event or would like to discuss any specific requirements, please contact Shauna Concannon.

Image by igorstevanovic/Shutterstock.com

The Future of Artificial Intelligence: Language, Society, Technology

This workshop, the third in a series on the future of artificial intelligence, will focus on the impact of artificial intelligence on society, specifically on language-based technologies at the intersection of AI and ICT (henceforth ‘Artificially Intelligent Communications Technologies’ or ‘AICT’) – namely, speech technology, natural language processing, smart telecommunications and social media. The social impact of these technologies is already becoming apparent. Intelligent conversational agents such as Siri (Apple), Cortana (Microsoft) and Alexa (Amazon) are already widely used, and, in the next 5 to 10 years, a new generation of Virtual Personal Assistants (VPAs) will emerge that will increasingly influence all aspects of our lives, from relatively mundane tasks (e.g. turning the heating on and off) to highly significant activities (e.g. influencing how we vote in national elections). Crucially, our interactions with these devices will be predominantly language-based.

Despite this, the specific linguistic, ethical, psychological, sociological, legal and technical challenges posed by AICT (specifically) have rarely received focused attention. Consequently, the workshop will examine various aspects of the social impact of AICT-based systems in modern digital democracies, from both practical and theoretical perspectives. By doing so, it will provide an important opportunity to consider how existing AICT infrastructures can be reconfigured to enable the resulting technologies to benefit the communities that use them.

Speakers

Maria Luciana Axente (PricewaterhouseCoopers)

Shauna Concannon (University of Cambridge)

Sarah Connolly (UK Department for Digital, Culture, Media & Sport)

Ella McPherson (University of Cambridge)

Trisha Meyer (Free University of Brussels – VUB)

Jonnie Penn (University of Cambridge)

The workshop is organised by Giving Voice to Digital Democracies, a research project that is part of the Centre for the Humanities and Social Change, Cambridge and funded by the Humanities and Social Change International Foundation.

Giving Voice to Digital Democracies explores the social impact of Artificially Intelligent Communications Technology – that is, AI systems that use speech recognition, speech synthesis, dialogue modelling, machine translation, natural language processing and/or smart telecommunications as interfaces. Due to recent advances in machine learning, these technologies are already rapidly transforming our modern digital democracies. While they can certainly have a positive impact on society (e.g. by promoting free speech and political engagement), they also offer opportunities for distortion and deception. Unbalanced data sets can reinforce problematical social biases; automated Twitter bots can drastically increase the spread of malinformation and hate speech online; and the responses of automated Virtual Personal Assistants during conversations about sensitive topics (e.g. suicidal tendencies, religion, sexual identity) can have serious consequences.

Responding to these increasingly urgent concerns, this project brings together experts from linguistics, philosophy, speech technology, computer science, psychology, sociology and political theory to develop design objectives for the creation of AICT systems that are more ethical, trustworthy and transparent. These technologies will have the potential to affect more positively the kinds of social change that will shape modern digital democracies in the immediate future.

Please register for the workshop here.

Queries: Una Yeung (uy202@cam.ac.uk)

Image by Metamorworks/Shutterstock.com

The Future of Artificial Intelligence: Language, Gender, Technology

Second workshop

17 May 2019

Centre for Research in the Arts, Social Sciences and Humanities (CRASSH), Cambridge

The Giving Voice to Digital Democracies project hosted its second workshop on 17th May 2019 at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH). The project is part of the Centre for the Humanities and Social Change, Cambridge, funded by the Humanities and Social Change International Foundation.

The project investigates the social impact of Artificially Intelligent Communications Technology (AICT). The talks and discussions of this second workshop focused specifically on different aspects of the complex relationships between language, gender, and technology. The one-day event brought together experts and researchers from various academic disciplines, looking at questions of gender in the context of language-based AI from linguistic, philosophical, sociological, and technical perspectives.


Professor Alison Adam (Sheffield Hallam University) was the first speaker of the day, and she asked the very pertinent question of how relevant traditional feminist arguments and philosophical critiques of AI are nowadays in relation to gender, knowledge, and language. She pointed out that while older critiques focused on the distinction between machines and humans, newer critical approaches address concerns for fairness and bias. Despite these shifts, Adam stressed the abiding need to allow feminist arguments to inform the discussion.


Taking an approach based on formal semantics and computational psycholinguistics, Dr Heather Burnett (CNRS – Université Paris Diderot) presented the results of a study investigating the overuse of masculine pronouns in English and French. The talk ranged across numerous topics, including the way in which dominance relations can affect similarity judgments, making them no longer commutative. For instance, in male-dominated professions women are likely to be considered similar to men, but in female-dominated professions men are unlikely to be considered similar to women. These asymmetries have implications for language use.


In his talk, Dr Dirk Hovy (Bocconi University) focused on the relation between gender and syntactic choices. He concluded, for instance, that how people identify in terms of gender (subconsciously) determines the syntactic constructions they use: women use intensifiers (e.g., ‘very’) more often, while men use downtoners (e.g., ‘a bit’) more often. On the basis of a study of Wall Street Journal articles, he also noted that training data consisting mostly of female writing would in fact be beneficial for both men and women, as women’s writing has been shown to be more diverse overall. The importance of the corpora used for AICT research was emphasised repeatedly. Any linguistic corpus is a sample of a language, but it is also a sample of a particular demographic (or set of demographics).


Dr Ruth Page (University of Birmingham) discussed ‘Ugliness’ on Instagram and how the perception and representation of ‘ugly’ images on social media relate to identity and gender. She took a multimodal approach combining image and discourse analysis. Her research indicates that perceptions and discourses of ugliness are shifting on social media and, particularly, that users distinguish between playful and ironic illustrations of ugliness (using the hashtag #uglyselfie) and painful, negative posts (#ugly). While ‘ugly’ is much more frequent in relation to girls than boys, the opposite is true for man/woman. She also showed that men favour self-deprecation more, whereas women are more likely to use self-mockery.


In her talk, Dr Stefanie Ullmann (University of Cambridge) presented a corpus study of representations of gender in the OPUS English-German parallel data. She showed that the data sets are strongly biased towards male forms, particularly in German occupation words. The results of her study also indicate that representations of men and women reflect traditional gender-related stereotypes, such as ‘doctors are male, nurses are female’ or ‘women are caretakers, men are dominant and powerful’. Using such clearly skewed texts as training data for machine translation inevitably leads to biased results and errors in translation.
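A count of this kind can be sketched in a few lines of Python. The word list, the example sentences and the function name `count_gendered_forms` are all invented for illustration; this is not the method or data used in the study, and it ignores plurals, inflection and generic masculine usage, which a real corpus analysis would need to handle.

```python
import re
from collections import Counter

# Hypothetical masculine -> feminine pairs of German occupation nouns.
OCCUPATION_PAIRS = {
    "arzt": "ärztin",
    "lehrer": "lehrerin",
    "ingenieur": "ingenieurin",
}


def count_gendered_forms(sentences):
    """Count masculine and feminine occupation nouns in a list of sentences."""
    feminine = set(OCCUPATION_PAIRS.values())
    counts = Counter()
    for sentence in sentences:
        # Lowercase and split into word tokens (\w matches umlauts in Python 3).
        for token in re.findall(r"\w+", sentence.lower()):
            if token in OCCUPATION_PAIRS:
                counts["masculine"] += 1
            elif token in feminine:
                counts["feminine"] += 1
    return counts


# Invented sentences, not taken from the OPUS data.
toy_corpus = [
    "Der Arzt spricht mit dem Lehrer.",
    "Die Ärztin ist hier.",
    "Ein Ingenieur und ein Arzt arbeiten.",
]
counts = count_gendered_forms(toy_corpus)
print(counts["masculine"], counts["feminine"])  # 4 1
```

A ratio like 4 to 1 in training text is the kind of skew that a translation system trained on it can then reproduce.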


Finally, Dr Dong Nguyen (Alan Turing Institute, University of Utrecht) took a computational sociolinguistic perspective on the relation between language and gender. She presented the results of an experimental study in which a system (TweetGenie) had been trained to predict gender and age of people based on tweets they had written. She showed how speakers construct their own identity linguistically, and this process involves the gendered aspects of their language. Consequently, gender as manifest in written texts is fluid and variable, rather than something biological and fixed.


The workshop ended with a roundtable discussion involving all speakers, which gave the very engaged and interested audience a final chance to ask questions. It also provided an opportunity for the speakers, after hearing each other’s talks, to reconsider some core issues and discuss overarching themes and issues in more detail. One notable conclusion from the discussion was that all participants had similarly experienced the difficulty of addressing and representing non-binary gender notions in their research. It was observed that technology tends to impose binary gender with very little to no data available for analysis on other forms of gender-identification.   

The workshop demonstrated the great and acute contemporary relevance of the topic of gender in relation to language-based AI. The engaged participation of the audience, which included representatives from several tech companies, emphasised the importance of this issue when seeking to understand the social impact of language-based AI systems. 

Eugen Bär (left, Humanities and Social Change International Foundation) and Una Yeung (right, Giving Voice to Digital Democracies Project Administrator)

Text by Stefanie Ullmann and Marcus Tomalin

Pictures taken by Imke van Heerden and Stefanie Ullmann

Videos filmed and edited by Glenn Jobson

The Future of Artificial Intelligence: Language, Ethics, Technology

Inaugural workshop

25 March 2019

Centre for Research in the Arts, Social Sciences and Humanities (CRASSH), Cambridge

The Giving Voice to Digital Democracies project hosted its inaugural workshop on 25th March 2019 at the Centre for Research in the Arts, Social Sciences and Humanities (CRASSH). The project is part of the Centre for the Humanities and Social Change, Cambridge, funded by the Humanities and Social Change International Foundation.

This significantly oversubscribed event brought together experts from academia, government, and industry, enabling a diverse conversation and discussion with an engaged audience. The one-day workshop was opened by Project Manager Dr Marcus Tomalin who summarised the main purposes of the project and workshop.


The focus was specifically on the ethical implications of Artificially Intelligent Communications Technology (AICT). While discussions about ethics often revolve around issues such as data protection and privacy, transparency and accountability (all of which are important concerns), the impact that language-based AI systems have upon our daily lives is a topic that has previously received comparatively little (academic) attention. Some of the central issues that merit careful consideration are:

  • What might more ethical AICT systems look like? 
  • How could users be better protected against hate speech on social media? 
  • (How) can we get rid of data bias inherent in AICT systems? 

These and other questions were not only key on the agenda for the workshop, but will continue to be central research objectives for the project over the next 3½ years. 

Olly Grender (House of Lords Select Committee on Artificial Intelligence) was the first main speaker, and she argued that we need to put ethics at the centre of AI development. This is something to which the UK is particularly well-placed to contribute. She emphasised the need to equip people with a sufficiently deep understanding not only of AI, but also of the fundamentals of ethics. This will help to ensure that the prejudices of the past are not built into automated systems. She also emphasised the extent to which the government is focusing on these issues. The creation of the Centre for Data Ethics and Innovation is a conspicuous recent development, and numerous white papers about such matters have been, and will be, published. The forthcoming white paper concerning ‘online harm’ will be especially influential, and the Giving Voice to Digital Democracies project has been involved in preparing that paper.


In her talk, Dr Melanie Smallman (University College London, Alan Turing Institute) proposed a multi-scale ethical framework to combat social inequality caused and magnified by technology. In essence, she suggested that the ethical contexts at different levels of the hierarchy, from individual members of society to vast corporations, can differ greatly. Something that seems ethically justifiable at one level may not be at another level. These different scales need to be factored into the process of developing language-based AI systems. As Smallman reminded us, “we need to make sure that technology does something good”.


Dr Adrian Weller (University of Cambridge, Alan Turing Institute, The Centre for Data Ethics and Innovation) gave an overview of various ethical issues that arise in relation to cutting-edge AI systems. He emphasised that we must take measures to ensure that we can trust the AI systems we create. He argued that we need to make sure people have a better understanding of when AI systems are likely to perform well, and when they are likely to go awry. While such systems are extremely powerful and effective in many respects, they can also be alarmingly brittle, and can make mistakes (e.g., classificatory errors) of a kind that no human would make. 

In his talk, Dr Marcus Tomalin (University of Cambridge) stressed that traditional thinking about ethics is inadequate for discussions of AICT systems. A more appropriate ethical framework would be ontocentric rather than predominantly anthropocentric, and patient-oriented rather than merely agent-oriented. He also argued that algorithmic decision-making can be hard to analyse in relation to AICT systems. For instance, it is not at all simple to determine where and when a machine translation system makes the decision to translate a specific word in a specific way. Yet such ‘decisions’ can have serious ethical consequences. 


Professor Emily M. Bender (University of Washington) presented a typology of ethical risks in language technology and asked the question: ‘how can we make the processes underlying NLP technologies more transparent?’ Her work centres on the foregrounding of characteristics of data sets in so-called ‘data statements’, providing information (e.g., nature of the data, whose language, speech situation, etc.) about data at all times. The underlying conviction is that such statements would help system designers to appreciate in advance the impact that a specific data set may have on the system being constructed (e.g., whether or not it would reinforce an existing bias).
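One way to picture a data statement is as a small record that travels with the data set. The Python sketch below is hypothetical: the class `DataStatement`, its three fields (taken only from the examples named in the paragraph above) and the helper `missing_fields` are illustrative assumptions, not the schema proposed in the talk.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataStatement:
    """Hypothetical, minimal record of the characteristics named above."""
    nature_of_data: str
    whose_language: str
    speech_situation: str


def missing_fields(statement: DataStatement) -> list[str]:
    """List the fields left blank, so gaps are visible before training begins."""
    return [name for name, value in asdict(statement).items() if not value.strip()]


# Invented example values for illustration only.
statement = DataStatement(
    nature_of_data="newspaper articles",
    whose_language="",
    speech_situation="edited written text",
)
print(missing_fields(statement))  # ['whose_language']
```

Making an undocumented field show up as an explicit gap is one simple way such a record could prompt designers to ask whose language a system is really learning from.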


Dr Margaret Mitchell (Google Research and Machine Intelligence) also discussed the problem of data bias. She showed that such biases are manifold and that they interact with machine learning processes at various stages and levels. This is sometimes referred to as ‘bias network effect’ or ‘bias laundering’. Adopting an approach that was similar in spirit to the aforementioned ‘data statements’, she proposed the implementation of ‘model cards’ at the processing level. 


The workshop ended with a roundtable discussion involving the various speakers, with many of the questions coming from the audience. This provided an opportunity to consider some of the core ideas in greater detail and to compare and contrast some of the ideas and approaches that had been presented earlier in the day.

The considerable interest that this inaugural workshop generated confirms once again the great need for genuinely interdisciplinary events of this kind, which bring together researchers and experts from technology, the humanities, and politics to reflect upon the social impact of the current generation of AI systems – and especially those systems that interact with us using language.

Text by Stefanie Ullmann and Marcus Tomalin

Pictures taken by Imke van Heerden and Stefanie Ullmann

Videos filmed and edited by Glenn Jobson

The Future of Artificial Intelligence: Language, Gender, Technology

A report of the event as well as videos of the talks can be found here.

The workshop will consider the social impact of Artificially Intelligent Communications Technology (AICT). Specifically, the talks and discussions will focus on different aspects of the complex relationships between language, gender, and technology. These issues are of particular relevance in an age when Virtual Personal Assistants such as Siri, Cortana, and Alexa present themselves as submissive females, when most language-based technologies manifest glaring gender-biases, when 78% of the experts developing AI systems are male, when sexist hate speech online is a widely-recognised problem and when many Western cultures and societies are increasingly recognising the significance of non-binary gender identities.

Speakers

Professor Alison Adam (Sheffield Hallam University)

Dr Heather Burnett (CNRS – Université Paris Diderot)

Dr Dirk Hovy (Bocconi University)

Dr Dong Nguyen (Alan Turing Institute, University of Utrecht)

Dr Ruth Page (University of Birmingham)

Dr Stefanie Ullmann (University of Cambridge)

The workshop is organised by Giving Voice to Digital Democracies: The Social Impact of Artificially Intelligent Communications Technology, a research project which is part of the Centre for the Humanities and Social Change, Cambridge and funded by the Humanities and Social Change International Foundation.

Please register for the workshop here.

Queries: Una Yeung (uy202@cam.ac.uk)

Image by metamorworks/Shutterstock.com