The results point to one of the thorniest aspects of AI-based hate speech detection today: moderate too little and you fail to solve the problem; moderate too much and you risk censoring the very language that marginalized groups use to empower and defend themselves. "All of a sudden you would be penalizing those very communities that are most often targeted by hate in the first place," said Paul Röttger, a PhD candidate at the Oxford Internet Institute and co-author of the paper.
Lucy Vasserman, chief software engineer at Jigsaw, said that Perspective overcomes these limitations by relying on human moderators to make the final decision, but that process does not scale to larger platforms. Jigsaw is now working on a feature that would reprioritize posts and comments based on Perspective's uncertainty: automatically removing content the model is confident is hateful, and flagging borderline content for human review.
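The uncertainty-based triage described above can be sketched in a few lines. This is a minimal illustration, not Jigsaw's implementation: the thresholds, function names, and score values here are assumptions chosen for clarity, on the premise that the classifier outputs a toxicity probability between 0 and 1.

```python
def triage(score, remove_above=0.95, flag_above=0.5):
    """Route a comment based on a model's toxicity score.

    Content the model is confident about is handled automatically;
    borderline scores are escalated to a human moderator.
    (Illustrative thresholds, not Perspective's actual values.)
    """
    if score >= remove_above:
        return "remove"   # model is confident the content is hateful
    if score >= flag_above:
        return "review"   # borderline: flag for a human
    return "keep"         # model is confident the content is fine

# Reprioritization: surface the most uncertain comments first, so
# moderators spend their time where the model is least sure.
scores = [0.97, 0.62, 0.10, 0.55]
review_queue = sorted(scores, key=lambda s: abs(s - 0.5))
```

Sorting by distance from 0.5 is one simple proxy for model uncertainty; here `review_queue` becomes `[0.55, 0.62, 0.10, 0.97]`, putting the most ambiguous comments at the front.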
She said the exciting thing about this new study is that it provides a fine-grained way to assess the state of the art. "Many of the things this paper highlights, such as reclaimed words being a challenge for these models, are well known in the industry but really hard to quantify," she said. Jigsaw is now using HateCheck to better understand the differences between its models and where they need to improve.
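The kind of fine-grained assessment HateCheck enables amounts to grouping targeted test cases by functionality and scoring each group separately. The sketch below shows the idea with made-up placeholder cases and a deliberately naive stand-in classifier; the field names and the `predict_always_hateful` model are illustrative assumptions, not the real HateCheck data or any production system.

```python
from collections import defaultdict

def per_functionality_accuracy(cases, predict):
    """Score a classifier separately on each HateCheck-style functionality."""
    totals, correct = defaultdict(int), defaultdict(int)
    for case in cases:
        f = case["functionality"]
        totals[f] += 1
        correct[f] += predict(case["text"]) == case["gold"]
    return {f: correct[f] / totals[f] for f in totals}

def predict_always_hateful(text):
    # Toy stand-in classifier that flags everything as hateful,
    # which is exactly the failure mode reclaimed-word tests expose.
    return "hateful"

cases = [
    {"functionality": "reclaimed_word", "text": "t1", "gold": "non-hateful"},
    {"functionality": "reclaimed_word", "text": "t2", "gold": "non-hateful"},
    {"functionality": "direct_threat",  "text": "t3", "gold": "hateful"},
]

scores = per_functionality_accuracy(cases, predict_always_hateful)
# scores: {"reclaimed_word": 0.0, "direct_threat": 1.0}
```

An overall accuracy would hide this gap; the per-functionality breakdown is what lets a team quantify a known weakness like over-flagging reclaimed words.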
The academic community is also excited about this research. Maarten Sap, a language-AI researcher at the University of Washington, said: "This paper gives us a nice, clean resource for evaluating industry systems. It allows companies and users to demand improvement."
Thomas Davidson, an assistant professor of sociology at Rutgers University, agreed. The limitations of language models and the messiness of human language, he said, mean there will always be trade-offs between under-identifying and over-identifying hate speech. "The HateCheck data set helps make these trade-offs visible," he added.