Previous work has examined bias in hate speech datasets with respect to race and with respect to gender, but the intersection of the two had not yet been examined. This is an oversight: intersectionality is a well-established concept in the social sciences and it poses an interesting question for the NLP community.

The dataset investigated here is the Founta Twitter dataset. Labels for gender, race, and political party affiliation were generated by models trained on relevant data: for race, a previously developed model classifies the language of a tweet as closer to African American or White American speech; for gender and political affiliation, the authors trained models on datasets containing such labels. A major issue with this approach is accuracy. If a white author writes in African American English, the tweet will be attributed to an African American author, and conversely, an African American author who writes in standard English will likely be marked as White American.
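To make the proxy-labeling setup concrete, here is a minimal sketch of how inferred demographic attributes could be attached to the target dataset. The file names, column names, and the TF-IDF baseline classifier are all hypothetical illustrations; the dialect model actually used in the paper is considerably more sophisticated.

```python
# Sketch of proxy demographic labeling (illustrative, not the authors' pipeline).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training corpus with demographic labels, e.g. "aa" vs. "white".
train = pd.read_csv("dialect_labeled_tweets.csv")   # columns: text, label
founta = pd.read_csv("founta_tweets.csv")           # column: text

proxy = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=5),
    LogisticRegression(max_iter=1000),
)
proxy.fit(train["text"], train["label"])

# Inferred (not self-reported) attributes for the target dataset.
founta["inferred_race"] = proxy.predict(founta["text"])
founta["race_confidence"] = proxy.predict_proba(founta["text"]).max(axis=1)
```

The key point the sketch illustrates is that every "demographic" column downstream is a model prediction based on language use, which is exactly where the accuracy concern above comes from.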

Inspired by the social science literature, the authors hypothesize that tweets from African Americans will be marked as offensive/hateful more often than those from their White American counterparts, and furthermore that tweets by African American men will be labeled as problematic more often than those written by African American women. Both hypotheses are found to hold, although the intersectional bias appears small: African American men are almost four times more likely than White Americans to have tweets labeled as offensive/hateful, but only 77% more likely than African American women.
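The group comparison behind these numbers boils down to per-group labeling rates and their ratios. The sketch below shows one way such a comparison could be computed; the column names, label values, and baseline group are assumptions for illustration, not the authors' exact schema.

```python
# Illustrative computation of group-level disparities in offensive/hateful labels.
import pandas as pd

df = pd.read_csv("founta_with_inferred_attributes.csv")
df["problematic"] = df["annotation"].isin(["hateful", "abusive"])

# Fraction of tweets labeled offensive/hateful per (race, gender) group.
rates = df.groupby(["inferred_race", "inferred_gender"])["problematic"].mean()

# Ratios relative to a reference group (here, hypothetically, white men).
baseline = rates[("white", "male")]
print((rates / baseline).round(2))
```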

An interesting follow-up would be to check whether African American annotators agree that the flagged tweets by African Americans are in fact hateful; the problem may once again lie with annotator bias. Alternatively, the problem may lie with the setup of this study, where the sensitive attributes were inferred by models and may not accurately reflect reality.


Jae Yeon Kim et al. Intersectional Bias in Hate Speech and Abusive Language Datasets. 2020. arXiv: 2005.05921 [cs.CL].