2 | Francisco Rodriguez

Leveraging Unsupervised Task Adaptation and Semi-Supervised Learning With Semantic-Enriched Representations for Online Sexism Detection

Over the past decade, the proliferation of hateful and sexist content targeting women on social media has become a concerning issue, adversely affecting women's lives and freedom of expression. Previous efforts to detect online sexism have utilized monolingual ensemble transformers combined with data augmentation techniques that incorporate related‐domain data, such as hate speech. However, these approaches often struggle to capture the full diversity and complexity of sexism due to limitations in the size and quality of training data. In this study, we introduce a novel sexism detection system that employs in‐domain unlabeled data through unsupervised task‐adaptation techniques and semi‐supervised learning, using an efficient single multilingual transformer model. Additionally, we incorporate a Sentence‐BERT layer to enhance our system with semantically meaningful sentence embeddings. Our proposed system outperforms existing state‐of‐the‐art methods across all tasks and datasets, demonstrating its effectiveness in detecting and addressing sexism in social media text. These results underscore the potential of our approach, providing a foundation for further research and practical applications.

Detecting sexism in social media: an empirical analysis of linguistic patterns and strategies

With the rise of social networks, there has been a marked increase in offensive content targeting women, ranging from overt acts of hatred to subtler, often overlooked forms of sexism. The EXIST (sEXism Identification in Social neTworks) competition, initiated in 2021, aimed to advance research in automatically identifying these forms of online sexism. However, the results revealed the multifaceted nature of sexism and emphasized the need for robust systems to detect and classify such content. In this study, we provide an extensive analysis of sexism, highlighting its characteristics and diverse manifestations across multiple languages on social networks. To achieve this objective, we conducted a detailed analysis of the EXIST dataset to evaluate its capacity to represent various types of sexism. Moreover, we analyzed the systems submitted to the EXIST competition to identify the most effective methodologies and resources for the automated detection of sexism. We employed statistical methods to discern textual patterns related to different categories of sexism, such as stereotyping, misogyny, and sexual violence. Additionally, we investigated linguistic variations in categories of sexism across different languages and platforms. Our results suggest that the EXIST dataset covers a broad spectrum of sexist expressions, from the explicit to the subtle. We observe significant differences in the portrayal of sexism across languages: English texts predominantly feature sexual connotations, whereas Spanish texts tend to reflect neosexism. Across both languages, objectification and misogyny prove to be the most challenging to detect, which is attributable to the varied vocabulary associated with these forms of sexism. Additionally, we demonstrate that models trained on platforms like Twitter can effectively identify sexist content on less-regulated platforms such as Gab. Building on these insights, we introduce a transformer-based system with data augmentation techniques that outperforms competition benchmarks. Our work contributes to the field by enhancing the understanding of online sexism and advancing the technological capabilities for its detection.

Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data

Over the last decade, hateful and sexist content towards women has spread increasingly on social networks. Exposure to sexist speech has serious consequences for women's lives and limits their freedom of speech. Previous studies have focused on identifying hatred or violence towards women. However, sexism is expressed in very different forms: it includes subtle stereotypes and attitudes that, although frequently unnoticed, are extremely harmful to both women and society. In this work, we propose a new task that aims to understand and analyze how sexism, from explicit hate or violence to subtle expressions, is expressed in online conversations. To this end, we have developed and released the first dataset of sexist expressions and attitudes on Twitter in Spanish (MeTwo) and investigated the feasibility of using machine learning techniques (both traditional and novel deep learning models) for automatically detecting different types of sexist behaviours. Our results show that sexism occurs in many forms on social networks, that it includes a wide range of behaviours, and that it is possible to detect them using deep learning approaches. We discuss the performance of automatic classification methods in dealing with different types of sexism and the generalizability of our task to other subdomains, such as misogyny.