Abstract: Machine learning is a promising tool for stance recognition at large scale, where scale refers not only to data size but also to the range of topics and themes addressed and the languages used by the participants. Public consultations of citizens on online participatory democracy platforms offer this kind of setting and are good use cases for automatic stance recognition systems. In this paper, we use three datasets of public consultations to train a model that classifies the stance a citizen expresses in a text towards a proposal or a debate question. We study stance detection in several contexts: data from an online platform without interactions between users, multilingual data from online debates that are each held in a single language, and data from intra-multilingual online debates, in which a single discussion can contain several languages. We propose several baselines and methods to take advantage of the different available data, comparing the results of models trained with out-of-dataset annotations and with binary or ternary annotations from the target dataset. Finally, we propose a self-supervised learning method to take advantage of unlabeled data.
We annotated both datasets with ternary stance labels and made them available.
Mini bio: Valentin Barriere is a researcher at the Centro Nacional de Inteligencia Artificial. He works on multimodal and multilingual natural language processing, using content from videos, online debate platforms, and social media. He also works with remote sensing data. Before that, he was a research officer at the European Commission's Joint Research Centre in Ispra, Italy, where he worked more specifically on disaster response using social media, multilingual sentiment analysis, and stance recognition. During his PhD at Télécom Paris, he worked on the detection of affective phenomena in oral interaction, using discriminative graphical models with features that combine the robustness and high accuracy of machine learning algorithms with the fine-grained modeling of linguistic rules. He organized WASSA at EACL21 and ACL22, is organizing WASSA at ACL23 this year, and is the main organizer of a shared task of the Touché Lab at CLEF23.
--
Comunicaciones DCC