Asia-Pacific Mental Health and Well-being Congress

THEME: "Future Directions: Pioneering Mental Health and Well-being Initiatives"

27-29 Oct 2025
Bali, Indonesia

Anca Dinu

University of Bucharest, Romania

Automatic Detection and Classification of Mental Illnesses from General Social Media Texts


Biography

Anca Dinu is an Assistant Professor at the University of Bucharest, Faculty of Foreign Languages and Literatures, and director of the Digital Humanities Research Centre, University of Bucharest. Her main research interests are Digital Humanities, NLP, formal and distributional semantics, and corpus linguistics. She obtained her PhD in Informatics in 2011, under the supervision of Solomon Marcus. She authored the book “A computational perspective on natural language semantics”, co-edited three conference volumes, translated four books (from Italian to Romanian), and has written, alone or with collaborators, over 60 peer-reviewed articles. She has participated in 17 national and international research projects. She is the initiator and Chair of the Recent Advances in Digital Humanities conference series. She also initiated and currently coordinates the Digital Humanities master's program at the University of Bucharest, for which she received the “University of Bucharest prize for the most innovative program” award in 2019.

Abstract

Social media is a vast source of unstructured text, including data that could help in the early detection and classification of mental illnesses. Transformer models can analyze such data efficiently. Our main research questions are: whether, and to what extent, mental illnesses can be detected and classified from general texts rather than from texts posted in mental health support groups; whether some mental illnesses are harder to detect and classify than others; and whether detection and classification can rely on posts alone rather than on individuals. We experimented with the SMHD mental health conditions dataset from Reddit, which contains general discussion texts grouped by user and by illness. Individuals were tagged with a mental illness based on their self-reports in the dedicated support groups. The dataset covers nine illnesses from the DSM-5 psychiatric taxonomy: schizophrenia, bipolar disorder, depression, anxiety, obsessive-compulsive disorders, feeding and eating disorders, trauma/stress, autism, and neurodevelopmental disorders. It also contains a control group of individuals who have no posts in the support groups. The texts do not contain terms related to mental health. We extracted the discriminating features for each illness group from a part of the dataset, using a Naïve Bayes classifier. We then performed automatic classification between each diagnosed group and the control group, training three transformer models (BERT, RoBERTa, XLNet). We obtained state-of-the-art results (Table 1). Classification performance differed significantly across the nine illnesses. The highest F1 scores were obtained for eating disorders and PTSD, followed by OCD and BPD, owing to their strongly discriminating features. Depression had the lowest F1 score, which indicates that depression is the hardest to identify in language use.
We showed that combining discriminative features with transformer models boosts the performance of mental illness classification, and that classifying mental illnesses from general texts, and from posts alone, is feasible.
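
The abstract does not say how the discriminating features were read off the Naïve Bayes classifier. As a rough illustration only, the sketch below ranks words by the smoothed log-likelihood ratio of a multinomial Naïve Bayes model, log P(word | group) - log P(word | control), which is one common way to do it. The function name `discriminating_terms`, the whitespace tokenization, the smoothing value, and the toy posts are all assumptions made for this example; they are not taken from the study or from the SMHD dataset.

```python
import math
from collections import Counter


def discriminating_terms(group_posts, control_posts, top_k=5, alpha=1.0):
    """Rank words by log P(word | group) - log P(word | control).

    Probabilities are multinomial Naive Bayes estimates with add-alpha
    smoothing. Hypothetical helper: the study's actual procedure may differ.
    """
    # Naive lowercase/whitespace tokenization (an assumption for brevity).
    group_counts = Counter(w for p in group_posts for w in p.lower().split())
    control_counts = Counter(w for p in control_posts for w in p.lower().split())

    vocab = set(group_counts) | set(control_counts)
    group_total = sum(group_counts.values()) + alpha * len(vocab)
    control_total = sum(control_counts.values()) + alpha * len(vocab)

    scores = {
        w: math.log((group_counts[w] + alpha) / group_total)
        - math.log((control_counts[w] + alpha) / control_total)
        for w in vocab
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Invented toy posts, used only to exercise the function.
toy_group = ["i skipped dinner again", "counting calories again today"]
toy_control = ["i watched the game today", "the game was great"]

top_terms = discriminating_terms(toy_group, toy_control, top_k=5)
print(top_terms)
```

Words that score highest occur far more often in the diagnosed group than in the control group. In a pipeline like the one described, such terms could then be supplied to the transformer classifier alongside the raw post text, though how the two were combined is not specified here.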