Utilizing ChatGPT Generated Data to Retrieve Depression Symptoms from Social Media

Question

1. What is the eRisk lab's focus?

2. How can LLMs be used for mental health assessment?

3. How are sentences ranked for relevance to BDI-II symptoms?

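4. How can diverse Reddit posts be generated for a BDI symptom?

5. What pre-processing steps were taken for Reddit posts?

6. What is the best-performing model in majority voting ranking-based evaluation?

7. What hypothesis was tested in the eRisk Lab task?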

Accepted Answer

The eRisk lab focuses on early risk detection of mental disorders from social media data. It started in 2017 with a pilot task on detecting depression and has organized tasks every year since, expanding to other conditions such as eating disorders, pathological gambling, and self-harm. The current task involves retrieving sentences from social media posts and ranking them by relevance to the depression symptoms of the BDI-II questionnaire. The authors' proposed method generates synthetic Reddit posts that resemble BDI-II responses, aiming to add diversity to the queries and improve retrieval of relevant sentences.

Accepted Answer

LLMs have been evaluated for mental health assessment in several ways. Yang et al. compared ChatGPT with three supervised baselines and found that, although ChatGPT achieved good results in a zero-shot classification setting, it lagged behind specialized transformer-based models on downstream tasks such as suicide and depression detection from social media data. Amin et al. performed an interpretable mental health analysis through emotional reasoning, applying ChatGPT to 11 datasets across 5 tasks related to depression, stress, and suicidal ideation. Their results indicated that zero-shot ChatGPT outperformed traditional neural network architectures but did not surpass specialized transformer-based models. The authors also conducted human evaluations and tested the effect of emotional reasoning, finding that it improved ChatGPT's performance and enabled the model to generate explanations for its predictions.

LLMs have also been used to generate and augment data for mental health assessment. Meyer et al. evaluated synthetic data generated by GPT-3 for conversational tasks and found that classifiers trained on synthetic data performed worse than those trained on fewer samples of real user-generated data. Generating synthetic data may nonetheless be a suitable approach when data or resources are limited. In summary, LLMs have been applied to mental health assessment through zero-shot classification, emotional reasoning, and data generation and augmentation.

Accepted Answer

Sentences are ranked by their relevance to the symptoms of the Beck Depression Inventory-II (BDI-II), a questionnaire used to screen for depression. It consists of 21 items covering symptoms such as sadness, pessimism, loss of pleasure, and tiredness; each item corresponds to one symptom and is rated on a 0 to 3 scale of symptom intensity. In the eRisk 2023 lab task, sentences from Reddit are ranked by their relevance to each BDI-II symptom. A sentence is considered relevant if it is topically related to the symptom and conveys information about the user's state regarding it, even if the user does not suffer from the symptom (for example, "I have been sleeping well lately" is relevant to changes in sleeping pattern). The task data comprises 4 million sentences from 3,107 users, provided in TREC format. Systems are evaluated with top-k pooling (k = 50): for each symptom, the top 50 sentences ranked by each system are combined into a pool, and three annotators then assess each pooled sentence for relevance.
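The pooling step and the combination of the three annotators' labels can be sketched as follows. This is a minimal illustration, assuming each system's output for one symptom is a list of sentence IDs (best first) and that a pooled sentence counts as relevant when at least two of the three annotators mark it relevant (majority voting); `pool_top_k` and `majority_qrels` are hypothetical names.

```python
from typing import Dict, List, Set


def pool_top_k(runs: Dict[str, List[str]], k: int = 50) -> Set[str]:
    """Pool for one BDI-II symptom: union of every system's top-k sentence IDs."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def majority_qrels(judgments: Dict[str, List[int]], min_votes: int = 2) -> Dict[str, int]:
    """Binary relevance per pooled sentence from per-annotator 0/1 labels.

    With three annotators, min_votes=2 implements majority voting (an assumption
    about how the judgments are combined).
    """
    return {sid: int(sum(labels) >= min_votes) for sid, labels in judgments.items()}


# Toy example: two hypothetical systems ranking sentences for one symptom.
runs = {"system_a": ["s1", "s2", "s3"], "system_b": ["s3", "s4", "s5"]}
pool = pool_top_k(runs, k=2)
labels = {"s1": [1, 1, 0], "s2": [1, 0, 0], "s3": [1, 1, 1], "s4": [0, 0, 0]}
print(sorted(sid for sid, rel in majority_qrels(labels).items() if rel))  # ['s1', 's3']
```

Calling `pool_top_k` once per symptom would yield the 21 pools handed to the annotators.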

Accepted Answer

To create diverse Reddit posts for the BDI depression questionnaire, ChatGPT is prompted to generate {N} examples for the '{symptom}' symptom, where the BDI answer of interest is '{item}'. The posts should be in English, 2-3 sentences long, linguistically diverse, and specific to personal experiences, and they should avoid reusing the exact wording of the BDI item. Each post should combine descriptions of past experiences with feelings or events, giving the ranking models substantial content to work with. Examples include self-disclosures such as 'My cat passed away' or 'I just broke up with my partner'.
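Filling the prompt placeholders can be sketched with Python string formatting. The template wording below is a hypothetical paraphrase of the requirements listed above, not the authors' exact prompt; only the `{N}`, `{symptom}`, and `{item}` placeholders come from the source, and `build_prompt` is an illustrative name.

```python
# Hypothetical paraphrase of the generation prompt; not the authors' exact wording.
PROMPT_TEMPLATE = (
    "Generate {N} diverse Reddit posts for the '{symptom}' symptom of the BDI "
    "depression questionnaire, where the BDI answer of interest is '{item}'. "
    "Write each post in English, 2-3 sentences long, about a specific personal "
    "experience, and do not reuse the exact words of the BDI item. Combine "
    "descriptions of past experiences with feelings or events, for example "
    "self-disclosures such as 'My cat passed away' or "
    "'I just broke up with my partner'."
)


def build_prompt(n: int, symptom: str, item: str) -> str:
    """Fill the template for one symptom and one BDI answer option."""
    if n < 1:
        raise ValueError("n must be a positive number of posts")
    return PROMPT_TEMPLATE.format(N=n, symptom=symptom, item=item)


print(build_prompt(5, "Sadness", "<BDI-II answer text>"))
```

One prompt per (symptom, answer) pair keeps each batch of generated posts tied to a single BDI-II response.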

Accepted Answer

Before retrieval, all posts from each Reddit user were pre-processed by removing URLs and non-English text. Non-English text was identified with the polyglot package, which performs language detection. Removing it keeps the corpus consistent with the queries, since both the BDI-II responses and the generated posts are in English.
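A minimal sketch of this pre-processing, assuming a simple regular expression for URLs (the authors' exact rule is not given) and a pluggable language check. The commented-out adapter shows how polyglot's `Detector` might be wired in; treat that usage as an assumption to verify against the polyglot documentation. The rest of the sketch runs without polyglot installed.

```python
import re
from typing import Callable, Iterable, List

# Assumed URL pattern (http/https links and bare www. links); not the authors' exact rule.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(text: str) -> str:
    """Remove URLs and collapse the whitespace they leave behind."""
    return " ".join(URL_RE.sub(" ", text).split())


def preprocess_posts(posts: Iterable[str], is_english: Callable[[str], bool]) -> List[str]:
    """Drop URLs, then keep only non-empty posts that the language check accepts."""
    cleaned = []
    for post in posts:
        text = strip_urls(post)
        if text and is_english(text):
            cleaned.append(text)
    return cleaned


# Possible polyglot adapter (assumed usage; verify against the polyglot docs):
# from polyglot.detect import Detector
# def polyglot_is_english(text: str) -> bool:
#     return Detector(text, quiet=True).language.code == "en"

posts = ["Check this https://example.com out", "Hola, ¿qué tal?", "I slept badly again"]
# str.isascii is only a crude placeholder for a real language detector.
print(preprocess_posts(posts, lambda t: t.isascii()))
```

Passing the detector as a function keeps URL cleaning testable on its own and lets any language-identification library be swapped in.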

Accepted Answer

The best-performing model in the majority voting ranking-based evaluation is SemSearchOnBDI2Queries, which achieved 0.104 AP in the first scenario and 0.129 AP in the second. This model uses the BDI-II responses as queries and performed well at placing relevant sentences within the top 10 results. Contrary to the hypothesis that synthetically generated queries would improve performance, the generated texts contained too many details to be helpful for semantic search. Future work aims to experiment with different prompts to generate more diverse and semantically similar data.
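The AP figures are presumably the standard non-interpolated average precision computed per BDI-II symptom, as in trec_eval. The sketch below assumes that definition, with relevance taken from majority-vote judgments, and adds precision at k for the top-10 behaviour mentioned above. Function names are illustrative.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated AP for one symptom; assumes ranking has no duplicate IDs."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, sid in enumerate(ranking, start=1):
        if sid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant position
    # Divide by all relevant sentences, including ones the system never retrieved.
    return precision_sum / len(relevant)


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top k positions holding relevant sentences."""
    return sum(1 for sid in ranking[:k] if sid in relevant) / k


def mean_average_precision(rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-symptom AP; symptoms missing from qrels score 0."""
    if not rankings:
        return 0.0
    total = sum(average_precision(r, qrels.get(q, set())) for q, r in rankings.items())
    return total / len(rankings)


ranking = ["s1", "s2", "s3", "s4"]
print(round(average_precision(ranking, {"s1", "s3"}), 4))  # 0.8333
print(precision_at_k(ranking, {"s1", "s3"}, k=2))  # 0.5
```

Because AP divides by every relevant sentence in the judgments, a system can score well on precision at 10 while still having a low AP, which is consistent with the modest AP values reported.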

Accepted Answer

The hypothesis tested was that using ChatGPT to generate synthetic data resembling Reddit posts would retrieve more relevant sentences for each BDI-II item. The results did not support it: the model using the original BDI-II responses as queries retrieved more relevant sentences than the one using generated data. The synthetic posts were too specific to serve as queries for depression symptoms, indicating that the prompts need refinement in future work.
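The comparison can be pictured as the same semantic search run with two different query sets for each BDI-II item: the original response options or the ChatGPT-generated posts. The sketch below is a hypothetical illustration, not the SemSearchOnBDI2Queries implementation: `embed` is a bag-of-words stand-in for a neural sentence encoder, and scoring each sentence by its maximum cosine similarity to any query is an assumed aggregation.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence encoder: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def rank_sentences(queries: List[str], sentences: Dict[str, str],
                   top_k: int = 1000) -> List[Tuple[str, float]]:
    """Score each sentence by its best similarity to any query for one BDI-II item."""
    query_vecs = [embed(q) for q in queries]
    scored = []
    for sid, text in sentences.items():
        vec = embed(text)
        score = max((cosine(qv, vec) for qv in query_vecs), default=0.0)
        scored.append((sid, score))
    # Highest score first; ties broken by sentence ID for a deterministic order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


sentences = {
    "s1": "I cannot sleep at night",
    "s2": "My cat likes fish",
    "s3": "I sleep well most nights",
}
bdi_queries = ["I sleep less than usual", "I cannot sleep"]  # illustrative queries
for sid, score in rank_sentences(bdi_queries, sentences):
    print(sid, round(score, 3))
```

Swapping `embed` and `cosine` for a dense sentence encoder would leave the ranking logic unchanged; only the query list differs between the two configurations being compared.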

