1. What is the main focus of the proposed approach in depression detection?
The main focus of the proposed approach in depression detection is to use a Graph Convolutional Network (GCN) to classify transcribed sessions between a therapist and a subject seeking medical attention. The approach is designed to be data-agnostic, computationally inexpensive, and interpretable. It introduces a novel weighting scheme for self-connections that addresses two limiting assumptions of standard GCNs: locality, and the equal importance given to a node's self-connection and to its edges to neighboring nodes. It also evaluates the first inductive implementation of GCNs for depression detection from transcribed interviews, which outperforms previous results on benchmark datasets. Finally, the model's interpretability is demonstrated, and its interpretations align with findings in psychology research.
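The summary does not give the exact form of the self-connection weighting. As a minimal sketch, assuming a single scalar weight λ (the hypothetical `self_weight` parameter) on the identity term of the standard GCN propagation rule, A_hat = D^(-1/2) (A + λI) D^(-1/2) and H' = ReLU(A_hat H W), where D is the degree matrix of A + λI, the idea can be written in NumPy. With λ = 1 this is the usual GCN rule; any other value changes how much a node's own features count relative to its neighbors'. This illustrates the general idea only and is not the paper's scheme.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray, self_weight: float = 1.0) -> np.ndarray:
    """Return D^-1/2 (A + self_weight * I) D^-1/2.

    self_weight = 1.0 is the usual GCN renormalization; other values change
    how much a node's own features count relative to its neighbors'.
    `self_weight` is a hypothetical stand-in for the paper's weighting scheme.
    """
    a_tilde = adj + self_weight * np.eye(adj.shape[0])
    degree = a_tilde.sum(axis=1)
    d_inv_sqrt = np.zeros_like(degree)
    # Guard against isolated nodes (zero degree) when self_weight is 0.
    np.divide(1.0, np.sqrt(degree), out=d_inv_sqrt, where=degree > 0)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray, relu: bool = True) -> np.ndarray:
    """One propagation step: aggregate neighbor features, then project them."""
    out = a_hat @ h @ w
    return np.maximum(out, 0.0) if relu else out
```

For a three-node path graph, raising `self_weight` from 1 to 2 increases the middle node's diagonal entry in A_hat from 1/3 to 1/2, so that node's own features carry more weight in the next layer.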
2. What is the purpose of using a Graph Convolutional Network (GCN) in text classification?
The purpose of using a Graph Convolutional Network (GCN) in text classification is to model global word co-occurrences explicitly and to learn the relation between words and output labels. A GCN operates directly on a graph and computes an embedding for each node from the properties of its neighbors. For text classification, the corpus is represented as a single large, heterogeneous text graph whose nodes are words and training documents. The GCN architecture has two layers: the first learns an intermediate representation of the nodes (words and documents), and the second learns the output representation. Because the output layer propagates label information from the document nodes to the word nodes, every word node also receives output probabilities over the labels, so the model learns the relationship between words and output labels directly. This improves interpretability, since it shows how individual words contribute to the classification of documents. In addition, feature selection can be used to reduce the vocabulary size, which shrinks the graph and improves both efficiency and interpretability.
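The summary does not say how the graph edges are weighted. A common choice, used by the TextGCN model, is TF-IDF for document-word edges and positive pointwise mutual information (PMI) over sliding windows for word-word edges. The sketch below assumes that scheme, builds the graph from training documents only, uses one-hot node features, and runs the two-layer forward pass with random, untrained weights. The window size, hidden width, and smoothed IDF formula are illustrative assumptions. In practice the weights would be learned with a cross-entropy loss on the labelled document nodes. The point here is only that every node, word nodes included, ends up with a label distribution.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def build_text_graph(docs, window=3):
    """Build a document-word graph from tokenized training documents.

    Nodes 0..len(docs)-1 are documents; the remaining nodes are words.
    Document-word edges use smoothed TF-IDF; word-word edges use positive
    PMI over sliding windows (a TextGCN-style assumption).
    """
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n_docs = len(docs)
    size = n_docs + len(vocab)
    adj = np.zeros((size, size))

    # Document-word edges: term frequency times smoothed inverse document frequency.
    df = Counter(w for d in docs for w in set(d))
    for i, d in enumerate(docs):
        for w, count in Counter(d).items():
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0
            j = n_docs + index[w]
            adj[i, j] = adj[j, i] = (count / len(d)) * idf

    # Word-word edges: positive pointwise mutual information over sliding windows.
    windows = []
    for d in docs:
        if len(d) <= window:
            windows.append(d)
        else:
            windows.extend(d[k:k + window] for k in range(len(d) - window + 1))
    n_win = len(windows)
    single = Counter(w for win in windows for w in set(win))
    pairs = Counter(p for win in windows for p in combinations(sorted(set(win)), 2))
    for (a, b), c_ab in pairs.items():
        pmi = math.log(c_ab * n_win / (single[a] * single[b]))
        if pmi > 0:  # keep only positively associated word pairs
            i, j = n_docs + index[a], n_docs + index[b]
            adj[i, j] = adj[j, i] = pmi
    return adj, vocab


def normalize(adj):
    """Add self-loops and apply symmetric degree normalization."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def two_layer_gcn(a_hat, w0, w1):
    """Two-layer forward pass with one-hot node features (X = I)."""
    x = np.eye(a_hat.shape[0])
    hidden = np.maximum(a_hat @ x @ w0, 0.0)  # layer 1: intermediate node embeddings
    return softmax(a_hat @ hidden @ w1)       # layer 2: label probabilities per node


if __name__ == "__main__":
    docs = [["i", "feel", "tired"], ["i", "feel", "fine"], ["sleep", "is", "fine"]]
    adj, vocab = build_text_graph(docs, window=2)
    rng = np.random.default_rng(0)
    probs = two_layer_gcn(normalize(adj), rng.normal(size=(len(adj), 16)), rng.normal(size=(16, 2)))
    word_probs = dict(zip(vocab, probs[len(docs):]))  # per-word label distributions
    print(word_probs)
```

The hidden width (16 here) plays the role of the node representation size, and the vocabulary size fixes the number of word nodes, so both directly control the size of the model.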
3. What datasets are used for experiments?
The experiments use the DAIC-WOZ and E-DAIC datasets. Both contain semi-structured clinical interviews in North American English conducted by an animated virtual interviewer. The datasets are multimodal, comprising audio and video recordings, transcribed text, and PHQ-8 scores. Only the transcripts of the subjects' responses are used in the experiments. DAIC-WOZ has a smaller vocabulary than E-DAIC, indicating less variation in terminology and lower lexical richness.
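The preprocessing and vocabulary comparison described above can be sketched in plain Python under two assumptions. First, transcripts are tab-separated with `speaker` and `value` columns, and subject turns are labelled `Participant`; verify these names against the actual dataset files. Second, the type-token ratio stands in for lexical richness, which the summary does not define. Raw type-token ratio tends to fall as a corpus grows, so it gives only a rough comparison between corpora of different sizes.

```python
import csv
import io
import re


def participant_text(transcript_tsv: str, speaker_label: str = "Participant") -> str:
    """Keep only the subject's turns from a tab-separated transcript.

    The column names 'speaker' and 'value' are assumptions about the file layout.
    """
    reader = csv.DictReader(io.StringIO(transcript_tsv), delimiter="\t")
    return " ".join(row["value"] for row in reader if row["speaker"] == speaker_label)


def lexical_stats(texts):
    """Return (vocabulary size, type-token ratio) over a list of strings."""
    # Lowercase and keep alphabetic tokens, allowing apostrophes (e.g. "i'm").
    tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
    vocab = set(tokens)
    ttr = len(vocab) / len(tokens) if tokens else 0.0
    return len(vocab), ttr
```

Applied to every transcript in each dataset, `lexical_stats` yields the vocabulary size per corpus, the quantity on which DAIC-WOZ and E-DAIC are compared.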
4. What models were used as baselines in the implementation details?
Six pretrained transformer-based models (bert-base-cased, bert-base-uncased, bert-large-cased, bert-large-uncased, roberta-base, roberta-large) were used as baselines. A Support Vector Machine (SVM) with a linear kernel and a Logistic Regression (LR) model were also included as simple baselines. The GCN models were evaluated with varying vocabulary sizes and node representation sizes.
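The two simple baselines can be sketched with scikit-learn. The summary does not say which text features fed the SVM and LR models, so the TF-IDF input, the optional `max_features` vocabulary cap, and `max_iter=1000` are assumptions for illustration. The helper name `make_baselines` is hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def make_baselines(max_features=None):
    """Return the SVM (linear kernel) and LR baselines as text pipelines.

    Each pipeline turns raw transcript strings into TF-IDF vectors (an
    assumption) and fits a linear classifier on them.
    """
    return {
        "svm_linear": make_pipeline(
            TfidfVectorizer(max_features=max_features), SVC(kernel="linear")
        ),
        "logreg": make_pipeline(
            TfidfVectorizer(max_features=max_features), LogisticRegression(max_iter=1000)
        ),
    }


if __name__ == "__main__":
    texts = ["tired hopeless sad", "sad tired cannot sleep", "happy fine rested", "fine good happy"]
    labels = [1, 1, 0, 0]  # toy labels, not dataset values
    for name, model in make_baselines().items():
        model.fit(texts, labels)
        print(name, model.predict(["sad tired", "happy fine"]))
```

Each pipeline fits its own vectorizer, so the two baselines stay independent and can be cross-validated as single estimators.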