TL;DR: The results indicate that the information acquired by different communicative modalities is equivalent from a mental processing standpoint, in particular, at the point at which the actor's communicative intention has to be reconstructed.
Abstract: Human communicative competence is based on the ability to process a specific class of mental states, namely, communicative intention. The present fMRI study aims to analyze whether intention processing in communication is affected by the expressive means through which a communicative intention is conveyed, that is, the linguistic or extralinguistic gestural means. Combined factorial and conjunction analyses were used to test two sets of predictions: first, that a common brain network is recruited for the comprehension of communicative intentions independently of the modality through which they are conveyed; second, that additional brain areas are specifically recruited depending on the communicative modality used, reflecting distinct sensorimotor gateways. Our results clearly showed that a common neural network is engaged in communicative intention processing independently of the modality used. This network includes the precuneus, the left and right posterior STS and TPJ, and the medial pFC. Additional brain areas outside those involved in intention processing are specifically engaged by the particular communicative modality, that is, a peri-sylvian language network for the linguistic modality and a sensorimotor network for the extralinguistic modality. Thus, common representation of communicative intention may be accessed by modality-specific gateways, which are distinct for linguistic versus extralinguistic expressive means. Taken together, our results indicate that the information acquired by different communicative modalities is equivalent from a mental processing standpoint, in particular, at the point at which the actor's communicative intention has to be reconstructed.
TL;DR: This is the first large-scale, comprehensive empirical comparison of eleven state-of-the-art modality fusion approaches on two video sentiment analysis tasks with three SOTA benchmark corpora; it shows that attention mechanisms are the most effective for modelling crossmodal interactions but are computationally expensive.
TL;DR: Wang et al. propose the Multi-Modal Multi-label recognition TRansformers (M3TR), which use ternary relationship learning for inter- and intra-modalities.
Abstract: Multi-label image recognition aims to recognize multiple objects simultaneously in one image. Recent ideas to solve this problem have focused on learning dependencies of label co-occurrences to enhance the high-level semantic representations. However, these methods usually neglect the important relations of intrinsic visual structures and face difficulties in understanding contextual relationships. To build the global scope of visual context as well as interactions between the visual modality and the linguistic modality, we propose the Multi-Modal Multi-label recognition TRansformers (M3TR) with ternary relationship learning for inter- and intra-modalities. For the intra-modal relationship, we make an insightful conjunction of CNNs and Transformers, which embeds visual structures into the high-level features by learning the semantic cross-attention. For constructing the interactions between the visual and linguistic modalities, we propose a linguistic cross-attention to embed the class-wise linguistic information into the visual structure learning, and finally present a linguistic guided enhancement module to enhance the representation of high-level semantics. Experimental evidence reveals that with collaborative learning of the ternary relationship, our proposed M3TR achieves new state-of-the-art results on two public multi-label recognition benchmarks.
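The cross-attention idea mentioned in this abstract can be illustrated with a generic scaled dot-product sketch. This is not the M3TR implementation: the function names, the choice of class-label embeddings as queries and visual tokens as keys and values, and the toy numbers are all assumptions made for illustration, and the learned projection matrices of a real Transformer layer are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: one vector per class label (assumed linguistic side).
    keys, values: one vector per visual token (assumed visual side).
    Returns one attended vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        attended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(attended)
    return outputs


# Hypothetical toy data: two label embeddings attending over three visual tokens.
label_queries = [[1.0, 0.0], [0.0, 1.0]]
visual_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual_values = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.7]]
attended = cross_attention(label_queries, visual_keys, visual_values)
```

Each row of `attended` is a label-specific summary of the visual tokens; a real model would apply learned projections before and after this step.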
TL;DR: The authors investigated the negotiation of interpersonal relations by interpreters in Chinese government press conferences and found a noticeable trend of explicit use of modal expressions in target speeches in both interpreting modes, i.e., consecutive and simultaneous.
Abstract: This paper investigates the negotiation of interpersonal relations by interpreters in Chinese government press conferences – a major instrument for the promotion of public diplomacy in China. Drawing on the theory of linguistic modality in systemic functional grammar (SFG) and the concept of explicitation (Englund Dimitrova 1993), we present a corpus-based discourse analysis of interpreters’ explicitation of modality and connect it to their participation in negotiating interpersonal relations in such a setting. Quantitative results indicate a noticeable trend of explicit use of modal expressions in target speeches in both interpreting modes, i.e., consecutive and simultaneous. Data from qualitative analysis illustrate the various explicitations that manifest interpersonal relations on different levels between interactants on the scene. We conclude by underlining the role of government press conference interpreters as active co-participants in public diplomatic settings, discussing the contributions of this work to empirical research on interpreters’ agency and its limitations, and suggesting new directions towards which further research might be carried out.
TL;DR: An approach for the author profiling task of the PAN 2013 challenge, based on the idea of linguistic modality that has been successfully used in other classification tasks such as authorship attribution; it performs well on age identification for both English and Spanish and on gender prediction for Spanish.
Abstract: This paper describes an approach for the author profiling task of the PAN 2013 challenge. This work is based on the idea of linguistic modality that has been successfully used in other classification tasks such as authorship attribution. We consider three different modalities: syntactic, stylistic, and semantic, each representing a different aspect of text. For each modality, we extract informative meta features by computing the similarity relations between the feature vectors in the test files and the centroids of modality-specific clusters. Since we were provided texts in both Spanish and English, we build a language-independent framework for author profiling. For both English and Spanish documents, our system performed well for the age identification task. For gender prediction, although our system could not perform as expected for English, it yielded good results on Spanish.
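The meta-feature step described in this abstract can be sketched as follows. The abstract does not specify the similarity measure, the clustering method, or the feature sets, so cosine similarity, the mean-vector centroid, the function names, and the toy vectors below are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def meta_features(doc_vectors, modality_centroids):
    """Similarity of a document to every cluster centroid, per modality.

    doc_vectors: {modality name: feature vector of the document}.
    modality_centroids: {modality name: list of cluster centroids}.
    Modalities are visited in sorted order so the output layout is stable.
    """
    features = []
    for modality in sorted(modality_centroids):
        for center in modality_centroids[modality]:
            features.append(cosine(doc_vectors[modality], center))
    return features


# Hypothetical training clusters, two per modality, with 2-D feature vectors.
clusters = {
    "syntactic": [[[1.0, 0.0], [3.0, 0.0]], [[0.0, 2.0], [0.0, 4.0]]],
    "stylistic": [[[1.0, 1.0], [3.0, 3.0]], [[1.0, 0.0], [1.0, 0.0]]],
    "semantic": [[[0.0, 1.0], [0.0, 1.0]], [[2.0, 2.0], [4.0, 4.0]]],
}
centroids = {
    name: [centroid(members) for members in groups]
    for name, groups in clusters.items()
}
test_doc = {"syntactic": [1.0, 0.0], "stylistic": [1.0, 1.0], "semantic": [0.0, 3.0]}
doc_meta = meta_features(test_doc, centroids)
```

With three modalities and two clusters each, `doc_meta` holds six similarity scores, which a downstream classifier for age or gender could consume in place of the raw feature vectors.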