Although current deep learning-based face forgery detectors achieve impressive performance in constrained scenarios, they are vulnerable to samples created by unseen manipulation methods. Some recent works show improvements in generalisation but rely on cues that are easily corrupted by common post-processing operations such as compression. In this paper, we propose LipForensics, a detection approach capable of both generalising to novel manipulations and withstanding various distortions. LipForensics targets high-level semantic irregularities in mouth movements, which are common in many generated videos. It consists in first pretraining a spatio-temporal network to perform visual speech recognition (lipreading), thus learning rich internal representations related to natural mouth motion. A temporal network is subsequently finetuned on fixed mouth embeddings of real and forged data in order to detect fake videos based on mouth movements without overfitting to low-level, manipulation-specific artefacts. Extensive experiments show that this simple approach significantly surpasses the state-of-the-art in terms of generalisation to unseen manipulations and robustness to perturbations, as well as shed light on the factors responsible for its performance.

Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection

Deep Learning (DL) has been applied effectively to complicated challenges in healthcare, industry, and academia, including thyroid diagnosis, lung nodule recognition, computer vision, large data analytics, and human‐level control. Nevertheless, developments in digital technology have also been used to produce software that poses a threat to democracy, national security, and confidentiality. Deepfake is one of those DL‐powered applications that has lately surfaced. Deepfake systems create fake images, movies, and sounds, primarily by replacing scenes or images, that humans cannot tell apart from real ones. Various technologies have put the capacity to alter or synthesize speech, images, or video at our fingertips. Furthermore, video and image frauds are now so convincing that it is hard to distinguish between false and authentic content with the naked eye. This can cause problems ranging from deceiving public opinion to the use of doctored evidence in court. For these reasons, it is critical to have technologies that can assist us in discerning reality. This study gives a complete assessment of the literature on deepfake detection strategies using DL‐based algorithms. We categorize deepfake detection methods in this work based on their applications, which include video detection, image detection, audio detection, and hybrid multimedia detection. The objective of this paper is to give the reader a better knowledge of (1) how deepfakes are generated and identified, (2) the latest developments and breakthroughs in this realm, (3) weaknesses of existing security methods, and (4) areas requiring more investigation and consideration. The results suggest that the Convolutional Neural Network (CNN) methodology is the most often employed DL method in publications. According to research, the majority of the articles are on the subject of video deepfake detection. The majority of the articles focused on enhancing only one parameter, with the accuracy parameter receiving the most attention.

This article is categorized under:
Technologies > Machine Learning
Algorithmic Development > Multimedia
Application Areas > Science and Technology


Deepfake detection using deep learning methods: A systematic and comprehensive review

A comprehensive overview of Deepfake: Generation, detection, datasets, and opportunities

The Metaverse is a multi-user virtual world that combines physical reality with digital virtual reality. The three basic technologies for building the Metaverse are immersive technologies, artificial intelligence, and blockchain. Companies are subsequently making significant investments into creating an artificially intelligent Metaverse, with the consequence that cybersecurity has become more crucial. As cybercrime increases exponentially, it is evident that a comprehensive study of Metaverse security based on artificial intelligence is lacking. A growing number of distributed denial-of-service attacks and theft of user identification information makes it necessary to conduct comprehensive and inclusive research in this field in order to identify the Metaverse’s vulnerabilities and weaknesses. This article provides a summary of existing research on AI-based Metaverse cybersecurity and discusses relevant security challenges. Based on the results, the issue of user identification plays a very important role in the presented works, for which biometric methods are the most commonly used. While the use of biometric data is considered the safest method, due to their uniqueness, they are also susceptible to misuse. A cyber-situation management system based on artificial intelligence should be able to analyze data of any volume with the help of algorithms. To prepare researchers who will pursue this topic in the future, this article provides a comprehensive summary of research on cybersecurity in the Metaverse based on artificial intelligence.

https://www.mdpi.com/2076-3417/12/24/12993/pdf?version=1671606526

Cybersecurity in the AI-Based Metaverse: A Survey

Recent research has demonstrated that lip-based speaker authentication systems can not only achieve good authentication performance but also guarantee liveness. However, with modern DeepFake technology, attackers can produce a talking video of a user without leaving any visually noticeable fake traces. This can seriously compromise traditional face-based or lip-based authentication systems. To defend against sophisticated DeepFake attacks, a new visual speaker authentication scheme based on the deep convolutional neural network (DCNN) is proposed in this paper. The proposed network is composed of two functional parts, namely, the Fundamental Feature Extraction network (FFE-Net) and the Representative lip feature extraction and Classification network (RC-Net). The FFE-Net provides the fundamental information for speaker authentication. As the static lip shape and lip appearance are vulnerable to DeepFake attacks, the dynamic lip movement is emphasized in the FFE-Net. The RC-Net extracts high-level lip features that discriminate against human imposters while capturing the client’s talking style. A multi-task learning scheme is designed, and the proposed network is trained end-to-end. Experiments on the GRID and MOBIO datasets have demonstrated that the proposed approach is able to achieve an accurate authentication result against human imposters and is much more robust against DeepFake attacks compared to three state-of-the-art visual speaker authentication algorithms. It is also worth noting that the proposed approach does not require any prior knowledge of the DeepFake spoofing method and thus can be applied to defend against different kinds of DeepFake attacks.

Preventing DeepFake Attacks on Speaker Authentication by Dynamic Lip Movement Analysis

Alkaline pectate lyase has developmental prospects in the textile, pulp, paper, and food industries. In this study, we selected BacPelA, the pectin lyase with the highest expression activity from Bacillus clausii, modified and expressed in Escherichia coli BL21(DE3). Through fragment replacement, the catalytic activity of the enzyme was significantly improved. The optimum pH and temperature of the modified pectin lyase (PGLA-rep4) were 11.0 and 70 °C, respectively. It also exhibited a superior ability to cleave methylated pectin. The enzyme activity of PGLA-rep4, measured at 235 nm with 0.2% apple pectin as the substrate, was 554.0 U/mL, and the specific enzyme activity after purification using a nickel column was 822.9 U/mg. After approximately 20 ns of molecular dynamics simulation, the structure of the pectin lyase PGLA-rep4 tended to be stable. The root mean square fluctuation (RMSF) values at the key catalytically active site, LYS168, were higher than those of the wildtype PGLA. In addition, PGLA-rep4 was relatively stable in the presence of metal ions. PGLA-rep4 has good enzymatic properties and activities and maintains a high pH and temperature. This study provides a successful strategy for enhancing the catalytic activity of PGLA-rep4, making it the ultimate candidate for degumming and various uses in the pulp, paper, and textile industries. 

Modification and application of highly active alkaline pectin lyase

Recently, the field of Text-to-Speech (TTS) has been dominated by one-stage text-to-waveform models which have significantly improved speech quality compared to two-stage models. In this work, we propose EfficientTTS 2 (EFTS2), a one-stage high-quality end-to-end TTS framework that is fully differentiable and highly efficient. Our method adopts an adversarial training process, with a differentiable aligner and a hierarchical-VAE-based waveform generator. These design choices free the model from the use of external aligners, invertible structures, and complex training procedures as most previous TTS works have. Moreover, we extend EFTS2 to the voice conversion (VC) task and propose EFTS2-VC, an end-to-end VC model that allows high-quality speech-to-speech conversion. Experimental results suggest that the two proposed models achieve better or at least comparable speech quality compared to baseline models, while also providing faster inference speeds and smaller model sizes.

EfficientTTS 2: Variational End-to-End Text-to-Speech Synthesis and Voice Conversion

Code-switching is a common phenomenon in multilingual communities. In this paper, we study an end-to-end model for Mandarin-English intra-sentential code-switching speech recognition. A lightweight Switch-Routing network is proposed, which includes two experts and a switch router. The two experts, representing Mandarin and English learners, implicitly provide language identification information and use monolingual data to assist code-switching task training, which mitigates the problem of data sparsity. In addition, our network is a lightweight structure that keeps the advantages of the Switch Transformer while avoiding its weakness of increasing model capacity. Finally, we study the effect of using lightweight Switch Routing in different blocks of the encoder and decoder. Compared with Bi-Encoder, the proposed model performs better on the ASRU code-switching test set and, most importantly, requires much less inference time, with the real-time factor (RTF) decreasing by 31.39%.

Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts
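
The routing idea named in this abstract (two experts plus a switch router) can be illustrated with a minimal sketch of top-1 "switch" routing in the style of the Switch Transformer. This is not the paper's architecture, which the abstract does not specify: the function names, the linear router, and the toy experts below are all hypothetical and chosen only to show how a router picks one expert per input frame and scales its output by the gate probability.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(
    x: Vector,
    router_weights: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Vector], Vector]],
) -> Tuple[int, Vector]:
    """Top-1 routing: send x to the single most probable expert.

    router_weights has one row per expert (a hypothetical linear router).
    Only the chosen expert runs, so compute per frame stays roughly
    constant no matter how many experts exist.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    # Scale the expert output by its gate probability.
    y = [probs[k] * v for v in experts[k](x)]
    return k, y


# Toy stand-ins for the two experts (assumptions, not real acoustic models).
def mandarin_expert(x: Vector) -> Vector:
    return [2.0 * v for v in x]


def english_expert(x: Vector) -> Vector:
    return [v + 1.0 for v in x]


ROUTER = [[1.0, 0.0], [0.0, 1.0]]
EXPERTS = [mandarin_expert, english_expert]

chosen_a, out_a = switch_route([3.0, 0.0], ROUTER, EXPERTS)
chosen_b, out_b = switch_route([0.0, 3.0], ROUTER, EXPERTS)
```

Because each frame activates only one expert, the index returned by the router also acts as an implicit per-frame language decision, which is one way to read the abstract's remark that the experts "implicitly provide language identification information".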

Inspired by EfficientTTS [1], a recently proposed speech synthesis model, we propose a new way to train attention-based end-to-end speech recognition models with an additional training objective, allowing the models to learn monotonic alignments effectively and efficiently. The introduced training objective is differentiable, computationally cheap and, most importantly, imposes no constraints on network structure. It can therefore be incorporated easily into many speech recognition models. Through extensive experiments on a CTC/Attention architecture with Conformer blocks, we observed that our models significantly outperform baseline models. Specifically, our best performing model achieves a WER (Word Error Rate) of 3.18% on the LibriSpeech test-clean benchmark and 8.41% on test-other. Compared with a strong baseline obtained with WeNet, the proposed model achieves a 7.6% relative WER reduction on test-clean and 6.9% on test-other.

Towards Efficiently Learning Monotonic Alignments for Attention-based End-to-End Speech Recognition
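
The relative WER reductions quoted in this abstract follow the usual definition, (baseline − new) / baseline. The short sketch below shows that arithmetic and inverts it to estimate the baseline WER implied by the rounded figures. The baseline values themselves are not reported in the abstract, so anything the sketch computes for them is an approximation derived from rounded numbers, and the function names are illustrative.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_reduction: float) -> float:
    """Invert the definition: baseline = new / (1 - relative reduction)."""
    if not 0 <= rel_reduction < 1:
        raise ValueError("relative reduction must be in [0, 1)")
    return new_wer / (1.0 - rel_reduction)


# Figures quoted in the abstract (WER in percent, reductions as fractions).
clean_baseline = implied_baseline(3.18, 0.076)
other_baseline = implied_baseline(8.41, 0.069)

print(f"implied test-clean baseline WER: {clean_baseline:.2f}%")
print(f"implied test-other baseline WER: {other_baseline:.2f}%")
```

Note that a relative reduction is not an absolute one: a 7.6% relative reduction from a WER in the low single digits corresponds to an absolute change of only a few tenths of a percentage point.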


Jun Ma
