The impacts of fine-tuning, phylogenetic distance, and sample size on big-data bioacoustics
TL;DR: In this paper, the authors used a machine learning framework to extract syllables from ten bird species whose divergence times range from 1 to 85 million years, and evaluated how well trained models transfer to novel species. Model performance is best on conspecifics, and accuracy decreases progressively as the phylogenetic distance between taxa increases.
Abstract: Vocalizations in animals, particularly birds, are critically important behaviors that influence their reproductive fitness. While recordings of bioacoustic data have been captured and stored in collections for decades, the automated extraction of data from these recordings has only recently been facilitated by artificial intelligence methods. These methods have yet to be evaluated with respect to the accuracy of different automation strategies and features. Here, we use a recently published machine learning framework to extract syllables from ten bird species whose divergence times range from 1 to 85 million years, to compare how phylogenetic relatedness influences accuracy. We also evaluate the utility of applying trained models to novel species. Our results indicate that model performance is best on conspecifics, with accuracy progressively decreasing as phylogenetic distance increases between taxa. However, we also find that applying models trained on multiple distantly related species can improve overall accuracy to levels near those achieved by training and analyzing a model on the same species. When planning big-data bioacoustics studies, care must be taken in sample design to maximize sample size and minimize human labor without sacrificing accuracy.
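The abstract's central claim, that accuracy falls as phylogenetic distance grows, describes a monotonic trend. One generic way to quantify such a trend is Spearman's rank correlation between divergence time and per-species accuracy. This is only an illustrative sketch: the abstract does not say which statistic the authors used, the function names `average_ranks` and `spearman_rho` are hypothetical, and the example values are invented placeholders, not results from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # HYPOTHETICAL placeholder data: divergence time (million years, 0 =
    # conspecific) between training and test species, and model accuracy.
    divergence_myr = [0, 1, 10, 85]
    accuracy = [0.95, 0.90, 0.80, 0.60]
    print(spearman_rho(divergence_myr, accuracy))
```

A value near -1 indicates that accuracy drops consistently as divergence time increases. Because the statistic uses ranks only, it captures "progressively decreasing" without assuming a linear relationship.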
Citations
Improving the workflow to crack Small, Unbalanced, Noisy, but Genuine (SUNG) datasets in bioacoustics: The case of bonobo calls
TL;DR: In this article, the authors apply recent advances in machine learning to a SUNG dataset, succeed in unraveling the complex vocal repertoire of the bonobo, and propose a workflow that can be effective for other animal species.
Goal-directed and flexible modulation of syllable sequence within birdsong
Takuto Kawaji, Mizuki Fujibayashi, Kentaro Abe, and 2 others
TL;DR: This work elucidates the flexibility songbirds exhibit in organizing and sequencing syllables within their songs, and identifies the involvement of the parietal-basal ganglia pathway in orchestrating these flexible modulations of syllable sequence.
Crowsetta: A Python tool to work with any format for annotating animal vocalizations and bioacoustics data
TL;DR: Crowsetta is a Python tool for working with any format used to annotate animal vocalizations and bioacoustics data.
Proceedings of the 22nd Python in Science Conference
TL;DR: Blosc2 NDim enables efficient exploration of large multidimensional datasets by adding support for n-dimensional data and implementing a new two-level data partitioning scheme, which significantly improves slicing speed and compression efficiency.
vak: a neural network framework for researchers studying animal acoustic communication
David A. Nicholson, Yarden Cohen, and 1 other
TL;DR: vak is a neural network framework for researchers studying animal acoustic communication; it makes it easy to benchmark and test models on user data, and it provides a command-line interface and an API for domain-specific PyTorch applications.
References
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT, a new language representation model, pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Neural networks and physical systems with emergent collective computational abilities
TL;DR: A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.
Maximum entropy modeling of species geographic distributions
TL;DR: This paper introduces the use of the maximum entropy method (Maxent), a general-purpose machine learning method with a simple and precise mathematical formulation, for modeling species geographic distributions from presence-only data.
Least squares quantization in PCM
TL;DR: In this article, the author derives the necessary conditions that the quanta and associated quantization intervals of an optimal finite quantization scheme must satisfy, for any finite number of quanta, to achieve minimum average quantization noise power.
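The Lloyd entry above concerns the conditions an optimal quantizer must satisfy. These are usually stated as two rules: each interval boundary lies midway between adjacent quanta, and each quantum is the centroid of its interval. Alternating the two rules gives Lloyd's iterative method, which is the one-dimensional form of k-means. The sketch below applies it to an empirical sample instead of a continuous density. That substitution, the initialization, and the function name `lloyd_quantizer` are assumptions made for illustration and are not taken from the reference.

```python
import bisect
from typing import Sequence


def lloyd_quantizer(
    samples: Sequence[float], levels: int, max_iter: int = 100
) -> tuple[list[float], list[float]]:
    """Alternate the two optimality conditions on an empirical sample.

    Returns (quanta, boundaries): the reconstruction levels and the
    thresholds separating adjacent quantization intervals.
    """
    if levels < 1 or len(samples) < levels:
        raise ValueError("need levels >= 1 and at least `levels` samples")
    data = sorted(samples)
    n = len(data)
    # Assumed initialization: evenly spaced order statistics of the sample.
    quanta = [data[(2 * i + 1) * n // (2 * levels)] for i in range(levels)]
    for _ in range(max_iter):
        # Condition 1: boundaries lie midway between adjacent quanta.
        bounds = [(a + b) / 2 for a, b in zip(quanta, quanta[1:])]
        # Assign each sample to the interval that contains it.
        cells: list[list[float]] = [[] for _ in range(levels)]
        for x in data:
            cells[bisect.bisect_right(bounds, x)].append(x)
        # Condition 2: each quantum is the centroid (mean) of its interval;
        # an empty interval keeps its previous quantum.
        updated = [sum(c) / len(c) if c else q for c, q in zip(cells, quanta)]
        if updated == quanta:
            break  # both conditions hold, so this is a fixed point
        quanta = updated
    bounds = [(a + b) / 2 for a, b in zip(quanta, quanta[1:])]
    return quanta, bounds


if __name__ == "__main__":
    print(lloyd_quantizer([0, 1, 2, 3, 100], levels=2))
```

Each iteration can only lower (or keep) the mean squared quantization error, so the loop settles at a point where both conditions hold. Like k-means, that point may be a local rather than a global optimum, depending on the starting quanta.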