Open Access • Proceedings Article
Supervised Text-based Geolocation Using Language Models on an Adaptive Grid
Stephen Roller, Michael Speriosu, Sarat Rallapalli, Benjamin Wing, Jason Baldridge
- 12 Jul 2012
- pp 1500-1510
TL;DR: The adaptive grid achieves competitive results with a uniform grid on small training sets and outperforms it on the large Twitter corpus; the two grid constructions can also be combined to produce consistently strong results across all training sets.
Abstract: The geographical properties of words have recently begun to be exploited for geolocating documents based solely on their text, often in the context of social media and online content. One common approach for geolocating texts is rooted in information retrieval. Given training documents labeled with latitude/longitude coordinates, a grid is overlaid on the Earth and pseudo-documents constructed by concatenating the documents within a given grid cell; then a location for a test document is chosen based on the most similar pseudo-document. Uniform grids are normally used, but they are sensitive to the dispersion of documents over the earth. We define an alternative grid construction using k-d trees that more robustly adapts to data, especially with larger training sets. We also provide a better way of choosing the locations for pseudo-documents. We evaluate these strategies on existing Wikipedia and Twitter corpora, as well as a new, larger Twitter corpus. The adaptive grid achieves competitive results with a uniform grid on small training sets and outperforms it on the large Twitter corpus. The two grid constructions can also be combined to produce consistently strong results across all training sets.
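The pipeline described in the abstract (adaptive cells, one pseudo-document per cell, pick the most similar cell) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`build_kd_cells`, `make_pseudo_docs`, `geolocate`), the median split on alternating axes, the use of the mean training coordinate as a cell's location, and the add-one smoothed unigram likelihood used as the similarity measure are all assumptions made for the example, since the abstract does not specify them.

```python
import math
from collections import Counter

def build_kd_cells(docs, max_docs_per_cell, depth=0):
    """Split (lat, lon, text) docs into leaf cells, k-d tree style.

    Assumption: split at the median, alternating latitude and longitude.
    """
    if max_docs_per_cell < 1:
        raise ValueError("max_docs_per_cell must be at least 1")
    if len(docs) <= max_docs_per_cell:
        return [docs]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    docs = sorted(docs, key=lambda d: d[axis])
    mid = len(docs) // 2
    return (build_kd_cells(docs[:mid], max_docs_per_cell, depth + 1)
            + build_kd_cells(docs[mid:], max_docs_per_cell, depth + 1))

def make_pseudo_docs(cells):
    """Concatenate each cell's documents into one bag of words.

    Assumption: a cell's location is the mean of its documents'
    coordinates (naive averaging; ignores antimeridian wrap-around).
    """
    pseudo = []
    for cell in cells:
        counts = Counter(tok for _, _, text in cell
                         for tok in text.lower().split())
        lat = sum(d[0] for d in cell) / len(cell)
        lon = sum(d[1] for d in cell) / len(cell)
        pseudo.append(((lat, lon), counts))
    return pseudo

def geolocate(text, pseudo_docs):
    """Return the location of the cell whose unigram model best fits text.

    Assumption: add-one smoothed unigram log-likelihood stands in for
    the similarity measure.
    """
    vocab = set().union(*(counts for _, counts in pseudo_docs))
    v = len(vocab) + 1  # one extra slot for unseen words
    tokens = text.lower().split()

    def loglik(counts):
        total = sum(counts.values())
        return sum(math.log((counts[t] + 1) / (total + v)) for t in tokens)

    return max(pseudo_docs, key=lambda p: loglik(p[1]))[0]

# Toy training data: (latitude, longitude, text)
train = [
    (30.27, -97.74, "austin tacos bbq"),
    (30.30, -97.70, "austin longhorns bbq"),
    (40.71, -74.00, "nyc subway bagel"),
    (40.73, -73.99, "nyc bagel pizza"),
]
cells = build_kd_cells(train, max_docs_per_cell=2)
predicted = geolocate("bagel in nyc", make_pseudo_docs(cells))
```

Because leaves are capped by document count rather than by area, dense regions end up with many small cells and sparse regions with a few large ones, which is the sense in which such a grid adapts to the data. A uniform grid would instead bucket documents by fixed-size latitude/longitude ranges.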
Citations
•Posted Content
Simplifying Graph Convolutional Networks
TL;DR: In this paper, the authors reduce the complexity of GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers, which corresponds to a fixed low-pass filter followed by a linear classifier.
1.2K
Text-based Twitter user geolocation prediction
Bo Han, Paul Cook, Timothy Baldwin
TL;DR: This paper presents an integrated geolocation prediction framework, evaluates the impact of nongeotagged tweets, language, and user-declared metadata on geolocation prediction, and discusses how users differ in terms of their geolocatability.
•Proceedings Article
How Noisy Social Media Text, How Diffrnt Social Media Sources?
Timothy Baldwin, Paul Cook, Marco Lui, Andrew MacKinlay, Li Wang
- 01 Oct 2013
TL;DR: This work investigates how linguistically noisy or otherwise social media text is across a range of sources (YouTube comments, Twitter posts, web user forum posts, blog posts and Wikipedia), compared against a reference corpus of edited English text.
Frege in Space: A Program of Compositional Distributional Semantics
TL;DR: The idea that word meaning can be approximated by the patterns of co-occurrence of words in corpora from statistical semantics and the idea that compositionality can be captured in terms of a syntax-driven calculus of function application from formal semantics are adopted.
A Survey of Location Prediction on Twitter
Xin Zheng, Jialong Han, Aixin Sun
TL;DR: This survey covers location prediction on Twitter, focusing on the prediction of user home locations, tweet locations, and mentioned locations, and summarizing the roles of the Twitter network, tweet content, and tweet context.
234
References
Mean shift: a robust approach toward feature space analysis
Dorin Comaniciu, Peter Meer
TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, establishing its utility in detecting the modes of the density.
12.9K
Multidimensional binary search trees used for associative searching
TL;DR: The multidimensional binary search tree (or k-d tree) as a data structure for storage of information to be retrieved by associative searches is developed and it is shown to be quite efficient in its storage requirements.
8.2K
A language modeling approach to information retrieval
Jay Ponte, W. Bruce Croft
- 01 Aug 1998
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.
An Algorithm for Finding Best Matches in Logarithmic Expected Time
Jerome H. Friedman, Jon Louis Bentley, Raphael A. Finkel
- 01 Jul 1976
TL;DR: In this article, an algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.
1.8K