Supervised Text-based Geolocation Using Language Models on an Adaptive Grid

Open Access • Proceedings Article

- 12 Jul 2012

- pp 1500-1510

251

TL;DR: The adaptive grid achieves results competitive with a uniform grid on small training sets and outperforms it on the large Twitter corpus; the two grid constructions can also be combined to produce consistently strong results across all training sets.

Abstract: The geographical properties of words have recently begun to be exploited for geolocating documents based solely on their text, often in the context of social media and online content. One common approach for geolocating texts is rooted in information retrieval. Given training documents labeled with latitude/longitude coordinates, a grid is overlaid on the Earth and pseudo-documents constructed by concatenating the documents within a given grid cell; then a location for a test document is chosen based on the most similar pseudo-document. Uniform grids are normally used, but they are sensitive to the dispersion of documents over the earth. We define an alternative grid construction using k-d trees that more robustly adapts to data, especially with larger training sets. We also provide a better way of choosing the locations for pseudo-documents. We evaluate these strategies on existing Wikipedia and Twitter corpora, as well as a new, larger Twitter corpus. The adaptive grid achieves competitive results with a uniform grid on small training sets and outperforms it on the large Twitter corpus. The two grid constructions can also be combined to produce consistently strong results across all training sets.
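
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`kd_leaves`, `build_cells`, `locate`), the `bucket_size` parameter, the add-alpha smoothing, and the toy data are all assumptions made for the example. Representing each cell by the centroid of its training documents is likewise only one plausible reading of "a better way of choosing the locations for pseudo-documents".

```python
import math
from collections import Counter

def kd_leaves(docs, bucket_size, depth=0):
    """Split (lat, lon, text) docs at the median of alternating axes
    until every leaf holds at most bucket_size documents."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be at least 1")
    if len(docs) <= bucket_size:
        return [docs]
    axis = depth % 2  # 0 = latitude, 1 = longitude
    ordered = sorted(docs, key=lambda d: d[axis])
    mid = len(ordered) // 2
    return (kd_leaves(ordered[:mid], bucket_size, depth + 1)
            + kd_leaves(ordered[mid:], bucket_size, depth + 1))

def build_cells(docs, bucket_size):
    """Turn each k-d tree leaf into a pseudo-document with a location."""
    cells = []
    for leaf in kd_leaves(docs, bucket_size):
        if not leaf:
            continue
        # Pseudo-document: word counts over all documents in the cell.
        counts = Counter(w for _, _, text in leaf for w in text.lower().split())
        # Assumption: the cell's location is the plain mean of its documents'
        # coordinates (ignores wrap-around at the antimeridian).
        lat = sum(d[0] for d in leaf) / len(leaf)
        lon = sum(d[1] for d in leaf) / len(leaf)
        cells.append({"location": (lat, lon), "counts": counts,
                      "total": sum(counts.values())})
    return cells

def locate(text, cells, vocab_size, alpha=1.0):
    """Return the location of the cell whose smoothed unigram model
    gives the test document the highest log-likelihood."""
    words = text.lower().split()

    def log_prob(cell):
        denom = cell["total"] + alpha * vocab_size
        return sum(math.log((cell["counts"][w] + alpha) / denom) for w in words)

    return max(cells, key=log_prob)["location"]

# Toy training set (hypothetical coordinates and text).
train = [
    (30.27, -97.74, "austin tacos bbq longhorns"),
    (30.30, -97.70, "austin live music tacos"),
    (40.71, -74.00, "nyc subway bagels"),
    (40.73, -73.99, "nyc bagels broadway"),
]
cells = build_cells(train, bucket_size=2)
vocab = {w for _, _, t in train for w in t.split()}
guess = locate("tacos in austin", cells, len(vocab))
```

Because the k-d tree splits at medians, dense regions end up with many small cells and sparse regions with a few large ones, which is the sense in which the grid adapts to the data. A uniform grid would instead assign each document to a fixed-size latitude/longitude bin regardless of how many documents fall in it.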

Citations

Posted Content

Simplifying Graph Convolutional Networks

Felix Wu, +5 more

- 19 Feb 2019

- arXiv: Learning

TL;DR: In this paper, the authors reduce the complexity of GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers, which corresponds to a fixed low-pass filter followed by a linear classifier.

1.2K

Journal Article • 10.1613/JAIR.4200

Text-based twitter user geolocation prediction

Bo Han, +2 more

- 01 Jan 2014

- Journal of Artificial Intelligence Resea...

TL;DR: This paper presents an integrated geolocation prediction framework, evaluates the impact of nongeotagged tweets, language, and user-declared metadata on geolocation prediction, and discusses how users differ in terms of their geolocatability.

389

Proceedings Article

How Noisy Social Media Text, How Diffrnt Social Media Sources?

Timothy Baldwin, +4 more

- 01 Oct 2013

TL;DR: This work investigates just how linguistically noisy (or otherwise) social media text is across a range of sources, namely YouTube comments, Twitter posts, web user forum posts, blog posts and Wikipedia, each compared to a reference corpus of edited English text.

271

Journal Article • 10.33011/LILT.V9I.1321

Frege in Space: A Program of Compositional Distributional Semantics

Marco Baroni, +2 more

- 01 Jan 2014

- Linguistic Issues in Language Technology

TL;DR: This work adopts two ideas: from statistical semantics, that word meaning can be approximated by the patterns of co-occurrence of words in corpora; and from formal semantics, that compositionality can be captured in terms of a syntax-driven calculus of function application.

259

Journal Article • 10.1109/TKDE.2018.2807840

A Survey of Location Prediction on Twitter

Xin Zheng, +2 more

- 01 Sep 2018

- IEEE Transactions on Knowledge and Data ...

TL;DR: A survey of location prediction on Twitter can be found in this article, where the authors focus on the prediction of user home locations, tweet locations, and mentioned locations, by summarizing Twitter network, tweet content, and tweet context.

234

References

Journal Article • 10.1109/34.1000236

Mean shift: a robust approach toward feature space analysis

Dorin Comaniciu, +1 more

- 01 May 2002

- IEEE Transactions on Pattern Analysis an...

TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, which establishes its utility in detecting the modes of the density.

12.9K

Journal Article • 10.1145/361002.361007

Multidimensional binary search trees used for associative searching

Jon Louis Bentley

- 01 Sep 1975

- Communications of The ACM

TL;DR: The multidimensional binary search tree (or k-d tree) is developed as a data structure for storing information to be retrieved by associative searches, and it is shown to be quite efficient in its storage requirements.

8.2K

Journal Article • 10.1145/355744.355745

An Algorithm for Finding Best Matches in Logarithmic Expected Time

Jerome H. Friedman, +2 more

- 01 Sep 1977

- ACM Transactions on Mathematical Softwar...

TL;DR: An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.

3.1K

Journal Article • 10.1145/3130348.3130368

A language modeling approach to information retrieval

Jay Ponte, +1 more

- 01 Aug 1998

TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.

2.8K

An Algorithm for Finding Best Matches in Logarithmic Expected Time

Jerome H. Friedman, +2 more

- 01 Jul 1976

TL;DR: In this article, an algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record.

1.8K

Related Papers (5)

A Latent Variable Model for Geographic Lexical Variation

Simple supervised document geolocation with geodesic grids

You are where you tweet: a content-based approach to geo-locating twitter users

Text-based twitter user geolocation prediction

Geolocation Prediction in Social Media Data by Finding Location Indicative Words