Referer spam

Papers

Book Chapter • 10.1016/B978-012088469-8.50052-8

Combating web spam with TrustRank

Zoltan Gyongyi¹, Hector Garcia-Molina¹, Jan Pedersen²

31 Aug 2004

TL;DR: This paper proposes techniques to semi-automatically separate reputable, good pages from spam, and shows that they can effectively filter out spam from a significant fraction of the web, based on a good seed set of fewer than 200 sites.

Abstract: Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of fewer than 200 sites.

1,322 citations
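The abstract above describes two steps: an expert labels a small seed set, and the link structure then carries "goodness" outward from those seeds. A minimal sketch of the second step, assuming a biased-PageRank formulation in which the random jump lands only on good seeds, a damping factor of 0.85, and 20 iterations; the dictionary-of-out-links graph format and the function name `trustrank` are illustrative, and seed selection is taken as given.

```python
from typing import Dict, List, Set


def trustrank(
    out_links: Dict[str, List[str]],
    good_seeds: Set[str],
    alpha: float = 0.85,
    iterations: int = 20,
) -> Dict[str, float]:
    """Propagate trust from expert-labeled good seeds along out-links."""
    if not good_seeds:
        raise ValueError("need at least one good seed page")
    # Collect every page that appears as a link source or a link target.
    pages: Set[str] = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    # Static score vector d: uniform over good seeds, zero everywhere else.
    d = {p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages}
    trust = dict(d)
    for _ in range(iterations):
        # Each page keeps the (1 - alpha) share of its seed score ...
        nxt = {p: (1.0 - alpha) * d[p] for p in pages}
        # ... and receives alpha times the trust of each page linking to it,
        # split evenly among that page's out-links.
        for src, targets in out_links.items():
            if targets:
                share = alpha * trust[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
        trust = nxt
    return trust


# Hypothetical toy web: "s" and "x" only link to each other.
web = {"a": ["b"], "b": ["c"], "c": [], "s": ["x"], "x": ["s"]}
scores = trustrank(web, good_seeds={"a"})
```

In this toy graph, trust decays with distance from the seed ("a" scores higher than "b", which scores higher than "c"), while "s" and "x", which no good page links to, end with a score of zero.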

Proceedings Article • 10.1145/1244408.1244412

Improving web spam classifiers using link structure

Qingqing Gan¹, Torsten Suel¹

New York University¹

8 May 2007

TL;DR: A two-stage approach to improving the performance of common classifiers is described: a classifier is first applied to catch a large portion of the spam in the data, and several heuristics then decide whether a node should be relabeled based on the preclassified result and knowledge about the node's neighborhood.

Abstract: Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.

61 citations
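The abstract does not spell out the relabeling heuristics, so the sketch below only illustrates the general two-stage shape it describes. It uses a hypothetical neighbor-vote rule: stage-one spam probabilities are taken as given, confident predictions are kept, and uncertain ones are flipped toward the majority label of the node's linked neighbors. The thresholds `low`, `high`, and `vote` and the function name `relabel` are assumptions, not values from the paper.

```python
from typing import Dict, List


def relabel(
    spam_prob: Dict[str, float],
    neighbors: Dict[str, List[str]],
    low: float = 0.2,
    high: float = 0.8,
    vote: float = 0.5,
) -> Dict[str, bool]:
    """Stage 2 (hypothetical rule): revisit uncertain stage-1 labels."""
    # Stage-1 labels from the base classifier's spam probability.
    first = {node: p >= 0.5 for node, p in spam_prob.items()}
    final = dict(first)
    for node, p in spam_prob.items():
        if p < low or p > high:
            continue  # confident prediction: keep the stage-1 label
        labeled = [m for m in neighbors.get(node, []) if m in first]
        if not labeled:
            continue  # no neighborhood evidence available
        # Fraction of the node's neighbors that stage 1 labeled as spam.
        spam_frac = sum(first[m] for m in labeled) / len(labeled)
        final[node] = spam_frac > vote
    return final


# Hypothetical stage-1 output and link neighborhoods.
probs = {"a": 0.9, "b": 0.95, "c": 0.4, "d": 0.1}
links = {"c": ["a", "b"], "d": ["a", "b"]}
labels = relabel(probs, links)
```

Here "c" is uncertain at 0.4 and both of its neighbors are spam, so it is relabeled as spam; "d" has the same neighbors but a confident 0.1 score, so its stage-1 label stands.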

Patent

Dynamic web page referrer tracking and ranking

Robert John Sullivan, Gordon Charles Hotchkiss, Douglas Kent Wilson

24 Sep 2003

TL;DR: The invention dynamically produces alternate referrer pages that are substantially similar to the pages a visitor previously viewed in a web browser before following a link on one of those pages to a target web page.

Abstract: The invention dynamically produces alternate referrer pages substantially similar to pages previously viewed, through a web browser, by a visitor who linked to a target web page via a link on the previously viewed pages. When the browser links to the target page, a referrer URL is obtained for the referrer page from which the browser loaded the target page. The referrer URL is stored in a queue. The queue is inspected regularly. If the queue contains an unexamined entry, a request for that entry's referrer URL is executed to obtain the alternate referrer pages. The IP address of the computer running the browser is used to derive a country code corresponding to the IP address. The referrer URL request can be issued through a computer in a geographic region corresponding to the country code so that geographic biasing of the previously viewed pages will be reflected in the alternate pages.

48 citations
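The queue mechanism in the patent abstract above can be sketched as follows, assuming each queue entry pairs the referrer URL with the country code derived from the visitor's IP address. The names `record_visit`, `inspect_queue`, and the injected `fetch` callable are hypothetical; how a request is actually routed through a machine in the visitor's region is left to the caller.

```python
from collections import deque
from typing import Callable, Deque, Dict, Set, Tuple

# (referrer URL, country code derived from the visitor's IP address)
Entry = Tuple[str, str]


def record_visit(queue: Deque[Entry], referrer_url: str, country_code: str) -> None:
    """Called when a browser loads the target page: enqueue its referrer URL."""
    queue.append((referrer_url, country_code))


def inspect_queue(
    queue: Deque[Entry],
    fetch: Callable[[str, str], str],
    examined: Set[str],
) -> Dict[str, str]:
    """One periodic pass: request each unexamined referrer URL exactly once."""
    results: Dict[str, str] = {}
    while queue:
        url, country = queue.popleft()
        if url in examined:
            continue  # already requested on this or an earlier pass
        examined.add(url)
        # Hypothetical fetcher: expected to issue the request from a machine
        # in the visitor's region so the returned page reflects geographic biasing.
        results[url] = fetch(url, country)
    return results


# Usage with a stand-in fetcher that records which URLs were requested.
q: Deque[Entry] = deque()
record_visit(q, "http://a.example/", "DE")
record_visit(q, "http://a.example/", "DE")
record_visit(q, "http://b.example/", "FR")
calls = []


def fake_fetch(url: str, country: str) -> str:
    calls.append(url)
    return f"{country}:{url}"


seen: Set[str] = set()
pages = inspect_queue(q, fake_fetch, seen)
```

Deduplicating on the URL alone means a referrer seen from two countries is fetched only for the first one; keying `examined` on the full `(url, country)` pair would instead fetch one regional variant per country.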

Detecting hit shaving in click-through payment schemes

Michael K. Reiter¹, Vinod Anupam², Alain Jules Mayer²

AT&T Labs¹, Alcatel-Lucent²

31 Aug 1998

TL;DR: This paper explores simple and immediately useful approaches to enable referrers to monitor the number of click-throughs for which they should be paid.

Abstract: A web user "clicks through" one web site, the referrer, to another web site, the target, if the user follows a hypertext link to the target's site contained in a web page served from the referrer's site. Numerous click-through payment programs have been established on the web, by which (the webmaster of) a target site pays a referrer site for each click through that referrer to the target. However, typically the referrer has no ability to verify that it is paid for every click-through to the target for which it is responsible. Thus, targets can undetectably omit to pay referrers for some number of click-throughs, a practice called hit shaving. In this paper, we explore simple and immediately useful approaches to enable referrers to monitor the number of click-throughs for which they should be paid.

37 citations

Proceedings Article • 10.1145/1810617.1810683

On the robustness of Google Scholar against spam

Jöran Beel¹, Bela Gipp¹

University of California, Berkeley¹

13 Jun 2010

TL;DR: The results show that it is possible to spam Google Scholar: the authors 'improved' the ranking of articles by manipulating their citation counts, and made articles appear in searches for keywords they did not originally contain by placing invisible text in modified versions of the articles.

Abstract: In this research-in-progress paper we present the current results of several experiments in which we analyzed whether spamming Google Scholar is possible. Our results show that it is possible: we 'improved' the ranking of articles by manipulating their citation counts, and we made articles appear in searches for keywords the articles did not originally contain by placing invisible text in modified versions of the articles.

30 citations

Year	Papers
2016	1
2015	3
2014	2
2013	3
2012	6
2011	2