Topic

Spamdexing

About: Spamdexing is a research topic. Over its lifetime, 1,409 publications have been published within this topic, receiving 60,562 citations. The topic is also known as search engine spam and search engine spamming.

Papers

Journal Article • 10.1016/S0169-7552(98)00110-X

The anatomy of a large-scale hypertextual Web search engine

Sergey Brin¹, Lawrence Page¹•Institutions (1)

Stanford University¹

1 Apr 1998

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

16,670 citations

Proceedings Article

The PageRank Citation Ranking: Bringing Order to the Web

Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd

11 Nov 1999

TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.

Abstract: The importance of a Web page is an inherently subjective matter, which depends on the readers' interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages. And, we show how to apply PageRank to search and to user navigation.
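
The random-surfer idea in this abstract can be illustrated with a small power-iteration sketch. This is not the authors' implementation: the function name `pagerank`, the toy graph, the damping factor of 0.85, and the uniform handling of pages without outgoing links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank by power iteration.

    links: dict mapping a page to the list of pages it links to (hypothetical input format).
    damping: probability that the surfer follows a link instead of jumping
             to a random page (assumed value, not taken from the abstract).
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread uniformly (an assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Each page splits its rank evenly among the pages it links to.
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Toy web: "c" is linked from three pages, "d" is linked from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph the page with the most incoming links ends up with the highest score, and the page nobody links to keeps only the random-jump share.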

16,420 citations

Proceedings Article • 10.1145/1341531.1341560

Opinion spam and analysis

Nitin Jindal¹, Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

11 Feb 2008

TL;DR: It is shown that opinion spam is quite different from Web spam and email spam and thus requires different detection techniques; the paper analyzes such spam activities and presents some novel techniques to detect them.

Abstract: Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research has been focused on classification and summarization of opinions using natural language processing and data mining techniques. An important issue that has been neglected so far is opinion spam or trustworthiness of online opinions. In this paper, we study this issue in the context of product reviews, which are opinion rich and are widely used by consumers and product manufacturers. In the past two years, several startup companies also appeared which aggregate opinions from product reviews. It is thus high time to study spam in reviews. To the best of our knowledge, there is still no published study on this topic, although Web spam and email spam have been investigated extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million reviewers from amazon.com, we show that opinion spam in reviews is widespread. This paper analyzes such spam activities and presents some novel techniques to detect them.

1,731 citations

Book Chapter • 10.1016/B978-012088469-8.50052-8

Combating web spam with TrustRank

Zoltan Gyongyi¹, Hector Garcia-Molina¹, Jan Pedersen²•Institutions (2)

Stanford University¹, Yahoo!²

31 Aug 2004

TL;DR: This paper proposes techniques to semi-automatically separate reputable, good pages from spam, and shows that they can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.

Abstract: Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
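
The step of using link structure to discover other likely-good pages from a reviewed seed set can be sketched as a random walk that restarts only at the seeds. The abstract does not spell out the propagation rule, so this is only one plausible reading: the function name `propagate_trust`, the damping value, the fixed iteration count, and the toy graph are assumptions, and seed selection itself is not modeled.

```python
def propagate_trust(links, good_seeds, damping=0.85, iterations=50):
    """Spread trust from manually approved seed pages along outgoing links.

    links: dict mapping a page to the list of pages it links to (hypothetical input format).
    good_seeds: pages an expert has judged reputable.
    damping: share of trust passed along links each round (assumed value).
    """
    seeds = set(good_seeds)
    if not seeds:
        raise ValueError("at least one good seed page is required")
    pages = sorted(set(links) | {q for outs in links.values() for q in outs} | seeds)
    # All initial trust sits on the seeds; every other page starts at zero.
    base = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(base)
    for _ in range(iterations):
        # The restart term goes only to seed pages, unlike the uniform jump in PageRank.
        new = {p: (1.0 - damping) * base[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                # Trust is split evenly among a page's outgoing links;
                # trust on pages without outgoing links is simply dropped here.
                new[q] += damping * trust[p] / len(outs)
        trust = new
    return trust


# Toy web: a small spam farm links to a good page, but no good page links back.
toy_web = {
    "news": ["blog", "wiki"],
    "blog": ["wiki"],
    "wiki": ["news"],
    "spam1": ["spam2", "news"],
    "spam2": ["spam1"],
}
trust = propagate_trust(toy_web, good_seeds=["news"])
for page in sorted(trust, key=trust.get, reverse=True):
    print(page, round(trust[page], 4))
```

Because trust only flows along outgoing links from the seeds, the two spam pages receive none in this toy graph even though one of them links to a reputable page.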

1,322 citations

Journal Article • 10.1016/J.COMNET.2012.10.007

Reprint of: The anatomy of a large-scale hypertextual web search engine

Sergey Brin¹, Lawrence Page¹•Institutions (1)

Stanford University¹

1 Dec 2012 • Computer Networks

TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext, and looks at the problem of how to effectively deal with uncontrolled hypertext collections.

970 citations

Performance Metrics

Papers: 1,470

Citations: 12,770

No. of papers in the topic in previous years
Year	Papers
2025	3
2024	3
2023	20
2022	23
2021	13
2020	10
