Search rank fraud

The web is full of pages that exist only to trick search engines into steering users toward particular web sites. Users of search engines tend to look only at the first page of search results (for 85% of queries, only the first result screen is requested). Inclusion in the first result screen, which usually shows the top 10 results, can increase traffic to a page, while exclusion means that only a small fraction of users will ever see a link to it. Product developers and commercially oriented sites, whose income depends on receiving traffic, therefore have a strong incentive to rank on the first page for queries relevant to their content. Deliberate manipulation of placement through text-based and link-based techniques, cloaking, doorway pages, or combinations thereof has become common practice (and a commercially available service).

  • Spam can inflate the corpus, which in turn increases the cost per query.
  • Spamming has become so common that search engines have developed heuristic and machine-learning methods to identify and remove spam. They do not publish their anti-spam techniques, to avoid helping spammers circumvent them.
  • Trends indicate that the use and variety of spam will continue to increase.
  • TrustRank is a link analysis technique used for semi-automatic separation of useful webpages from spam.
  • There are challenging research issues both in detecting spam and in developing spam-resistant ranking algorithms.
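
To make the TrustRank idea above more concrete, the following is a minimal sketch rather than a production implementation. It assumes the commonly described formulation in which trust starts at a small, manually vetted set of seed pages and is propagated along outgoing links by a biased PageRank iteration. The function name `trust_rank`, the damping value 0.85, the iteration count, and the toy link graph are all illustrative assumptions, not taken from any particular search engine.

```python
def trust_rank(out_links, trusted_seeds, damping=0.85, iterations=50):
    """Propagate trust from hand-picked seed pages along outgoing links.

    out_links: dict mapping a page to the list of pages it links to.
    trusted_seeds: set of pages judged trustworthy by a human reviewer.
    """
    # Collect every page that appears as a source or a link target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    pages = sorted(pages)

    seeds = [p for p in pages if p in trusted_seeds]
    if not seeds:
        raise ValueError("need at least one trusted seed page")

    # Static trust distribution: uniform over the seeds, zero elsewhere.
    seed_score = {p: (1.0 / len(seeds) if p in trusted_seeds else 0.0)
                  for p in pages}
    trust = dict(seed_score)

    for _ in range(iterations):
        # Each round re-injects a share of trust at the seed pages only.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page in pages:
            targets = out_links.get(page, [])
            if not targets:
                continue  # simplification: dangling pages pass on no trust
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy graph: two spam pages link to each other and to a
# reputable page, but no reputable page links back to them.
graph = {
    "news": ["blog", "wiki"],
    "wiki": ["news"],
    "blog": ["wiki"],
    "spam1": ["spam2", "news"],
    "spam2": ["spam1"],
}
scores = trust_rank(graph, trusted_seeds={"news"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:6s} {score:.3f}")
```

In this toy graph the spam pages receive no trust at all, because trust only flows forward along links and no trusted page points to them. The "semi-automatic" part is the seed set: a person has to vet the seeds, after which the propagation itself is mechanical. Real graphs are less clean, since reputable pages sometimes do link to spam, so a score threshold would still need tuning.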

