This paper first formalizes the problem of spam pages in link analysis; it then introduces two perspectives for selecting a seed page set, proposes a spam-page detection algorithm that improves on the existing PageRank algorithm, and gives several performance metrics that characterize the efficiency of the detection algorithm; finally, it briefly describes a trust-score-based approach to combating web spam pages.

With the rapid development of Internet technology, the amount of information on the Internet has grown exponentially, yet the useful information people need is only a very small fraction of it, which makes the research and design of high-performance search engines essential. In the past, search engines generally adopted a flat search method, that is, simple keyword search. Currently, many engines adopt a hierarchical search method: for example, the authoritative search engine Google uses PageRank to score web pages and then orders results by that score, so the higher a page's rank, the more likely it is to be retrieved and the higher it appears in the search results. However, some web pages (spam pages, which can be defined as pages that abuse hyperlinks to mislead search engines) use a variety of methods to inflate their own rank and thereby increase their chance of being retrieved. The most typical methods are as follows:
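As background for the link-based ranking discussed above, the PageRank score a page receives from its in-links can be sketched with a minimal power-iteration implementation. This is an illustrative sketch, not the paper's algorithm; the graph, the `damping` factor, and the dangling-page handling are standard assumptions of the basic PageRank model.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank (scores sum to 1).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # every page gets the "random jump" share unconditionally
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # dangling page: distribute its rank evenly over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                # a page passes its rank to its out-links in equal shares
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Page C is linked by both A and B, so it ends up ranked highest;
# this is exactly the property spam pages exploit by manufacturing in-links.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
```

A spammer who creates many pages that all link to a target page increases the target's in-link rank mass, which is why the detection methods in this paper start from a trusted seed set rather than from raw link counts.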