Contrary to the global "rich get richer" behavior of the World Wide Web as a whole, in which a relatively small number of popular sites receive a disproportionately large share of inbound links and traffic, a new study has found that smaller communities of websites – for example, all university homepages or all newspaper homepages or all public company homepages – accumulate incoming links in a more evenly balanced way.
Dr. C. Lee Giles, the David Reese Professor in the Penn State School of Information Sciences and Technology and professor of computer science and engineering, says, "Previous studies imply a very bleak state of competition on the Web, where the 'winners,' the Yahoos and the Amazons, for example, are dominant while new entrants simply cannot compete. This highly skewed distribution also leaves the network susceptible to malicious attacks. Our results reveal that, in fact, many real networks are considerably less biased and may be more tolerant to attacks, and our growth model explains precisely why."
The study, a joint effort of researchers at NEC Research Institute and Penn State, is detailed in the current (April 16) issue of the Proceedings of the National Academy of Sciences in a paper, "Winners Don't Take All: Characterizing the Competition for Links on the Web." The authors are Dr. David M. Pennock, Dr. Gary W. Flake, Dr. Steve Lawrence, and Dr. Eric Glover, all of NEC, and Giles, who holds a joint appointment at NEC and Penn State.
The currently accepted description of the WWW says the distribution of links to and from a web page obeys a power law, a mathematical pattern also obeyed by movie actor collaborations, research paper citations, and the power grid in the western United States, to name a few. Essentially, networks that obey a power law tend to have a few individuals, or nodes, to which most other members of the group try to link. For example, Mel Gibson can be considered such a node in the movie actor collaboration network, since other actors want to link with him by appearing in movies with him.
The entire WWW was thought to operate in the same way, with new websites choosing to link primarily to the dominant sites that are already broadly linked. This type of power law behavior would make the WWW particularly vulnerable to attack, since an attack on the relatively few dominant nodes could bring down the rest of the web.
However, the new NEC/Penn State study has shown that some collections of web pages of the same type -- for example, all American university homepages or all U.S. newspaper homepages or all public company homepages -- do not follow the power law pattern that characterizes the WWW as a whole. The researchers suggest a new, simple generative model that incorporates a mixture of preferential and uniform attachment and quantifies both the degree to which the rich nodes grow richer and how well new and poorly connected nodes can compete.
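As a rough illustration of what a mixture of preferential and uniform attachment can look like, the following Python sketch grows a network one node at a time. It is not the researchers' code. The function names, the mixing parameter alpha, the choice of one undirected link per new node, and the top_share measure are all assumptions made for this example. With probability alpha a new node links to an existing node in proportion to how many links that node already has; otherwise it picks an existing node uniformly at random.

```python
import random


def grow_network(n_nodes, alpha, seed=0):
    """Grow a network and return the number of links held by each node.

    Hypothetical sketch: alpha = 1.0 is pure preferential attachment,
    alpha = 0.0 is pure uniform attachment. Each new node adds one link.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # two seed nodes joined by a single link
    endpoints = [0, 1]   # each node appears once per link it holds
    for new_node in range(2, n_nodes):
        if rng.random() < alpha:
            # Preferential: a uniform draw from the endpoint list picks
            # a node with probability proportional to its link count.
            target = rng.choice(endpoints)
        else:
            # Uniform: every existing node is equally likely.
            target = rng.randrange(new_node)
        degree[target] += 1
        degree.append(1)  # the new node starts with its one link
        endpoints.extend((target, new_node))
    return degree


def top_share(degree, fraction=0.01):
    """Fraction of all link endpoints held by the best-linked nodes."""
    k = max(1, int(len(degree) * fraction))
    return sum(sorted(degree, reverse=True)[:k]) / sum(degree)


if __name__ == "__main__":
    for alpha in (1.0, 0.5, 0.0):
        share = top_share(grow_network(5000, alpha))
        print(f"alpha={alpha}: top 1% of nodes hold {share:.1%} of links")
```

Running the script with different values of alpha shows how the share of links held by the best-connected nodes changes as uniform attachment is mixed in, which is the kind of effect the model is meant to quantify.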
Giles says, "The Web actually works better than people thought it did. Company websites, for example, are more likely to connect to sites that are relevant rather than simply to sites that are well linked. This implies that the web's growth pattern is driven by rational process rather than simply by a desire to connect to the dominant sites."
The researchers note that relative to their own web communities, winners don't quite take all. Unpopular sites and mediocre sites attract a considerably higher proportion of links than would be the case under a pure power law distribution. Many web pages can fare well when compared against the competing pages within the same category.
The study was funded by NEC Research Institute.