"This discovery is significant because there is no central authority or process governing the formation and structure of links on the web'' said Dr. Gary Flake of NEC Research Institute, the study's lead author. Individual links on the web are created by millions of different individuals, operating independently, and having different backgrounds, knowledge, goals, and cultures. While previous studies have covered properties of the web graph such as the diameter (Nature, 401, p. 130) and the link distribution (Science, 286, p. 509), this discovery is the first to bind the link structure and text content of the web. An article detailing the discovery by Dr. Flake and co-authors Dr. Steve Lawrence, Dr. C. Lee Giles, and Dr. Frans Coetzee, will appear in IEEE Computer, Volume 35, Number 3, which will be available on March 6, 2002. IEEE Computer is the flagship journal of the IEEE Computer Society, the world's oldest and largest professional society in computing.
The NEC researchers define a web community as a collection of web pages that have more links within the community than outside of the community. This definition can be generalized to identify communities with varying levels of cohesiveness. These communities are self-organized in that the entire web graph determines membership.
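To make the definition concrete, here is a minimal Python sketch of a membership test. It assumes one reading of the definition, namely that every page in the candidate set must have more links to fellow members than to outside pages, and it treats links as undirected. The function name `is_community` and the page labels are illustrative assumptions, not taken from the NEC study.

```python
from collections import defaultdict


def is_community(links, members):
    """Return True if every page in `members` has more links to other
    members than to pages outside the set (assumed per-page reading)."""
    members = set(members)
    # Build an undirected neighbor map; self-links are ignored.
    neighbors = defaultdict(set)
    for a, b in links:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    for page in members:
        inside = len(neighbors[page] & members)
        outside = len(neighbors[page] - members)
        if inside <= outside:
            return False
    return True


# Hypothetical toy graph: two triangles joined by a single link c-d.
toy_links = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
print(is_community(toy_links, {"a", "b", "c"}))  # True
print(is_community(toy_links, {"a", "b"}))       # False: a and b each tie 1 to 1
```

The strict inequality means a page with as many outside links as inside links does not qualify. Relaxing or tightening that threshold is one way the varying levels of cohesiveness mentioned above could be modeled.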
The researchers show how the problem of identifying these communities can be efficiently solved by recasting it into a maximum flow framework, and present examples for the identification of communities centered around well-known scientists (Francis Crick, Stephen Hawking, and Ronald Rivest). Analysis of the content of the communities shows that the member pages are highly relevant to the initial seed pages and topically related in nontrivial ways. For example, in the Crick community the scientists found references to Rosalind Franklin and other early pioneers in genetics.
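The maximum flow idea can be illustrated with a small Python sketch. This is a simplified illustration, not the researchers' implementation: it assumes a single seed page as the flow source, a single page known to lie outside the community as the sink, undirected links of capacity 1, and a plain breadth-first augmenting-path (Edmonds-Karp) search. The function name `community_by_min_cut` and the toy graph are hypothetical. Once no more flow can be pushed, the pages still reachable from the seed form the seed's side of a minimum cut, which serves as the candidate community.

```python
from collections import defaultdict, deque


def community_by_min_cut(links, seed, outsider):
    """Return the pages on the seed's side of a minimum cut that
    separates `seed` from `outsider` (simplified, assumed setup)."""
    if seed == outsider:
        raise ValueError("seed and outsider must be different pages")
    # Residual capacities: each undirected link gets capacity 1 each way.
    cap = defaultdict(lambda: defaultdict(int))
    for a, b in links:
        if a != b:
            cap[a][b] += 1
            cap[b][a] += 1

    def reachable_from_seed():
        # Breadth-first search along edges with spare residual capacity.
        parents = {seed: None}
        queue = deque([seed])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    while True:
        parents = reachable_from_seed()
        if outsider not in parents:
            # No augmenting path remains: the flow is maximal, and the
            # reachable pages are the seed's side of a minimum cut.
            return set(parents)
        # Find the bottleneck capacity along the path just discovered.
        bottleneck = float("inf")
        v = outsider
        while parents[v] is not None:
            u = parents[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # Push flow along the path and update residual capacities.
        v = outsider
        while parents[v] is not None:
            u = parents[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u


# Hypothetical toy graph: two triangles joined by a single link c-d.
toy_links = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
print(sorted(community_by_min_cut(toy_links, "a", "f")))  # ['a', 'b', 'c']
```

In the toy graph the only link between the two triangles is c-d, so the minimum cut severs that one link and the seed's triangle is returned as its community.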
Practical applications of the discovery include the creation of improved search engines, the automatic creation of web directories, and content filtering. However, the discovery also opens up the possibility of objective and rigorous analysis of the entire web.
As an increasing percentage of human knowledge and communication goes online, the potential for the analysis of interests and relationships within science and society is great. NEC's discovery allows analysis of communities on the web independent of, and unbiased by, the specific words used on the individual web pages. For example, the relationships within and between countries, scientific disciplines, or other topics of interest can be analyzed while taking into account issues such as the "digital divide". Such analysis could provide insight into the organization and interests of sectors of society, and may help shape policy, improve the allocation of resources, and generally improve our understanding of the world.
More information and figures can be found at
About NEC Research Institute
NEC Research Institute, founded in 1988 and based in Princeton, conducts basic research in the areas of computer and physical sciences. Its major research elements include Web computing; robust computing; intelligence; vision and language; devices; materials; optics; nano physics; biophysics; and theoretical computer sciences and physics. For more information about the Institute, please visit its Web site at www.neci.nec.com.
About NEC Corporation
NEC Corporation (NASDAQ: NIPNY; FTSE: 6701q.l) is a leading provider of Internet solutions, dedicated to meeting the specialized needs of its customers in the key computer, network and electron device fields through its three market-focused in-house companies: NEC Solutions, NEC Networks and NEC Electron Devices. NEC Corporation, with its in-house companies, employs more than 150,000 people worldwide and saw net sales of 4,991 billion Yen (approx. US$48 billion) in fiscal year 1999-2000. For further information, please visit the NEC home page at: www.nec-global.com.