Over the past 10 months, Google search has dramatically increased the number of sites around the world from which it serves client queries, repurposing existing infrastructure to change the physical way that Google processes web searches, according to a new study from USC.
From October 2012 to late July 2013, the number of locations serving Google's search infrastructure increased from a little less than 200 to a little more than 1400, and the number of ISPs grew from just over 100 to more than 850, according to the study.
Most of this expansion reflects Google utilizing client networks (such as Time Warner Cable) that it already relied on for hosting content like videos on YouTube, and reusing them to relay--and speed up--user requests and responses for search and ads.
"Google already delivered YouTube videos from within these client networks," said USC PhD student Matt Calder, lead author of the study. "But they've abruptly expanded the way they use the networks, turning their content-hosting infrastructure into a search infrastructure as well."
Previously, if you submitted a search request to Google, your request would go directly to a Google data center.
Now, your search request will first go to the regional network, which relays it to the Google data center. While this might seem like it would make the search take longer by adding in another step, the process actually speeds up searches.
Data connections typically need to "warm up" to reach their top speed; the continuous connection between the client network and the Google data center eliminates some of that warm-up lag. In addition, content is split up into tiny packets to be sent over the Internet, and some of the delay that you may experience is due to the occasional loss of some of those packets. With the client network acting as a middleman, lost packets can be spotted and replaced much more quickly.
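The "warm up" effect can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the study: it assumes a connection whose sending window starts small and doubles every round trip, ignores packet loss entirely, and uses made-up round-trip times and window sizes. The function names and all numbers are hypothetical.

```python
def round_trips(segments: int, initial_window: int) -> int:
    """Round trips needed to send `segments` packets when the
    sending window starts at `initial_window` and doubles each trip."""
    sent = 0
    window = initial_window
    trips = 0
    while sent < segments:
        sent += window
        window *= 2
        trips += 1
    return trips


def direct_latency_ms(segments: int, rtt_ms: int, initial_window: int = 10) -> int:
    """User talks straight to the distant data center over a cold connection:
    one round trip to set up, then the warm-up rounds, all on the long path."""
    return rtt_ms * (1 + round_trips(segments, initial_window))


def relayed_latency_ms(segments: int, near_rtt_ms: int, far_rtt_ms: int,
                       initial_window: int = 10, warm_window: int = 1000) -> int:
    """User talks to a nearby relay over a cold connection; the relay reuses an
    already warmed-up connection (assumed large window) to the data center."""
    near = near_rtt_ms * (1 + round_trips(segments, initial_window))
    far = far_rtt_ms * round_trips(segments, warm_window)
    return near + far


if __name__ == "__main__":
    # Illustrative assumptions: a 70-packet response, 100 ms to the data
    # center, 20 ms to the nearby relay.
    print("direct :", direct_latency_ms(70, rtt_ms=100), "ms")
    print("relayed:", relayed_latency_ms(70, near_rtt_ms=20, far_rtt_ms=100), "ms")
```

Under these assumed numbers, the warm-up rounds are paid on the short path to the relay rather than on the long path to the data center, which is why adding a step can still shorten the total time.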
A technical report on the study will be presented at the SIGCOMM Internet Measurement Conference in Spain on October 24. Calder worked with Ramesh Govindan and Ethan Katz-Bassett of USC Viterbi, as well as John Heidemann, Xun Fan, and Zi Hu of USC Viterbi's Information Sciences Institute.
The team developed a new method of tracking down and mapping servers, identifying when they are in the same data center and estimating where that data center is. The method also identifies the relationships between servers and clients, and the team just happened to be using it when Google made its move.
"Delayed web responses lead to decreased user engagement, fewer searches, and lost revenue," said Katz-Bassett, assistant professor at USC Viterbi. "Google's rapid expansion tackles major causes of slow transfers head-on."
The strategy seems to have benefits for web users, ISPs and Google, according to the team. Users have a better web browsing experience, ISPs lower their operational costs by keeping more traffic local, and Google is able to deliver its content to web users more quickly.
Xun Fan, graduate student at USC Viterbi, noted that the team had not originally set out to document this growth.
"We had developed techniques to locate the servers, without requiring access to the users they serve, and it just so happened we exposed this rapid expansion," Fan said.
Next, the team will attempt to quantify exactly what performance gains this strategy delivers, and will try to identify under-served regions.
This research was funded by the National Science Foundation (grant number CNS-905596), the U.S. Department of Homeland Security Science and Technology Directorate, Cyber Security Division, via SPAWAR Systems Center Pacific (under Contract No. N66001-13-C-3001), and by DHS BAA 11-01-RIKA and Air Force Research Laboratory, Information Directorate (under agreement number FA8750-12-2-0344).