Before such devices were common, just 10 to 15 years ago, a chemist would have analysed hundreds of compounds annually. Now, he or she may be faced with data on tens of millions of compounds a year. "At GSK, we're trying to look for opportunities to deliver drugs to patients more quickly. We've invested in automating the early phase of the drug discovery process, but this has presented us with lots of informatics challenges," says Stephen Calvert, GSK Vice-President for Cheminformatics.
The new automated processes can be changed quickly in response to changes in scientific understanding, but the supporting software, built using traditional approaches, cannot be changed on the same timescale. "The IT has become the bottleneck in evolving the science," says Stephen. So GSK decided to look for IT technologies that chemists, rather than computer scientists, could use to retrieve and analyse data rapidly. "We wanted to match the cycle of change and hand control of the decision-making process back to the scientists," he says.
InforSense KDE, an output of Discovery Net, met these two challenges. Discovery Net is a pilot project funded by the Engineering and Physical Sciences Research Council's e-Science Core Programme. "KDE is different from traditional technologies because you can build it as you go. The scientists now have an environment that they can modify themselves. We can build a new utility and plug it in to KDE without having to test the whole application. This means we can turn something around in a few weeks instead of three or more months. It is starting to make a significant difference to our ability to respond to the needs of scientists," he says.
GSK has built a proof of concept for KDE in the area of 'library design', that is, the process of selecting just a few hundred molecules from the universe of 10^60 possible molecules. "The process is a bit like doodling," says Stephen Calvert. "You want to doodle in the universe of molecules to find the best ones to make." Now the company is exploring opportunities for using KDE in other areas of its business, including screening molecules for particular activity and genetic research.
KDE has been developed for the commercial market by InforSense, a spin-out company from the Department of Computer Science at Imperial College, London. Designed to require little IT knowledge on the researcher's part, it "uses IT to liberate scientists from IT," according to Professor Yike Guo, Discovery Net principal investigator at Imperial College.
Via a variety of possible interfaces, including a web portal, KDE allows the researcher to build up or modify complex analytic workflows that, for example, compare new data with data stored in heterogeneous, distributed databases, access software for different types of data analysis, and perform visualisations. Workflows can be stored and audited for re-use by the originator or others via web services, portlets or other visual desktop applications.
By using grid technology, the researcher has access to data, software and other services held remotely and can also share his or her resources with others elsewhere. "Before Discovery Net, you would have to move the output from one analysis to the next by saving it in a file and moving it to another machine. Now, you can run analyses with complex analytic workflows that incorporate services running elsewhere. Using grid technologies, data mining workflows can be distributed all over the world," says Professor Guo.
"Before, chemists would have had to learn five or six applications to achieve the same end point. Now, they've got a single environment in which to do it. This offers a huge improvement," says Stephen Calvert.
Stephen Calvert, Vice-President for Cheminformatics, GSK, tel. 01276622715, e-mail: Stephen.H.Calvert@gsk.com
Professor Yike Guo, Imperial College, London, tel. 07900241068, e-mail: firstname.lastname@example.org
Judy Redfearn, e-Science/Research communications officer, JISC/e-Science Core Programme, tel. 07768 356309, e-mail: email@example.com
EPSRC e-Science programme http://www.
Discovery Net http://www.
UK e-Science Programme www.rcuk.ac.uk/escience
Notes for editors
1. e-Science is the very large scale science that can be carried out by pooling access to very large digital data collections, very large scale computing resources and high performance visualisation held at different sites.
2. A computing grid refers to geographically dispersed computing resources that are linked together by software known as middleware so that the resources can be shared. The vision is to provide computing resources to the consumer in a similar way to the electric power grid. The consumer can access electric or computing power without knowing which power station or computer it is coming from.
3. The UK e-Science Programme is a coordinated £230M initiative involving all the Research Councils and the Department of Trade and Industry. It has also leveraged industrial investment of £30M. The Engineering and Physical Sciences Research Council manages the e-Science Core Programme, which is developing generic technologies, on behalf of all the Research Councils.
4. The UK e-Science Programme as a whole is fostering the development of IT and grid technologies to enable new ways of doing faster, better or different research, with the aim of establishing a sustainable, national e-infrastructure for research and innovation. Further information at www.rcuk.ac.uk/escience.