Supercomputers keep growing ever faster, racing along at the blazing speed of nearly one petaflops (10 to the fifteenth, or one thousand trillion, calculations per second), equivalent to around 250 thousand of today's laptops. In contrast, the experience of a computational scientist can be anything but fast: waiting hours or days in a queue for a job to run and yield the results needed for further steps. The unpredictability of queues can impede the course of research, slowing progress with unexpected periods of waiting.
To address this problem, the San Diego Supercomputer Center (SDSC) at UC San Diego has released version 1.0 of a new User Portal, featuring an innovative user-settable reservation system that gives researchers more control over when their jobs will run on the center's supercomputers. The service, not previously offered in high-performance computing centers, is debuting on SDSC's DataStar and TeraGrid Cluster systems.
"We've had a lot of feedback in user surveys asking for faster turnaround time," said Anke Kamrath, director of User Services at SDSC. "While we couldn't eliminate the queue, especially on popular machines like DataStar, we realized that a service that lets users themselves schedule 'windows' of reserved time would let them complete jobs more reliably and get more done."
The reservation system can make computing more efficient in various situations. For example, a user with a large allocation may start a full-machine job that will run for a day, only to find that a minor problem causes it to fail quickly. Instead of being able to simply fix the problem and restart, the user is faced with going to the end of the queue and again waiting hours or days for the job to run. With SDSC's new User Portal, this user can now easily set a reservation for a full-machine job, ensuring that they can complete the job in a timely way, even if minor problems occur.
Another research group may be debugging a new code. To do this they need to run many short jobs in succession, working as a team to troubleshoot the results of each run, and then trying again. But each time they want to restart the code, they have to sit in the queue, potentially wasting many hours as the group awaits the results of each run. Using the reservations feature in the portal, the researchers can now schedule several hours of machine time for multiple debugging runs, making efficient use of the team's time.
Other researchers may need to be sure they run in conjunction with a scheduled event such as observing time on an electron microscope or other instrument. Efforts are also underway to use this capability to support the co-scheduling of jobs to run across TeraGrid-wide systems.
"SDSC's User Portal offers a clean interface that shields users from the complexity of the underlying service," said Diana Diehl, who leads SDSC's Documentation and Portals group. "Just like an airline reservation system makes intricate arrangements in a few minutes for travelers at their computers, SDSC's reservation system carries out complicated tasks to arrange the supercomputer reservation, making sure that it follows policies, doesn't disrupt jobs currently in the queue, interfaces with the user's account, and allows time for preventive maintenance."
While users have always been able to reserve time manually, the process can be slow and cumbersome. SDSC's new user-settable system democratizes access to reliable computing, letting any user log in with either their TeraGrid or SDSC account and easily reserve time themselves. Rather than carving up the machine among various pre-selected users, this approach allows users to reserve up to full-machine runs, encouraging use of the power of the full supercomputer to advance science into new realms.
The new user-settable system has been carefully designed to provide reservations that are in balance with existing jobs in the queue, and reservations carry a premium cost over jobs run without a reservation.
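As a rough illustration of the kind of checks such a system has to balance (policy limits, work already scheduled, and preventive maintenance), the following Python sketch tests a requested window against each in turn. It is not SDSC's implementation: the `Window` class, the `check_reservation` function, the 24-hour limit, and the 128-node machine size are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Window:
    """A block of machine time; nodes stays 0 for maintenance windows."""
    start: datetime
    end: datetime
    nodes: int = 0

    def overlaps(self, other: "Window") -> bool:
        # Half-open intervals: back-to-back windows do not overlap.
        return self.start < other.end and other.start < self.end


def check_reservation(request, total_nodes, committed, maintenance,
                      max_length=timedelta(hours=24)):
    """Return (accepted, reason) for a requested reservation window."""
    if request.end <= request.start:
        return False, "window is empty"
    if not 1 <= request.nodes <= total_nodes:
        return False, "node count out of range"
    if request.end - request.start > max_length:  # hypothetical policy limit
        return False, "longer than policy allows"
    if any(request.overlaps(m) for m in maintenance):
        return False, "overlaps preventive maintenance"
    # Conservative bound: add up every committed window that touches the
    # request, even if those windows do not overlap one another.
    busy = sum(w.nodes for w in committed if request.overlaps(w))
    if busy + request.nodes > total_nodes:
        return False, "would disrupt work already scheduled"
    return True, "ok"


# Illustrative data only; the date and sizes are arbitrary.
t0 = datetime(2000, 1, 1, 8, 0)
hour = timedelta(hours=1)
committed = [Window(t0 + hour, t0 + 2 * hour, nodes=32)]
maintenance = [Window(t0 + 6 * hour, t0 + 8 * hour)]

request = Window(t0, t0 + 4 * hour, nodes=64)
print(check_reservation(request, 128, committed, maintenance))
```

A production scheduler would track peak concurrent usage rather than the simple sum used here, and would also apply the premium charge for reserved time, which this sketch leaves out.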
Based on GridSphere, the portal offers a Web interface for tasks, such as running jobs and moving data, that would ordinarily require complex command-line scripts. In the future, more features will be added to the User Portal through portlets for accessing the SDSC Storage Resource Broker (SRB) data management system, the HPSS archival tape storage system, and visualization tools.
"It was an enormous task to create such a complex system," said Kamrath. "It required teamwork among groups from Documentation to Production Systems across the center, and couldn't have been done without SDSC's large pool of expertise in a number of areas."
The large team required to create the SDSC User Portal and user-settable reservation system includes, in management and development, Anke Kamrath, Diana Diehl, Patricia Kovach, and Nancy Wilkins-Diehr, as well as Fariba Fana, Mona Wong, Ken Yoshimoto, Martin Margo, Andy Sanderson, J.D. Bottorf, Bill Link, Doug Weimer, Mahidhar Tatineni, Eva Hocks, Leo Carson, Tiffany Duffield, Krishna Muriki, and Alex Wu; in testing, Subha Sivagnanam, Leon Hu, Cuong Phan, Nicole Wolter, Kyle Rollin, Ella Xiong, Jet Antonio, and Shanil Daya.