Colleges and universities across the US are creating data science programs to train future professionals to manage the massive amounts of digital data generated by a range of sources, from web traffic to digital cameras. Analyzing this data frequently requires large-scale cyberinfrastructure: advanced computing systems that can handle terabytes or even petabytes of data. However, few programs teach students how to use such resources effectively.
A new, three-year, $600,000 grant from the National Science Foundation's (NSF) Education and Human Resources directorate to the Texas Advanced Computing Center (TACC) and the University of Louisville (UofL) will support the development of training, tools, and a cloud-based virtual environment to teach data science at the largest scales and provide computational resources for education. The grant is part of NSF's "Improving Undergraduate STEM Education" (IUSE) program.
"The fast pace of technology and software developments makes keeping up with knowledge about big data analytics a challenge not only for the students but also for the educators," said Weijia Xu, a research scientist at TACC and the principal investigator on the project. "TACC and the University of Louisville, both leaders in big data and cloud computing, are uniquely positioned to develop tools to help train students and teachers nationwide."
The grant will allow the team from TACC (Xu, along with Ruizhu Huang and Rosalia Gomez) and UofL (led by Hui Zhang) to create lightweight tools, training modules, and exercises focusing on useful open-source software for data science, including R, Hadoop, Spark, and TensorFlow.
"The project will deliver a full set of interactive documents and video tutorials on using and configuring the platform," said Huang. "The educational activities will use graphical, interactive, simulation-based, and experiential learning components to teach data science concepts and computing skills, accessed through the cloud-based platform. The project aims to help students develop critical workforce skills in data science."
The training will cover both data analytics and machine learning and will introduce students and educators to emerging technologies such as containers, a form of virtualization that allows data scientists to work in reproducible environments of their own choosing and design.
The training and tools will be available for use on existing campus computing infrastructure and can also leverage resources available at TACC, which operates some of the most powerful advanced computing systems in the world.
Students and professors will access these learning tools through a cloud-based virtual environment that TACC and UofL will develop. The project will complement existing data science curricula and will enhance the learning experience for students regardless of whether they attend a top data science program or a small minority-serving institution. The materials will be designed for both in-person instruction and remote, online use.
The project will train diverse students in this critical area. The University of Texas at Austin, where TACC is based, is one of the nation's top 10 universities in terms of the number of undergraduate degrees awarded to Hispanic students, while UofL was ranked by U.S. News & World Report as one of the best schools for African-American students outside of historically black colleges and universities.
Education and outreach specialists at TACC will partner with K-12 STEM programs to take advantage of the cloud-based virtual environment to reach students as early as possible. TACC staff plan to create an activity using the cloud-based virtual environment that targets the approximately 200 underrepresented high school students who participate in the CODE @ TACC summer programs each year.
The research team will also collaborate with Campus Champions from the Extreme Science and Engineering Discovery Environment (XSEDE), who serve as local experts on campuses nationwide, to disseminate training opportunities. Their presence at both two-year and four-year institutions will help ensure rich diversity among participating students.
Said Rosalia Gomez, TACC Education & Outreach Manager: "To address the high demand for advanced computing resources, which are currently limited in classrooms across the country, this award will help us develop curriculum and learning frameworks and provide campus-wide access to resources that will impact students of diverse backgrounds."