Collaborative Synergies: Moving Big Data Forward

Author(s): Eric Nelson
Photographs by: Mark Simons


Sunil Prabhakar (left), head of the Department of Computer Science, and Prof. Jennifer Neville are leading the computer science expansion as part of the Purdue Moves campaign.

Since 1962, when Purdue formed the first computer science department in the United States as an outgrowth of the Division of Mathematical Sciences, alongside the Departments of Mathematics and Statistics, the field has grown almost exponentially, as has its impact on our lives.

“Computers touch our lives from the moment we wake up to the moment we go to sleep, from interacting with others through email or social media, to how we do our jobs, get news and information, shop and find entertainment,” says Purdue computer science professor and department head Sunil Prabhakar.

At the same time, Prabhakar adds, advances in computing, communications and information technology have accelerated the pace of discovery and innovation and spurred an increasing demand across all industries and sectors for graduates with the ability to analyze and work with very large sets of data.

So it should come as no surprise that an expansion of Purdue’s computer science program is a key part of Purdue Moves, a series of strategic initiatives announced by President Mitch Daniels in September. The initiatives are built around three broad, cross-disciplinary categories: science, technology, engineering and math (STEM) leadership; world-changing research; and transformative education.

For the Department of Computer Science, the expansion equates to a 27 percent increase in undergraduate and graduate student enrollment capacity as well as additional support for faculty and staff hiring and for strategic research programs, Prabhakar says.

“We have entered an age of big data, where every day massive amounts of digital data are produced that could provide valuable insights for science, agriculture, business and government,” he says.

“Without more computer scientists and greater advances in the tools we use to derive meaning from these massive data sets, economic development and research will slow dramatically. Purdue’s computer science program is the oldest in the nation, and this is yet another opportunity for it to serve as a national model as we prepare more students to design and use technology that will inform policy and drive decision making on the local, national and global levels.”

New ideas, new opportunities

Prabhakar and Jennifer Neville, associate professor of computer science and statistics, say the expansion is reflective of the new face of computer science as a discipline and builds on the department’s established strengths in areas important to big data, including systems, databases, data mining/machine learning, security and algorithms. The duo discussed the initiative in November during a presentation at the President’s Forum and again in February at a “back-to-class” session for alumni at the President’s Council annual Mollenkopf/Keyes Weekend in Naples, Fla.

“The world is just starting to realize what statisticians and computer scientists have known for years — data is power,” Neville says. “In the era of big data, where data is being collected at unprecedented rates, computer scientists have a unique set of abilities needed to harness this power.

“That is why their computational and engineering skills are, and will continue to be, in such high demand. This demand is even greater for students with training in data mining and machine learning, since they can take an algorithmic approach to statistical modeling of data.”

Employment of computer and information research scientists is expected to grow by 19 percent from 2010 to 2020, according to the U.S. Department of Labor. Prabhakar says that prediction is especially significant given that these workers are already in demand.

“Companies are clamoring to hire computer science students,” he says. “Even through the worst part of the recession, our students have enjoyed a nearly 100 percent placement rate that usually includes multiple offers upon graduation. And with the exception of pharmacy grads, who have a professional degree, they also earn the highest starting salaries of any program at Purdue.

“Student enrollment in computer science is robust and growing, as is the need for their skills. This demand is only going to increase as organizations realize the value of the data they hold and need employees who can extract that information and knowledge to improve their business. Our number of corporate partners has doubled in the last four years, and they are very diverse, from world leaders in computing like Intel and IBM to big companies like GE and Boeing for whom computing is essential to their business.

“We also work with a lot of startup companies in Silicon Valley like Twitter and Groupon, as well as Indiana-based startups like ExactTarget. And then there are companies like Walmart and State Farm that people don’t usually associate with computer careers but are now hiring computer science grads because of issues related to big data, from supply-chain logistics to information security.”

The initiative also includes plans to create special programs in big data, not only for computer science but also in collaboration with the Department of Statistics and the Krannert School of Management. Joint master’s degree programs and undergraduate tracks with other Purdue colleges and schools are under consideration as well.

“Big data is transforming all disciplines, and the increasing demand for these skills is evidence of the more central role computer science is playing in business, research and education,” Prabhakar says. “Being able to understand and use the powerful tools provided by computer science to evaluate data will be a vital part of top-level positions across a broad range of fields in the future.”

Clusters of collaboration

The expansion will build on the collaborative research strengths of the Department of Computer Science and of Purdue as a whole, including centers such as the Center for Education and Research in Information Assurance and Security, or CERIAS, the world’s largest multidisciplinary academic center addressing information security and privacy, and the Cyber Center, which is focused on cyberinfrastructure and on creating systems and tools to disseminate and preserve scientific and engineering knowledge.

Another burgeoning research partner is the Science of Information Center, Indiana’s first National Science Foundation Science and Technology Center, for which Purdue was awarded $25 million to develop a new understanding of the representation, communication and processing of information in biological, social and engineered systems.

“The goal is to advance the foundations of information theory,” Prabhakar says. “The center aims to understand not just how information is measured, but how information is represented, communicated and processed in a broad range of systems. How does information flow in biological systems? How does information flow in social networks? What is its value in an economic network?”

And there is no shortage of people asking such questions, Neville adds: “Almost every academic department at Purdue has someone who is collecting enough data that they would be interested in using these large-scale analytic methods. In terms of collaboration, I’m almost overwhelmed by the number of people outside of computer science who want to work with us on these initiatives, from civil engineering, forestry and psychology to communication, physics and political science.”

Importantly, the plan also includes hiring additional faculty and staff in order to continue to deliver the high level of education and research for which the department is known and to expand its leadership in the area of computing.

“We don’t want to compromise the program’s quality by increasing enrollment without a commensurate increase in the number of faculty and staff, which will also help us build a richer set of course offerings,” Prabhakar says.

Prabhakar, Neville and other members of the department’s hiring committee are already evaluating prospective candidates for five faculty openings in computer science to be filled by fall 2014, either as part of the expansion or to fill vacated positions.

And they are part of a larger search team for “cluster hires” in big data that will include several joint faculty positions for computer science with other Purdue departments, colleges and schools, including agriculture, electrical and computer engineering, civil engineering, physics and library science.

Additional joint faculty positions are available in enabling technologies and in domain sciences that deal with the use and management of digital data, such as next-generation manufacturing, predictive modeling and systems biology.

“Purdue in general is very good at encouraging interdisciplinary research connections,” Neville says, “and this expansion will further grow and sustain those connections.”

Meeting the challenge

Another goal is to increase the computer science department’s involvement in social-impact issues, which dovetails with an earlier-announced Purdue Moves initiative to strengthen the University’s leadership in developing novel ways to help feed a rapidly growing world population.

A “cybersustainability” program is being developed in collaboration with Purdue’s College of Agriculture to help analyze and process massive amounts of agriculture-relevant data and reveal valuable insights to improve farming practices.

“Purdue is uniquely positioned to become the pre-eminent leader in cybersustainability in agriculture. There is tremendous untapped potential in the data that is being collected,” Prabhakar says. “The tools created through computer science are what will allow us to leverage that data to make the most data-driven and informed decisions, regardless of the application.”

As an example, Neville points to the evolution of the modern “smart” tractor, which can now possess as much computing power as the first space shuttle.

“There are sensors on today’s tractors that measure everything from temperature and moisture to the chemical content in soil,” she says. “So much data is being collected that there are now numerous technical issues affecting how that data can be used most effectively to improve the decision-making process.

“Part of our larger research goal is to develop algorithms and computer systems that can automatically learn patterns from large amounts of data, or from interactions with systems that collect data, such as the computerized sensors used on tractors. We could analyze the data gathered from planting, for example, and compare it with the data gathered during harvest, as well as climate information, to help improve farming practices and yield.”

There are also numerous variables in the plan to finance the expansion, Prabhakar notes. The University’s commitment to fund new and shared faculty positions will help drive initial enrollment increases, but additional support in the form of endowed professorships and scholarships and continued growth in corporate partnerships and research funding is necessary, too, as is an eventual facilities expansion.

“We are competing for the top faculty and the top students in the field,” he says. “Without strong external support from our alumni and other University stakeholders, it won’t be possible.”

Jerone Deverman, a 1995 College of Science Distinguished Alumnus who earned three degrees from Purdue (BS ’60, Mathematics; MS ’62, Mathematics; PhD ’69, Statistics) and attended Prabhakar and Neville’s “back-to-class” session in Naples, wholeheartedly agrees.

“Expansion occurs via students and faculty,” he says. “The catalyst, indeed the grease, for all of us Boilermakers is simply money. Alumni can help with the heavy work of fundraising. It’s necessary. Alumni can also help in faculty acquisition and in encouraging excellent students to consider a degree program within computer science at Purdue.”

Deverman, the founder and principal consultant of Medical Data Systems, has more than 30 years’ experience in computer systems, data processing, systems and statistical analysis, mathematical modeling and computer simulations, operational testing and evaluation, and large data systems applications. He knows better than most how daunting the challenge really is.

“If one looks at the relatively short history of computer science, one has to conclude that it is still young, still immature,” he says. “The box doesn’t yet exist with respect to which one may want to think outside of, and it’s hard to state with any certainty where the next frontier might be.

“In this environment — indeed, in the earliest years of this environment — Purdue computer science established a leadership role among all top programs in the world. We must now aggressively work to maintain and strengthen that position. Purdue has accomplished much in computer science; however, there is clearly much more to be done.”