Although the number of data scientists in the workforce is growing, neither what those data scientists actually do nor what the term "data scientist" even means has been studied quantitatively. Miryung Kim and Muhammad Gulzar share the results of a large-scale survey of 793 professional data scientists at Microsoft. The study examined data scientists’ educational backgrounds, the problem topics they work on, the tools they use, and the activities they perform. Using the survey data, they clustered the data scientists by the time they spent on various activities and identified nine distinct clusters, each with its own characteristics.
Drawing on this data, Miryung and Muhammad detail several trends about data scientists in the software engineering context, discuss the challenges they face, and share best practices. They conclude by exploring potential software tools for improving data scientist productivity and efficiency in the area of big data intelligence.
Miryung Kim is an associate professor in the Department of Computer Science at UCLA as well as the cofounder of MK.Collective. Miryung builds automated software tools, such as debuggers, testing tools, refactoring engines, and code analytics, for improving data scientist productivity and efficiency in developing big data analytics. She also conducts empirical studies of professional software engineers and data scientists in the wild and uses the resulting insights to design novel software engineering tools. Previously, she was an assistant professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin and a visiting researcher at the Research in Software Engineering (RiSE) group at Microsoft Research. Miryung’s honors include an NSF CAREER award, a Microsoft Software Engineering Innovation Foundation Award, an IBM Jazz Innovation Award, a Google Faculty Research Award, an Okawa Foundation Research Grant Award, and an ACM SIGSOFT Distinguished Paper Award. She also received the Korean Ministry of Education, Science, and Technology Award, the highest honor given to an undergraduate student in Korea. Miryung holds a BS in computer science from the Korea Advanced Institute of Science and Technology and an MS and PhD in computer science and engineering from the University of Washington.
Muhammad Gulzar is a PhD candidate in the Computer Science Department at the University of California, Los Angeles, where he is advised by Miryung Kim. Muhammad’s research interests lie at the intersection of software engineering and big data systems—specifically, in supporting interactive debugging in big data processing frameworks and providing efficient ways to perform automated fault localization in big data applications. He holds an undergraduate degree in computer science from Lahore University of Management Sciences (LUMS) SBASSE in Pakistan, where he was mentored by Fareed Zaffar.
©2018, O'Reilly Media, Inc.