Modern data science is the creative application of scientific principles to design new tools and processes in areas where a scientific approach was previously infeasible because data was difficult or expensive to collect. That’s a mouthful, but if you see data science that way, we’re likely just at the beginning. As more people and things are equipped with sensors, they will create data that allows entirely new classes of problems to be approached more scientifically.
Mike Stringer looks into the future and outlines some of the issues related to data science that may arise for business, for data scientists, and for society.
For business
If you’ve got a problem that you suspect can be approached scientifically, off-the-shelf solutions probably don’t exist yet. Depending on your problem and how common it is, they may never exist. Data scientist isn’t a meaningful title that most will put on their résumé; your future data scientists are more likely to be physicists, chemical engineers, statisticians, or economists. The most important characteristics of a data scientist are curiosity and an aptitude for the creative and rigorous application of scientific principles to new problems. First, ask yourself whether any aspects of the new connected world are generating data that may be relevant to your pressing problems. If the answer is yes, ask yourself whether the benefit of developing a scientific approach to solve those problems is worth the cost. If so, you need data science.
For data scientists
Data scientists must be comfortable identifying new problems where the benefit of a data-driven approach exceeds the cost. Much of the data now being generated is nonexperimental, so data scientists must tread extremely carefully to avoid drawing incorrect conclusions from it. They also have to be able to build their own tools. In today’s climate, that means being able to write code.
Mike Stringer is cofounder and managing partner of consulting and design firm Datascope Analytics, where he has led or contributed to projects across a variety of industries for clients including Procter & Gamble and Thomson Reuters. Mike is passionate about realizing the potential for data to be used as a resource to make a positive impact on business and society. He also enjoys decidedly non-data-oriented activities, including exploring the amazing food in Chicago, playing and listening to music, and generally making things from scratch. Mike holds a BS in engineering physics from the University of Colorado and a PhD in physics from Northwestern University.