Sci vs. Sci: Attack Vectors for Black-hat Data Scientists, and Possible Countermeasures

If you’re an evil genius with a yen for data science, what are your possible attack vectors?

If you’re a good guy and want to protect your data from unscrupulous competitors, what are your counter-attacks? How effective are they?

This talk will focus on data science attack vectors that can be exploited for commercial, not military, gain. We’ll look at black-hat internet marketing techniques, and then explain how data science can be used to scale and refine these black-hat techniques.

These techniques include:

  • Spam, article spinning, and content generation.
  • Unethical scraping and spidering.
  • CAPTCHA breaking, voting rings, and sockpuppetry.

How can these techniques be scaled and refined using data science? Which of these black-hat techniques are effective for nefarious large entities, and which of them rely upon being too small to notice (security through obscurity)? How can you prevent your competitors from using these techniques against you?

We’ll conclude with speculation on how grey markets might emerge that allow black hats to launder and resell their wares.

Joseph Turian


Joseph Turian, Ph.D., heads MetaOptimize LLC, which consults on predictive analytics, business intelligence, NLP, ML, and data strategy. He also runs the MetaOptimize Q&A site, where Machine Learning and Natural Language Processing experts share their knowledge. He specializes in large data sets.

Joseph Turian received his Ph.D. in computer science (with a focus on Machine Learning and Natural Language Processing) from New York University in 2007. During his graduate studies, he developed a fast, large-scale machine learning method for parsing natural language. He received his AB from Harvard University in 2001.

As a scientist, Joseph Turian has over 14 refereed publications in top NLP and ML conferences. His team submitted the best parser in the EVALITA 2009 Main+Pilot tasks. He is an advocate for open-notebook science, releasing his research code on GitHub, and for broader scientific collaboration through the internet.

