Put AI to Work
April 15-18, 2019
New York, NY

Adversarial machine learning in digital forensics

Alina Matyukhina (Canadian Institute for Cybersecurity)
2:40pm-3:20pm Thursday, April 18, 2019
Case Studies, Machine Learning
Location: Sutton South
Secondary topics: AI case studies, Computer Vision, Ethics, Privacy, and Security, Models and Methods, Reinforcement Learning
Average rating: 4.00 (2 ratings)

Who is this presentation for?

  • Machine learning engineers, open source developers, adversarial machine learning researchers, and security practitioners



Prerequisite knowledge

  • Programming experience

What you'll learn

  • Learn about adversarial machine learning, digital forensics, and software authorship attribution techniques
  • Discover how an attacker can use machine learning techniques to mimic the coding style of a software developer in open source projects, and learn how to protect yourself from such attacks


Digital forensics plays a key role when questions arise about the authors of documents: their identity, their characteristics (such as age and gender), and whether they can be linked to documents of unknown origin. Machine learning approaches to source code authorship attribution attempt to identify the most likely author of a piece of code by analyzing characteristics extracted from the source code.
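To make the idea of attribution from code characteristics concrete, here is a minimal Python sketch. It is not the system discussed in the talk: the three features, the nearest-centroid rule, the function names, and the sample snippets are all assumptions chosen for illustration. Real attribution systems typically use far richer lexical, layout, and syntactic features.

```python
import math
import re

def style_features(code):
    """Return three toy style features (hypothetical, for illustration only)."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    # Word-like tokens; language keywords are counted too, which is fine for a toy.
    identifiers = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)
    m = max(len(identifiers), 1)
    return [
        sum(ln.startswith("\t") for ln in lines) / n,  # share of tab-indented lines
        sum(ln.strip() == "{" for ln in lines) / n,    # share of lone opening braces
        sum("_" in name for name in identifiers) / m,  # share of tokens containing "_"
    ]

def centroid(samples):
    """Average the feature vectors of one author's known samples."""
    vectors = [style_features(s) for s in samples]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attribute(code, known_samples):
    """Return the author whose style centroid is closest to the given code."""
    target = style_features(code)
    profiles = {a: centroid(s) for a, s in known_samples.items()}
    return min(profiles, key=lambda a: math.dist(target, profiles[a]))

# Made-up samples from two hypothetical authors with different habits.
known = {
    "alice": ["int add_two(int first_value)\n{\n\treturn first_value + 2;\n}\n"],
    "bob": ["int addTwo(int firstValue) {\n    return firstValue + 2;\n}\n"],
}
unknown = "int scale_up(int raw_count)\n{\n\treturn raw_count * 10;\n}\n"
print(attribute(unknown, known))
```

The unknown snippet shares the first author's tabs, brace placement, and underscore naming, so the toy classifier assigns it to that author even though the code computes something different.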

There are many situations in which police or security agencies need to establish who wrote a piece of software, for example to identify the author of malicious code. However, machine learning models are often susceptible to adversarial manipulation of their input at test time, which degrades their performance. Recent studies in adversarial machine learning have shown that adversarial examples can readily fool models for image classification, speech recognition, and reinforcement learning.

Alina Matyukhina investigates the feasibility of deceiving source code attribution techniques in real-world environments, which contain adversaries and dishonest users. Alina shows that even a sensible transformation of an author’s coding style successfully decreases the performance of source code authorship attribution systems. Alina also explores two practical attacks on current attribution systems: author imitation and author hiding. The first attack, author imitation, targets user identity in open source projects: it transforms the attacker’s source code into a version that mimics the victim’s coding style while retaining the functionality of the original code. This is particularly concerning for open source contributors, who may be unaware that by contributing to open source projects they reveal identifiable information that can be used to their disadvantage. For example, by imitating someone’s coding style, an attacker could implicate any software developer in wrongdoing. Alina then discusses multiple approaches to resisting these attacks, including hiding the coding style of the software author before contributing to open source projects.
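As a rough illustration of what a style transformation that retains functionality might look like, the Python sketch below rewrites a small C-like snippet toward a different naming and layout style. It is an assumption-laden toy, not the attack presented in the talk: the function names are hypothetical, and the regex-based rewriting ignores string literals, comments, and possible name collisions that a real tool would have to handle.

```python
import re

def to_camel(name):
    """Convert a snake_case name to camelCase, e.g. raw_count -> rawCount."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def imitate_style(code):
    """Rewrite naming and layout toward a hypothetical victim's style."""
    # 1. Rename snake_case identifiers consistently to camelCase.
    code = re.sub(r"\b[a-z]+(?:_[a-z0-9]+)+\b",
                  lambda m: to_camel(m.group(0)), code)
    # 2. Move an opening brace at the start of a line up to the previous line.
    code = re.sub(r"\n[ \t]*\{", " {", code)
    # 3. Replace each leading tab with four spaces.
    code = re.sub(r"^\t+", lambda m: "    " * len(m.group(0)),
                  code, flags=re.MULTILINE)
    return code

# Made-up attacker snippet: underscores, brace on its own line, tab indent.
attacker_code = "int scale_up(int raw_count)\n{\n\treturn raw_count * 10;\n}\n"
print(imitate_style(attacker_code))
```

Because every occurrence of a name is renamed the same way and only whitespace and brace placement change otherwise, the snippet computes the same result after the rewrite, while its surface style now differs from the attacker's usual habits.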

This work was conducted in collaboration with Natalia Stakhanova, Mila Dalla Preda, and Celine Perley.


Alina Matyukhina

Canadian Institute for Cybersecurity

Alina Matyukhina is a cybersecurity researcher and PhD candidate at the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. Her research focuses on applying machine learning, computational intelligence, and data analysis techniques to design innovative security solutions. Previously, she was a research assistant at the Swiss Federal Institute of Technology, where she took part in cryptography and security research projects. Alina is a member of the Association for Computing Machinery and the IEEE Computer Society. She has spoken at several security and software engineering conferences, including HackFest, IdentityNorth, ISACA Security & Risk, Droidcon SF, and PyCon Canada.