Digital forensics plays a key role when questions arise about the authors of documents: their identity, their characteristics (such as age and gender), and whether they can be linked to documents of unknown origin. Machine learning approaches to source code authorship identification try to determine the most likely author of a piece of code by analyzing characteristics extracted from the source code.
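To make the idea concrete, here is a minimal sketch of attribution from code characteristics. It assumes a toy feature set (indentation, brace placement, and line length) and a nearest-profile rule; the names `style_features`, `most_likely_author`, `SAMPLES`, and `UNKNOWN` are hypothetical, and real attribution systems rely on far richer lexical and syntactic features and trained classifiers.

```python
import math

def style_features(code):
    """Toy layout features (an assumption for illustration only)."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    tab_indented = sum(ln.startswith("\t") for ln in lines) / n
    space_indented = sum(ln.startswith(" ") for ln in lines) / n
    brace_own_line = sum(ln.strip() == "{" for ln in lines) / n
    avg_line_len = sum(len(ln) for ln in lines) / n / 80.0
    return [tab_indented, space_indented, brace_own_line, avg_line_len]

def most_likely_author(code, known_samples):
    """Return the author whose sample is closest in feature space."""
    target = style_features(code)
    return min(
        known_samples,
        key=lambda author: math.dist(target, style_features(known_samples[author])),
    )

# Hypothetical labeled samples: one per candidate author.
SAMPLES = {
    "alice": "int add(int a, int b)\n{\n\treturn a + b;\n}\n",
    "bob": "int add(int a, int b) {\n    return a + b;\n}\n",
}

# Code of unknown origin, written with tabs and braces on their own line.
UNKNOWN = "int mul(int x, int y)\n{\n\treturn x * y;\n}\n"

print(most_likely_author(UNKNOWN, SAMPLES))
```

Here the unknown snippet shares its layout habits with the "alice" sample, so the sketch attributes it to her.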
Law enforcement and security agencies often need to establish who wrote a piece of software, for example, to identify the author of malicious code. However, machine learning models are often susceptible to adversarial manipulation of their input at test time, which degrades their performance. Recent studies in adversarial machine learning have shown that adversarial examples can readily fool models for image classification, speech recognition, and reinforcement learning.
Alina Matyukhina investigates the feasibility of deception in source code attribution techniques in real-world environments, which contain adversaries and dishonest users. Alina shows that even a sensible transformation of an author’s coding style successfully decreases the performance of source code authorship attribution systems. Alina also explores two practical attacks on current attribution systems: author imitation and author hiding. The first attack, author imitation, can target user identities in open source projects: it transforms the attacker’s source code into a version that mimics the victim’s coding style while retaining the functionality of the original code. This is particularly concerning for open source contributors, who may be unaware that by contributing to open source projects they reveal identifiable information that can be used to their disadvantage. For example, by imitating someone’s coding style, it is possible to implicate any software developer in wrongdoing. Alina then discusses multiple approaches to resisting these attacks, including hiding the coding style of the software author before contributing to open source projects.
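As a rough illustration of the imitation idea, the sketch below rewrites an attacker's Python snippet from camelCase to snake_case naming (standing in for a victim's style) and checks that its behavior is unchanged. This is only an assumed, naming-level transformation, not the method presented in the talk; `to_snake_case`, `snake_case_ratio`, `run`, and `ATTACKER_CODE` are hypothetical names, and the regex approach would also rewrite matching text inside strings or comments.

```python
import re

CAMEL = r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b"
SNAKE = r"\b[a-z]+(?:_[a-z0-9]+)+\b"

def to_snake_case(source):
    """Rewrite camelCase identifiers as snake_case (naming style only)."""
    def convert(match):
        return re.sub(
            r"(?<=[a-z0-9])([A-Z])",
            lambda m: "_" + m.group(1).lower(),
            match.group(0),
        )
    return re.sub(CAMEL, convert, source)

def snake_case_ratio(source):
    """Share of multi-word identifiers in snake_case: a toy style signal."""
    snake = len(re.findall(SNAKE, source))
    camel = len(re.findall(CAMEL, source))
    total = snake + camel
    return snake / total if total else 0.0

def run(source, func_name, *args):
    """Execute the snippet and call one of its functions."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)

# Hypothetical attacker code written in camelCase.
ATTACKER_CODE = (
    "def totalPrice(unitPrice, itemCount):\n"
    "    subTotal = unitPrice * itemCount\n"
    "    return subTotal\n"
)

imitation = to_snake_case(ATTACKER_CODE)

# The style signal changes while the behavior stays the same.
print(snake_case_ratio(ATTACKER_CODE), snake_case_ratio(imitation))
print(run(ATTACKER_CODE, "totalPrice", 3, 4) == run(imitation, "total_price", 3, 4))
```

A classifier that leaned on naming conventions would now see the victim's habits in the attacker's code, which is the kind of shift the imitation attack exploits.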
This work was conducted in collaboration with Natalia Stakhanova, Mila Dalla Preda, and Celine Perley.
Alina Matyukhina is a cybersecurity researcher and PhD candidate at the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick. Her research focuses on applying machine learning, computational intelligence, and data analysis techniques to design innovative security solutions. Previously, she was a research assistant at the Swiss Federal Institute of Technology, where she took part in cryptography and security research projects. Alina is a member of the Association for Computing Machinery and the IEEE Computer Society. She has spoken at several security and software engineering conferences, including HackFest, IdentityNorth, ISACA Security & Risk, Droidcon SF, and PyCon Canada.