Digital forensics becomes critically important when questions arise about the author of a document: their identity, their characteristics (such as age and gender), and whether an unattributed document can be linked to them.
Machine learning approaches to source code authorship identification attempt to identify the most likely author of a piece of code by analyzing stylistic characteristics extracted from the source. There are many situations in which police or security agencies are concerned with the ownership of software, for example, identifying who wrote a malicious piece of code.
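The pipeline just described, extracting stylistic features from code and matching them against profiles of known authors, can be sketched in a few lines. The features and the tiny nearest-centroid classifier below are illustrative assumptions for this sketch, not the components of any particular attribution system:

```python
# Minimal sketch: layout/naming features plus a nearest-centroid
# classifier. All feature choices and author samples are toy examples.
import math

def style_features(code: str) -> list[float]:
    """Extract simple stylistic features from a code snippet."""
    lines = [ln for ln in code.split("\n") if ln.strip()]
    avg_len = sum(len(ln) for ln in lines) / len(lines)      # line length
    tab_ratio = sum(ln.startswith("\t") for ln in lines) / len(lines)
    snake = code.count("_")                                   # snake_case use
    camel = sum(1 for a, b in zip(code, code[1:])             # camelCase use
                if a.islower() and b.isupper())
    return [avg_len, tab_ratio, snake, camel]

def nearest_author(sample: str, profiles: dict) -> str:
    """Return the author whose style centroid is closest in feature space."""
    feats = style_features(sample)
    return min(profiles, key=lambda a: math.dist(feats, profiles[a]))

# Toy "training" profiles built from each author's known code.
alice = "def add_two(x, y):\n\treturn x + y\n"     # tabs, snake_case
bob = "def addTwo(x, y):\n    return x + y\n"      # spaces, camelCase
profiles = {"alice": style_features(alice), "bob": style_features(bob)}

unknown = "def mul_two(x, y):\n\treturn x * y\n"
print(nearest_author(unknown, profiles))  # tabs + snake_case -> alice
```

Real systems use hundreds of lexical, layout, and syntactic (AST-derived) features and a stronger classifier, but the structure, features plus a distance or learned decision boundary, is the same.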
However, machine learning models are often susceptible to adversarial manipulation of their input at test time, which leads to degraded performance. Recent studies in adversarial machine learning have shown that adversarial examples can easily defeat image classification, speech recognition, and reinforcement learning systems.
In this session we will investigate the feasibility of deceiving source code attribution techniques in a real-world environment that contains adversaries and dishonest users.
In this session, we will show that even a relatively simple transformation of an author's coding style can significantly decrease the performance of source code authorship attribution systems. An important part of the session covers two practical attacks on current attribution systems: author imitation and author hiding. The first attack targets user identity in open-source projects: it transforms the attacker's source code into a version that mimics the victim's coding style while retaining the functionality of the original code. This is particularly concerning for open-source contributors who are unaware that, by contributing to open-source projects, they reveal identifiable information that can be used to their disadvantage. For example, by imitating someone's coding style it becomes possible to implicate any software developer in wrongdoing. To resist this attack, we discuss several approaches for hiding an author's coding style before contributing to open source.
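To make the author-imitation idea concrete, here is a hypothetical sketch that rewrites surface-level style (indentation, identifier naming) toward another author's conventions while leaving the program's behavior unchanged. The specific transformations are illustrative assumptions only; a real attack would need to rename identifiers consistently across a whole project and avoid touching string literals:

```python
# Hypothetical "author imitation" sketch: push attacker code toward a
# victim's style (camelCase names, 4-space indentation) without
# changing what the code does. Illustrative only.
import re

def to_camel(name: str) -> str:
    """snake_case -> camelCase, e.g. add_two -> addTwo."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)

def imitate_style(code: str) -> str:
    """Rewrite surface style: tabs -> 4 spaces, snake_case -> camelCase."""
    code = code.replace("\t", "    ")
    return re.sub(r"[a-z]+(?:_[a-z]+)+",          # snake_case identifiers
                  lambda m: to_camel(m.group()), code)

attacker = "def add_two(x, y):\n\treturn x + y\n"
print(imitate_style(attacker))
# def addTwo(x, y):
#     return x + y
```

Because attribution features are dominated by exactly such surface cues (naming style, whitespace, layout), even shallow rewrites like this can shift a sample toward another author's profile.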
Alina Matyukhina is a cybersecurity researcher and third-year PhD candidate at the Canadian Institute for Cybersecurity (CIC). Her research focuses on applying machine learning, computational intelligence, and data analysis techniques to design innovative security solutions. Before joining CIC, she worked as a research assistant at the Swiss Federal Institute of Technology, where she took part in cryptography and security research projects. She holds B.S. and M.S. degrees in mathematics and IT. Alina is a member of the Association for Computing Machinery and the IEEE Computer Society. She has presented her research at several security and software engineering conferences, including HackFest, IdentityNorth, ISACA Security & Risk, Droidcon SF, and PyCon Canada.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com