Posts by Collection

portfolio

publications

Feature engineering and probabilistic tracking on honey bee trajectories

Published in Bachelor Thesis, Freie Universität Berlin, 2017

Feature engineering and model tuning to perform visual tracking of marked honey bees on a hive.

Recommended citation: Boenisch, Franziska. (2017). "Feature engineering and probabilistic tracking on honey bee trajectories." Bachelor Thesis. Freie Universität Berlin. https://www.mi.fu-berlin.de/inf/groups/ag-ki/Theses/Completed-theses/Bachelor-theses/2017/Boenisch/Bachelor-Boenisch.pdf

Tracking all members of a honey bee colony over their lifetime using learned models of correspondence

Published in Frontiers in Robotics and AI, 2018

Probabilistic object tracking framework to perform large-scale tracking of several thousand honey bees.

Recommended citation: Boenisch, Franziska, et al. (2018). "Tracking all members of a honey bee colony over their lifetime using learned models of correspondence." Frontiers in Robotics and AI. 5(35). https://www.frontiersin.org/articles/10.3389/frobt.2018.00035/full

Differential Privacy: General Survey and Analysis of Practicability in the Context of Machine Learning

Published in Master Thesis, Freie Universität Berlin, 2019

Introduction and literature review on Differential Privacy. Implementation and performance evaluation of several Differentially Private linear regression models.

Recommended citation: Boenisch, Franziska. (2019). “Differential Privacy: General Survey and Analysis of Practicability in the Context of Machine Learning.” Master Thesis. Freie Universität Berlin. https://www.mi.fu-berlin.de/inf/groups/ag-idm/theseses/2019_Boenisch_MSc.pdf
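The thesis's actual models are not reproduced here, but the following minimal sketch illustrates one standard construction of differentially private linear regression, sufficient-statistics perturbation with the Laplace mechanism; the clipping bounds, sensitivity estimate, and epsilon value are illustrative assumptions, not the thesis's configuration.

```python
import numpy as np

def dp_linear_regression(X, y, epsilon, rng=np.random.default_rng(0)):
    """Sketch of sufficient-statistics perturbation for epsilon-DP linear regression.

    Assumes every feature and target value has already been clipped to [-1, 1],
    so adding or removing one record changes each entry of X^T X and X^T y by at
    most 1; the L1 sensitivity of the concatenated statistics is then at most
    d*d + d (a loose but valid bound, used here only for illustration).
    """
    n, d = X.shape
    sensitivity = d * d + d
    scale = sensitivity / epsilon

    xtx = X.T @ X + rng.laplace(0.0, scale, size=(d, d))
    xty = X.T @ y + rng.laplace(0.0, scale, size=d)

    # A small ridge term keeps the noisy normal equations solvable.
    return np.linalg.solve(xtx + 1e-3 * np.eye(d), xty)

# Usage on toy, pre-clipped data:
rng = np.random.default_rng(1)
X = np.clip(rng.normal(size=(1000, 3)), -1, 1)
y = np.clip(X @ np.array([0.5, -0.2, 0.1]), -1, 1)
w_private = dp_linear_regression(X, y, epsilon=1.0)
```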

talks

50 Shades of Privacy

Published:

A science slam is a different way of presenting your research that has only three rules: 1) you need to present something you are personally researching, 2) the talk must not exceed 10 minutes, and 3) the talk should be entertaining and understandable. I presented my research on Differential Privacy. Find my talk here.

The Long and Winding Road of Secure and Private Machine Learning

Published:

Abstract: Nowadays, machine learning (ML) is used everywhere, including in sectors that deal with extremely sensitive data, like health or finance. And while most companies no longer deploy a single line of code without testing it somehow, ML models are often let out into the wild without being checked or secured. In my talk, I will guide you through the long road of possible threats and attacks that your ML models might be exposed to out there, and give an overview of which countermeasures might be worth considering. Link to the event here.

Privacy-preserving Machine Learning with Differential Privacy

Published:

Abstract: With the growing amount of data being collected about individuals, ever more complex machine learning models can be trained on those individuals’ characteristics and behaviors. Methods for extracting private information from the trained models are becoming more and more sophisticated, threatening individual privacy. In this talk, I will introduce some powerful methods for training neural networks with privacy guarantees. I will also show how to apply those methods effectively in order to achieve a good trade-off between utility and privacy. A video recording of the meetup is available here.
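As a rough illustration of the kind of method discussed in the talk, the sketch below shows the core DP-SGD recipe (per-example gradient clipping plus Gaussian noise). The clipping norm, noise multiplier, and learning rate are placeholder values, and turning the noise multiplier into a concrete (epsilon, delta) guarantee additionally requires a privacy accountant, which is omitted here.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.05, rng=np.random.default_rng(0)):
    """One DP-SGD update: clip each example's gradient, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    batch_size = len(per_example_grads)
    noisy_grad = np.mean(clipped, axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / batch_size, size=params.shape)
    return params - lr * noisy_grad
```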

Bringing Privacy-Preserving Machine Learning Methods into Real-World Use

Published:

Abstract: Nowadays, there exist several privacy-preserving machine learning methods. Most of them are made available to potential users through tools or programming libraries. However, in order to thoroughly protect privacy, these tools need to be applied in the correct scenarios and with the correct settings. This lecture covers the identification of concrete threat spaces concerning privacy in machine learning, the choice of adequate protection measures, and their practical application. The latter point in particular is discussed in class with respect to general usability and design patterns.

Privacy-Preservation in Machine Learning: Threats and Solutions

Published:

Abstract: In recent years, privacy threats against user data have become more diverse. Attacks are no longer solely directed against databases where sensitive data is stored but can also be applied to data analysis methods or their results. Thereby, they enable an adversary to learn potentially sensitive attributes of the data used for the analyses. This lecture aims at presenting common privacy threat spaces in data analysis methods with a special focus on machine learning. In addition to a general view on privacy preservation and threat models, some very specific attacks against machine learning privacy are introduced (e.g., model inversion, membership inference). Additionally, a range of privacy-preservation methods for machine learning, such as differential privacy, homomorphic encryption, etc., is presented. Finally, their adequate application is discussed with respect to common threat spaces. The video of my lecture can be found here.

Privacy-Preservation in Machine Learning: Threats and Solutions

Published:

Abstract: Neural networks are increasingly being applied in sensitive domains and on private data. For a long time, no thought was given to what this means for the privacy of the data used for their training. Only in recent years has there emerged an awareness that the process of converting training data into a model is not irreversible as previously thought. Since then, several specific attacks against privacy in neural networks have been developed. Of these, we will discuss two specific ones, namely membership inference and model inversion attacks. First, we will focus on how they retrieve potentially sensitive information from trained models. Then, we will look into several factors that influence the success of both attacks. At the end, we will discuss Differential Privacy as a possible protection measure.
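For intuition only, here is a minimal sketch of the simplest membership inference variant, a loss-threshold attack; the attacks discussed in the talk are more involved, and the threshold here would in practice be calibrated on data known not to be in the training set.

```python
import numpy as np

def loss_threshold_membership(loss_fn, records, labels, threshold):
    """Flag a record as a likely training member if the model's loss on it is
    unusually low: overfitted models tend to assign smaller losses to data
    they were trained on than to unseen data."""
    losses = np.array([loss_fn(x, y) for x, y in zip(records, labels)])
    return losses < threshold
```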

A Survey on Model Watermarking Neural Networks

Published:

Abstract: Machine learning (ML) models are applied in an increasing variety of domains. The availability of large amounts of data and computational resources encourages the development of ever more complex and valuable models. These models are considered intellectual property of the legitimate parties who have trained them, which makes their protection against stealing, illegitimate redistribution, and unauthorized application an urgent need. Digital watermarking presents a strong mechanism for marking model ownership and, thereby, offers protection against those threats. The emergence of numerous watermarking schemes and attacks against them is pushed forward by both academia and industry, which motivates a comprehensive survey of this field. The document at hand provides the first extensive literature review on ML model watermarking schemes and attacks against them. It offers a taxonomy of existing approaches and systematizes general knowledge around them. Furthermore, it assembles the security requirements for watermarking approaches and evaluates schemes published by the scientific community according to them in order to present systematic shortcomings and vulnerabilities. Thus, it can not only serve as valuable guidance in choosing the appropriate scheme for specific scenarios, but also act as an entry point into developing new mechanisms that overcome presented shortcomings, and thereby contribute to advancing the field. Find the video here and the original paper here.
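To make the ownership-verification idea concrete, the snippet below sketches one common scheme family covered in the survey, backdoor-based (trigger-set) watermarking; the predict function, trigger set, and decision threshold are illustrative assumptions rather than any specific scheme from the paper.

```python
import numpy as np

def verify_watermark(predict, trigger_inputs, trigger_labels, threshold=0.9):
    """The owner keeps a secret trigger set with deliberately chosen target labels.
    A suspect model that reproduces these targets far above chance level is
    likely derived from the watermarked model."""
    predictions = np.array([predict(x) for x in trigger_inputs])
    match_rate = np.mean(predictions == np.array(trigger_labels))
    return match_rate >= threshold
```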

When the Curious Abandon Honesty: Federated Learning Is Not Private

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never “leaves” personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model’s weights before users compute model gradients. We call the modified weights trap weights. Our active attacker is able to recover user data perfectly. Recovery comes with near zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. Find the video here and the original paper here.
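The passive leakage the paper amplifies can already be seen in a single fully connected layer: the weight gradient is the outer product of the upstream error and the input, so the central party can divide a weight-gradient row by the matching bias-gradient entry to recover the input exactly. The toy sketch below (batch size one, arbitrary error signal) illustrates only this basic effect; it is not the paper's trap-weights attack itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)            # a single client's input record
W = rng.normal(size=(3, 5))       # one fully connected layer of the shared model
b = np.zeros(3)

z = W @ x + b                     # forward pass
dL_dz = z - rng.normal(size=3)    # some upstream error signal; its exact form does not matter

dL_dW = np.outer(dL_dz, x)        # gradients the client would share with the server
dL_db = dL_dz

i = int(np.argmax(np.abs(dL_db))) # any row with a nonzero bias gradient works
x_reconstructed = dL_dW[i] / dL_db[i]

assert np.allclose(x, x_reconstructed)
```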

When the Curious Abandon Honesty: Federated Learning Is Not Private

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never “leaves” personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model’s weights before users compute model gradients. We call the modified weights trap weights. Our active attacker is able to recover user data perfectly. Recovery comes with near zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. Find the video here and the original paper here.

When the Curious Abandon Honesty: Federated Learning Is Not Private

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never “leaves” personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model’s weights before users compute model gradients. We call the modified weights trap weights. Our active attacker is able to recover user data perfectly. Recovery comes with near zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. Find the video here and the original paper here.

When the Curious Abandon Honesty: Federated Learning Is Not Private

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never “leaves” personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model’s weights before users compute model gradients. We call the modified weights trap weights. Our active attacker is able to recover user data perfectly. Recovery comes with near zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. Find the video here and the original paper here.

What trust model is needed for federated learning to be private?

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never “leaves” personal devices, FL was promoted as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users. In this talk, I will explore the trust model required to implement practical privacy guarantees in FL by studying the protocol under the assumption of an untrusted central party. I will first show that in vanilla FL, when dealing with an untrusted central party, there is currently no way to provide meaningful privacy guarantees. I will depict how gradients of the shared model directly leak some individual training data points, and how this leakage can be amplified through small, targeted manipulations of the model weights. Thereby, the central party can directly and perfectly extract sensitive user data at near-zero computational costs. Then, I will move on and discuss defenses that implement privacy protection in FL. Here, I will show that an actively malicious central party can still have the upper hand on privacy leakage by introducing a novel practical attack against FL protected by secure aggregation and differential privacy – currently considered the most private instantiation of the protocol. I will conclude my talk with an outlook on what it will take to achieve privacy guarantees in practice.
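For context on the defense discussed in the talk, the sketch below shows the basic idea behind secure aggregation: clients blind their updates with pairwise random masks that cancel in the server's sum, so the server only learns the aggregate. This is a toy illustration of the masking principle, not the full protocol, which also handles dropouts, key agreement, and malicious participants.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 3, 4
updates = rng.normal(size=(n_clients, dim))   # each client's true model update

# Pairwise masks: client i adds +m_ij for every j > i and subtracts m_ji for j < i,
# so all masks cancel once the server sums the masked updates.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = updates.copy()
for (i, j), m in masks.items():
    masked[i] += m
    masked[j] -= m

# The server sees only `masked`, yet the aggregate equals the true sum.
assert np.allclose(masked.sum(axis=0), updates.sum(axis=0))
```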

What trust model is needed for federated learning to be private?

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients, parameters, or other model updates, with a central party (e.g., a company) coordinating the training. Because data never “leaves” personal devices, FL is presented as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users contributing to the protocol. In this paper, we argue that prior work still largely underestimates the vulnerability of FL. This is because prior efforts exclusively consider passive attackers that are honest-but-curious. Instead, we introduce an active and dishonest attacker acting as the central party, who is able to modify the shared model’s weights before users compute model gradients. We call the modified weights trap weights. Our active attacker is able to recover user data perfectly. Recovery comes with near zero costs: the attack requires no complex optimization objectives. Instead, our attacker exploits inherent data leakage from model gradients and simply amplifies this effect by maliciously altering the weights of the shared model through the trap weights. Find the video here and the original paper here.

What trust model is needed for federated learning to be private?

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never “leaves” personal devices, FL was promoted as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users. In this talk, I will explore the trust model required to implement practical privacy guarantees in FL by studying the protocol under the assumption of an untrusted central party. I will first show that in vanilla FL, when dealing with an untrusted central party, there is currently no way to provide meaningful privacy guarantees. I will depict how gradients of the shared model directly leak some individual training data points, and how this leakage can be amplified through small, targeted manipulations of the model weights. Thereby, the central party can directly and perfectly extract sensitive user data at near-zero computational costs. Then, I will move on and discuss defenses that implement privacy protection in FL. Here, I will show that an actively malicious central party can still have the upper hand on privacy leakage by introducing a novel practical attack against FL protected by secure aggregation and differential privacy – currently considered the most private instantiation of the protocol. I will conclude my talk with an outlook on what it will take to achieve privacy guarantees in practice.

What trust model is needed for federated learning to be private?

Published:

Abstract: In federated learning (FL), data does not leave personal devices when they are jointly training a machine learning model. Instead, these devices share gradients with a central party (e.g., a company). Because data never “leaves” personal devices, FL was promoted as privacy-preserving. Yet, recently it was shown that this protection is but a thin facade, as even a passive attacker observing gradients can reconstruct data of individual users. In this talk, I will explore the trust model required to implement practical privacy guarantees in FL by studying the protocol under the assumption of an untrusted central party. I will first show that in vanilla FL, when dealing with an untrusted central party, there is currently no way to provide meaningful privacy guarantees. I will depict how gradients of the shared model directly leak some individual training data points, and how this leakage can be amplified through small, targeted manipulations of the model weights. Thereby, the central party can directly and perfectly extract sensitive user data at near-zero computational costs. Then, I will move on and discuss defenses that implement privacy protection in FL. Here, I will show that an actively malicious central party can still have the upper hand on privacy leakage by introducing a novel practical attack against FL protected by secure aggregation and differential privacy – currently considered the most private instantiation of the protocol. I will conclude my talk with an outlook on what it will take to achieve privacy guarantees in practice.

teaching

Security Protocols and Infrastructures

Lecture, Freie Universität Berlin, Department of Computer Science, 2019

Worked as a teaching assistant for the Master-level course Security Protocols and Infrastructures. The course covered security protocols (e.g., TLS, PACE, EAC), ASN.1, certificates and related standards such as X.509/RFC5280, and public key infrastructures (PKI).

Machine Learning and IT Security

Seminar, Freie Universität Berlin, Department of Computer Science, 2020

Held a Master-level seminar on Machine Learning and IT Security. The seminar covered securing digital infrastructure with the help of ML, as well as protecting ML models against security and privacy violations.

Hello (brand new data) world

Seminar, Universität Bayreuth, Department of Philosophy, 2020

Held an invited Bachelor-level seminar on the ethical implications of ML for society. The seminar consisted of a technical/computer-science part and a philosophical part. The technical part presented the theoretical background and implementation details of ML algorithms. The philosophical part treated subjects such as the Turing test, the Chinese room argument, and discussions about dataism, surveillance, autonomous driving, and autonomous weapon systems.

Privacy-Preserving Machine Learning

Software Project, Freie Universität Berlin, Department of Computer Science, 2021

I organized and held the software project “Privacy-Preserving Machine Learning” with final-year Bachelor and Master students from Freie Universität Berlin. The goal of the project was to build a software library that allows non-privacy-expert machine learning (ML) practitioners to evaluate the privacy of their neural networks. Additionally, the tool should help non-ML experts who are in charge of system security to get an impression of a model’s privacy. To evaluate privacy, several attacks against ML models were implemented. The outcome of the software project can be found in our GitHub Repository. All project management was done with Scrum, where I acted as Product Owner and the students formed the Developer Team.

Trustworthy Machine Learning

Seminar, Freie Universität Berlin, Department of Computer Science, 2022

Held a Master-level seminar on Trustworthy Machine Learning. The seminar covered the following topics:

  • integrity attacks against ML models at training time
  • integrity attacks and defenses against ML models at test time
  • ML model confidentiality
  • privacy attacks against ML models
  • differential privacy
  • fairness and ethics
  • trustworthiness in federated learning