Computing research currently relies on nonconsensually-collected nude images
We are working toward building community guidelines and a consensually-sourced, participatorily-governed data trust of nude and sexual images.
Nonconsensual Collection
In our review of 150 computer science publications, we found that over 8 million nude images were collected and used without consent. Of the almost two-thirds of papers that created their own datasets, many failed to mention a data source or stated only that they collected data from “the Internet”. In a few cases, researchers collected and used abusive content. Notably, one paper collected 1,637 upskirt images from “the Internet” and another paper extracted content that was captured by “hidden or self cameras”. In another instance, a paper used a dataset that contained content depicting sexual violence from forums that are now banned. Lastly, we found a paper that alluded to collecting images depicting minors.
Nonconsensual Distribution
Although there is a norm in the AI/ML community toward sharing data to promote open science, such practices constitute image-based sexual abuse (IBSA) in this context. IBSA is a broad category of harm that covers the nonconsensual creation (e.g., "upskirting", "deepfakes") or distribution of intimate content, as well as threats to cause these harms. IBSA is illegal in many countries and can lead to consequences similar to those of other forms of sexual violence. More than half of the papers we analyzed published nude example images, often with identifiable features. A few included public links to the datasets they had created. A key challenge victim-survivors of IBSA face is stopping the further dissemination of nonconsensually created and distributed content. Further distribution, whether as published examples, for annotation purposes, or in publicly accessible datasets (even for research), compounds this challenge.
Data Handling Practices
We also found that discussions of data security and ethics were rare. We did not find any papers that discussed (the lack of) consent from image subjects. Furthermore, we observed virtually no difference between how researchers handled nude images and how they handled other image and vision datasets. Despite dealing with highly sensitive data, most papers made no effort to describe protections against accidental data leakage or plans to delete the nude images they used.
We conducted an in-depth systematic review of 150 computer science publications that used nude images. We found that ethical norms around the use of nude images in research are sparse, leading to a litany of problematic practices, such as distributing and publishing nude images with uncensored faces and intentionally collecting abusive content.
Pathways Forward
While our research has uncovered many ways in which the research community needs guidance on this issue, we believe there are pathways forward that protect the dignity of all people. As a research community, if we accept that there exist tasks for which this data is necessary, it is imperative that we also forge pathways to the ethical creation and handling of such data.
Community Guidelines
Although datasets containing nude images have been used in computer science research for well over two decades, there has been little guidance on their usage. There is a pressing need for community guidelines for researchers, reviewers, and publishers. We offer the following page as a starting point but call on the broader research community to offer additions, suggestions, and critiques. We are also looking to source perspectives and expertise from researchers and industry professionals on the creation and handling of nude datasets, to inform practical guidance. If you are interested in participating in a confidential paid interview study, contact us below.
A New Model of Data Governance
Researchers should work to build systems that detect, take down, and prevent the spread of abusive content, while recognizing that not all nude content is abusive. We call for a new model of data governance in this domain that centers ongoing, informed consent. Specifically, we propose a participatorily-governed data trust of nude and sexual images that are consensually collected and used for research purposes.
We are looking for funding to support this effort. Please contact us if you are interested in learning more.