The woman who shot the image, a librarian in rural Vermont named Jessamyn West, was surprised and angry when she found out the photo was used by IBM. She had uploaded it to Flickr under a Creative Commons license that allows some use of the photo by others. But something about not knowing that this image, along with other pictures she took – a self-portrait and about a dozen other shots – was included in a facial recognition dataset bothered her.
"I think if anybody had asked at all, I would have had a different feeling," she said.
As a result, some researchers say they're now rethinking the Wild West atmosphere that pervades the face-gathering status quo. It also raises questions about what counts as acceptable use of an image of yourself (or someone else that you photographed) in a world where we constantly share our lives online.
Where the faces come from
Facial recognition technology has improved in recent years due to the popularity of a powerful form of machine learning called deep learning. In a typical system, faces are scanned (from still images, videos, or a live stream), and their features are analyzed and then compared with labeled faces in a database.
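The comparison step described above can be sketched in a few lines. This is a minimal illustration, not how any particular vendor's system works: it assumes each face has already been reduced to a short list of numeric features, and the names `cosine_similarity`, `identify`, the sample database, and the 0.8 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, labeled_faces, threshold=0.8):
    """Compare a probe vector with every labeled face in the database.

    Returns (label, score) for the closest match, or (None, score)
    when even the best score falls below the assumed threshold.
    """
    best_label, best_score = None, -1.0
    for label, features in labeled_faces.items():
        score = cosine_similarity(probe, features)
        if score > best_score:
            best_label, best_score = label, score
    if best_score >= threshold:
        return best_label, best_score
    return None, best_score


# Hypothetical database: label -> features extracted from a known face.
database = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.1, 0.8, 0.5],
}

match, score = identify([0.85, 0.15, 0.35], database)
print(match, round(score, 3))
```

In a real system the feature vectors would come from a deep-learning model trained on many labeled faces, which is why the makeup of the training data matters so much.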
Yet there are still accuracy issues. Researchers are increasingly concerned about bias in AI systems, which is evident in, say, how well the technology can do things like recognize people of color and women. One reason for this issue is that the datasets used to train the software may be disproportionately male and white; IBM believes Diversity in Faces is more balanced than previous datasets.
David A. Shamma, who helped put together the Flickr dataset when he was director of research at Yahoo Labs, said that for years academics working on computer vision or object recognition were just trying to scrape data wherever they could get it.
"It was just an academic process where people would often say, 'No harm, no foul,'" he said.
By releasing the big Flickr dataset, Shamma, now a senior research scientist at FX Palo Alto Laboratory, felt he and his colleagues had an opportunity to hand a big, licensed pile of images to researchers so they could build upon it.
Those images had been uploaded to Flickr both by regular people like West and by pros, all with Creative Commons licenses. These are special kinds of copyright licenses that clearly state the terms under which such images and videos can be used and shared by others, though the people who post them may not be aware of the specific ways they are used.
Creative Commons licenses were first released in 2002, and Flickr in particular has been around since 2004 – way before the current AI boom.
While researchers freely use images on sites like Flickr, they also acknowledge that many people post these images without realizing how they might be used.
"I think people expect it, but when you confront them with what it is being used for, they won't expect it," Shamma said.
It took West by surprise. After reading a story by NBC News in March about IBM's dataset, which included a tool NBC made to look up whether or not a Flickr user's photos were included in it, she typed in her Flickr username: iamthebestartist. West was upset when she realized the photo she took of her junior-high friend's family and numerous other photos were part of it. She thinks AI will be helpful in the future, but she's concerned about her photos being used to train it without her knowledge.
IBM told CNN Business that it is "committed to the privacy rights of individuals" and that anyone who is included in the dataset can opt out at any time. It's not offering a tool of its own to find out if specific images are included in the dataset, though, so people have to look it up through the one built by NBC.
Meanwhile, researchers at graphics chip maker Nvidia are looking at IBM's experience and thinking about how to change their own practices.
Nvidia's tool came after NBC published its story, but David Luebke, Nvidia's vice president of graphics research, said it had already been in the works for some time.
"We were thinking about what this would look like as people become more aware of this," he said. "If some people have objections, we want to make sure we're respectful of that."
The company also included a list of steps users can take if they want their photo removed from the dataset, and if they would like to avoid having it used for future computer-vision research. These suggestions include making the photo private, changing the license attached to it, or even adding a tag to the photo – a phrase associated with the image that's searchable on Flickr – that says "no_cv" to show they don't want it to be used for computer-vision research.
"I think a lot of people either don't care or would be [pleased] because their photos go to something like StyleGAN," Luebke said. "But if you don't, [you should] have a way to opt out."
Some researchers believe a good way to give people more control over how their images are used would be a license that lets them determine clearly whether individual images they post can be used for computer vision or AI.
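The opt-out signals Nvidia suggests (a private photo, a changed license, a "no_cv" tag) amount to a per-photo permission check that anyone assembling a dataset could run before including an image. The sketch below shows one way that check might look. It is an illustration only: the `Photo` record, its field names, and the set of accepted licenses are hypothetical and do not reflect Flickr's or Nvidia's actual code.

```python
from dataclasses import dataclass, field

# The tag named in the article; matching is case-insensitive here by assumption.
OPT_OUT_TAG = "no_cv"

# Hypothetical set of licenses a dataset builder has decided to accept.
ALLOWED_LICENSES = {"CC BY", "CC BY-SA"}


@dataclass
class Photo:
    """Hypothetical record for one uploaded photo."""
    photo_id: str
    is_public: bool
    license_name: str
    tags: set = field(default_factory=set)


def usable_for_cv(photo, allowed_licenses=ALLOWED_LICENSES):
    """Return True only if none of the three opt-out signals is present."""
    if not photo.is_public:
        return False  # the owner made the photo private
    if photo.license_name not in allowed_licenses:
        return False  # the owner changed the license
    if OPT_OUT_TAG in {tag.lower() for tag in photo.tags}:
        return False  # the owner added the opt-out tag
    return True


photos = [
    Photo("1", True, "CC BY", {"portrait"}),
    Photo("2", True, "CC BY", {"portrait", "no_cv"}),
    Photo("3", False, "CC BY"),
    Photo("4", True, "All Rights Reserved"),
]

kept = [p.photo_id for p in photos if usable_for_cv(p)]
print(kept)
```

A check like this only helps people who know the signals exist, which is part of why some researchers would prefer that the permission be written into the license itself.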
Such a license is unlikely to come from Creative Commons, though. The nonprofit's licenses do not limit the use of images for the development of any kind of AI, as long as the terms attached to a work are followed.
"The licenses are not designed as a tool to protect privacy or protect research ethics," said Ryan Merkley, Creative Commons CEO.
Artificial intelligence has been rolled out so rapidly in recent years that regulations have barely begun to be formulated, let alone implemented. And when it comes to gathering and using images for facial recognition, companies and researchers aren't legally obliged to tell people much of anything.
There are no federal rules related to how the technology can be built or used. A bit more has happened at the state level: Illinois, for example, has a law that requires companies to get consent from customers before collecting biometric information. And the state senate in Amazon and Microsoft's home state of Washington recently passed a bill that limits the use of facial recognition. That bill still has to pass the state's house of representatives.
Merkley and others think regulations about how data is gathered for training and testing this kind of technology should be considered. This could happen in the not-so-distant future: in March, a Senate bill was introduced that would force companies to get consent from consumers before collecting and sharing identifying data. It would also require companies to conduct outside testing to ensure algorithms are fair before they are implemented, and let people know when facial recognition technology is in use.
Even in the absence of strict legal boundaries for using images of people to train AI systems, there are ethical boundaries companies and research groups should not cross, said Jeremy Gillula, technology policy director for digital rights group the Electronic Frontier Foundation. In his view, that means getting explicit consent from people whose faces are in those images. Sometimes that will be hard, he said, but that's a reality companies should have to face.
"I definitely think it matters," Gillula said. "I think it matters to people whose images are used for purposes they had not imagined they would be used for."