Microsoft's Fight Against Child Porn
Last week I was invited to attend Microsoft’s Citizenship Accelerator Summit. This was an opportunity for Microsoft’s management, including Steve Ballmer (Microsoft’s CEO) and other senior executives, to share what socially beneficial activities Microsoft is up to. Some of these were predictable (but laudable), such as supporting volunteerism or the United Way; others were activities I was already aware of and have admired in the past. This latter category included such great projects as TechSoup Global, which distributes donated Microsoft products at a deep, deep discount (a service we’ve used at Benetech for years), as well as support for the efforts of NetHope in disaster relief. NetHope is the organization of the CTOs/CIOs of the major global humanitarian NGOs, and gets a fair amount of support from Microsoft, Cisco and other tech companies.
The most interesting project I saw was on the admittedly dark and unpleasant topic of combating child porn on the Internet. As the Microsoft staff pointed out several times, child porn was not widely distributed before the Internet (they said it was almost completely gone). But the advent of the Internet has made distribution of these images much easier, and in many, many cases, the images being distributed are ones that have already been discovered in criminal investigations or taken down from websites on multiple occasions. In many of these cases, though, the images are slightly different: they are the same image, but not the exact same image file.
Microsoft worked with researcher Hany Farid of Dartmouth College to adapt their technology to identifying these images. This involves a kind of pattern recognition technology that Microsoft calls PhotoDNA, which recognizes that a new file is highly similar to a known image of child porn. Since my background is in optical pattern recognition (especially character recognition), I was fascinated to see a new and different pattern recognition solution being applied to a new social problem.
Microsoft doesn’t actually host a database of these images: they make their tech available for free to the National Center for Missing & Exploited Children (NCMEC), a nonprofit group that has the mission of combating child abduction and exploitation. The software computes a single value, known in programming lingo as a hash, that is unique (or nearly unique) to each image. The technical advance here is to have a hash value that stays the same, even when the image goes through minor modifications and conversions.
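To make the idea of a hash that survives minor modifications concrete, here is a minimal sketch of a much simpler technique, usually called an "average hash". This is not PhotoDNA, whose details I don't know; the function names and the toy image are made up for illustration. The image is assumed to be a grid of grayscale values (0 to 255) whose sides are multiples of the grid size.

```python
def average_hash(pixels, size=8):
    """Return a size*size-bit integer hash of a grayscale image.

    `pixels` is a list of rows of ints (0-255). Its height and width are
    assumed to be multiples of `size` (an assumption of this sketch).
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size

    # Shrink the image to a size x size grid by averaging each block.
    cells = []
    for by in range(size):
        for bx in range(size):
            total = 0
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    total += pixels[y][x]
            cells.append(total / (block_h * block_w))

    # One bit per cell: is this cell brighter than the overall average?
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


def make_gradient(width=16, height=16):
    """Build a toy test image: a smooth diagonal gradient."""
    return [[x * 8 + y * 4 for x in range(width)] for y in range(height)]


original = make_gradient()
# A "minor modification": every pixel made slightly brighter.
brighter = [[value + 20 for value in row] for row in original]
# A genuinely different picture: the gradient inverted.
inverted = [[200 - value for value in row] for row in original]

same_after_edit = average_hash(original) == average_hash(brighter)
differs_for_other_image = average_hash(original) != average_hash(inverted)
```

Because each bit records only whether a region is brighter than the image's own average, a uniform brightness change leaves the hash untouched, whereas an ordinary checksum of the file bytes would change completely.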
When Microsoft’s Bing search engine indexes a new image file on the Web, they run the same software against that new image, calculate the hash value and check it against the database of known “bad” hash values. If there’s a match, it’s highly likely that the new image is just a version of the same old known image of cruelty to children. And then, Bing won’t serve up that image if it’s requested in a search result. In addition, Microsoft will let the National Center (or its counterpart organization in another country) know about the URL, so that they can pursue take-down or other compliance measures.
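The check described above can be sketched roughly as follows. This is only an illustration under my own assumptions, not how Bing actually does it: the function names, the in-memory set of known hashes, the `reports` list and the optional `max_distance` tolerance (which allows hashes differing in a few bits to count as a match) are all hypothetical.

```python
def hamming_distance(a, b):
    """Count the bits in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def is_known_image(candidate_hash, known_hashes, max_distance=0):
    """Return True if the hash matches (or nearly matches) a known hash."""
    if candidate_hash in known_hashes:
        return True  # exact match: a cheap set lookup
    return any(
        hamming_distance(candidate_hash, known) <= max_distance
        for known in known_hashes
    )


def index_image(url, image_hash, known_hashes, search_index, reports,
                max_distance=0):
    """Add an image to the search index unless its hash is on the known list.

    Matching URLs are appended to `reports` (standing in for notifying the
    National Center) and are never added to `search_index`.
    Returns True if the image was indexed, False if it was blocked.
    """
    if is_known_image(image_hash, known_hashes, max_distance):
        reports.append(url)
        return False
    search_index[url] = image_hash
    return True


# Hypothetical data, for illustration only.
known_hashes = {0b1111000011110000}
search_index = {}
reports = []

blocked = not index_image("http://example.com/a", 0b1111000011110000,
                          known_hashes, search_index, reports)
indexed = index_image("http://example.com/b", 0b0000111100001111,
                      known_hashes, search_index, reports)
```

The exact-match case is a single set lookup, which is why this kind of check is cheap enough to run on every image a crawler sees.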
The business angle is there, but not as direct as it is in some of the other Microsoft initiatives: Microsoft sees a business value in not serving up child porn (even if a small number of their search users may be seeking it). But I really liked the example of good software development techniques (hashes are really computationally efficient) being used to advance the social goal of reducing the availability of this repugnant material.