A UC Berkeley PhD candidate has developed the first automated techniques to identify adult ads tied to human trafficking rings by linking the ads to public information from Bitcoin -- the primary payment method for online sex ads.
The Internet has enabled and emboldened human traffickers to advertise sexual services, but law enforcement efforts to trace and disband human trafficking rings are hindered by the pseudonymous nature of adult ads, the tendency of ring leaders to employ multiple phone numbers and email addresses to avoid detection, and the difficulty in determining which online ads reflect willing participants in the sex trade and which ones reflect victims forced into prostitution. The study is a first step toward developing a suite of freely available tools to help police and non-profit institutions overcome these challenges and identify victims of sexual exploitation.
"The technology we've built finds connections between ads," said Rebecca Portnoff, a UC Berkeley PhD candidate in computer science, who developed the algorithm as part of her dissertation. "Is the pimp behind that post for Backpage also behind this post in Craigslist? Is he the same man who keeps receiving Bitcoin for trafficked girls? Questions like these are answerable only through more sophisticated technological tools -- exactly what we've built in this work -- that link ads together using payment mechanisms and the language in the ads themselves."
Portnoff will present the findings in August at the Association for Computing Machinery's SIGKDD Conference on Knowledge Discovery and Data Mining, one of the world's leading data mining conferences, which will publish the paper in its proceedings. The work was funded by the Amazon Web Services Cloud Credits for Research program, Giant Oak, Google, the National Science Foundation, and the U.S. Department of Education. Computer scientists from UC San Diego and the New York University Tandon School of Engineering were also involved in the study.
The research team's approach relies on two novel algorithms. The first is a machine learning algorithm rooted in stylometry, which is the analysis of an individual's writing style to identify authorship. Stylometry can provide confirmation of authorship with high confidence, and in the case of online trafficking ads, allows researchers and police to identify cases in which separate advertisements for different sex workers share a single author: a telltale sign of a trafficking ring, but hard to identify without sophisticated computer analysis.
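To make the stylometric idea concrete, here is a toy sketch in Python. It is not the researchers' model: it assumes that writing style can be crudely approximated by counts of three-character sequences, and it compares two texts with cosine similarity. The function names `char_ngrams` and `cosine` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a crude proxy for writing style."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 to 1.0)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Illustrative, made-up ad texts (assumption: not taken from the study's data).
ad_1 = "new in town!! call me now xoxo"
ad_2 = "new in town!! text me now xoxo"
ad_3 = "Professional massage services available by appointment."

same_style = cosine(char_ngrams(ad_1), char_ngrams(ad_2))
different_style = cosine(char_ngrams(ad_1), char_ngrams(ad_3))
```

In this sketch, ads that share punctuation habits and phrasing score closer to 1.0 than ads written in a different voice. A real system would learn from many more features and would need labeled examples to decide where to draw the line between "same author" and "different author."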
"Imagine looking through page after page of explicit advertisements, some for underage victims. You're looking through all this material to find the set that are advertising trafficked and underage victims. Even given a team of humans dedicated to the task, there's simply too much data -- often quite traumatizing - to go through," Portnoff said
By automating stylometric analysis, the researchers discovered they could quickly identify groups of ads with a common author on Backpage, one of the most popular sites for online sex ads. (Since this research was conducted, the adult advertising section of Backpage was discontinued; however, the researchers noted that adult ads remain prevalent, now appearing in multiple sections of the site.)
After identifying groups of ads with a single author, the researchers then tested an automated system that utilizes publicly available information from the Bitcoin mempool and blockchain -- the ledgers that record pending and completed transactions. Because Backpage posts ads as soon as payment is received, the researchers compared the timestamp indicating submission of payment to the timestamp of the ads' appearance on Backpage. All Bitcoin users maintain accounts, called wallets, and tracing payment of ads that have the same author to a unique wallet is a potential method for identifying ownership of the ads, and thus the individuals or groups involved in human trafficking.
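The timestamp comparison described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the study's implementation: it assumes each ad has a known appearance time, that transactions are available as (time seen, wallet) pairs sorted by time, and that a payment precedes its ad by no more than a fixed `window` of seconds. The function names, the data layout, and the window value are all hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def candidate_payments(ad_time, tx_times, window=60):
    """Indices of transactions seen at most `window` seconds before the ad appeared.

    `tx_times` must be sorted in ascending order.
    """
    lo = bisect_left(tx_times, ad_time - window)
    hi = bisect_right(tx_times, ad_time)
    return list(range(lo, hi))


def wallets_shared_by_ads(ads, transactions, window=60):
    """Map each wallet to the ads it may have paid for, keeping wallets tied to 2+ ads.

    ads: list of (ad_id, appeared_at) for one group of same-author ads.
    transactions: list of (seen_at, wallet), sorted by seen_at.
    """
    tx_times = [seen_at for seen_at, _ in transactions]
    linked = defaultdict(set)
    for ad_id, appeared_at in ads:
        for i in candidate_payments(appeared_at, tx_times, window):
            linked[transactions[i][1]].add(ad_id)
    return {wallet: ids for wallet, ids in linked.items() if len(ids) > 1}


# Made-up example data (assumption: times are seconds on a shared clock).
transactions = [(100, "w1"), (150, "w2"), (205, "w1"), (400, "w3")]
ads = [("a", 110), ("b", 210)]
shared = wallets_shared_by_ads(ads, transactions, window=30)
```

Here the wallet `w1` is the only one with a payment shortly before both ads, so it is the only candidate returned. On a busy network many unrelated transactions fall inside any given window, which is why a single timing match is weak evidence and repeated matches across a group of same-author ads are more informative.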
"There are hundreds of thousands of these ads placed every year, and any technique that can surface commonalities between ads and potentially shed light on the owners is a big boost for those working to curb exploitation," said Damon McCoy, an NYU Tandon assistant professor of computer science and engineering and one of the paper's co-authors.
Combining automated stylometric and timestamp analysis to identify sex ads by both author and Bitcoin owner represents a considerable advancement in assisting law enforcement and nonprofit organizations that try to identify victims of human trafficking, McCoy said.
The researchers deployed their automated author identification techniques on a sampling of 10,000 real adult ads on Backpage, a four-week scrape of all adult ads that appeared on Backpage during that time, as well as on several dozen ads they themselves placed as a point of comparison. They reported an 89 percent true-positive rate for grouping ads by author -- significantly more accurate than current stylometric machine learning algorithms.
The team also reported a high rate of success in linking the ads they placed themselves to the corresponding transactions in the Bitcoin blockchain.
They acknowledge, however, that they were unable to verify whether matches they made using real-life ads and Bitcoin transaction information truly correspond to individuals tied to human trafficking -- that verification must ultimately be pursued by police.
"Sex trafficking of children hides in plain sight within the vast online escort environment. It's difficult for investigators to sift through the mounds of data and figure out what is important and what is not when looking for a child," said Julie Cordua, CEO of Thorn. "This type of research is critical to advancing this work and helping investigators find children faster and reduce the time in trauma. We're grateful to academics and researchers who are willing to lend their time and talent to this issue to help find new solutions that move this work forward."