ActiveClean vs. Other Data Cleaning Methods (IMAGE)
Caption
Tested on a dirty, real-world data set, ActiveClean (in red) needed to clean just 5,000 records to bring the researchers' prediction model to a 90 percent accuracy level. The next best technique, called active learning (in green), had to clean 50,000 records to achieve comparable results. The most common data-cleaning method, trial and error (in purple), provided minimal model improvement.
Credit
Eugene Wu
Usage Restrictions
None
License
Licensed content