Public Release: 

Researchers solve data science tasks in big data era using decision trees learning

World Scientific


IMAGE: This graph depicts data mining and decision trees.

Credit: World Scientific, 2014

Data mining is the art of exploring large and complex bodies of data in order to discover useful patterns. Theoreticians and practitioners are continually seeking improved techniques to make the process more efficient, cost-effective and accurate.

The terms "data science" and "data mining" were coined several years ago. "However, it really took shape only recently, once technology had become sufficiently mature. Decision tree learning has established itself as a leading method in data science for obtaining an accurate yet comprehensible model," says Prof. Rokach, who recently published a book in the area with World Scientific.

Various domains, such as commerce, medicine and research, are applying data-driven discovery and prediction to gain new insights. Google is an excellent example of a company that applies data science on a regular basis. It is well known that Google tracks user clicks in an attempt to improve the relevance of its search engine results and the management of its ad campaigns.

One of the ultimate goals of data science is the ability to make predictions about certain phenomena. Obviously, prediction is not an easy task. As the famous quote says: "It is difficult to make predictions, especially about the future" (attributed to Mark Twain and others). Still, we use prediction successfully all the time.

For example, the popular YouTube website (also owned by Google) analyzes our viewing habits to predict which other videos we might like. Based on this prediction, the YouTube service can present us with personalized recommendations, which are often very effective. To roughly estimate the service's effectiveness, simply ask yourself how often watching one video on YouTube has led you to watch several similar videos that the system recommended to you. Similarly, online social networks (OSNs), such as Facebook and LinkedIn, automatically suggest friends and acquaintances we might want to connect with.

The book, Data Mining with Decision Trees: Theory and Applications, was written by Lior Rokach (Ben-Gurion University of the Negev, Israel) and Oded Maimon (Tel-Aviv University, Israel). Beyond classification, it describes how decision trees can be used for other data mining tasks, such as regression, clustering and survival analysis. It also includes a walk-through guide for implementing decision trees using open-source software.
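In a regression tree, the leaves hold numeric predictions rather than class labels. As a minimal sketch (not the book's implementation), the snippet below finds the single best split on one numeric feature by minimizing the summed squared error of the two resulting groups, the criterion used by CART-style regression trees; a full tree would repeat this search recursively on each side. The function names `sse` and `best_split` and the data are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: choose one regression-tree split by minimizing the
# total squared error (sum of squared deviations from each side's mean).

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, left_mean, right_mean, total_sse) for the rule x <= threshold."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[3]:
            best = (threshold, sum(left) / len(left), sum(right) / len(right), total)
    return best


if __name__ == "__main__":
    # Made-up data: two clearly separated groups of target values.
    xs = [1, 2, 3, 10, 11, 12]
    ys = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
    threshold, left_mean, right_mean, err = best_split(xs, ys)
    print(f"if x <= {threshold}: predict {left_mean:.2f} else: predict {right_mean:.2f}")
```

On this toy data the chosen rule is "if x <= 6.5: predict 5.50 else: predict 20.17", i.e. a depth-one tree whose leaves are the group means.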

The book invites readers to explore the many benefits that decision trees offer for data mining. Decision trees are self-explanatory and easy to follow when compact; able to handle a variety of input data (nominal, numeric and textual); scalable to big data; able to process datasets that contain errors or missing values; capable of high predictive performance for a relatively small computational effort; available in many open-source data mining packages across a variety of platforms; and useful for various tasks, such as classification, regression, clustering and feature selection.
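To illustrate why decision trees are considered self-explanatory, here is a minimal sketch of an ID3-style learner that chooses splits by information gain over nominal attributes and prints the learned tree as IF-THEN rules. Information gain is one classic splitting criterion, not necessarily the one the book recommends; the sketch omits pruning, numeric attributes and missing-value handling, and the weather-style dataset and all function names are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy data, made up for illustration.
ROWS = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
LABELS = ["play", "play", "play", "play", "play", "skip"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a nominal attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs):
    """Return a leaf label, or an (attribute, {value: subtree}) tuple."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority label
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return (best, branches)


def classify(node, row):
    """Follow branches to a leaf (raises KeyError for unseen values)."""
    while isinstance(node, tuple):
        attr, branches = node
        node = branches[row[attr]]
    return node


def print_rules(node, path=()):
    """Print every root-to-leaf path as an IF-THEN rule."""
    if not isinstance(node, tuple):
        conds = " AND ".join(f"{a} = {v}" for a, v in path) or "always"
        print(f"IF {conds} THEN {node}")
        return
    attr, branches = node
    for value, child in branches.items():
        print_rules(child, path + ((attr, value),))


if __name__ == "__main__":
    tree = build_tree(ROWS, LABELS, ["outlook", "windy"])
    print_rules(tree)
```

Running the script prints four rules: "IF outlook = overcast THEN play", "IF outlook = rainy AND windy = no THEN play", "IF outlook = rainy AND windy = yes THEN skip" and "IF outlook = sunny THEN play". Each rule can be read directly by a person, which is the property the "self-explanatory" claim refers to.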


More information can be found at

Disclaimer: AAAS and EurekAlert! are not responsible for the accuracy of news releases posted to EurekAlert! by contributing institutions or for the use of any information through the EurekAlert system.