What is Data Preparation?

Data preparation (or data preprocessing) in this context means manipulating data into a form suitable for further analysis and processing. The process involves many different tasks and cannot be fully automated. Many data preparation activities are routine, tedious, and time-consuming. It has been estimated that data preparation accounts for 60% to 80% of the time spent on a data mining project.


Data preparation is essential for successful data mining. Poor-quality data typically produce incorrect and unreliable data mining results.

Data preparation improves the quality of the data and consequently helps improve the quality of data mining results. The well-known saying "garbage in, garbage out" is very relevant to this domain.
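
As a small illustration of "garbage in, garbage out", the Python sketch below shows how a single unhandled placeholder value can distort even a simple summary statistic. The data, the -999 sentinel for a missing value, and the helper name drop_sentinels are all hypothetical and chosen only for this example; they do not describe any particular tool or dataset.

```python
from statistics import mean

# Hypothetical raw values; -999 is an assumed sentinel meaning "missing".
raw_ages = [34, 29, -999, 41, 38]


def drop_sentinels(values, sentinel=-999):
    """Return the values with the assumed missing-value sentinel removed."""
    return [v for v in values if v != sentinel]


clean_ages = drop_sentinels(raw_ages)

# Without preparation, the sentinel is treated as a real measurement.
naive_mean = mean(raw_ages)

# With one preparation step, the summary reflects only the real values.
clean_mean = mean(clean_ages)

print("mean of raw data:     ", naive_mean)
print("mean of prepared data:", clean_mean)
```

In this toy case the unprepared mean is negative, which is impossible for an age, while the prepared mean falls within the range of the real values. Real projects need many such steps, and deciding what counts as a placeholder usually requires human judgment.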

Our aim is to develop tools to facilitate the tasks of data preparation.

