Features 

  • General

    1. Data access from text files, relational databases, and Excel workbooks
    2. Handling of large volumes of data: data sets are not held in computer memory, except for Excel workbooks and the result sets of databases whose drivers do not support data streaming
    3. Stand-alone tool, independent of any other tools
    4. User-friendly graphical user interface
    5. Operator chaining to create sequences of preprocessing transformations (operator tree)
    6. Creation of a model tree for test/execution data

  • Data cleaning

    1. Character removal
    2. Text replacement
    3. Date conversion

  • Attribute operators on columns in the data set

    1. Delete/Move attributes

      • Remove selected attributes
      • Move selected attributes 

    2. Discretize numeric attributes

      • Equal width
      • Equal frequency
      • Equal frequency from grouped data

    3. Handle missing values

      • Delete records containing missing values
      • Remove attributes containing missing values
      • Impute missing values
      • Predict missing values from model (dependence tree, Naive Bayes model)
      • Include missing value patterns

    4. Handle outliers

      • Z-score method
      • Box-plot method

    5. Numerate nominal attributes

      • Create binary attributes
      • Replace nominal values by indices

    6. Reduce number of labels

      • Keep a specified number of the most frequent labels and merge the remaining labels into a single new label

    7. Scale numeric attributes

      • Decimal
      • Linear
      • Hyperbolic tangent
      • Soft-max
      • Z-score
      • Other transformations (log(x), 1/x, x^2, x^3)

    8. Select attributes

      • Manual selection
      • Mutual information selection
      • Robust mutual information selection
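
The discretization and outlier options listed above can be illustrated with a short Python sketch. This is not DataPreparator's implementation: the function names, the default thresholds (3.0 for the z-score, 1.5 times the interquartile range for the box plot), and the handling of ties are all assumptions chosen for illustration.

```python
import statistics


def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: one bin
    # The maximum value would fall into bin k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign values to k bins holding roughly the same number of records.

    Assumption: ties are split by rank, so equal values may land in
    neighbouring bins.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def zscore_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds threshold std devs."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [False] * len(values)
    return [abs(v - mean) / sd > threshold for v in values]


def boxplot_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(equal_width_bins(data, 3))
    print(equal_frequency_bins(data, 2))
    print(zscore_outliers(data, threshold=2.0))
    print(boxplot_outliers(data))
```

On skewed data the two outlier rules can disagree, because a single extreme value inflates the standard deviation used by the z-score rule while leaving the quartiles nearly unchanged.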

  • Record operators on rows in the data set

    1. Sampling (random, every k-th item, first-k)
    2. Select records by key
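
The three sampling modes named above (random, every k-th item, first-k) can be sketched in Python as follows. The function names and the `seed` and `start` parameters are hypothetical and only serve the illustration; how the tool itself draws samples is not documented here.

```python
import random


def sample_random(records, n, seed=None):
    """Draw n records at random without replacement, keeping file order."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in chosen]


def sample_every_kth(records, k, start=0):
    """Keep every k-th record, beginning at index start."""
    return records[start::k]


def sample_first_k(records, k):
    """Keep the first k records."""
    return records[:k]


if __name__ == "__main__":
    rows = list(range(10))
    print(sample_random(rows, 4, seed=1))
    print(sample_every_kth(rows, 3))
    print(sample_first_k(rows, 3))
```

Every k-th and first-k sampling need only a single sequential pass, which suits data that is streamed rather than held in memory; the random variant above assumes the record count is known in advance.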

  • File utilities that create new files

    1. Create data sets
    2. Create missing values
    3. Append
    4. Balance
    5. Change names
    6. Merge
    7. Sort
    8. Smooth

  • Output

    1. Statistics
    2. Table
    3. File
    4. Database
    5. Visualize

       Visualize numeric attributes:

      • Bar chart, cumulative frequency chart
      • Box plot (single, conditional)
      • Histogram (single, conditional, normalized, overlaid, histogram matrix)
      • Lag plot
      • Linear regression plot
      • Normal-quantile plot
      • Quantile plot
      • Quantile-quantile plot
      • Run sequence plot
      • Scatter plot

       Visualize nominal (categorical) attributes:

      • Bar chart, pie chart
      • Pareto chart
      • Stacked chart

       Visualize numeric and nominal attributes:

      • Dependency tree
      • Parallel coordinates

  • Tools

    1. Create data sets from raw data
    2. Create samples from raw data
    3. Shuffle raw data
    4. Configure database drivers


Copyright © DataPreparator Software, 2010. All rights reserved.