Class Comparison Methods in Data Mining
In many applications, users are not interested in having a single class or concept described or characterized; they would rather mine a description that compares or distinguishes one class (or concept) from other comparable classes (or concepts). Class discrimination or comparison (hereafter referred to as class comparison) mines descriptions that distinguish a target class from its contrasting classes. Notice that the target and contrasting classes must be comparable in the sense that they share similar dimensions and attributes. For example, the three classes person, address, and item are not comparable.
By contrast, the sales in each of the last three years are comparable classes, and so are computer science students versus physics students. The discussions of class characterization in the previous sections handle multilevel data summarization and characterization in a single class. The techniques developed there can be extended to handle class comparison across several comparable classes.
For example, the attribute generalization process described for class characterization can be modified so that the generalization is performed synchronously among all the classes compared. This allows the attributes in all classes to be generalized to the same levels of abstraction. Suppose that we are given the All Electronics data for sales in 2003 and sales in 2004 and would like to compare these two classes. Consider the dimension location, with abstractions at the city, province or state, and country levels. Each class of data should be generalized to the same location level. That is, the classes are all synchronously generalized to either the city level, the province or state level, or the country level. This is more useful than comparing, say, the sales in Vancouver in 2003 with the sales in the United States in 2004 (i.e., where each set of sales data is generalized to a different level). The users, however, should have the option to override such an automated, synchronous comparison with their own choices when preferred.
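The synchronous generalization of the location dimension described above can be sketched in a few lines of Python. This is only an illustrative sketch: the concept hierarchy, the function names (generalize_location, synchronous_generalize), and the sample sales records are all assumptions made up for the example, not part of the All Electronics data. The level argument stands in for the user's choice of abstraction level.

```python
from collections import Counter

# Hypothetical concept hierarchy for the location dimension:
# city -> (province_or_state, country). Illustrative values only.
LOCATION_HIERARCHY = {
    "Vancouver": ("British Columbia", "Canada"),
    "Toronto": ("Ontario", "Canada"),
    "Seattle": ("Washington", "United States"),
    "Chicago": ("Illinois", "United States"),
}

LEVELS = ("city", "province_or_state", "country")


def generalize_location(city, level):
    """Map a city to its ancestor at the requested abstraction level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if level == "city":
        return city
    province_or_state, country = LOCATION_HIERARCHY[city]
    return province_or_state if level == "province_or_state" else country


def synchronous_generalize(classes, level):
    """Generalize every class to the same location level and count tuples.

    `classes` maps a class name to a list of city names, one per sale.
    Returns {class name: Counter of generalized location values}.
    """
    return {
        name: Counter(generalize_location(city, level) for city in cities)
        for name, cities in classes.items()
    }


# Made-up sample data, for illustration only.
sales = {
    "sales_2003": ["Vancouver", "Vancouver", "Seattle", "Toronto"],
    "sales_2004": ["Chicago", "Seattle", "Toronto", "Toronto", "Vancouver"],
}

# Both classes are generalized to the country level in one pass.
by_country = synchronous_generalize(sales, "country")
for name, counts in by_country.items():
    print(name, dict(counts))
```

Because a single level is applied to every class, the resulting counts are directly comparable. Calling synchronous_generalize with a different level re-generalizes all classes together rather than one at a time.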
Class Comparison Methods and Implementation
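The general procedure for class comparison is as follows:
1. Data collection: The set of relevant data in the database is collected by query processing and is partitioned into a target class and one or a set of contrasting classes.
2. Dimension relevance analysis: If there are many dimensions, dimension relevance analysis should be performed on these classes to select only the highly relevant dimensions for further analysis.
3. Synchronous generalization: Generalization is performed on the target class to a level controlled by a user- or expert-specified dimension threshold. The concepts in the contrasting classes are generalized to the same level as those in the generalized target class.
4. Presentation of the derived comparison: The resulting class comparison description can be visualized in the form of tables, graphs, and rules.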
Now from this, we can formulate that
Presentation of Class Comparison Descriptions
As with class characterizations, class comparisons can be presented to the user in various forms, including generalized relations, crosstabs, bar charts, pie charts, curves, and rules. Except for logic rules, these forms are used in the same way for characterization as for comparison. This section discusses the visualization of class comparisons in the form of discriminant rules.
As with characterization descriptions, the discriminative features of the target and contrasting classes of a comparison can be described quantitatively by a quantitative discriminant rule, which associates a statistical interestingness measure, d-weight, with each generalized tuple in the description.
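To make the d-weight measure concrete, here is a small Python sketch. It assumes the usual definition, which the text above does not spell out: the d-weight of a generalized tuple for the target class is the number of target-class tuples it covers divided by the number of tuples it covers across the target and all contrasting classes. The function name d_weights, the class names, and the counts are hypothetical and chosen only for illustration.

```python
def d_weights(counts_by_class, target):
    """Compute the d-weight of each generalized tuple for the target class.

    `counts_by_class` maps a class name to {generalized tuple: count}.
    Assumed definition: d-weight = count in target class
                                   / total count over all classes.
    """
    result = {}
    for gen_tuple, target_count in counts_by_class[target].items():
        # Sum the tuple's count over the target and contrasting classes.
        total = sum(
            counts.get(gen_tuple, 0) for counts in counts_by_class.values()
        )
        result[gen_tuple] = target_count / total if total else 0.0
    return result


# Made-up counts for one target and one contrasting class.
# Each generalized tuple is (major, age_range, gpa).
counts = {
    "graduate": {
        ("Science", "21..25", "good"): 90,
        ("Science", "26..30", "good"): 60,
    },
    "undergraduate": {
        ("Science", "21..25", "good"): 210,
    },
}

weights = d_weights(counts, target="graduate")
for gen_tuple, weight in weights.items():
    print(gen_tuple, f"d-weight = {weight:.0%}")
```

Under this definition, a d-weight close to 100% means the generalized tuple covers mostly target-class data, while a low d-weight means it mostly covers the contrasting classes. A tuple that appears only in the target class gets a d-weight of 100%.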