source: proiecte/Parallel-DT/R8/Doc/verbrules.1 @ 26

.TH C4.5 1
.SH NAME
A guide to the verbose output of the C4.5 production rule generator

.SH DESCRIPTION
This document explains the output of the program
.I C4.5rules
when it is run
with the verbosity level (option
.BR v )
set to values from 1 to 3.
.I C4.5rules
converts unpruned decision trees into sets of pruned production
rules.  Each set of rules is then sifted to find a subset of the
rules that performs as well as, or better than, the complete set
on the training data (see
.IR c4.5rules(1) ).

.SH RULE PRUNING

.B Verbosity level 1

A decision tree is converted to a set of production rules
by forming a rule corresponding to each path from the
root of the tree to each of its leaves.
After each rule is extracted from the tree, it is examined
to see whether the rule can be generalised by dropping
conditions.

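
By way of illustration only (a simplified sketch, not the actual
C4.5 code), the conversion amounts to a walk over the tree that
collects the condition labelling each branch taken and prints one
rule per leaf; the Node structure and emitRules function below are
invented for the sketch:

.nf
    /* Illustrative sketch: each root-to-leaf path becomes one rule;
       the conditions met on the way down form its left-hand side and
       the leaf's class its conclusion. */
    #include <stdio.h>

    typedef struct Node {
        const char *cond[2];      /* condition labelling each branch    */
        struct Node *branch[2];   /* subtrees; both NULL at a leaf      */
        const char *leafClass;    /* class label if this node is a leaf */
    } Node;

    static void emitRules(const Node *t, const char *conds[], int depth)
    {
        if (t->branch[0] == NULL) {        /* leaf: print one rule */
            for (int i = 0; i < depth; i++)
                printf("%s  ", conds[i]);
            printf("->  class %s", t->leafClass);
            putchar(10);                   /* end the line */
            return;
        }
        for (int b = 0; b < 2; b++) {      /* extend the path, recurse */
            conds[depth] = t->cond[b];
            emitRules(t->branch[b], conds, depth + 1);
        }
    }
.fi
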
For each rule, the verbose output shows the following figures
for the rule as it stands, and for each of the rules that would
be formed by dropping any one of the conditions:

        Miss - no. of items misclassified by the rule
        Hit  - no. of items correctly classified by the rule
        Pess - the pessimistic error rate of the rule
                 (i.e. 100*(misses+1)/(misses+hits+2))
        Gain - the information gain of the rule
        Absent condition - the condition being ignored

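
For example, a candidate rule with Hit 38 and Miss 2 would be shown
with

        Pess = 100*(2+1)/(2+38+2), i.e. about 7.1%
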
If deleting any single condition would give a rule with a
pessimistic error rate lower than the default error rate and an
information gain greater than that of the rule as it stands, then
the condition whose deletion gives the lowest pessimistic error
rate is dropped.  When this happens, the message:

        eliminate test \fId\fR

is given, and the new rule without condition \fId\fR is examined
in the same way, and so on.

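
A minimal C sketch of this selection step, assuming the Miss, Hit
and Gain figures have already been computed for the rule as it
stands and for each candidate deletion (the names conditionToDrop,
pessErr and the parameters below are illustrative, not taken from
the C4.5 source):

.nf
    /* Illustrative sketch only.  hits[d], misses[d] and gain[d] hold
       the figures for the rule with condition d ignored; index ncond
       holds the figures for the rule as it stands.  Returns the
       condition to eliminate, or -1 if no deletion qualifies. */
    static double pessErr(int misses, int hits)
    {
        return 100.0 * (misses + 1) / (misses + hits + 2);
    }

    int conditionToDrop(int ncond, const int hits[], const int misses[],
                        const double gain[], double defaultErr)
    {
        int best = -1;
        double bestPess = defaultErr;  /* must beat the default error rate */

        for (int d = 0; d < ncond; d++) {
            double p = pessErr(misses[d], hits[d]);
            if (p < bestPess && gain[d] > gain[ncond]) {
                bestPess = p;          /* lowest pessimistic error so far */
                best = d;
            }
        }
        return best;
    }
.fi

The caller would repeat this on the reduced rule until no deletion
qualifies, printing the \fIeliminate test\fR message each time.
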
When the rule has been pruned, either the rule is displayed,
or the message:

        duplicates rule \fIn\fR

is given, where \fIn\fR is the number of an identical rule that has
already been produced, and so the new rule is not added; or the
message:

        too inaccurate

is given, indicating that the pessimistic error rate of the pruned
rule is more than 50%, or more than the proportion of the items
that belong to the rule's class, and so the rule is not added.

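
The "too inaccurate" test can be written as the following predicate
(an illustration of the description above, not the actual code;
classPercent, the percentage of training items belonging to the
rule's class, is an assumed parameter):

.nf
    /* Illustrative predicate for the "too inaccurate" message above;
       classPercent is an assumed parameter, not from the source. */
    int tooInaccurate(int misses, int hits, double classPercent)
    {
        double pess = 100.0 * (misses + 1) / (misses + hits + 2);
        return pess > 50.0 || pess > classPercent;
    }
.fi
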
.SH RULE SIFTING

.B Verbosity level 1

The set of pruned rules for each class is then examined.
Starting with no rules in the ruleset, the following process is
repeated until no rules can be added or dropped; a sketch of the
loop is given below.
.IP "    1." 7
If there are rules whose omission would not lead
to an increase in the number of items misclassified,
then the least useful of these is dropped.
.IP "    2."
Otherwise, if there are rules whose addition would lead to a
decrease in the number of items misclassified, then the one
with the fewest counterexamples is added.
.TP 0
This is shown in the output as:

    Action  -  the number of the rule added or dropped
    Change  -  the advantage attributable to the rule
    Worth   -  the included rules for this class as:

        \fIn1\fR[\fIn2\fR|\fIn3\fR=\fIr1\fR]

    with:
.IP "        \fIn1\fR" 11
- the rule number
.IP "        \fIn2\fR"
- the number of items that correctly
fire this rule and are not covered by any other included rule
.IP "        \fIn3\fR"
- the number of items that incorrectly
fire this rule and are not covered by any other included rule
.IP "        \fIr1\fR"
- the advantage attributable to the
rule
.HP 0
After the rules have been sifted, the number of items of
each class that are not covered by any rules is shown,
and the default class is set to the class with the most
uncovered items.

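
A minimal C sketch of this add/drop cycle, for the rules of a
single class (illustration only, not the C4.5 source; the arrays
and the simplified error count are assumptions made for the
sketch):

.nf
    /* Greedy add/drop sketch for one class.  fires[r*nItems + i] != 0
       means rule r's conditions are satisfied by item i; isClass[i]
       != 0 means item i belongs to the class.  Simplification: an
       item counts as misclassified if it is of the class but covered
       by no included rule, or not of the class but covered by some
       included rule. */
    static int errorsWith(const int *included, int nRules, int nItems,
                          const int *fires, const int *isClass)
    {
        int errs = 0;
        for (int i = 0; i < nItems; i++) {
            int covered = 0;
            for (int r = 0; r < nRules; r++)
                if (included[r] && fires[r * nItems + i])
                    covered = 1;
            if (covered != (isClass[i] != 0))
                errs++;
        }
        return errs;
    }

    void siftRules(int *included, int nRules, int nItems,
                   const int *fires, const int *isClass)
    {
        int changed = 1;
        while (changed) {
            changed = 0;
            int base = errorsWith(included, nRules, nItems, fires, isClass);
            int bestRule = -1, bestErrs = base;

            /* 1. drop a rule whose omission does not increase errors */
            for (int r = 0; r < nRules; r++) {
                if (!included[r]) continue;
                included[r] = 0;
                int e = errorsWith(included, nRules, nItems, fires, isClass);
                included[r] = 1;
                if (e <= bestErrs) { bestErrs = e; bestRule = r; }
            }
            if (bestRule >= 0) {
                included[bestRule] = 0;
                changed = 1;
                continue;
            }

            /* 2. otherwise add a rule that decreases errors */
            bestErrs = base;
            for (int r = 0; r < nRules; r++) {
                if (included[r]) continue;
                included[r] = 1;
                int e = errorsWith(included, nRules, nItems, fires, isClass);
                included[r] = 0;
                if (e < bestErrs) { bestErrs = e; bestRule = r; }
            }
            if (bestRule >= 0) {
                included[bestRule] = 1;
                changed = 1;
            }
        }
    }
.fi

For brevity the sketch chooses rules by error count alone; as
described above, the program drops the least useful qualifying
rule, adds the rule with the fewest counterexamples, and reports
each step as an Action, Change and Worth line.
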
.B Verbosity level 2

When sifting rules for a particular class, the Worth of each rule
that is for that class but not currently included in the ruleset
is shown at each stage of the process.

.SH RULE SORTING

.B Verbosity level 1

The remaining rules are then sorted, starting with those for the
class with the fewest false positives.
The verbose output shows the number of false positives for each
class (i.e. the number of items misclassified as being of that
class).
Within a class, rules with the greatest advantage are put first.

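
A qsort-style comparator capturing this ordering might look as
follows (illustration only; the Rule fields and the classFP array
are assumptions, not taken from the C4.5 source):

.nf
    #include <stdlib.h>

    /* Illustrative ordering only: classFP[c] holds the number of
       false positives for class c, and advantage is the figure
       described under RULE SIFTING. */
    typedef struct {
        int classNo;          /* class the rule concludes           */
        double advantage;     /* advantage attributable to the rule */
    } Rule;

    static const int *classFP;    /* set before calling qsort()     */

    static int cmpRules(const void *a, const void *b)
    {
        const Rule *ra = a, *rb = b;

        if (ra->classNo != rb->classNo) {
            /* classes with fewer false positives come first */
            if (classFP[ra->classNo] != classFP[rb->classNo])
                return classFP[ra->classNo] - classFP[rb->classNo];
            return ra->classNo - rb->classNo;  /* keep classes together */
        }
        /* within a class, rules with the greatest advantage come first */
        if (ra->advantage > rb->advantage) return -1;
        if (ra->advantage < rb->advantage) return 1;
        return 0;
    }
.fi

It would be applied as qsort(rules, nRules, sizeof(Rule), cmpRules),
after first filling in classFP with the per-class false positive
counts.
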
.SH RULESET EVALUATION

.B Verbosity level 3

When evaluating a ruleset, the attribute values, the given class,
and the class assigned by the ruleset are shown for each item
that is misclassified.


.SH SEE ALSO

c4.5(1), c4.5rules(1)