.TH C4.5 1
.SH NAME
A guide to the verbose output of the C4.5 production rule generator
.SH DESCRIPTION
This document explains the output of the program
.I C4.5rules
when it is run
with the verbosity level (option
.BR v )
set to values from 1 to 3.
.I C4.5rules
converts unpruned decision trees into sets of pruned production
rules. Each set of rules is then sifted to find a subset of the
rules that performs as well as, or better than, the full set
on the training data (see
.IR c4.5rules (1)).

.SH RULE PRUNING

.B Verbosity level 1

A decision tree is converted to a set of production rules
by forming a rule corresponding to each path from the
root of the tree to each of its leaves.
After each rule is extracted from the tree, it is examined
to see whether it can be generalised by dropping
conditions.

For each rule, the verbose output shows the following figures
for the rule as it stands, and for each of the rules that would
be formed by dropping any one of the conditions:

Miss - no. of items misclassified by the rule
Hit - no. of items correctly classified by the rule
Pess - the pessimistic error rate of the rule
(i.e. 100*(misses+1)/(misses+hits+2))
Gain - the information gain of the rule
Absent condition - the condition being ignored
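The figures above can be tabulated as in the following sketch (Python; the representation of rules as predicate lists, and names such as \fItabulate\fR, are illustrative assumptions, not C4.5's own code):

```python
def pessimistic_error(misses, hits):
    """Pess column: 100*(misses+1)/(misses+hits+2), a Laplace-style estimate."""
    return 100.0 * (misses + 1) / (misses + hits + 2)

def tabulate(rule, items, target_class):
    """Return (Miss, Hit, Pess) for a rule. Here a rule is a list of
    predicates over an item's attributes, and each item is a pair
    (attribute_dict, class_label) -- an invented representation."""
    hits = misses = 0
    for attrs, cls in items:
        if all(cond(attrs) for cond in rule):   # the rule fires on this item
            if cls == target_class:
                hits += 1
            else:
                misses += 1
    return misses, hits, pessimistic_error(misses, hits)

def dropped_variants(rule, items, target_class):
    """One entry per 'Absent condition' line: the figures for the rule
    with each single condition removed."""
    return [tabulate(rule[:i] + rule[i + 1:], items, target_class)
            for i in range(len(rule))]
```

For example, a rule with no misses and eight hits gets Pess = 100*1/10 = 10%.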
---|
If there are any conditions whose deletion produces a rule with
a pessimistic error rate less than the default error rate,
and a gain greater than that of the rule as it stands,
then the one of these with the lowest pessimistic error rate
is dropped. When this happens, the message:

eliminate test \fId\fR

is given and the new rule without condition \fId\fR
is examined, and so on.

When the rule has been pruned, either the rule is displayed,
or the message:

duplicates rule \fIn\fR

is given, where \fIn\fR is an identical rule already produced,
and so the new rule is not added; or the message:

too inaccurate

is given, indicating that the pessimistic error rate of the
pruned rule is more than 50%, or more than the proportion of
the items that are of the rule's class, and so the rule is
not added.
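The accept-or-reject decision described above might be rendered as the following sketch (Python; \fIaccept_rule\fR and its arguments are invented for illustration and are not C4.5's internals):

```python
def accept_rule(pruned_rule, ruleset, pess, class_fraction):
    """Decide whether a pruned rule enters the ruleset.

    pruned_rule    - hashable representation of the rule (illustrative)
    ruleset        - rules accepted so far, in order of production
    pess           - pessimistic error rate of the pruned rule (percent)
    class_fraction - percentage of training items of the rule's class
    """
    if pruned_rule in ruleset:
        # an identical rule was already produced
        return "duplicates rule %d" % ruleset.index(pruned_rule)
    if pess > 50.0 or pess > class_fraction:
        # too inaccurate: error rate above 50% or above the
        # proportion of items of the rule's class
        return "too inaccurate"
    ruleset.append(pruned_rule)
    return "added"
```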
---|

.SH RULE SIFTING

.B Verbosity level 1

The set of pruned rules for each class is then examined.
Starting with no rules in the ruleset, the following
process is repeated until no rules can be added or dropped.
.IP " 1." 7
If there are rules whose omission would not lead
to an increase in the number of items misclassified,
then the least useful of these is dropped.
.IP " 2."
Otherwise, if there are rules whose addition would lead to a decrease
in the number of items misclassified, then the one
with the fewest counterexamples is added.
.TP 0
This is shown in the output as:

Action - the number of the rule added or dropped
Change - the advantage attributable to the rule
Worth - the included rules for this class, shown as:

.IR n1 [ n2 | n3 =
.IR r1 ]

with:
.IP " \fIn1\fR" 11
- the rule number
.IP " \fIn2\fR"
- the number of items that correctly
fire this rule and are not covered by any other included rule
.IP " \fIn3\fR"
- the number of items that incorrectly
fire this rule and are not covered by any other included rule
.IP " \fIr1\fR"
- the advantage attributable to the rule
.HP 0
After the rules have been sifted, the number of items of
each class that are not covered by any rule is shown,
and the default class is set to the class with the most
uncovered items.
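The add/drop cycle above can be sketched greedily as follows (Python; representing rules as predicates, and taking the first droppable rule rather than computing the "least useful" one, are simplifying assumptions, not C4.5's internals):

```python
def misclassified(ruleset, items, target_class):
    """Errors for this class: an item is wrong if the ruleset fires on it
    but it is not of the class, or is of the class but uncovered."""
    errors = 0
    for attrs, cls in items:
        fires = any(rule(attrs) for rule in ruleset)
        if fires != (cls == target_class):
            errors += 1
    return errors

def sift(candidates, items, target_class):
    included = []
    while True:
        base = misclassified(included, items, target_class)
        # 1. drop a rule whose omission does not increase the errors
        droppable = [r for r in included
                     if misclassified([x for x in included if x is not r],
                                      items, target_class) <= base]
        if droppable:
            included.remove(droppable[0])
            continue
        # 2. otherwise add the rule giving the biggest decrease in errors
        addable = [(misclassified(included + [r], items, target_class), r)
                   for r in candidates if r not in included]
        addable = [(e, r) for e, r in addable if e < base]
        if not addable:
            return included          # nothing to add or drop: done
        included.append(min(addable, key=lambda t: t[0])[1])
```

Each addition strictly decreases the error count and each drop never increases it, so the loop terminates.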
---|

.B Verbosity level 2

When sifting the rules for a particular class, the Worth of each
rule that is for that class but not yet included in the ruleset
is shown at each stage of the process.

.SH RULE SORTING

.B Verbosity level 1

The remaining rules are then sorted, starting with those
for the class with the fewest false positives.
The verbose output shows the number of false positives for each
class (i.e. the number of items misclassified as being of that
class).
Within a class, rules with the greatest advantage are put first.
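This ordering amounts to a two-part sort key (an illustrative Python sketch; the tuple representation of rules is assumed):

```python
def sort_rules(rules, false_positives):
    """rules: list of (class_name, advantage, rule_id) tuples (invented
    representation); false_positives: dict mapping class_name to the
    number of items misclassified as that class.

    Classes with fewer false positives come first; within a class,
    rules with greater advantage come first."""
    return sorted(rules,
                  key=lambda r: (false_positives[r[0]], -r[1]))
```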
---|

.SH RULESET EVALUATION

.B Verbosity level 3

When evaluating a ruleset, the attribute values, the given class,
and the class assigned by the ruleset are shown for each
misclassified item.

.SH SEE ALSO

c4.5(1), c4.5rules(1)