Changes between Version 26 and Version 27 of Parallel-DT


Timestamp:
Jan 18, 2010, 8:30:34 PM (14 years ago)
Author:
andrei.minca
Comment:

--

  • Parallel-DT

    v26 v27  
    77 * Project Description: Classification is an important data mining problem, and decision trees (DT) are among the most popular algorithms used for it. Since the datasets used in data mining problems are usually very large, computationally efficient and scalable algorithms are highly desirable. Thus, the project's goal is to parallelize the decision tree inference process. A shared memory programming model using OpenMP is being considered for this task.
    88 * Project Presentation: [https://ncit-cluster.grid.pub.ro/trac/PP2009/attachment/wiki/Parallel-DT/Parallel-DT.ppt]
    9  * Prject Implementation Details: [wiki:Details]
     9 * Project Implementation Details: [wiki:Details]
    1010
    1111''' Serial DT process '''
    1212 Most existing induction-based algorithms, including C4.5, which is analysed here, use Hunt's method as their basic algorithm. Here is a recursive description of Hunt's method for constructing a decision tree from a set T of training cases with classes denoted {C1, C2, C3, ..., Ck}:
    1313 * ''' Case 1 ''' T contains cases all belonging to a single class Cj. The decision tree for T is a leaf identifying class Cj.
    14  * ''' Case 2 ''' T contains cases that belong to a mixture of classes. A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2, O3, ..., On}. Note that in many implementations n is chosen to be 2, which leads to a binary decision tree. T is partitioned into subsets T1, T2, ..., Tn, where Ti contains all the cases in T that have outcome Oi of the chosen test. The decision tree for T consists of a decision node identifying the test, and oane branch for each possible outcome. The same tree building machinery is applied recursively to each subset of training cases.
     14 * ''' Case 2 ''' T contains cases that belong to a mixture of classes. A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2, O3, ..., On}. Note that in many implementations n is chosen to be 2, which leads to a binary decision tree. T is partitioned into subsets T1, T2, ..., Tn, where Ti contains all the cases in T that have outcome Oi of the chosen test. The decision tree for T consists of a decision node identifying the test, and one branch for each possible outcome. The same tree building machinery is applied recursively to each subset of training cases.
    1515 * ''' Case 3 ''' T contains no cases. The decision tree for T is a leaf, but the class to be associated with the leaf must be determined from information other than T. For example, C4.5 chooses this to be the most frequent class at the parent of this node.
    1616
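
The three cases of Hunt's method can be sketched as a short recursive function. This is only an illustrative sketch, not the project's or C4.5's implementation: the names `hunt`, `choose_test` and `domains`, the dictionary-based tree representation, and the rule of picking the first attribute that separates the cases are all assumptions made for the example. C4.5 itself selects the test with an information-based criterion rather than this placeholder.

```python
from collections import Counter


def choose_test(cases, domains):
    """Hypothetical placeholder: return the first attribute whose observed
    values differ among the cases, or None if no attribute separates them."""
    for attr in domains:
        if len({row[attr] for row, _ in cases}) > 1:
            return attr
    return None


def hunt(cases, domains, parent_majority=None):
    """Build a decision tree from (attribute_dict, class_label) pairs.

    `domains` maps each attribute to the list of its possible outcomes.
    """
    # Case 3: T contains no cases, so the leaf takes the most frequent
    # class at the parent node.
    if not cases:
        return {"leaf": parent_majority}

    labels = [label for _, label in cases]
    majority = Counter(labels).most_common(1)[0][0]

    # Case 1: all cases belong to a single class.
    if len(set(labels)) == 1:
        return {"leaf": labels[0]}

    # Case 2: mixture of classes, so choose a test on a single attribute.
    attr = choose_test(cases, domains)
    if attr is None:
        # Assumption: if no attribute separates the cases, stop with the
        # majority class instead of recursing forever.
        return {"leaf": majority}

    # Partition T into T1..Tn, one subset per possible outcome of the test.
    subsets = {outcome: [] for outcome in domains[attr]}
    for row, label in cases:
        subsets[row[attr]].append((row, label))

    # One branch per outcome; recurse on each subset of training cases.
    return {
        "test": attr,
        "branches": {
            outcome: hunt(subset, domains, majority)
            for outcome, subset in subsets.items()
        },
    }


if __name__ == "__main__":
    training = [
        ({"outlook": "sunny"}, "no"),
        ({"outlook": "rain"}, "yes"),
        ({"outlook": "rain"}, "yes"),
    ]
    # "overcast" has no training cases, so its branch exercises Case 3.
    print(hunt(training, {"outlook": ["sunny", "rain", "overcast"]}))
```

Passing the full list of possible outcomes in `domains` is what makes Case 3 reachable: an outcome with no training cases yields an empty subset, whose leaf is labelled with the parent's majority class.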