Parallel-DT
- Short Name: Parallel-DT
- SVN: https://svn-batch.grid.pub.ro/svn/PP2009/proiecte/Parallel-DT
- Team members: Eremia Bogdan, Andrei Minca, Alexandru Sorici
- Project Description: Classification is an important data mining problem, and decision trees (DT) are among the most popular algorithms used for it. Since the datasets used in data mining problems are usually very large, computationally efficient and scalable algorithms are highly desirable. Thus, the project's goal is to parallelize the decision tree inference process. A shared-memory programming model using OpenMP is being considered for this task.
Serial DT process
Most existing induction-based algorithms, including C4.5, which is the one analysed here, use Hunt's method as the basic algorithm. Here is a recursive description of Hunt's method for constructing a decision tree from a set T of training cases with classes denoted {C1, C2, C3, ..., Ck}:
- Case 1: T contains cases all belonging to a single class Cj. The decision tree for T is a leaf identifying class Cj.
- Case 2: T contains cases that belong to a mixture of classes. A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2, O3, ..., On}.
- Case 3
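The cases listed above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the names `hunt` and `majority_class`, the tuple-based tree representation and the toy data are all hypothetical. Two points are assumptions that the description above does not state: an empty T becomes a leaf labelled with the parent's majority class, and the test is simply the first remaining attribute, whereas C4.5 ranks the candidate attributes before choosing one.

```python
from collections import Counter, defaultdict


def majority_class(cases):
    """Most frequent class label among (features, label) pairs."""
    return Counter(label for _, label in cases).most_common(1)[0][0]


def hunt(cases, attributes, default=None):
    """Build a tree: a leaf is a class label,
    an inner node is (attribute, {outcome: subtree})."""
    # Assumption (not stated above): an empty T becomes a leaf labelled
    # with a class supplied by the caller, here the parent's majority class.
    if not cases:
        return default

    classes = {label for _, label in cases}
    # Case 1: all cases belong to a single class Cj, so T is a leaf for Cj.
    if len(classes) == 1:
        return classes.pop()

    # Assumption: with no attribute left to test, use the majority class.
    if not attributes:
        return majority_class(cases)

    # Case 2: mixture of classes. Choose a test on a single attribute.
    # Placeholder choice: the first attribute in the list.
    attribute, remaining = attributes[0], attributes[1:]

    # Split T by the mutually exclusive outcomes O1..On of the test.
    subsets = defaultdict(list)
    for features, label in cases:
        subsets[features[attribute]].append((features, label))

    # Apply the same method recursively to each subset.
    fallback = majority_class(cases)
    return (attribute, {outcome: hunt(subset, remaining, fallback)
                        for outcome, subset in subsets.items()})


# Toy data invented for this sketch (not the project's training set).
toy = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "sunny", "humidity": "normal"}, "yes"),
    ({"outlook": "overcast", "humidity": "high"}, "yes"),
]
tree = hunt(toy, ["outlook", "humidity"])
```

The subsets produced by the split are independent of each other, which is why the recursive calls are a natural place to look for parallelism.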
Parallel approaches
- ...
Project activity
Steps:
- 12 Nov - ...
- Project status: ...
- ToDo: ...
Attachments (6)
- T.JPG (40.3 KB): a small training data set
- F.JPG (23.3 KB): steps in creating the decision tree
- Outlook&Humidity.jpg (44.2 KB): Outlook and Humidity attributes
- SyncronusTreeConstruction-DepthFirstExpansionStrategy.jpg (44.0 KB): synchronous tree construction
- PartitionedTreeConstruction.jpg (56.9 KB): partitioned tree construction
- Parallel-DT.ppt (226.0 KB): project presentation