| 19 | |
| 20 | == Goals == |
| 21 | |
| 22 | Image processing for OCR and content extraction |
| 23 | |
| 24 | This implies a set of costly operations, such as: |
| 25 | - Binarization: the resulting image consists of two classes, foreground and background, separated by a threshold t |
| 26 | - Noise detection |
| 27 | - Clustering to create meta-entities (e.g., characters, lines) |
| 28 | - Detection and/or classification of page components: line, table, character, frame, image, etc. |
| 29 | - Skew detection and correction |
| 30 | - Geometric distortion correction |
| 31 | - Other operations not listed here |
| 32 | For this project, our aim was to implement and parallelize image binarization methods, and to try different approaches for improving the results and removing noise: |
| 33 | - Otsu 1D |
| 34 | - Otsu 2D |
| 35 | - Niblack |
| 36 | - Color Uniformization |
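
To make the idea of a threshold t concrete, here is a minimal serial sketch of 1D Otsu thresholding in plain Python. It is an illustration only, not the project's parallel implementation: the function names `otsu_threshold` and `binarize`, the 256 gray levels, and the choice to map pixels at or below t to 0 and the rest to 255 are all assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes the between-class variance.

    `pixels` is a flat sequence of integer gray values in [0, levels).
    (Hypothetical helper written for this illustration.)
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of the gray values of those pixels
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in the lower class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map each pixel to 0 (value <= t) or 255 (value > t).

    Which class counts as foreground is an assumption of this sketch.
    """
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Synthetic bimodal "image": half dark pixels, half bright pixels.
    image = [10] * 50 + [200] * 50
    t = otsu_threshold(image)
    print(t, set(binarize(image, t)))
```

The histogram pass and the per-pixel mapping in `binarize` are independent across pixels, which is what makes this kind of method a natural candidate for parallelization.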