In this project, I created and augmented a dataset from a set of given images in order to train and test a convolutional neural network that classifies images of scanned documents into five classes. To generate the dataset, I applied several image processing techniques: sliding-window cropping, rotating, flipping and pyramid-sizing. The result of this phase is a set of images of identical size, 244x224x3. After being labeled, these images were divided into three datasets for training, validating and testing the network.
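The four augmentation techniques named above can be illustrated with a minimal sketch. It is not the report's implementation: the grayscale nested-list image, the function names, the 90-degree rotation, the subsampling pyramid and all parameter values are assumptions chosen to keep the example dependency-free.

```python
from typing import Iterator, List

# Assumption: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def sliding_windows(img: Image, size: int, stride: int) -> Iterator[Image]:
    """Yield every size x size crop, moving `stride` pixels at a time."""
    height, width = len(img), len(img[0])
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield [row[left:left + size] for row in img[top:top + size]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def pyramid(img: Image, levels: int) -> List[Image]:
    """Return the image plus `levels - 1` copies, each halved by
    keeping every other row and column (a crude downscale)."""
    out = [img]
    for _ in range(levels - 1):
        img = [row[::2] for row in img[::2]]
        out.append(img)
    return out


def augment(img: Image, size: int, stride: int, levels: int) -> List[Image]:
    """Crop every pyramid level, then add a flipped and a rotated
    copy of each crop, so all patches share the same size."""
    patches: List[Image] = []
    for scaled in pyramid(img, levels):
        for crop in sliding_windows(scaled, size, stride):
            patches.extend([crop, flip_horizontal(crop), rotate_90(crop)])
    return patches


# Toy 8x8 "scan" whose pixel value encodes its position.
page = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = augment(page, size=4, stride=4, levels=2)
print(len(patches))  # 15: (4 crops + 1 crop) x 3 variants
```

Cropping after scaling is what keeps every patch the same size regardless of pyramid level, which matches the fixed-size output described above.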
The network is a simple convolutional neural network, also known as LeNet. It has three convolutional layers and one fully connected layer. After training and validation, the best state of the network was identified and tested on the testing dataset and on some real images. The results showed that LeNet was able to classify images of documents with fairly high accuracy. At the end of the project, I modified the network and discussed the effect of those changes, with the aim of creating a similar network that performs better than the original. The results showed that the modified network performed slightly better than the original version.
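To make the "three convolutional layers and one fully connected layer" structure concrete, the sketch below traces tensor shapes and parameter counts through such a network. Only the input size (as stated in the abstract), the number of convolutional layers and the five output classes come from the text. The 5x5 kernels, the filter counts (6, 16, 32), the unpadded stride-1 convolutions and the 2x2 pooling after each layer are hypothetical values picked for illustration.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def conv_out(shape: Shape, filters: int, kernel: int) -> Shape:
    """Output shape of a stride-1, unpadded convolution."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)


def pool_out(shape: Shape, window: int = 2) -> Shape:
    """Output shape of non-overlapping pooling (floor division)."""
    h, w, c = shape
    return (h // window, w // window, c)


def conv_params(in_channels: int, filters: int, kernel: int) -> int:
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def trace(input_shape: Shape, conv_filters: Sequence[int],
          kernel: int, num_classes: int) -> Tuple[List[Shape], int]:
    """Return the shape after each conv+pool stage and the total
    number of trainable parameters, including the final dense layer."""
    shapes = [input_shape]
    shape = input_shape
    params = 0
    for filters in conv_filters:
        params += conv_params(shape[2], filters, kernel)
        shape = pool_out(conv_out(shape, filters, kernel))
        shapes.append(shape)
    flat = shape[0] * shape[1] * shape[2]
    params += flat * num_classes + num_classes  # fully connected layer
    return shapes, params


# Hypothetical layer settings; input size and class count from the abstract.
shapes, total = trace((244, 224, 3), conv_filters=[6, 16, 32],
                      kernel=5, num_classes=5)
for s in shapes:
    print(s)
print(total)
```

Under these assumed settings most of the parameters sit in the fully connected layer, which is one reason changes to that layer and to the convolutional layers can affect the network differently.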
Table of Contents
- Introduction
- Context
- About ICTLab
- ARCHIVES project
- Internship context
- Report organization
- State of the art
- Artificial intelligence & machine learning
- Artificial neural network (ANN)
- History
- Regular neural network
- Convolutional neural network (LeNet)
- Training and evaluating
- Contribution
- Data creation and augmentation
- ARCHIVES dataset
- Creating data
- Augmenting the data
- Preparing data
- Constructing the convolutional neural network (LeNet)
- The model
- Training
- Validation and testing
- Developing the network
- Results
- The basic network
- Testing on the dataset
- Testing on real images
- The network modifications
- Fully connected layer
- Convolutional layers
- The new network
Objectives and Key Themes
This internship report focuses on the application of convolutional neural networks (CNNs) for document classification. The primary objective is to develop and evaluate a CNN model capable of accurately classifying scanned documents into five distinct categories.
- Image processing techniques for dataset generation
- CNN architecture and training methodology
- Evaluation and analysis of model performance
- Network optimization and development
- Application of CNNs in document classification
Chapter Summaries
The report begins with an introduction to the project's context, highlighting the ARCHIVES project and its significance in document classification. Chapter 2 provides a comprehensive overview of artificial intelligence, machine learning, and particularly convolutional neural networks. This chapter delves into the history of neural networks, the structure of regular neural networks, and the specific architecture of LeNet, the chosen CNN model for this project. Chapter 3 details the creation and augmentation of the dataset, including image processing techniques like sliding window, rotating, flipping, and pyramid-sizing. The chapter also elaborates on the construction of the LeNet network, its training process, and validation and testing methods. Finally, Chapter 4 presents the results of the network's performance, both on the generated dataset and on real images. It further explores the impact of modifications to the network, including changes to the fully connected and convolutional layers, leading to the development of a new, improved network.
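The division of the labeled images into training, validation and testing sets could be sketched as follows. The 70/15/15 proportions, the fixed seed, the file names and the function name are assumptions for illustration; the report's actual proportions are not stated here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], val_percent: int = 15,
                  test_percent: int = 15,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples reproducibly and cut them into
    (train, validation, test) lists."""
    if val_percent < 0 or test_percent < 0 or val_percent + test_percent >= 100:
        raise ValueError("validation and test percentages must leave room for training data")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = len(shuffled) * val_percent // 100
    n_test = len(shuffled) * test_percent // 100
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test


# Hypothetical labeled patches: (file name, class index in 0..4).
labelled = [(f"patch_{i}.png", i % 5) for i in range(100)]
train, val, test = split_dataset(labelled)
print(len(train), len(val), len(test))  # 70 15 15
```

The validation set is what allows the best state of the network to be chosen during training, while the test set stays untouched until the final evaluation.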
Keywords
This internship report covers convolutional neural networks (CNNs), image processing techniques, document classification, dataset creation, and model optimization, with the goal of achieving high accuracy in document classification. The project trains and evaluates a LeNet architecture, using sliding-window, rotating, flipping, and pyramid-sizing techniques for data augmentation. It also explores the impact of network modifications aimed at improving the performance of the CNN model.
- Cite this paper
- Tai Doan (Author), 2016, Convolutional Neural Network in classifying scanned documents, Munich, GRIN Verlag, https://www.hausarbeiten.de/document/349852