Parallelizing support vector machines for scalable image annotation

Alham, Nasullah Khalid

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/5452

Title:	Parallelizing support vector machines for scalable image annotation
Authors:	Alham, Nasullah Khalid
Advisors:	Li, M
Keywords:	Image annotation;Map reduce;Machine learning;SVM;Distributed computing
Issue Date:	2011
Publisher:	Brunel University School of Engineering and Design PhD Theses
Abstract:	Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is a computationally intensive process, especially when the training dataset is large. In this thesis, distributed computing paradigms are investigated to speed up SVM training by partitioning a large training dataset into small data chunks and processing each chunk in parallel, utilizing the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation using a cluster of computers. A genetic-algorithm-based load balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments. SVM was initially designed for binary classification. However, most classification problems arising in domains such as image annotation involve more than two classes. A resource-aware parallel multiclass SVM algorithm for large-scale image annotation using a cluster of computers is introduced. Combining classifiers leads to a substantial reduction of classification error in a wide range of applications. Among such combinations, SVM ensembles with bagging have been shown to outperform a single SVM in terms of classification accuracy. However, training SVM ensembles is a computationally intensive process, especially when the number of replicated samples based on bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced, which re-samples the training data based on bootstrapping and trains an SVM on each sample in parallel using a cluster of computers.
The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM algorithm, the distributed multiclass SVM algorithm, and the distributed SVM ensemble algorithm reduce the training time significantly while maintaining a high level of classification accuracy.
Description:	This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
URI:	http://bura.brunel.ac.uk/handle/2438/5452
Appears in Collections:	Electronic and Computer Engineering Dept of Electronic and Electrical Engineering Theses
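
Illustrative sketch (not taken from the thesis): the abstract describes an SVM ensemble that re-samples the training data by bootstrapping and trains one SVM per sample in parallel. The Python below is a minimal illustration of that idea under several assumptions. All function names (make_toy_data, train_linear_svm, bootstrap, train_ensemble, ensemble_predict) are hypothetical, the toy two-class data is synthetic, a small Pegasos-style stochastic subgradient solver stands in for whatever SVM solver the thesis uses, and a local thread pool stands in for the cluster of computers.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_toy_data(n=200, seed=1):
    """Synthetic two-class data: Gaussian blobs around (2, 2) and (-2, -2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        x = (rng.gauss(2.0 * y, 0.5), rng.gauss(2.0 * y, 0.5))
        data.append((x, y))
    return data


def train_linear_svm(samples, epochs=50, lam=0.01, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.

    This is a stand-in solver (an assumption), not the thesis's method.
    samples: list of (features, label) pairs with label in {-1, +1}.
    Returns a weight vector whose last entry acts as the bias.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(samples[0][0]) + 1)  # +1 for the constant bias feature
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(samples, len(samples)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            xb = list(x) + [1.0]
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wi + eta * y * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def bootstrap(samples, rng):
    """Draw len(samples) items with replacement (one bootstrap replica)."""
    return [rng.choice(samples) for _ in samples]


def train_ensemble(samples, n_models=5, workers=4, seed=0):
    """Train one SVM per bootstrap replica; the pool stands in for a cluster."""
    rng = random.Random(seed)
    replicas = [bootstrap(samples, rng) for _ in range(n_models)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(train_linear_svm, replicas))


def ensemble_predict(models, x):
    """Majority vote over the ensemble (use an odd n_models to avoid ties)."""
    votes = Counter(predict(w, x) for w in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = make_toy_data()
    models = train_ensemble(data, n_models=5)
    correct = sum(ensemble_predict(models, x) == y for x, y in data)
    print("training accuracy:", correct / len(data))
```

Because each replica is trained independently, the training calls share no state, which is what makes this step easy to distribute. In a real deployment the thread pool would be replaced by worker processes or cluster nodes, and Python threads here only illustrate the structure rather than provide a speed-up.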

Files in This Item:

File	Description	Size	Format
FulltextThesis.pdf		2.05 MB	Adobe PDF
