Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/5452
Full metadata record
DC field: value (language)
dc.contributor.advisor: Li, M
dc.contributor.author: Alham, Nasullah Khalid
dc.date.accessioned: 2011-06-30T11:13:22Z
dc.date.available: 2011-06-30T11:13:22Z
dc.date.issued: 2011
dc.identifier.uri: http://bura.brunel.ac.uk/handle/2438/5452
dc.description: This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. (en_US)
dc.description.abstract: Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among these, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is a notably computationally intensive process, especially when the training dataset is large. In this thesis, distributed computing paradigms are investigated to speed up SVM training by partitioning a large training dataset into small data chunks and processing each chunk in parallel, utilizing the resources of a cluster of computers. A resource-aware parallel SVM algorithm is introduced for large-scale image annotation using a cluster of computers, and a genetic algorithm-based load-balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments. SVM was initially designed for binary classification; however, most classification problems arising in domains such as image annotation involve more than two classes. A resource-aware parallel multiclass SVM algorithm for large-scale image annotation using a cluster of computers is therefore introduced. Combining classifiers leads to a substantial reduction of classification error in a wide range of applications, and SVM ensembles with bagging have been shown to outperform a single SVM in terms of classification accuracy. However, training SVM ensembles is a notably computationally intensive process, especially when the number of replicated samples generated by bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data by bootstrapping and trains an SVM on each sample in parallel using a cluster of computers. The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM, distributed multiclass SVM, and distributed SVM ensemble algorithms reduce training time significantly while maintaining a high level of classification accuracy. (en_US) (An illustrative code sketch of the bootstrap-and-train-in-parallel scheme follows the metadata record below.)
dc.language.iso: en (en_US)
dc.publisher: Brunel University School of Engineering and Design PhD Theses
dc.relation.uri: http://bura.brunel.ac.uk/bitstream/2438/5452/1/FulltextThesis.pdf
dc.subject: Image annotation (en_US)
dc.subject: Map reduce (en_US)
dc.subject: Machine learning (en_US)
dc.subject: SVM (en_US)
dc.subject: Distributed computing (en_US)
dc.title: Parallelizing support vector machines for scalable image annotation (en_US)
dc.type: Thesis (en_US)
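
The distributed SVM ensemble described in the abstract follows a simple pattern: draw bootstrap replicas of the training set, train one SVM per replica in parallel, and combine the members' predictions. The sketch below is a minimal single-machine illustration of that pattern, not the thesis's implementation: it assumes scikit-learn's SVC, the bundled digits dataset as a stand-in for annotated image features, and a multiprocessing pool in place of the MapReduce cluster indicated by the subject keywords; the helper names (train_on_bootstrap, majority_vote) and all parameter values are hypothetical.

# Illustrative sketch only: a single-machine stand-in, using a process pool
# in place of the cluster of computers described in the thesis. Dataset,
# parameters, and helper names are assumptions for demonstration.
from multiprocessing import Pool

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def train_on_bootstrap(args):
    """Train one SVM on a bootstrap replica of the training data."""
    X, y, seed = args
    rng = np.random.RandomState(seed)
    idx = rng.randint(0, len(X), size=len(X))  # sample with replacement
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X[idx], y[idx])
    return clf


def majority_vote(classifiers, X):
    """Combine the ensemble members' predictions by majority voting."""
    votes = np.stack([clf.predict(X) for clf in classifiers])  # (n_clf, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col).argmax(), axis=0, arr=votes
    )


if __name__ == "__main__":
    X, y = load_digits(return_X_y=True)      # small multiclass image dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    n_replicas = 8                           # number of bootstrap samples
    tasks = [(X_train, y_train, seed) for seed in range(n_replicas)]

    # Each bootstrap replica is trained in a separate worker process,
    # mirroring the "one SVM per node" idea at a much smaller scale.
    with Pool(processes=4) as pool:
        ensemble = pool.map(train_on_bootstrap, tasks)

    y_pred = majority_vote(ensemble, X_test)
    print("ensemble accuracy:", (y_pred == y_test).mean())

Because each replica is trained independently, the same structure maps directly onto a cluster: each worker receives one bootstrap sample and returns one trained classifier, and only the final voting step needs the results gathered in one place.
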
Appears in Collections: Electronic and Computer Engineering
Dept of Electronic and Electrical Engineering Theses

Files in This Item:
File                 Description   Size      Format
FulltextThesis.pdf   -             2.05 MB   Adobe PDF


Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.