RESEARCH COMMONS
LIBRARY

Scalable Parallel Processing Of Multi-objective Optimized DNA Sequence Assembly

ResearchCommons/Manakin Repository

Scalable Parallel Processing Of Multi-objective Optimized DNA Sequence Assembly

Show simple item record

dc.contributor.author Ahmed, Munib en_US
dc.date.accessioned 2012-04-11T20:55:19Z
dc.date.available 2012-04-11T20:55:19Z
dc.date.issued 2012-04-11
dc.date.submitted January 2011 en_US
dc.identifier.other DISS-11365 en_US
dc.identifier.uri http://hdl.handle.net/10106/9546
dc.description.abstract Bioinformatics is an emerging branch of science where issues pertaining to molecular biology are evaluated and resolved by leveraging the techniques and algorithms devised in the field of computer science. Most of these issues are due to the enormous amount of data and the computational complexity involved in generating expeditious and qualitatively viable solutions. This poses a challenge to the algorithm developers who must strive to achieve multiple conflicting objectives of processing very large dataset with the highest accuracy possible while keeping the execution time to a minimum. Genome assembly is one such problem in bioinformatics where a DNA sequence is reconstructed using millions of small fragments of DNA that are produced in the laboratory as a result of sequencing process. When examined purely as data, these fragments are small in size (< 103 characters long) but large in numbers, have repetitive regions which exacerbates the complexity of the reconstruction algorithms, and contain erroneous data due to imperfect laboratory procedures. This dissertation takes a holistic approach to resolve these issues by first presenting a comprehensive study of contemporary work, highlighting its strengths and weaknesses while proposing improvements wherever needed, followed by the design and implementation of a new parallel framework. With the extra processing power available in a parallel computing environment, this framework enhances accuracy of the solution by correcting errors in the low quality data regions and improves the speedup by dynamically balancing the load among multiple processors and by utilizing innovative data structures along with a hashing technique that require lesser memory compared to other contemporary programs. One of the chief objectives of this work is to carve out an important and sizeable piece of the DNA sequence assembly process, device a new parallel algorithm, and provide its modular implementation in order to facilitate the scalability analysis and parametric study of various characteristics and interdependencies of multiple conflicting objectives such as speedup, accuracy, and data size. A comparison between experimental and theoretical statistics of the system explains similarities or deviations and their causes and effects. This research work and the underlying approach can be easily extended to other related areas of bioinformatics, including multiple sequence alignment and phylogenetics, using parallel computing. en_US
dc.description.sponsorship Ahmad, Ishfaq en_US
dc.language.iso en en_US
dc.publisher Computer Science & Engineering en_US
dc.title Scalable Parallel Processing Of Multi-objective Optimized DNA Sequence Assembly en_US
dc.type Ph.D. en_US
dc.contributor.committeeChair Ahmad, Ishfaq en_US
dc.degree.department Computer Science & Engineering en_US
dc.degree.discipline Computer Science & Engineering en_US
dc.degree.grantor University of Texas at Arlington en_US
dc.degree.level doctoral en_US
dc.degree.name Ph.D. en_US

Files in this item

Files Size Format View
Ahmed_uta_2502D_11365.pdf 3.524Mb PDF View/Open

This item appears in the following Collection(s)

Show simple item record

Browse

My Account

Statistics

About Us