A Random Walk Approach To Sampling Hidden Databases

ResearchCommons/Manakin Repository

Dasgupta, Arjun en_US 2007-08-23T01:56:03Z 2007-08-23T01:56:03Z 2007-08-23T01:56:03Z April 2007 en_US
dc.identifier.other DISS-1678 en_US
dc.description.abstract A large part of the data on the World Wide Web is hidden behind form-like interfaces. These interfaces interact with a hidden back-end database to provide answers to user queries. Generating a uniform random sample of this hidden database by using only the publicly available interface gives us access to the underlying data distribution. In this thesis, we propose a random walk scheme over the query space provided by the interface to sample such databases. We discuss variants where the query space is visualized as a fixed and random ordering of attributes. We also propose techniques to further improve the sample quality by using a probabilistic rejection based approach and conduct extensive experiments to illustrate the accuracy and efficiency of our techniques. en_US
dc.description.sponsorship Das, Gautam en_US
dc.language.iso EN en_US
dc.publisher Computer Science & Engineering en_US
dc.title A Random Walk Approach To Sampling Hidden Databases en_US
dc.type M.S. en_US
dc.contributor.committeeChair Das, Gautam en_US
Computer Science & Engineering en_US Computer Science & Engineering en_US University of Texas at Arlington en_US masters en_US M.S. en_US
dc.identifier.externalLinkDescription Link to Research Profiles

Files in this item

Files Size Format
umi-uta-1678.pdf 424.8Kb PDF
