Title:
A Random Walk Approach To Sampling Hidden Databases

Author:
Dasgupta, Arjun

Abstract:
A large part of the data on the World Wide Web is hidden behind form-like interfaces. These interfaces interact with a hidden back-end database to answer user queries. Generating a uniform random sample of such a hidden database using only its publicly available interface gives insight into the underlying data distribution. In this thesis, we propose a random walk scheme over the query space provided by the interface to sample such databases. We discuss variants in which the query space is modeled as either a fixed or a random ordering of attributes. We also propose a probabilistic rejection-based approach to further improve sample quality, and we conduct extensive experiments to illustrate the accuracy and efficiency of our techniques.
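The abstract describes the sampler only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the thesis's exact algorithm. It assumes Boolean attributes, distinct tuples, and a simulated top-k form interface (the hypothetical `top_k_query`) that returns at most k matching rows and flags an overflow when more rows match. Each walk fixes one attribute at a time to a random value. It restarts on an empty result and narrows the query on an overflow. On a valid query it picks one of the s returned rows. That row was reached with probability 2^-d / s at depth d, so accepting it with probability 2^d · s / (k · 2^m) makes every tuple equally likely. This acceptance rule is a standard rejection-sampling construction chosen for the example. The `shuffle_attrs` flag mimics the random-ordering variant.

```python
import random
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, ...]


def top_k_query(db: List[Row], conds: Dict[int, int], k: int) -> Tuple[List[Row], bool]:
    """Hypothetical form interface: at most k matching rows plus an overflow flag."""
    matches = [r for r in db if all(r[a] == v for a, v in conds.items())]
    return matches[:k], len(matches) > k


def random_walk(db: List[Row], m: int, k: int, rng: random.Random,
                shuffle_attrs: bool = False, rejection: bool = True) -> Optional[Row]:
    """One drill-down walk over m Boolean attributes; None means 'restart'."""
    order = list(range(m))
    if shuffle_attrs:
        rng.shuffle(order)  # random-ordering variant; fixed order otherwise
    conds: Dict[int, int] = {}
    for depth, attr in enumerate(order, start=1):
        conds[attr] = rng.randint(0, 1)  # random value for the next attribute
        rows, overflow = top_k_query(db, conds, k)
        if not rows:
            return None  # underflow: empty result, restart the walk
        if overflow:
            continue  # more than k matches: narrow the query further
        row = rng.choice(rows)  # valid query: pick one of its s results
        if not rejection:
            return row  # biased toward rows isolated by short walks
        # The row was reached with probability 2**-depth / s; accepting with
        # probability 2**depth * s / (k * 2**m) (always <= 1) makes the
        # overall probability the same constant for every distinct row.
        accept = (2 ** depth) * len(rows) / (k * 2 ** m)
        return row if rng.random() < accept else None
    return None  # still overflowing with all attributes fixed (duplicate rows)


def sample(db: List[Row], m: int, k: int, n: int, seed: int = 0,
           shuffle_attrs: bool = False, rejection: bool = True) -> List[Row]:
    """Repeat walks until n rows have been collected."""
    rng = random.Random(seed)
    out: List[Row] = []
    while len(out) < n:
        row = random_walk(db, m, k, rng, shuffle_attrs, rejection)
        if row is not None:
            out.append(row)
    return out
```

For example, consider the toy table `[(0,0,0,0), (0,0,0,1), (0,0,1,0), (1,1,1,1), (0,1,0,0)]` with `m=4` and `k=1`. Without rejection, the walk returns `(1,1,1,1)` about half the time, because the first attribute alone isolates it. With rejection, each row appears with roughly equal frequency. The price is that only about n_rows / (k · 2^m) of the walks produce a sample, which is why reducing this overhead while preserving sample quality matters in practice.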

URI:
http://hdl.handle.net/10106/96

Date:
2007-08-23

External Link:
https://www.uta.edu/ra/real/editprofile.php?onlyview=1&pid=178