Declustering using fractals

Faloutsos, Christos; Bhagwat, P.

doi:10.1109/pdis.1993.253077

Cited by 101 publications

(67 citation statements)

References 23 publications


“…In this work, the assignment of input chunks to the disks was done using a Hilbert curve based declustering algorithm [15]. Hilbert curve algorithms have been shown to achieve good I/O parallelism for multi-dimensional datasets.…”

Section: Datasets (mentioning)

confidence: 99%
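
The Hilbert-curve assignment mentioned in the statement above can be illustrated with a minimal sketch. It assumes a two-dimensional grid of chunks whose side `n` is a power of two, and it assigns chunks to disks round-robin along the curve. The function names `hilbert_index` and `assign_disk` are hypothetical and are not taken from the cited papers; the index computation is the standard iterative coordinate-to-position conversion for a Hilbert curve.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve covering an n-by-n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def assign_disk(n, x, y, num_disks):
    """Assign chunk (x, y) to a disk by walking the curve round-robin (illustrative)."""
    return hilbert_index(n, x, y) % num_disks


if __name__ == "__main__":
    n, num_disks = 4, 4
    # Print the disk number chosen for every chunk of a small grid.
    for y in range(n):
        print(" ".join(str(assign_disk(n, x, y, num_disks)) for x in range(n)))
```

Because consecutive positions on the curve are spatially adjacent, neighbouring chunks tend to land on different disks, which is what lets a range query read from several disks in parallel.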

Optimizing Reduction Computations In a Distributed Environment

Kurç, Lee, Agrawal, et al. 2003

Proceedings of the 2003 ACM/IEEE Conference on Supercomputing


We investigate runtime strategies for data-intensive applications that involve generalized reductions on large, distributed datasets. Our set of strategies includes replicated filter state, partitioned filter state, and hybrid options between these two extremes. We evaluate these strategies using emulators of three real applications, different query and output sizes, and a number of configurations. We consider execution in a homogeneous cluster and in a distributed environment where only a subset of nodes host the data. Our results show replicating the filter state scales well and outperforms other schemes, if sufficient memory is available and sufficient computation is involved to offset the cost of global merge step. In other cases, hybrid is usually the best. Moreover, in almost all cases, the performance of the hybrid strategy is quite close to the best strategy. Thus, we believe that hybrid is an attractive approach when the relative performance of different schemes cannot be predicted.


“…Several methods have been proposed for declustering data, including Disk Modulo [12], Field-wise Exclusive OR [29], Hilbert [13], Near-Optimal Declustering [5], General Multidimensional Data Allocation [27], cyclic allocation schemes [36], [37], Golden Ratio Sequences [7], Hierarchical Declustering [6], and Discrepancy Declustering [9]. Using declustering and replication, approaches including Complete Coloring [20] have optimal performance and Square Root Colors Disk Modulo [20] has one more than optimal.…”

Section: Introduction (mentioning)

confidence: 99%
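
Two of the simplest schemes named in the statement above, Disk Modulo and Field-wise Exclusive OR, can be sketched as follows. This is an illustrative sketch based on their commonly stated definitions (sum of the grid coordinates modulo the number of disks, and bitwise XOR of the coordinates modulo the number of disks); the function names are hypothetical and the code is not taken from the cited papers.

```python
from functools import reduce
from operator import xor


def disk_modulo(cell, num_disks):
    """Disk Modulo: add the cell's grid coordinates, then take the result mod num_disks."""
    return sum(cell) % num_disks


def fieldwise_xor(cell, num_disks):
    """Field-wise XOR: XOR the cell's grid coordinates, then take the result mod num_disks.

    Usually described for num_disks that is a power of two.
    """
    return reduce(xor, cell) % num_disks


if __name__ == "__main__":
    num_disks = 4
    # Show both assignments for each cell of a small 4-by-4 grid.
    for i in range(4):
        for j in range(4):
            cell = (i, j)
            print(cell, disk_modulo(cell, num_disks), fieldwise_xor(cell, num_disks))
```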

“…Given the established bounds on the extra cost and the impossibility result, a large number of declustering techniques have been proposed to achieve performance close to the bounds either on the average case [5], [12], [13], [14], [16], [22], [24], [25], [29], [31], [36], [37] or, in the worst case, [3], [6], [7], [9], [41]. Although initial approaches in the literature were originally for relational databases or Cartesian product files, recent techniques focus more on spatial data declustering.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis and Comparison of Replicated Declustering Schemes

Tosun 2007

IEEE Trans. Parallel Distrib. Syst.


Declustering distributes data among parallel disks to reduce the retrieval cost using I/O parallelism. Many schemes were proposed for the single-copy declustering of spatial data. Recently, declustering using replication gained a lot of interest and several schemes with different properties were proposed. An in-depth comparison of major schemes is necessary to understand replicated declustering better. In this paper, we analyze the proposed schemes, tune some of the parameters, and compare them for different query types and under different loads. We propose a three-step retrieval algorithm for the compared schemes. For arbitrary queries, the dependent and partitioned allocation schemes perform poorly; others perform close to each other. For range queries, they perform similarly with the exception of smaller queries in which random duplicate allocation (RDA) performs poorly and dependent allocation performs well. For connected queries, partitioned allocation performs poorly and dependent allocation performs well under a light load.


“…Efficient access to data also depends on how well the data has been distributed across multiple storage nodes. The goal of declustering [9,14] is to distribute the data across as many storage units as possible so that data elements that satisfy a query can be retrieved from many sources in parallel. Caching is yet another optimization that targets multiple query workloads [1,10,19,21].…”

Section: Introduction (mentioning)

confidence: 99%

Servicing range queries on multidimensional datasets with partial replicas

Weng, Çatalyürek, Kurç, et al. 2005

CCGrid 2005. IEEE International Symposium on Cluster Computing and the Grid, 2005.


Partial replication is one type of optimization to speed up execution of queries submitted to large datasets. In partial replication, a portion of the dataset is extracted, re-organized, and re-distributed across the storage system. The objective is to reduce the volume of I/O and increase I/O parallelism for different types of queries and for the portions of the dataset that are likely to be accessed frequently. When multiple partial replicas of a dataset exist, query execution plan should be generated so as to use the best combination of subsets of partial replicas (and possibly the original dataset) to minimize query execution time. In this paper, we present a compiler and runtime approach for range queries submitted against distributed scientific datasets. A heuristic algorithm is proposed to choose the set of replicas to reduce query execution. We show the efficiency of the proposed method using datasets and queries in oil reservoir simulation studies on a cluster machine.
