PERFORMANCE-GUARANTEED DATA REPLICATION FOR DATA-INTENSIVE SCIENTIFIC APPLICATIONS
DOI:
https://doi.org/10.64149/gjaets.12.4.19-23
Keywords:
Data Grid, Data Replication, Scheduling
Abstract
A Data Grid is composed of multiple interconnected sites organized in a hierarchical structure with one top-level site and several institutional sites. The top-level site functions as the central management unit and is responsible for maintaining the Replica Catalogue (RC), which stores metadata about data files and their replicas across different sites. Each site possesses both computational and storage capabilities, enabling job execution and data storage within the Grid environment. Communication within a site is assumed to have negligible delay due to high internal bandwidth.
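As a minimal sketch of the Replica Catalogue described above, the mapping from data files to the sites holding them can be modelled as a dictionary of sets. The class name `ReplicaCatalogue`, its methods, and the site and file names are illustrative assumptions, not identifiers taken from the paper.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Hypothetical catalogue kept at the top-level site.

    Maps each data file name to the set of sites that hold a copy.
    """

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, file_name, site):
        # Record that `site` now stores a copy of `file_name`.
        self._locations[file_name].add(site)

    def unregister(self, file_name, site):
        # Remove the record; a no-op if the site held no copy.
        self._locations[file_name].discard(site)

    def lookup(self, file_name):
        # Sorted list so results are deterministic; empty if unknown.
        return sorted(self._locations.get(file_name, ()))


rc = ReplicaCatalogue()
rc.register("f1", "site_A")  # assumed: site_A already holds f1
rc.register("f1", "site_B")  # site_B later stores a replica of f1
rc.register("f2", "site_A")
```

In this sketch, `rc.lookup("f1")` lists every site a job could fetch `f1` from, which is the information a scheduler would need before choosing a replica.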
In the data replication framework, multiple data files are generated by designated source sites, and a single site may act as the source for more than one file. Since each Grid site has limited storage capacity, it can store replicas of only a limited number of data files. Efficient replication strategies are therefore essential to optimize storage utilization, data availability, and overall system performance. This study addresses the data file replication problem in such a distributed Grid environment, taking both the storage constraints and the hierarchical system architecture into account.
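To make the storage constraint concrete, the following sketch places replicas greedily: each site keeps its most-requested remote files until its slots run out. This is not the method proposed in the paper, only a simple heuristic under stated assumptions: all files have unit size, per-site access counts are known in advance, and the function name `place_replicas` and all sample data are hypothetical.

```python
def place_replicas(capacity, sources, requests):
    """Greedy sketch: fill each site's replica slots with its hottest remote files.

    capacity: site -> number of replica slots (assumes unit-size files)
    sources:  file -> source site (a source always holds its own file)
    requests: (site, file) -> assumed access count
    """
    placement = {}
    for site, slots in capacity.items():
        # Candidates are files the site requests but does not originate.
        candidates = [
            (count, f)
            for (s, f), count in requests.items()
            if s == site and sources[f] != site and count > 0
        ]
        # Most-requested first; ties broken by file name for determinism.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        placement[site] = {f for _, f in candidates[:slots]}
    return placement


capacity = {"A": 1, "B": 1}
sources = {"f1": "A", "f2": "A", "f3": "B"}  # site A is the source of two files
requests = {("A", "f1"): 7, ("A", "f3"): 5, ("B", "f1"): 4, ("B", "f2"): 9}
placement = place_replicas(capacity, sources, requests)
```

With one slot per site, site B can keep only one of the two files it requests, so the heuristic keeps `f2` (9 requests) and leaves `f1` to be fetched remotely. A real strategy would also weigh file sizes and transfer costs.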
