Exascale computers use millions of processors that generate data at a rate of terabytes per second, and it is impossible to store everything produced at that speed. Methods such as dynamic data reduction through summarization, subset selection, and more sophisticated techniques for dynamic pattern identification will be needed to reduce the volume of data. Even the reduced volume must be stored at the same rate at which it is generated if computation is to proceed without interruption. This requirement presents new challenges for moving data from a supercomputer to local and remote storage systems, and data distribution must therefore be integrated into the data generation phase. The problem of large-scale data movement will become more acute as very large data sets and subsets are shared by large scientific communities: substantial amounts of data must be replicated or moved from production machines to analysis machines that may be geographically distant. Although network technology has improved significantly with the introduction of optical connectivity, transfers of large volumes of data will experience temporary failures, so automatic recovery tools will be required. Another key requirement is the automatic allocation, usage, and release of storage space: replicated data cannot be left behind to occupy space once it is no longer needed.
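To make the reduction idea concrete, the sketch below summarizes one timestep of simulation output and keeps only a small, high-magnitude subset of cells. It is a minimal illustration: the function name `reduce_timestep`, the statistics chosen, and the 1% retention fraction are assumptions made here for the example, not details of any particular exascale system.

```python
import numpy as np

def reduce_timestep(field, keep_fraction=0.01):
    """Summarize one timestep and select a small, significant subset of cells.

    `field` is assumed to be a dense array produced in situ; the retention
    fraction is illustrative, not a recommended value.
    """
    # Summary statistics stand in for the full field.
    summary = {
        "mean": float(field.mean()),
        "std": float(field.std()),
        "min": float(field.min()),
        "max": float(field.max()),
    }
    # Subset selection: keep only the cells with the largest magnitudes.
    flat = np.abs(field).ravel()
    k = max(1, int(keep_fraction * flat.size))
    idx = np.argpartition(flat, -k)[-k:]   # indices of the top-k magnitudes
    subset = {"indices": idx, "values": field.ravel()[idx]}
    return summary, subset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    field = rng.normal(size=(512, 512))    # stand-in for one timestep of output
    summary, subset = reduce_timestep(field)
    print(summary, subset["values"].size, "cells retained")
```

Because the reduction runs as the data is generated, only the summary and the selected subset ever reach the storage and transfer layers.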
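Automatic recovery from temporary transfer failures can likewise be sketched as retrying failed chunks with exponential backoff. This is only an illustration under the assumption that faults are transient; `send_chunk` is a hypothetical stand-in for a real wide-area transfer call, not the API of any specific tool.

```python
import random
import time

class TransientTransferError(Exception):
    """Placeholder for a recoverable network or storage fault."""

def send_chunk(chunk_id):
    # Hypothetical transfer call; a random failure simulates a transient fault.
    if random.random() < 0.3:
        raise TransientTransferError(f"chunk {chunk_id} failed")

def transfer_with_recovery(chunk_ids, max_retries=5, backoff_s=1.0):
    """Retry each chunk with exponential backoff on transient failures."""
    for chunk_id in chunk_ids:
        for attempt in range(max_retries):
            try:
                send_chunk(chunk_id)
                break
            except TransientTransferError:
                if attempt == max_retries - 1:
                    raise  # give up after repeated failures
                time.sleep(backoff_s * 2 ** attempt)

if __name__ == "__main__":
    transfer_with_recovery(range(10))
```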