In the remainder of the report, we (1) describe the computing resources used for this analysis; (2) outline the methods used for data preprocessing and for running the two regressions; (3) discuss the levels of parallelism exploited, along with the advantages and challenges encountered; (4) present the scientific results and (5) the performance results of the analysis; (6) discuss the meaning of the scientific results; and (7) conclude with a discussion of the operational value of these results and the next steps in this research project.
Computing Resources
For this project, we used the resources of the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR). Specifically, we used the Yellowstone and Geyser clusters.
Yellowstone has 72,576 cores spread across 4,536 nodes; each node contains two 8-core, 2.6 GHz Intel Xeon E5-2670 (Sandy Bridge) processors with Advanced Vector Extensions (AVX). Yellowstone has 144.58 TB of total memory, but each node is limited to 25 GB of usable memory. The Geyser cluster is better suited to memory-intensive data analysis tasks and is thus the cluster we relied on primarily for this project. Each of Geyser's 16 high-memory nodes has 1 TB of available memory and contains four 10-core, 2.4 GHz Intel Xeon E7-4870 processors. Further information about Yellowstone and Geyser can be found at
https://www2.cisl.ucar.edu/resources/computational-systems/yellowstone and
https://www2.cisl.ucar.edu/resources/computational-systems/geyser-and-caldera, respectively.
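As a quick consistency check, the per-node figures follow from the cluster totals quoted above. The short Python sketch below, with the published values hard-coded, reproduces the Yellowstone core count and shows that the installed memory works out to roughly 32 GB per node, of which 25 GB is usable.

    # Back-of-the-envelope check of the Yellowstone figures quoted above.
    nodes = 4536
    cores_per_node = 2 * 8  # two 8-core Xeon E5-2670 sockets per node

    print(nodes * cores_per_node)  # 72576, matching the quoted total core count

    # Raw memory per node; the usable per-node limit quoted above is 25 GB.
    total_memory_gb = 144.58 * 1000  # 144.58 TB expressed in GB
    print(round(total_memory_gb / nodes, 1))  # ~31.9 GB installed per node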
Computing time on Yellowstone is charged per node-hour, while computing time on Geyser is charged per core-hour. Since our project involved processing large GeoTIFF files, we often could not fully utilize the 16 cores of a Yellowstone node while staying beneath the 25 GB per-node memory limit, yet we would still be charged for the full node. Thus, we spent the bulk of our computing time on Geyser.
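To make this trade-off concrete, the hedged sketch below estimates how many worker processes fit on a node when each worker must hold a large GeoTIFF in memory. The 6 GB per-worker footprint is a hypothetical illustration, not a measurement from our runs; the node specifications are those quoted above.

    # Illustrative sketch: concurrent workers per node under a memory cap,
    # assuming each worker holds one large GeoTIFF in memory.

    def max_workers(mem_limit_gb, per_worker_gb, cores):
        """Workers are limited by whichever runs out first: memory or cores."""
        return min(cores, int(mem_limit_gb // per_worker_gb))

    per_worker_gb = 6.0  # hypothetical working set for one large GeoTIFF

    # Yellowstone: 16 cores per node, but only 25 GB of usable memory.
    print(max_workers(25, per_worker_gb, cores=16))  # 4 -> 12 cores sit idle

    # Geyser: 40 cores and 1 TB per node; memory is no longer the bottleneck.
    print(max_workers(1024, per_worker_gb, cores=40))  # 40 -> full utilization

Under node-hour charging, the idle Yellowstone cores are paid for regardless, whereas Geyser's core-hour charging bills only the cores actually used.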