Data capacity

The Forest Advanced Computing and Artificial Intelligence Laboratory (FACAI) at Purdue University has its own high-performance computing (HPC) clusters at the Rosen Center for Advanced Computing (RCAC, http://www.rcac.purdue.edu), the research computing arm of Information Technology at Purdue (ITaP), the University’s central IT organization. FACAI provides a large and diverse set of high-performance computing, data-intensive computing, and cloud computing resources, with high-speed connections to national research wide-area networks and large data storage and archival systems, to the broad research communities at Purdue and elsewhere.

Through a five-year agreement (renewable in 2023), FACAI has access to the 1 TB RAM nodes of the Snyder Cluster at RCAC (Table 1). With this cluster, FACAI can process CSV files up to 1 TB in size, or RData files up to 300 GB in size, without resorting to parallel computing. Snyder is a Purdue Community Cluster optimized for data-intensive applications that require large amounts of shared memory per node. Snyder was originally built through a partnership with HP and Intel, and has most recently been expanded with nodes from Dell. Snyder consists of a variety of compute node configurations, as summarized in Table 1. All nodes have 40 Gbps Ethernet connections and a 5-year warranty.

Table 1. Summary of Snyder compute node hardware. FACAI has access to Sub-Cluster D, which has 1 TB of RAM per node.

FACAI provides a high-capacity, high-speed, reliable, and secure data storage service through the Data Depot, an enterprise-class GPFS storage solution with an initial total capacity of over 2 PB. Currently, FACAI has 3 TB of data storage in place for all on-campus and off-campus collaborators, and can increase this capacity based on future computing needs and budget.

The centralized data storage available through FACAI offers the following features and benefits:

Availability — FACAI has 3 TB of data storage already in place for all on-campus and off-campus collaborators, and can purchase additional storage in increments of 1 TB at a competitive annual price.

Accessibility — The storage is accessible as a Windows or Mac OS X network drive on personal and lab computers on campus, directly on Community Cluster nodes, and from other universities or labs through Globus.

Capability — FACAI facilitates joint work on shared files across our research group, avoiding the need for numerous copies of datasets across individuals’ home or scratch directories. It is an ideal place to store group applications, tools, scripts, and documents.

Controllable Access — Clients have direct control over access management. Information Technology at Purdue (ITaP) will create Unix groups for our group and assist in setting appropriate permissions to allow exactly the access that is needed and prevent any that is not.

Reliability — Through built-in redundancy, all data are protected against hardware failures and accidental deletion. All data are mirrored at two different sites on the Purdue campus for greater reliability and protection against physical failures.

Restricted Data — FACAI data storage is suitable for non-HIPAA human-subject data, but is currently not approved for regulated data, including HIPAA, ePHI, or FISMA data.
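
As an illustration of the Unix-group permissions mentioned under Controllable Access, the following is a minimal Python sketch of one common way to restrict a shared directory to its owner and group. The mode 0o2770 and the function names make_group_shared and others_have_access are assumptions chosen for illustration; they do not describe the actual permissions or tooling that ITaP configures on the Data Depot.

```python
import os
import stat

# Assumed policy for illustration only (not ITaP's documented configuration):
# owner and group get read/write/execute, all other users get no access,
# and the setgid bit asks that new files inherit the directory's group.
SHARED_DIR_MODE = stat.S_ISGID | stat.S_IRWXU | stat.S_IRWXG  # 0o2770


def make_group_shared(path: str) -> int:
    """Apply SHARED_DIR_MODE to `path` and return the permission bits now set.

    The caller must own `path`. Some systems silently drop the setgid bit
    when the caller is not a member of the directory's group.
    """
    os.chmod(path, SHARED_DIR_MODE)
    return stat.S_IMODE(os.stat(path).st_mode)


def others_have_access(mode: int) -> bool:
    """Return True if users outside the owner and group have any permission."""
    return bool(mode & stat.S_IRWXO)
```

Under these assumptions, calling make_group_shared on a group-owned project directory and then passing the returned mode to others_have_access gives a quick check that no one outside the group can read or modify the shared files.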