Hello Wendian users,

Critical request: The Wendian scratch partition has reached 88% capacity. Please remove all unnecessary files on Wendian. If the filesystem reaches 95% capacity, we will begin purging data older than 180 days, per policy.

Announcements and Reminders

Implementation of Monthly Billing for HPC

This is an email reminder that Saturday, April 1st at 12:00 am will mark the end of preemption and the beginning of monthly billing cycles. If you need more information on the charge model, please see: https://ciarc.mines.edu/hpc-business-model.

Quality of Service (QoS) Changes

Quality of Service is a parameter used in Slurm to alter priority access to the job scheduler. Historically, Wendian had two main QoS options, full and normal. The full QoS allowed jobs to be submitted to the entire available pool of CPU nodes, while the normal QoS targeted a smaller pool with no preemption. Moving forward, please use the normal QoS. The full QoS will exist through the month of April and behave identically to normal. At the end of April, the full QoS will be removed; direct your scripts to the normal QoS now to avoid job errors in May. To do this, please add the following to your Slurm scripts:

#SBATCH -q normal
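For context, a minimal example job-script header with the QoS line in place might look like the following; the job name, node count, and time limit are placeholders, not recommendations:

```shell
#!/bin/bash
#SBATCH --job-name=example_job   # placeholder job name
#SBATCH --nodes=1                # placeholder resource request
#SBATCH --time=01:00:00          # placeholder time limit
#SBATCH -q normal                # normal QoS; the full QoS is removed at the end of April

# Your compute commands follow the #SBATCH directives.
```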

Pricing on HPC

Below is the current table of rates for the new charge model. Though we are charging the same price for low- and high-memory compute nodes, we will be monitoring usage and reaching out to users who use memory inefficiently. Your jobs will be routed to the appropriate node based on your memory request. Please request only the memory that your job requires.

These values will be kept up to date on the CIARC website: https://ciarc.mines.edu/hpc-storage-rates

Node Type     Rate per hour [USD]   CPU cores   Memory per CPU core [GB]   GPU
CPU           $0.02                 1           5 or 10*                   NA
GPU enabled   $0.12**               6           48                         1xV100

*There are two types of CPU nodes on Wendian: (1) a “low” memory node with 192 GB, and (2) a “high” memory node with 384 GB. Jobs will be routed to one of these node types depending on the requested resources.

**For GPU jobs, the V100 node has 4 GPU cards. For each GPU card you request, you must also pay for 6 CPU cores and 48 GB of memory, since this is ¼ of the available compute resources on the GPU node.
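As a rough illustration of the rates above (a back-of-the-envelope sketch only; actual charges come from the billing system, and the variable names here are illustrative):

```shell
# Rates from the table, in cents, to keep the arithmetic exact
cpu_core_rate_cents=2   # $0.02 per CPU core per hour
gpu_rate_cents=12       # $0.12 per GPU card per hour (includes 6 cores + 48 GB)

# An 8-core CPU job running for 24 hours:
echo "$(( 24 * 8 * cpu_core_rate_cents )) cents"    # prints "384 cents", i.e. $3.84

# A 2-GPU job running for 10 hours (the 6 cores per GPU are included in the rate):
echo "$(( 10 * 2 * gpu_rate_cents )) cents"         # prints "240 cents", i.e. $2.40
```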

Consultations for improved compute efficiency

We understand that a charge model means that individuals will want to run jobs as efficiently as possible. If you would like to reach out for a consultation on how best to utilize HPC resources, please use the following Help Center ticket request:

https://helpcenter.mines.edu/TDClient/1946/Portal/Requests/ServiceDet?ID=30287

Wendian Scratch Filesystem

/scratch is a short-term shared filesystem for storing data currently needed by active research projects; it is subject to purge on a six-month (180-day) cycle. There are no limits (within reason) on the amount of data stored. Wendian is currently at 88% of its 1 petabyte (1000 TB) storage capacity. Once the filesystem reaches the critical threshold of 95% capacity, access to Wendian will have to cease until the issue is resolved. This policy will remain in place: https://wpfiles.mines.edu/wp-content/uploads/ciarc/docs/pages/policies.html
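To help with cleanup, one way to review purge candidates is to list files that have not been modified in more than 180 days. This is a generic sketch: the path is a placeholder for your own scratch directory, and the command only lists files, deleting nothing.

```shell
# List (but do not delete) files untouched for more than 180 days.
# /scratch/$USER is a placeholder; substitute your actual scratch path.
SCRATCH_DIR="${SCRATCH_DIR:-/scratch/$USER}"
find "$SCRATCH_DIR" -type f -mtime +180 2>/dev/null || true
```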

Classroom Use on Wendian

As a reminder, Wendian HPC usage in classes will not be affected by the new model, and ITS will request an annual budget to cover classroom costs. If you are interested in using HPC for your class, you can submit a request here: https://helpcenter.mines.edu/TDClient/1946/Portal/Requests/ServiceDet?ID=38002

If you have further questions, please submit a ticket here.

Best,

HPC@Mines Staff