News


Here is all the latest news from the UCF ARCC:


Fall 2021 UCF ARCC Maintenance Outage

UCF ARCC Maintenance Outage: Thursday, December 9 through Wednesday, December 15, 2021

Stokes and Newton will be taken down per our bi-annual routine maintenance cycle near the end of Fall term. Specifically, the clusters will be unavailable from Thursday, December 9 through Wednesday, December 15.

The primary objective during this downtime is to install a new, faster InfiniBand network.

Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring/Summer. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy important data and code off Stokes and Newton prior to the downtime.

SLURM Jobs marked as “Requested Node not available, Reserved for maintenance”?

In preparation for the maintenance windows, the ARCC staff place a reservation on compute nodes starting at the beginning of the maintenance window. This ensures that no jobs are running when the staff need to begin shutting down the cluster. If you submit a job that requests more time than remains between submission and the start of the reservation, your job will stay in the queue with the status “Requested Node not available, Reserved for maintenance”. Jobs in this state will not start until after the maintenance window. If you believe your job can finish before the maintenance window, cancel the job and resubmit it with a shorter time request. As with all resource requests, providing reasonable run time estimates to SLURM helps the cluster operate efficiently.
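As a rough illustration of the arithmetic involved, the sketch below computes the longest time request that would still end before a reservation begins and formats it in SLURM's D-HH:MM:SS form. The function name, the 10-minute safety margin, and the 08:00 start time are assumptions made for this example only; the announcement gives just the date, and the actual start of the reservation can be checked on the cluster with `scontrol show reservation`.

```python
from datetime import datetime, timedelta


def max_time_limit(now, window_start, margin=timedelta(minutes=10)):
    """Return the longest SLURM time limit (D-HH:MM:SS) that ends before
    window_start, or None if no time is left.

    The margin is a hypothetical safety buffer, not an ARCC policy.
    """
    remaining = window_start - now - margin
    if remaining <= timedelta(0):
        return None
    # Round down to whole minutes so the request never overshoots.
    total_minutes = int(remaining.total_seconds() // 60)
    days, rest = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:00"


# Assumed example values: the reservation is taken to start at 08:00 on
# December 9, 2021, and the job is submitted at 14:30 on December 7.
window_start = datetime(2021, 12, 9, 8, 0)
submitted_at = datetime(2021, 12, 7, 14, 30)
print(max_time_limit(submitted_at, window_start))
```

A queued job that asked for too much time could then be cancelled with `scancel <jobid>` and resubmitted with `sbatch --time=<value>`, using a value no larger than the one computed here and still large enough for the job to finish.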

Summer 2021 Maintenance Downtime: August 16-22, 2021

UCF ARCC Maintenance Outage: Monday, August 16 through Sunday, August 22, 2021

Stokes and Newton will be taken down per our bi-annual routine maintenance cycle near the end of Summer term. Specifically, the clusters will be unavailable from Monday, August 16 through Sunday, August 22.

The primary objectives during this downtime are to complete some re-wiring, install some new support equipment, and update some software and data on the clusters.

Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring/Summer. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy important data and code off Stokes and Newton prior to the downtime.

SLURM Jobs marked as “Requested Node not available, Reserved for maintenance”?

In preparation for the maintenance windows, the ARCC staff place a reservation on compute nodes starting at the beginning of the maintenance window. This ensures that no jobs are running when the staff need to begin shutting down the cluster. If you submit a job that requests more time than remains between submission and the start of the reservation, your job will stay in the queue with the status “Requested Node not available, Reserved for maintenance”. Jobs in this state will not start until after the maintenance window. If you believe your job can finish before the maintenance window, cancel the job and resubmit it with a shorter time request. As with all resource requests, providing reasonable run time estimates to SLURM helps the cluster operate efficiently.

Changes due to ARCC Fall 2020 Maintenance Cycle

Stokes and Newton have returned to operation early! Please remember that we have two such scheduled maintenance downtimes per year, one after Fall term (the one we just completed) and one after Spring term.

Please take a moment to read over the changes:

  1. We updated SLURM (our job scheduler) to version 20.11.0.
  2. We updated the Nvidia driver on the Newton compute nodes.
  3. We updated the default compiler flags for the Intel Compiler to reflect the hardware currently used within the clusters.
  4. We cleaned up the curated data sets so that the only copy is in /datasets (copies in /groups areas were removed and replaced with links).
  5. We processed expired resource expansion requests. Some requests had expired without the affected users' resource limits being reset; those limits have now been returned to the defaults.
  6. We upgraded some infrastructure wiring that had been pending.

We appreciate your patience and wish you the best with your research!

Fall 2020 Maintenance Window

UCF ARCC Maintenance Outage: Friday, December 11, 2020 through Friday, December 18, 2020

Stokes and Newton will be taken down per our bi-annual routine maintenance cycle near the end of Fall term. Specifically, the clusters will be unavailable from Friday, December 11 through Friday, December 18.

The primary objectives during this downtime are to complete some re-wiring, install some new support equipment, and update some software and data on the clusters.

Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy important data and code off Stokes and Newton prior to the downtime.

SLURM Jobs marked as “Requested Node not available, Reserved for maintenance”?

In preparation for the maintenance windows, the ARCC staff place a reservation on compute nodes starting at the beginning of the maintenance window. This ensures that no jobs are running when the staff need to begin shutting down the cluster. If you submit a job that requests more time than remains between submission and the start of the reservation, your job will stay in the queue with the status “Requested Node not available, Reserved for maintenance”. Jobs in this state will not start until after the maintenance window. If you believe your job can finish before the maintenance window, cancel the job and resubmit it with a shorter time request. As with all resource requests, providing reasonable run time estimates to SLURM helps the cluster operate efficiently.

About the UCF ARCC:
The University of Central Florida (UCF) Advanced Research Computing Center is managed by the Institute for Simulation and Training, with subsidies from the UCF Provost and Vice President for Research and Commercialization, for use by all UCF faculty and their students. Collaboration with other universities and industry is also possible.
Contact Info:
UCF Advanced Research Computing Center
3039 Technology Parkway, Suite 220
Orlando, FL 32826
P: 407-882-1147