News


Here is all the latest news from the UCF ARCC:


Power loss in Research Park, ARCC outage

Last night (Thursday, September 26) at 11:30 p.m., a commercial power loss in Research Park caused an outage in the Partnership buildings, where our clusters reside. The outage lasted 2.5 hours. The good news is that our system functioned precisely as designed: our UPS held the load of the cluster for the designed length of time, and the core infrastructure elements (e.g., file system, InfiniBand backbone) were protected the entire time by the generator.

The bad news is that the length of the outage exceeded our UPS's designed limits, which means that almost all compute nodes lost power. All jobs on Stokes and many jobs on Newton were lost. We apologize for the inconvenience.

The clusters are back up and accessible now. Queued jobs have deployed and are running.

Stokes and Newton back up after Dorian

ARCC resources have been brought back online now that Hurricane Dorian has passed.


ARCC Shutting down for Dorian

Stokes and Newton will be shut down this SATURDAY in advance of Hurricane Dorian. Please plan accordingly and stay safe.

Addendum: Our clusters are DOWN for the duration of the storm. We will keep you posted.

Changes due to Spring 2019 Maintenance Downtime

Stokes and Newton have returned to operation two days early! Please remember that we have two such scheduled maintenance downtimes per year, one after Fall term and one after Spring term (the one we just completed).

Please take a moment to read over the changes, most of which will not affect usage:

  1. We increased the default inode count for user accounts from 400K files to 1 million files.
  2. We upgraded our NVIDIA GPU drivers to the latest versions and made the CUDA 10.1 module available to users.
  3. We repaired a minor problem that was preventing SLURM from sending email about jobs when configured to do so.
  4. We revised our naming standards for ARCC email addresses so that they are more general and more consistent with our other standards. The most relevant change that affects users is as follows. For the new address, check your email or go here. The old email address will redirect to the new address for some time; however, users should transition to the new address.
  5. Some minor configuration changes and testing were performed on our NFS server to address some performance issues. This is not the shared file system where user data resides, but is instead the file system where the modules and applications are stored. The system benchmarks acceptably now.
  6. We completed our regular benchmarking of the compute, file system, and network performance. All show the clusters operating as expected.
  7. A review of our Anaconda installs revealed a number of errors. We have temporarily *removed* Anaconda from our applications and modules list while we rebuild these installs. This means that our JupyterHub gateways are also temporarily down. These will all be re-deployed soon.
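
For users who want a rough sense of how close they are to the new 1 million file limit in item 1, the following is a minimal sketch, not an official ARCC tool. It assumes that each file, directory, and symlink under a directory counts as one entry toward the limit; the way the file system actually accounts for inodes may differ (hard links, for example, are counted once per name here). The function name `count_entries` is hypothetical.

```python
import os


def count_entries(root):
    """Approximate the number of inodes used under root.

    Counts the root directory itself plus every file, directory, and
    symlink found beneath it. Symlinked directories are counted but
    not followed, so the walk cannot loop or leave the tree.
    """
    total = 1  # the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        total += len(dirnames) + len(filenames)
    return total
```

Calling `count_entries(os.path.expanduser("~"))` and comparing the result with 1,000,000 gives an estimate of remaining headroom. On a large home directory the walk can take a while.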

We appreciate your patience and wish you the best with your research!

Paul & Glenn.

About the UCF ARCC:
The University of Central Florida (UCF) Advanced Research Computing Center is managed by the Institute for Simulation and Training, with subsidies from the UCF Provost and Vice President for Research and Commercialization, for use by all UCF faculty and their students. Collaboration with other universities and industry is also possible.
Contact Info:
UCF Advanced Research Computing Center
3039 Technology Parkway, Suite 220
Orlando, FL 32826
P: 407-882-1147