
News
Here is all the latest news from the UCF ARCC:
Stokes Login Node Outages
- Written by Jamie Schnaitter
- Published: 24 November 2020
Changes due to ARCC Summer 2020 Maintenance Cycle
- Written by R. Paul Wiegand
- Published: 07 August 2020
Greetings ARCC users,
Stokes and Newton have returned to operation! Please remember that we have two scheduled maintenance downtimes per year: one after the Fall term and one, typically, after the Spring term (this year's maintenance was delayed to the Summer because of COVID).
Please take a moment to read over the changes:
- We retired all the old 12-core nodes (92 in all) and racked 28 new nodes, each with 48 cores. This means we retired 1,104 cores and added 1,344 cores! The new machines have faster networking and much more memory (384 GB per node).
- We replaced one of the controllers in our file system and performance-tested the file system to confirm that we had resolved some of the performance problems we saw over the Summer. We believe these problems are resolved.
- We moved all curated datasets into /datasets; /datasets/ImageDataSets no longer exists.
- We upgraded the OS on all the nodes to CentOS 7.8.2003.
- We upgraded our resource manager, SLURM, to version 20.02.3.
- We upgraded our NVIDIA GPU drivers to the latest versions and made the CUDA 11 module available to users.
- We upgraded some of our internal configuration tools and ran our regular compute, file system, and network benchmarks; all show that the clusters are operating as expected.
We appreciate your patience and wish you the best with your research!
Glenn & Paul
Stokes & Newton Down for A/C Repair 4/28 - 5/1
- Written by R. Paul Wiegand
- Published: 27 April 2020
As indicated in our many listserv messages and on our Facebook page, the ARCC must take Stokes and Newton down from Tue.28.Apr - Fri.1.May so that the A/C system can be replaced.
Power loss in research park, ARCC outage
- Written by R. Paul Wiegand
- Published: 16 January 2020
Unfortunately, there was a power loss in the Partnership III building last night, Thur.16.Jan 1:15a - 3:15a. The good news is that the infrastructure functioned as designed: the critical servers (file system, management, etc.) stayed up, and the UPS performed as expected. The bad news is that the outage lasted longer than our UPS could sustain, so nearly all compute nodes powered off and their running jobs were lost.
We are working to bring the nodes back online now. We apologize for the inconvenience and appreciate your patience.