
News
Here is all the latest news from the UCF ARCC:
Spring 2018 Maintenance Cycle (May 19-27, 2018)
- Written by R. Paul Wiegand
- Published: 27 April 2018
Stokes and Newton will be taken down per our bi-annual routine maintenance cycle during mid-May. Specifically, the clusters will be unavailable from the morning of Saturday, May 19 through the morning of Monday, May 28.
The primary objective during this downtime is to upgrade the clusters' underlying operating system to CentOS 7.4. There will also be some changes to the R installations to bring more consistency across versions. We will provide more detail in the change log when we bring the system back online.
Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy salient data and code off of Stokes prior to the downtime.
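For those who want to make a copy before the downtime, the sketch below pulls a directory from Stokes to a local machine with rsync over SSH. The hostname, username, and paths are placeholders, not the actual ARCC addresses; substitute the login or data-transfer node and directories you normally use.

    # Pull a project directory from Stokes to a local backup directory.
    # NOTE: "stokes.example.edu", "youruser", and the paths are placeholders.
    rsync -avz --progress youruser@stokes.example.edu:/home/youruser/myproject/ ~/stokes-backup/myproject/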
End of ARCC Fall 2017 Maintenance Cycle
- Written by R. Paul Wiegand
- Published: 17 December 2017
Stokes and Newton have returned to operation! Please remember that we have two such maintenance downtimes per year, one in late Fall (the one we just completed) and one in late Spring.
Please take a moment to read over the changes:
- Eight new nodes were installed (ec[49-56]). They all have the new Skylake Intel processors (32 cores per node) and 192 GB of memory.
- We replaced our data transfer node with a new machine.
- The scheduling software was upgraded to SLURM 17.11.0. All commands and script syntax should behave as they did before.
- All software built directly against SLURM libraries (e.g., certain MVAPICH2 and OpenMPI libraries and software depending on them) is no longer available. Please contact us if you have trouble finding an alternative module; the usage example after this change list shows one way to search for alternatives.
- The file system firmware, Lustre server software, and Lustre client software were upgraded. This should not affect users.
- Our Python builds had a number of inconsistencies and errors, so we took this opportunity to clean them up and rebuild everything. There are now only four Python modules, so some of you may have to change your scripts to use the new module names. The Python modules are:
- python/python-2.7.14-gcc-7.1.0
- python/python-3.6.3-gcc-7.1.0
- python/python-2.7.14-ic-2017.1.043
- python/python-3.6.3-ic-2017.1.043
- The environment module system was replaced with a new system, Lmod. The syntax for this system is the same as the old, so your scripts should not have to change; there is a short usage example after this change list. The new system also has a lot more functionality. For more information, see: http://lmod.readthedocs.io/en/latest/010_user.html
- Because we have a new module system, we had to rewrite all our module scripts. The vast majority of modules were replicated in the new system just as they were in the old. If you experience problems with any new modules, please submit a ticket by sending email to req...@ist.ucf.edu. There were some changes to a few modules:
- apache-maven-3.5.0 was renamed to maven-3.5.0
- scalapack-2.0.2-mvapich2-2.2-ic-2017.1.043 was removed
- protobuf-3.1.0-gcc-6.2.0 was removed
- vasp-5.4-openmpi-1.8.6-ic-2015.3.187 was renamed to vasp-5.4-openmpi-1.8.3-ic-2015.3.187
- libdrm-2.4.81-ic-2017 was renamed to libdrm-2.4.81-ic-2017.1.043
- meep-1.3-gcc-4.9.2 was renamed to meep-1.3-openmpi-1.8.3-gcc-4.9.2
- partitionfinder-1.1.1-ps1 was renamed to partitionfinder-1.1.1-pf1
- petsc-3.5.2-openmpi-1.8.3-ic-2013 was renamed to petsc-3.5.2-openmpi-1.8.3-gcc-4.9.2
- qt-4.8.3-gcc-4.9.2 was renamed to qt-4.8.3-gcc-6.2.0
- qt-5.8.0-ic-2017 was renamed to qt-5.8.0-ic-2017.1.043
- torch-cuda-7.5.18-openblas-0.2.13-ic-2015.3.187 was renamed to torch-cuda-7.5.18-openblas-0.2.13-gcc-4.9.2
- openblas-0.2.13-gcc-6.2.0-openmp was renamed to openblas-0.2.13-gcc-6.2.0-useopenmp-build.sh
- armadillo-7.900-1-gcc-6.2.0 was renamed to armadillo-7.900.1-gcc-6.2.0
- arlequin-3552 was renamed to arlequin-3522
- R-3.4.1-openmpi-2.1.1-gcc-7.1.0 was removed
- pegasus-4.7.4-gcc-7.1.0 was removed
- All slurm-16.05 module variants were removed, as described in the SLURM libraries item above
- The Python modules were corrected as described in the Python item above
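As a quick illustration of the new module system, the following is a minimal sketch of finding and loading one of the rebuilt Python modules. The commands are standard Lmod/environment-module commands, and the module names are taken from the list above; the exact output will depend on what is installed on Stokes.

    module avail                                # list modules on the default module path
    module spider openmpi                       # Lmod-specific: search all module trees for OpenMPI variants
    module load python/python-3.6.3-gcc-7.1.0   # one of the four rebuilt Python modules
    module list                                 # confirm what is currently loaded
    python3 --version                           # sanity-check the interpreter (binary name may be python or python3 depending on the build)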
Fall maintenance cycle, Dec. 9 - Dec. 17
- Written by R. Paul Wiegand
- Published: 20 November 2017
Stokes and Newton will be taken down per our bi-annual routine maintenance cycle during mid-December. Specifically, the clusters will be unavailable from the morning of Saturday, December 9 through the morning of Sunday, December 17.
Changes made during this downtime will only minimally affect users. We will provide more detail in the change log when we bring the system back online. However, the work will include some minor changes to module names and an upgrade to the latest SLURM release.
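Assuming the SLURM upgrade keeps the existing command and script syntax, as previous upgrades here have, an existing job script should continue to submit unchanged with sbatch after the downtime. The minimal sketch below is purely illustrative, not an ARCC-provided template; the job name, resource requests, and workload are placeholders.

    #!/bin/bash
    #SBATCH --job-name=sanity-check   # illustrative job name
    #SBATCH --nodes=1                 # request a single node
    #SBATCH --ntasks=1                # and a single task
    #SBATCH --time=00:05:00           # five-minute wall-clock limit
    # Trivial workload: report which compute node the job landed on.
    hostname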
Recall that we now routinely bring the system down twice a year, once in late Fall and once in late Spring. We will notify users in advance of such downtimes, but we recommend you build these expectations into your workflow. Though we anticipate no data loss during this time, it's never a bad idea to back up your materials, so we suggest you use this opportunity to copy salient data and code off of Stokes prior to the downtime.
Brief outage of Stokes management server Fri.18.Aug @ 10a
- Written by R. Paul Wiegand
- Published: 16 August 2017
We need to reboot the Stokes server that is responsible for running our resource manager. We plan to do so this coming Friday at 10a. Jobs that are in the queue will remain in the queue, jobs that are running on compute nodes will continue to run, and you will still be able to log in to Stokes and copy files. The only impact will be a brief interruption in your ability to obtain information from the scheduler or to submit jobs (sbatch, squeue, sinfo, and srun will be unavailable).
We appreciate your patience.
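Once the management server is back up, a couple of standard SLURM queries are enough to confirm that the scheduler is responding again; the commands below are generic SLURM commands, not ARCC-specific tools.

    squeue -u $USER   # jobs you currently have queued or running
    sinfo             # partition and node availability as reported by the scheduler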