News


Here is all the latest news from the UCF ARCC:


Spring Maintenance cycle downtime, May 6 - May 12

Stokes and Newton will be taken down as part of our twice-yearly routine maintenance cycle in early May. Specifically, the clusters will be unavailable from the morning of Monday, May 6 through the evening of Sunday, May 12.

Recall that we now routinely bring the systems down twice a year, once in late Fall and once in late Spring. We will notify users in advance of each downtime, but we recommend building these windows into your workflow. Though we anticipate no data loss during this time, it is never a bad idea to back up your materials, so we suggest you use this opportunity to copy important data and code off of Stokes and Newton prior to the downtime.

The Newton GPU cluster has expanded!

Today we brought another 10 nodes online on our Newton GPU cluster! There are now 40 GPUs available on that cluster: 20 nodes, each with two NVIDIA V100 GPU cards. Twenty of these GPUs have 16 GB of on-board memory (nodes evc1-evc10), and the other twenty have 32 GB (nodes evc11-evc20). These resources were made possible by two awards received by the ARCC to support education and research at UCF. The ARCC continues to do its best to support the university's growing computational needs.

Changes due to Fall 2018 Maintenance Downtime

Stokes and Newton have returned to operation! Please remember that we have two such scheduled maintenance downtimes per year, one in late Fall (the one we just completed) and one in late Spring.

Please take a moment to read over the changes, some of which *will* affect usage:

  1. We have upgraded SLURM and changed its configuration in the following key ways:

    • We slightly reduced the amount of available memory on each node to mitigate node-thrashing problems we have been experiencing. This means that submission scripts that previously requested the maximum amount of memory explicitly will now return an error. Issue the following command to see how much memory is now available on each node:
       sinfo -Nel -p normal
    • Because of the above, we had to slightly reduce the default memory request per CPU from 1990 MB to 1950 MB. This will probably not affect most users; however, jobs that previously used *almost all* of the memory, but not quite, may be impacted. Submit a ticket if you have such a case and we will discuss how to address it; a sketch of an explicit memory request appears after this list.
    • Stokes and Newton will no longer permit direct ssh to compute nodes unless that node is allocated to one of your jobs. When you *do* ssh to a node where you have a running job, the ssh session will be added to that job and will be killed when the job ends. This change addresses problems we have had with rogue processes remaining after a job completed and impacting other users; see the interactive-session sketch after this list.

  2. Several pieces of software that had been marked "deprecated" during a previous downtime have been placed in an "unsupported" state to discourage their use. Contact us if you have a critical need for these, and we will either help you migrate to a newer solution or explain how you can gain access to the older, unsupported modules. In all cases, these builds received between six months and a year of notice. The unsupported software builds include:
    gcc-4.5.2
    impi-2017.2.174
    migrate-3.6.11-impi-2017-ic-2017

  3. Several pieces of software are now, or were previously, marked "deprecated" and will be removed in a *later* downtime. These deprecated software builds include:
    gcc-4.9.2 and all software built with it
    ic-2013.0.079 and all software built with it
    ic/ic-2015.1.133 and all software built with it
    ic/ic-2015.3.187 and all software built with it
    All versions of openmpi lower than 2.0 and all software built with them
    jdk/jdk-1.8.0_025 and all software built with it
    jdk/jdk-1.8.0_112 and all software built with it
    jdk/jdk-1.8.0_131 and all software built with it
    cuda/cuda-8.0
    matlab-R2014a
    petsc/petsc-3.7.3-lapack-3.5.0-openmpi-1.8.6-ic-2015.3.187
    petsc/petsc-3.7.5-lapack-3.5.0-openmpi-2.0.1-ic-2017.1.043

  4. Several new software builds were installed. Use "module avail" to see these; a short module-usage sketch appears after this list. The new software includes:
    cuda-10.0
    openmpi-4.0.0  
    ic-2019.1.144  (Intel Parallel Studio 2019)

  5. The Newton GPU cluster was re-racked into a single rack and switched over to 60-amp power. This is the first step needed to expand Newton in January. This should not impact users.

  6. Several internal tools that we use for managing the clusters were upgraded, including our environment module system (lmod). This should not impact users.
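
For those adjusting their memory requests (item 1 above), here is a minimal sketch of a batch script that asks for memory explicitly instead of relying on the new 1950 MB per-CPU default. The job name, partition, task count, and time limit are placeholders only; adjust them for your work, and check the per-node limits with the sinfo command shown above before requesting a node's full memory.

    #!/bin/bash
    #SBATCH --job-name=mem-example      # placeholder job name
    #SBATCH --partition=normal          # partition from the sinfo example above
    #SBATCH --ntasks=4                  # placeholder task count
    #SBATCH --mem-per-cpu=1950          # explicit per-CPU request in MB (the new default)
    #SBATCH --time=01:00:00             # placeholder time limit
    # Alternatively, request memory for the whole node instead of per CPU:
    # #SBATCH --mem=7800

    srun ./my_program                   # replace with your actual executable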
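
If you previously relied on ssh-ing directly into compute nodes, the sketch below shows one way to work interactively under the new policy: start a job (or allocation) first, and any ssh session to a node running your job will be attached to that job and end with it. The partition, node count, and time values are placeholders.

    # Request an interactive shell on a compute node (placeholder values):
    srun --partition=normal --ntasks=1 --time=01:00:00 --pty bash

    # Or create an allocation first, then ssh to the node it lands on:
    salloc --partition=normal --nodes=1 --time=01:00:00
    squeue -u $USER            # find which node was allocated
    ssh <allocated-node>       # this session is tied to the job and ends with it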
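
To see what is installed and to move from a deprecated build to a supported one, the usual lmod commands apply. The module names below are illustrative; run "module avail" to see the names actually published on the clusters.

    module avail                   # list all published modules
    module spider openmpi          # search for a particular package (lmod)
    module load cuda/cuda-10.0     # load one of the new builds (illustrative name)
    module list                    # confirm what is currently loaded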

Fall Maintenance Cycle (December 10-16, 2018)

Stokes and Newton will be taken down as part of our twice-yearly routine maintenance cycle in mid-December. Specifically, the clusters will be unavailable from the morning of Monday, December 10 through the morning of Monday, December 17.

The primary objective during this downtime is to upgrade our scheduler, SLURM, and to change some of its default options. There will also be some changes made to the Python installs to bring more consistency across versions. We will provide more detail in the change log when we bring the system back online.

Recall that we now routinely bring the systems down twice a year, once in late Fall and once in late Spring. We will notify users in advance of each downtime, but we recommend building these windows into your workflow. Though we anticipate no data loss during this time, it is never a bad idea to back up your materials, so we suggest you use this opportunity to copy important data and code off of Stokes and Newton prior to the downtime.

About the UCF ARCC:
The University of Central Florida (UCF) Advanced Research Computing Center is managed by the Institute for Simulation and Training, with subsidies from the UCF Provost and the Vice President for Research and Commercialization, for use by all UCF faculty and their students. Collaboration with other universities and industry is also possible.
Contact Info:
UCF Advanced Research Computing Center
3039 Technology Parkway, Suite 220
Orlando, FL 32826
P: 407-882-1147