High Performance Intercloud - Hokkaido University Information Initiative Center

HOKKAIDO UNIVERSITY
TROUBLE AND MAINTENANCE INFORMATION

A network fault occurred between the storage system and the computing subsystem because of a configuration error made during the installation of another system, and jobs running at that time terminated abnormally.
The tokens consumed by the abnormally terminated jobs are being restored.
We apologize for the inconvenience.

Failure date and time: 2019/10/10 19:40
Recovery date and time: 2019/10/10 20:10

We are planning urgent network maintenance on the Interdisciplinary Large-scale Computer System for the following period:

Schedule: 11:00 to 15:00 JST on Wednesday, October 2

During the above period, connections to the server may fail, or sessions may be interrupted intermittently.
Systems other than the network will not be stopped during this maintenance.

* The maintenance has been completed.

The JPCERT Coordination Center (JPCERT/CC) has issued an alert that multiple vulnerabilities have been discovered in Apache HTTP Server 2.4.

If you are running Apache HTTP Server on a cloud server, please consider updating it as soon as possible.

These vulnerabilities may cause:
・ Falsification of information
・ Redirection to malicious web pages
・ Information leakage
・ Denial-of-service (DoS) attacks

Please check the following pages for details.
https://jvn.jp/vu/JVNVU98790275/
https://httpd.apache.org/security/vulnerabilities_24.html

Affected versions:
* Apache HTTP Server versions prior to 2.4.41

Note that you can check the version you are currently using with commands such as:
‘httpd -v’ or ‘apachectl -v’

* Depending on the OS, these commands may still show an old version number, so please bring the OS up to date with a command such as ‘yum update’ or ‘apt-get update’ followed by ‘apt-get upgrade’.
* Please update the OS with care, because an update may cause the system to malfunction.
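
As a rough illustration of the version check above, the following Python sketch parses text in the style of the ‘httpd -v’ output and compares the reported version with 2.4.41. The function names parse_apache_version and is_older_than_fixed are hypothetical, and the sample string assumes the usual "Server version: Apache/x.y.z" format. Because the reported number may not reflect the fixes actually applied on a given OS, the result should be treated only as a first hint.

```python
import re

FIXED_VERSION = (2, 4, 41)  # versions prior to this are listed as affected in the notice


def parse_apache_version(version_output):
    """Extract (major, minor, patch) from `httpd -v` style output, or None."""
    match = re.search(r"Apache/(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_older_than_fixed(version_output):
    """Return True if the reported version is prior to 2.4.41."""
    version = parse_apache_version(version_output)
    if version is None:
        raise ValueError("could not find an Apache version string")
    # Tuples compare element by element, so (2, 4, 6) < (2, 4, 41).
    return version < FIXED_VERSION


# Sample text in the assumed format; in practice, paste the output of `httpd -v`.
sample = "Server version: Apache/2.4.6 (CentOS)"
print(is_older_than_fixed(sample))
```

Comparing integer tuples rather than strings avoids the pitfall that "2.4.6" sorts after "2.4.41" as text.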

We will carry out maintenance on the intercloud system during the following period:

  • Maintenance Period: 9:00am-11:00am on July 8, 2019 JST

During the above period, operations using the OpenStack Management Console or the OpenStack API may fail intermittently. If an operation fails, please retry it after the maintenance.

There is no impact on running servers.

* The maintenance has been completed.

The supercomputer file system currently has an issue that intermittently causes the following problems:

  • Users cannot log on to the login nodes of the supercomputer system and the application server.
  • A submitted job cannot be deleted.
  • A job whose status is shown as “Running” is not actually running.

Emergency maintenance is scheduled for July 8th. If you need to delete a job but cannot do so, please let us know at the following e-mail address:

hsay@iic.hokudai.ac.jp

If you fail to log on to the system, please try again after a brief interval.
We deeply apologize for the inconvenience and appreciate your understanding.

July 8th Follow-up report:
The emergency maintenance has been completed, and the system is being monitored. If you encounter any of the above problems, please let the staff know. Thank you for your understanding.

Because of the issue with the supercomputer storage system, emergency maintenance will be conducted on the following schedule:

July 8th (Monday), 2019, 9:00 – 21:00.

During the maintenance, the supercomputer system, including the login nodes, and the application server will be stopped, and users will not be able to log on to them. If a submitted job is predicted not to finish by 9:00 on July 8, it will remain queued and will start after the maintenance. If you set an appropriate elapsed time in your job script, the job may be able to run before the maintenance.
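
The scheduling rule above comes down to simple arithmetic: a job can run before the maintenance only if its possible start time plus its requested elapsed time ends no later than 9:00 on July 8. The Python sketch below illustrates that check. The function name fits_before_maintenance is hypothetical, and the scheduler's actual prediction logic may differ.

```python
from datetime import datetime, timedelta

# Start of the maintenance taken from the notice (local time, JST assumed).
MAINTENANCE_START = datetime(2019, 7, 8, 9, 0)


def fits_before_maintenance(start, elapsed_limit, maintenance_start=MAINTENANCE_START):
    """Return True if a job starting at `start` with the given elapsed-time
    limit would end no later than the start of the maintenance."""
    return start + elapsed_limit <= maintenance_start


# Example: a job that could start at 20:00 on July 7.
start = datetime(2019, 7, 7, 20, 0)
print(fits_before_maintenance(start, timedelta(hours=12)))  # limit ends at 8:00 on July 8
print(fits_before_maintenance(start, timedelta(hours=24)))  # limit ends at 20:00 on July 8
```

In this example, the 12-hour limit fits before the maintenance and the 24-hour limit does not, which is why a tighter elapsed time can let a job start earlier.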

We deeply apologize for the inconvenience and appreciate your understanding.

* The maintenance has been completed.

On the evening of June 20th, the supercomputer system experienced a hardware failure that affected some jobs. We are investigating the failure in order to restore the system. We apologize for the inconvenience.

(Update) The supercomputer system was restored at 0:00 on June 21st.

Periodic maintenance of the login nodes of the supercomputer system and the application servers is scheduled as follows.
During the maintenance, connections to these servers may fail, and established sessions may be disconnected.
Batch jobs that are queued or running on the supercomputer will not be affected by the maintenance.

– Schedule: 9:00 – 12:00 on Jun. 17, Aug. 19, Oct. 21, and Dec. 16, 2019, and Feb. 17, 2020.
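
For users who want to avoid interactive work during these windows, the schedule above can be encoded as a small lookup. The Python sketch below is only an illustration: the function name in_maintenance_window is hypothetical, and the times are assumed to be local time (JST).

```python
from datetime import date, datetime, time

# Maintenance dates listed in the notice; each window runs from 9:00 to 12:00.
MAINTENANCE_DATES = [
    date(2019, 6, 17),
    date(2019, 8, 19),
    date(2019, 10, 21),
    date(2019, 12, 16),
    date(2020, 2, 17),
]
WINDOW_START = time(9, 0)
WINDOW_END = time(12, 0)


def in_maintenance_window(moment):
    """Return True if `moment` falls on a listed date between 9:00 and 12:00."""
    on_listed_date = moment.date() in MAINTENANCE_DATES
    in_hours = WINDOW_START <= moment.time() <= WINDOW_END
    return on_listed_date and in_hours


print(in_maintenance_window(datetime(2019, 8, 19, 10, 30)))
print(in_maintenance_window(datetime(2019, 8, 19, 13, 0)))
```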

The IIC portal may be briefly unavailable for short maintenance after 5:00 p.m. each day.