The application is fully functional and running: automatic reconnection and session replication work when problems occur.
Goal of the document:
The primary goal of this document is to describe the procedure by which IT Ops can restore full functionality of the application after various types of outages. The secondary goal is to record how the application behaves during those outages.
Abbreviations:
AS - Application server, DB - Database, FS - Filesystem, OS - Operating system, P - Primary, S - Secondary
Activity 01: AS - CPU at 100% on the OS where the AS runs
The procedure to be performed to make the application fully functional again
1
Check that the application is running and monitor the logs or the servlet. The application runs on both sides of the AS.
After the OS resources are released, the application's response and functionality should be restored. If they are not, the application must be restarted.
2
Load the CPU to 100% on the OS where all AS instances are running. Cooperation of the OS admins is required.
3
Testers log in to the application
4
Testing and evaluation of application behavior
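The CPU load in step 2 can be produced with a small load generator. The following is only a minimal sketch, assuming Python 3 is available on the OS under test; the function name burn_cpu and its parameters are hypothetical and not part of the application or platform. It starts one busy-loop process per CPU core for a fixed number of seconds, so the load ends by itself and the OS resources are released without manual cleanup.

```python
import os
import subprocess
import sys
import time
from typing import Optional

# Code run by every worker process: spin until the requested time has passed.
_WORKER_CODE = (
    "import sys, time\n"
    "deadline = time.monotonic() + float(sys.argv[1])\n"
    "while time.monotonic() < deadline:\n"
    "    pass\n"
)


def burn_cpu(seconds: float, workers: Optional[int] = None) -> float:
    """Keep `workers` processes busy for `seconds` and return the elapsed time.

    By default one worker is started per CPU core reported by the OS.
    """
    count = workers or os.cpu_count() or 1
    start = time.monotonic()
    procs = [
        subprocess.Popen([sys.executable, "-c", _WORKER_CODE, str(seconds)])
        for _ in range(count)
    ]
    for proc in procs:
        proc.wait()  # block until every worker has finished its busy loop
    return time.monotonic() - start
```

For example, burn_cpu(300) would keep all cores busy for roughly five minutes. The run should still be agreed with the OS admins, as the step requires.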
Step No.
Activity 02: AS - Filling the filesystem on the OS where the AS runs
The procedure to be performed to make the application fully functional again
1
Check that the application is running and monitor the logs or the servlet. The application runs on both sides of the AS.
After space on the FS is freed, the application must be restarted. External components reconnect automatically after the AS recovers.
2
Testers log in to the application
3
Fill the FS to 100% on the OS where all AS instances are running. /tmp, /logs, /opt, and /data should be considered.
4
Testing and evaluation of application behavior
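The FS fill in step 3 can be scripted. The sketch below is an illustration under assumptions, not a prescribed tool: the helper name fill_directory and the max_bytes safety cap are hypothetical. It writes a single filler file into the chosen directory and stops either at the cap or when the OS reports that no space is left. Deleting the returned file releases the space again.

```python
import errno
import os
import tempfile
from typing import Tuple

CHUNK = 1024 * 1024  # write in 1 MiB pieces


def fill_directory(target_dir: str, max_bytes: int) -> Tuple[str, int]:
    """Write a filler file of up to `max_bytes` into `target_dir`.

    Stops early if the filesystem runs out of space (ENOSPC).
    Returns the filler file path and the number of bytes written.
    """
    fd, path = tempfile.mkstemp(prefix="fs_fill_", dir=target_dir)
    written = 0
    try:
        while written < max_bytes:
            size = min(CHUNK, max_bytes - written)
            try:
                # Zero bytes are used for speed; a compressing filesystem
                # may store them in less space than their nominal size.
                written += os.write(fd, b"\0" * size)
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    break  # filesystem is full, which is the goal of the step
                raise
    finally:
        os.close(fd)
    return path, written
```

After the test, os.remove(path) on the returned path frees the space, after which the recovery procedure for this activity applies.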
Activity 03: AS - Full disaster (more time-consuming)
The procedure to be performed to make the application fully functional again
1
Stop all application components, nodes, and admin servers of the platform.
The AS starts automatically at OS startup.
2
Full archive of the whole environment: standard backup plus a tar of the important mount points
3
OS crash simulation
4
Restore the OS
5
Restore the backup
6
Start the environment, check the connection to the admin console, and start the application
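The tar of important mount points in step 2 can be sketched as follows. This is a minimal illustration using Python's standard tarfile module; the function name archive_mount_points is hypothetical, and which mount points count as important is left to the environment owner. The function writes a gzip-compressed archive and then reads back the stored member names, so the content of the archive can be checked before the crash simulation.

```python
import tarfile
from typing import Iterable, List


def archive_mount_points(paths: Iterable[str], archive_file: str) -> List[str]:
    """Write a gzip-compressed tar of `paths` and return the stored member names."""
    with tarfile.open(archive_file, "w:gz") as tar:
        for path in paths:
            # Directories are added recursively; a leading "/" is dropped
            # from the stored names.
            tar.add(path)
    # Re-open the archive to confirm it is readable and list what it holds.
    with tarfile.open(archive_file, "r:gz") as tar:
        return tar.getnames()
```

The archive file should be written to a location outside the archived paths and outside the OS that will be crashed, otherwise it is lost together with the environment.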
Scenario 01: DB - Database outage (single-instance DB)
The procedure to be performed to make the application fully functional again
1
Check that the application is running and monitor the logs or the servlet. The application runs on both sides of the AS.
The application has the ability to automatically reconnect to the database.
2
Restart DB
3
Test the basic functionality of the application after the DB has started
Note
For applications that also run in a "closing" mode (EndOfDay processing, etc.), Scenario 01 DB must also be executed during that processing.
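The automatic reconnect mentioned in this scenario is a property of the application and its internals are not described here. For a test script that has to wait for the DB to come back, a generic retry with exponential backoff could look like the sketch below. All names (connect_with_retry, the connect callable, the parameters) are hypothetical, and the connect callable is assumed to raise ConnectionError while the DB is down.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` tries have failed.

    The wait between tries starts at `initial_delay` seconds and is
    multiplied by `factor` after every failed try.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of tries: report the last failure to the caller
            sleep(delay)
            delay *= factor
    raise ValueError("attempts must be at least 1")
```

With the defaults, a DB that comes back within about 15 seconds (1 + 2 + 4 + 8) is picked up without manual action; a longer outage surfaces as the last ConnectionError.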
The procedure to be performed to make the application fully functional again
1
Check that the application is running and monitor the logs
Excalibur Facade has the ability to automatically reconnect to AS
2
Shut down the Excalibur Facade servers while the application is running. Test the functionality. Then start the Excalibur Facade servers again.
3
Test the basic functionality of the application
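The log monitoring in step 1 can be partly automated. The sketch below is an assumption-laden illustration: the level keywords ERROR, FATAL, and SEVERE are guesses at what the logs contain, and the function name find_error_lines is hypothetical. It accepts any iterable of lines, including an open log file, and returns the lines that carry one of the keywords.

```python
import re
from typing import Iterable, List

# Assumed level keywords; adjust to the format the actual logs use.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL|SEVERE)\b")


def find_error_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that contain one of the assumed error-level keywords."""
    return [line.rstrip("\n") for line in lines if ERROR_PATTERN.search(line)]
```

Running it over the log before the shutdown and again after the restart gives a simple way to see which errors the outage produced.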
Activity 04: Excalibur Facade (ADDC) - CPU utilization at 100% on the OS where the facade service is running
The procedure to be performed to make the application fully functional again
1
Check that the application is running and monitor the logs
After the OS resources are released, the component's response and functionality should be restored. If they are not, a service restart is required.
2
Load the CPU to 100%. Cooperation of the OS admins is required.
3
Testing and evaluation of application behavior
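The recovery rule for this activity (wait for the component to respond again, restart the service if it does not) can be expressed as a polling loop. This is a minimal sketch: the function name wait_for_recovery is hypothetical, and is_healthy stands for whatever check the tester uses, for example a basic request against the facade. The timeout value is a decision for the test team and is not given in this document.

```python
import time
from typing import Callable


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Poll `is_healthy` until it returns True or `timeout` seconds have passed.

    Returns True if the component recovered on its own and False if the
    timeout expired, in which case a service restart is the next step.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A False result corresponds to the "service restart is required" branch of the procedure.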
Activity 05: Excalibur Facade (ADDC) - Fill the file system where the facade service is running
The procedure to be performed to make the application fully functional again
1
Check that the application is running and monitor the logs
After space on the FS is freed, a service restart is required.
2
Fill the FS to 100%. Cooperation of the OS admins is required.
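To confirm that the FS is actually full before testing, and that space has been freed before the service restart, the usage can be read programmatically. The sketch below uses shutil.disk_usage from the Python standard library; the function name usage_percent is hypothetical. The value may differ slightly from what df reports, because df typically excludes blocks reserved for the superuser.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return the used share, in percent, of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

Calling it for the directory where the facade service writes its data before and after the fill shows whether the intended filesystem was the one affected.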