If a server hangs but /var/log/messages shows no useful errors, the way to find the cause is to install and configure kdump so that it writes a kernel core dump when the server hangs. kdump may not help in some cases on Xen PV machines, but it is still worth trying, because on regular servers it does a good job. Here are the steps to install and configure kdump:
- Install kexec-tools:
yum install kexec-tools
- Edit /etc/kdump.conf and set the path variable to a directory with enough free space to hold the kernel dump file (the default location is /var/crash/). The dump file can be about the size of the server's RAM plus 1 GB.
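Before relying on the dump directory, it can help to check it against the "RAM plus 1 GB" estimate from the kdump.conf step. The sketch below is only an illustration: the function names kdump_path, required_kib and has_room are hypothetical, and it assumes the dump is written to the local filesystem (if kdump.conf also names a separate dump target, path is relative to that target instead).

```shell
#!/bin/sh
# Rough pre-flight check for the dump directory, using the
# "RAM + 1 GB" rule of thumb from this article (hypothetical helpers).

# Print the "path" directive from a kdump.conf-style file ($1),
# falling back to the default /var/crash when none is set.
kdump_path() {
    conf=${1:-/etc/kdump.conf}
    # Last uncommented "path <dir>" line wins; comment lines start with "#".
    p=$(awk '$1 == "path" { val = $2 } END { print val }' "$conf" 2>/dev/null)
    echo "${p:-/var/crash}"
}

# Print the estimated dump size in KiB: MemTotal plus 1 GiB (1048576 KiB).
required_kib() {
    awk '/^MemTotal:/ { print $2 + 1048576 }' "${1:-/proc/meminfo}"
}

# Succeed if the filesystem holding directory $1 has at least $2 KiB free.
has_room() {
    avail=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}
```

Example: has_room "$(kdump_path)" "$(required_kib)" || echo "not enough space for a full dump". The estimate is deliberately pessimistic; dumps filtered by makedumpfile are often smaller.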
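- Add the crashkernel boot parameter, which reserves memory for the capture kernel.
For CloudLinux 6, edit /etc/grub.conf and add the following to the kernel line as another boot parameter (or modify the existing one):
crashkernel=160M (or crashkernel=auto)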
For CloudLinux 7, edit /etc/default/grub and add crashkernel=auto to the GRUB_CMDLINE_LINUX parameter (or modify the existing value) so that it looks like this:
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
The official RHEL recommendation regarding the crashkernel value is here: https://access.redhat.com/solutions/916043
Then rebuild the GRUB configuration file with the following command:
grub2-mkconfig -o /boot/grub2/grub.cfg
For CloudLinux 6, add kdump to chkconfig and enable it at boot:
chkconfig --add kdump
chkconfig kdump on
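The CloudLinux 7 edit of /etc/default/grub can also be scripted, which is handy on many servers. This is a hedged sketch, not an official tool: add_crashkernel is a hypothetical helper, it assumes the quoted GRUB_CMDLINE_LINUX="..." form shown above, and the value must not contain a "/" because it is substituted into a sed expression.

```shell
#!/bin/sh
# Add or update crashkernel=<value> in GRUB_CMDLINE_LINUX of a
# /etc/default/grub-style file ($1). Value ($2) defaults to "auto".
# Assumes the quoted form: GRUB_CMDLINE_LINUX="...".
add_crashkernel() {
    file=${1:-/etc/default/grub}
    value=${2:-auto}
    out=$(mktemp) || return 1
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$file"; then
        # Replace the existing crashkernel=... token on that line only.
        sed "/^GRUB_CMDLINE_LINUX=/ s/crashkernel=[^\" ]*/crashkernel=$value/" "$file" > "$out"
    else
        # Insert crashkernel=... right after the opening quote.
        sed "s/^GRUB_CMDLINE_LINUX=\"/GRUB_CMDLINE_LINUX=\"crashkernel=$value /" "$file" > "$out"
    fi && cat "$out" > "$file"    # rewrite in place, keeping owner and mode
    status=$?
    rm -f "$out"
    return "$status"
}
```

Running add_crashkernel twice is safe: the second call replaces the existing value instead of adding a duplicate. The GRUB configuration still has to be rebuilt afterwards with grub2-mkconfig -o /boot/grub2/grub.cfg.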
- Edit /etc/sysctl.conf and add the following block so that hung tasks, soft lockups, oopses, stack overflows and unhandled NMIs cause a kernel panic, which in turn triggers kdump:
# Enable reboots on panic to allow kdump to make dumps
kernel.sysrq = 1
kernel.hung_task_panic = 1
kernel.panic = 1
kernel.panic_on_io_nmi = 1
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.softlockup_panic = 1
kernel.unknown_nmi_panic = 1
Apply the settings with sysctl -p (they are also loaded at boot).
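Once the settings are loaded (for example with sysctl -p or after a reboot), their live values can be compared with the block above by reading /proc/sys, where a key such as kernel.panic_on_oops maps to /proc/sys/kernel/panic_on_oops. The sketch below is illustrative only: check_panic_sysctls and WANTED are hypothetical names, and some keys may not exist on every kernel, which the check reports rather than hides.

```shell
#!/bin/sh
# Desired values, copied from the sysctl.conf block in this article.
WANTED="kernel.sysrq=1 kernel.hung_task_panic=1 kernel.panic=1
kernel.panic_on_io_nmi=1 kernel.panic_on_oops=1 kernel.panic_on_stackoverflow=1
kernel.panic_on_unrecovered_nmi=1 kernel.softlockup_panic=1 kernel.unknown_nmi_panic=1"

# Compare each key with its live value under /proc/sys (or a test root in $1).
# Prints one line per problem; returns non-zero if any key is off or missing.
check_panic_sysctls() {
    root=${1:-/proc/sys}
    status=0
    for kv in $WANTED; do
        key=${kv%%=*}
        want=${kv#*=}
        # kernel.panic_on_oops -> $root/kernel/panic_on_oops
        f="$root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$f" ]; then
            echo "$key: not available on this kernel"
            status=1
        elif [ "$(cat "$f")" != "$want" ]; then
            echo "$key: is $(cat "$f"), want $want"
            status=1
        fi
    done
    return "$status"
}
```

Reading /proc/sys directly avoids depending on sysctl's output format; an empty output and a zero exit status mean every key matches.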
Reboot the server so the crashkernel parameter takes effect, then check that kdump is running:
service kdump status
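Besides service kdump status, the kernel itself exposes whether the setup is live. The sketch below assumes the standard kexec sysfs files /sys/kernel/kexec_crash_size (bytes reserved for the capture kernel) and /sys/kernel/kexec_crash_loaded (1 once a capture kernel is loaded); kdump_ready is a hypothetical helper whose two optional arguments exist only so it can be pointed at other files.

```shell
#!/bin/sh
# Post-reboot readiness check. Defaults read the live system.
kdump_ready() {
    sys=${1:-/sys/kernel}
    cmdline=${2:-/proc/cmdline}

    # 1. The running kernel must have been booted with crashkernel=.
    if ! grep -q 'crashkernel=' "$cmdline" 2>/dev/null; then
        echo "crashkernel= is missing from the kernel command line"
        return 1
    fi

    # 2. The kernel must actually have reserved memory for the capture kernel.
    size=$(cat "$sys/kexec_crash_size" 2>/dev/null || echo 0)
    if [ "${size:-0}" = 0 ]; then
        echo "no memory is reserved for the crash kernel"
        return 1
    fi

    # 3. The kdump service must have loaded the capture kernel.
    loaded=$(cat "$sys/kexec_crash_loaded" 2>/dev/null || echo 0)
    if [ "$loaded" != 1 ]; then
        echo "crash kernel is not loaded (is the kdump service running?)"
        return 1
    fi

    echo "kdump ready: $size bytes reserved"
}
```

The checks run in boot order, so the first message printed points at the earliest step that did not take effect.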
Obtaining a core dump when the server hangs is described here.