If a server hangs but /var/log/messages shows no useful errors, you can find the cause by installing and configuring kdump, which creates a core dump file when the server hangs.
kdump may not work on Xen PV machines; however, we still recommend trying it, because it works well on regular servers. Here are the steps to install and configure kdump:
- Install `kexec-tools`:
yum install kexec-tools
- Edit `/etc/kdump.conf` and set the `path` option to a directory with enough free space to hold the kernel dump file (the default location is `/var/crash/`). The file will be about the size of the server's RAM plus 1 GB.
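The "RAM plus 1 GB" estimate can be compared against the free space in the dump directory before going further. The following is a minimal sketch, not part of kdump itself: the function names `required_kb`, `available_kb`, and `has_room_for_dump` are hypothetical, and the script assumes `MemTotal` in `/proc/meminfo` is reported in kB.

```shell
#!/bin/sh
# Sketch: compare free space in the dump directory with "RAM + 1 GB".
# All function names here are illustrative, not part of kexec-tools.

# Print the estimated dump size in KiB: MemTotal from a meminfo-style
# file plus 1 GiB (1048576 KiB).
required_kb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 + 1048576 }' "${1:-/proc/meminfo}"
}

# Print the free space, in KiB, of the file system holding the directory.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 if the directory has room for the estimated dump.
# $1: dump directory (default /var/crash), $2: meminfo file.
has_room_for_dump() {
    need=$(required_kb "${2:-/proc/meminfo}")
    have=$(available_kb "${1:-/var/crash}")
    [ -n "$need" ] && [ -n "$have" ] && [ "$have" -ge "$need" ]
}
```

For example, `has_room_for_dump /var/crash || echo "not enough space for a dump"` prints a warning when the directory is too small or does not exist.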
- Add the `crashkernel` boot parameter to the GRUB configuration:
  - For CloudLinux 6, edit `/etc/grub.conf` and add `crashkernel=160M` (or `crashkernel=auto`) to the kernel line as another boot parameter, or modify the existing value.
  - For CloudLinux 7/8/9, edit `/etc/default/grub` and add `crashkernel=auto` to the `GRUB_CMDLINE_LINUX` parameter (or modify the existing value) so that it looks like this: `GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"`
Here is the link to the official RHEL recommendation regarding the crashkernel value: https://access.redhat.com/solutions/916043
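The `/etc/default/grub` edit for CloudLinux 7/8/9 can also be scripted. The sketch below is one possible approach, not an official tool: the function names `has_crashkernel` and `add_crashkernel` are hypothetical, and it assumes the `GRUB_CMDLINE_LINUX` value is wrapped in double quotes. It leaves the file alone if any `crashkernel=` value is already set and keeps a `.bak` copy otherwise.

```shell
#!/bin/sh
# Sketch: add crashkernel=auto to GRUB_CMDLINE_LINUX if it is not set yet.
# Function names are illustrative assumptions.

# Return 0 if GRUB_CMDLINE_LINUX in the given file already carries
# a crashkernel= parameter.
has_crashkernel() {
    grep -Eq '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$1"
}

# Prepend crashkernel=auto to the GRUB_CMDLINE_LINUX value.
# Does nothing when a crashkernel= value is already present.
# The original file is saved as <file>.bak before it is rewritten.
add_crashkernel() {
    file="$1"
    if has_crashkernel "$file"; then
        return 0
    fi
    cp -p "$file" "$file.bak" &&
    sed 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="crashkernel=auto /' \
        "$file.bak" > "$file"
}
```

A call such as `add_crashkernel /etc/default/grub` would be run as root. An existing `crashkernel=` value is left untouched, so changing it to a different size remains a manual edit.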
- For CloudLinux 7/8/9, rebuild the GRUB configuration file with the following command:
grub2-mkconfig -o /boot/grub2/grub.cfg
- For CloudLinux 6, add kdump to chkconfig and turn it on at boot:
chkconfig --add kdump
chkconfig kdump on
- Modify the `/etc/sysctl.conf` file and add the following block to catch all possible panic states:
# Enable reboots on panic to allow kdump to make dumps
kernel.sysrq = 1
kernel.hung_task_panic = 1
kernel.panic = 1
kernel.panic_on_io_nmi = 1
kernel.panic_on_oops = 1
kernel.panic_on_stackoverflow = 1
kernel.panic_on_unrecovered_nmi = 1
kernel.softlockup_panic = 1
kernel.unknown_nmi_panic = 1
kernel.hardlockup_panic = 1
- Reboot.
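Before rebooting, it can be useful to confirm that every key from the sysctl block above made it into the file with the value 1. This is a rough sketch under stated assumptions: the variable `PANIC_KEYS` and the function `missing_panic_keys` are hypothetical names, and only plain `key = 1` or `key=1` lines are recognized.

```shell
#!/bin/sh
# Sketch: report which panic-related keys are not set to 1 in a
# sysctl-style file. Names are illustrative assumptions.

# Keys taken from the block above; each is expected to be set to 1.
PANIC_KEYS="kernel.sysrq kernel.hung_task_panic kernel.panic
kernel.panic_on_io_nmi kernel.panic_on_oops kernel.panic_on_stackoverflow
kernel.panic_on_unrecovered_nmi kernel.softlockup_panic
kernel.unknown_nmi_panic kernel.hardlockup_panic"

# Print every key from PANIC_KEYS that has no "key = 1" line in the file.
# Commented-out lines and other values count as missing.
missing_panic_keys() {
    conf="${1:-/etc/sysctl.conf}"
    for key in $PANIC_KEYS; do
        # Escape the dots so they match literally in the regular expression.
        escaped=$(printf '%s' "$key" | sed 's/\./\\./g')
        pattern="^[[:space:]]*${escaped}[[:space:]]*=[[:space:]]*1[[:space:]]*\$"
        grep -Eq "$pattern" "$conf" || printf '%s\n' "$key"
    done
}
```

Running `missing_panic_keys /etc/sysctl.conf` prints nothing when all ten settings are in place, and one key per line otherwise.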
After the server boots, check if kdump is running with:
service kdump status
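In addition to the service status, two kernel-side indicators can be checked after the reboot: whether `crashkernel=` appears in `/proc/cmdline`, and whether `/sys/kernel/kexec_crash_loaded` reports 1. The sketch below wraps both checks. The function names are hypothetical, and the file paths can be overridden as arguments, which is only there to make the functions easy to try against sample files.

```shell
#!/bin/sh
# Sketch: post-reboot checks that complement "service kdump status".
# Function names are illustrative assumptions.

# Return 0 if the kernel command line contains a crashkernel= parameter.
cmdline_has_crashkernel() {
    grep -Eq '(^|[[:space:]])crashkernel=' "${1:-/proc/cmdline}"
}

# Return 0 if the kernel reports that a crash kernel is loaded (value 1).
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Print a short report; return non-zero if either check fails.
# $1: command line file, $2: kexec_crash_loaded file.
kdump_ready() {
    rc=0
    if cmdline_has_crashkernel "${1:-/proc/cmdline}"; then
        echo "crashkernel= is present on the kernel command line"
    else
        echo "crashkernel= is missing from the kernel command line"
        rc=1
    fi
    if crash_kernel_loaded "${2:-/sys/kernel/kexec_crash_loaded}"; then
        echo "crash kernel is loaded"
    else
        echo "crash kernel is not loaded"
        rc=1
    fi
    return $rc
}
```

Calling `kdump_ready` with no arguments checks the running system. A missing `crashkernel=` usually points back to the GRUB step, while an unloaded crash kernel suggests the kdump service did not start.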
How to obtain a core dump if the server hangs is described here.