Friday, October 29, 2010

The case of the read-only file system

Today, I received a notification from Nagios that one of my servers, running the Sphinx search engine, was not OK. The alert claimed the RAID controller was misbehaving, but that turned out not to be the case. Sphinx itself was still running and serving requests; it just could not update its indices to reflect new data. I logged in over SSH, tried to run the Nagios plugin manually, and got this error:

$ /usr/lib/nagios/plugins/check_hpacucli
/usr/lib/nagios/plugins/check_hpacucli: line 72: /tmp/hpacucli.txt: Read-only file system
rm: cannot remove `/tmp/hpacucli.txt': No such file or directory

Was it really read-only?! /tmp was not a separately mounted file system; it lived on the root FS:

$ touch 1.txt
touch: cannot touch `1.txt': Read-only file system

It seemed that the root file system had really turned read-only.

$ mount
/dev/mapper/sphinx-root on / type ext4 (rw,errors=remount-ro)
$ cat /etc/fstab
/dev/mapper/sphinx-root /               ext4    errors=remount-ro 0       1

The meaning of the flag errors=remount-ro is obvious enough: when errors that might corrupt the FS are detected, the file system in question is remounted read-only. I had to find out what kind of error could have triggered it. But there was nothing special in the system logs:

$ sudo tail -n 2 syslog
Oct 27 12:08:14 sphinx kernel: [10073549.962985] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=31736 PROTO=UDP SPT=137 DPT=137 LEN=58
Oct 27 12:08:14 sphinx kernel: [10073550.725397] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=32600 PROTO=UDP SPT=137 DPT=137 LEN=58
$ sudo tail -n 3 messages
Oct 27 12:07:22 sphinx kernel: [10073498.248999]       blocks= 879032432 block_size= 512
Oct 27 12:07:22 sphinx kernel: [10073498.249037]       heads=255, sectors=32, cylinders=107725
Oct 27 12:07:22 sphinx kernel: [10073498.249038] 

Of course, nothing could get into the log files once the FS stopped accepting writes! However, there is one more log that is not written to a file. You can browse it with the dmesg command:

$ sudo dmesg|tail
[10105513.875139] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=94.55.152.129 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=117 ID=27206 DF PROTO=TCP SPT=3128 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
[10105565.081765] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=81.198.246.184 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=55870 DF PROTO=TCP SPT=3218 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
[10105568.037173] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=81.198.246.184 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=56107 DF PROTO=TCP SPT=3218 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
[10105707.853626] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=29692 PROTO=UDP SPT=137 DPT=137 LEN=58
[10105708.615590] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=29742 PROTO=UDP SPT=137 DPT=137 LEN=58
[10105709.377978] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=30803 PROTO=UDP SPT=137 DPT=137 LEN=58
[10106190.118646] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=92.226.213.205 DST=212.24.56.25 LEN=52 TOS=0x00 PREC=0x00 TTL=119 ID=30620 DF PROTO=TCP SPT=2961 DPT=445 WINDOW=32767 RES=0x00 SYN URGP=0
[10106193.144746] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=92.226.213.205 DST=212.24.56.25 LEN=52 TOS=0x00 PREC=0x00 TTL=119 ID=30905 DF PROTO=TCP SPT=2961 DPT=445 WINDOW=32767 RES=0x00 SYN URGP=0
[10106214.257277] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=109.160.88.133 DST=212.24.56.25 LEN=434 TOS=0x00 PREC=0x00 TTL=50 ID=0 DF PROTO=UDP SPT=21015 DPT=5060 LEN=414
[10106311.722746] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=83.234.62.250 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=55552 PROTO=TCP SPT=53697 DPT=2222 WINDOW=65535 RES=0x00 SYN URGP=0

In the dmesg output, I found some interesting lines:

[10073618.603399] do_get_write_access: OOM for frozen_buffer
[10073618.603451] ext4_reserve_inode_write: aborting transaction: Out of memory in __ext4_journal_get_write_access
[10073618.603541] EXT4-fs error (device dm-0) in ext4_reserve_inode_write: Out of memory
[10073618.603623] Aborting journal on device dm-0:8.
[10073618.603839] EXT4-fs (dm-0): Remounting filesystem read-only
[10073618.611361] EXT4-fs error (device dm-0) in ext4_dirty_inode: Out of memory
[10073618.876932] htop invoked oom-killer: gfp_mask=0x280da, order=0, oomkilladj=0

The out-of-memory (OOM) killer is a curious feature of the Linux kernel: when the kernel needs more memory than it has available, it kills one or more running processes, chosen more or less arbitrarily. But when did it happen? The timestamps in dmesg are not very informative: the figures in brackets are seconds since boot time, which means I had to find the uptime in seconds. That is easy, just read the file /proc/uptime. It lives on a virtual file system and so was unaffected by the trouble with the root FS:

$ cat /proc/uptime
10132406.21 161014358.20

The first number is the number of seconds since boot. So I just had to subtract the dmesg timestamp from the /proc/uptime number and divide by 60 to get the number of minutes that had passed since the process was killed.
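For example, taking the uptime above and the timestamp on the OOM messages (10073618), a one-liner in the shell does the arithmetic:

$ echo $(( (10132406 - 10073618) / 60 ))
979

So the OOM killer had struck roughly 980 minutes, about 16 hours, before I started looking.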

Actually, this information was not all that important, just interesting. Now I had an explanation of the situation. Sphinx must have allocated too much memory, and the kernel started killing processes (the dmesg line mentions htop, but it was not the only victim; several processes were killed). Meanwhile, as the lines above show, ext4 itself failed to allocate memory for a journal write, treated that as a file system error, aborted the journal and switched to read-only mode, as prescribed by the fstab option errors=remount-ro.
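To see how many processes actually fell victim, grepping the kernel ring buffer for the OOM killer's messages is enough (the exact wording varies between kernel versions, so treat the pattern as a sketch):

$ sudo dmesg | grep -Ei 'invoked oom-killer|killed process'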

First of all, I tried to remount the file system back in rw mode. The usual command, mount -o remount,rw, would not work, because mount attempts to write to /etc/mtab, which sits on the read-only file system. So I had to add one more flag to skip writing to mtab:

$ sudo mount -n -o remount,rw
mount: block device /dev/mapper/sphinx-root is write-protected, mounting read-only
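For reference, the fully spelled-out form with the mount point named explicitly would look like this; it would not have helped here either, since ext4 had already aborted its journal and would not allow the device to be remounted read-write until it had been checked:

$ sudo mount -n -o remount,rw /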

Read-only again... I realized I had to run fsck to fix the errors, but to do so, I had to reboot the server. It was quite possible that the automatic fsck would not solve the problem and I would need a live CD to run fsck by hand. So I sighed and moved Sphinx to another server. When that was done and the second server began serving users' requests, I inserted a live CD into the first one and rebooted. Fortunately, the automatic fsck fixed the errors; the server rebooted once more and very soon Sphinx was up and running again.
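Had the automatic check failed, the manual procedure from the live CD would have looked roughly like this (the LVM volume group name is inferred from the device-mapper path, and -y answers yes to every repair prompt):

$ sudo vgchange -ay
$ sudo fsck.ext4 -f -y /dev/mapper/sphinx-root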
