Monday, December 20, 2010

SOLVED: Moving InnoDB tables between Percona MySQL servers

Update. Warning: this procedure does not work. The table can be copied, imported into another server and read successfully, but an attempt to write to the table crashes the server. The problem is probably caused by stale tablespace ids left in the ibd file.

Update 2. Kudos to the Percona team, who pointed me to an error in my configuration file. To import tablespaces, one more option had to be set on the server, innodb_expand_import. Xtrabackup documentation has been updated.
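For reference, a minimal sketch of the relevant my.cnf settings on the importing server (the second option is specific to Percona Server / XtraDB; both must be enabled before the import):

[mysqld]
innodb_file_per_table = 1
innodb_expand_import  = 1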

When trying to move a table from one server to another, I found a problem. I followed the procedures outlined in the Percona Xtrabackup manual, chapter Exporting Tables.

On the last step, when doing ALTER TABLE ... IMPORT TABLESPACE, I received an error message saying:

mysql> alter table `document_entity_bodies` import tablespace;
ERROR 1030 (HY000): Got error -1 from storage engine

There was some more information in the log:

101220 16:14:51  InnoDB: Error: tablespace id and flags in file './dbx_replica/document_entity_bodies.ibd'
 are 21 and 0, but in the InnoDB
InnoDB: data dictionary they are 181 and 0.
InnoDB: Have you moved InnoDB .ibd files around without using the
InnoDB: commands DISCARD TABLESPACE and IMPORT TABLESPACE?
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.
101220 16:14:51  InnoDB: cannot find or open in the database directory the .ibd file of
InnoDB: table `dbx_replica`.`document_entity_bodies`
InnoDB: in ALTER TABLE ... IMPORT TABLESPACE

The page mentioned in the log was not really helpful. What saved me was this article: Recovering an InnoDB table from only an .ibd file.

Following the instructions, I used a hex editor (shed) to change the tablespace id in the ibd file from 0x15 (21) to 0xB5 (181), and then the import worked fine.

I wonder if there is a way to avoid these manipulations. Perhaps, one more operation should be added to xtrabackup to make the tablespace ids agree?

Oh, and it's Xtrabackup 1.4 working with Percona Server 5.1.51

Wednesday, December 15, 2010

Amarok without panels and trays

What's the use of “desktop” ornamentation like panels and trays? Time of day, CPU load, WiFi status — I can't think of anything I would like to see every moment I spend at the keyboard. If I want to know what time it is, I press 'M-z a'. So, panels and trays do not occupy a single pixel on my display.

Some programs, though, have a strange habit of closing their main window and leaving only a small icon in the tray, and then there's no other way to control them besides grabbing the mouse and clicking, clicking, clicking... In some cases, though, you can leave your mouse in the dust. Amarok is one such case. I was perplexed when I saw Amarok in the list of processes but could find no main window anywhere. Unfortunately, Amarok uses one of those recent abominations that appeared in Linux, DBus. To show the main window, call the appropriate method using `qdbus':

$ qdbus org.kde.amarok /amarok/MainWindow org.kde.amarok.MainWindow.showHide

Friday, November 19, 2010

MySQL: enable innodb_file_per_table with zero downtime

I thought that while my wife is preoccupied with the lemon pie, I might tell you this story.

InnoDB is a very good storage engine for MySQL that combines reasonable performance with wide popularity and, as a consequence, a good set of tools for diagnostics and fine-tuning. One of its downsides is inefficient disk space management: once an extent of disk space has been added to the shared tablespace, InnoDB will not return it, even when you delete tables or databases. To add some flexibility, you should use the innodb_file_per_table option. Unfortunately, if you have a running database, you cannot just enable this option: you have to make a dump of the database and restore it on a new instance of MySQL with the option enabled from the very beginning. This scenario means that the database will be inaccessible from the moment you start mysqldump to the moment you finish restoring the data on the new instance. Is there a way to minimize the downtime?

Yes: you can run mysqldump on a backup of your database. But then you lose the data written to the database between the moment you make the backup and the moment the new instance is ready. Still, that's a step closer to the solution. You can also set up replication between the original database and the new one, and when the new instance catches up with the old one, your task is completed. And the backup can be done online, without stopping MySQL, if you use the Xtrabackup tool by Percona.

So, the basic steps you have to follow are:

  • Configure your original database server as a master. Unless it already writes binary logs, this is the only step that will require restarting MySQL.
  • Make a backup of the original database using Xtrabackup.
  • Restore the backup and run a second instance of MySQL.
  • Run mysqldump on the second instance.
  • Stop the second instance, but do not delete it yet.
  • Create a new database and start the third instance of MySQL with the enabled option innodb_file_per_table.
  • Restore the dump by feeding it into the third instance of MySQL.
  • Configure the third instance as slave and run the replication.
  • When the initial replication finishes and the slave catches up with the master, reconfigure your clients to use the new instance.
  • That's it. You can stop the first instance now and delete it.

I wrote an even more detailed guide illustrated with example commands. It was published on Linux.com recently: HOWTO: Reconfigure MySQL to use innodb_file_per_table with zero downtime
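For illustration, here is a minimal sketch of the replication part (steps 1 and 8 above). The server ids, hostname, credentials and binlog coordinates are placeholders; the real coordinates come from your backup or dump:

# master (the original server), my.cnf: the only change that needs a restart
[mysqld]
server-id = 1
log-bin   = mysql-bin

# slave (the third instance), my.cnf
[mysqld]
server-id             = 2
innodb_file_per_table = 1

-- on the slave, after the dump has been loaded:
CHANGE MASTER TO
  MASTER_HOST='master.example.com',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=12345;
START SLAVE;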

Friday, November 12, 2010

How not to write scripts

Today, I've seen a PHP script that used 41.5 GB of virtual memory. 26 GB of them were carefully put into the swap. And I recalled the programs we were writing only 15-20 years ago. Like a text editor that could process gigabyte text files (in theory, because there were no disk drives to store such files). Or a graphic viewer that could show pictures ten times larger than the amount of available RAM.

As a friend of mine put it, in the USSR, when engineers were sent to a kolkhoz to harvest potatoes, they knew what might happen to them if they didn't work well. I wish some modern developers knew that, too.

Wednesday, November 3, 2010

Hardware RAID? Software RAID? Both!

I have received an HP server with Ubuntu installed by someone else. There is a Smart Array RAID controller in it, handled by the cciss driver. And at the same time, the device mapper is configured:

$ ls -l /dev/mapper/
total 0
crw-rw---- 1 root root  10, 59 2010-08-25 19:01 control
brw-rw---- 1 root disk 251,  0 2010-08-25 19:01 okd-root
brw-rw---- 1 root disk 251,  1 2010-08-25 19:01 okd-swap_1
$ ls -l /dev/cciss/
total 0
brw-rw---- 1 root disk 104, 0 2010-08-25 19:01 c0d0
brw-rw---- 1 root disk 104, 1 2010-08-25 19:01 c0d0p1
brw-rw---- 1 root disk 104, 2 2010-08-25 19:01 c0d0p2
brw-rw---- 1 root disk 104, 5 2010-08-25 19:01 c0d0p5
$ mount
/dev/mapper/okd-root on / type ext4 (rw,errors=remount-ro)
...
/dev/cciss/c0d0p1 on /boot type ext2 (rw)

What I'm certain about is that hardware RAID is a must on my servers. Its battery-backed cache can provide a higher performance than that of software-only RAID. But should I get rid of the device mapper and LVM based on it?

From what I found in various sources, I decided I had better leave it as it is.

Firstly, the performance penalty is negligible. Secondly, LVM allows for more flexible volume management (removing a PV, moving PEs to free space elsewhere in the VG, and so on). Thirdly, snapshots. And, finally, portability of LVM volumes between incompatible hardware (this won't work for me, I'm afraid, because of the underlying hardware RAID1).
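For illustration, the operations I mean would look roughly like this. The VG name okd is taken from the listing above; the second cciss device and the snapshot size are made up:

# move all physical extents off a PV to free space elsewhere in the VG,
# then remove the PV from the volume group
sudo pvmove /dev/cciss/c0d1p1
sudo vgreduce okd /dev/cciss/c0d1p1

# take a snapshot of the root logical volume
sudo lvcreate --snapshot --size 5G --name root-snap /dev/okd/root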

Besides, this LVM over RAID approach seems to be a common thing now and nobody seems to have complained about it... :)

Well, anyway, I understood that Linux software RAID is one more area I should learn more about.

Friday, October 29, 2010

The case of the read-only file system

Today, I received a notification from Nagios that one of my servers, running the Sphinx search engine, was not OK. The message said the RAID controller was not working properly, but that was not the real problem. Sphinx itself was still running and serving requests; it just could not update its indices to reflect new data. I logged in via SSH, tried to run the Nagios plugin manually, and got this error:

$ /usr/lib/nagios/plugins/check_hpacucli
/usr/lib/nagios/plugins/check_hpacucli: line 72: /tmp/hpacucli.txt: Read-only file system
rm: cannot remove `/tmp/hpacucli.txt': No such file or directory

Was it really read-only?! /tmp was not a separately mounted file system; it was part of the root FS:

$ touch 1.txt
touch: cannot touch `1.txt': Read-only file system

It seemed that the root file system had really turned read-only.

$ mount
/dev/mapper/sphinx-root on / type ext4 (rw,errors=remount-ro)
$ cat /etc/fstab
/dev/mapper/sphinx-root /               ext4    errors=remount-ro 0       1

The meaning of the errors=remount-ro flag should be obvious: it remounts the file system in question read-only when errors that might corrupt the FS are detected. I had to find out what kind of errors had triggered it. But there was nothing special in the system logs:

$ sudo tail -n 2 syslog
Oct 27 12:08:14 sphinx kernel: [10073549.962985] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=31736 PROTO=UDP SPT=137 DPT=137 LEN=58
Oct 27 12:08:14 sphinx kernel: [10073550.725397] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=32600 PROTO=UDP SPT=137 DPT=137 LEN=58
$ sudo tail -n 3 messages
Oct 27 12:07:22 sphinx kernel: [10073498.248999]       blocks= 879032432 block_size= 512
Oct 27 12:07:22 sphinx kernel: [10073498.249037]       heads=255, sectors=32, cylinders=107725
Oct 27 12:07:22 sphinx kernel: [10073498.249038] 

Of course, nothing could get into the log files once the FS went read-only! However, there is one more log, which is not written to a file. You can browse it with the dmesg command:

$ sudo dmesg|tail
[10105513.875139] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=94.55.152.129 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=117 ID=27206 DF PROTO=TCP SPT=3128 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
[10105565.081765] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=81.198.246.184 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=55870 DF PROTO=TCP SPT=3218 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
[10105568.037173] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=81.198.246.184 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=56107 DF PROTO=TCP SPT=3218 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
[10105707.853626] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=29692 PROTO=UDP SPT=137 DPT=137 LEN=58
[10105708.615590] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=29742 PROTO=UDP SPT=137 DPT=137 LEN=58
[10105709.377978] IP Tables: IN=eth1 OUT= MAC=ff:ff:ff:ff:ff:ff:18:a9:05:41:f3:ce:08:00 SRC=212.24.56.5 DST=212.24.56.31 LEN=78 TOS=0x00 PREC=0x00 TTL=128 ID=30803 PROTO=UDP SPT=137 DPT=137 LEN=58
[10106190.118646] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=92.226.213.205 DST=212.24.56.25 LEN=52 TOS=0x00 PREC=0x00 TTL=119 ID=30620 DF PROTO=TCP SPT=2961 DPT=445 WINDOW=32767 RES=0x00 SYN URGP=0
[10106193.144746] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=92.226.213.205 DST=212.24.56.25 LEN=52 TOS=0x00 PREC=0x00 TTL=119 ID=30905 DF PROTO=TCP SPT=2961 DPT=445 WINDOW=32767 RES=0x00 SYN URGP=0
[10106214.257277] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=109.160.88.133 DST=212.24.56.25 LEN=434 TOS=0x00 PREC=0x00 TTL=50 ID=0 DF PROTO=UDP SPT=21015 DPT=5060 LEN=414
[10106311.722746] IP Tables: IN=eth1 OUT= MAC=00:24:81:fb:2b:ad:00:0c:cf:47:11:c0:08:00 SRC=83.234.62.250 DST=212.24.56.25 LEN=48 TOS=0x00 PREC=0x00 TTL=121 ID=55552 PROTO=TCP SPT=53697 DPT=2222 WINDOW=65535 RES=0x00 SYN URGP=0

In dmesg log, I found some interesting lines:

[10073618.603399] do_get_write_access: OOM for frozen_buffer
[10073618.603451] ext4_reserve_inode_write: aborting transaction: Out of memory in __ext4_journal_get_write_access
[10073618.603541] EXT4-fs error (device dm-0) in ext4_reserve_inode_write: Out of memory
[10073618.603623] Aborting journal on device dm-0:8.
[10073618.603839] EXT4-fs (dm-0): Remounting filesystem read-only
[10073618.611361] EXT4-fs error (device dm-0) in ext4_dirty_inode: Out of memory
[10073618.876932] htop invoked oom-killer: gfp_mask=0x280da, order=0, oomkilladj=0

The out-of-memory killer is a funny feature of the Linux kernel: when the kernel needs more memory than it has available, it kills one or more running processes in a more or less random fashion. But when did it happen? The timestamps in dmesg are not very informative: the figures in brackets are seconds since boot time, which means I had to find the uptime in seconds. That is easy: just read the file /proc/uptime. This file lives on a virtual file system and hence was still accessible:

$ cat /proc/uptime
10132406.21 161014358.20

The first number is the number of seconds since boot time. So, I just had to subtract the dmesg number from the /proc/uptime number, divide the result by 60, and I got the number of minutes that had passed since the process was killed.
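A quick worked example with the numbers above (the OOM messages are stamped around 10073618, and /proc/uptime showed about 10132406):

$ echo $(( (10132406 - 10073618) / 60 ))
979

So the file system had gone read-only roughly 980 minutes, or about 16 hours, before I looked at it.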

Actually, this information was not really important, just interesting. Now I had an explanation of the situation. Sphinx must have allocated too much memory, and the kernel started killing processes (the message in dmesg named htop, but it was not the only one; more processes were killed). Meanwhile, as the dmesg lines above show, an ext4 journal write failed with an out-of-memory error, the journal was aborted, and the file system switched to read-only mode, as prescribed by the fstab option errors=remount-ro.

First of all, I tried to remount the file system back in rw mode. The usual command mount -o remount,rw would not work, because mount attempts to write to /etc/mtab, located on the read-only file system. So, I had to give one more flag to avoid writing to mtab:

$ sudo mount -n -o remount,rw
mount: block device /dev/mapper/sphinx-root is write-protected, mounting read-only

Read-only again... I thought I had to run fsck to fix the errors, but to do so, I had to reboot the server. It was probable that the automatic fsck would not solve the problem and I would need a live CD to run fsck. So, I sighed and moved Sphinx to another server. When it was done and the second server began serving users' requests, I inserted a live CD in the server and rebooted. Fortunately, the automatic fsck fixed the errors, the server rebooted once again and very soon Sphinx was up and running again.

Sunday, October 24, 2010

MySQL reference I had always missed

InnoDB Glossary. These terms are commonly used in information about the InnoDB storage engine. No idea why I had never seen this before. I found the article today and I'm still reading. It's a very dense explanation of many important building blocks of the InnoDB (and, of course, XtraDB!) engine. A must-read.

Friday, October 15, 2010

Nagios notifications

For quite some time, I've been receiving Nagios notifications via e-mail and SMS. Recently, my cellular operator, MTS, turned off the service that allowed me to receive email messages via SMS. Shame upon MTS! Don't use their services. The reliability is terrible and then they just turn it off.

I plan to buy and set up an SMS gateway on the server. In the meantime, I have been looking at other alternatives.

Firefox extension

Nagios Checker is a Firefox extension that displays Nagios alerts in the browser's status bar. Works fine, but I'm still not certain about the merits of this solution. After all, I can get e-mail notifications in the same browser area. And it's not mobile.

On the other hand, it's much more satisfying to see the green spot down there than to jump up every time an email notifier informs you of another webinar you don't want to attend.

Push e-mail

My cell phone is a Nokia E63. It's a Symbian thingy and its messaging application can monitor my Gmail account non-stop. Well, actually, I'm not sure it is implemented as real push e-mail, even though Wikipedia is dead sure that it is. Either way, it works, but only as long as I'm within reach of a WiFi network. Of course, it should work over GPRS/EDGE/3G, too, but I have yet to find out how much this permanent connection would cost me.

Social networks

There are a number of articles that recommend using Twitter to deliver the notifications, but I could never grasp the value of this service and do not use it. Besides, wouldn't anyone be able to read the alerts? Not good.

Instant messengers

A sound idea. But I tend to keep as far from them as possible. Even at work, where Skype is the preferred way of communication, I hardly use it. It's not as comfortable as e-mail, IMHO. It annoys and distracts me. And, finally, there are too many new messages, and the natural reaction is to neglect them.

Other ways

I've found an interesting article, "Notifications and Events in Nagios 3.0" (Part I and Part II). It summarizes some interesting ideas and details of their implementation in Nagios. Still, it does not mention an amazing idea, which I would really like to try out. Have a look at this video:

In the meantime, I think I should try push e-mail. Also, the server could trigger a desktop notification program (like notify-send, kdialog or, even better, a self-made Tcl/Tk pop-up notifier) remotely, via ssh. It should work well.
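For illustration, such a notification command might look like this in Nagios (the workstation host, user and DISPLAY are assumptions; the nagios user would need passwordless ssh keys set up):

define command {
    command_name    notify-service-by-desktop
    command_line    /usr/bin/ssh admin@workstation "DISPLAY=:0 notify-send 'Nagios' '$SERVICEDESC$ on $HOSTNAME$ is $SERVICESTATE$'"
}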

It's a pity cell phones don't have their own IP addresses and can't be controlled via ssh...

Thursday, October 7, 2010

Git and SVN

Very soon I will have a chance to compare the two version control systems in practice. I have to admit, though, that I'm rather biased against Git. It might have certain advantages for open-source teams, where members change often and hardly ever see each other. But I fail to understand (at least, a priori) how it could be better than SVN in a traditional development model. If I understand correctly, the only difference between them is one more intermediate repository between every two team members and, consequently, one more synchronization before code gets into trunk. I can believe that Git makes developers commit more often. But the changes are checked into the local repository, while synchronizations between repositories are usually made less often.

Another advantage of Git, local branches, which are easier to create and merge, also has a downside. They encourage the developer to experiment, but they also increase the chances of conflicts when branches are merged. Once again, the freedom to experiment might be good for open-source projects, but for teams where people work in close contact it may create extra risk of project fragmentation.

Monday, August 30, 2010

XtraBackup 1.3b

I have installed the newer version of XtraBackup, a tool for online backups of MySQL databases. The first tests show that a lot of errors were fixed in this not-yet-released version. Version 1.2 failed to make incremental backups on one of my servers, could not restore a full backup on another, and segfaulted during a full backup on the third. 1.3b backs up all three and restores them flawlessly.

A 233 GB database was copied in 2 hours 15 minutes. I highly recommend upgrading your Xtrabackup installation to v1.3b and testing it on your production servers.

The download page at Percona.com is here: XtraBackup-1.3-beta. I usually download the x86_64 binaries for Linux there.
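For the record, a minimal sketch of how the backups can be invoked (paths are examples; the server settings are read from my.cnf):

# full backup
xtrabackup --backup --target-dir=/backups/full

# incremental backup, based on the full one
xtrabackup --backup --target-dir=/backups/inc1 --incremental-basedir=/backups/full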

Monday, August 23, 2010

Gaps in graphs generated by rrdtool

I sometimes see questions like: "Why are there gaps in my Munin graphs?" or "How do I get rid of holes in Nagiosgraph charts?". Some classical answers may be found in Munin FAQ or in 'man munin.conf'.

There is one more option, though. The problem may be caused by buggy plugins. By rounding errors in the plugins, to be precise.

In my experience, the gaps usually appear in graphs where the data is a sum of several indicators, for example, the CPU load graph. The maximum value is declared as number-of-cpus * 100%, while the actual value is the sum of several counters: the time the CPU spent in user mode, in system mode, waiting for I/O, idle, and so on. Under certain conditions this sum may exceed the declared maximum. In that case rrdtool drops the measurement from the graph and there is a hole.

To get rid of the gaps, for example, in Munin's 'cpu' plugin, I replaced the lines

        PERCENT=$(($NCPU * 100))
        MAX=$(($NCPU * 100))

with

        PERCENT=$(($NCPU * 100 + 200))
        MAX=$(($NCPU * 100 + 200))

200 might be overkill, but it looked a bit better in my Munin :). Of course, another option is to make sure the reported values never exceed the declared maximum.
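That check would look roughly like this inside the plugin (a sketch only; the value variable is mine, not the plugin's real code):

        MAX=$(($NCPU * 100))
        # clamp a reported value so it never exceeds the declared maximum
        if [ "$value" -gt "$MAX" ]; then
                value=$MAX
        fi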

Monday, August 9, 2010

Using variables in complex configurations of Nginx

There are situations for which the Nginx config file syntax does not provide an adequate solution. Very often, variables, both built-in and user-defined, can help.

For example, Nginx does not support complex conditions: there are neither logical operators nor nested ifs. The solution is to define a new variable. If you want a certain URL rewrite to take place only when the Referer header contains "somedomain.com" OR is empty, you can write:

set $dorewrite 'no';
if ($http_referer ~* somedomain.com) {
       set $dorewrite 'yes';
}
if ($http_referer = '') {
       set $dorewrite 'yes';
}
if ($dorewrite = 'yes') {
       rewrite ^ new-request;
}

If you want to rewrite the URL when both conditions are true, for example, when the Referer header contains "somedomain.com" AND there's a GET parameter 'id' equal to '12345', modify the snippet above using De Morgan's law:

set $dorewrite 'yes';
if ($http_referer !~* somedomain.com) {
       set $dorewrite 'no';
}
if ($arg_ID != '12345') {
       set $dorewrite 'no';
}
if ($dorewrite = 'yes') {
       rewrite ^ new-request;
}

Note the variable $arg_ID: it is a built-in variable. There is one such variable for every GET parameter in the request; its name is composed of $arg_ and the name of the parameter. If you know which parameters you need to serve the request, these variables let you extract them and pass them around.
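For example (the location and the backend name below are made up), you can pass only the parameters you care about to an upstream:

location /search {
    # only the 'id' and 'page' parameters reach the backend
    proxy_pass http://backend/search?id=$arg_id&page=$arg_page;
}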

If your regular expressions get complicated, you can capture part of a match into another variable and reuse it later:

if ($arg_ID ~* ^doc-number-([-a-z0-9]*)$) {
        set $newarg $1;
}
if ($dorewrite = 'yes') {
        rewrite ^ /doc/view/$newarg? break;
}

Friday, July 2, 2010

Installation of HP Array Configuration Utility CLI for Linux on Ubuntu 10.04 Lucid Lynx

HP Array Configuration Utility CLI for Linux (hpacucli) is a useful tool that allows you to manage your HP RAID controller from command line (via ssh, that is). It can also be used to monitor the state of the disk subsystem with tools like Nagios or Zabbix. The following controllers are supported:

  • Smart Array 5312 Controller
  • Smart Array 5302 Controller
  • Smart Array 5304 Controller
  • Smart Array 532 Controller
  • Smart Array 5i Controller
  • Smart Array 641 Controller
  • Smart Array 642 Controller
  • Smart Array 6400 Controller
  • Smart Array 6400 EM Controller
  • Smart Array 6i Controller
  • Smart Array P600 Controller
  • Smart Array P400 Controller
  • Smart Array P400i Controller
  • Smart Array E200 Controller
  • Smart Array E200i Controller
  • Smart Array P800 Controller
  • Smart Array E500 Controller
  • Smart Array P700m Controller
  • Smart Array P410i Controller
  • Smart Array P411 Controller
  • Smart Array P212 Controller
  • Smart Array P712m Controller
  • Smart Array B110i SATA RAID
  • Smart Array P812 Controller
  • MSA500 Controller
  • MSA500 G2 Controller
  • MSA1000 Controller
  • MSA1500 CS Controller
  • MSA20 Controller

The tool is supplied on HP Support Pack CDs, but you can download a newer version from the HP site. To install hpacucli, visit this page and download hpacucli-8.50-6.0.noarch.rpm. Now, copy it to the server you'll install it on. Next, we'll unpack the RPM file. You can do it using, for example, rpm2cpio, rpm2tgz, etc. In Ubuntu, there's a utility called alien, that can do the same:

alien --to-tgz hpacucli-8.50-6.0.noarch.rpm

alien will report some errors and warnings, but don't worry. You will now get a new file, called hpacucli-8.50.tgz.

tar -xzf hpacucli-8.50.tgz

Move the unpacked files to corresponding locations:

sudo mv opt/compaq /opt/
sudo mv usr/sbin/* /usr/sbin/

Now, if you run an i386 kernel, you can run hpacucli, which is a 32-bit program. However, if your Ubuntu is a 64-bit system, you will have to allow execution of 32-bit binaries. One way to do so is to install the ia32-libs package. After the installation you can run hpacucli.

hpacucli gives you a prompt where you can enter commands or you can give the commands from the shell command line, like:

$ sudo hpacucli help

or

$ sudo hpacucli ctrl all show config

Smart Array P212 in Slot 1                (sn: PACCP9SYJ067  )

   array A (SAS, Unused Space: 0 MB)


      logicaldrive 1 (419.2 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 450 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 450 GB, OK)

   Expander 250 (WWID: 50014380065D7410, Port: 1I, Box: 1)

   Enclosure SEP (Vendor ID HP, Model DL18xG6BP) 248 (WWID: 50014380065D7423, Port: 1I, Box: 1)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 249 (WWID: 50014380069159CF)

A similar command will give you even more information on the state of your RAID:

$ sudo hpacucli ctrl all show config detail

Now you can use this plugin to monitor the state of the array in Nagios. Besides, the information from hpacucli helped me identify the cause of the significantly degraded performance on one of my servers (the battery of the write cache was dead).
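For illustration, wiring it into Nagios could look like this (the object names are examples; hpacucli needs root, so the plugin usually has to run via sudo):

define command {
    command_name    check_hpacucli
    command_line    sudo /usr/lib/nagios/plugins/check_hpacucli
}

define service {
    use                  generic-service
    host_name            dbserver
    service_description  RAID status
    check_command        check_hpacucli
}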

Tuesday, June 29, 2010

logrotate: rotating logs in multiple directories

I've got a server with almost one hundred web sites. Each site lives in its own directory and keeps its own logs in /usr/local/www/SITENAME/logs/*.log. When the logs grew too large, I decided to set up logrotate. Since there were so many sites, my first idea was to create one configuration file per site in logrotate.d and leave just one line in logrotate.conf:

include /usr/local/etc/logrotate.d

Only an hour later did I realize that I could simply write a path with wildcards and cover ALL the logs in a single stanza:

/usr/local/www/*/logs/*.log {
    daily
    rotate 60
    compress
    notifempty
    # run the postrotate script once for all matched logs, not once per log
    sharedscripts
    postrotate
          [ ! -f /var/run/nginx.pid ] || kill -USR1 `cat /var/run/nginx.pid`
    endscript
}
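To make sure the pattern really matches all the sites' logs, a dry run helps (the path to logrotate.conf may differ on your system):

sudo logrotate -d /usr/local/etc/logrotate.conf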

Thursday, June 24, 2010

PINBA: PHP Is Not a Bottleneck Anymore

Yesterday I installed Pinba on one of my servers. Pinba is a set of tools for monitoring the performance of PHP scripts. The Pinba MySQL storage engine keeps track of the timers and automatically fills report tables, and the Pinba PHP extension provides two functions to open and close these timers. Besides, there are default timers, which start when a script begins execution and stop when it finishes. If you put a timer around some critical piece of code, you can learn how often it runs and how much time it spends executing.

Timers can be tagged, and the data can be grouped by tags. So, in some pieces of code you can set the tags "author" and "task" with corresponding values. Then you'll be able to compare the performance of code written by different developers and identify the most time-consuming parts. The most interesting thing is that when you create database tables following certain rules, these tables become automagically filled with the necessary data. So, if you use the tags "author" and "task" to group the data, the reports will include all valid combinations of these tags and show summaries for those combinations: how much time Joe's scripts spent parsing new documents and how often Jackie's front-end scripts were called. Very impressive.
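For illustration, the timer calls look roughly like this in PHP (the tag names follow the example above; the function being measured is hypothetical):

<?php
// start a tagged timer around a critical piece of code
$timer = pinba_timer_start(array('author' => 'joe', 'task' => 'parse_documents'));
parse_new_documents();   // the code being measured
pinba_timer_stop($timer);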

By default, Pinba stores this information for some limited period of time (15 minutes, IIUC), so you need some way to make the data persistent. Since we use Munin to monitor various system indicators, I wrote a couple of plugins (in TCL and Lua, just for fun :)) to display the frequency of execution and the average execution time for each timer. Our developers added a handful of timers in various places of code and here it is:

The last graph looks cluttered and not too informative, so I plan to employ Munin's 'suggest' feature to draw some diagrams using one script. Perhaps, organizing the graphs will be the most difficult part of the deployment. I have to say, though, that the installation was not simple, either. Prerequisites include compiled sources of the installed MySQL (Percona Server 10.2 in my case), Google Protocol Buffers, Judy library, libevent 1.4.1+ (Ubuntu's default one will do) and Hoard memory allocator. And here is the installation process (paths will be different for you, so check carefully):

wget http://pinba.org/files/pinba_engine-0.0.5.tar.gz
tar -xzf pinba_engine-0.0.5.tar.gz
wget http://pinba.org/files/pinba_extension-0.0.5.tgz
tar -xzf pinba_extension-0.0.5.tgz
wget http://protobuf.googlecode.com/files/protobuf-2.3.0.tar.gz
tar -xzf protobuf-2.3.0.tar.gz
wget http://downloads.sourceforge.net/project/judy/judy/\
Judy-1.0.5/Judy-1.0.5.tar.gz?use_mirror=ignum
tar -xzf Judy-1.0.5.tar.gz
wget http://www.cs.umass.edu/%7Eemery/hoard/hoard-3.8/source/hoard-38.tar.gz
tar -xzf hoard-38.tar.gz
sudo aptitude install libevent-1.4-2 libevent-dev
cd protobuf-2.3.0
./configure
make -j
sudo make install
cd ../judy-1.0.5/
./configure
make
sudo make install
cd ../hoard-38/src
make linux-gcc-x86-64
sudo cp libhoard.so /usr/local/lib
sudo cp *.h /usr/local/include
sudo ldconfig
cd ../../pinba_engine-0.0.5/
./configure --with-mysql=/home/minaev/Percona-Server-10.2/ \
--with-judy=/usr/local --with-protobuf=/usr/local \
--with-event=/usr --libdir=/usr/lib/mysql/plugin/ \
--with-hoard=/usr/local
make 
sudo make install
echo "INSTALL PLUGIN pinba SONAME 'libpinba_engine.so'"|mysql
echo "CREATE DATABASE pinba"|mysql
mysql -D pinba <default_tables.sql
cd ../pinba-0.0.5/
sed -i 's/NOTICE/CHECKING/' config.m4
phpize
./configure --with-pinba=/usr/local
sudo make install

I had to edit config.m4 because my version of autoconf was a bit buggy. After all this, you'll have to add three lines to your php.ini:

extension=pinba.so
pinba.enabled=1
pinba.server=[MySQL server address]

And here is one of the Munin plugins, written in Tcl. It collects data on how often certain parts of the API are called.

#!/usr/bin/tclsh

package require mysqltcl 3.05


proc clean_fieldname arg {
    return [regsub -all {[^A-Za-z]} $arg "_"]
}

set dbuser "pinba"
set db "pinba"

set conn [::mysql::connect -user $dbuser -db $db]

set fields [::mysql::sel $conn \
  "select concat(module_value, '+', action_value) from \
tag_info_module_action" -list]

if {$argc > 0} {
    switch [lindex $argv 0] {
        "config" {
            puts "graph_title PHP Actions per second"
            puts "graph_vlabel reqs per second"
            puts "graph_category Pinba"
            foreach fld $fields {
                set clean [clean_fieldname $fld]
                 puts "$clean.label $fld"
                 puts "$clean.draw LINE3"
            }
        }
        "autoconf" {
            puts "yes"
        }
    }
} else {
    foreach fld $fields {
        set clean [clean_fieldname $fld]
        set data [::mysql::sel $conn \
  "select req_per_sec from tag_info_module_action where \
  concat(module_value, '+', action_value)='$fld'" -list]
        puts "$clean.value $data"
    }
}

::mysql::close $conn

BTW, you may find it interesting that the performance of TCL scripts was almost the same as that of Lua scripts and about 3-4 times higher than for Bash.

Tuesday, June 15, 2010

Dark sides of Python

While reading about Python and playing around with its objects and classes (OOP being a perversion in itself), I witnessed some slightly weird behaviour. Define a class with a class variable:

class Parent:
  variable = "parent 1"

Then define a descendant class that inherits the class variable:

class Child(Parent):
  pass

(That funny single pass stands for an empty definition body.) Now, let's have a look at the value of variable in Parent and Child:

print Parent.variable
parent 1
print Child.variable
parent 1

Then, change the value of the variable in the parent class and it should also change in the child class:

Parent.variable = "parent 2"
print Parent.variable
parent 2
print Child.variable
parent 2

Sounds good. The variable must be shared between the two classes. Now, let's change the value of this allegedly shared variable in the child class:

Child.variable = "child 1"
print Parent.variable
parent 2
print Child.variable
child 1

All of a sudden, the variable turns out to be two separate variables. We have somehow broken the link that connected them, and now, even if we change the value of the variable in Parent, it will not affect the variable in Child anymore:

Parent.variable = "parent 3"
print Parent.variable
parent 3
print Child.variable
child 1
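A quick way to see the mechanism (my own check, not part of the original experiment): the assignment Child.variable = "child 1" created a separate entry in the child's own namespace, and from that moment attribute lookup no longer falls through to Parent:

print 'variable' in Parent.__dict__
True
print 'variable' in Child.__dict__
True

Before that assignment, the second line would have printed False: Child had no variable of its own, and the lookup fell through to Parent.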

And how can a language with such non-trivial idiosyncrasies be promoted as a newbie-friendly, "does-what-you-want" language? My interest in Python is evaporating so fast that I will probably never get to the famous included "batteries".

Tuesday, May 4, 2010

Errors when installing PHP 5.3 with FPM on Ubuntu

If you try to install PHP 5.3 with FPM on Ubuntu, no matter which installation path you follow, patching sources or downloading FPM from SVN, you will most likely see a lot of error messages similar to the following:

$ ./configure 
cat: confdefs.h: No such file or directory
./configure: 490: ac_fn_c_try_run: not found
./configure: 490: 5: Bad file descriptor    
./configure: 490: :: checking for pthreads_cflags: not found
./configure: 490: 6: Bad file descriptor                    
./configure: 490: checking for pthreads_cflags... : not found
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 490: ac_fn_c_try_run: not found                 
./configure: 492: 5: Bad file descriptor                     
./configure: 492: :: result: : not found                     
./configure: 492: 6: Bad file descriptor                     
./configure: 492: : Permission denied                        
./configure: 495: 5: Bad file descriptor                     
./configure: 495: :: checking for pthreads_lib: not found    
./configure: 495: 6: Bad file descriptor                     
./configure: 495: checking for pthreads_lib... : not found   
cat: confdefs.h: No such file or directory                   
./configure: 555: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 555: ac_fn_c_try_run: not found                 
cat: confdefs.h: No such file or directory                   
./configure: 555: ac_fn_c_try_run: not found                 
./configure: 557: 5: Bad file descriptor                     
./configure: 557: :: result: : not found                     
./configure: 557: 6: Bad file descriptor                     
./configure: 557: : Permission denied                        
./configure: 633: 5: Bad file descriptor                     
./configure: 633: :: result: : not found                     
./configure: 633: 6: Bad file descriptor                     
./configure: 633: : Permission denied                        
./configure: 635: 5: Bad file descriptor                     
./configure: 635: :: result: Configuring SAPI modules: not found
./configure: 635: 6: Bad file descriptor                        
./configure: 635: Configuring SAPI modules: not found           
./configure: 666: 5: Bad file descriptor

If so, don't be a fool like me; go back and check what buildconf reported:

$ ./buildconf --force                                                                                                          
Forcing buildconf                                                                                                                                       
buildconf: checking installation...                                                                                                                     
buildconf: autoconf version 2.64 (ok)                                                                                                                   
buildconf: Your version of autoconf likely contains buggy cache code.                                                                                   
           Running vcsclean for you.                                                                                                                    
           To avoid this, install autoconf-2.13.                                                                                                        
Can't figure out your VCS, not cleaning. 

This is not just a regular warning you can ignore: you have to use autoconf 2.13. You don't have to install it from sources, though; in Ubuntu it is packaged separately, so just run sudo aptitude install autoconf2.13. You will also have to install libevent-1.4 and libevent-dev. Now you can run buildconf again and then configure.

The full installation procedure would be:

sudo apt-get install autoconf2.13
wget http://ru.php.net/get/php-5.3.2.tar.bz2/from/ru2.php.net/mirror
tar -xjf php-5.3.2.tar.bz2
sudo aptitude install libevent-dev
cd php-5.3.2/
svn co http://svn.php.net/repository/php/php-src/trunk/sapi/fpm sapi/fpm
./buildconf --force
./configure --enable-fpm --with-zlib \
--enable-pdo --with-pdo-mysql --enable-sockets \
--with-mysql --with-config-file-path=/etc \
--enable-calendar --with-iconv --enable-exif\
 --enable-soap --enable-ftp --enable-wddx \
--with-zlib --with-bz2 --with-gettext \
--with-xmlrpc --enable-pcntl --enable-soap \
--enable-bcmath --enable-mbstring --enable-dba \
--with-openssl --with-mhash --with-mcrypt \
--with-xsl --with-curl --with-pcre-regex \
--with-gd --enable-gd-native-ttf --with-ldap \
--enable-pdo --with-pdo-mysql --with-mysql \
--with-sqlite --with-pdo-sqlite --enable-zip \
--enable-sqlite-utf8 --with-pear \
--with-freetype-dir=/usr --with-jpeg-dir=/usr \
--with-mysqli --with-fpm-conf=/etc/php/php-fpm.conf \
--with-fpm-pid=/var/run/php-fpm.pid \
--with-config-file-path=/etc/php/ \
--with-config-file-scan-dir=/etc/php/conf.d/

Of course, your php configure options can be different.

Wednesday, April 28, 2010

Naïve question

I've been installing FineReader under Windows today. The question is: why does the installation process pause when I click and hold on the window title bar to move it? No reply...

Wednesday, April 21, 2010

Split screen vertically. You won't find it in `man screen`

Now I know what `serendipity' means. It's when you are not quite awake, sitting at the keyboard, your fingers fumble, and you suddenly see that the xterm window where `screen' is running gets split in two, not horizontally (one region above the other), as it should according to the documentation, but vertically (one beside the other). I didn't even know which keys I had pressed to get this effect. `man screen' describes only the familiar `split' command. It took me some time to google up the answer.

It turns out that this feature was added to `screen' quite some time ago, but somehow nobody has bothered to describe it in the man page. Pressing `C-a |' splits the screen into left and right regions. To get rid of a region, use the usual `C-a X' (remove the current region) or `C-a Q' (keep only the current one).

Thursday, April 15, 2010

MySQL Proxies

While searching for a MySQL proxy solution, I found the following four products:

  • MySQL Proxy
  • Spock Proxy
  • Dormando's MySQL Proxy
  • Proximo

At least two of them share the same codebase, MySQL-proxy being the forefather of Spock Proxy. Spock Proxy tries to increase performance by eliminating the scripting layer; its goal is to provide efficient sharding, not fault tolerance. Dormando's proxy positions itself as an alternative to the official MySQL-proxy and tries to retain compatibility; it even supports Lua scripting, although I'm not sure the API is the same. And, finally, Proximo is written in Perl, which suggests its performance is lower than that of the other contenders. But Proximo is at a very early stage of development and has a promising architecture. It may have a good future.

Thursday, March 25, 2010

Slightly disappointed with Percona products

A large part of my to-do list was devoted to the products by Percona, a company well known in the MySQL world. I installed the XtraDB engine on our servers and began using Xtrabackup to back up our databases. Not bad, I have to admit. I mean, the servers still work and the backups are made. But I had to solve so many problems that I kept recalling my first days with Linux almost fifteen years ago, when you had to compile everything from sources, track dependencies manually, and even then there was only a fifty-fifty chance that the program would run.

First, I installed XtraDB, a MySQL engine which is supposed to cure the long-standing deficiencies of InnoDB. It was not too difficult. I downloaded the sources of XtraDB-1.0.6, tried to compile, failed, found a description of a non-standard (easily explainable, though) installation procedure, copied the sources to the MySQL source tree, replacing the InnoDB engine, tried to compile, found an error, googled for a solution, fixed a bug in handler/i_s.cc, tried to compile, found an error, googled for a solution, fixed a bug in Makefile.in, tried to compile, succeeded, installed and there it is. Easy, right? :)

Next, I wanted to have a look at another product by Percona, Xtrabackup, version 1.0 of which was announced in December 2009. Xtrabackup is to become the main backup solution for MySQL, being the only free tool able to perform online backups. There is more than one link at Percona.com leading to the sources. Or should I say misleading? Here's one page saying you have to use Bazaar to get the sources. So I did. There were several screenfuls of error messages, which I tried to quench with a bunch of header files stolen from the MySQL source tree, but to no avail. I checked the Percona web site once again and found another link. This tarball included a whole distribution of MySQL. I tried to compile it as it was. Then, I tried to copy the sources into the MySQL sources, strictly following the recommended procedure. Then, I tried to copy the sources into my source tree (with XtraDB substituted for InnoDB, as described above). I admit I almost gave up. In the end, I downloaded a binary version of Xtrabackup, compiled by the authors. It just worked. Well, to a degree...

This package contained Xtrabackup itself and a Perl script called Innobackupex. The problem with Xtrabackup is that it does not support anything but InnoDB (and XtraDB). Fortunately, MyISAM tables can be just copied as files, and Innobackupex does exactly this. Unfortunately, Innobackupex does this only when making full backups. Incremental backups only include InnoDB tables.

This problem was easy to fix, but there was another. Innobackupex does not support the option --incremental-basedir specifying the last full backup. The reference point for the incremental backup is defined as the earliest directory. So, I had to add support for this option to the script manually.

So, to sum it up, the number of problems I ran into with Percona products is unusually high. Or is that normal for a company that makes money solving users' problems? :)

Monday, March 1, 2010

Ubuntu: Sudo vulnerability

Not too dangerous, unless you grant sudo rights to too many people, but worrying enough: Ubuntu Security Notice USN-905-1:

sudo did not properly validate the path for the 'sudoedit' pseudo-command. A local attacker could exploit this to execute arbitrary code as root if sudo was configured to allow the attacker to use sudoedit. The sudoedit pseudo-command is not used in the default installation of Ubuntu.

And another one, only a little bit more unnerving:

sudo did not reset group permissions when the 'runas_default' configuration option was used. A local attacker could exploit this to escalate group privileges if sudo was configured to allow the attacker to run commands under the runas_default account. The runas_default configuration option is not used in the default installation of Ubuntu.

Friday, February 26, 2010

Testing new MySQL on production server

Even when your MySQL runs on a mission-critical server, there is a way to keep downtime to a minimum after you have compiled a new version of the DBMS. The MySQL documentation contains a chapter called 5.6. Running Multiple MySQL Servers on the Same Machine. But the reality is a bit more complicated.

First of all, you have to create a test database with the mysql_install_db command. Create an empty directory called, for example, test-db in your home directory and give full permissions to all users, to avoid problems with the mysql user trying to create files:

$ mkdir test-db
$ chmod 777 test-db

Now, let's populate the directory with a test database:

$ mysql_install_db --user=mysql --datadir=/DIR/db

Note that we assume that your database runs under the username mysql, as it should. Now you can try to run the test instance. It must listen on a different TCP port and use a different data directory, different socket and pid files, and different log and error log files. Hence the following command:

$ PATH-TO-NEW/mysqld --port=12345 --datadir=/DIR/db --socket=/DIR/db/sock --pid-file=/DIR/db/pid --log=/DIR/db/log --log-error=/DIR/db/log-err

Now, you can check whether the new, fresh copy of MySQL works. If it does, you can stop the running version and then run make install for the new one.
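To poke at the test instance, point the client at its socket or port (this assumes the default root accounts created by mysql_install_db, which have an empty password):

$ mysql --socket=/DIR/db/sock -u root
$ mysql --host=127.0.0.1 --port=12345 -u root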

Note that we did not grant any privileges to our usual users, so you might not be able to connect to the new instance with your existing accounts.

MySQL+XtraDB: fixing compilation errors

Trying to launch MySQL with the new XtraDB engine. First, I downloaded MySQL sources with patches by Percona and untarred them. They compile without errors and MySQL starts without problems. However, this tarball uses InnoDB, while we wanted to test XtraDB.

Next, I downloaded the XtraDB 5.1.42-1.0.6-9 sources from the Percona web site. According to the installation instructions, to use XtraDB you have to replace the contents of the storage/innobase directory with the contents of the XtraDB archive. This time, I received two error messages during the compilation phase, but both of them had already been addressed by the developers and described at Percona's Launchpad.net. In the first case, the file handler/i_s.cc contained a minor error at line 801: error: invalid conversion from ‘const char*’ to ‘char*’

To fix this,

if((p = strchr(index->table_name, '/')))

in line 801 in handler/i_s.cc has to be replaced with:

if((p = strchr((char *) index->table_name, '/')))

Then I got another error message: ha_innodb.cc:2622: undefined reference to `active_mi'. To solve this second issue, I had to disable compilation of the MySQL embedded server by adding the following option to the ./configure invocation: --without-embedded-server.

So, finally, the make command succeeded and I tried to install MySQL, but then there was a third error: /bin/sh: @MKDIR_P@: command not found. Fortunately, someone else had seen this message before and the solution is available here. In the file storage/innobase/Makefile.in, the line MKDIR_P = @MKDIR_P@ must be replaced with MKDIR_P = @mkdir_p@.

To save your time, make these three modifications before you run ./configure. The whole session transcript would look like this:

$ wget http://www.mysqlperformanceblog.com/mysql/5.1/\
source/mysql-5.1.26-percona.tar.bz2
$ wget http://www.percona.com/percona-builds/Percona-XtraDB/\
Percona-XtraDB-5.1.43-9.1/source/percona-xtradb-1.0.6-9.1.tar.gz
$ tar -xjf mysql-5.1.26-percona.tar.bz2
$ tar -xzf percona-xtradb-1.0.6-9.1.tar.gz
$ rm -r mysql-5.1.26-percona/storage/innobase/*
$ mv percona-xtradb-1.0.6-9.1/* mysql-5.1.26-percona/storage/innobase
$ cd mysql-5.1.26-percona/
mysql-5.1.26-percona$ sed -i 's/strchr(index/strchr((char *) index/' storage/innobase/handler/i_s.cc
mysql-5.1.26-percona$ sed -i 's/@MKDIR_P@/@mkdir_p@/' storage/innobase/Makefile.in
mysql-5.1.26-percona$ ./configure '--build=x86_64-linux-gnu' \
'--host=x86_64-linux-gnu' '--prefix=/usr' '--exec-prefix=/usr' \
'--libexecdir=/usr/sbin' '--datadir=/usr/share' \
'--localstatedir=/var/lib/mysql' '--includedir=/usr/include' \
'--infodir=/usr/share/info' '--mandir=/usr/share/man' '--with-server-suffix=-1ubuntu2' \
'--with-comment=(Ubuntu)' '--with-system-type=debian-linux-gnu' \
'--enable-shared' '--enable-static' '--enable-thread-safe-client' \
'--enable-assembler' '--enable-local-infile' '--with-pic' \
'--with-lib-ccflags=-fPIC' '--with-pstack' '--with-fast-mutexes' \
'--with-big-tables' '--with-unix-socket-path=/var/run/mysqld/mysqld.sock' \
'--with-mysqld-user=mysql' '--with-libwrap' '--with-ssl' \
'--without-docs' '--with-extra-charsets=all' '--with-plugins=max' \
'--without-ndbcluster' '--without-embedded-server' '--with-embedded-privilege-control' \
'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu' \
'CC=gcc' 'CFLAGS=-O3 -DBIG_JOINS=1 -fPIC -fno-strict-aliasing' \
'LDFLAGS=-Wl,-Bsymbolic-functions' 'CPPFLAGS=' 'CXX=g++' \
'CXXFLAGS=-O3 -DBIG_JOINS=1 -felide-constructors -fno-exceptions \
-fno-rtti -fPIC -fno-strict-aliasing' 'FFLAGS=-g -O2'
mysql-5.1.26-percona$ make
mysql-5.1.26-percona$ sudo make install