Friday, July 23, 2010

A scheduler for Netapp snapmirrors

Snapmirroring is a good way to protect the data on your Netapp filer: for instance, every night you can replicate all or part of the data of your production filer to a remote hot spare filer. If you want to use an asynchronous snapmirror policy, you will have to rely on a sort of cron table to launch your replications.

But in my view, such a cron feature is not a great idea if you want to avoid too many snapmirrors running at the same time (which slows your network and increases the latency of your filer). Of course, you can configure the cron to give some replications sufficient time to end before launching new ones. Yet this turns out to be a bit complicated to manage if you have a lot of volumes or qtrees to replicate. What is more, doing it this way, you may waste part of your night time and replications may happen during business hours, impacting your responsiveness during peak hours.

To cope with such problems, I decided to create a sort of scheduler that takes a list of volumes/qtrees to replicate and ensures that only a certain number of replications (and thus a certain bandwidth) run at the same time. When a slot is freed, it launches a new replication. If you know Veritas Netbackup, you'll see where I got my inspiration.
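To make the slot idea concrete, here is a minimal Python simulation. It is only a sketch and not the perl script itself: the job names, the durations expressed in "ticks" and the run_schedule function are all made up for the illustration.

```python
from collections import deque

def run_schedule(jobs, max_slots):
    """Simulate a slot-based scheduler.

    jobs is an ordered list of (name, duration_in_ticks) tuples; the
    durations are invented here, a real transfer ends when the filer says so.
    Returns the launch events as (tick, name) and the peak concurrency.
    """
    pending = deque(jobs)
    running = {}              # name -> remaining ticks
    events = []
    peak = 0
    tick = 0
    while pending or running:
        # Fill every free slot with the next replication from the top of the list.
        while pending and len(running) < max_slots:
            name, duration = pending.popleft()
            running[name] = duration
            events.append((tick, name))
        peak = max(peak, len(running))
        # One tick passes; finished transfers free their slot.
        tick += 1
        running = {n: left - 1 for n, left in running.items() if left - 1 > 0}
    return events, peak

jobs = [("volA", 3), ("volB", 1), ("qtreeC", 2), ("volD", 1)]
events, peak = run_schedule(jobs, max_slots=2)
for tick, name in events:
    print(tick, name)
print("peak concurrency:", peak)
```

With two slots, qtreeC starts as soon as volB ends, and volD has to wait until a slot is free again.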

The script is written in perl and you'll need the Netapp ONTAPI APIs (well, in fact just the perl part). You can download it here.

Install the scheduler :
mkdir /opt/netapp-manageability-sdk
tar xvfz /tmp/snapmirrorScheduler.tar.gz -C /opt/netapp-manageability-sdk
In the /opt/netapp-manageability-sdk/prod directory, install the ONTAPI API (just the perl part, don't bother with C, java etc). Then, you may need to patch a file in the API : /opt/netapp-manageability-sdk/prod/lib/perl/NetApp/ The invoke function of my API version did not manage the case where it received as an argument a reference to an object instead of a scalar. So here are the changes you may need to apply :
diff -r1.1
< $xi->child_add(new NaElement($key, $value));
> unless( ref($value)){ # we have a scalar variable
> $xi->child_add(new NaElement($key, $value));
> } else { # we have a reference object variable : must be treated in another way (bug from Netapp)
> my $newElement = new NaElement($key);
> $newElement->child_add( $value);
> $xi->child_add( $newElement);
> }

Now, let's configure the root password of your filer in the script. Look for the file lib/ and replace the TOCHANGE string with your password. Snapmirrors are launched on the filers that receive the replication, so it should be the password of those filers. I assumed that all the filers share the same password; if your configuration differs, you'll have to change the constructor block a bit.

Finally, you should edit the configuration files. The tarball contains two configuration files : etc/scheduleSnapMirror_nas01a-ibm.conf and etc/scheduleSnapMirror_nas01b-ibm.conf. The files you create must be named as follows : etc/scheduleSnapMirror_<filer-hostname>.conf. There must be one configuration file for each receiving filer. Each line describes one replication you want to launch. Replications at the top of the file are launched first and the ones at the bottom last. You can replicate just qtrees, whole volumes or a mix of them.
To fully understand the syntax of the configuration file, you must know that in my case, I have 2 main filers, nas1a and nas1b, and 2 filers in my backup data center, nas01a-ibm and nas01b-ibm. nas1a replicates to nas01a-ibm and nas1b replicates to nas01b-ibm. What is more, if I have a volume myVol13 on nas1a, the replicated volume on nas01a-ibm will be named R_myVol13. This allows me to keep my replication lines short (if you have different conventions, you may hack the code a bit, as explained later).

And now, it should work! Just execute the script :
/opt/netapp-manageability-sdk/bin/ --verbose nas01a-ibm
And you should see the first replications beginning.
If everything's OK, you can set such command in cron to execute it every night for instance :

mkdir /var/log/netapp
cat > /etc/cron.d/netapp <<-EOF
# schedule snapmirror execution on Netapp in order not to launch too many replications at the same time
03 22 * * * root /usr/local/bin/ nas01a-ibm
05 22 * * * root /usr/local/bin/ nas01b-ibm
EOF

Some explanations about the script :
One function you might want to change is maxTransfersAllowed. It defines how many transfers you allow at the same time. I defined a business policy (lowTransfer variable) and an out of business policy (fullTransfer variable).
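As an illustration of such a policy, here is a small Python sketch. The limits and the business hours below are hypothetical values chosen for the example; the perl script has its own lowTransfer and fullTransfer settings.

```python
from datetime import time

# Hypothetical values, only for the example.
LOW_TRANSFER = 2                # simultaneous transfers during business hours
FULL_TRANSFER = 6               # simultaneous transfers outside business hours
BUSINESS_START = time(8, 0)
BUSINESS_END = time(20, 0)

def max_transfers_allowed(now):
    """Return how many simultaneous transfers are allowed at time `now`."""
    if BUSINESS_START <= now < BUSINESS_END:
        return LOW_TRANSFER
    return FULL_TRANSFER

print(max_transfers_allowed(time(10, 30)))   # business hours
print(max_transfers_allowed(time(23, 15)))   # night
```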
If you use the verbose mode, you'll find a lot of information about your transfers in the /tmp/.scheduleSnapMirror_<filer>-ibm.debug file. Every 5 minutes, the script writes to that file which transfers are running and how many bytes have been transmitted.
You can also see many eval constructions in the script, in order to catch errors. I did that because I sometimes get XML serialization problems that I did not really understand. What is more, you don't want your replication policy to stop if you lose network connectivity for a few seconds.
By default, the transfer rate (maxRate) is 8704 kb/s ; you may want to change it.
Last thing that can be interesting to change is in the launchEasyTransfer function. There, you'll see the code to establish a link between origin volume (filerA:volume) and destination volume (filerA-ibm:R_volume). You may need to adapt it according to your environment.
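To show what such a naming convention looks like in code, here is a Python sketch based on the filers described earlier. The replication_pair function and the dictionary are made up for the example and only handle whole volumes; the real logic lives in the perl launchEasyTransfer function.

```python
# Source filer -> destination filer, as in the setup described earlier in this post.
DESTINATION_FILER = {"nas1a": "nas01a-ibm", "nas1b": "nas01b-ibm"}
REPLICA_PREFIX = "R_"

def replication_pair(source_filer, volume):
    """Return the (source, destination) locations for one volume."""
    destination_filer = DESTINATION_FILER[source_filer]
    source = "%s:%s" % (source_filer, volume)
    destination = "%s:%s%s" % (destination_filer, REPLICA_PREFIX, volume)
    return source, destination

print(replication_pair("nas1a", "myVol13"))
```

If your replicas follow another convention, only the dictionary and the prefix would need to change in such a sketch.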

Sunday, June 6, 2010

Monitor xen domUs from dom0 with xenstore

The easiest solution to monitor your domU servers may be to consider them as normal servers and install the Nagios NRPE plugin or any of your favorite monitoring software on them. But this means having a new daemon running on each of your domUs, listening on a new port, and new firewall rules to manage...
To avoid such a burden, it is possible to monitor the virtual guests directly from the physical servers, using the xenstore database to pass information from the domUs to the host. The idea is the following : every 5 minutes, the domUs execute a script that monitors what the system administrator thinks is important. The results are written to xenstore. Then, the dom0 collects all that information. Of course, if a domU fails to execute its script, the dom0 will detect it. Finally, a script on another server does an snmpget to all the dom0s in order to gather all the information about your xen domUs.
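One simple way for the dom0 to detect a domU that stopped reporting is to store a timestamp with each result. The Python sketch below illustrates that idea only; the "timestamp|status|message" format, the 10 minute limit and both function names are assumptions made for the example, not necessarily what my scripts do.

```python
import time

# Assumption: a report older than two 5-minute runs is considered stale.
MAX_AGE_SECONDS = 10 * 60

def format_report(status, message, now=None):
    """Build the string a domU would write to xenstore."""
    now = time.time() if now is None else now
    return "%d|%s|%s" % (int(now), status, message)

def check_report(raw, now=None):
    """Interpret, on the dom0 side, the string read from xenstore."""
    now = time.time() if now is None else now
    if raw is None:
        return ("CRITICAL", "no report in xenstore")
    stamp, status, message = raw.split("|", 2)
    if now - int(stamp) > MAX_AGE_SECONDS:
        return ("CRITICAL", "report is older than %d seconds" % MAX_AGE_SECONDS)
    return (status, message)

report = format_report("OK", "all checks passed", now=1000)
print(check_report(report, now=1200))   # fresh report
print(check_report(report, now=2000))   # stale report
```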

The solution I wrote uses this idea, is written in python and works on RHEL 5, although it should work on any xen infrastructure. You can download it here.

Install the monitoring solution :
The first thing we need to do is to create the xenstore location on our dom0s. I chose to use /tool/myManagement/domUunsecureFiles/basicMonitoring. /tool/myManagement/ is the xenstore directory where I decided to store all the non standard information. Then I created the domUunsecureFiles to explain that every subdirectory in it has loose permissions : every domU may write in basicMonitoring. And as there is no sticky bit in xenstore, that means that every domU may delete or modify information another domU wrote before. There is a way with xenstore to enforce permissions but it is more complicated. What is more, I trust the people that administrate the domUs so I don't feel the need to go deep into those security concerns.

Let's add the commands to rc.local and execute them :
cat >> /etc/rc.d/rc.local <<-EOF

# write useful xenstore keys
xenstore-write /tool/myManagement/domUunsecureFiles/basicMonitoring/toDelete nothing
xenstore-rm /tool/myManagement/domUunsecureFiles/basicMonitoring/toDelete
xenstore-chmod /tool/myManagement/domUunsecureFiles/basicMonitoring w

xenstore-write /tool/myManagement/privateInfo/migration/toDelete nothing
xenstore-rm /tool/myManagement/privateInfo/migration/toDelete
EOF

. /etc/rc.d/rc.local

Then, copy the scripts, and in the /usr/local/bin directory of your dom0s.

We want to execute the monitoring script every 5 minutes :
cat > /etc/cron.d/basicXenMonitoring <<-EOF
# run xen basic monitoring every 5 minutes, one minute after the domUs
1,6,11,16,21,26,31,36,41,46,51,56 * * * * root /usr/local/bin/
EOF

Then, we must configure the snmpd daemon in each dom0. In /etc/snmp/snmpd.conf we add the following lines :
extend basicXenstoreMonitoring /usr/local/bin/
extend listXenCluster /usr/local/bin/
And we restart the service :
/etc/init.d/snmpd restart

Secondly, let's focus on your domUs. For the script to work, you need to have the xenstore-write and xenstore-rm commands installed. On Debian, it is quite easy because there is a small package for it and all you have to do is :
apt-get install xenstore-utils
On Redhat, it is more complicated because those commands belong to the xen package. So you will have to copy them to every domU and also copy the /usr/lib64/ library.

Set the script in /usr/local/lib.

Then, copy the script on the /usr/local/bin directory. You will need to change the gateway definitions in the checkNetworkConnectivity function (see explanations part below).

As for the dom0s, we create the cron file :
cat >/etc/cron.d/basicXenMonitoring <<-EOF
*/5 * * * * root /usr/local/bin/
EOF

Finally, on your monitoring server (or any server you judge valuable to collect the monitoring information), ensure that the /usr/bin/snmpget binary is installed. Copy the script in /usr/local/bin. And change the snmpget community (search at the end of the script for the toChange token).

Use it :
On the latter server, just use the command :
This command is made for "humans" : it separates errors from warnings and will list the domUs problems grouped by dom0s.
In order to make it easier for your monitoring system to parse the results, it may prove more interesting to use the following command : raw

Some explanations about the scripts :
We just check 4 resources of the domU : CPU, swapping, disk and network connectivity. Those are the basic resources I consider we should monitor on every server (hence the name of the scripts...). Anyway, if you believe that there are other resources that should be monitored on every server, I think the script makes it quite easy to add them.

I tried to use the /proc filesystem to get information of the servers instead of using other commands or reading some configuration files (like /etc/fstab for instance). This is because I prefer getting the information from memory than accessing the disk.

In the checkDiskSpace function, you can see that I remove all the filesystems that contain the .snapshots token. This is because for every Netapp NFS mount, a lot of snapshot filesystems may get mounted, and they don't offer valuable space information.
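The filtering can be sketched in a few lines of Python. This is an illustration and not the code of checkDiskSpace : the function name and the sample mount table are invented, and I assume here that the token is looked for in the mount point.

```python
def monitored_mount_points(mounts_text, skip_token=".snapshots"):
    """Return the mount points worth checking, from text in /proc/mounts format."""
    mount_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue                      # ignore empty or malformed lines
        mount_point = fields[1]
        if skip_token in mount_point:
            continue                      # snapshot mounts give no useful space info
        mount_points.append(mount_point)
    return mount_points

# Invented sample; on a real server you would pass open("/proc/mounts").read().
SAMPLE = """\
/dev/xvda1 / ext3 rw 0 0
nas1a:/vol/data /data nfs rw 0 0
nas1a:/vol/data/.snapshots/nightly.0 /data/.snapshots/nightly.0 nfs ro 0 0
"""
print(monitored_mount_points(SAMPLE))
```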

The most difficult function to understand may be checkNetworkConnectivity. The basic idea is to have a listing of all network definitions of my interfaces, and then to try to ping the gateways of those networks. In Xen, you can fix the MAC address of each interface. And in my case, I have a direct translation between the IP and the MAC (I consider it a good way to avoid MAC conflicts...). I build the MAC this way : MAC = 00:16 + <IP in hexa>. That is why, if I have a NIC whose MAC address begins with 00:16:ac:13:0a, I know that the corresponding IP belongs to the network and that the IP of the gateway I should try to ping is
You will surely have to adapt this part.
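The MAC/IP translation itself is easy to express in Python. The two helpers below are a sketch of the convention (MAC = 00:16 + IP in hexadecimal), not an extract of the script, and the host address used in the example is invented.

```python
MAC_PREFIX = "00:16"

def ip_to_mac(ip):
    """Build the MAC address that corresponds to an IPv4 address."""
    octets = [int(part) for part in ip.split(".")]
    return MAC_PREFIX + "".join(":%02x" % octet for octet in octets)

def mac_to_ip(mac):
    """Recover the IPv4 address encoded in the last 4 bytes of the MAC."""
    parts = mac.lower().split(":")
    if len(parts) != 6 or ":".join(parts[:2]) != MAC_PREFIX:
        raise ValueError("MAC does not follow the 00:16 + IP convention: %s" % mac)
    return ".".join(str(int(part, 16)) for part in parts[2:])

print(ip_to_mac("172.19.10.5"))
print(mac_to_ip("00:16:ac:13:0a:05"))
```

Here ac, 13 and 0a are the hexadecimal forms of 172, 19 and 10, which is how a MAC beginning with 00:16:ac:13:0a identifies the network of the interface.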
The dom0 script not only collects the results of the domUs but also monitors the dom0 itself. If you don't like that feature, just comment out the call to checkDom0 at the end of the script.
That function only monitors the memory and the disk space. I chose not to include the network connectivity check because it would be a bit complicated with my bridge configuration. What is more, if there is a problem in one of my VLANs/bridges, a domU will surely detect it!
Nor do I check the CPU with this function, simply because I have another script that computes the Xen CPU usage (doing an xm list and looking at the values of the last command).
The script on the monitoring server does an snmpget to all the dom0s in order to get the monitoring results. That means that it must know the list of all the dom0s. I could have written that list in the script but I did not think it would be easy to administrate (you must change the script every time there is another dom0). So I preferred another solution. On an NFS share mounted on every dom0, I have a description of all my hosts :
[root@srxen09 xenAdmin]# ls /data/config/auto/*
servidor1.cfg srxmwebmail.cfg www01.cfg www02.cfg wwwp02.cfg

srxmbf.cfg srxmcca-pre02.cfg srxmcolumbusd.cfg srxmldap-pre.cfg srxmlinuxweb01.cfg srxmtest2.cfg srxmtest.cfg srxmwasd02.cfg

servidor2.cfg www03.cfg wwwp01.cfg


Here, srxen01, srxen02, srxen03, srxen09 and srxen10 are some of my hosts. And you can see the configuration files of the domUs running on each of those hosts. So the script does an snmpget call to the listXenCluster script in order to know the dom0 list. As I have 2 datacenters with 2 NFS shares, I must call listXenCluster 2 times (doing an snmpget to the principal dom0).
Such a solution may seem a bit complicated, so please don't hesitate to change it and hard-code your dom0 list if you don't like it.

Tuesday, May 18, 2010

My bashrc changes

Apart from the traditional personal alias lines, here are the commands I do like to add in the /etc/bashrc of every server.

The first one is dedicated to everybody who likes to tease me with the song "linux is only black and white". It may be less visual than other OSes, but we can manage to add some colours to make it easier (and prettier) :
PS1='[\[\033[0;31m\]\u\[\033[0m\]@\[\033[1;31m\]\h \[\033[0;35m\]\W\[\033[0m\]]\$ '

The second set of commands gives a more powerful bash history. Knowing what has happened on your server is fine, but knowing when it happened is also very valuable information. And let's add some colours again :
MY_BASH_BLUE="\033[0;34m" #Blue

Sunday, April 4, 2010

permissions in websvn

Subversion is a great tool for developers but it can also be very useful to centralize your changes of the configuration files or the scripts you deploy on your servers. To make it easier to use (in a read only way...), you can install websvn, that is a web front-end to your subversion repositories.
This little howto aims to explain how to install a websvn server (the web server for subversion repositories) in such way that we can easily enforce user permissions. To achieve it, we will use an apache/ldap authentication and we will hack a bit the websvn code to be able to use the same access rights file in subversion and in websvn and to fix a bug that appears when you have the same repository name in two different parent paths.
This howto works here with a Redhat 5 server and the client used is a Debian 5 desktop.

To explain the permission feature and show that the bug is fixed, I will create 2 parent path directories, each one containing 3 subversion repositories :
  • python, with repositories : dns, databases and network
  • java, with repositories : web1, web2 and dns
First, let's start by installing subversion and apache. Since you can find many howtos on the internet about this, I won't give a lot of explanations about this part. If you just copy-paste the following commands, remember to change the LDAP server, user and password in the /etc/httpd/conf.d/ldapConfiguration.included file and to insert your LDAP users in the /etc/subversion/svn-acl* files.
yum install subversion httpd mod_dav_svn
mkdir -pv /opt/subversion/{python,java}
svnadmin create /opt/subversion/python/dns
svnadmin create /opt/subversion/python/databases
svnadmin create /opt/subversion/python/network
svnadmin create /opt/subversion/java/web1
svnadmin create /opt/subversion/java/web2
svnadmin create /opt/subversion/java/dns

cat >> /etc/httpd/conf.d/subversion.conf <<-EOF
<Location /python>
Include /etc/httpd/conf.d/ldapConfiguration.included

SVNParentPath /opt/subversion/python
AuthzSVNAccessFile /etc/subversion/svn-acl-python.conf
</Location>

<Location /java>
Include /etc/httpd/conf.d/ldapConfiguration.included

SVNParentPath /opt/subversion/java
AuthzSVNAccessFile /etc/subversion/svn-acl-java.conf
</Location>
EOF

cat > /etc/httpd/conf.d/ldapConfiguration.included <<-EOF
DAV svn

AuthBasicProvider ldap
AuthzLDAPAuthoritative off
AuthLDAPURL ldap://,dc=org?uid
AuthLDAPBindDN uid=readerall,ou=special,ou=users,dc=example,dc=org
AuthLDAPBindPassword passwordToChange

AuthType Basic
AuthName "Subversion control"
Require valid-user
EOF

cat > /etc/subversion/svn-acl-python.conf <<-EOF
# These ACLs are for http method of subversion
# You don''t need to reload apache to get the permissions enforced. Just modify the file and save it.

pythondev = smith,mike,robert

# everybody can read this repository
* = r
@pythondev = rw

mike = rw

@pythondev = rw

cat > /etc/subversion/svn-acl-java.conf <<-EOF
javadev = mike,sarah,arthur,elen

@javadev = rw

daniel = r
@javadev = rw

@javadev = rw

rm -f /etc/httpd/conf.d/authz_ldap.conf
chown -R apache /opt/subversion/

chkconfig httpd on
service httpd start
Now that subversion is installed and configured, you can do some tests on your computer :
apt-get install subversion
svn --username mike checkout http://subversionserver/python/dns
cd dns
echo world > hello
svn add hello
svn ci hello

OK, subversion seems to work fine! So let's install websvn now.
Download the latest tar.gz version from the project website and place the archive in the /tmp directory of your subversion server.
cd /tmp
tar xvfz websvn*tar.gz
rm -f websvn*tar.gz
cd websvn*
mkdir -pv /opt/websvn/prod
mv * /opt/websvn/prod/

cat > /etc/httpd/conf.d/websvn.conf <<-EOF
Alias /websvn /opt/websvn/prod/
<Directory /opt/websvn/prod/>
Options MultiViews
DirectoryIndex wsvn.php

AuthBasicProvider ldap
AuthzLDAPAuthoritative off
AuthLDAPURL ldap://,dc=org?uid
AuthLDAPBindDN uid=readerall,ou=special,ou=users,dc=example,dc=org
AuthLDAPBindPassword passwordToChange

AuthType Basic
AuthName "Subversion control"
Require valid-user
</Directory>
EOF
Remember to change the previous file with your ldap credentials.
Now we will apply some patches on websvn. The files I have changed work with the webSvn 2.3.0.
In order to help you to apply the patches on another version, I will both give you the file and my RCS file (the first version of RCS being the official one).

The first file you should change is the template /opt/websvn/prod/templates/calm/header.tmpl that shows a bug when you are looking at the root web page (download the normal version / the RCS version).

The second file is more important. It is a PHP library : /opt/websvn/prod/include/configclass.php. My version of this file fixes the bug I mentioned before : if you have two repositories with the same name in different parentPaths, the uncorrected webSvn will only show one of them. In our current example, that would mean you would only see python/dns, not java/dns. What is more, my version adds the useAuthenticationFileGroup function that lets you define permissions at the parentPath level (php is far from being my favorite scripting language so please forgive me if my code is not perfect...). I will explain it in a few lines. You may download the text version here and the RCS version here.

Now, we shall edit the webSvn main configuration file. Start with copying the sample file :
cp /opt/websvn/prod/include/distconfig.php /opt/websvn/prod/include/config.php
Then edit the copy you made.
First, add the two parentPaths : those are the directories that contain the subversion repositories you created at the beginning of this howto.
Then we only activate the template we fixed :
Other important configuration lines are :
$config->useTreeIndex(true); // Tree index, open by default
Finally, we must link the parentPaths with the permission files we wrote before :
Here you can see the function I wrote in the patched library file. If you don't use this function, you would have to use the standard useAuthenticationFile function. That would mean, writing 6 lines instead of 2 and remembering to change this configuration every time you create or delete a repository.

The webSvn installation ends like that :
yum install php-pear
ln -s /opt/websvn/prod/templates /var/www/html
service httpd reload
Finally, just type the http://subversionserver/websvn URL in a browser to check that everything is OK.

Tuesday, March 16, 2010

sqlite : a light and easy to use database

Sometimes you would like to use the power of the SQL language to handle your data but without all the burden of a database. Indeed, why would you need a heavy software that can manage many different accounts, cluster configurations, and can listen on a network interface if the database will only be accessed just a few times each hour by a single software/script on the localhost? Sqlite proves to be the perfect solution for such situation and has the following assets:
  • serverless database : forget the client-server scheme! Sqlite will only consume CPU and memory when you use it.
  • zero configuration : just install the sqlite binaries and use it!
  • single database file : the whole database stands in a single file. So it is very easy to handle or move to another computer.
In my case, I mainly use sqlite on my xen dom0 servers. I have a cron script that is executed every 10 minutes, collects statistics about the running domUs and stores them in the database (something similar to the sar software). For such critical dom0 servers, I wanted a local database with a low memory footprint.

Let's give an introduction about how to use it.

On Debian 5, installation follows the traditional method :
apt-get install sqlite3
Then, you can start and create a new sqlite database with the sqlite3 command. Just give as an argument the name of the file that will store your database :
luangsay@ramiro:/tmp$ sqlite3 foo.db
sqlite> CREATE TABLE sa(date int(5), server varchar(20), load5 float(2), iowait int, eth0rx int, eth0tx int, eth1rx int, eth1tx int);
You can use the .help command to see the list of meta-commands. For instance, if we want to list all the tables and know their structure, we would type :
sqlite> .tables
sqlite> .schema sa
CREATE TABLE sa(date int(5), server varchar(20), load5 float(2), iowait int, eth0rx int, eth0tx int, eth1rx int, eth1tx int);
And here are some basic insert/select commands :
sqlite> insert into sa values(0955, 'server1', 1.4, 32, 187474, 18747, 0, 0);
sqlite> select * from sa;

Of course, you don't have to use sqlite3 to manage your database. You may use your favorite programming language. Here is an example for python.

First, install the software :
apt-get install python-pysqlite2
Then, in the python interpreter, you may type :
>>> import sqlite3
>>> conn = sqlite3.connect( '/tmp/foo.db')
>>> conn.row_factory = sqlite3.Row
>>> cursor = conn.cursor()
>>> cursor.execute('select * from sa')

>>> for line in cursor: print line['date'], line['server'], line['iowait']
955 server1 32
>>> conn.commit()
>>> cursor.close()

If you want to discover a bit more about this database, you may refer to the official website.
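For a script rather than an interactive session, the same table can be filled and queried with parameterized statements. The sketch below is written for Python 3 with the sqlite3 module of the standard library; it uses an in-memory database and invented sample values so that it does not touch /tmp/foo.db.

```python
import sqlite3

# ":memory:" keeps the example self-contained; use a file path for a real database.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE sa(date int, server varchar(20), load5 float, iowait int, "
    "eth0rx int, eth0tx int, eth1rx int, eth1tx int)"
)

# Invented sample values, in the same column order as the table.
samples = [
    (955, "server1", 1.4, 32, 187474, 18747, 0, 0),
    (1005, "server1", 0.9, 12, 190210, 19003, 0, 0),
]
# The ? placeholders let sqlite3 do the quoting instead of building SQL strings by hand.
conn.executemany("INSERT INTO sa VALUES (?, ?, ?, ?, ?, ?, ?, ?)", samples)
conn.commit()

rows = conn.execute(
    "SELECT date, server, iowait FROM sa WHERE server = ? ORDER BY date",
    ("server1",),
).fetchall()
for row in rows:
    print(row["date"], row["server"], row["iowait"])
conn.close()
```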

Saturday, March 6, 2010

perl debugger

Perl is (still) a popular scripting language in system administration but it seems to me that few administrators know it has a debugger. I believe that is a bit sad because this debugger is quite powerful: it can ease your job a lot when writing scripts and enable you to do it faster. So this article is a short introduction to this useful tool.

To explain how to use the debugger, we need a sample script. Here is a very basic one :
luangsay@ramiro:/tmp$ cat
#!/usr/bin/perl -w

use strict;

my $bar = {};

sub myFunc {
$bar->{'one'} = 1;
$bar->{'two'} = 2;
print $bar
}

sub callFunc {
myFunc()
}

callFunc()

One of the features I like is that the debugger is embedded in the perl interpreter (unlike python, which needs the pdb module). To activate the debug mode, just call the interpreter with the -d switch :
luangsay@ramiro:/tmp$ perl -d ./

Loading DB routines from version 1.3
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(./ my $bar = {};

First thing you can do to learn the debugger is looking at the help :
DB<1> h
List/search source lines: Control script execution:
l [ln|sub] List source code T Stack trace
- or . List previous/current line s [expr] Single step [in expr]
v [line] View around line n [expr] Next, steps over subs
f filename View source in file Repeat last n or s

If you have a very complicated script, it may be useful to list the loaded perl modules :
DB<1> M
'' => '1.08 from /usr/share/perl/5.10/'
'Carp/' => '/usr/share/perl/5.10/Carp/'
'' => '/usr/lib/perl/5.10/'
'' => '/usr/lib/perl/5.10/'
'' => '5.62 from /usr/share/perl/5.10/'
'' => '1.23_01 from /usr/lib/perl/5.10/'
'IO/' => '1.27 from /usr/lib/perl/5.10/IO/'
To view some lines in the script, we can use the l command :
DB<1> l
5==> my $bar = {};
7 sub myFunc {
8: $bar->{'one'} = 1;
9: $bar->{'two'} = 2;
10: print $bar
11 }
13 sub callFunc {
14: myFunc()
We can see that the debugger is ready to execute the first instruction of the script at line 5. To execute this instruction, we type n:
DB<1> n
main::(./ callFunc()
Now, we are about to call the callFunc function. If we want the debugger to execute just one step inside this function, we must use the s command :
DB<4> s
main::callFunc(./ myFunc()

Let's say now that we want to know the value of the bar variable before printing it. We must therefore execute the script till line number 10 :
DB<5> c 10
main::myFunc(./ print $bar
DB<6> v
7 sub myFunc {
8: $bar->{'one'} = 1;
9: $bar->{'two'} = 2;
10==> print $bar
11 }
13 sub callFunc {
14: myFunc()
15 }
We have just entered in another function. The T command gives us a listing of all the function calls that were made to get there :
DB<9> T
. = main::myFunc() called from file `./' line 14
. = main::callFunc() called from file `./' line 17

To view the value of a variable, you may use the p command. But as $bar is a reference of a hash, it proves to be better to do a dump of the variable with the x command. This latter command is very useful for instance to know the state of a huge perl object.
DB<6> p $bar
DB<7> x $bar
0 HASH(0x82496b8)
'one' => 1
'two' => 2

The last feature I would like to explain, is the ability to run perl commands directly in the debugger. This enables you to change values on the fly or to do some tests . For instance, we can load another module to get the time or change the value of $bar->{'two'} :
DB<15> use HTTP::Date;
DB<24> p time2str($time);
Sat, 06 Mar 2010 13:08:40 GMT
DB<25> $bar->{'two'} = 3;
DB<31> x $bar
0 HASH(0x824bfd0)
'one' => 1
'two' => 3

I hope this introduction will make you want to discover a bit more the perl debugger!

Saturday, February 20, 2010

My favourite bash shortcuts

As a system administrator, bash may be the linux tool you use most. So it may be a good idea to know a few shortcuts that can speed up your job.
Here are my favourite ones :
Ctrl-a , to go to the beginning of the line
Ctrl-e , to go to the end of the line
Alt-b , to move backward one word
Ctrl-u , to delete everything to the left of the cursor
Ctrl-w , to delete the previous word

cd - , to go back to the previous directory

Ctrl-r , to search for a previous command in the bash history. This is a MUST shortcut, the one I prefer. Just type Ctrl-r and then a few letters of a command that is in the history to find it and be able to execute it (bash will do a pattern matching on the commands in your history). To look further back in the history, you can type Ctrl-r several times.
Alt-. , to print the last argument of the last command. You can type many times this shortcut to look for an argument further in your history.

^foo^bar, to substitute the "foo" pattern in your previous command by the "bar" pattern and execute it.

Saturday, January 9, 2010

install openais high-availability cluster

Openais is a free opensource solution to create clusters. It forked from the heartbeat project and offers more features than the latter. With Openais, you can configure a multi-node high-availability or load-balancing cluster. In my view, it proves to be a great and efficient solution and can substitute for expensive software such as Sun Cluster, IBM HACMP or Redhat Cluster Suite.

This article aims to present a quick howto in order to install a 2 nodes high-availability cluster.
The clustered service here is just an apache web server. Quite easy indeed, but if you understand how to do it, you'll be able to handle very complicated clusters. Servers are RHEL 5 64 bits, although I have tried it successfully on RHEL 4 64 bits. The only difference is that on RHEL 4, you need to install a more recent version of python (I compiled version 2.4.6 and it worked great) and change the shebang of the crm python command. Hardware is xen domU virtual servers but I don't think it really matters, as long as you manage to have a shared quorum disk (with a SAN for instance).
A last word about "hardware" : you must ensure that multicast is enabled on the switches that link the two nodes.

OK, so let's start this howto by creating the quorum disk. If you don't use xen but real hardware, you just need to know that I am going to create a 512 KB disk that is visible on both nodes as /dev/xvdc. My xen domUs are named srxmtest7 and srxmtest8. On my dom0s, each domU has its own folder under /data/xm. So, here are the commands to execute on the dom0:
cd /data/xm/srxmtest7/
dd if=/dev/zero of=sharedDiskSbdClustertest7 bs=1k seek=512 count=1
Edit the domU configuration file and add the following line in the disk part :
Then, we add the disk on the second node :
cd ../srxmtest8
ln -s /data/xm/srxmtest7/sharedDiskSbdClustertest7 .
And in the srxmtest8 configuration file we add :
Then, start the two servers.

OK, now we've finished with the xen configuration and we are going to work on srxmtest7 and srxmtest8. To distinguish between them, I will use
[srxmtest7] for commands to be executed on srxmtest7,
[srxmtest8] for commands to be executed on srxmtest8 and
[both] for commands to be executed on both servers. You may use clusterssh software in order to do it (see my previous post on this blog).

Now, we are going to install the RPMs we need. Redhat does not support the whole openais/pacemaker solution so we can not use its repositories. So we have to get them from the pacemaker website. Take care not to mix those openais RPMs with Redhat openais RPMs because they are not compatible. Here are the RPMs you must install from that repository :
[both] openais pacemaker cluster-glue cluster-glue-libs heartbeat libopenais2 pacemaker-libs resource-agents
If you want to ensure that you won't install any Redhat RPM that could update those Suse RPMs, modify your yum configuration :
[both] echo -e "\n\nexclude=openais pacemaker cluster-glue cluster-glue-libs heartbeat libopenais2 pacemaker-libs resource-agents" >> /etc/yum.conf
Then, configure automatic startup :
[both] chkconfig heartbeat off
[both] chkconfig openais on
In order to avoid a deathmatch stonith problem (that could happen if you have a multicast network problem for instance), you may substitute the /etc/init.d/openais script on each host with this one. The change I made detects whether the server has rebooted more than 3 times since the start of the day and, if so, does not start openais.
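The decision itself is simple and can be sketched in Python. This only illustrates the rule : the modified init script is a shell script, and the way it obtains the boot times is not shown here, so the list of boot times below is invented.

```python
from datetime import datetime

MAX_REBOOTS_PER_DAY = 3

def should_start_cluster(boot_times, now):
    """Return False when the server rebooted more than 3 times since midnight."""
    start_of_day = now.replace(hour=0, minute=0, second=0, microsecond=0)
    reboots_today = sum(1 for boot in boot_times if start_of_day <= boot <= now)
    return reboots_today <= MAX_REBOOTS_PER_DAY

now = datetime(2010, 1, 9, 15, 0)
boots = [
    datetime(2010, 1, 8, 23, 50),   # yesterday, not counted
    datetime(2010, 1, 9, 9, 0),
    datetime(2010, 1, 9, 9, 20),
    datetime(2010, 1, 9, 9, 40),
]
print(should_start_cluster(boots, now))                                      # 3 reboots today
print(should_start_cluster(boots + [datetime(2010, 1, 9, 10, 0)], now))      # 4 reboots today
```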

We must now set up openais basic configuration :
[both] cd /etc/ais
[srxmtest7] ais-keygen
[srxmtest7] scp authkey root@srxmtest8:/etc/ais
[both] vim openais.conf
In the configuration file, we will change the bindnetaddr parameter. My servers srxmtest7 and srxmtest8 have the following IPs : So bindnetaddr will be You must also ensure that the mcastaddr/mcastport pair is unique across all your clusters. What I usually do is append the last two bytes of my first node's IP to the multicast address. So mcastaddr would be Another thing I like to change is the logging facility, to make it unique. For instance, I would write "local4". The purpose is to have all the openais logs in a separate file :
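The multicast address trick can be written as a tiny Python helper. Both the 239.192 prefix and the node IP used below are hypothetical values picked for the example; replace them with your own multicast range and the IP of your first node.

```python
def mcast_address(first_node_ip, mcast_prefix="239.192"):
    """Append the last two bytes of the node IP to a two-byte multicast prefix.

    The default prefix is a hypothetical value, not the one of my clusters.
    """
    octets = first_node_ip.split(".")
    if len(octets) != 4:
        raise ValueError("not an IPv4 address: %s" % first_node_ip)
    return "%s.%s.%s" % (mcast_prefix, octets[2], octets[3])

print(mcast_address("10.0.12.7"))
```

As long as the first nodes of your clusters have different IPs, each cluster gets its own multicast address.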
[both] vim /etc/syslog.conf
Change the common line to :
*.info;mail.none;authpriv.none;local4.none /var/log/messages
And add the following line :
local4.* /var/log/openais.log
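Coming back to the multicast address rule mentioned for openais.conf (multicast prefix plus the last two bytes of the first node's IP), it can be sketched as a small shell function. mcast_from_ip is a hypothetical helper, and the prefix and IP in the usage line are placeholder values, not the ones of my servers.

```shell
# Sketch: build a multicast address from a two-byte prefix and the
# last two bytes of a node IP. mcast_from_ip is a hypothetical helper.
mcast_from_ip() {
    # $1 = first two bytes of the multicast address, $2 = IPv4 address of the first node
    printf '%s.%s\n' "$1" "$(printf '%s\n' "$2" | cut -d. -f3,4)"
}

# Usage with placeholder values:
#   mcast_from_ip 226.94 192.0.2.17    # prints 226.94.2.17
```

As long as every cluster uses its own first node as the source, two clusters on the same network cannot end up with the same mcastaddr.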
The last thing we have to do is configure the log rotation :
[both] mkdir /var/log/openais
[both] vim /etc/logrotate.d/openais
Write the file like this :
/var/log/openais.log {
rotate 28
olddir openais
postrotate
/bin/kill -HUP `cat /var/run/ 2> /dev/null` 2> /dev/null || true
endscript
}
We can now restart the syslog service :
[both] service syslog restart
We also want to regularly delete the PEngine temporary files. Edit the /etc/cron.d/openais file :
# erase PEngine files every saturday
33 23 * * 6 root /usr/bin/find /var/lib/pengine -daystart -type f -ctime +7 -exec /bin/rm -f {} \;

Basic openais configuration has been done so we may start the cluster service :
[srxmtest7] service openais start
You can watch how the nodes join the cluster with the crm_mon command :
[srxmtest7] crm_mon
(use Ctrl-C to quit). After a few seconds, start the cluster on the other node :
[srxmtest8] service openais start
Once the second node has joined the cluster, you should see something like this :
[root@srxmtest7 ~]# crm_mon -1
Last updated: Mon Jan 11 08:58:28 2010
Stack: openais
Current DC: - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
0 Resource configured.

Online: [ ]
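If you want to script this check instead of reading the output, the Online line can be parsed. Here is a minimal sketch, assuming the "Online: [ node1 node2 ]" format printed by this version of crm_mon; online_nodes is a hypothetical helper.

```shell
# Sketch: print the nodes reported online, one per line,
# from "crm_mon -1" output read on standard input.
online_nodes() {
    awk '/^Online:/ {
        for (i = 1; i <= NF; i++)
            if ($i != "Online:" && $i != "[" && $i != "]") print $i
    }'
}

# Usage: wait until both nodes have joined.
#   until [ "$(crm_mon -1 | online_nodes | wc -l)" -eq 2 ]; do sleep 5; done
```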

Cluster seems to work. So the next step is to secure fencing behavior with the quorum disk. This is done with the stonith software.
[both] cat > /etc/sysconfig/sbd <<END
The sbd cluster resource script is quite buggy, so you should replace the /usr/lib64/stonith/plugins/external/sbd file with this one. The change I made was to fix the status function. Then, we must initialize the quorum disk :
[srxmtest7] sbd -d /dev/xvdc create
[srxmtest7] sbd -d /dev/xvdc allocate
[srxmtest7] sbd -d /dev/xvdc allocate
Now we're ready to configure our first openais service. This is done with the crm configure primitive command :
[srxmtest7] crm configure primitive sbdFencing stonith:external/sbd params sbd_device="/dev/xvdc"
And to have this service executed on both nodes :
[srxmtest7] crm configure clone Fencing sbdFencing
Quorum behavior is quite special for a two-node cluster, and we don't want the cluster to stop its services when one node is down :
[srxmtest7] crm configure property no-quorum-policy=ignore
To get stonith fully operational, we must restart the openais service :
[both] service openais restart
We can check that stonith is working with the following commands :
[root@srxmtest7 ~]# sbd -d /dev/xvdc list
0 clear
1 clear
[root@srxmtest7 ~]# pgrep -lf sbd
27527 /usr/sbin/sbd -d /dev/xvdc -D -W watch
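The slot check can also be automated. The sketch below assumes that the message is the last field of each line printed by sbd list, as in the output above; all_slots_clear is a hypothetical helper.

```shell
# Sketch: succeed only if every slot listed on standard input is "clear".
# Assumes the message is the last field of each "sbd -d <device> list" line.
all_slots_clear() {
    awk 'NF > 0 { seen = 1; if ($NF != "clear") bad = 1 }
         END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage:
#   sbd -d /dev/xvdc list | all_slots_clear || echo "a slot holds a pending message"
```

An empty listing is treated as a failure, since it means that no slot has been allocated.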

Now, let's configure the cluster IP service. For all your clients and remote applications, this is the only relevant IP : the one to which all the clustered services should be bound and the one that should never be down.
[srxmtest7] crm configure primitive clusterIP ocf:heartbeat:IPaddr2 params ip= cidr_netmask=24 op monitor interval=10s
The IP must belong to the same network as one of the real IPs of your nodes. Don't be surprised if you can't see the new IP with the ifconfig -a command (the ip addr show command does list it). If you want to check its definition, you may use the crm configure show command :
[root@srxmtest7 ~]# crm configure show
primitive clusterIP ocf:heartbeat:IPaddr2 \
params ip="" cidr_netmask="24" \
op monitor interval="10s" \
meta target-role="Started"
primitive sbdFencing stonith:external/sbd \
params sbd_device="/dev/xvdc"
clone Fencing sbdFencing \
meta target-role="Started"
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
stonith-enabled="true" \
no-quorum-policy="ignore"
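The rule that the cluster IP must share a network with one of the real IPs can be verified before creating the resource. This is a minimal sketch in plain shell arithmetic; ip_to_int and same_subnet are hypothetical helpers and the addresses in the usage line are placeholders.

```shell
# ip_to_int prints an IPv4 address as a 32-bit integer.
# The body runs in a subshell so that the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# same_subnet succeeds when both addresses are in the same network.
same_subnet() {
    # $1, $2 = IPv4 addresses, $3 = prefix length (the cidr_netmask value)
    shift_bits=$(( 32 - $3 ))
    [ $(( $(ip_to_int "$1") >> shift_bits )) -eq $(( $(ip_to_int "$2") >> shift_bits )) ]
}

# Usage with placeholder addresses (real node IP, candidate cluster IP, prefix):
#   same_subnet 192.0.2.17 192.0.2.100 24 && echo "same network"
```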
And to know which node holds the cluster IP, you can use the crm_mon command :
[root@srxmtest7 ~]# crm_mon -1
Online: [ ]

Clone Set: Fencing
Started: [ ]
clusterIP (ocf::heartbeat:IPaddr2): Started
To give more stability to your services, I would recommend configuring resource stickiness :
[srxmtest7] crm configure property default-resource-stickiness=100
And in a basic configuration, you just want your cluster to be event driven so you don't need to check properties based on time :
[srxmtest7] crm_attribute -n cluster-recheck-interval -v 0

The second service we must install is Apache. We use a standard Red Hat installation :
[both] yum install httpd
To enable the openais monitoring of the service, we must create the /var/www/html/index.html file. Just edit the file and write whatever you want (the hostname of the server, for instance). Then, you can register the service in openais :
[srxmtest7] crm configure primitive apache ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=10s
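Instead of editing index.html by hand on each node, the page can be generated. This is only a sketch: write_probe_page is a hypothetical helper, and wrapping the text in minimal HTML is my own choice.

```shell
# Sketch: write a minimal index.html that shows which node answered.
write_probe_page() {
    # $1 = document root, $2 = text to publish (defaults to this host's name)
    printf '<html><body>%s</body></html>\n' "${2:-$(uname -n)}" > "$1/index.html"
}

# Usage (as root, on both nodes):
#   write_probe_page /var/www/html
```

Publishing the hostname is handy later on : fetching the page through the cluster IP tells you at once which node is serving it.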

Now we want to make sure that the web service runs on the same node as the cluster IP service :
[srxmtest7] crm configure colocation website-with-ip INFINITY: apache clusterIP
What is more, we want the Apache service to start after the cluster IP service. A colocation constraint only decides where the resources run, not the order in which they start, so we add an explicit ordering constraint. It also ensures that the services are stopped in the proper order (especially important if you have an LVM or filesystem service instead of a mere IP service!) :
[srxmtest7] crm configure order apache-after-ip mandatory: clusterIP apache
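To check that the two constraints behave as expected, you can compare the nodes on which both resources are started. The sketch below assumes crm_mon -1 prints lines such as "apache (ocf::heartbeat:apache): Started srxmtest7"; resource_node is a hypothetical helper.

```shell
# Sketch: print the node on which a resource is started,
# from "crm_mon -1" output read on standard input.
resource_node() {
    # $1 = resource name
    awk -v rsc="$1" 'NF >= 3 && $1 == rsc && $(NF - 1) == "Started" { print $NF }'
}

# Usage:
#   mon=$(crm_mon -1)
#   ip_node=$(printf '%s\n' "$mon" | resource_node clusterIP)
#   web_node=$(printf '%s\n' "$mon" | resource_node apache)
#   [ -n "$ip_node" ] && [ "$ip_node" = "$web_node" ] && echo "apache follows the cluster IP"
```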
Et voilà! The cluster configuration is finished and works fine!

To better understand what you have just done and to improve your skills in openais/pacemaker, you may refer to the following links :
Another how-to on installing an Apache cluster, explained in more depth, with DRBD and OCFS2 resources.
Pacemaker explained, this PDF will tell you all you need to know to administer your cluster.
If you want to use your cluster with a software not supported by pacemaker, you will need to write a resource script :
This website, although no longer maintained, explains the basics of writing a resource script.
But to really understand what you need to write, the opencfg draft is the reference.
Finally, you may also need to have a look at the mailing list archives : the linux-ha mailing list and the pacemaker mailing list.