Experiments with MogileFS

OK, OK, I’m a server nerd. More to the point, I’m a storage nerd, and I have a big man crush on Isilon. The whole idea just seems right: every time you add disks, you also add CPUs and network interfaces to keep your chakras in balance. And all your nodes form a massive InfiniBand cabal of data and metadata sharing, providing plausible deniability in the event of hardware failure (can you tell I’ve been reading Among the Truthers?).

Anyway, I’m drawn to every project which claims to offer similar properties in an open-source format.  Today’s example: MogileFS.

I successfully installed Mogile on a low-end Ubuntu machine, so I’ve decided to set up a fresh install on a handy, much gruntier CentOS 5.5 machine. I’ll start with a fairly simple installation using the Mogile tracker (mandatory) and their backend storage server (optional, could be any WebDAV server, I think). Just one node at first, as both tracker and storage, with room for expansion, obviously. No other load balancers or proxies, as this is mostly for distributing data during processing and I expect a lot of random access. MySQL database.

I’m trying to install as a normal user as much as possible, rather than as a system-level user. Obviously I’m using root privileges to install new RPMs.

I’m basically tracking the HOWTO from the MogileFS Google Code wiki.

I assume all of the necessary package dependencies are taken care of (like a MySQL server, Perl, and the Perl-MySQL libraries).

Pull the latest code from CPAN:

sudo perl -MCPAN -e 'install "MogileFS::Server";' -e 'install "MogileFS::Utils"'

On this CentOS 5.5 system, the default system CPAN/Perl configuration didn’t seem to be current enough to find the packages. I had to do a massive CPAN package update to get this to work.

Once the Perl is all loaded, create the necessary MySQL entries.

$ mysql -p
Enter password:

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> CREATE DATABASE mogilefs;
mysql> GRANT ALL ON mogilefs.* TO 'mogile'@'%';
mysql> SET PASSWORD FOR 'mogile'@'%' = OLD_PASSWORD( 'sekrit' );
mysql> FLUSH PRIVILEGES;
mysql> quit

And get mogile to set up its databases:

$ mogdbsetup --dbname=mogilefs --dbuser=mogile --dbpassword=foobar

With a clean installation, the first step is to set up one or more trackers. First, create a configuration file (I’m using ~/usr/etc/mogilefsd.conf):

db_dsn = DBI:mysql:mogilefs;port=3306;mysql_connect_timeout=5
db_user = mogile
db_pass = barfoo
conf_port = 7001
listener_jobs = 5
node_timeout = 5
rebalance_ignore_missing = 1

Relative to the HOWTO, I’ve omitted the bit in db_dsn which specifies the database server, as it’s running on the same machine.
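A tracker running on a different machine needs that host part put back. Here is a minimal sketch of how the db_dsn line could be generated either way. It assumes the standard DBD::mysql "host=" DSN attribute, and tracker_dsn is a hypothetical helper name of mine, not part of MogileFS.

```shell
# Hypothetical helper: print a db_dsn line for mogilefsd.conf.
# Usage: tracker_dsn DBNAME [DBHOST]
# With no DBHOST the host part is omitted, as in my single-machine config.
tracker_dsn() {
    db_name=$1
    db_host=${2:-}
    if [ -n "$db_host" ]; then
        # Assumption: DBD::mysql accepts host=... as a DSN attribute.
        printf 'db_dsn = DBI:mysql:%s;host=%s;port=3306;mysql_connect_timeout=5\n' \
            "$db_name" "$db_host"
    else
        printf 'db_dsn = DBI:mysql:%s;port=3306;mysql_connect_timeout=5\n' "$db_name"
    fi
}
```

Calling tracker_dsn mogilefs reproduces the line in my config above. Adding a second argument such as db.example.com (a placeholder hostname) inserts the host attribute.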

Then start the tracker with:

$ mogilefsd -c ~/usr/etc/mogilefsd.conf

Without the "--daemonize" flag, it will run in the foreground for testing.

To simplify later steps, store the IP addresses of my trackers in a conf file:

$ echo "trackers = 127.0.0.1:7001" > ~/.mogilefs.conf

Next configure a storage server (again, on the same machine .. it’s just a test setup).

Create a configuration file (~/usr/etc/mogstored.conf for me):

httplisten=0.0.0.0:7500
mgmtlisten=0.0.0.0:7501
docroot=/home/myhome/usr/var/mogdata

Before starting the daemon, add the storage server to the database, and the devices to the server:

$ mogadm host add wrcws --ip=132.181.86.104
$ mogadm device add wrcws 1

I manually specified the IP address for the host because I didn’t want to inadvertently decide the server was at ‘127.0.0.1’ — which would break access from other machines.

You can check your hosts and devices with:

$ mogadm host list
wrcws [1]: down
IP: 132.181.86.104:7500

$ mogadm device list
wrcws [1]: down
used(G) free(G) total(G)
dev1: alive 0.000 0.000 0.000

Of course, wrcws shows as down because I haven’t started the server yet. Do that with:

$ mogstored -c ~/usr/etc/mogstored.conf

Again no "--daemonize" so we get lovely debugging in the foreground.
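To check from a script whether a host has come up, the host listing can be filtered for entries still marked down. This is a rough sketch that assumes the "name [id]: state" line format shown in the listing above; down_hosts is a hypothetical helper name.

```shell
# Hypothetical helper: read `mogadm host list` output on stdin and print
# the name of every host whose status line ends in "down".
# Assumes status lines look like "wrcws [1]: down"; other lines are ignored.
down_hosts() {
    awk '$2 ~ /^\[[0-9]+\]:$/ && $3 == "down" { print $1 }'
}
```

It would be used as "mogadm host list | down_hosts". An empty result means no host is reported down.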

Hmm. That didn’t fix things immediately.

Ah, forgot CentOS comes with SELinux and the firewall on full attack by default. I opened the firewall for ports 3306 (MySQL), 7500 and 7501 (MogileFS storage), and 7001 (MogileFS tracker).
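For the firewall part, here is a sketch that only prints candidate rules for review rather than applying them. It assumes the stock iptables firewall and that inserting at the top of the INPUT chain is acceptable on your machine. print_fw_rules is a hypothetical helper name.

```shell
# Hypothetical helper: print (do not run) one iptables command per port
# that would accept incoming TCP traffic on that port.
# Assumption: rules inserted at the top of INPUT take effect before any
# distribution-specific reject rules.
print_fw_rules() {
    for port in "$@"; do
        printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
    done
}
```

Running print_fw_rules 3306 7001 7500 7501 lists the four commands. After checking them they can be run as root, and they will not survive a reboot unless the rules are saved.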

Restarting the daemons provided a bit of success.

Aha. Also, I need to remember that the devices (“dev1”) must be created as directories under your “docroot” (or a whole disk drive could be mounted there, for example). I forgot to do that and got slightly cryptic messages:

[Fri Jul 15 00:35:02 2011] [monitor(20561)] Port 7500 not listening on 132.181.86.104 (http://132.181.86.104:7500/dev1/usage)? Error was: 404 Not Found

Actually creating the dev directories:

$ mkdir ~/usr/var/mogdata/dev1

leads to something like:

$ mogadm check
Checking trackers...
132.181.86.104:7001 ... OK

Checking hosts...
[ 1] wrcws ... OK

Checking devices...
host device          size(G)    used(G)    free(G)   use%   ob state  I/O%
---- ------------ ---------- ---------- ---------- ------ ---------- -----
[ 1] dev1            376.754      0.421    376.333  0.11%  writeable   0.0
---- ------------ ---------- ---------- ---------- ------
           total:    376.754      0.421    376.333  0.11%

Neat.
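Since the missing device directory is easy to forget, here is a small sketch that creates one "devN" directory per device id under a docroot. make_dev_dirs is a hypothetical helper name; the "dev" plus id naming follows the dev1 example above.

```shell
# Hypothetical helper: create DOCROOT/dev<ID> for each device id given.
# Usage: make_dev_dirs DOCROOT ID [ID...]
make_dev_dirs() {
    docroot=$1
    shift
    for devid in "$@"; do
        # -p makes this safe to re-run and creates DOCROOT if it is missing.
        mkdir -p "$docroot/dev$devid" || return 1
    done
}
```

For my setup that would be make_dev_dirs ~/usr/var/mogdata 1, run before starting mogstored.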

Now a quick test. Create a domain and class:

$ mogadm domain add testdomain
$ mogadm class add testdomain testclass

And run a quick test, sending the file /etc/services into the cluster, then getting it back out again:

$ mogupload --domain=testdomain --key='/etc/services' --file='/etc/services'
$ mogfetch --domain=testdomain --key='/etc/services' --file="-"
# /etc/services:
# $Id: services,v 1.42 2006/02/23 13:09:23 pknirsch Exp $
#
# Network services, Internet style
#

Great.
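Eyeballing the first few lines works, but the same round trip can be checked byte for byte. This sketch uses only the mogupload and mogfetch options shown above, and assumes mogfetch writes to the path given in --file when it is not "-". mog_roundtrip is a hypothetical helper name.

```shell
# Hypothetical helper: upload a file, fetch it back to a temp file, and
# compare the two. Returns 0 only if the fetched copy is identical.
# Usage: mog_roundtrip DOMAIN FILE [KEY]   (KEY defaults to FILE)
mog_roundtrip() {
    domain=$1
    src=$2
    key=${3:-$src}
    tmp=$(mktemp) || return 1
    if mogupload --domain="$domain" --key="$key" --file="$src" &&
        mogfetch --domain="$domain" --key="$key" --file="$tmp" &&
        cmp -s "$src" "$tmp"; then
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Something like "mog_roundtrip testdomain /etc/services && echo OK" repeats the test above without relying on a visual check.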

As a second step, I wanted to take my original Ubuntu-based server and add it to my new cluster.

On the Ubuntu machine (whose setup otherwise mirrors the one described here), I modified mogilefsd.conf to point to the same MySQL database. I also changed my ~/.mogilefs.conf files to list both trackers:

trackers = wrcws.canterbury.ac.nz:7001,open.grc.canterbury.ac.nz:7001
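With more than one machine to keep in sync, that line can be generated rather than typed. This is a sketch that joins any number of host:port pairs with commas, as in the line above. write_trackers_conf is a hypothetical helper name.

```shell
# Hypothetical helper: write a "trackers = a:7001,b:7001" line to a file.
# Usage: write_trackers_conf FILE HOST:PORT [HOST:PORT...]
# Note: this overwrites FILE, so any other settings in it are lost.
write_trackers_conf() {
    conf_file=$1
    shift
    # "$*" joins the remaining arguments with the first character of IFS.
    trackers=$(IFS=,; printf '%s' "$*")
    printf 'trackers = %s\n' "$trackers" > "$conf_file"
}
```

For example, write_trackers_conf ~/.mogilefs.conf wrcws.canterbury.ac.nz:7001 open.grc.canterbury.ac.nz:7001 produces the line shown above.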

And add the new host and devices to the system:

$ mogadm host add open --ip=132.181.86.22
$ mogadm device add open 2

Apparently device numbers need to be unique across the whole system?

And you end up with:

$ mogadm check
Checking trackers...
wrcws.canterbury.ac.nz:7001 ... OK
open.grc.canterbury.ac.nz:7001 ... OK

Checking hosts...
[ 1] wrcws ... OK
[ 2] open ... OK

Checking devices...
host device          size(G)    used(G)    free(G)   use%   ob state  I/O%
---- ------------ ---------- ---------- ---------- ------ ---------- -----
[ 1] dev1            376.754      0.423    376.331  0.11%  writeable   0.0
[ 2] dev3            213.182    138.408     74.774 64.92%  writeable   0.0
---- ------------ ---------- ---------- ---------- ------
           total:    589.937    138.831    451.106 23.53%

Awesome.