Adding a MogileFS proxy

As noted previously, I’m using MogileFS as the data store for some high(-ish)-performance computing. So far, it does the job without complaints. I haven’t benchmarked it, but I’m not asking that much.

One issue is that it uses its own proprietary access protocol … sort of. A simple Mogile read goes something like:

  1. Computer A decides it wants a file from the Mogile cluster. It contacts one of the trackers using the Mogile protocol. [Computer A needs to know the hostnames of the tracker(s) in advance.]
  2. The tracker has a think, checks the database, and passes the location of the requested file on one of the storage hosts back to the client. This location is a normal URL.
  3. Computer A uses HTTP to get the file from that URL on the storage host.
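
The three steps above can be sketched in plain Ruby. This is only a rough illustration, assuming the tracker speaks a line-based text protocol in which the client sends `get_paths` with URL-encoded arguments and gets back an `OK` line carrying `paths` and `path1`, `path2`, … fields. The function names, and the idea of passing the tracker host and port in by hand, are my own placeholders and not part of any Mogile client library.

```ruby
require 'uri'
require 'socket'
require 'net/http'

# Step 1: build the request line sent to the tracker (assumed wire format).
def get_paths_request(domain, key)
  "get_paths " + URI.encode_www_form('domain' => domain, 'key' => key) + "\r\n"
end

# Step 2: turn the tracker's reply line into a list of storage URLs.
def parse_get_paths_response(line)
  status, payload = line.strip.split(' ', 2)
  raise "tracker error: #{payload}" unless status == 'OK'
  fields = Hash[URI.decode_www_form(payload.to_s)]
  (1..fields['paths'].to_i).map { |i| fields["path#{i}"] }.compact
end

# Steps 1-3 together: ask the tracker, then fetch the file over plain HTTP.
def mogile_read(tracker_host, tracker_port, domain, key)
  reply = TCPSocket.open(tracker_host, tracker_port) do |sock|
    sock.write(get_paths_request(domain, key))
    sock.gets
  end
  url = parse_get_paths_response(reply.to_s).first
  raise "no paths returned for #{key}" if url.nil?
  Net::HTTP.get(URI(url))
end
```

Only the last function touches the network; the two helpers are pure string handling, which is the part that makes the protocol "sort of" proprietary.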

The fly in the ointment, of course, is that the original client needs to speak Mogile to the tracker. I’m using a hacked version of SeattleRB’s Mogile-client Ruby gem in all of my various programs and clients to access Mogile directly.

One of those programs is a web front-end for browsing the data. When it wants to insert a Mogile asset into a web page (an image, typically), it will contact the tracker, get the storage URL, and insert it directly into the page, so you end up with HTML that contains code like:


<img src="http://my.mogile.storaged:7500/dev1/0/000/405/0000405859.fid">

The problem is that all of my Mogile trackers and storage nodes are behind a firewall (as they should be). When I’m on campus, this works fine, as my desktop can see both the web server and the storage nodes.

When I work off campus, I can use an SSH tunnel to get to the web server, but all of the storage node paths break terribly.

I’d like to add a web proxy, preferably running on the same web server as my web app, which can handle the Mogile requests. Instead of returning raw Mogile URLs, the web app can pass paths to the proxy, the proxy can get the data from Mogile, and send it back as HTTP. As my web app is mostly for browsing, the proxy can even be read-only.

How I did it

For a first attempt, I was more interested in getting a solution running than in performance. As my web app is in Rails 3, I knew I could basically delegate a path to a separate, self-contained Rack server. The Resque control panel does this, for example.

As a sketch, then, I want any path of the form:


http://webserver.com/mogproxy/some/key/something

to return whatever Mogile has stored under the key /some/key/something.

The first step was to add a route to the application in routes.rb:
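
The original snippet is not reproduced here, so what follows is a minimal sketch of what such a route can look like in Rails 3, using the router's `mount` method to hand a path prefix to a Rack app. `MyApp` and `MogProxy::App` are hypothetical names standing in for the real application and proxy classes.

```ruby
# config/routes.rb (fragment; MyApp and MogProxy::App are placeholder names)
MyApp::Application.routes.draw do
  # Hand every request under /mogproxy to the self-contained Rack app.
  mount MogProxy::App => '/mogproxy'
end
```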

I put my application code at lib/mogproxy/, which resulted in a bit of a search on autoloading files and require paths. The solution I arrived at looked something like this in application.rb:
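
Again as a hedged sketch rather than the original code: one arrangement that matches the description is to add lib/ to the autoload path inside the application class and then require the proxy explicitly at the bottom of the file. The module name `MyApp` and the file name `app.rb` are assumptions.

```ruby
# config/application.rb (fragment; MyApp and lib/mogproxy/app.rb are assumed names)
module MyApp
  class Application < Rails::Application
    # Let Rails look for code under lib/, where lib/mogproxy/ lives.
    config.autoload_paths += %W(#{config.root}/lib)
  end
end

# Load the proxy explicitly, after the autoload path has been set.
require File.expand_path('../../lib/mogproxy/app', __FILE__)
```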

It may be possible to do away with the require through clever use of the autoload path, but I didn’t feel the automatic-ness was really worth it. Note the require has to go at the end after the autoload_path is set.

The application below is written in Sinatra because I’ve used it before and it’s handy. It’s pretty simple — take whatever path comes in and look it up in Mogile. Return the result.

[Yes, I know, my regexp-fu is weak…]
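
To illustrate the idea without needing any gems, here is a minimal sketch written as a bare Rack-style callable rather than a Sinatra app. It assumes the app is mounted under a prefix, so that Rack puts the remainder of the path (the Mogile key) in `PATH_INFO`. The class name `MogProxySketch` and the `fetcher` object are hypothetical: `fetcher` stands in for the Mogile client and is expected to return the file contents as a String, or nil for an unknown key.

```ruby
# Read-only proxy sketch: look up whatever path comes in and return the data.
class MogProxySketch
  PLAIN = { 'Content-Type' => 'text/plain' }.freeze

  # fetcher: any object responding to call(key) -> String or nil (placeholder
  # for a real Mogile client lookup).
  def initialize(fetcher)
    @fetcher = fetcher
  end

  def call(env)
    # The proxy is for browsing only, so refuse anything but GET.
    return [405, PLAIN.dup, ["read-only proxy\n"]] unless env['REQUEST_METHOD'] == 'GET'

    # When mounted at /mogproxy, PATH_INFO is the rest of the path, e.g.
    # "/some/key/something", which is used directly as the Mogile key.
    key = env['PATH_INFO'].to_s
    data = (key.empty? || key == '/') ? nil : @fetcher.call(key)
    return [404, PLAIN.dup, ["not found\n"]] if data.nil?

    headers = {
      'Content-Type'   => 'application/octet-stream',
      'Content-Length' => data.bytesize.to_s
    }
    [200, headers, [data]]
  end
end
```

Like the approach described here, this reads the whole asset into memory before sending it on, so it shares the same inefficiency.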

As a side note, the MogileModule::Mogilable and mog bits are code I wrote to hold a single Mogile client instance in a class (*cough* global) variable, so it can be configured once and used throughout the application.
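
The general shape of such a mixin might look like the sketch below. Only the names `MogileModule::Mogilable` and `mog` come from the description above; the internals, the `client` accessor, and the `AssetBrowser` example class are guesses for illustration.

```ruby
module MogileModule
  class << self
    # One client for the whole process, set once at startup (assumed design).
    attr_accessor :client
  end

  module Mogilable
    # Any class that includes Mogilable gets a `mog` method returning the
    # shared client.
    def mog
      MogileModule.client || raise('Mogile client has not been configured')
    end
  end
end

# Hypothetical consumer of the mixin.
class AssetBrowser
  include MogileModule::Mogilable
end
```

Configuration is then a single assignment to `MogileModule.client` somewhere in the application's startup code, with whatever client object the gem provides.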

It’s hardly efficient because it loads the asset from Mogile then spits it back out at the client. And the error checking is basically absent. But it works.

On the other side, I then rewrote my client code to return “/mogproxy/….” paths instead of looking up the Mogile storage URLs itself.
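
As a rough sketch of that change, a helper along these lines builds the proxy path from a key instead of asking the tracker. The helper names and the `PROXY_PREFIX` constant are hypothetical, and the sketch assumes keys contain only URL-safe characters.

```ruby
# Prefix under which the proxy is mounted (assumed to match the route).
PROXY_PREFIX = '/mogproxy'

# Turn a Mogile key into a path served by the proxy.
def mog_proxy_path(key)
  key = "/#{key}" unless key.start_with?('/')
  PROXY_PREFIX + key
end

# Build the image tag the web front-end inserts into a page.
def mog_image_tag(key)
  %(<img src="#{mog_proxy_path(key)}">)
end
```

The resulting HTML refers only to the web server, so it keeps working through an SSH tunnel.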
