Calling Matlab from Python in a Docker container
I’m supporting a project which will install a new instrument on the OOI Cabled Array in the summer of 2018. Unlike many of our other projects, this is a “mature” instrument which was previously installed on the Ocean Networks Canada “Neptune” array for almost five years.
While the instrument was on the Canadian cable, a set of post-processed data products was defined, and over time responsibility for making those products moved from the individual scientists to the Canadian Data Management and Archiving (DMAS).
At least initially, we will be taking that responsibility back, doing the post-processing ourselves. Given my experiences with CamHD, I volunteered to manage that data processing.
The one minor hitch is that the science team on this project works in Matlab. I don’t have a major issue with Matlab, and there’s little point in trying to force the scientists to change tools. On the other hand, I’d like to have the flexibility to scale the processing out across multiple machines, onto virtual instances, etc., and I do see Matlab as an impediment to that.
Luckily, Mathworks does have a fairly robust set of tools for “compiling” Matlab code into a C/C++ library, Python module, etc. Never mind that it still requires a Matlab Runtime.
With that in mind, my initial goal was to have some sort of portable (Docker) unit which will run the project M-code.
My solution is split into three parts:
- a generic Docker image with the Matlab Runtime Engine (on Docker Cloud and Github);
- the scientists' Matlab code (which isn’t public);
- and an application-specific Docker image containing both the Matlab Runtime and the Matlab-generated Python module (Github).
Making the covis-postprocess image is a two-step process. First, mcc, the Matlab compiler, must be used to convert the M-files to a Python module. Then this module and some wrapper stuff are built into a Docker image using a Dockerfile. The first step must be run on a machine with Matlab installed, of course.
I made a rule for Matlab compilation in the Makefile:
build_matlab: ${COVIS_REPO}/covis_*.m
	${MCC} -v -d pycovis-postprocess/ -W python:pycovis.postprocess -T link:lib $^
Pretty straightforward. -d sets the output directory, and the resulting Python module will be called pycovis.postprocess. All of the specified .m files are compiled in.
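For anyone who would rather drive the compile step from Python than from make, the same invocation can be sketched as below. This is only an illustration of the flags described above: the helper names `build_mcc_command` and `compile_matlab` are made up for this example, and it assumes `mcc` is on the PATH of a machine with Matlab installed.

```python
import glob
import os
import subprocess


def build_mcc_command(covis_repo, mcc="mcc", out_dir="pycovis-postprocess/"):
    """Return an argv list mirroring the Makefile recipe (hypothetical helper)."""
    # Same pattern as the Makefile prerequisite ${COVIS_REPO}/covis_*.m
    sources = sorted(glob.glob(os.path.join(covis_repo, "covis_*.m")))
    if not sources:
        raise FileNotFoundError("no covis_*.m files found in %s" % covis_repo)
    return [
        mcc,
        "-v",                                # verbose compiler output
        "-d", out_dir,                       # output directory
        "-W", "python:pycovis.postprocess",  # wrap as a Python package with this name
        "-T", "link:lib",                    # build a library, not a standalone executable
    ] + sources


def compile_matlab(covis_repo, **kwargs):
    """Run mcc and raise if it fails; needs Matlab and its compiler installed."""
    subprocess.run(build_mcc_command(covis_repo, **kwargs), check=True)
```

Keeping the command construction separate from the call to `subprocess.run` means the argument list can be inspected or logged on a machine without Matlab.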
The Dockerfile is similarly straightforward (I’ve cut out some of the application-specific details):
FROM amarburg/matlab-runtime:latest
# Starting from my Matlab runtime image, install some necessary packages.
RUN apt-get install -y python3 python3-pip libgl1-mesa-dev libxt-dev
RUN pip3 install pytest
# Copy the compiled Matlab and the mw_python script into the image
COPY pycovis-postprocess /root/pycovis/pycovis-postprocess
COPY mw_python /root/pycovis/
# Actually install the Python module
WORKDIR /root/pycovis/pycovis-postprocess
RUN python3 setup.py install
# Set Python as the entrypoint
ENTRYPOINT ["./mw_python"]
The mw_python script is a thin wrapper which sets a few environment variables for Matlab.
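The actual mw_python script isn't reproduced here, but a wrapper of this kind could look roughly like the sketch below, written in Python for illustration. The details are assumptions: the runtime location `/usr/local/MATLAB/MATLAB_Runtime/v93` is a guess at where the base image installs it, the function names are invented for this example, and the directories put on `LD_LIBRARY_PATH` follow the usual Linux setup for the Matlab Runtime.

```python
import os
import sys

# Assumption: where the R2017b (v93) Matlab Runtime is installed in the image.
DEFAULT_MCR_ROOT = "/usr/local/MATLAB/MATLAB_Runtime/v93"


def matlab_env(mcr_root=DEFAULT_MCR_ROOT, base_env=None):
    """Return a copy of the environment with the runtime libraries on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    lib_dirs = [
        os.path.join(mcr_root, "runtime", "glnxa64"),
        os.path.join(mcr_root, "bin", "glnxa64"),
        os.path.join(mcr_root, "sys", "os", "glnxa64"),
    ]
    # Keep whatever was already on the library path, after the runtime entries.
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        lib_dirs.append(existing)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(lib_dirs)
    return env


def main(argv):
    """Replace the current process with python3, forwarding any arguments."""
    os.execvpe("python3", ["python3"] + list(argv), matlab_env())
```

A wrapper script built on this would finish by calling `main(sys.argv[1:])`, so that arguments and stdin reach the Python interpreter unchanged.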
Just running the resulting image will drop you to the Python command line inside the image.
Scripts can also be passed into the image using stdin:
cat version.py | docker run --rm -i amarburg/covis-postprocess
In this case, version.py is a simple test script which demonstrates instantiating the Matlab module and calling a couple of simple functions:
import pycovis.postprocess as postprocess
postprocess.initialize_runtime(['-nojvm','-nodisplay'])
pp=postprocess.initialize()
print("Matlab version: %s" % pp.version() )
print("COVIS postprocessing version: %s" % pp.covis_version() )
pp.terminate()
version is a built-in Matlab function, and covis_version is a user-supplied function. This gives:
% cat scripts/version.py | docker run -i --rm amarburg/covis-postprocess
Creating MATLAB Runtime Cache at location: /tmp/.mcrCache9.3
.max_size not found. Using default size of 33554432 bytes.
MATLAB Runtime cache extracting component: postprocess_F10ACC5AF04A449C99DA302AF6B5B4FA
Acquiring MATLAB Runtime cache root-level directory lock... acquire succeeded.
Reading cache index file...
File open failed for /tmp/.mcrCache9.3/.mcr_cache_index
MATLAB Runtime cache: extractDir is /tmp/.mcrCache9.3/postpr0
Adding component postprocess_F10ACC5AF04A449C99DA302AF6B5B4FA to the cache.
MATLAB Runtime Cache: performing maintenance...
Processing cached components...
Done with cache maintenance.
Creating component directory: /tmp/.mcrCache9.3/postpr0
Acquiring component directory WRITE lock... acquire succeeded.
Extracting component... Component extracted to cache. Writing creation timestamp...
Timestamp successfully created.
done.
Downgrading WRITE lock to READ lock... downgrade successful.
Component postprocess_F10ACC5AF04A449C99DA302AF6B5B4FA has successfully been accessed from the cache.
MATLAB Runtime Cache: performing maintenance...
Processing cached components...
Done with cache maintenance.
Checking whether index file /tmp/.mcrCache9.3/.mcr_cache_index needs to be written...
Write is needed.
Writing cache index file: /tmp/.mcrCache9.3/.mcr_cache_index
Writing cache index entry:
postprocess_F10ACC5AF04A449C99DA302AF6B5B4FA
postpr0
2588153
2017-Oct-27 22:53:58.561173
Matlab version: 9.3.0.713579 (R2017b)
COVIS postprocessing version: COVIS Post-processing v1.0
%
Obviously there’s a lot of debugging output, but as shown, we’re getting output from Matlab. Beer time!
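One possible refinement for longer scripts: the initialize / terminate pair in version.py can be wrapped in a context manager, so the Matlab handle is released even if a call raises. The sketch below is an assumption about how that might be done, not part of the project; `matlab_session` is a hypothetical name, and it relies only on the `initialize_runtime`, `initialize` and `terminate` calls already used in version.py.

```python
from contextlib import contextmanager


@contextmanager
def matlab_session(module, runtime_options=("-nojvm", "-nodisplay")):
    """Initialize a compiled Matlab package and always terminate its handle."""
    module.initialize_runtime(list(runtime_options))
    handle = module.initialize()
    try:
        yield handle
    finally:
        # Runs on normal exit and when the body raises.
        handle.terminate()


# Intended usage inside the container (not run here):
#
#   import pycovis.postprocess as postprocess
#   with matlab_session(postprocess) as pp:
#       print("Matlab version: %s" % pp.version())
```

Passing the module in as an argument keeps the helper independent of any one compiled package.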