RAM Parabolic Equation Code, Matlab Style
These are notes on using Collins' RAM PE Code
for long range acoustic calculations. The specific package here is
implemented in matlab as developed by Matt Dzieciuch. See also a comparison between RAM and Kraken.
Codes for Acoustic Propagation
- RAM in Fortran 95. The suite of matlab scripts described on this page have been converted to Fortran 95 and parallelized using MPI.
- RAM in Matlab. This page.
- Eigenray A package for calculating acoustic rays and travel times by ray tracing.
- Modes in Matlab. A simple matlab script for calculating acoustic modes by solving an eigenvalue/eigenfunction problem.
Table of Contents
Downloads
- aogpe.zip: Tar ball of code and mex files. Includes example run "peramx.m". (Updated 12/13/06)
- sinc.m: The code requires the sinc.m file from the signal processing toolbox.
- ram.pdf: Mike Collins' Documentation of the RAM Code (0.3 MB PDF file)
- eflat.m: Flat earth transformation
- eflatinv.m: Inverse flat-earth transformation
- ridder.m: Used by inverse flat earth transformation
User Notes
Because the code is matlab, one can easily change it for whatever
purpose one may like, and it is meant to easily integrate into a
standard matlab working environment.
There is an example of how to run the code in a script called peramx.m.
Using the mex files "solvetri" and "matrc" for your particular machine
will make ram just as fast (or perhaps a little faster) than the
original fortran version.
J.C.: One good test I have found in a pe code is whether it propagates a
mode without introducing numerical coupling; I'll try that on your RAM
for various grids. M.D.: I have done the test of propagating a high
angle mode with ram and it seems to work. As you say, this is a good way
to check if the grid size is adequate.
Comments on Accuracy
Subject: Re: RAM Matlab Code
Brian, It was not my intent to alarm everyone, but the details of range
dependent modeling in acoustics is still a research issue. Even Mike
Collins would agree. My only suggestion is to make comparisons among the
various versions of RAM, i.e. different Pade coefficients, and make
comparisons to CSNAP and other range coupled normal mode codes.
b/r, Arthur [Baggeroer]
Subject: RE: RAM Matlab Code
Art and Brian - Yes, I would concur that this business of modeling range
dependent environments can get tricky, especially when the range
dependence is small scale like internal waves. Often I have found that
one's confidence in a model increases when multiple models show the same
results, like coupled mode and PE showing agreement. When Andrey showed
agreement between our PE Monte Carlo calculations and the coupled mode
theory to within 3 dB at 2000 km range I was tremendously impressed.
Best,
John [Colosi]
Parameters
Accurate calculations depend on a number of parameters, and as always
there is a trade off between speed and accuracy. It is also true that
phase errors accumulate, so if a grid is good at 100 km, it may not be
good at 1000 km.
- np: The number of pade coefficients. The
calculation is not very sensitive at all to to this number, and np=4
always seems to be a good choice.
- deltaz: The depth increment for the PE grid.
Picking the correct grid size for ultimate speed is important. While a
coarser grid is faster, it will not be as accurate. A deltaz=1.0 is
O.K., while a deltaz=0.2 will produce more accurate results but take an
order of magnitude longer to complete. This value is also dependent on
frequency, with higher frequencies demanding smaller values. For fc=250
Hz, deltaz=0.2 is probably closer to what is needed. A calculation at
ca. 1500 km range and 75 Hz required a deltaz of 0.5 for an accurate
result; deltaz=1.0 resulted in 10-20 ms increased dispersal of the
arrival pattern.
- deltar: The range increment for the PE grid.
The code is less sensitive to the range step, but this value should be
set small enough so that the PE converges. Also, N*deltar should equal
the sound-speed horizontal sampling distance, where N is an integer. So
if you've used a sound speed grid of 1000m, deltar=250 should work, but
if you use a sound speed grid of 400m, then set deltar=200m. The value
of deltar is inversely proportional to computation time.
- T: The time window. This value sets the
width of the time window for the calcuations. The value of this window
sets the number of frequencies that are required to do the calculation,
hence it is directly related to the computation costs. The value should
be tuned to the expected dispersal of the arrival pattern. Too large a
value just means that more frequencies than needed have to be
calculated.
- dzm: The depth interval used for output
depth decimation. This number is not related to the calculation, per se,
but is related to how coarsely the results are subsampled for plotting.
Too small a value results in a grainy looking figure.
- fc: The center frequency desired. 75 Hz for ATOC-like transmissions, 250-Hz for HLF-5-like transmissions.
- fs: Sample frequency, generally 4X the center frequency.
- N: The number of frequencies for the calculation=fs*T. (not a user specified parameter, except through T.)
- df: The frequency increment=fs/N. (not a user specified parameter, except through T.)
- tdelay: The time delay, which sets the start time
for the window of width T. This value should be roughly the range
divided by a nominal sound speed, c0. c0=1485 or 1490 m/s. The value of
tdelay is used to place the prediction in the correct (unaliased) time
window.
- attn: Attenuation. The default attenuation is 0.5
db/wavelength 100m below the bottom, and 5db/wavelength 300m below the
bottom. Bottom reflections are/were discouraged in this code.
Attenuation is a two row matrix, the top row being attenuation 100 m
below the bottom (0.5), the second row row being below that (5). The
exact parameterizations vary wildly in the actual ocean, and are poorly
modeled, in any case. The default choices for sediment density and speed
are unrealistic, but since the attenuation is so high one wouldn't
notice. A somewhat more realistic WAG (Wild Ass Guess) as opposed to the
SWAG (Stupid Wild Ass Guess) that I used might be rho=1.1 kg/l, cs=1700
m/s.
Mex file compilation
The package includes two C routines, matrc.c and solvetri.c, that
should be compiled to obtain mex-files for these functions. These
routines will allow the RAM code to run more than an order of magnitude
faster. There are two ways to compile these routines:
- Place the file "mexopts.sh" in the local directory and edit it for
your own situation. The file has sections for x86, xa64, and solaris
architectures, and matlab will automatically select the correct
architecture for the computer you are on. Set CC, CXX, or FC to be your
C-compiler, C++-compiler, or FORTRAN compiler, respectively. You can, in
addition, set whatever optimization flags there (e.g., "COPTIMFLAGS")
you'd like to try. With that file in the local directory, "mex matrc.c"
will compile the mex file, using the options in this local file
(otherwise it uses the default version in matlab's installation
directory). Use the "-v" option, if you want to see what is going on.
- Matt D. has put together a makefile that is also suitable to
compile the mex files. Edit this file to set your compiler and
optimization flags, and then execute "make all", which should compile
the mex files and link in the appropriate libraries. This makefile is
set up for the x86 architecture, and it will require significant
modification to work on the xa64 architecture.
Portland group compiler options include "-fastsse -tp p7", while gnu
compiler options include "-O3 -mfpmath=sse -msse -march=athlon-xp", or
"-O3 -mfpmath=sse -msse2 -march=k8" for AMD64 cpus. In linux, "cat
/proc/cpuinfo" will tell you what SSE support your cpu has. "Your
mileage may vary."
Range Dependent Sound Speed
Range dependent sound speed is implemented by defining variables:
- rp - ranges of profiles in meters
- nrp - number of profiles
- cw - matrix of sound speeds Each row is a sound speed profile.
- zw - depth of sound speeds in meters (positive down)
In general, I think it is true that the zw have to be at a fine
increment of O(1m), and the same zw should be used for all profiles.
Adding range dependence will make the code run considerably slower. The
following segment of matlab code will produce values appropriate for ram
from the sound speeds derived from the matlab mapping package. The
mapping package extracts sound speed from the WOA on 33 standard depths.
load temp.ssp I=(temp(:,1)==-1); rp=temp(I,2)*1000; nrp=length(rp);
zq=temp(2:34,1); cw=temp(:,2); cw=reshape(cw,34,nrp); cw(1,:)=[]; zw=0:1:5000;
zw=zw(:); cw=interp1(zq,cw,zw,'spline');
cw is a matrix, while rp and zw are a vectors. While it is not strictly
necessary to interpolate sound speed to the zw, you should interpolate
on your own. RAM will just linearly interpolate sound speed to the grid
size specified to deltaz. A coarsely sampled sound speed could therefore
lead to biases, false caustics, war with Iran, etc.
The source depth, sound speeds and depths require the flat-earth transformation before sending to RAM (see also downloads).
% Flat earth transform: Re=6378137; invRe=1/Re; eps=zw*invRe;
zw=zw.*(1+(1/2)*eps+(1/3)*eps.*eps); eps=eps*ones(1,nrp); cw=cw.*(1+eps+eps.*eps);
cw=cw'; % ram expects sound speed profiles in rows. eps=zsrc*invRe;
zsrc=zsrc*(1+(1/2)*eps+(1/3)*eps*eps);
To include the effects of internal waves in this calculation, sound
speed has to be defined on fine enough depth and range increments to
resolve the internal waves of interest. This makes cw a rather large
matrix.
Range Dependent Bathymetry
Implementing range-dependent bathymetry is easy. The code just needs
vectors "rb" and "zb", which are the bathymetry ranges and bathymetry
values (both in meters), respectively, along the path of interest. The
bathymetry extracted using the mapping package can be loaded in by the
following simple matlab code: load temp.bth rb=temp(:,1); zb=temp(:,2);
Adding bathymetry, other than flat bottom, to the calculation will slow
it significantly - by a factor of 5 or more. matrc needs to run every
time there is an environmental change (sound speed or bathymetry), so
the rough bottom case will take longer than the flat bottom case.
Parallelization Notes
The computations involve completely separate calculations at the
different frequencies of the broadband signal, hence the parallelization
problem in this case is trivial. One need merely farm out various sets
of frequencies to whatever computers one has available, let them run
their calculations, and then collect the results at the end. The code
has an implementation of parallelization using matlab's tcpip toolbox
already implemented, but there are other ways to do this.
The parallelization just has to fill the columns of the matrix psif.
So, one way to parallelize the calculation would be to have each
processor fill different columns, and then collect the results after
they are all done.
FAQ
- Can this RAM do 3-D? RAM cannot do azimuthal coupling although the scaling is set up for a 3-d non-coupled problem.
- The final figure is rather grainy. How would I boost the resolution of the arrival depth-travel time figure? You probably have the decimation factor, dzm, set higher than you'd like. The output grid size is dzm*deltaz.
- How come the travel times predicted by RAM are way off
compared to a similar ray prediction, yet the arrival patterns are the
same? The RAM code makes a prediction for a time window "T", and
that time window is aliased infinitely many times. Therefore, you need
to apply a reasonably accurate time delay to put the time window "T" in
the physically correct time. This involves giving the code a reasonably
accurate value for "tdelay", which roughly the range divided by a
nominal sound speed. For the Pacific, a nominal sound speed is ca. 1485
m/s, while a larger value is more appropriate for the Atlantic. The
travel times predicted by RAM are correct, if you set the time window
such that the physically correct arrival pattern (rather than an alias)
appears in it. It is helpful to compare the RAM result to a ray
prediction to be sure a good value for the time delay is being used.
- How come the travel times predicted by RAM are way off compared to a similar ray prediction? The
RAM code uses the unmodified sound speed that is input to calculate the
propagation, whereas most ray codes (e.g., eigenray) apply a flat-earth
transformation before calculating the propagation. The flat-earth
transformation accounts for the spherical earth; sound speeds and depth
are stretched slightly to be equivalent to propagation on a sphere. So
try applying the flat-earth transformation to your sound speeds before
doing the RAM calculation (see eflat.m download above), or turning off
the flat-earth transformation of your ray code, if you can.
Benchmarks
Matt's unadulterated test case (peramx.m) is used for these benchmarks.
This test case is 100-km propagation of a broadband signal in a range
independent ocean given by the canonical sound speed profile. Time
uncertainties are ±1-2 s.
Processor |
<Details |
Time |
Notes |
Intel Q6600 Quad Core |
o'clocked to 2.8 GHz 2X2MB L2 cache |
13 s. |
Intel icc compiler; "-fast"; single proc. |
Opteron Dual Core 265 |
1.8 GHz, 64 bit 1 MB L2 cache |
30 s. |
"-O3 -msse3 -march=opteron" |
Opteron Dual Core 265 |
1.8 GHz, 64 bit 1 MB L2 cache |
Two procs 31 s. |
"-O3 -msse3 -march=opteron" |
Opteron Dual Dual-Core 265 |
1.8 GHz, 64 bit 1 MB L2 cache |
Four procs 36 s. |
"-O3 -msse3 -march=opteron" |
Athlon 3200+ 64 |
2.0 GHz, 64 bit 1 MB L2 cache, laptop |
33 s. |
"-O3 -mfpmath=sse -msse2 -march=k8" |
Athlon 3200 XP |
2.2 GHz, 32 bit 512 KB L2 cache |
47 s. |
"-O3 -mfpmath=sse -msse -march=athlon-xp" |
Athlon 2500 XP |
2.0 GHz (OC'd), 32 bit 512 KB L2 cache |
60 s. |
"-O3 -mfpmath=sse -msse -march=athlon-xp" |
Xeon |
2.2 GHz, 32 bit 512 KB L2 cache |
61 s. |
pgcc: "-fastsse" |
Athlon 2000 XP |
1.6 GHz, 32 bit 256 KB L2 cache |
81 s. |
"-O3" (old gcc; minimal options) |
Solvetri.c is compiled to the mex file with the options as indicated.
The mex file for the matrc routine improves computation time by a factor
of 2 (more for longer range computations) or so.
Benchmarks in practice... Calculating PE predictions for an acoustic
path in the Philippine Sea for 36 ECCO2 ocean model realizations. Time
listed is for a single frequency. All values "under load", that is, all
cpu's running. Using the 80 cpus of aogreef, the 36 snapshots at 628 km
range took about 10 hours total (72sX13 frequencies for each cpuX36
realizations=9.4 hrs.) All calculations used the mex files compiled with
the GNU gcc. It appears that the Intel icc compiler might gain an
additional 7% or so. compiler with "-fast" option.
Processor |
Details |
Time |
Notes |
Phenom 9600 |
2.3 GHz, 64 bit 4X512 KB L2 cache |
Four procs 50 s. |
bluefin |
Intel Q6600 Quad Core |
o'clocked to 2.8 GHz 2X2MB L2 cache |
Four procs 57 s. |
skipjack |
2×Opteron Dual Core 265 |
1.8 GHz, 64 bit 2X1 MB L2 cache |
Four procs 72 s. |
sardines |
Opteron Dual Core 170 |
2.0 GHz, 64 bit 2X1 MB L2 cache |
Two procs 73 s. |
anchovies |
2×Opteron Dual Core 2210 |
1.8 GHz, 64 bit 2X1 MB L2 cache |
Four procs 75 s. |
aogreef |
References
- M. D. Collins, A split-step Padé solution for the parabolic equation method, J. Acoust. Soc. Am., Vol. 93, pp. 1736-1742, 1993.
- M. D. Collins, An energy-conserving parabolic equation for elastic media, J. Acoust. Soc. Am., Vol. 94, pp. 975-982, 1993.
Figures
Comparison between Eigenray prediction and RAM prediction for 100 km range (NOT the test case!) |
|
Comparison between Eigenray prediction and RAM prediction for 1401.745
km range. This comparison shows the effects of bathymetry in the form of
a sharp ridge midway along the acoustic path. The comparison
illustrates two problems that can be fixed by adjustment of the
calculation parameters. (1) Note that the ray and PE predictions shown
here have slightly different dispersal, which is particularly evident in
the early part of the timefront. This is the result of using too large a
value for "deltaz". When deltaz=0.5, rather than 2.0 (as was used
here), this discrepancy is resolved. (2) The odd arrival in the lower
right hand corner of the PE prediction is a result of using too small a
value for T - the window is not wide enough so the earliest arrival
appears to overlap the finalé. A larger value for T will give a window
wide enough to encompass the complete arrival pattern with no wrapping
around. |
|
Topic revision: r29 - 2009-02-09 - BrianDushaw
Copyright
© 2008-2015 by the contributing authors. All material on this
collaboration platform is the property of the contributing authors.
|