The goals of this homework are to:
I wrote an article recently urging applied mathematicians to share research code, which appeared in SIAM News. Some colleagues at other universities have told me it’s required reading for their students, so of course I have to assign it too! It’s pretty light reading.
The IPython notebook $UWHPSC/codes/homework5/notebook/quadrature2.ipynb is an improved version of the notebook from the last homework. Some of the functions have been made more general and a discussion has been added of Simpson’s Rule, a more accurate formula than Trapezoid.
This notebook is best viewed live so that you can experiment with changing things in order to explore these examples. If you have a sufficiently recent version of the notebok installed (see IPython_notebook) then you should be able to do:
$ cd $UWHPSC/codes/homework5/notebook
$ ipython notebook --pylab inline
and then click on the quadrature2 notebook.
You can also view it at https://www.wakari.io/sharing/bundle/rjleveque/quadrature2 . If you have created a Wakari account, you should be able to download it to your account to try it out.
You can also view a static version of the notebook, which is in $UWHPSC/codes/homework5/notebook/quadrature2.pdf.
There is also a Python script version of the code in the notebook at $UWHPSC/codes/homework5/notebook/quadrature.py if you find that easier to experiment with. (But see the comments at the top of that file.)
Experiment with the notebook or the module to make sure you understand the material presented.
Study this code to make sure you understand it.
The directory $UWHPSC/codes/homework5/ also contains Fortran code that implements the last part of homework 4, with some added enhancements. In particular:
Study this code and experiment with it.
With 4 threads it might produce something like the following (timings will depend on how many cores you have):
Using 4 threads
true integral: 6.00136745954910E+00
n approximation error ratio
50 6.00200615142458E+00 6.387E-04 0.000E+00
100 6.01762134207395E+00 1.625E-02 3.929E-02
200 5.99787907396672E+00 3.488E-03 4.659E+00
400 5.99537682567465E+00 5.991E-03 5.823E-01
800 6.00057196798962E+00 7.955E-04 7.531E+00
1600 6.00118591794817E+00 1.815E-04 4.382E+00
3200 6.00132301603504E+00 4.444E-05 4.085E+00
6400 6.00135640717690E+00 1.105E-05 4.021E+00
12800 6.00136470029559E+00 2.759E-06 4.006E+00
25600 6.00136677000209E+00 6.895E-07 4.002E+00
51200 6.00136728718235E+00 1.724E-07 4.000E+00
102400 6.00136741645906E+00 4.309E-08 4.000E+00
Elapsed time = 0.00554200 seconds
CPU time = 0.01890300 seconds
fevals by thread 0: 51211
fevals by thread 1: 51187
fevals by thread 2: 51187
fevals by thread 3: 51165
Total number of fevals: 204750
You do not need to code anything for this part.
Create a new subdirectory $MYHPSC/homework5 for the code you write below. You can use the code provided as a starting point.
Create a new module quadrature2.f90 by starting with quadrature.f90 and adding a new function simpson that implements Simpson’s rule. It should have the same input arguments as trapezoid.
Write a new main program test2.f90 to test this. Check that it is 4th order accurate on the function f2 provided with various values of k. Check it also with some other functions if you want.
Note: this module should also be called quadrature2, not just the file name, i.e. it should start with the line:
module quadrature2
and test2.f90 should:
use quadrature2, only: ...
Your simpson routine should include an omp parallel do loop similar to trapezoid. Make sure it gives the same results in the error table for both with and without the -fopenmp during compilation, and for different choices of the number of threads.
Remember that you can run with more threads than your computer has cores and it should still work, but will probably make it run slower rather than faster.
Create a new version of the quadrature module named quadrature3 that has no parallel loops in trapezoid and instead has a parallel do loop in the error_table routine when it loops over the different values of n to test from the nvals array.
In this loop make last_error a firstprivate variable and think about what other variables need to be private. More about this below.
Test this version with a new test program test3.f90 that calls error_table with method = trapezoid.
Note the following:
Make the changes for the next two parts also in your quadrature3.f90 version.
Because of the load-balancing issue just mentioned, it is useful to include another clause in the omp parallel do loop directive in error table:
!$omp parallel do ... & ! whatever you needed before
!$omp schedule(dynamic)
do j=1,size(nvals)
This instructs the compiler to split up the values of j from 1 to size(nvals) dynamically rather than deciding in advance that the first half of the values will go to Thread 0 and the second half to Thread 1, for example. Instead the two threads would start working on j=1 and j=2 and whichever finishes first would start on j=3. This should give a somewhat better balance between threads.
Note that it can’t do a perfect job for this example since computing the error for the last value of j (the largest value of n) takes more function evaluations that all the others put together!
In order to improve load balancing, reorder the parallel loop so that n is decreasing rather than increasing via:
do j=size(nvals),1,-1
Think about why this is better.
In this case you might get results like this:
Using 4 threads
true integral: 6.00136745954910E+00
n approximation error ratio
12800 6.00136470029559E+00 2.759E-06 0.000E+00
6400 6.00135640717688E+00 1.105E-05 2.497E-01
25600 6.00136677000212E+00 6.895E-07 0.000E+00
1600 6.00118591794817E+00 1.815E-04 3.798E-03
3200 6.00132301603504E+00 4.444E-05 2.487E-01
800 6.00057196798962E+00 7.955E-04 2.282E-01
400 5.99537682567465E+00 5.991E-03 7.419E-03
200 5.99787907396672E+00 3.488E-03 2.280E-01
100 6.01762134207395E+00 1.625E-02 3.686E-01
50 6.00200615142457E+00 6.387E-04 5.462E+00
51200 6.00136728718236E+00 1.724E-07 0.000E+00
102400 6.00136741645906E+00 4.309E-08 0.000E+00
Elapsed time = 0.00621600 seconds
CPU time = 0.01550900 seconds
fevals by thread 0: 51200
fevals by thread 1: 102400
fevals by thread 2: 22600
fevals by thread 3: 28550
Total number of fevals: 204750
(Can you guess from this which thread got which values of n?) Notice that the table is very much out of order in this case, since lines were printed as threads finished their work.
One could clean up the table by keeping the approximation and error values for each n in a short array and then printing at the end in the proper order, along with the correct ratios. But you don’t need to do this for the assignment.
The changes for Problems 6,7,8 should all be made in the same version. So what you end up with in quadrature3.f90 will have the parallel loop in error_table, will use dynamic scheduling, and have the loop on j reordered. The test3.f90 program should call error_table with method = trapezoid.
Suppose we want to compute an integral in two space dimensions of the form
\(\int_a^b \int_c^d g(x,y) \, dy \, dx\)
This can be rewritten as \(\int_a^b f(x) \, dx\) where the function \(f(x) = \int_c^d g(x,y) \, dy\). As usual, we could approximate the integral of \(f(x)\) by the trapezoid rule in x. But now for each x, in order to approximate \(f(x)\) we must approximate \(f(x)\) by a trapezoid rule approximation to the integral of \(g(x,y)\) in \(y\).
Create a new directory homework5/quad2d that contains new versions of the codes functions.f90, quadrature.f90, and test.f90 that can be used to approximate
\(\int_0^2 \int_1^4 \sin(x+y)~dy~dx\)
for which the true value can be easily calculated for comparison.
In this case the function f(x) defined in functions.f90 should contain an implementation of the trapezoid rule (in y) that estimates \(\int_1^4 g(x,y) \, dy\) for any value x.
The functions module should also contain a function g(x,y) that will be called by f.
For the trapezoid rule in y, always use ny = 1000 points. (Not a great idea, see below, but let’s keep it simple.)
Modify the test program so that it produces an error table for ten values of n as shown in the sample output below. (These are the values used in the trapezoid rule approximation in x).
Also modify your code so that it keeps track of how many evaluations of the function g(x,y) each thread does, by introducing a new module variable gevals that is initialized and incremented appropriately.
Start with the modules provided in $UWHPSC/codes/homework5 and you can leave quadrature.f90 alone. In this module, OpenMP is used for the loop in the trapezoid routine. You do not need to add it to your new trapezoid loop in the definition of f(x). (You would not want to since they you would have nested parallel loops.)
Sample output might look like this:
Using 4 threads
true integral: -1.17773797385703E+00
n approximation error ratio
5 -1.15309805294824E+00 2.464E-02 0.000E+00
10 -1.17288644038560E+00 4.852E-03 5.079E+00
20 -1.17664941136820E+00 1.089E-03 4.457E+00
40 -1.17747897159959E+00 2.590E-04 4.203E+00
80 -1.17767418488683E+00 6.379E-05 4.060E+00
160 -1.17772156012371E+00 1.641E-05 3.886E+00
320 -1.17773323092813E+00 4.743E-06 3.461E+00
640 -1.17773612733693E+00 1.847E-06 2.569E+00
1280 -1.17773684879809E+00 1.125E-06 1.641E+00
2560 -1.17773702883453E+00 9.450E-07 1.191E+00
Elapsed time = 0.10095600 seconds
CPU time = 0.36504200 seconds
fevals by thread 0: 1298
fevals by thread 1: 1278
fevals by thread 2: 1278
fevals by thread 3: 1261
Total number of fevals: 5115
gevals by thread 0: 1298000
gevals by thread 1: 1278000
gevals by thread 2: 1278000
gevals by thread 3: 1261000
Total number of gevals: 5115000
Note that the error decreases at the expected rate initially but for larger values of n we do not get the factor of 4 improvement we might hope for. This is because the inner integral in y is always approximated with 1000 points so there is an error in the values of f(x) produced that does not decrease as we increase the number of points used for the outer integral. (A better idea of course would be to decrease ny along with n.)
Note that you expect the total number of g evaluations to be 1000 times larger than the total number of f evaluations.
Your homework5 directory should contain: