% Lab #1.5 for Geophysical Inverse Theory ESS523, Fall 2005, Univ of WA % by Andy Ganse, course TA % course professor: Ken Creager % 6 Oct 2005 % --------------------------------------------------------------------- % IF YOU HAVEN'T WORKED THRU REVIEWTUT.M ALREADY YOU SHOULD DO THAT FIRST. % THEN JUST START READING AND FOLLOW THE INSTRUCTIONS AS YOU GO... % --------------------------------------------------------------------- % Hey gang, % Here's that statistics lowdown I promised that hopefully may help lend some % intuition about what covariance is, and also about the difference between % sample and population statistics, which was part of the point in lecture % this morning too. % (Eg "why aren't mtilde and mtrue equal?") % It's about as long as reviewtut.m, so it may take an hour or so to work thru, % but hopefully you may find this useful if you're not quite clear on covariance % and other statistics concepts. As before, the approach with this file is to % uncomment lines as you go, and run and change this file just as you did with % reviewtut.m last week... % Cheers! -Andy % Note you'll want to read and uncomment as you go like in reviewtut.m - % the first part below is uncommented already for you... % Say I had a "true" timeseries signal (data) that I'm adding noise to: t=(1:.1:10); % 91 time points % and let's create some "true" data values for each of the 91 time points: datapopmean=1+10*t-5*t.^2; % This signal is a "population mean" not "sample mean" % of the data. In real problems we don't know the pop mean, % or the true signal, since what we measure has noise on it. % But to learn about this stuff we've made up our own true % signal here, so we know the pop mean here. % Let's add some gaussian noise that has the same characteristics for all data pts: datanoise1=randn(1,91)*10; % ie mean=0 and stdev=10, same for all 91 data pts % note this noise is uncorrelated - those are independent fluctuations on each data point. datameasured=datapopmean+datanoise1; % Plot both the true and noisy signals: subplot(2,1,1); plot(t,datameasured); title('datameasured, ie datapopmean + datanoise1'); subplot(2,1,2); plot(t,datanoise1); title('just datanoise1 alone') % We see in the plots that we have a 91 point timeseries with fluctuations on each point. % You really must see that plot to follow the matrix/random-variables discussion below: % To make things easy above, we obtained all 91 noise points at once from the randn() % function, but note that's really only one sample of noise for each of the 91 random % variables. That is, the value of the data at each of the 91 times t is a % random variable (abbreviated "RV"). % Try recreating datanoise1 and replotting several times, to see that the instances % of the noise - the "samples" - are different each time although they're in the % same range since the noise statistics are the same. % So in the above, each datanoise1(i) is an RV (datanoise1 as a whole has 91 elements). % Similarly, datameasured(i) is an RV too because an RV plus a deterministic value % equals an RV (datapopmean(i) was deterministic not random) . % Okay, now let's just talk about the datanoise1 RV's to learn about covariance. % If instead of one sample of noise for each of the 91 random variables, we had a % bunch of samples for each of the 91 data values, I could estimate the covariances % via the Matlab function cov(). I say "estimate" because we can only do so well % with a finite number of datanoise(i) samples to try and find the statistics of % the whole infinite population of datanoise(i). % So let's make 1000 samples of the 91 RVs, and keep them all in a matrix for the % sake of tidiness. So this matrix is 1000 down by 91 across - each column contains % a vector of 1000 samples of the one RV for a given time point. % datanoise1lots=randn(1000,91)*10; % still have mean=0 and stdev=10 for all 91 RVs % Plot the covariance estimate for the 91 datanoise RVs based on those 1000 samples: % close; % gets rid of previous figure % imagesc(cov(datanoise1lots)); colorbar; % In this plot you should see a pretty good approximation of an identity matrix "I" % times our constant variance of 100 ( = stdev^2 = 10^2 which we chose above). % It's not exactly 100*I because we didn't have an infinite number of samples. % So our computed approximation here is the sample covariance, while 100*I is the % population covariance. Try redoing this with 100 or even 10 samples, and see if % the covariance matrix is a better or worse approximation of 100*I. % Okay, now let's step up in complexity a tiny bit. % Now instead of the same stats for all 91 RVs, let's say the stdevs (sqrt of cov) % are smaller near the first time point and larger near the 91st time point: % datanoise2=randn(1,91).*(1:.1:10); % ie stdev on pt.1 is 1, stdev on pt.91 is 10 % Plot it to see what it looks like: % plot(datanoise2); % Recreate datanoise2 and its plot several times to see that the samples are different % although they're in the same range, eg their spread is wider at the end. % And just like before let's make 1000 samples of each of those 91 RVs to find cov: % datanoise2lots=randn(1000,91)*diag(1:.1:10); % And let's look at these covariances: % imagesc(cov(datanoise2lots)); colorbar; % This should again be very close to diagonal, but the variances increase down the diagonal. % Last step in complexity now. % In the above two cases, we analyzed samples of 91 *independent* RVs. So the % covariance estimates were pretty close to diagonal. As independent RVs, we can % say that knowing for example datanoise(14) doesn't give us any clue if datanoise(15) % may be close or positive or negative. Whereas if there were some dependence between % the 91 RVs, some "correlation", then knowing datanoise(14)=5.231 might let us know % that datanoise(15) is likely pretty close to 5.231 rather than say = -2.508. See? % % Turns out if we smooth our noise, s we introduce such a correlation, which makes sense % intuitively too - if it's smoothed and we know datanoise(14)=5.231, then in a smooth % datanoise vector we'd expect datanoise(15) to be pretty close to 5.231 as well. % % To do the smoothing I'll "convolve with a gaussian kernel" like in reviewtut.m. % Look at the plot to see that it smoothes the noise: % x=-10:10; % s=4.5; % g=exp(-x.^2/2/s^2)/sqrt(2*pi)/s; % the gaussian kernel % datanoise1=randn(1,91)*10; % datanoise1filt=conv(datanoise1,g); % subplot(2,1,1); % plot(datanoise1); title('datanoise1'); % subplot(2,1,2); % plot(datanoise1filt); title('datanoise1filt (ie smoothed version of above noise)'); % Try recreating that datanoise1 and datanoise1filt and their plot several times, % to see how that works. % Now let's do the 1000 samples again and estimate a covariance matrix to see % how those correlations appear in there: % datanoise1lots=randn(1000,91)*10; % disp('sorry, this will take a minute or two to compute the 1000 convolutions...'); % for i=1:1000 % datanoise1filt(i,:)=conv(datanoise1lots(i,:),g); % end % close; % close divided-up plot window % imagesc(cov(datanoise1filt)); colorbar; % See how the covariance matrix is no longer diagonal? We see from it that % the values of each of the 91 RVs vary in a way that is particularly % correlated with the 10 or so RVs on either side of it. (Compare that to % the width of the Gaussian filter kernel, by the way.) % One catch here with my convolution approach - see how things broke down % toward RVs 1 and 91? Ie those results are kindof screwy - this is because % my convolution approach has edge effects troubles, so pretend you didn't % see that. ;) % Those of you who are statistics buffs may point out that by my filtering % I've changed the number of degrees of freedom and so biased my estimate % of the covariance matrix - yeah yeah yeah, but that's a small effect since % we've 1000 samples here. % If you have a few more minutes, you should try this stuff over again with % larger stdev values to see what that does to the covariance matrix plots. % And lastly, see if you can set up the filtered noise covariance example % on the case above where the stdevs increase from the 1st to 91st RV. Just % combine what we did farther above re the increasing stdevs with the filtering. % Okay okay, one more last thing to try yourself, to learn more about the % difference between statistics of a sample set from an RV, vs the population % statistics which we can only estimate if we have a finite # of samples. % Choose a mean and stdev with which to create 100 random samples using randn(). % Use Matlab's mean() and std() functions to find the sample mean and sample % stdev of those 100 samples. Are they the same as your original mean and % stdev you used to create the samples? If you have more or less than 100 % samples, do the sample mean and stdev get closer or farther from the original? % It turns out those sample means are actually Gaussian RVs themselves - % if you create a loop to recreate these 100 samples say 500 times and thus % get 500 means via mean(), and plot those in a histogram as we learned in % reviewtut.m, you should see a bell curve. What is the mean of that bell % curve? (Ie the mean of the means!) How does the stdev of that bell curve % change with the number of samples? (Turns out the answer is 1/sqrt(N)). % At any rate, this is why when we rerun our gravity problem and create % new noise on our datatrue each time, and get an estimate of the 3 model % parameters each run, those estimates will vary a bit each run. % If we reran this 100 times, we would expect that roughly 95 of those % times we would see the 95% confidence ellipse encompass the true % parameter values (mtrue). The mtrue value is the same every run, but % the mest value and the ellipse will move around a bit each run.