General Description of Features and Assumptions

This is a fairly rambling and very incomplete description of the features of EMU and the assumptions built into it. It needs to be reworked, but it is a starting place.

General Issues

In general, we need to keep in mind the distinction between the "project" customization (paths, file naming templates, conventions, etc.) and the core toolbox, which represents the vast majority of the Toolbox code (>90%?).

Use and Handling of Masks

Four modes exist, selected in the configuration file under "# MASKTYPE":

singleposval: a single ID value from the mask file defines the mask; enter the desired value in the configuration file, next to the mask path/file.
rangeposval: the mask file is used as is; all ID values > 0 are taken as the mask.
rangefileIDs: the Aggregate ID configuration file is used in conjunction with the mask file. Masks from the Aggregate ID file may overlap. For some applications, all ID values > 0 in the mask file are used as the mask; see masktest().
rangefileID1: the Aggregate ID configuration file is used in conjunction with the mask file, but exactly one Aggregate ID value must be included. That Aggregate ID is used as the mask in all applications.
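A minimal C sketch of how the four modes could collapse into a single inside/outside test for a grid cell. The enum, the `MaskConfig` struct, and `mask_test_sketch()` are hypothetical names, not EMU's actual masktest(); we also assume that an Aggregate ID stands for a list of mask-file ID values.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum mirroring the four "# MASKTYPE" keywords. */
typedef enum {
    MASK_SINGLEPOSVAL,  /* one chosen ID value from the mask file */
    MASK_RANGEPOSVAL,   /* any ID value > 0 */
    MASK_RANGEFILEIDS,  /* several, possibly overlapping, aggregates */
    MASK_RANGEFILEID1   /* exactly one aggregate */
} MaskType;

/* Hypothetical mask settings read from the configuration file.
   ASSUMPTION: an Aggregate ID is a list of mask-file ID values. */
typedef struct {
    MaskType type;
    int single_value;      /* used by MASK_SINGLEPOSVAL */
    const int *members;    /* mask IDs of the single aggregate (rangefileID1) */
    size_t n_members;
} MaskConfig;

/* Returns true if a cell whose mask-file ID is `id` lies inside the mask.
   Illustrative only; not EMU's masktest() signature. */
bool mask_test_sketch(const MaskConfig *cfg, int id)
{
    switch (cfg->type) {
    case MASK_SINGLEPOSVAL:
        return id == cfg->single_value;
    case MASK_RANGEPOSVAL:
    case MASK_RANGEFILEIDS:
        /* rangefileIDs: some applications simply use every positive ID;
           per-region work is done elsewhere. */
        return id > 0;
    case MASK_RANGEFILEID1:
        for (size_t k = 0; k < cfg->n_members; k++)
            if (cfg->members[k] == id)
                return true;
        return false;
    }
    return false;
}
```

Because every mode reduces to the same boolean test, a program that calls a single mask-test function per cell does not need to know which mode was configured.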

Aggregate Masks can be used in at least two ways. rangefileID1-type Aggregate Masks may be used as true spatial masks, both to limit processing to the selected regions (through masktest()) and to limit the output to those regions, masking out the rest. Since they rely only on masktest() to decide what is inside the mask, they can be used transparently by any program that does the same. rangefileIDs-type Aggregate Masks are not used in that sense; instead, they are used to output spatial means over selected regions, which may overlap. This use is represented by tsmultiplesites.c.
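The rangefileIDs use can be sketched as computing one spatial mean per region for a single time step, where a grid cell contributes to every region that contains its mask ID, which is what allows regions to overlap. This is an illustrative sketch, not the code in tsmultiplesites.c; the `Region` type, `region_means()`, and the convention of returning NODATA for a region with no valid cells are assumptions.

```c
#include <stddef.h>

/* Hypothetical region: a list of mask-file ID values.
   Two regions may share IDs, i.e., overlap. */
typedef struct {
    const int *members;
    size_t n_members;
} Region;

static int region_contains(const Region *r, int id)
{
    for (size_t k = 0; k < r->n_members; k++)
        if (r->members[k] == id)
            return 1;
    return 0;
}

/* For one time step, writes the mean of `data` over each region into
   means[0..nregions-1]. Cells equal to `nodata` are skipped; a cell is
   counted in every region that contains its mask ID. A region with no
   valid cells gets `nodata` (an assumed convention). */
void region_means(const float *data, const int *mask_ids, size_t ncells,
                  float nodata, const Region *regions, size_t nregions,
                  float *means)
{
    for (size_t r = 0; r < nregions; r++) {
        double sum = 0.0;
        size_t count = 0;
        for (size_t i = 0; i < ncells; i++) {
            if (data[i] == nodata || !region_contains(&regions[r], mask_ids[i]))
                continue;
            sum += data[i];
            count++;
        }
        means[r] = count > 0 ? (float)(sum / count) : nodata;
    }
}
```

For example, with cell values {1, 2, 3, NODATA}, mask IDs {1, 2, 2, 1}, and regions {1, 2} and {2}, the two means are 2.0 and 2.5; cells 1 and 2 are counted in both regions.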

Preferred Styles or Templates for Writing Applications

There are probably no more than five templates for reading and processing the data. We should think about these and write documentation (with code fragments and examples) that lists and lays out these styles, for future reference.

Assumptions Built into EMU

Netcdf files
- coordinate (dimension) variables
  - names of dimensions (coordinate variables), e.g., lat, lon, time; are they case sensitive?
  - lengths
  - units
  - supported geographic projections
  - time origin: is it read and parsed directly from the NC file, or assumed? If we want to be able to parse time origins, maybe we should ask Ferret's Steve for a copy of the parsing code used in Ferret.
  - time-axis type: monthly, monthly mean, annual mean, etc.
- data variables
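If time origins are to be parsed rather than assumed, a minimal starting point is a CF-style `units` attribute such as "days since 1900-01-01". The sketch below assumes exactly that form; `TimeOrigin` and `parse_time_units()` are hypothetical names. It extracts only the unit and the origin date and ignores any time-of-day suffix; calendars and time zones are not handled (a library such as UDUNITS covers those). In a real reader the string would first be read from the file, e.g., with netCDF-C's `nc_get_att_text()`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result of parsing a CF-style time "units" string. */
typedef struct {
    char unit[16];          /* e.g. "days", "hours" */
    int year, month, day;   /* origin date */
} TimeOrigin;

/* Parses "<unit> since YYYY-MM-DD[ ...]". Anything after the date
   (such as " 00:00:00") is ignored. Returns 0 on success, -1 otherwise. */
int parse_time_units(const char *units, TimeOrigin *out)
{
    char since[8];
    if (sscanf(units, "%15s %7s %d-%d-%d", out->unit, since,
               &out->year, &out->month, &out->day) != 5)
        return -1;
    if (strcmp(since, "since") != 0)
        return -1;
    return 0;
}
```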

OBSOLETE: High-level Labels for Variable Types

(* this discussion is obsolete; high-level labels have been removed from EMU and will not be re-introduced for a while *)

Each variable is completely specified by *basic* information, including file path, format (e.g., binary or netCDF), data type (short, float, etc.), file name construction template (if binary), time unit, time origin, time type (e.g., time series, monthly mean, annual mean), number of rows and columns, NODATA value, gain and offset, etc. This information should be contained in the var structure. All of it may be termed "low-level" information, since it is what is directly interpreted and used to read the data.
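One possible shape for the low-level part of the var structure, together with the value conversion it enables. Field names are illustrative only, not EMU's actual var structure, and the scaling convention physical = stored * gain + offset is an assumption that must be checked against each dataset.

```c
#include <stdbool.h>

/* Hypothetical low-level description of one variable. */
typedef enum { FMT_BINARY, FMT_NETCDF } FileFormat;
typedef enum { DT_SHORT, DT_FLOAT } DataType;
typedef enum { TT_SERIES, TT_MONTHLY_MEAN, TT_ANNUAL_MEAN } TimeType;

typedef struct {
    char path[256];
    char name_template[64];  /* binary files only */
    FileFormat format;
    DataType dtype;
    char time_units[64];     /* time unit and origin, e.g. "days since ..." */
    TimeType time_type;
    int nrows, ncols;
    double nodata;
    bool has_gain_offset;
    double gain, offset;
} Var;

/* Converts a stored value to a physical value. Returns false, leaving
   *out untouched, if the stored value equals the NODATA flag.
   ASSUMPTION: physical = stored * gain + offset. */
bool var_scale(const Var *v, double stored, double *out)
{
    if (stored == v->nodata)
        return false;
    *out = v->has_gain_offset ? stored * v->gain + v->offset : stored;
    return true;
}
```

Since every field here is "low level", reading code only ever consults this structure and never needs to know how it was populated.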

To make input simpler for a specific project, some higher-level labels are used which serve as keywords for collections of low-level information. Examples I've been using are:

input: specifies the base path, file construction template (if binary), data type, presence/absence of gain & offset, etc.
output: specifies the base path, file construction template (if binary), data type, presence/absence of gain & offset, etc.

That is, by simply labelling a variable in vars.script as "input" or "output", the toolbox automatically looks up the additional, stored low-level information and uses it to populate the variable object array.

If we need to process a variable that does not fit any of those pre-existing high-level labels, then we'll need to explicitly specify all the equivalent low-level information, or at least make some assumptions.

It's important to keep low-level and high-level information processing separate. In my current EMU code, the two are unfortunately intermixed. What we should have is a "preprocessor" that reads the entry for each variable from the configuration file and, whenever it encounters a high-level label, looks up the associated low-level information and populates the corresponding fields in the var structure. Once that is done, the toolbox will (almost) never have to use the high-level label again. The advantage of this approach is that high-level labels can easily be redefined or introduced in one or two places, without having to go through all the functions of the toolbox. It also avoids redundancy.
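The proposed preprocessor can be sketched as one lookup table plus one function called while parsing the configuration file. The table contents (paths, templates, types) are placeholders, and `VarLowLevel`, `LABELS`, and `resolve_label()` are hypothetical names; the struct holds only an illustrative subset of the low-level fields.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { DT_SHORT, DT_FLOAT } DataType;

/* Illustrative subset of the low-level fields the toolbox reads. */
typedef struct {
    char base_path[128];
    char name_template[64];
    DataType dtype;
    bool has_gain_offset;
} VarLowLevel;

/* The only place high-level labels are defined. All values are
   placeholders, not real EMU project settings. */
static const struct {
    const char *label;
    VarLowLevel fields;
} LABELS[] = {
    { "input",  { "/path/to/input/",  "input_template",  DT_SHORT, true  } },
    { "output", { "/path/to/output/", "output_template", DT_FLOAT, false } },
};

/* Called once per variable while reading the configuration file:
   copies the low-level fields for `label` into *v. Returns 0 on success,
   -1 if the label is unknown, in which case the caller must require
   explicit low-level settings for that variable. */
int resolve_label(const char *label, VarLowLevel *v)
{
    for (size_t i = 0; i < sizeof LABELS / sizeof LABELS[0]; i++) {
        if (strcmp(LABELS[i].label, label) == 0) {
            *v = LABELS[i].fields;
            return 0;
        }
    }
    return -1;
}
```

After `resolve_label()` returns, nothing downstream needs the label; redefining or adding a label means editing the `LABELS` table only.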

Emilio Mayorga, emiliom@u.washington.edu