Next: , Previous: Dataset Components, Up: Dataset Components


2.1 The NetCDF Data Model

A netCDF dataset contains dimensions, variables, and attributes, which all have both a name and an ID number by which they are identified. These components can be used together to capture the meaning of data and relations among data fields in an array-oriented dataset. The netCDF library allows simultaneous access to multiple netCDF datasets which are identified by dataset ID numbers, in addition to ordinary file names.

2.1.1 Expanded Model in NetCDF-4 Files

Files created with the netCDF-4 format have access to an expanded data model, which includes named groups. Groups, like directories in a Unix file system, are hierarchically organized, to arbitrary depth. They can be used to organize large numbers of variables.

Each group acts as an entire netCDF dataset in the classic model. That is, each group may have attributes, dimensions, and variables, as well as other groups.

The default root is the root group, which allows the classic netCDF data model to fit neatly into the new model.

Dimensions are scoped such that they can be seen in all descendant groups. That is, dimensions can be shared between variables in different groups, if they are defined in a parent group.

In netCDF-4 files, the user may also define a type. For example a compound type may hold information from an array of C structures, or a variable length array allows the user to read and write arrays of variable length arrays.

Variables, groups, and types share a namespace. Within the same group, a variable, groups, and types must have unique names. (That is, a type and variable may not have the same name within the same group, and similarly for sub-groups of that group.)

Groups and user defined types are only available in files created in the NetCDF-4/HDF5 format. They are not available for classic or 64-bit offset format files.

2.1.2 Naming Conventions

The names of dimensions, variables and attributes (and, in netCDF-4 files, groups, user-defined types, compound member names, and enumeration symbols) consist of arbitrary sequences of alphanumeric characters, underscore '_', period '.', plus '+', hyphen '-', or at sign '@', but beginning with a letter or underscore. However names commencing with underscore are reserved for system use. Case is significant in netCDF names. A zero-length name is not allowed. Some widely used conventions restrict names to only alphanumeric characters or underscores. Beginning with versions 3.6.3 and 4.0, names may also include UTF-8 encoded Unicode characters as well as other special characters, except for the character '/', which may not appear in a name. Names that have trailing space characters are also not permitted.

2.1.3 Network Common Data Form Language (CDL)

We will use a small netCDF example to illustrate the concepts of the netCDF data model. This includes dimensions, variables, and attributes. The notation used to describe this simple netCDF object is called CDL (network Common Data form Language), which provides a convenient way of describing netCDF datasets. The netCDF system includes the ncdump utility for producing human-oriented CDL text files from binary netCDF datasets and vice versa. (The ncdump utility has recently been enhanced to accommodate netCDF-4 features in the CDL output, but the example here is restricted to netCDF-3 CDL.)

     netcdf example_1 {  // example of CDL notation for a netCDF dataset
     
     dimensions:         // dimension names and lengths are declared first
             lat = 5, lon = 10, level = 4, time = unlimited;
     
     variables:          // variable types, names, shapes, attributes
             float   temp(time,level,lat,lon);
                         temp:long_name     = "temperature";
                         temp:units         = "celsius";
             float   rh(time,lat,lon);
                         rh:long_name = "relative humidity";
                         rh:valid_range = 0.0, 1.0;      // min and max
             int     lat(lat), lon(lon), level(level);
                         lat:units       = "degrees_north";
                         lon:units       = "degrees_east";
                         level:units     = "millibars";
             short   time(time);
                         time:units      = "hours since 1996-1-1";
             // global attributes
                         :source = "Fictional Model Output";
     
     data:                // optional data assignments
             level   = 1000, 850, 700, 500;
             lat     = 20, 30, 40, 50, 60;
             lon     = -160,-140,-118,-96,-84,-52,-45,-35,-25,-15;
             time    = 12;
             rh      =.5,.2,.4,.2,.3,.2,.4,.5,.6,.7,
                      .1,.3,.1,.1,.1,.1,.5,.7,.8,.8,
                      .1,.2,.2,.2,.2,.5,.7,.8,.9,.9,
                      .1,.2,.3,.3,.3,.3,.7,.8,.9,.9,
                       0,.1,.2,.4,.4,.4,.4,.7,.9,.9;
     }

The CDL notation for a netCDF dataset can be generated automatically by using ncdump, a utility program described later (see ncdump). Another netCDF utility, ncgen, generates a netCDF dataset (or optionally C or FORTRAN source code containing calls needed to produce a netCDF dataset) from CDL input (see ncgen). This version of ncgen can produce netcdf-3 or netcdf-4 files and can utilize CDL input that includes the netcdf-4 data model constructs. The older ncgen program is still available under the name ncgen3.

The CDL notation is simple and largely self-explanatory. It will be explained more fully as we describe the components of a netCDF dataset. For now, note that CDL statements are terminated by a semicolon. Spaces, tabs, and newlines can be used freely for readability. Comments in CDL follow the characters '//' on any line. A CDL description of a netCDF dataset takes the form

       netCDF name {
         types: [netcdf-4 only]
         dimensions: ...
         variables: ...
         data: ...
       }

where the name is used only as a default in constructing file names by the ncgen utility. The CDL description consists of three optional parts, introduced by the keywords dimensions, variables, and data. NetCDF dimension declarations appear after the dimensions keyword, netCDF variables and attributes are defined after the variables keyword, and variable data assignments appear after the data keyword.

The ncgen utility provides a command line option which indicates the desired output format. Limitations are enforced for the selected format - that is, some CDL files may be expressible only in 64-bit offset or NetCDF-4 format.

For example, trying to create a file with very large variables in classic format may result in an error because size limits are violated.