NAME

Geo::BUFR - Perl extension for handling of WMO BUFR files.

SYNOPSIS

  # A simple program to print decoded content of a BUFR file

  use Geo::BUFR;

  Geo::BUFR->set_tablepath('path to BUFR tables');

  my $bufr = Geo::BUFR->new();

  # If you want flag and code table values to be resolved
  $bufr->load_Ctable('your favourite C table');

  $bufr->fopen('BUFR file');

  while (not $bufr->eof()) {
      my ($data, $descriptors) = $bufr->next_observation();
      print $bufr->dumpsections($data, $descriptors);
  }

  $bufr->fclose();

DESCRIPTION

BUFR = Binary Universal Form for the Representation of meteorological data. BUFR is approved by WMO (World Meteorological Organization) as the standard universal exchange format for meteorological observations, gradually replacing a lot of older alphanumeric data formats.

This module provides methods for decoding and encoding BUFR messages, and for displaying information in BUFR B and D tables and in BUFR flag and code tables.

Installing this module also installs some programs: bufrread.pl, bufrresolve.pl, bufrencode.pl, bufr_reencode.pl and bufralter.pl. See https://wiki.met.no/bufr.pm/start for examples of use. For the majority of potential users of Geo::BUFR I would expect these programs to be all that you will need Geo::BUFR for.

Note that being Perl, this module cannot compete in speed with, for example, the (free) ECMWF Fortran library libbufr. Still, some effort has been put into making the module reasonably fast, in that the core routines for encoding and decoding bitstreams are implemented in C.

METHODS

The get_ methods will return undef if the requested information is not available. The set_ methods as well as fopen, fclose, copy_from and rewind will always return 1, or croak if failing.

Create a new object:

  $bufr = Geo::BUFR->new();
  $bufr = Geo::BUFR->new($BUFRmessages);

The second form of new is useful if you want to provide the BUFR messages to decode directly as an input buffer (string). Note that merely calling new($BUFRmessages) will not decode anything in the BUFR messages; for that you need to call next_observation() from the newly created object. You also have the option of providing the BUFR messages in a file, using the no-argument form of new() and then calling fopen.

Associate the object with a file for reading of BUFR messages:

  $bufr->fopen($filename);

Close the associated file that was opened by fopen:

  $bufr->fclose();

Check for end-of-file (or end of the input buffer provided as argument to new):

  $bufr->eof();

Returns true if end-of-file (or end of input buffer) is reached, false if not.

Ensure that next call to next_observation will decode first subset in first BUFR message:

  $bufr->rewind();

Copy from an existing object:

  $bufr1->copy_from($bufr2,$what);

If $what is 'all' or not provided, will copy everything in $bufr2 into $bufr1, i.e. making a clone. If $what is 'metadata', only the metadata in section 0, 1 and 3 will be copied.

Load B and D tables:

  $bufr->load_BDtables($table);

$table is optional, and should be (base)name of a file containing a BUFR table B or D, using the ECMWF libbufr naming convention, i.e. [BD]'table_version'.TXT. If no argument is provided, load_BDtables() will use BUFR section 1 information in the $bufr object to decide which tables to load. Previously loaded tables are kept in memory, and load_BDtables will return immediately if the tables already have been loaded. Returns table version (see get_table_version).

Load C table:

  $bufr->load_Ctable($table,$default_table);

Both $table and $default_table are optional. This will load the flag and code tables (if not already loaded), which in ECMWF libbufr are put in tables C'table_version'.TXT (not to be confused with WMO BUFR table C, which contains the operator descriptors). $default_table will be used if $table is not found. If no arguments are provided, load_Ctable() will use BUFR section 1 information in the $bufr object to decide which table to load. Returns table version.

Get next observation (next subset in current BUFR message or first subset in next message):

  ($data, $descriptors) = $bufr->next_observation();

where $descriptors is a reference to the array of fully expanded descriptors for this subset, and $data is a reference to the corresponding values. This method is meant to be used to iterate through all BUFR messages in the file or input buffer (see new) associated with the $bufr object. Whenever a new BUFR message is reached, sections 0-3 will also be decoded, and their content is then available through the access methods listed below. This is the main BUFR decoding routine in Geo::BUFR, and will call load_BDtables() internally, but not load_Ctable. Consult "DECODING/ENCODING" if you want more precise info about what is returned in $data and $descriptors.

Print the content of a subset in BUFR message:

  print $bufr->dumpsections($data,$descriptors,$options);

$options is optional. If this is first subset in message, will start by printing message number and, if this is first message in a WMO bulletin, WMO ahl (abbreviated header line), as well as content of sections 0, 1 and 3. For section 4, will also print subset number. $options should be an anonymous hash with possible keys 'width' and 'bitmap', e.g. { width => 20, bitmap => 0 }. 'bitmap' controls which of dumpsection4 and dumpsection4_with_bitmaps will be called internally by dumpsections. Default value for 'bitmap' is 1, causing dumpsection4_with_bitmaps to be called. 'width' controls the value of $width used by the dumpsection4... methods, default is 15. If you intend to provide the output from dumpsections as input to reencode_message, be sure to set 'bitmap' to 0, and 'width' not smaller than the largest data width in bytes among the descriptors with unit CCITTIA5 occurring in the message.

Normally dumpsections is called after next_observation, with same arguments $data,$descriptors as returned from this call. From the examples given at https://wiki.met.no/bufr.pm/start#bufrreadpl you can get an impression of what the output might look like. If dumpsections does not give you exactly what you want, you might prefer to instead call the individual dumpsection methods below.

Print the contents of sections 0-3 in BUFR message:

  print $bufr->dumpsection0();
  print $bufr->dumpsection1();
  print $bufr->dumpsection2($sec2_code_ref);
  print $bufr->dumpsection3();

dumpsection2 returns an empty string if there is no optional section in the message. The argument should be a reference to a subroutine which takes the optional section as (a string) argument and returns the text you want displayed after the 'Length of section:' line. For general BUFR messages probably the best you can do is displaying a hex dump, in which case

  sub {return '    Hex dump:' . ' 'x26 . unpack('H*',substr(shift,4))}

might be a suitable choice for $sec2_code_ref. For most applications there should be no real need to call dumpsection2.
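To see what the suggested callback produces, here is a small sketch that applies it to a made-up optional section. The 8 octet section content below is an assumption for illustration only (3 octets for length, 1 reserved octet, then 4 octets of local data); nothing here calls Geo::BUFR itself.

```perl
use strict;
use warnings;

# The callback suggested above: skip the first 4 octets of the section
# and hex dump the rest
my $sec2_code_ref = sub {
    return '    Hex dump:' . ' ' x 26 . unpack('H*', substr(shift, 4));
};

# Made-up section 2: length (3 octets, value 8), reserved octet, local data 'ABCD'
my $section2 = pack('C3 C a4', 0, 0, 8, 0, 'ABCD');

my $sec2_text = $sec2_code_ref->($section2);
print "$sec2_text\n";
```

In real use the same code reference would simply be passed as the argument to dumpsection2.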

Print the data of a subset (descriptor, value, name and unit):

  print $bufr->dumpsection4($data,$descriptors,$width);
  print $bufr->dumpsection4_with_bitmaps($data,$descriptors,$width);

$width fixes the number of characters used for displaying the data values, and is optional (defaults to 15). $data and $descriptors are references to arrays of data values and BUFR descriptors respectively, likely to have been fetched from next_observation. Code and flag values will be resolved if a C table has been loaded, i.e. if load_Ctable has been called earlier. dumpsection4_with_bitmaps will display the bit-mapped values side by side with the corresponding data values. If there is no bit-map in the BUFR message, dumpsection4_with_bitmaps will provide same output as dumpsection4. See "DECODING/ENCODING" for some more information about what is printed, and https://wiki.met.no/bufr.pm/start#bufrreadpl for real life examples of output.

Set verbose level:

  Geo::BUFR->set_verbose($level); # 0 <= $level <= 5
  $bufr->set_verbose($level);

Some info about what is going on in Geo::BUFR will be printed to STDOUT if $level > 0. With $level set to 1, all that is printed is the B, C and D tables used (with full path). Setting verbose level > 1 might be especially helpful when debugging, or for example if you want to extract as much information as possible from an incorrectly formatted BUFR message.

No decoding of quality information:

  Geo::BUFR->set_noqc($n);
 - $n=1 (or not provided): Don't decode quality information (more
   specifically: skip all descriptors after 222000)
 - $n=0: Decode quality information (default in Geo::BUFR)

Enable/disable strict checking of BUFR format for recoverable errors (like using BUFR compression for one subset message etc):

  Geo::BUFR->set_strict_checking($n);
 - $n=0: disable checking (default in Geo::BUFR)
 - $n=1: warn (carp) if error but continue decoding
 - $n=2: die (croak) if error

See "STRICT CHECKING" for details of what is being checked if strict checking is enabled.
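The three levels can be pictured with a small dispatcher. This is only a sketch and not Geo::BUFR's own code: the helper name report_recoverable and the message text are made up for illustration, and only the core Carp module is used.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

# Hypothetical helper (not part of Geo::BUFR): act on a recoverable
# format error according to a strict checking level of 0, 1 or 2
sub report_recoverable {
    my ($level, $message) = @_;
    return 0 if !$level;                # 0: ignore the error
    croak($message) if $level == 2;     # 2: die
    carp($message);                     # 1: warn, then carry on
    return 1;
}

# Capture warnings so the demonstration can count them
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $message = 'compression set for one subset message';
report_recoverable(0, $message);
report_recoverable(1, $message);
my $died = !eval { report_recoverable(2, $message); 1 };

printf "warnings: %d, died at level 2: %s\n",
    scalar @warnings, $died ? 'yes' : 'no';
```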

Show all BUFR table C operators (data description operators) when calling dumpsection4:

  Geo::BUFR->set_show_all_operators($n);
 - $n=1 (or not provided): Show all operators
 - $n=0: Show only the really informative ones (default in Geo::BUFR)

set_show_all_operators(1) cannot be combined with dumpsections with bitmap option set (which is the default).

Set or get tablepath:

  Geo::BUFR->set_tablepath($tablepath);
  $tablepath = Geo::BUFR->get_tablepath();

Get table version:

  $table_version = $bufr->get_table_version($table);

$table is optional. If for example $table = 'B0000000000088013001.TXT', will return '0000000000088013001'. In the more interesting case where $table is not provided, will return table version from BUFR section 1 information in the $bufr object.
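The filename case amounts to stripping the leading table letter and the .TXT suffix. A minimal sketch, assuming the ECMWF naming convention described under load_BDtables; the helper table_version_from_name is hypothetical and not how Geo::BUFR necessarily does it:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::BUFR): derive the table version
# from a table file name of the form [BCD]'table_version'.TXT
sub table_version_from_name {
    my ($table) = @_;
    (my $version = $table) =~ s/^[BCD]//;   # drop the leading table letter
    $version =~ s/\.TXT$//;                 # drop the file suffix
    return $version;
}

my $table_version = table_version_from_name('B0000000000088013001.TXT');
print "$table_version\n";
```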

Get number of subsets:

  $nsubsets = $bufr->get_number_of_subsets();

Get current subset number:

  $subset_no = $bufr->get_current_subset_number();

Get current message number:

  $message_no = $bufr->get_current_message_number();

Get last WMO abbreviated header line (ahl) before current message (undef if not present):

  $message_ahl = $bufr->get_current_ahl();

Accessor methods for section 0-3:

  $bufr->set_<variable>($variable);
  $variable = $bufr->get_<variable>();

where <variable> is one of

  bufr_edition
  master_table
  subcentre
  centre
  update_sequence_number
  optional_section (0 or 1)
  data_category
  int_data_subcategory
  loc_data_subcategory
  data_subcategory
  master_table_version
  local_table_version
  year_of_century
  year
  month
  day
  hour
  minute
  second
  local_use
  number_of_subsets
  observed_data (0 or 1)
  compressed_data (0 or 1)
  descriptors_unexpanded

set_year_of_century(0) will set year of century to 100. get_year_of_century will for BUFR edition 4 calculate year of century from year in section 1.
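A sketch of how a year of century in the range 1 to 100 can be derived from a four digit year. Mapping years ending in 00 to 100 is my reading of the remark about set_year_of_century(0) above and should be taken as an assumption; the helper name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::BUFR): year of century in the
# range 1..100, so that a year ending in 00 maps to 100 rather than 0
sub year_of_century {
    my ($year) = @_;
    my $yoc = $year % 100;
    return $yoc == 0 ? 100 : $yoc;
}

printf "%d -> %d\n", $_, year_of_century($_) for 1999, 2000, 2010;
```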

Encode a new BUFR message:

  $new_message = $bufr->encode_message($data_refs,$desc_refs);

where $desc_refs->[$i] is a reference to the array of fully expanded descriptors for subset number $i ($i=1 for first subset), $data_refs->[$i] is a reference to the corresponding values, using undef for missing values. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method. See "DECODING/ENCODING" for meaning of 'fully expanded descriptors'.

Encode a NIL message:

  $new_message = $bufr->encode_nil_message($stationid_ref,$delayed_repl_ref);

$delayed_repl_ref is optional. In section 4 all values will be set to missing except delayed replication factors and the (descriptor, value) pairs in the hashref $stationid_ref. $delayed_repl_ref (if provided) should be a reference to an array of data values for all descriptors 031001 and 031002 occurring in the message (these values must all be nonzero), e.g. [3,1,2] if there are 3 such descriptors which should have values 3, 1 and 2, in that succession. If $delayed_repl_ref is omitted, all delayed replication factors will be set to 1. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method (although number of subsets and BUFR compression will automatically be set to 0 whatever value they had before).

Reencode BUFR message(s):

  $new_messages = $bufr->reencode_message($decoded_messages,$width);

$width is optional. Takes a text $decoded_messages as argument and returns a (binary) string of BUFR messages which, when printed to file and then processed by bufrread.pl with no output modifying options set (except possibly --width), would give output equal to $decoded_messages. If bufrread.pl is to be called with --width $width, this $width must be provided to reencode_message also.

Join subsets from several messages:

 ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr_1,$subset_ref_1,
     ... $bufr_n,$subset_ref_n);

where each $subset_ref_i is optional. Will return the data and descriptors needed by encode_message to encode a multi subset message, extracting the subsets from the first message of each $bufr_i object. All subsets in (first message of) $bufr_i will be used, unless next argument is an array reference $subset_ref_i, in which case only the subset numbers listed will be included, in the order specified. On return $nsub will contain the total number of subsets thus extracted. After a call to join_subsets, the metadata (of the first message) in each object will be available through the get_-methods, while a call to next_observation will start extracting the first subset in the first message. Here is an example of use, fetching first subset from bufr object 1, all subsets from bufr object 2, and subsets 4 and 2 from bufr object 3, then building up a new multi subset BUFR message (which will succeed only if the bufr objects all have the same descriptors in section 3):

  my ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr1,
      [1],$bufr2,$bufr3,[4,2]);
  my $new_bufr = Geo::BUFR->new();
  # Get metadata from one of the objects, then reset those metadata
  # which might not be correct for the new message
  $new_bufr->copy_from($bufr1,'metadata');
  $new_bufr->set_number_of_subsets($nsub);
  $new_bufr->set_update_sequence_number(0);
  $new_bufr->set_compressed_data(0);
  my $new_message = $new_bufr->encode_message($data_refs,$desc_refs);

Extract BUFR table B information for an element descriptor:

  ($name,$unit,$scale,$refval,$width) = $bufr->element_descriptor($desc);

Will fetch name, unit, scale, reference value and data width in bits for element descriptor $desc in the last table B loaded in the $bufr object. Returns false if the descriptor is not found.
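Scale and reference value are what turn the raw integer found in the bitstream into a physical value: in BUFR the value is (raw + reference value) / 10**scale. The following sketch shows that arithmetic only. The helper raw_to_value is hypothetical, and the scale and reference values used are made up rather than taken from a real table B.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::BUFR): convert a raw integer from
# the bitstream into a physical value using table B scale and reference value
sub raw_to_value {
    my ($raw, $scale, $refval) = @_;
    my $sum = $raw + $refval;
    # Divide for nonnegative scale, multiply for negative scale, so that
    # only whole powers of 10 are involved
    return $scale >= 0 ? $sum / 10**$scale : $sum * 10**(-$scale);
}

# Made-up table B entries, for illustration only
my $temperature = raw_to_value(27315, 2, 0);     # scale 2, reference value 0
my $height      = raw_to_value(600, 0, -400);    # scale 0, reference value -400
my $coarse      = raw_to_value(12, -1, 0);       # scale -1, reference value 0

print "$temperature $height $coarse\n";
```

Encoding goes the other way: the value is multiplied by 10**scale, the reference value is subtracted, and the result must fit in the data width.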

Extract BUFR table D information for a sequence descriptor:

  @descriptors = $bufr->sequence_descriptor($desc);
  $string = $bufr->sequence_descriptor($desc);

Will return the descriptors in a direct (nonrecursive) lookup for the sequence descriptor $desc in the last table D loaded in the $bufr object. In scalar context the descriptors will be returned as a space separated string. Returns false if the descriptor is not found.

Resolve BUFR table descriptors (for printing):

  print $bufr->resolve_descriptor($how,@descriptors);

where $how is one of 'fully', 'partially', 'simply' and 'noexpand'. Returns a text string suitable for printing information about the BUFR table descriptors given.

 - $how = 'fully': Expand all D descriptors fully into B descriptors, with name, unit, scale, reference value and width (each on a numbered line, except for replication operators which are not numbered).
 - $how = 'partially': Like 'fully', but expand D descriptors only once and ignore replication.
 - $how = 'noexpand': Like 'partially', but do not expand D descriptors at all.
 - $how = 'simply': Like 'partially', but list the descriptors on one single line with no extra information provided.

The relevant B/D table must have been loaded before calling resolve_descriptor.

Resolve flag table value (for printing):

  print $bufr->resolve_flagvalue($value,$flag_table,$B_table,
                                 $default_B_table,$num_leading_spaces);

Last 2 arguments are optional. $default_B_table will be used if $B_table is not found, $num_leading_spaces defaults to 0. Example:

  print $bufr->resolve_flagvalue(4,8006,'B0000000000098013001.TXT')

Print the content of BUFR code (or flag) table:

  print $bufr->dump_codetable($code_table,$table,$default_table);

where $table is (base)name of the C...TXT file containing the code tables, optionally followed by a default table which will be used if $table is not found.

resolve_flagvalue and dump_codetable will return empty string if flag value or code table is not found.

Manipulate binary data (these are implemented in C for speed and primarily intended as module internal subroutines):

  $value = Geo::BUFR->bitstream2dec($bitstream,$bitpos,$num_bits);

Extracts $num_bits bits from $bitstream, starting at bit $bitpos. The extracted bits are interpreted as a non negative integer. Returns undef if all bits extracted are 1 bits.
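For readers who want to see the semantics without the C code, here is a slow pure Perl sketch of the same idea. It is not the module's implementation: the name bits_to_dec is made up, and counting $bitpos from 0 is an assumption, since the text above does not state the origin.

```perl
use strict;
use warnings;

# Hypothetical pure Perl counterpart (not the C routine in Geo::BUFR):
# extract $num_bits bits starting at bit $bitpos (assumed to count from 0)
# and interpret them as a nonnegative integer
sub bits_to_dec {
    my ($bitstream, $bitpos, $num_bits) = @_;
    my $bits = substr(unpack('B*', $bitstream), $bitpos, $num_bits);
    return undef if $bits !~ /0/;    # all bits set means missing value
    return oct("0b$bits");
}

my $bitstream = "\xA5\xFF";          # bits 10100101 11111111
my $high    = bits_to_dec($bitstream, 0, 4);   # bits 1010
my $low     = bits_to_dec($bitstream, 4, 4);   # bits 0101
my $across  = bits_to_dec($bitstream, 6, 4);   # bits 0111, crossing a byte boundary
my $missing = bits_to_dec($bitstream, 8, 8);   # bits 11111111

print "$high $low $across ", defined $missing ? $missing : 'undef', "\n";
```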

  $ascii = Geo::BUFR->bitstream2ascii($bitstream,$bitpos,$num_bytes);

Extracts $num_bytes bytes from bitstream, starting at $bitpos, and interprets the extracted bytes as an ascii string. Returns undef if the extracted bytes are all 1 bits.

  Geo::BUFR->dec2bitstream($value,$bitstream,$bitpos,$bitlen);

Encodes non-negative integer value $value in $bitlen bits in $bitstream, starting at bit $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $value. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

  Geo::BUFR->ascii2bitstream($ascii,$bitstream,$bitpos,$width);

Encodes ASCII string $ascii in $width bytes in $bitstream, starting at $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $ascii. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

  Geo::BUFR->null2bitstream($bitstream,$bitpos,$num_bits);

Sets $num_bits bits in bitstream starting at bit $bitpos to 0 bits. Last byte affected will be padded with 1 bits. $bitstream must be at least $bitpos + $num_bits bits long. The parts of $bitstream before $bitpos and after last encoded byte are not altered.

DECODING/ENCODING

The term 'fully expanded descriptors' used in the description of encode_message (and next_observation) in "METHODS" might need some clarification. The short version is that the list of descriptors should be exactly those which will be written out by running dumpsection4 (or bufrread.pl without any modifying options set) on the encoded message. If you don't have a similar BUFR message at hand to use as an example when wanting to encode a new message, you might need a more specific prescription: for every data value which occurs in the section 4 bitstream, you should include the corresponding BUFR descriptor, using the artificial 999999 for associated fields following the 204Y operator, and including the data operator descriptors 22[2345]000 and 23[2567]000 with data value set to the empty string, if these occur among the descriptors in section 3 (or rather, in the expansion of these; use bufrresolve.pl to check!). Element descriptors defining new reference values (following the 203Y operator) will have f=0 (first digit in descriptor) replaced with f=9 in next_observation, while in encode_message both f=0 and f=9 will be accepted for new reference values.

Some words about the procedure used for decoding and encoding data in section 4 might shed some light on this choice of design.

When decoding section 4 for a subset, first of all the BUFR descriptors provided in section 3 are expanded as far as is possible without looking at the actual bitstream, i.e. by eliminating nondelayed replication descriptors (f=1) and by using BUFR table D to expand sequence descriptors (f=3). Then, for each of the thus expanded descriptors, the data value is fetched from the bitstream according to the prescriptions in BUFR table B, applying the data operator descriptors (f=2) from BUFR table C as they are encountered, and reexpanding the remaining descriptors every time a delayed replication factor is fetched from bitstream. The resulting set of data values is returned in an array @data, with the corresponding B (and sometimes also some C) BUFR table descriptors in an array @descriptors. next_observation returns references to these two arrays. For convenience, some of the data operator descriptors without a corresponding data value (like 222000) are included in the @descriptors because they are considered to provide valuable information to the user, with corresponding value in @data set to the empty string. These descriptors without a value are written by the dumpsection4 methods on unnumbered lines, thereby distinguishing them from descriptors corresponding to 'real' data values in section 4, which are numbered consecutively.
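The first step, expanding as far as possible without looking at the bitstream, can be sketched as follows. This is a simplified illustration and not the routine used inside Geo::BUFR: expand_descriptors is a hypothetical name, the table D hash is a toy table with a single entry, and operator descriptors (f=2) are simply passed through.

```perl
use strict;
use warnings;

# Toy table D for illustration only; real tables are read from D...TXT files
my %table_d = (
    '301011' => [qw(004001 004002 004003)],
);

# Hypothetical helper (not Geo::BUFR's own routine): expand descriptors
# as far as possible without reading the bitstream
sub expand_descriptors {
    my ($table_d, @todo) = @_;
    my @expanded;
    while (@todo) {
        my $desc = shift @todo;
        my ($f, $x, $y) = $desc =~ /^(\d)(\d\d)(\d\d\d)$/
            or die "Not a descriptor: $desc";
        if ($f == 3) {
            # Sequence descriptor: replace by its table D members
            my $members = $table_d->{$desc} or die "$desc not in table D";
            unshift @todo, @$members;
        } elsif ($f == 1 && $y > 0) {
            # Nondelayed replication: repeat the next $x descriptors $y times
            my @group = splice(@todo, 0, $x);
            unshift @todo, (@group) x $y;
        } elsif ($f == 1) {
            # Delayed replication: keep it, its replication factor descriptor
            # and the $x replicated descriptors until the factor is known
            push @expanded, $desc, splice(@todo, 0, $x + 1);
        } else {
            push @expanded, $desc;
        }
    }
    return @expanded;
}

my @expanded = expand_descriptors(\%table_d, qw(301011 102002 012101 012103));
print "@expanded\n";
```

Descriptors are kept as strings throughout, since leading zeros are significant.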

Encoding a subset is done in a very similar way, by expanding the descriptors in section 3 as described above, but instead fetching the data values from the @data array that the user supplies (actually @{$data_refs->[$i]} where $i is subset number), and then finally encoding this value to bitstream.

The input parameter $desc_refs to encode_message is in fact not strictly necessary to be able to encode a new BUFR message. But there is a good reason for requiring it. During encoding the descriptors from expanding section 3 will consecutively be compared with the descriptors in the user supplied $desc_refs, and if these at some point differ, encoding will be aborted with an error message stating the first descriptor which deviated from the expected one. Requiring $desc_refs as input thus greatly reduces the risk of encoding an erroneous section 4, and also provides the user with highly valuable debugging information if encoding fails.
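The comparison described here can be pictured like this. It is a sketch only: first_mismatch is a hypothetical helper, and the real module compares the descriptors one by one while encoding instead of on two complete lists.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Geo::BUFR): return the index of the
# first position where the user supplied descriptors deviate from the
# expected ones, or -1 if the two lists agree
sub first_mismatch {
    my ($expected, $supplied) = @_;
    my $n = @$expected > @$supplied ? @$expected : @$supplied;
    for my $i (0 .. $n - 1) {
        my $e = $expected->[$i] // 'nothing';
        my $s = $supplied->[$i] // 'nothing';
        return $i if $e ne $s;
    }
    return -1;
}

my @expected = qw(004001 004002 004003 012101);   # from expanding section 3
my @supplied = qw(004001 004002 004004 012101);   # from the user
my $i = first_mismatch(\@expected, \@supplied);
print "Descriptor $supplied[$i] at position $i, expected $expected[$i]\n"
    if $i >= 0;
```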

Note that for character data (unit CCITTIA5) FM 94 BUFR does not provide any guidelines for how to encode strings which are shorter than the data width. In Geo::BUFR the following procedure is followed: When encoding, the requested string is right padded with blanks. When decoding, any trailing null characters are silently removed, as well as leading and trailing white space.
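The procedure just described can be sketched in a few lines of Perl. The helper names pad_ccittia5 and trim_ccittia5 are made up for this illustration, and the sketch does not say what happens to strings longer than the data width.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of Geo::BUFR)

# Encoding: right pad the string with blanks up to $width bytes
sub pad_ccittia5 {
    my ($string, $width) = @_;
    return sprintf('%-*s', $width, $string);
}

# Decoding: remove trailing null characters, then leading and trailing
# white space
sub trim_ccittia5 {
    my ($raw) = @_;
    $raw =~ s/\0+$//;
    $raw =~ s/^\s+|\s+$//g;
    return $raw;
}

my $encoded = pad_ccittia5('OSLO', 8);
my $decoded = trim_ccittia5("  OSLO \0\0");
print "[$encoded] [$decoded]\n";
```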

BUFR TABLE FILES

The BUFR table files should follow the format and naming conventions used by ECMWF libbufr software (download from http://www.ecmwf.int/products/data/software/download/bufr.html, unpack and you will find table files in the bufrtable directory). Other table file formats exist and might on request be supported in future versions of Geo::BUFR.

STRICT CHECKING

The package global $Strict_checking defaults to

  0: Ignore recoverable errors in BUFR format met during decoding or encoding

but can be changed to

  1: Issue warning (carp) but continue decoding/encoding

  2: Croak (die) instead of carp

by calling set_strict_checking. The following is checked for when $Strict_checking is set to 1 or 2:

  • Compression set in section 3 for one subset message (BUFR reg. 94.6.3.2)

  • Local reference value for compressed character data not having all bits set to zero (94.6.3.2.i)

  • Excessive bytes in section 4 (section longer than computed from section 3)

  • Illegal flag values (rightmost bit set for non-missing values)

  • Cancellation operators (20[1-4]00, 203255 etc) when there is nothing to cancel

  • Invalid date and/or time in section 1

Plus some few more checks not considered interesting enough to be mentioned here.

To the above list I would have liked to add

  • Trailing null characters in CCITTIA5 data

but for the reason given at the end of the DECODING/ENCODING section, I have refrained from that. If you want to see what character data was originally encoded (including nulls and blanks) in a BUFR file, use bufrread.pl with option --verbose 5.

BUGS OR MISSING FEATURES

Some BUFR table C operators are not implemented or are untested, mainly because I do not have access to BUFR messages containing such operators. If you happen to come across a BUFR message which the current module fails to decode properly, I would therefore highly appreciate it if you could mail it to me.

AUTHOR

Pål Sannes <pal.sannes@met.no>

CREDITS

I am very grateful to Alvin Brattli, who (while employed as a researcher at met.no) wrote the first version of this module, with the sole purpose of being able to decode some very specific BUFR satellite data, but still provided the main framework upon which this module is built.

SEE ALSO

Guide to WMO Table Driven Code Forms: FM 94 BUFR and FM 95 CREX; Layer 3: Detailed Description of the Code Forms (for programmers of encoder/decoder software)

https://wiki.met.no/bufr.pm/start

COPYRIGHT

Copyright (C) 2010 met.no

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
