Module Version: 1.33


Geo::BUFR - Perl extension for handling of WMO BUFR files.


  # A simple program to print decoded contents of a BUFR file. Note
  # that a more sophisticated program (bufrread.pl) is included in the
  # package

  use Geo::BUFR;

  Geo::BUFR->set_tablepath('path to BUFR tables');

  my $bufr = Geo::BUFR->new();

  $bufr->fopen('name of BUFR file');

  while (not $bufr->eof()) {
      my ($data, $descriptors) = $bufr->next_observation();
      print $bufr->dumpsections($data, $descriptors) if $data;
  }



BUFR = Binary Universal Form for the Representation of meteorological data. BUFR is approved by WMO (World Meteorological Organization) as the standard universal exchange format for meteorological observations, gradually replacing a lot of older alphanumeric data formats.

This module provides methods for decoding and encoding BUFR messages, and for displaying information in BUFR B and D tables and in BUFR flag and code tables.

Installing this module also installs some programs: bufrread.pl, bufrresolve.pl, bufrencode.pl, bufr_reencode.pl and bufralter.pl. See https://wiki.met.no/bufr.pm/start for examples of use. For the majority of potential users of Geo::BUFR I would expect these programs to be all that you will need Geo::BUFR for.

Note that being Perl, this module cannot compete in speed with for example the (free) ECMWF BUFRDC Fortran library. Still, some effort has been invested in making the module reasonably fast, in that the core routines for encoding and decoding bitstreams are implemented in C.


The get_ methods will return undef if the requested information is not available. The set_ methods as well as fopen, fclose, copy_from and rewind will always return 1, or croak if failing.

Create a new object:

  $bufr = Geo::BUFR->new();
  $bufr = Geo::BUFR->new($BUFRmessages);

The second form of new is useful if you want to provide the BUFR messages to decode directly as an input buffer (string). Note that merely calling new($BUFRmessages) will not decode anything in the BUFR messages; for that you need to call next_observation() on the newly created object. You also have the option of providing the BUFR messages in a file, using the no-argument form of new() and then calling fopen.

Associate the object with a file for reading of BUFR messages:

  $bufr->fopen($filename);

Close the associated file that was opened by fopen:

  $bufr->fclose();

Check for end-of-file (or end of the input buffer provided as argument to new):

  $bufr->eof();

Returns true if end-of-file (or end of input buffer) is reached, false if not.

Ensure that next call to next_observation will decode first subset in first BUFR message:

  $bufr->rewind();

Copy from an existing object:

  $bufr1->copy_from($bufr2,$what);

If $what is 'all' or not provided, will copy everything in $bufr2 into $bufr1, i.e. making a clone. If $what is 'metadata', only the metadata in section 0, 1 and 3 will be copied (and all of section 2 if present).

Load B and D tables:

  $bufr->load_BDtables($table);

$table is optional, and should be (base)name of a file containing a BUFR table B or D, using the ECMWF libbufr naming convention, i.e. [BD]'table_version'.TXT. If no argument is provided, load_BDtables() will use BUFR section 1 information in the $bufr object to decide which tables to load. Previously loaded tables are kept in memory, and load_BDtables will return immediately if the tables already have been loaded. Will die (croak) if tables cannot be found, but not if these are local tables (Local table version number > 0) and the corresponding master tables exist (Local table version number = 0), which then will be loaded instead. Returns table version for the tables loaded (see get_table_version).

Load C table:

  $bufr->load_Ctable($table,$default_table);

Both $table and $default_table are optional. This will load the flag and code tables (if not already loaded), which in ECMWF libbufr are put in tables C'table_version'.TXT (not to be confused with WMO BUFR table C, which contains the operator descriptors). $default_table will be used if $table is not found. If no arguments are provided, load_Ctable() will use BUFR section 1 information in the $bufr object to decide which table to load. Will die (croak) if table cannot be found, but not if this is a local table and the corresponding master table exists, which then will be loaded instead. Returns table version for the table loaded.

Get next observation (next subset in current BUFR message or first subset in next message):

  ($data, $descriptors) = $bufr->next_observation();

where $descriptors is a reference to the array of fully expanded descriptors for this subset, and $data is a reference to the corresponding values. This method is meant to be used to iterate through all BUFR messages in the file or input buffer (see new) associated with the $bufr object; see the example program in "SYNOPSIS". Whenever a new BUFR message is reached, sections 0-3 will also be decoded, the contents of which are then available through the access methods listed below. This is the main BUFR decoding routine in Geo::BUFR, and will call load_BDtables() internally (unless decoding of section 4 has been turned off by use of set_nodata or set_filter_cb), but not load_Ctable. Consult "DECODING/ENCODING" if you want more precise info about what is returned in $data and $descriptors.

next_observation will return the empty list (so both $data and $descriptors will be undef) in the following cases: if there are no more BUFR messages in file/input buffer (so next call to eof() will return true), if no decoding of section 4 was requested in set_nodata, if filtering was turned on in set_filter_cb and the BUFR message met the filter criteria in the user defined callback function, or if the BUFR message contained 0 subsets. If you need to distinguish the first case from the rest, one way would be to check get_current_subset_number(), which will return 0 only in this first case.

If an error is met during decoding, it is possible to trap the error in an eval and then continue calling next_observation (as demonstrated in the source code of bufrread.pl). Care has been taken that BUFR messages with incorrectly stated BUFR length should not cause later proper BUFR messages to be skipped. But the possibility of an erroneous last BUFR message in file led to abandonment of the convenient feature, retained until Geo::BUFR version 1.25, of eof always returning true if there were no more BUFR messages in file/input buffer. Instead you should expect the last call to next_observation to return false (empty list).
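The following is a minimal sketch of such an error-trapping loop, not the code used in bufrread.pl. The helper name dump_all_messages and its arguments are hypothetical, and Geo::BUFR is loaded with require inside the sub so that the sub compiles even where the module is not installed. It relies on get_current_subset_number() returning 0 only when there are no more BUFR messages, as described above.

```perl
use strict;
use warnings;

# Hypothetical helper: print every decodable subset in $file and
# return the number of decoding errors that were trapped.
sub dump_all_messages {
    my ($file, $tablepath) = @_;
    require Geo::BUFR;                 # loaded lazily, see lead-in
    Geo::BUFR->set_tablepath($tablepath);
    my $bufr = Geo::BUFR->new();
    $bufr->fopen($file);

    my $errors = 0;
    while (not $bufr->eof()) {
        my ($data, $descriptors);
        my $ok = eval {
            ($data, $descriptors) = $bufr->next_observation();
            1;
        };
        if (!$ok) {
            warn "Decoding failed: $@";
            $errors++;
            next;                      # try the next subset or message
        }
        if (!$data) {
            # Empty list: stop if there are no more messages; otherwise
            # the message was filtered, had no data or had 0 subsets
            last if !$bufr->get_current_subset_number();
            next;
        }
        print $bufr->dumpsections($data, $descriptors);
    }
    $bufr->fclose();
    return $errors;
}
```

A call such as dump_all_messages('name of BUFR file', 'path to BUFR tables') would then print what could be decoded and report how many decoding errors were skipped.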

Filter BUFR messages:

  $bufr->set_filter_cb(\&callback,@args);

Here user is responsible for defining the callback subroutine. This subroutine will then be called in next_observation (with arguments @args if provided) right after section 3 is decoded, and, if returning true, will cause next_observation to return immediately, without even trying to decode section 4 (the data section). Here is a simple example of such a callback (without arguments), filtering on AHL and Data category (table A) of the BUFR message.

  sub callback {
      my $obj = shift;
      return 1 if $obj->get_data_category != 0;
      my $ahl = $obj->get_current_ahl() || '';
      return ($ahl =~ /^IS.... (ENMI|TEST)/);
  }

Check result of filtering:

  $bufr->is_filtered();

Will return true (1) if next_observation returned immediately as described for set_filter_cb above. But calling is_filtered should rarely be needed, as in most cases the simple check 'next if !$data' after calling next_observation would be the natural way to proceed.
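As a rough sketch of how filtering and is_filtered might be combined, the hypothetical helper below counts decoded subsets and filtered messages separately. The names skip_unless_category_0 and count_kept_and_filtered are made up for illustration, and calling set_filter_cb on the object (rather than on the class) is an assumption.

```perl
use strict;
use warnings;

# Callback: a true return value makes next_observation return
# immediately, without decoding section 4. Here every message with
# data category other than 0 is skipped.
sub skip_unless_category_0 {
    my $obj = shift;
    return $obj->get_data_category() != 0 ? 1 : 0;
}

# Hypothetical helper: count decoded subsets and filtered messages.
sub count_kept_and_filtered {
    my ($file, $tablepath) = @_;
    require Geo::BUFR;                 # loaded lazily for this sketch
    Geo::BUFR->set_tablepath($tablepath);
    my $bufr = Geo::BUFR->new();
    $bufr->set_filter_cb(\&skip_unless_category_0);
    $bufr->fopen($file);

    my ($kept, $filtered) = (0, 0);
    while (not $bufr->eof()) {
        my ($data, $descriptors) = $bufr->next_observation();
        if    ($data)                { $kept++ }
        elsif ($bufr->is_filtered()) { $filtered++ }
    }
    $bufr->fclose();
    return ($kept, $filtered);
}
```

Keeping the callback as a small named sub makes it easy to exercise on its own with any object that provides get_data_category.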

Print the contents of a subset in BUFR message:

  print $bufr->dumpsections($data,$descriptors,$options);

$options is optional. If this is first subset in message, will start by printing message number and, if this is first message in a GTS bulletin, AHL (Abbreviated Header Line), as well as contents of sections 0, 1 and 3. For section 4, will also print subset number. $options should be an anonymous hash with possible keys 'width' and 'bitmap', e.g. { width => 20, bitmap => 0 }. 'bitmap' controls which of dumpsection4 and dumpsection4_with_bitmaps will be called internally by dumpsections. Default value for 'bitmap' is 1, causing dumpsection4_with_bitmaps to be called. 'width' controls the value of $width used by the dumpsection4... methods, default is 15. If you intend to provide the output from dumpsections as input to reencode_message, be sure to set 'bitmap' to 0, and 'width' not smaller than the largest data width in bytes among the descriptors with unit CCITTIA5 occurring in the message.

Normally dumpsections is called after next_observation, with same arguments $data,$descriptors as returned from this call. From the examples given at https://wiki.met.no/bufr.pm/start#bufrreadpl you can get an impression of what the output might look like. If dumpsections does not give you exactly what you want, you might prefer to instead call the individual dumpsection methods below.

Print the contents of sections 0-3 in BUFR message:

  print $bufr->dumpsection0();
  print $bufr->dumpsection1();
  print $bufr->dumpsection2($sec2_code_ref);
  print $bufr->dumpsection3();

dumpsection2 returns an empty string if there is no optional section in the message. The argument should be a reference to a subroutine which takes the optional section as (a string) argument and returns the text you want displayed after the 'Length of section:' line. For general BUFR messages probably the best you can do is displaying a hex dump, in which case

  sub {return '    Hex dump:' . ' 'x26 . unpack('H*',substr(shift,4))}

might be a suitable choice for $sec2_code_ref. For most applications there should be no real need to call dumpsection2.

Print the data of a subset (descriptor, value, name and unit):

  print $bufr->dumpsection4($data,$descriptors,$width);
  print $bufr->dumpsection4_with_bitmaps($data,$descriptors,$width);

$width fixes the number of characters used for displaying the data values, and is optional (defaults to 15). $data and $descriptors are references to arrays of data values and BUFR descriptors respectively, likely to have been fetched from next_observation. Code and flag values will be resolved if a C table has been loaded, i.e. if load_Ctable has been called earlier on. dumpsection4_with_bitmaps will display the bit-mapped values side by side with the corresponding data values. If there is no bit-map in the BUFR message, dumpsection4_with_bitmaps will provide same output as dumpsection4. See "DECODING/ENCODING" for some more information about what is printed, and https://wiki.met.no/bufr.pm/start#bufrreadpl for real life examples of output.

Set verbose level:

  Geo::BUFR->set_verbose($level); # 0 <= $level <= 6

Some info about what is going on in Geo::BUFR will be printed to STDOUT if $level > 0. With $level set to 1, all that is printed is the B, C and D tables used (with full path). Each line of verbose output starts with 'BUFR.pm: ', except for the level 6 specific output. Setting verbose level > 1 might be helpful when debugging, or for example if you want to extract as much information as possible from an incorrectly formatted BUFR message.

No decoding of section 4 (data section):

  Geo::BUFR->set_nodata($n);

 - $n=1 (or not provided): Skip decoding of section 4 (might speed up
   processing considerably if only metadata in section 1-3 is sought for)
 - $n=0: Decode section 4 (default in Geo::BUFR)

No decoding of quality information:

 - $n=1 (or not provided): Don't decode quality information (more
   specifically: skip all descriptors after 222000)
 - $n=0: Decode quality information (default in Geo::BUFR)

Enable/disable strict checking of BUFR format for recoverable errors (like using BUFR compression for one subset message etc):

  Geo::BUFR->set_strict_checking($n);

 - $n=0: disable checking (default in Geo::BUFR)
 - $n=1: warn (carp) if error but continue decoding
 - $n=2: die (croak) if error

Confer "STRICT CHECKING" for details of what is being checked if strict checking is enabled.

Show all BUFR table C operators (data description operators) when calling dumpsection4:

  Geo::BUFR->set_show_all_operators($n);

 - $n=1 (or not provided): Show all operators
 - $n=0: Show only the really informative ones (default in Geo::BUFR)

set_show_all_operators(1) cannot be combined with dumpsections with bitmap option set (which is the default).

Set or get tablepath:

  Geo::BUFR->set_tablepath($tablepath);
  $tablepath = Geo::BUFR->get_tablepath();

Get table version:

  $table_version = $bufr->get_table_version($table);

$table is optional. If for example $table = 'B0000000000088013001.TXT', will return '0000000000088013001'. In the more interesting case where $table is not provided, will return table version from BUFR section 1 information in the $bufr object.
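To illustrate the naming convention only, here is a small stand-alone sketch that extracts the table version from a table file name in the same way as the example above. The function name table_version_from_name is hypothetical, and this is not the code used inside Geo::BUFR.

```perl
use strict;
use warnings;

# Strip the leading table letter (B, C or D) and an optional .TXT
# extension from a table (base)name; return undef if the name does
# not look like a table file name.
sub table_version_from_name {
    my ($table) = @_;
    return $table =~ /^[BCD](.+?)(?:\.TXT)?$/ ? $1 : undef;
}

print table_version_from_name('B0000000000088013001.TXT'), "\n";
```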

Get number of subsets:

  $nsubsets = $bufr->get_number_of_subsets();

Get current subset number:

  $subset_no = $bufr->get_current_subset_number();

If decoding of section 4 has been skipped (due to use of set_nodata or set_filter_cb), will return number of subsets. For a BUFR message with 0 subsets, will actually return 1 (a bit weird perhaps, but then this is a really weird kind of BUFR message to handle).

Get current message number:

  $message_no = $bufr->get_current_message_number();

Get Abbreviated Header Line (AHL) before current message:

  $ahl = $bufr->get_current_ahl();

If there is no AHL immediately preceding current message, default is for get_current_ahl to return undef. Sometimes that might not be what you want, e.g. when processing a file with GTS bulletins with possibly more than one BUFR message in each bulletin, and especially so if filtering on AHL using set_filter_cb.

 - $n=1 (or not provided): Will cause get_current_ahl to return last
   AHL extracted and not undef if currently processed BUFR message has
   no (immediately preceding) AHL
 - $n=0: Reset get_current_ahl to default behaviour as described

Accessor methods for section 0-3:

  $variable = $bufr->get_<variable>();

where <variable> is one of

  optional_section (0 or 1)
  observed_data (0 or 1)
  compressed_data (0 or 1)

set_year_of_century(0) will set year of century to 100. get_year_of_century will for BUFR edition 4 calculate year of century from year in section 1.

Encode a new BUFR message:

  $new_message = $bufr->encode_message($data_refs,$desc_refs);

where $desc_refs->[$i] is a reference to the array of fully expanded descriptors for subset number $i ($i=1 for first subset), $data_refs->[$i] is a reference to the corresponding values, using undef for missing values. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method. See "DECODING/ENCODING" for meaning of 'fully expanded descriptors'.
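A minimal sketch of how the arguments to encode_message might be built, assuming the goal is simply to reencode the first message of a file. The helper name reencode_first_message is hypothetical. The metadata is taken over with copy_from, and the arrays returned by next_observation are copied, on the cautious assumption that the module may reuse them between calls.

```perl
use strict;
use warnings;

# Hypothetical helper: decode all subsets of the first BUFR message in
# $file and encode them again as a new message.
sub reencode_first_message {
    my ($file, $tablepath) = @_;
    require Geo::BUFR;                 # loaded lazily for this sketch
    Geo::BUFR->set_tablepath($tablepath);
    my $bufr = Geo::BUFR->new();
    $bufr->fopen($file);

    my ($data, $desc) = $bufr->next_observation();
    return if !$data;                  # nothing decodable in first message

    # Index 0 is left unused: subset number $i is stored at index $i
    my (@data_refs, @desc_refs);
    $data_refs[1] = [@$data];
    $desc_refs[1] = [@$desc];
    for my $i (2 .. $bufr->get_number_of_subsets()) {
        ($data, $desc) = $bufr->next_observation();
        die "Subset $i could not be decoded" if !$data;
        $data_refs[$i] = [@$data];
        $desc_refs[$i] = [@$desc];
    }

    my $new_bufr = Geo::BUFR->new();
    $new_bufr->copy_from($bufr, 'metadata');
    $bufr->fclose();
    return $new_bufr->encode_message(\@data_refs, \@desc_refs);
}
```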

Encode a (single subset) NIL message:

  $new_message = $bufr->encode_nil_message($stationid_ref,$delayed_repl_ref);

$delayed_repl_ref is optional. In section 4 all values will be set to missing except delayed replication factors and the (descriptor, value) pairs in the hashref $stationid_ref. $delayed_repl_ref (if provided) should be a reference to an array of data values for all descriptors 031001 and 031002 occurring in the message (these values must all be nonzero), e.g. [3,1,2] if there are 3 such descriptors which should have values 3, 1 and 2, in that succession. If $delayed_repl_ref is omitted, all delayed replication factors will be set to 1. The required metadata in section 0, 1 and 3 must have been set in $bufr before calling this method (although number of subsets and BUFR compression will automatically be set to 1 and 0 respectively, whatever value they had before).

Reencode BUFR message(s):

  $new_messages = $bufr->reencode_message($decoded_messages,$width);

$width is optional. Takes a text $decoded_messages as argument and returns a (binary) string of BUFR messages which, when printed to file and then processed by bufrread.pl with no output modifying options set (except possibly --width), would give output equal to $decoded_messages. If bufrread.pl is to be called with --width $width, this $width must be provided to reencode_message also.
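As a sketch of the round trip described here, the hypothetical helper below decodes a file to text with dumpsections (with 'bitmap' set to 0, as required for input to reencode_message) and then encodes that text again. The helper name, the default width of 15 and the choice not to load a C table are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decode $file to text and reencode that text,
# returning the resulting (binary) string of BUFR messages.
sub roundtrip_file {
    my ($file, $tablepath, $width) = @_;
    $width = 15 if !defined $width;    # default width of dumpsections
    require Geo::BUFR;                 # loaded lazily for this sketch
    Geo::BUFR->set_tablepath($tablepath);

    my $bufr = Geo::BUFR->new();
    $bufr->fopen($file);
    my $decoded_messages = '';
    while (not $bufr->eof()) {
        my ($data, $descriptors) = $bufr->next_observation();
        next if !$data;
        $decoded_messages .= $bufr->dumpsections($data, $descriptors,
            { width => $width, bitmap => 0 });
    }
    $bufr->fclose();

    # The same $width must be given to reencode_message
    my $encoder = Geo::BUFR->new();
    return $encoder->reencode_message($decoded_messages, $width);
}
```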

Join subsets from several messages:

 ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr_1,$subset_ref_1,
     ... $bufr_n,$subset_ref_n);

where each $subset_ref_i is optional. Will return the data and descriptors needed by encode_message to encode a multi subset message, extracting the subsets from the first message of each $bufr_i object. All subsets in (first message of) $bufr_i will be used, unless next argument is an array reference $subset_ref_i, in which case only the subset numbers listed will be included, in the order specified. On return $nsub will contain the total number of subsets thus extracted. After a call to join_subsets, the metadata (of the first message) in each object will be available through the get_-methods, while a call to next_observation will start extracting the first subset in the first message. Here is an example of use, fetching first subset from bufr object 1, all subsets from bufr object 2, and subsets 4 and 2 from bufr object 3, then building up a new multi subset BUFR message (which will succeed only if the bufr objects all have the same descriptors in section 3):

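  my ($data_refs,$desc_refs,$nsub) = Geo::BUFR->join_subsets($bufr1,
      [1],$bufr2,$bufr3,[4,2]);
  my $new_bufr = Geo::BUFR->new();
  # Get metadata from one of the objects, then reset those metadata
  # which might not be correct for the new message
  $new_bufr->copy_from($bufr1,'metadata');
  $new_bufr->set_number_of_subsets($nsub);
  $new_bufr->set_compressed_data(0);
  my $new_message = $new_bufr->encode_message($data_refs,$desc_refs);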

Extract BUFR table B information for an element descriptor:

  ($name,$unit,$scale,$refval,$width) = $bufr->element_descriptor($desc);

Will fetch name, unit, scale, reference value and data width in bits for element descriptor $desc in the last table B loaded in the $bufr object. Returns false if the descriptor is not found.
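For readers unfamiliar with how scale, reference value and data width are used in BUFR, the following stand-alone sketch shows the basic conversion between the raw unsigned integer in the bitstream and the physical value: value = (raw + refval) / 10**scale, with all bits set to 1 meaning missing. The function names are hypothetical and the table B values are illustrative only. The sketch ignores operators that change width, scale or reference value.

```perl
use strict;
use warnings;

# Convert a raw unsigned integer from the bitstream to a physical value.
# Returns undef for the missing value (all $width bits set to 1).
sub raw_to_value {
    my ($raw, $scale, $refval, $width) = @_;
    return undef if $raw == 2**$width - 1;
    return ($raw + $refval) / 10**$scale;
}

# Inverse operation, rounding to the nearest integer.
sub value_to_raw {
    my ($value, $scale, $refval) = @_;
    return sprintf('%.0f', $value * 10**$scale - $refval) + 0;
}

# Illustrative table B entry: scale 5, reference value -9000000,
# data width 25 bits
my ($scale, $refval, $width) = (5, -9000000, 25);
print raw_to_value(15000000, $scale, $refval, $width), "\n";
print value_to_raw(60, $scale, $refval), "\n";
```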

Extract BUFR table D information for a sequence descriptor:

  @descriptors = $bufr->sequence_descriptor($desc);
  $string = $bufr->sequence_descriptor($desc);

Will return the descriptors in a direct (nonrecursive) lookup for the sequence descriptor $desc in the last table D loaded in the $bufr object. In scalar context the descriptors will be returned as a space separated string. Returns false if the descriptor is not found.

Resolve BUFR table descriptors (for printing):

  print $bufr->resolve_descriptor($how,@descriptors);

where $how is one of 'fully', 'partially', 'simply' and 'noexpand'. Returns a text string suitable for printing information about the BUFR table descriptors given. The relevant B/D table must have been loaded before calling resolve_descriptor.

 - $how = 'fully': Expand all D descriptors fully into B descriptors,
   with name, unit, scale, reference value and width (each on a
   numbered line, except for replication operators which are not
   numbered)
 - $how = 'partially': Like 'fully', but expand D descriptors only
   once and ignore replication
 - $how = 'noexpand': Like 'partially', but do not expand D
   descriptors at all
 - $how = 'simply': Like 'partially', but list the descriptors on one
   single line with no extra information provided

Resolve flag table value (for printing):

  print $bufr->resolve_flagvalue($value,$flag_table,$B_table,
      $default_B_table,$num_leading_spaces);

Last 2 arguments are optional. $default_B_table will be used if $B_table is not found, $num_leading_spaces defaults to 0. Example:

  print $bufr->resolve_flagvalue(4,8006,'B0000000000098013001.TXT');

Print the contents of BUFR code (or flag) table:

  print $bufr->dump_codetable($code_table,$table,$default_table);

where $table is (base)name of the C...TXT file containing the code tables, optionally followed by a default table which will be used if $table is not found.

resolve_flagvalue and dump_codetable will return empty string if flag value or code table is not found.

Manipulate binary data (these are implemented in C for speed and primarily intended as module internal subroutines):

  $value = Geo::BUFR->bitstream2dec($bitstream,$bitpos,$num_bits);

Extracts $num_bits bits from $bitstream, starting at bit $bitpos. The extracted bits are interpreted as a nonnegative integer. Returns undef if all bits extracted are 1 bits.
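To make the semantics concrete, here is a slow pure Perl sketch of the same operation. It is not the C implementation used by the module; the function name bits_to_dec is hypothetical, and counting $bitpos from 0 at the first bit of the string is an assumption.

```perl
use strict;
use warnings;

# Extract $num_bits bits starting at (0-based) bit $bitpos and interpret
# them as a nonnegative integer. Returns undef if all extracted bits
# are 1 bits, which is how BUFR marks a missing value.
sub bits_to_dec {
    my ($bitstream, $bitpos, $num_bits) = @_;
    die "Bitstream too short"
        if $bitpos + $num_bits > 8 * length($bitstream);
    my $bits = substr(unpack('B*', $bitstream), $bitpos, $num_bits);
    return undef if $bits !~ /0/;
    return oct("0b$bits");
}

# "\x12\x34" is the bit pattern 00010010 00110100
print bits_to_dec("\x12\x34", 4, 8), "\n";
```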

  $ascii = Geo::BUFR->bitstream2ascii($bitstream,$bitpos,$num_bytes);

Extracts $num_bytes bytes from bitstream, starting at $bitpos, and interprets the extracted bytes as an ascii string. Returns undef if the extracted bytes are all 1 bits.


Encodes nonnegative integer value $value in $bitlen bits in $bitstream, starting at bit $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $value. The parts of $bitstream before $bitpos and after last encoded byte are not altered.


Encodes ASCII string $ascii in $width bytes in $bitstream, starting at $bitpos. Last byte will be padded with 1 bits. $bitstream must have been initialized to a string long enough to hold $ascii. The parts of $bitstream before $bitpos and after last encoded byte are not altered.


Sets $num_bits bits in bitstream starting at bit $bitpos to 0 bits. Last byte affected will be padded with 1 bits. $bitstream must be at least $bitpos + $num_bits bits long. The parts of $bitstream before $bitpos and after last encoded byte are not altered.


The term 'fully expanded descriptors' used in the description of encode_message (and next_observation) in "METHODS" might need some clarification. The short version is that the list of descriptors should be exactly those which will be written out by running dumpsection4 (or bufrread.pl without any modifying options set) on the encoded message. If you don't have a similar BUFR message at hand to use as an example when wanting to encode a new message, you might need a more specific prescription, which is that for every data value which occurs in the section 4 bitstream, you should include the corresponding BUFR descriptor, using the artificial 999999 for associated fields following the 204Y operator, and including the data operator descriptors 22[2345]000 and 23[2567]000 with data value set to the empty string, if these occur among the descriptors in section 3 (rather: in the expansion of these, use bufrresolve.pl to check!). Element descriptors defining new reference values (following the 203Y operator) will have F=0 (first digit in descriptor) replaced with F=9 in next_observation, while in encode_message both F=0 and F=9 will be accepted for new reference values. When encoding delayed repetition you should repeat the set of data (and descriptors) to be repeated the number of times indicated by 031011 or 031012 (if given the feedback that this is considered cumbersome, an option for including the set of data/descriptors just once might be added later, both for encoding and decoding).

Some words about the procedure used for decoding and encoding data in section 4 might shed some light on this choice of design.

When decoding section 4 for a subset, first of all the BUFR descriptors provided in section 3 are expanded as far as is possible without looking at the actual bitstream, i.e. by eliminating nondelayed replication descriptors (F=1) and by using BUFR table D to expand sequence descriptors (F=3). Then, for each of the thus expanded descriptors, the data value is fetched from the bitstream according to the prescriptions in BUFR table B, applying the data operator descriptors (F=2) from BUFR table C as they are encountered, and reexpanding the remaining descriptors every time a delayed replication factor is fetched from bitstream. The resulting set of data values is returned in an array @data, with the corresponding B (and sometimes also some C) BUFR table descriptors in an array @descriptors. next_observation returns references to these two arrays. For convenience, some of the data operator descriptors without a corresponding data value (like 222000) are included in the @descriptors because they are considered to provide valuable information to the user, with corresponding value in @data set to the empty string. These descriptors without a value are written by the dumpsection4 methods on unnumbered lines, thereby distinguishing them from descriptors corresponding to 'real' data values in section 4, which are numbered consecutively.
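The first step, expanding the section 3 descriptors as far as possible without looking at the bitstream, can be sketched in pure Perl as follows. This is an illustration of the principle, not the routine used by Geo::BUFR: the function name expand_descriptors is hypothetical, %table_d is a tiny stand-in for a real table D, and delayed replication (Y=0) is simply passed through untouched together with its factor descriptor and the descriptors it covers.

```perl
use strict;
use warnings;

# Expand sequence descriptors (F=3) and nondelayed replication (F=1,
# Y>0). Descriptors are 6 digit strings FXXYYY.
sub expand_descriptors {
    my ($table_d, @desc) = @_;
    my @expanded;
    while (@desc) {
        my $d = shift @desc;
        my ($f, $x, $y) = $d =~ /^(\d)(\d\d)(\d\d\d)$/
            or die "Bad descriptor $d";
        if ($f == 3) {
            # Sequence descriptor: replace with its table D content
            my $seq = $table_d->{$d} or die "$d not found in table D";
            unshift @desc, @$seq;
        } elsif ($f == 1 && $y > 0) {
            # Nondelayed replication: repeat next X descriptors Y times
            my @group = splice @desc, 0, $x;
            unshift @desc, (@group) x $y;
        } elsif ($f == 1) {
            # Delayed replication: the factor is in the bitstream, so
            # keep operator, factor descriptor and group as they are
            push @expanded, $d, shift(@desc), splice(@desc, 0, $x);
        } else {
            push @expanded, $d;
        }
    }
    return @expanded;
}

my %table_d = ('301011' => ['004001', '004002', '004003']);
print join(' ', expand_descriptors(\%table_d,
    '301011', '102002', '012001', '013003',
    '101000', '031001', '020003')), "\n";
```

In a real decoder the group following a delayed replication operator would be expanded in a second pass, once the replication factor has been read from the bitstream.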

Encoding a subset is done in a very similar way, by expanding the descriptors in section 3 as described above, but instead fetching the data values from the @data array that the user supplies (actually @{$data_refs->[$i]} where $i is subset number), and then finally encoding these values to bitstream.

The input parameter $desc_refs to encode_message is in fact not strictly necessary to be able to encode a new BUFR message. But there is a good reason for requiring it. During encoding the descriptors from expanding section 3 will consecutively be compared with the descriptors in the user supplied $desc_refs, and if these at some point differ, encoding will be aborted with an error message stating the first descriptor which deviated from the expected one. Requiring $desc_refs as input thus greatly reduces the risk of encoding an erroneous section 4, and also provides the user with highly valuable debugging information if encoding fails.

When decoding character data (unit CCITTIA5), any null characters found are silently (unless $Strict_checking is set) removed, as well as leading and trailing white space.
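The effect of this cleaning can be mimicked with a few lines of stand-alone Perl. The function name clean_ccittia5 is hypothetical, and the sketch only shows the resulting string, not how or when Geo::BUFR warns under strict checking.

```perl
use strict;
use warnings;

# Remove null characters, then leading and trailing white space.
sub clean_ccittia5 {
    my ($string) = @_;
    $string =~ tr/\0//d;
    $string =~ s/^\s+|\s+$//g;
    return $string;
}

print '[', clean_ccittia5("  ENMI\0\0 "), "]\n";
```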


The BUFR table files should follow the format and naming conventions used by ECMWF BUFRDC software (download from https://software.ecmwf.int/wiki/display/BUFR/BUFRDC+Home, unpack, build library and you will find table files in the bufrtables directory). Other table file formats exist and might on request be supported in future versions of Geo::BUFR.


The package global $Strict_checking defaults to

  0: Ignore recoverable errors in BUFR format met during decoding or encoding

but can be changed to

  1: Issue warning (carp) but continue decoding/encoding

  2: Croak (die) instead of carp

by calling set_strict_checking. The following is checked for when $Strict_checking is set to 1 or 2:

Plus some few more checks not considered interesting enough to be mentioned here.


Some BUFR table C operators are not implemented or are untested, mainly because I do not have access to BUFR messages containing such operators. If you happen to come across a BUFR message which the current module fails to decode properly, I would therefore highly appreciate it if you could mail it to me.


Pål Sannes <pal.sannes@met.no>


I am very grateful to Alvin Brattli, who (while employed as a researcher at the Norwegian Meteorological Institute) wrote the first version of this module, with the sole purpose of being able to decode some very specific BUFR satellite data, but still provided the main framework upon which this module is built.


Guide to WMO Table Driven Code Forms: FM 94 BUFR and FM 95 CREX; Layer 3: Detailed Description of the Code Forms (for programmers of encoder/decoder software)



Copyright (C) 2010-2016 MET Norway

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
