fix_latin - filters a data stream that is predominantly UTF8 and 'fixes' any Latin (i.e. non-ASCII 8-bit) characters
fix_latin options <input_file >output_file

  Options:
    --use-xs <value>   'auto' | 'always' | 'never'
    --version          list version number
    --help             detailed help message
The script acts as a filter, taking source data which may contain a mix of ASCII, UTF8, ISO-8859-1 and CP1252 characters, and producing output that is entirely ASCII/UTF8.
Multi-byte UTF8 characters will be passed through unchanged (although over-long UTF8 byte sequences will be converted to the shortest normal form). Single bytes that are not part of a valid UTF8 sequence will be converted as follows:
  0x00 - 0x7F   ASCII - passed through unchanged
  0x80 - 0x9F   Converted to UTF8 using CP1252 mappings
  0xA0 - 0xFF   Converted to UTF8 using Latin-1 mappings
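To make the rules above concrete, here is a minimal Python sketch of the same byte-level logic. It is not the Encoding::FixLatin implementation: the function names are hypothetical, and the handling of the five byte values CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) is an assumption (they fall back to Latin-1 here). It also shows why a byte such as 0xE9 followed by ASCII is treated as Latin-1: it cannot start a valid UTF8 sequence.

```python
def _utf8_seq_len(lead: int) -> int:
    """Return the sequence length implied by a UTF8 lead byte, or 0."""
    if 0xC0 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF7:
        return 4
    return 0


def _single_byte(b: int) -> str:
    """Map one stray 8-bit byte to a character."""
    if 0x80 <= b <= 0x9F:
        try:
            return bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # Assumption: bytes undefined in CP1252 fall back to Latin-1.
            return chr(b)
    # 0xA0-0xFF: Latin-1 maps each byte value to the same code point.
    return chr(b)


def fix_latin(data: bytes) -> bytes:
    """Sketch: return data re-encoded as pure ASCII/UTF8."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII: passed through unchanged
            out.append(chr(b))
            i += 1
            continue
        length = _utf8_seq_len(b)
        tail = data[i + 1 : i + length]
        if length and len(tail) == length - 1 and all(0x80 <= c <= 0xBF for c in tail):
            cp = b & (0xFF >> (length + 1))  # payload bits of the lead byte
            for c in tail:
                cp = (cp << 6) | (c & 0x3F)  # 6 payload bits per continuation byte
            # An over-long form decodes to the same code point and is
            # re-encoded below in its shortest form.
            if cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF:
                out.append(chr(cp))
                i += length
                continue
        out.append(_single_byte(b))  # not valid UTF8: CP1252 / Latin-1 mapping
        i += 1
    return "".join(out).encode("utf-8")
```

For example, `fix_latin(b"caf\xe9")` and `fix_latin(b"caf\xc3\xa9")` both yield the UTF8 bytes for "café". Note that a genuine Latin-1 byte pair which happens to form valid UTF8 (such as 0xC3 0xA9) is kept as UTF8; that ambiguity is inherent in any filter of this kind.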
--use-xs <value>
    Override the default ('auto') behaviour of trying to use the XS module and falling back to the pure-Perl version if it is not available. Set to 'never' to always use the Perl version, or 'always' to always use XS and die if it is not available.
--version
    Display the version numbers of the underlying Encoding::FixLatin and XS modules.
--help
    Display this documentation.
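The three --use-xs settings amount to a simple selection rule. The Python sketch below illustrates that rule only; it is not the script's code. The names choose_backend, load_xs and load_pp are hypothetical, and Python's ImportError stands in for a failed attempt to load the XS module.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def choose_backend(use_xs: str, load_xs: Callable[[], T], load_pp: Callable[[], T]) -> T:
    """Pick an implementation according to an 'auto' | 'always' | 'never' setting."""
    if use_xs not in ("auto", "always", "never"):
        raise ValueError(f"use_xs must be 'auto', 'always' or 'never', not {use_xs!r}")
    if use_xs == "never":
        return load_pp()  # never try the compiled version
    try:
        return load_xs()
    except ImportError:
        if use_xs == "always":
            raise  # 'always': fail if the compiled version is unavailable
        return load_pp()  # 'auto': fall back to the pure implementation
```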
This script was originally written to assist in converting a Postgres database from SQL_ASCII encoding to UTF8 (also known as UNICODE) encoding. The following examples illustrate its use in that context.
If you have a SQL format dump file that you would normally restore by piping into 'psql', you can simply filter the dump file through this script:
fix_latin < dump_file | psql -d database
If you have a compressed dump file that you would normally restore using 'pg_restore', you can omit the '-d' option so that pg_restore writes SQL to standard output, then pipe that SQL through this script and into psql:
pg_restore -O dump_file | fix_latin | psql -d database
To list the non-ASCII lines in a SQL format dump file, each prefixed with the name of the table whose COPY data it belongs to:
perl -ne '/^COPY (\S+)/ and $t = $1; print "$t:$_" if /[^\x00-\x7F]/' dump_file
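The Perl one-liner remembers the table name from the most recent `COPY` line and prints any line containing a byte above 0x7F as `table:line`. A rough Python equivalent of the same logic is sketched below; the function name non_ascii_lines is hypothetical, and lines are handled as raw bytes so that no decoding is attempted.

```python
import re
from typing import Iterable, Iterator

COPY_RE = re.compile(rb"^COPY (\S+)")


def non_ascii_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield b'table:line' for every line containing a non-ASCII byte."""
    table = b""  # like the unset $t in the one-liner, empty until a COPY is seen
    for line in lines:
        m = COPY_RE.match(line)
        if m:
            table = m.group(1)  # remember the table currently being loaded
        if any(byte > 0x7F for byte in line):
            yield table + b":" + line
```

To use it on a dump, open the file in binary mode (`open("dump_file", "rb")`) and write each yielded line to `sys.stdout.buffer`.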
This script is implemented using the Encoding::FixLatin Perl module. For more details, see the module documentation with the command:

perldoc Encoding::FixLatin
In particular, you should read the 'LIMITATIONS' section to understand the circumstances under which data corruption might occur.
Copyright 2009-2014 Grant McLean
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.