NAME

 Sybase::Xfer - Bcp data into a Sybase table from multiple sources

SYNOPSIS

 #!/usr/bin/perl
    use Sybase::Xfer;
    my %switches = (-from_server=>'CIA', -to_server=>'NSA', -table => 'x-files');
    my $h = Sybase::Xfer->new( %switches );
    my %status = $h->xfer(-return=>"HASH");
    print "xref failed. $status{last_error_msg}\n" unless $status{ok};

 #!/bin/ksh
    sybxfer -from_server 'CIA' -to_server 'NSA' -table 'x-files'
    if [[ $? != 0 ]]; then print "transfer problems"; fi
 

DEPENDENCIES

 Requires at least Perl version 5.6

 Sybase::CTlib 

QUICK DESCRIPTION

Bulk copies data into a Sybase table. Data sources can include a) another Sybase table, b) the results of any Transact-Sql, c) the return values from a perl subroutine called repetitively, or d) a flat file. Comes with robust error reporting, handling, and intercepting.

Also comes with a command line wrapper, sybxfer.

DESCRIPTION

If you're in an environment with multiple servers and you don't want to use cross-server joins (aka using Component Integration Services) then this module may be worth a look. It transfers data from one server to another server row-by-row in memory w/o using an intermediate file.

The source is not limited to another Sybase table though. Other sources are a) any transact-sql, b) a perl subroutine called repetitively for the rows, or c) a flat file.

It also has some smarts to delete rows in the target table before the data is transferred by several methods. See the -truncate_flag, -delete_flag and -auto_delete switches.

The transfer is controlled entirely by the switch settings. One typically describes the from_source and to_source and, if necessary, how the transfer is to proceed via other switches.

ERROR HANDLING

This is the most obtuse section and fields the most questions. The following switches determine error handling behaviour and are discussed below. Granted, it's seven options and the permutations make it all the more confusing. In time this too shall be repaired, but for now this is the way it stands.

 -error_handling       =>    abort | continue | retry
 -error_data_file      =>    filename
 -retry_max            =>    n
 -retry_verbose        =>    1 | 0
 -retry_deadlock_sleep =>    secs
 -callback_err_send    =>    code ref
 -callback_err_batch   =>    code ref

and return values from the xfer method.

First note that there are two categories of errors:

client errors

Client errors are reported on bcp_sendrow. They include errors like sending an alpha character to a float field, sending a char string 50 bytes long into a 40-byte field, or sending a null to a field that does not permit nulls. These errors can be fielded by the -callback_err_send callback if so desired.

server errors

Server errors are reported on bcp_batch. This is where the commit takes place. These errors include, for example, unique index errors, constraint errors, and deadlock errors. These errors can be fielded by the -callback_err_batch callback.

What happens on an error, then? That all depends on the switch settings.

-error_handling 'abort'

On client or server errors the transfer stops dead in its tracks. Any rows committed stay committed in the target table. Any rows sent but not committed are lost from the target table.

If -error_data_file is specified then, for client errors, the error message and the data row in error are written to this file. For server errors, the error data file is ignored (the module doesn't know which rows in the batch caused the error) and instead the error message is written to stdout.

If no -error_data_file is specified then, for client errors, an *expanded* error message is written to stdout detailing the to_table datatype definitions and the data it attempted to insert, with commentary on datatype mismatches. For server errors, once again, only an error message is written to stdout.

Setting -retry_max, -retry_verbose, -retry_deadlock_sleep, -callback_err_send, or -callback_err_batch has no effect.

-error_handling 'continue'

On client errors the row in error is skipped and processing continues. On server errors all the rows in the current batch are lost and not transferred to the target table. Processing continues with the next batch.

-error_data_file behaviour is the same as described above for abort. The difference is that for client errors it's now possible to have more than one error in the -error_data_file. This switch is ignored for server errors for the same reason as above in 'abort'.

Setting -retry_max, -retry_verbose and -retry_deadlock_sleep has no effect.

However, if -callback_err_send is defined then it is called on client errors and its return status determines whether or not to continue or abort. Likewise, if -callback_err_batch is defined then it is called on server errors and its return status determines whether the xfer should continue or abort. For details about these two switches refer to the section elsewhere in this document describing each switch in detail.

-error_handling 'retry'

First off, -error_data_file must be specified so that client errors and data rows can be logged. On client errors, behaviour is exactly the same as with 'continue', i.e. if -callback_err_send is defined then it is called and its return status determines how to proceed; if no -callback_err_send is defined then the error message and data rows are written to -error_data_file and processing continues. On server errors, too, if -callback_err_batch is specified then behaviour is the same as 'continue', i.e. its return status determines whether the xfer continues or aborts.

However, if no -callback_err_batch is specified then behaviour is dramatically different. If -retry_verbose is set, a message is printed to stdout indicating the module is retrying. When a server error is encountered under this condition the module tests specifically for a deadlock error (error 1205). If it's NOT a 1205 it temporarily sets the batchsize to one and immediately resends the data, writing any failed rows to the -error_data_file, then continues on. If it IS a 1205 it prints a further message about the deadlock encountered (if -retry_verbose is set), sleeps for -retry_deadlock_sleep seconds, and then resends the data again. It does this up to -retry_max times. If it's still in an error state after -retry_max attempts it aborts.

NB. When 'retry' is in effect, all the rows in a batch are saved in memory so that should an error occur the rows can be 'retrieved' and resent with a batchsize of one. If the source table is large and the batchsize is large this can have a negative performance impact on the transfer. After successful transfer of all rows in a batch the memory is garbage collected and available for the next batch.
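
As a rough sketch, a retry setup might look like this (server, table, user and file names below are placeholders, not values from this module):

   use Sybase::Xfer;

   my %opts = (
       -from_server          => 'SRC_SRV',            #placeholder servers
       -to_server            => 'TGT_SRV',
       -table                => 'mydb..positions',    #placeholder table
       -user                 => 'loader',
       -password             => 'secret',

       -error_handling       => 'retry',
       -error_data_file      => '/tmp/positions.err', #required for 'retry'
       -retry_max            => 5,    #default is 3
       -retry_deadlock_sleep => 30,   #seconds to sleep on a 1205 deadlock
       -retry_verbose        => 1,    #announce each retry on stdout
   );

   my $h = Sybase::Xfer->new(%opts);
   my %status = $h->xfer(-return => 'HASH');
   warn "some rows not transferred: $status{last_error_msg}\n" unless $status{ok};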

See discussion in next section about return value settings.

AUTO DELETE

The -auto_delete switch is a way to indicate to *only* delete the rows you're about to insert. Sometimes the -delete_flag and -where_clause combination won't work - they cut too wide a swath. This option is typically used to manually replicate a table or to append to a table. -to_table must have a unique index for this option to work.

RETURN VALUES

In scalar context, the method xfer returns the following:

  0 = success
 >0 = success w/ n = number of rows not transferred
 -1 = abort. unrecoverable error.
 -2 = abort. interrupt.
 -3 = abort. timeout.

In array context, the method xfer returns the following:

 [0] = number of rows read from source
 [1] = number of rows transferred to target
 [2] = last known error message
 [3] = scalar return status (listed above)

if -return => 'HASH', the method xfer returns the following:

 {rows_read}        = number of rows read from source
 {rows_transferred} = number of rows transferred to target
 {last_error_msg}   = last error message encountered
 {scalar_return}    = scalar return listed above
 {ok}               = 1 if all rows were transferred regardless of retries or warnings
                        along the way.
                    = 0 if not
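
For instance, the three forms can be used like this (a sketch; $h is assumed to be a Sybase::Xfer handle built as in the SYNOPSIS):

   #scalar context
   my $rc = $h->xfer();
   die "xfer aborted" if $rc < 0;

   #or array context
   my ($rows_read, $rows_sent, $last_err, $scalar_rc) = $h->xfer();
   print "$rows_sent of $rows_read rows transferred\n";

   #or a hash, via the -return switch
   my %status = $h->xfer(-return => 'HASH');
   print "last error: $status{last_error_msg}\n" unless $status{ok};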

The sybxfer command line app returns to the shell the following:

 0 = success
 1 = abort. unrecoverable error.
 2 = abort. interrupt.
 3 = abort. timeout.
 4 = success w/ some rows not transferred

OPTIONS SUMMARY

HELP

-help | -h

this help

FROM INFO

-from_server | -fs (string)

from server name

-from_database | -fd (string)

from database name

-from_user | -fu (string)

from username

-from_password | -fp (string)

from username password

FROM SOURCES

-from_table | -ft (string)

from table name

-from_sql (string)

string is the sql to run against the from server

-from_script (string)

string is the filename containing sql to run

-from_perl (coderef | arrayref)

coderef is perl sub to call to get data. arrayref is an array of arrayrefs.

-from_file | -ff (filename)

name of file to read data from. Can be a reference to a FileHandle or, if the filename ends in .gz or .zip, it'll zcat the file first.

TO INFO

-to_server | -ts (string)

to server name

-to_database | -td (string)

to database name

-to_table | -tt (string)

to table name

-to_user | -tu (string)

to username

-to_password | -tp (string)

to username password

FROM & TO INFO

-server | -S (string)

set from_server and to_server

-database | -D (string)

set from_database

-table | -T (string)

set from_table and to_table

-user | -U (string)

set from_user and to_user

-password | -P (string)

set from_password and to_password

MISC

-batchsize | -bs (int)

bcp batch size. Default=1000

-where_clause | -wc (string)

append string when using -from_table to sql select statement

-truncate_flag | -tf [ 1 | string ]

truncate to_table. If 'string' then it'll use a Morgan Stanley only command to modify the meta data of a production database.

-delete_flag | -df

delete to_table [where where_clause]

-app_name | -an (string)

application name

-holdlock | -hl

adds holdlock after the from_table

-trim_whitespace | -tw

strips trailing whitespace

-from_file_delimiter | -ffd "delimiter | regexp"

the delimiter used to separate fields. Used in conjunction with -from_file only.

-from_file_pack_format | -ffpf "pack format"

the 'pack' format to send to perl's unpack function to determine the fields

-from_file_map | -ffm (hashref or string or file)

the positional mapping between field names in -to_table and fields in -from_file. First position begins at 0 (not one.) Ignored if source is -from_perl.

-timeout | -to (secs)

timeout value in seconds before the transfer is aborted. Default is 0. No timeout.

-drop_and_recreate_indices | -dari [ < string | hash ref | filename > ]

drop all the indices on -to_table before the transfer begins and recreate all the indices after the transfer is complete. DO NOT specify a value unless you wish to use Morgan Stanley's syts application.

-set_identity_insert | -sii (column)

this switch should be used if the target table has an identity column and you want the values xferred from the source. The default is to have the server set new values. The first column starts at 1 (not zero.)

-packetsize | -ps (value)

I defer to sybase's documentation at https://login.sybase.com/detail/1,6904,954,00.html

Increase the packet size for the connection. On most platforms, the packet size defaults to 512 bytes. This default is provided for backward compatibility. Larger packet sizes (2048 to 4096 bytes, for example) almost always yield better performance. This is generally true for applications that fetch rows with large columns, rows with lots of columns, or lots of rows with a few small columns. It is especially true for applications that do bulk copy data transfers or that handle text and image data.

To increase a connection's packet size, set the CS_PACKETSIZE connection property, via ct_con_props, prior to opening the connection. [this is what happens when you use this switch]

Note. You may have to reconfigure SQL Server to accept a larger maximum network packet size. See the SQL Server System Administration Guide for instructions on reconfiguring the maximum network packet size.

The following trade-offs come into play when the packet size is increased:

Increasing the packet size will increase the amount of memory required by the client and the server. Any network will impose a restriction on the maximum size of transmitted packets. On most networks, this is 4K. If the TDS packet size is larger than the maximum allowed by the underlying network, then each TDS packet will actually be transmitted as one or more physical network packets. This can adversely affect performance.

-textsize | -txs (value)

From the Sybase's Open Client-Library/C Reference Manual (online version), chapter 2, Options:

specifies the value of the Adaptive Server global variable @@textsize, which limits the size of text or image values that Adaptive Server returns.

When setting this option, supply a parameter that specifies length, in bytes, of the longest text or image value that Adaptive Server should return.

The default server configuration seems to be 32k. So if your text/images are larger than this you'll need to set this value.

AUTO DELETE

-auto_delete [c1,c2...]

c1,c2... are to_table column keys

-auto_delete_batchsize (int)

auto_delete batchsize

-scratch_db (string)

scratch database used by auto_delete

CALLBACKS

-callback_pre_send (coderef)

before row is sent callback

-callback_err_send (coderef)

error on bcp_sendrow callback

-callback_err_batch (coderef)

error on bcp_batch callback

-callback_print (coderef)

any output that normally goes to stdout callback

ERROR HANDLING

-error_handling| -eh (string)

string is abort, continue, or retry. Default is abort.

-error_data_file | -edf (filename)

name of file to write the failed records into

-retry_max n

number of times to retry a bcp_batch error

-retry_deadlock_sleep

sleep time between calls to bcp_batch if deadlock error detected

FLAGS

-echo

echo sql commands

-silent

don't print begin & end/summary messages

-progress_log

print progress message on every bcp_batch

-debug

programmer debug mode

OPTION DETAILS

from sources

 -from_table | -ft  <table_name>

from table name. Use a fully qualified path to be safe. For example, pubs..titles. This removes the dependency on the default database for the -from_user value.

 -from_sql  <string>

send sql in <string> to -from_server and xfer results to -to_table. For example, 'select author_id, lname from pubs..authors'. This implies the -to_table has two columns of compatible type to store author_id and lname in it.

 -from_script <file> 

send sql in <file> to from_server and xfer results to to_table. Essentially the same as -from_sql except that it opens a file and gets the sql from there.

 -from_perl <code-ref | array-ref> 

if code-ref, then it is called repetitively for data to transfer to -to_table. The code-ref must return the following array: ($status, $array_ref), where $status less than 0 means skip this row, equal to 0 means end the transfer, and greater than 0 means send the row and continue. $array_ref is an array reference to the row to send.

if array-ref, then the ref contains array-refs to transfer, each sub array-ref being one row.

This switch is only available from the API. -from_file_map is ignored when this switch is used.
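
As a sketch, a coderef source might look like this (the data and the connection values are placeholders):

   my @stack = ( [1, 'alpha'], [2, 'beta'], [3, 'gamma'] );   #made-up rows

   #called repetitively by the module
   sub next_row {
      my $row = shift @stack;
      return (0, undef) unless $row;   #status 0  = end the transfer
      return (1, $row);                #status >0 = send this row and continue
   }

   my $h = Sybase::Xfer->new(
              -to_server   => 'MARS',            #placeholder 'to' info
              -to_user     => 'steve',
              -to_password => 'BobDobbs',
              -to_table    => 'lamr..cities',
              -from_perl   => \&next_row,
           );
   my $rs = $h->xfer();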

 -from_file <file>

the file to read the actual data from. It can be a delimited file or a fixed-field file. When using this option it behaves similarly to Sybase::BCP (in) and Sybase's own bcp. I've attempted to make the error handling richer. See switches -from_file_delimiter (for delimited files), -from_file_pack_format (for fixed format files) and -from_file_map (for changing the ordering). It can be a compressed file (.gz or .zip) or a reference that is sub-classed from "IO::File" (usually a FileHandle but not always.) If you want to get fancy, it can be anything that is valid in perl's own 'open' statement such as a named pipe. eg. "cut -d: -f1,5 /etc/passwd |"
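
For instance (filenames below are invented), any of these forms should be acceptable as the source:

   my %opts = (
      -to_server   => 'MARS',                    #placeholder 'to' info
      -to_user     => 'steve',
      -to_password => 'BobDobbs',
      -to_table    => 'lamr..cities',

      -from_file           => 'cities.dat.gz',   #.gz => zcat'ed first
      -from_file_delimiter => '\|',              #pipe-delimited records
   );

   #or an already-opened handle (anything sub-classed from IO::File)
   use FileHandle;
   $opts{-from_file} = FileHandle->new('< cities.dat');

   #or anything perl's own open() accepts, such as a pipe
   $opts{-from_file} = "cut -d: -f1,5 /etc/passwd |";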

from information

 -from_server | -fs  <server_name>   

The name of the server to get the data from. Also see switch -server.

 -from_database | -fd  <database_name>   

The name of the from database. Optional. Will execute a dbuse on the from server. Also see switch -database.

 -from_user | -fu  <user_name> 

username to use on the from server. Also see switch -user.

 -from_password | -fp  <password> 

password to use for the from user. Also see switch -password.

to information

 -to_server | -ts <server_name> 

name of the to server. Also see switch -server.

 -to_database | -td <database_name>

The name of the to database. Optional. If -to_table is not specified as db.[owner].table this value will be prepended to -to_table. Also see switch -database.

 -to_table | -tt  <table_name>

name of the to table. USE A FULLY QUALIFIED PATH. Eg. pubs..titles. This removes the dependency on the login's default database. Also see switch -table.

 -to_user | -tu <user_name>

to user name. Also see switch -user.

 -to_password | -tp <password>

to user password. Also see switch -password.

from and to short cuts

Many times the from and to user/pw pairs are the same, or the table names are the same on the two servers. These options are just a way to set both the from and to options above with one switch.

 -server | -S <server>

set from_server & to_server to this value. Also see switches -from_server and -to_server.

 -database | -D  <database>

set from_database only. The to_database name is set using a fully qualified table name. That's the way bcp works; I'd change it if I could. Also see switches -from_database and -to_database.

 -table | -T  <table>    

set the from_table & to_table to this value. Also see switches -from_table and -to_table.

 -user | -U  <user_name>

set from_user & to_user to this value. Also see switches -from_user and -to_user.

 -password | -P  <password>

set from_password & to_password to this value. Also see switches -from_password and -to_password

other qualifiers

 -batchsize | -bs  <number>    

bcp batch size into the to_table. Default=1000.

 -where_clause | -wc <where_clause>

send 'select * from from_table where where_clause' to the from_server when -from_table is specified (otherwise this switch is only used by the -delete_flag.) The default is to use no where clause, thereby just sending 'select * from from_table'. Also see -delete_flag switch.

 -truncate_flag | -tf  < 1 | user_string >

send 'truncate table to_table' to the to_server before the transfer begins. This requires dbo privilege, remember. If you don't have dbo privilege and you want to remove all the rows from the target table you have two options. Use the -delete_flag with no -where_clause or truncate the table via an alternate method (eg. under a different user name) before you run the transfer. Default is false.

If user_string is specified, a hook into Morgan Stanley's syts command can be used. user_string is specified as such:

   '{syts=>1, logfile=>file, sytsUser=>user, sytsPwd=>pwd}' 

   or a perl hash pointer to the above.

<file> defaults to stdout, <user> defaults to -to_user and <pwd> defaults to <user>. Using 'user_string' circumvents dbo privilege but does require other syts privilege.

 -delete_flag | -df     

send 'delete to_table [where where_clause]' to the to_server before the transfer begins. Also see -where_clause. Default is false. The delete will be performed in batches of the size given by -batchsize.

 -app_name | -an <val>    

name of program. Will append '_F' and '_T' to differentiate between the from and to connections. This string appears in the field program_name in the table master..sysprocesses. Will truncate to 14 characters if longer. Default is basename($0).

 -holdlock | -hl

if using from_table then this switch will add an additional holdlock after the table. This is especially useful if the from_table has the potential to be updated at the same time the transfer is taking place. The default is noholdlock.

 -trim_whitespace | -tw

Will set nsql's $Sybase::CTlib::nsql_strip_whitespace to true, which trims trailing whitespace before bcp'ing the data into the target table. If a field is all whitespace on the from table then the result of trimming whitespace will be null. Therefore, the corresponding column on the to table needs to permit nulls. Default is false.

 -from_file_delimiter | -ffd <regex>

the delimiter to use to split each record in -from_file. Can be a regular expression. This switch is only valid with -from_file. Mutually exclusive with -from_file_pack_format.

 -from_file_pack_format | -ffpf <pack format>

the pack format to apply (using unpack) to each input line. This switch is only valid with -from_file. Mutually exclusive with -from_file_delimiter.

 -from_file_map | -ffm <string | hashref | filename>

the positional mapping between column names in -to_table and positional fields in source. First position in the source begins at 0. Examples:

specified as string: '(cusip=>2, matdate=>0, coupon=>5)'

specified as file: 'format.fmt' #as long as the first non-whitespace char is not a '(' and the file 'format.fmt' contains cusip => 2, matdate => 0, coupon => 5

specified as a hashref: only via the api. reference to a perl hash.

Works with -from_sql, -from_table, -from_script, -from_file. Ignored with -from_perl.
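
Via the API the same mapping can be passed directly as a hash reference, e.g. (the column names and filename here are just illustrations reusing the string example above):

   my %map = ( cusip => 2, matdate => 0, coupon => 5 );  #to_table column => 0-based field position

   my %opts = (
      #...the usual from/to connection switches...
      -from_file           => 'bonds.psv',
      -from_file_delimiter => '\|',
      -from_file_map       => \%map,
   );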

 -timeout | -to <secs>

Timeout value before the transfer aborts. Default is 0, i.e. no timeout. If the timeout is met then a scalar return code of -3 is returned (+3 via the sybxfer script.)

 -drop_and_recreate_indices [ < string | hash ref | filename> ]

drop all the indices on -to_table before the transfer begins and recreate all the indices after the transfer is complete. Even if an error has occurred an attempt will be made to recreate the indices (if they've been dropped.) Note that unique indices may fail to be recreated if the transfer resulted in duplicates. Also, enough space must be available to recreate the indices.

DO NOT specify any value unless you want to use Morgan Stanley's syts application to drop and recreate the indices; in that case specify the value as one of the following:

string

"(syts=>1, source=>'server.database', logfile=>'file', withSortedData=>0/1)" where server.database is the location of the -to_table with the indices to create (typically on a test server.) if no logfile is specified then syts output goes to stdout.

hash ref

is the same as above only as a perl hashref

filename

is the same as above only in a file

auto delete

The -auto_delete switch indicates that it should be ensured that any rows selected from the from_table first be removed from the to_table. This differs from the -delete_flag and -where_clause combination that makes sweeping deletes. -auto_delete was added for the sole purpose of keeping the -to_table up-to-date by transferring only 'changed' records from the -from_table without knowing just which records changed a priori.

This switch presently only works with -from_table.

 -auto_delete [c1,c2...]

c1, c2, ... are the unique key column names into to_table. When this switch is in effect the module will create a table in -scratch_db named sybxfer$$, $$ being the current process number, with the columns c1, c2, ... Then it will bcp only those columns to this temp table. After that bcp is complete it will perform a delete (in a loop of the -auto_delete_batchsize) via a join by these columns in the temp table to the to_table so as to remove the rows, if any. After the delete is complete the temp table is dropped and all the columns specified will be bcp'ed to the to_table.

In essence, a simplistic view is that the following is effectively done: 'delete to_table where c1='cval1' and c2='cval2' ...' for every row in the from_table for values c1, c2, ... I mention it this way only because the explanation above seems either too convoluted (or I can't explain it clearly enough.)

 -auto_delete_batchsize | -adb <int>

batchsize to use when auto-deleting rows. 3000 is the default. See -auto_delete.

 -scratch_db  [db]       

scratch database used by auto_delete. tempdb is the default. See -auto_delete.

callbacks (also see error handling)

Callback switches are only valid from the API, i.e. not from the script sybxfer.

 -callback_pre_send <code_ref>

Sub to call before sending a row to the to_server. The first and only arg is a ref to the array of data. The callback routine returns ($status, \@row); $status true means continue, $status false means abort.

It's called like this: ($status_cb_pre, $r_user_row) = $opt->{callback_pre_send}->(\@row);

It must return this: return ($status, \@row)

 -callback_print <code_ref> 

Sub to call if catching of log messages is desired. No return status is necessary.

It's called like this: $opt->{callback_print}->($message)
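
For instance, a print callback that timestamps each message might be sketched as follows (%switches below is just a stand-in for whatever from/to options you'd normally pass):

   #timestamp every message the module would normally print to stdout
   sub my_print {
      my $message = shift;
      print scalar localtime(), ' ', $message;
   }

   my %switches = (
      #...the usual from/to connection switches...
   );
   my $h = Sybase::Xfer->new( %switches, -callback_print => \&my_print );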

error handling

What to do upon encountering an error on bcp_sendrow or bcp_batch?

 -error_handling | -eh  <value>

Value can be abort, continue or retry. I should probably have a threshold number but I'll leave that until a later time. When set to continue the transfer will proceed and call any error callbacks that are defined (see below) and examine the return status of those to decide what to do next. If no error callbacks are defined and -error_handling set to continue the module will print the offending record by describing the row by record number, column-types and data values and continue to the next record. If -error_handling is set to abort, the same is printed and the xfer sub exits with a non-zero return code.

When the value is retry it attempts to behave like Sybase::BCP on an error in bcp_batch. This is where server errors are detected, like duplicate key or deadlock errors.

By default, when -error_handling = retry

  • if a deadlock error is detected on the bcp_batch the program will sleep for -retry_deadlock_sleep seconds and rerun bcp_sendrow for all the rows in the batch and rerun bcp_batch once and repeat until no deadlock error or max tries.

  • if a non-deadlock error is detected on the bcp_batch the program will attempt to behave like Sybase::BCP by bcp_sendrow'ing and bcp_batch'ing every record. Those in error are written to the -error_data_file.

The default is abort.

Here's a deliberate example of a syntax-type error and an example of the output from the error handler. Note: this is detected on bcp_sendrow. See below for a bcp_batch error trace.

 #------
 #SAMPLE BCP_SENDROW ERROR FORMAT
 #------

 row #1
  1: ID                 char(10)     <bababooey >
  2: TICKER             char(8)      <>
  3: CPN                float(8)     <>
  4: MATURITY           datetime(8)  <>
  5: SERIES             char(6)      <JUNK>
  6: NAME               varchar(30)  <>
  7: SHORT_NAME         varchar(20)  <>
  8: ISSUER_INDUSTRY    varchar(16)  <>
  9: MARKET_SECTOR_DES  char(6)      <>
 10: CPN_FREQ           tinyint(1)   <>
 11: CPN_TYP            varchar(24)  <>
 12: MTY_TYP            varchar(18)  <>
 13: CALC_TYP_DES       varchar(18)  <>
 14: DAY_CNT            int(4)       <>
 15: MARKET_ISSUE       varchar(25)  <bo_fixed_euro_agency_px>
 Column #16 actual length [26] > declared length [4]
 16: COUNTRY            char(4)      <Sep 29 2000 12:00:00:000AM>
 Column #17 actual length [6] > declared length [4]
 17: CRNCY              char(4)      <EMISCH>
 18: COLLAT_TYP         varchar(18)  <EBS (SWISS)>
 19: AMT_ISSUED         float(8)     <Govt>
 20: AMT_OUTSTANDING    float(8)     <CH>
 21: MIN_PIECE          float(8)     <>
 22: MIN_INCREMENT      float(8)     <CLEAN>
 Sybase error: Attempt to convert data stopped by syntax error in source field.

 Aborting on error. error_handling = abort
 1 rows read before abort

If -error_handling = retry and an error occurs on the bcp_batch then the -error_data_file will have this format:

 #------
 #SAMPLE BCP_BATCH ERROR_FILE
 #------

 #recnum=1, reason=2601 Attempt to insert duplicate key row in object 'sjs_junk1' with unique index 'a'
 mwd|20010128|10.125
 #recnum=2, reason=2601 Attempt to insert duplicate key row in object 'sjs_junk1' with unique index 'a'
 lnux|20010128|2.875
 #recnum=3, reason=2601 Attempt to insert duplicate key row in object 'sjs_junk1' with unique index 'a'
 scmr|20010128|25.500
 #recnum=4, reason=2601 Attempt to insert duplicate key row in object 'sjs_junk1' with unique index 'a'
 qcom|20010128|84.625

 -retry_max <n>

n is the number of times to retry a bcp_batch when an error is detected. Default is 3.

 -retry_deadlock_sleep <n>

n is the number of seconds to sleep between re-running a bcp_batch when a deadlock error is detected.

 -callback_err_send <code_ref | hash_ref>

Sub to call if an error is detected on bcp_sendrow. The module passes a hash as seen below. It expects a 2-element array in return, i.e. ($status, \@row); $status true means continue, $status false means abort. Can also be a hash_ref, meaning store the error rows in it keyed by row number.

It's called like this if code_ref. @row is the array of data:

your_err_sendrow_cb(DB_ERROR => $DB_ERROR, row_num => $row_num, row_ptr => \@row );

It must return this:

return ($status, \@row);

It stores the error like this if hash_ref:

$your_hash{ $row_num }->{msg} = $DB_ERROR; $your_hash{ $row_num }->{row} = \@row;
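
So, as a sketch, collecting the send errors in a hash for inspection after the transfer might look like this (connection switches omitted):

   my %send_errors;    #will end up keyed by row number

   my %opts = (
      #...the usual from/to connection switches...
      -error_handling    => 'continue',
      -callback_err_send => \%send_errors,   #hash_ref form: just collect them
   );

   my $h = Sybase::Xfer->new(%opts);
   $h->xfer();

   for my $rn (sort { $a <=> $b } keys %send_errors) {
      print "row $rn failed: $send_errors{$rn}{msg}\n";
      print "  data: @{ $send_errors{$rn}{row} }\n";
   }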

 -callback_err_batch <coderef>

Sub to call if an error is detected on bcp_batch. The module passes a hash as seen below. It expects a 2-element array in return, i.e. ($status, \@row).

$status == 0 indicates to abort the batch.

$status == 1 indicates to resend and batch a row at a time and report errors to -error_data_file.

$status > 1 indicates not to do a record by record send and batch but to resend and batch once.

It's called like this:

     your_err_batch_cb(DB_ERROR  => $DB_ERROR, 
                       row_num   => $row_num,
                       rows      => \@row)

It must return this:

     return ($status, \@row);

A word about @row above: it's got a tad more info in it. @row is an array of hash refs, where:

  $row[$i]->{rn}  == row number in input file,
  $row[$i]->{row} == the actual array of data
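
So a batch callback that logs the failed rows and asks for a row-by-row resend could be sketched as:

   sub my_err_batch {
      my %info = @_;
      warn "bcp_batch error: $info{DB_ERROR}\n";
      for my $r ( @{ $info{rows} } ) {
         warn "  input row $r->{rn}: @{ $r->{row} }\n";
      }
      return (1, $info{rows});   #1 = resend a row at a time, log failures
   }

   #then pass it as: -callback_err_batch => \&my_err_batch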

miscellaneous boolean flags

 -echo                   

echo sql commands running under the covers. Default is false.

 -silent                 

don't print begin & end/summary messages. Default is false.

 -progress_log           

print progress message on every bcp_batch. Default is true.

 -debug                  

programmer debug mode. Default is false.

 0x01 = various debug messages
 0x02 = print data in vertical mode before it's sent (and before map)

EXAMPLES

EXAMPLE #1 - simple table transfer

   my %opts = ( 
                -from_server   => 'EARTH',
                -to_server     => 'MARS',
                -U             => 'steve',          #user 'steve' is valid on both servers/dbs
                -P             => 'BobDobbs',       #same pw for both
                -T             => 'lamr..cities',   #same db and table on both servers

                -truncate_flag => 1,                #issue a 'truncate table lamr..cities' on to server
                -batchsize     => 2000,
              );

   my $h = new Sybase::Xfer(%opts);
   my $rs = $h->xfer();
   $rs && die 'xfer aborted';

EXAMPLE #2 - using 'from_sql'

   my %opts = (
                -from_server    => 'NYP_FID_RATINGS',
                -from_user      => 'rateuser',
                -from_password  => 'grack',
                -from_database  => 'fid_ratings',
                -from_sql       => 'select id, name, rating from rating where name like "A%"',

                -to_server      => 'NYP_FID_DATAMART',
                -to_user        => 'fiduser',
                -to_password    => 'glorp',
                -to_table       => 'fid_datamart..ratings_sap',  #NOTE FULLY QUALIFIED NAME

                -batchsize      => 500,
                -error_handling => 'abort',
               );

   my $h = new Sybase::Xfer(%opts);
   my $rs = $h->xfer();
   $rs && die 'xfer aborted';

EXAMPLE #3 - using all three callbacks

   my %opts = (
                -from_server        => 'MOTO',
                -from_user          => 'guest',
                -from_password      => 'guest',
                -from_database      => 'parts',
                -from_sql           => "select partno, desc, price from elec\n" .
                                       "UNION\n" .
                                       "select partno, name, px from m777\n",

                -to_server          => 'MASTERMOTO',
                -to_user            => 'kingfish',
                -to_password        => 'shirley',
                -to_table           => 'summary..elec_contents',

                -callback_pre_send  => \&pre_send,
                -callback_err_send  => \&err_on_send,
                -callback_err_batch => \&err_batch,

                -batchsize          => 100,
               );

 #-----
 #pre send callback. Adds 10000 to partno effectively between the time it 
 #was pulled from the source and the time it gets pushed into the target table.
 #-----
    sub pre_send {
      my @row = @{ +shift };    #array reference to the row about to be sent to the 'to' server
      $row[0] += 10000;         #manipulate @row all you want
      my $status = 1;           #status true means continue, false means abort
      return ($status, \@row);  #mandatory return args
    }


 #----
 #error on 'send row' callback - fix a syntax error by nulling out offending value.
 #----
    sub err_on_send {

        my %err_data = @_;
   
 #just to be explicit about it
        my $err_message = $err_data{DB_ERROR};  #key #1 = 'DB_ERROR'
        my $err_row_num = $err_data{row_num};   #key #2 = 'row_num' : last row sent to server
        my @row =  @{ $err_data{row_ptr} };     #key #3 = 'row_ptr' : reference to the array of data



 #nb.
 #for example purposes I'm hardwiring this. In real life you'd create a closure and
 #send it to this routine as a parameter.
 #
 #list of datatypes of the columns
        my $p_datatypes->{part_no}->{col_id} = 1;
        my $p_datatypes->{part_no}->{col_type} = 'int';
        my $p_datatypes->{descr}->{col_id} = 2;
        my $p_datatypes->{descr}->{col_type} = 'varchar(30)';
        my $p_datatypes->{price}->{col_id} = 3;
        my $p_datatypes->{price}->{col_type} = 'float';
 
        my (@col_types, $retry_status);
 
 #get column names in column order
        my @col_names =  sort { $p_datatypes->{$a}->{col_id} 
                         <=> $p_datatypes->{$b}->{col_id} }
                         keys %{ $p_datatypes };
 
 #get column types
        for my $col (@col_names) { push @col_types, $p_datatypes->{$col}->{col_type} }
 
 #for syntax errors compare data to data type
        @row = ();
        if ($err_data{DB_ERROR} =~ /syntax/i ) {
           @row = @{ $err_data{row_ptr} };
 
 #check for character data in 'int' field
           for( my $i=0; $i<@row; $i++) {
              if($col_types[$i] =~ /int/ && $row[$i] =~ /\D/ ) {
                 $row[$i] = undef;
                 $retry_status = 1;
              }
           }
        }
 
 
 #if not a retry candidate then bail out
        unless ($retry_status) {
           cmp_print("row failed ($err_data{DB_ERROR})\n");
           for( my $i=0; $i<@row; $i++) { cmp_print("\t$i : $row[$i]\n") }
           cmp_error("xfer aborted");
        }
 
        return ($retry_status,\@row);
   }
 


 #----
 #error on 'send batch' callback
 #----
    sub err_batch {
      my %info = @_;                      #arg is a two keyed hash
      my $err_message = $info{DB_ERROR};  #key #1 = 'DB_ERROR' 
      my $err_row_num = $info{row_num};   #key #2 = 'row_num', last row sent to server 
      my $status = 1;                     #status true means continue, false means abort
      return $status;                     #mandatory return arg
    }                

EXAMPLE #4 - Using auto_delete

   my %opts = (
                -user           => 'smoggy',
                -password       => 'smoggy',
                -table          => 'fx_rates..asia_geo_region',

                -from_server    => 'TEST',
                -to_server      => 'PROD',
 
                -auto_delete    => 'country_iso, id',   #unique key in table
                -auto_delete_batchsize => 10000,        #change the default
                -scratch_db     => 'tempdb',            #just to be explicit about it

                -batchsize      => 50000,
                -error_handling => 'abort',
               );

 

   my $h = new Sybase::Xfer(%opts);
   my $rs = $h->xfer();
   $rs && die 'xfer aborted';

WISH LIST

  • Would like to convert from Sybase:: to DBI:: and ultimately be able to transfer data between different server makes and models.

  • Create the -to_table on the fly if it doesn't exist.

  • Incorporate logic to do the right thing when transferring data between Unicode and ISO versions of the Sybase server.

  • Allow CTlib handle to be passed in lieu of from_user/from_pwd/from_server

  • add new option -ignore_warnings 'num | regexp'. (a way to deal with the 'oversized row' message not being fatal in Sybase's bcp)

  • add a statistics summary in Xfer Summary report

  • print time on Progress Log report

  • -auto_delete option should figure out the unique keys on -to_table thereby not forcing the user to supply them.

  • add new option -direction to allow for bcp out

BUGS

  • Deadlock retry features need to be more thoroughly tested.

  • -to_table residing on anything other than Sybase 11.9 or above is not supported. Morgan Stanley has an inhouse product called MSDB. This is not supported for the -to_server.

  • the examples in the documentation reek badly and need to be rewritten.

CONTACTS

Author's e-mail

stephen.sprague@msdw.com

Michael Peppler's homepage

http://www.mbay.net/~mpeppler/ for all things perl and Sybase including Sybase::CTLib, Sybase::BCP and a ton other goodies. Definitely a must see.

Sybperl mail-list

This is a good place to ask questions specifically about Sybase and Perl. I pulled these instructions from Michael's page:

Send a message to listproc@list.cren.net with subscribe Sybperl-L your-name in the body to subscribe. The mailing list is archived and searchable at http://www.cren.net:8080/

Original idea

Sybase::Xfer idea inspired by Mikhail Kopchenov.

VERSION

 Version 0.52, 10-SEP-2001
 Version 0.51, 15-JUN-2001
 Version 0.41, 15-APR-2001
 Version 0.40, 01-MAR-2001
 Version 0.31, 12-FEB-2001
 Version 0.30, 10-FEB-2001
 Version 0.20, 12-DEC-2000