
NAME

App::sh2p - Perl program to aid conversion from UNIX shell to Perl

SYNOPSIS

  sh2p.pl [-i] [-t] [-f] script-name output-file | script-name [...] output-directory

  -i Do not use integer
  -t Generate test output.  Each line from the shell script is written, with the prefix
     '#< ', before the converted line.  This is for testing and not recommended for general use.
  -f Clobber output files in the output directory

This Perl script, and the associated modules in the sh2p directory, will attempt to convert the supplied shell script(s) to Perl.

Only shells based upon the POSIX standard are supported. This includes most popular shells such as Bourne, Korn, and Bash, but specifically EXCLUDES the C shell (csh, tcsh).

Most Korn shell 93 extensions are currently not supported.

CAVEAT: Incorrect shell commands will result in incorrect Perl!

DESCRIPTION

This program attempts to convert the base syntax of a UNIX shell script to Perl. It does not attempt to redesign the script for Perl but to assist in the conversion process by automating much of the tedium.

It can be run either by supplying the input and output file names, or by supplying a list of input file names and an output directory, which must be the right-most parameter and must already exist. If a hyphen ('-') is supplied for either the input or the output file name, then STDIN is read or STDOUT is written to (a hyphen is not valid as a directory name).

The output file will be overwritten if it exists. If the directory output form is taken then an output file name is generated from the input file name, with '.pl' appended (any existing 'extension' will be removed). If this file already exists then the user will be prompted for permission to overwrite it, unless the -f (force) option is given.
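
The naming rule for the directory form can be sketched as follows. This is only an illustration: output_name is a hypothetical helper, not a routine from sh2p, and it assumes that only the last '.suffix' of the input name counts as the 'extension'.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper (not part of sh2p): derive the output path for
# one input script when an output directory is given.
sub output_name {
    my ($input, $out_dir) = @_;
    # fileparse returns (basename, directory, suffix); keep the basename
    # with the last '.suffix' (if any) stripped off
    my ($base) = fileparse($input, qr/\.[^.]*/);
    return File::Spec->catfile($out_dir, "$base.pl");
}

print output_name('scripts/backup.sh', 'converted'), "\n";
print output_name('deploy',            'converted'), "\n";
```

Whether sh2p treats names with several dots the same way is not stated here, so check the generated file names if your scripts use them.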

The generated code will:

Have a #! line generated from $Config{'perlpath'}.

    use integer;

The POSIX shell only supports integer arithmetic. This may be suppressed with the -i option.
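
As a small illustration of why 'use integer' matters, the sketch below compares a division with and without the pragma. The variable names are invented for the example.

```perl
use strict;
use warnings;

# Ordinary Perl arithmetic keeps the fraction
my $float_quotient = 7 / 2;

my $int_quotient;
{
    use integer;               # lexically scoped to this block
    $int_quotient = 7 / 2;     # truncated, as $((7 / 2)) is in the shell
}

print "without integer: $float_quotient\n";
print "with integer:    $int_quotient\n";
```

With the -i option the generated script behaves like the first case, which may change results that relied on shell truncation.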

INSPECT messages

These are output whenever sh2p detects code which requires manual intervention, which is often. A message of the format '**** INSPECT: free-text' is written to the output script as a comment. A similar message is written to STDERR, prefixed by the line number and grouped by file.
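
Since the markers are ordinary comments in the output script, they are easy to collect after conversion. The following is a minimal sketch, assuming only that each marker contains the text '**** INSPECT:' as described above; find_inspect is a hypothetical helper and the sample lines are invented, not real sh2p output.

```perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, text] pairs for every
# INSPECT marker found in the lines of a converted script.
sub find_inspect {
    my @lines = @_;
    my @found;
    for my $i (0 .. $#lines) {
        push @found, [ $i + 1, $1 ]
            if $lines[$i] =~ /\*{4} INSPECT:\s*(.*?)\s*$/;
    }
    return @found;
}

# Sample lines invented for illustration
my @sample = (
    "#!/usr/bin/perl\n",
    "# **** INSPECT: Pipeline detected\n",
    "print \"hello\\n\";\n",
);

printf "%d: %s\n", @$_ for find_inspect(@sample);
```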

autoload '$var' ignored

autoload, and typedef -fu, prepares the translator for names which are functions. Variable values containing a function name are set at runtime and cannot be used, so are ignored.

file descriptors not currently supported

<func> replaced by Perl built-in <new_func>

Advice to replace the specified program with a Perl alternative

<func> should be replaced by something like <advise>

Advice to replace the specified program with a Perl alternative

Multiple levels in 'break $level' not supported

Multiple levels in 'continue $level' not supported
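
When editing by hand, multi-level 'break' and 'continue' are usually expressed in Perl with loop labels. A minimal sketch follows; the label OUTER and the loop bounds are chosen for the example only.

```perl
use strict;
use warnings;

my @visited;
OUTER: for my $row (1 .. 3) {
    for my $col (1 .. 3) {
        next OUTER if $col > $row;               # like 'continue 2'
        last OUTER if $row == 3 && $col == 2;    # like 'break 2'
        push @visited, "$row,$col";
    }
}

print "@visited\n";
```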

Suspicious conversion from expr

Many expr commands can be reproduced directly in Perl, but not all.

The following line cannot be translated:

A "catch-all" message where I give up, often when setting shell options.

Pipes/co-processes are not supported, use open

read through ksh pipes is not supported

The -p option to 'read' has different meanings in Bash and ksh. The shell in use is determined from the #! line. If this is not supplied then 'read' assumes the Bash syntax, which is more common.

Unrecognised shopt argument:

Only a limited number of the Bash 'shopt' options are relevant to Perl

sourced file should also be converted

The dot '.' or 'source' command has been used (and converted to 'do') but the sourced file should also be converted.

Only one option supported for typedef or declare

Pattern <pattern> not currently supported

Unable to convert shell pattern matching <$token>

User function <function-name> called in back-ticks

Shell functions can only return values between 0 and 255. A common technique for returning a string is to 'echo' or 'print' the string to STDOUT, then call the function in back-ticks (or $(...)), which, in a decent shell, should not produce a child process. This is inappropriate in Perl and the subroutine return value should be revised. This cannot reasonably be automated.
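
The manual revision usually amounts to returning the string rather than printing it. A minimal sketch, with the shell original shown in comments; get_label is an invented name.

```perl
use strict;
use warnings;

# Shell idiom being replaced (for reference):
#   get_label() { echo "host-$1"; }
#   label=$(get_label 42)

# Perl revision: return the string instead of writing it to STDOUT
sub get_label {
    my ($id) = @_;
    return "host-$id";
}

my $label = get_label(42);
print "$label\n";
```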

Writing dir/name.here

See 'Here documents' (below)

No conversion routine for <name>

Pipeline detected

Subshell: subshell list

Subshells should not create a child process if possible, and alterations to the environment should not affect the 'parent'. Shells do that using an environment stack; I do it using 'local'. However, there are other implications, so the generated code should be inspected.
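
The effect of 'local' on the environment can be seen in the sketch below. It illustrates the technique and is not actual sh2p output; the variable STAGE is invented for the example.

```perl
use strict;
use warnings;

$ENV{STAGE} = 'parent';

my $inside;
{
    # Shell equivalent: ( STAGE=child; ... )
    local $ENV{STAGE} = 'child';
    $inside = $ENV{STAGE};
}   # the previous value is restored when the block ends

print "inside: $inside, after: $ENV{STAGE}\n";
```

One of the other implications: 'local' restores variables only. A chdir inside the block, for example, is not undone when the block ends, whereas a real subshell would leave the parent's directory unchanged.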

Using $PWD is unsafe: use Cwd::getcwd

The environment variable PWD is set when the shell built-in 'cd' is called. The variable is only used by the shell and is not reliable in any other language.
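
The difference shows up as soon as the Perl script changes directory itself. A short sketch using the core Cwd module; the choice of the temporary directory as a target is arbitrary.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

my $env_before = defined $ENV{PWD} ? $ENV{PWD} : '(unset)';

# Perl's built-in chdir changes directory but does not touch $ENV{PWD}
chdir(File::Spec->tmpdir) or die "chdir failed: $!";

my $env_after = defined $ENV{PWD} ? $ENV{PWD} : '(unset)';
my $cwd       = getcwd();

print "PWD before chdir: $env_before\n";
print "PWD after chdir:  $env_after\n";
print "getcwd reports:   $cwd\n";
```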

Here documents

These work in the shell by writing a temporary file and then reading it. We use a similar method here, except the directory used is taken from the environment variable SH2P_HERE_DIR. If that is not set then the current directory is used.

The heredoc data is extracted from the script and written to a file named label.here, where 'label' is the label used to identify the heredoc. The file is read by subroutines appended to the generated script.
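
The lookup described above can be sketched as follows. These are hypothetical helpers written for illustration, not the subroutines that sh2p appends to the generated script; they assume only the label.here naming and the SH2P_HERE_DIR fallback stated in the text.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: path of the file holding the data for a heredoc
sub here_file {
    my ($label) = @_;
    my $dir = defined $ENV{SH2P_HERE_DIR}
            ? $ENV{SH2P_HERE_DIR}
            : File::Spec->curdir;      # fall back to the current directory
    return File::Spec->catfile($dir, "$label.here");
}

# Hypothetical reader: return the heredoc text as one string
sub read_here {
    my ($label) = @_;
    my $path = here_file($label);
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                          # slurp the whole file
    my $text = <$fh>;
    close $fh;
    return $text;
}

print here_file('EOF'), "\n";
```

Note that the .here files must travel with the converted script, and SH2P_HERE_DIR must point at them at runtime.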

Currently external programs reading from a heredoc require manual intervention. Heredocs embedded inside back-ticks (or $(..)) produce a mess (some shells have problems with this as well).

External programs and built-ins

It is tempting to substitute Perl built-ins for external programs like chmod, rm, and so on. However the return codes are different and require a different testing regime. Therefore these are identified by an INSPECT message.
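
The sketch below shows the difference in return conventions. To stay portable it uses the running perl ($^X) as a stand-in for an external program, and a temporary file for the built-in.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# External program: system() returns 0 on success, like the shell,
# and the exit status is in the high byte of $?
my $rc          = system($^X, '-e', 'exit 3');
my $exit_status = $? >> 8;

# Perl built-in: unlink returns the number of files removed,
# so a TRUE value means success
my ($fh, $path) = tempfile();
close $fh;
my $removed = unlink $path;

print "system: exit status $exit_status\n";
print "unlink: removed $removed file(s)\n";
```

A test such as 'if (system(...) == 0)' therefore has to become 'if (unlink(...))' when the external call is replaced, which is why the substitution is left to the user.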

Functions

Functions declared externally and loaded dynamically or via '.' will not be known. These will generally be seen as unknown commands and default to an external program, called using system or back-ticks. However, they may be declared in your script using 'autoload' (or 'typeset -fu'), which will register them as functions with sh2p.

The 'functions' alias, and 'typeset -f', will generate code to give a list of the subroutines in the main:: namespace (symbol table) at runtime. This will include any names imported from external modules, and is unconnected with those known at conversion time.
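
Listing the subroutines in main:: at runtime can be done by walking the symbol table. This is a sketch of the general technique, not the code sh2p generates; greet and cleanup are invented names.

```perl
use strict;
use warnings;

sub greet   { return 'hello' }
sub cleanup { return 1 }

# Collect every name in the main:: symbol table that has a code slot
my @names;
{
    no strict 'refs';          # symbolic lookup by name
    @names = grep { defined &{"main::$_"} } keys %main::;
}
my @subs = sort @names;

print "$_\n" for @subs;
```

Any subroutine imported into main:: by a module would appear in the list as well.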

Note that the value of $0 inside functions differs between shell versions. In sh2p $0 is retained as the name of the current run unit (program), which is the POSIX behaviour. The Bash-specific variable FUNCNAME is converted to '(caller(0))[3]'.
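
For reference, this is what the caller expression yields inside a subroutine. The name report_name is invented for the example.

```perl
use strict;
use warnings;

sub report_name {
    # Bash original would use $FUNCNAME here
    my $name = (caller(0))[3];     # fully qualified subroutine name
    return $name;
}

print report_name(), "\n";
```

Unlike FUNCNAME, the Perl value includes the package, so the result here is 'main::report_name' rather than 'report_name'.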

There are two different syntax conventions commonly used with functions:

    name () { ... }         # Bourne shell syntax
    function name { ... }   # Korn shell syntax

Bash and Korn shells support both; however, they differ in operation when it comes to variable scope. As an extension to the POSIX (and Bourne) standard, Bash and Korn shells allow local variables to be declared using typeset, declare (Bash), or local. This is mirrored in Perl by the 'my' prefix. Any variables not so declared are globals.

However, there are issues with this. Bourne/POSIX never supported local variables, so strictly the Bourne shell syntax should not support them either. Both Bash and ksh88 (the most common) do support the declaration of local variables in both forms of syntax. Ksh93 "fixed" this by not supporting it in the Bourne syntax ('typeset var' has no effect), but supporting it in the Korn shell syntax only.

Added to that, very few people know about the capability anyway!
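
The mapping from shell locals to 'my' can be sketched like this. The shell original is shown in comments; the Perl is hand-written for illustration and is not actual sh2p output.

```perl
use strict;
use warnings;

# Shell original (for reference):
#   total=0
#   function bump {
#       typeset step=10              # local to the function
#       total=$((total + step))      # global
#   }

my $total = 0;          # file scope: visible to every sub below

sub bump {
    my $step = 10;      # 'typeset step' becomes 'my $step'
    $total += $step;
    return;
}

bump() for 1 .. 2;
print "$total\n";
```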

Variable scope

If the variable is new (top-down parsing) it is declared with 'my', otherwise we assume it is a global. The current state of block nesting is tracked, so the definition of a "new" variable is one which has not been declared in this, or a higher-level, block. This can be problematic if a variable is used for the first time in an inner block.
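
The inner-block problem looks like this in Perl. Both fragments are hand-written illustrations; $result is an invented name.

```perl
use strict;
use warnings;

# Problem case: the first use of the name is inside a block, so a
# top-down conversion declares it there and it ends with the block
if (1) {
    my $result = 'ok';      # scoped to this block only
}
# $result does not exist here; under 'use strict' referring to it
# at this point would be a compile-time error

# Manual fix: hoist the declaration to the enclosing scope
my $result;
if (1) {
    $result = 'ok';
}

print "$result\n";
```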

At runtime, the shell checks the environment block for variables before creating its own. That behaviour could be converted, but would impose a considerable overhead. Therefore the environment block is not consulted.

Redirection

When redirection is done in the shell the file is opened, read or written, then closed. This is exactly what is generated by the conversion. It might not be what you want, but it at least demonstrates some of the inefficiencies of using a shell. INSPECT messages are not generated for this.
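
The pattern, and the usual hand-edit, can be sketched as below. This illustrates the open/write/close cycle and is not actual sh2p output; a temporary file stands in for the redirection target.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $log) = tempfile();
close $tmp_fh;

# Shell: echo start >> $log ; echo done >> $log
# Literal conversion: one open/close pair per redirected command
for my $msg ('start', 'done') {
    open my $out, '>>', $log or die "Cannot open $log: $!";
    print {$out} "$msg\n";
    close $out or die "Cannot close $log: $!";
}

# Hand-edited alternative: open once, write many times
open my $out, '>>', $log or die "Cannot open $log: $!";
print {$out} "$_\n" for 'again', 'finished';
close $out or die "Cannot close $log: $!";

# Read the file back to show the result, then tidy up
open my $in, '<', $log or die "Cannot open $log: $!";
my @lines = <$in>;
close $in;
unlink $log;

print @lines;
```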

Pipelines

Some assumptions are made about pipelines. If the pipeline starts with an echo or print, then the string is moved to be an argument to the command which follows. When grep follows an external program call, that call is made in back-ticks and the grep is moved to precede the call. This should ease the editing required to use the Perl grep function.
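
After editing, a 'program | grep pattern' pipeline typically ends up as the Perl grep function applied to back-tick output. In this sketch the running perl ($^X) stands in for the external program, and the pattern is invented for the example.

```perl
use strict;
use warnings;

# Shell: some_program | grep eta
# In list context back-ticks return one element per output line
my @matches = grep { /eta/ } `$^X -le "print for qw(alpha beta gamma)"`;

print @matches;
```

Each element keeps its trailing newline, so use chomp if the lines are to be compared or joined later.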

Known bugs and shortcomings (TODO):

Shortcomings

Pipelines have limited support right now. Hopefully this will improve in future releases. typeset/declare support is limited. Options are not fully supported for export, typeset/declare, touch, and tr.

$?

Currently $? is not traced back to the previous command. It is intended to implement a second pass in a future release.

unset

The assumption is made that the variable is a scalar.

Currently unsupported:

   File descriptors (exec, read -u, print -u, some redirection)
   Co-processes
   <(...) process substitution (ksh93 & Bash)
   Multiple levels in 'break' and 'continue'
   Nested 'here' documents
   Nested 'case' statements
   Options to the 'export' command
   Some tilde ~ expansions
   Redirection from if or loop statements, redirection from functions

The following built-ins are currently not converted:

                 alias    
                 bg       
                 bind     
                 builtin  
                 command  
                 eval               
                 fc       
                 fg       
                 getopts  
                 hash     
                 jobs         
                 pwd      
                 readonly     
                 time     
                 times    
                 trap       
                 ulimit   
                 umask    
                 unalias   
                 wait     
                 whence   

The following compound statements are currently unsupported: select, time

The following operators are currently unsupported: |&, &

SEE ALSO

AUTHOR

Clive Darke, <clive.darke@talk21.com>

COPYRIGHT AND LICENSE

Copyright (C) 2008 by C.B.Darke

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.0 or, at your option, any later version of Perl 5 you may have available.