Understanding shebang + eval combo from ancient perl script

In our daily development, we have a collection of scripts that help us automate mundane tasks. Most of them are written in perl and quite often, I feel ashamed to be a programmer who only knows how to use those scripts without ever looking at their source code. However, I don’t know perl by any measure, so recently I decided to take on this challenge: I quickly went through the basic aspects of the language (with the help of this nice tutorial: Learn Perl in about 2 hours 30 minutes) and dived directly into some scripts to start reading. Then I met this daunting code chunk at the very beginning of a perl script:

#!/usr/bin/perl
eval 'exec perl5 -S $0 ${1+"$@"}'
   if 0;

So, I spent some days digging and finally got this code chunk cleared up. I’ll try my best to explain it in a newbie-friendly way (because I am one of them :)).

History

Usually, people don’t need to know whether a script should be executed by Perl, the shell, or some other interpreter. In other words, they can execute the script by typing its filename, and the script itself will find the right interpreter to run it. This is usually done with a shebang. In perl, the common way to do so is to write


#!/usr/bin/perl

at the very first line of your perl script. However, not all systems support the shebang and, most likely, those systems will run your script as if it were a shell script, which, of course, makes the execution fail. In this case, we need a way to tell those systems that “even if you are running the script as a shell script, please invoke the perl interpreter to interpret the content of the script”, and that is exactly what that daunting code does. Now, let’s dive into this code chunk to see how it works.

Dive in

#!/usr/bin/perl
eval 'exec perl5 -S $0 ${1+"$@"}'
   if 0;

[1]: The first line uses a shebang to invoke the perl interpreter located under /usr/bin/, which is enough for systems that support the shebang to know which perl interpreter should run the script.

[2-3]: On a system that supports the shebang, the system already knows the script should be interpreted as perl, so lines 2-3 are treated as perl. Since a newline is just whitespace in the perl world, lines 2-3 form a single statement, eval '...' if 0;, and the eval never runs because of if 0. On a system that doesn’t support the shebang, however, the whole script is treated as a shell script, so line 1 is treated as a shell comment and ignored. Then the system continues to run lines 2-3 as shell commands. One important difference between perl and the shell (e.g., bash) is that perl sees line 3 as a continuation of line 2 (because a newline is the same as whitespace), but the shell does not: to the shell, the newline ends the command. So the shell executes line 2 on its own. For now, let’s just say that line 2, executed by the shell, tells the system “re-run the whole script under perl, and here is where to find the perl interpreter” (we will look at the details in the following section). Because exec replaces the shell process with perl, the shell never reaches line 3. So now our goal is achieved: a system that doesn’t support the shebang will use the perl interpreter specified on line 2 to re-run the whole script as a perl script. During the re-run, lines 2-3 are ignored by perl.
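
To see why the shell never trips over line 3, here is a minimal sketch in plain sh. It does not need perl at all: echo stands in for the interpreter, and the function name demo_exec_stops_shell and the temporary file are made up for this illustration.

```shell
# Hypothetical demo: run a script with the same three-line layout through sh.
demo_exec_stops_shell() {
    tmp=$(mktemp) || return 1
    # Line 1 is a comment to sh, line 2 replaces the sh process via exec,
    # and the remaining lines are never read by sh.
    cat > "$tmp" <<'EOF'
#!/usr/bin/perl
exec echo "shell replaced itself on line 2"
    if 0;
echo "never reached"
EOF
    sh "$tmp"
    rm -f "$tmp"
}

demo_exec_stops_shell
```

The if 0; line would be a syntax error for sh if it ever got that far, but exec has already swapped the shell for another program by then.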

Line 2: what the heck is this?

Now, let’s study code from line 2 in detail.

eval 'exec perl5 -S $0 ${1+"$@"}'

The eval in shell takes a string as its argument, and evaluates it as if you’d typed that string on a command line. So, the shell actually executes

exec perl5 -S $0 ${1+"$@"}

$0 is expanded by the shell to the name of the script. However, ${1+"$@"} looks quite mysterious. It works around an ancient Bourne shell bug (when no arguments are provided, "$@" expands to a single empty argument instead of nothing), and the article What does ${1+”@”} mean explains it very clearly:

The ${1+"$@"} syntax first tests if $1 is set, that is, if there is an argument at all.
If so, then this expression is replaced with the whole "$@" argument list.
If not, then it collapses to nothing instead of an empty argument.

A side note on ${1+"$@"}: it follows the ${parameter+alt_value} pattern: if parameter is set, use alt_value, else use the null string. See more on it here.
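
A small sketch may make the expansion easier to picture. The helper names count_args and forward and the variable maybe are made up for this illustration; only the ${1+"$@"} and ${parameter+alt_value} forms come from the text above.

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

# Forward the caller's arguments the same way the perl header does.
forward() { count_args ${1+"$@"}; }

forward                    # prints 0: $1 is unset, so the expansion vanishes
forward one "two words"    # prints 2: quoting inside each argument is kept
forward ""                 # prints 1: $1 is set, even though it is empty

# The general ${parameter+alt_value} pattern:
unset maybe
echo "[${maybe+alt}]"      # prints []
maybe=""
echo "[${maybe+alt}]"      # prints [alt]
```

Note that + (without a colon) only asks whether the parameter is set, which is why an explicitly passed empty argument is still forwarded.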

So now we can put all the pieces together: when the shell executes line 2, the perl program (i.e., perl5) is invoked to execute the script itself (expanded from $0) and is supplied with the argument list, which may be required by the script.
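
To preview what the shell ends up running, here is a rough sketch that swaps exec for echo, so the command line is printed instead of executed. The function preview_header and the variable script_name (which plays the role of $0) are assumptions for this illustration only.

```shell
# Hypothetical helper: print the command the header would exec.
preview_header() {
    script_name=$1    # stands in for $0
    shift             # what remains is the script's own argument list
    # Same shape as the header: eval takes a string and runs it as a command.
    eval 'echo perl5 -S $script_name ${1+"$@"}'
}

preview_header foo             # prints: perl5 -S foo
preview_header foo -v input    # prints: perl5 -S foo -v input
```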

Example

Let me give an example.

Suppose we have a perl script named foo:

#!/usr/bin/perl
eval 'exec /wsdb/oemtools/linux/bin/perl5.16.2 -S $0 ${1+"$@"}'
    if 0;

use Config;
my $perl = $Config{perlpath};

print $perl."\n";

Besides the daunting code chunk, the rest prints out the absolute path of the perl interpreter that executes our script. Now, let’s try different ways of executing our perl script foo:

$ perl foo
/usr/bin/perl
$ ./foo
/usr/bin/perl
$ sh foo
/wsdb/oemtools/linuxbin/perl5.16.2

In the first two cases, we run the perl script in the standard way. Since my system (SUSE Linux 11) supports the shebang, the script gets executed by the perl interpreter specified in the shebang line. However, if we mimic a system that doesn’t support the shebang by executing our script with the shell (i.e., sh), the script is still interpreted as a perl script, but by the perl interpreter from the eval part. Notice the use of sh here: sometimes the user of a script may assume it is written in shell and try to execute it with sh. In this case, our daunting code chunk provides a defensive mechanism that allows the perl script to be executed correctly even when it is run by the shell.

With this explanation, I don’t think the code chunk in find2perl will look daunting to you anymore:

#! /usr/bin/perl -w
    eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
        if 0; #$running_under_some_shell

Modern days

Nowadays, people rarely use that daunting code chunk solely because some systems don’t support the shebang. Increasingly, it appears when people want to use a specific version of perl (not the system default one like /usr/bin/perl) and, at the same time, maintain some portability to systems that don’t support the shebang. However, if all we want is to use a specific version of perl instead of the default one, there is more than one way to do so. My list may not be complete. Please feel free to comment below if I missed some way of invoking a customized perl.

#!/usr/bin/perl
eval 'exec /wsdb/oemtools/linux/bin/perl5.16.2 -S $0 ${1+"$@"}'
    if 0;

The first way is again our daunting code chunk. If we run our script with sh, our customized perl (i.e., /wsdb/oemtools/linux/bin/perl5.16.2) is executed. The -S option makes perl use the PATH environment variable to search for the script, because on some systems $0 doesn’t always contain the full pathname of the script. You can read more about the -S option in the perlrun doc; in fact, the daunting code chunk is explained there as well.

The second way is to put the following code at the first line of perl script:

#!/wsdb/oemtools/linux/bin/perl5.16.2

This way you directly hardcode the customized perl interpreter in your script. This may sacrifice portability of the script.

Another way to use a customized perl interpreter is to put this code chunk, again, at the first line of the script:

#!/usr/bin/env perl

This tells the system (one that understands the shebang) to find the first “perl” executable in the directories listed in $PATH. If you want to run your customized perl interpreter this way, put the path to it at the beginning of the $PATH environment variable, so that the first “perl” executable the system finds in $PATH is indeed the perl interpreter you want to use.
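
Here is a small sketch of that lookup order. It is only an illustration: the helper which_perl_with_prefix is made up, and the file named perl in the throwaway directory is an empty stand-in, not a real interpreter.

```shell
# Hypothetical helper: report which file a PATH lookup for "perl" picks
# when directory $1 is placed at the front of $PATH.
which_perl_with_prefix() {
    ( PATH="$1:$PATH"; command -v perl )
}

# Demo with a throwaway directory holding an executable stand-in named "perl".
demo_dir=$(mktemp -d)
: > "$demo_dir/perl"
chmod +x "$demo_dir/perl"
which_perl_with_prefix "$demo_dir"    # prints the stand-in's path
rm -rf "$demo_dir"
```

Because the lookup stops at the first match, whatever sits earliest in $PATH wins, which is also why this approach depends on every user’s environment being set up the same way.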

The last way to run your customized perl interpreter is somewhat similar to our daunting code chunk, but with a significant difference:

#!/bin/sh
#! -*-perl-*-
eval 'exec /wsdb/oemtools/linux/bin/perl5.16.2 -x -wS $0 ${1+"$@"}'
    if 0;

Let’s run it first to see what we get. As in the previous example section, we put the above code chunk inside a script called bar:

#!/usr/bin/sh
#!-*-perl-*-
eval 'exec /wsdb/oemtools/linux/bin/perl5.16.2 -x -wS $0 ${1+"$@"}'
    if 0;

use Config;
my $perl = $Config{perlpath};

print $perl."\n";

$ bar
/wsdb/oemtools/linuxbin/perl5.16.2
$ perl bar
/wsdb/oemtools/linuxbin/perl5.16.2
$ ./bar
/wsdb/oemtools/linuxbin/perl5.16.2
$ sh bar
/wsdb/oemtools/linuxbin/perl5.16.2

No matter how we execute our script, we always use our customized perl interpreter, even when the system perl is explicitly specified (i.e., perl bar). The significant difference from our original daunting code chunk is the use of the -x option. The -x option does the following:

tells Perl that the program is embedded in a larger chunk of unrelated text, such as in a mail message. Leading garbage will be discarded until the first line that starts with #! and contains the string “perl”. Any meaningful switches on that line will be applied.

Let me walk through what exactly happens in our case. We will also use the following information taken from perldoc during the walkthrough:

 If the #! line does not contain the word “perl” nor the word “indir”, the program named after the #! is executed instead of the Perl interpreter. This is slightly bizarre, but it helps people on machines that don’t do #! , because they can tell a program that their SHELL is /usr/bin/perl, and Perl will then dispatch the program to the correct interpreter for them.

We launch our bar script as ./bar:

1. Shell executes our script ./bar
2. The system actually executes /bin/sh ./bar because of our shebang specification.
3. sh executes /wsdb/oemtools/linux/bin/perl5.16.2 -x -wS bar
4. /wsdb/oemtools/linux/bin/perl5.16.2 skips:

#!/usr/bin/sh
#!-*-perl-*-
eval 'exec /wsdb/oemtools/linux/bin/perl5.16.2 -x -wS $0 ${1+"$@"}'
    if 0;

and executes:

use Config;
my $perl = $Config{perlpath};

print $perl."\n";

Let’s break step 4 down in further detail:

4.1 Since -x is specified, perl discards leading lines until it finds the first line that starts with #! and contains the string “perl”. The first line, #!/usr/bin/sh, is a shebang but doesn’t contain “perl”, so it is skipped rather than used to hand the script back to sh.
4.2 The second line, #!-*-perl-*-, starts with #! and contains “perl”, so perl starts reading the program from this line.
4.3 Lines 3-4 (the eval and its if 0) are parsed as one perl statement and never run because of if 0. So the execution starts with use Config; and moves forward.
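
As a rough picture of the skipping that -x performs, the sketch below imitates it with sed: it prints its input starting from the first line that begins with #! and contains “perl”. This is only an emulation of the rule quoted from perldoc, not how perl implements it, and the function name skip_to_perl_marker is made up.

```shell
# Hypothetical emulation of the -x rule: drop everything before the first
# line that starts with "#!" and contains "perl".
skip_to_perl_marker() {
    sed -n '/^#!.*perl/,$p'
}

# The header of bar, fed line by line:
printf '%s\n' '#!/usr/bin/sh' '#!-*-perl-*-' 'use Config;' | skip_to_perl_marker
# prints:
#   #!-*-perl-*-
#   use Config;
```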

Let’s try launching our bar script using perl bar to see why the system perl is not used in this case:

1. The shell executes the command perl bar
2. perl (i.e. /usr/bin/perl) executes /bin/sh bar because it sees a shebang that doesn’t contain the word “perl”
3. The eval part gets executed (i.e., sh executes
/wsdb/oemtools/linux/bin/perl5.16.2 -x -wS bar
)
4. So our script bar is executed by /wsdb/oemtools/linux/bin/perl5.16.2 instead of /usr/bin/perl

Let’s Practice

Based on what we have learned, you should not have much trouble understanding why

#!/bin/sh
eval 'exec /wsdb/oemtools/linux/bin/perl5.16.2 -wS $0 ${1+"$@"}'
    if 0;

will lead to

/bin/sh: -S: invalid option

error. The key is that we are using neither 1) a shebang line containing the string “perl” nor 2) the -x option. If you have a hard time figuring out why, here is the answer.

Thanks for reading!

Linux Fork Bomb

Today, I learned about a fun shell trick called the Linux Fork Bomb, and this is the piece of code I’m reading about:

:(){:|:&};:

Code Analysis

Let’s dive into this and have a little appreciation of the power of shell:

  • :() defines a function called :
  • {...} indicates that whatever is inside is the body of the function :
  • :|: & runs the function :, pipes its output to another call of :, and runs the whole thing in the background
  • the final : calls the function for the very first time

Essentially you are creating a function that calls itself twice every call and doesn’t have any way to terminate itself. It will keep doubling up until you run out of system resources.
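
To watch the doubling without taking the machine down, here is a bounded sketch. It is not the bomb itself: the name bounded_bomb, the generation counter passed as $1, and the use of file descriptor 3 for reporting are all my additions, and the counter is exactly the stop condition the real one-liner lacks.

```shell
# Hypothetical, bounded variant: $1 is the number of generations still allowed.
bounded_bomb() {
    if [ "$1" -le 0 ]; then
        return 0                       # the guard the real bomb does not have
    fi
    echo "spawn" >&3                   # report one call on file descriptor 3
    # Same shape as :|:& : two copies joined by a pipe, run in the background.
    bounded_bomb $(( $1 - 1 )) | bounded_bomb $(( $1 - 1 )) &
    wait                               # let the children finish before returning
}

# Three generations: 1 + 2 + 4 = 7 calls in total.
bounded_bomb 3 3>&1 | wc -l
```

Each extra generation doubles the number of new calls, so even a modest depth adds up quickly; keep the argument small when trying this.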

Some fun observation

: is used as a placeholder (a command that does nothing) in the shell. For instance, while true is the same as while :. However, the bomb may only work in bash, because : is a built-in command and in some shells the built-in : takes precedence over a function named :. In those shells, when we execute our bomb, the built-in : gets executed instead of our function, so the bomb is defused.

Here are also some insights on how to prevent a fork bomb like this. They involve RLIMIT_NPROC, which is definitely worth digging into further.

You can watch a live demo and see how powerful the linux fork bomb can be.