Wednesday, March 16, 2011

'tee' coprocess vs. 'tee' pipe in Korn shell

I had this piece of code which logged the output of a while loop to a file but also showed it on the screen:
while ...
do
...
done | tee logfile

A code change required me to set a variable in the while loop and make it known after the loop had ended, so I tried
while ...
do
   x=123
...
done | tee logfile
echo x=$x
but x was empty: in a pipeline the while loop runs in a subshell, and thus it cannot set variables in the parent process.
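A minimal sketch of the problem in plain sh. It assumes a made-up three-iteration loop body and a hypothetical log file in $TMPDIR (or /tmp). Because the loop is the first element of the pipeline, it runs in a subshell, so neither x nor the loop counter i change in the script itself.

```shell
#!/bin/sh
# Loop body and log file name are made up for illustration.
log=${TMPDIR:-/tmp}/pipe_demo.$$.log
x=
i=0
while [ $i -lt 3 ]
do
    i=`expr $i + 1`
    x=123                 # set inside the loop ...
    echo "line $i"
done | tee "$log"         # ... but the loop runs in a subshell here
echo "x=$x"               # prints "x=" because the assignment was lost
rm -f "$log"
```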

There is a solution though in Korn shell using coprocesses.
# Start a coprocess to log input to logfile via tee 
# and report it also to the current tty
(tee logfile >/dev/tty)|&

# Send output of while loop to coprocess
while ...
do
   x=123
...
done >&p
echo x=$x
will report x correctly.

This works fine for my example, where the script is always run in a terminal window.

If the script is run in the background or via cron, its (terminal) output needs to be captured; anything else does not make sense. The script writer had a reason for writing things to the terminal, surely not just for fun.
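For shells without coprocesses (e.g. a plain POSIX sh), a named pipe gives a similar effect. This is a swapped-in technique, not the ksh coprocess used above, and the loop body and file names are placeholders. tee reads from a FIFO in the background while the loop writes to the FIFO without being part of a pipeline, so x survives the loop. It copies to stdout instead of /dev/tty, so the output can also be captured when the script runs via cron, where there is no controlling terminal to open.

```shell
#!/bin/sh
# Placeholder file names; the loop body is made up for illustration.
log=${TMPDIR:-/tmp}/tee_demo.$$.log
fifo=${TMPDIR:-/tmp}/tee_demo.$$.fifo
mkfifo "$fifo" || exit 1

# tee copies everything arriving on the FIFO to the log file and to stdout.
tee "$log" < "$fifo" &
tee_pid=$!

i=0
while [ $i -lt 3 ]
do
    i=`expr $i + 1`
    x=123
    echo "line $i"
done > "$fifo"            # not a pipeline: the loop runs in this shell

wait $tee_pid             # let tee drain the FIFO before going on
rm -f "$fifo"
echo x=$x                 # prints x=123
```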

Tuesday, March 15, 2011

How to create a drop-down menu in OpenOffice.org

There are two easy ways to create a drop-down menu in OpenOffice.org Calc (this already worked in version 2.x and has been tested up to 3.2.1).

The difference between the two ways is the source of the selection entries.

Selection entries are to be entered manually

Go to a cell where the drop-down menu should appear.
Choose Data -> Validity...
In the popup window choose the Criteria tab (usually selected by default).

In the Allow menu choose List.

A new text field called Entries appears, where you enter the choices for your drop-down menu, one per line.

Selection entries come from a cell range on the spreadsheet

Go to a cell where the drop-down menu should appear.
Choose Data -> Validity...
In the popup window choose the Criteria tab (usually selected by default).

In the Allow menu choose Cell range.

A new text field called Source appears, where you enter the choices for your drop-down menu as a cell range, e.g. B2:B10.
Note: the cell range needs to be entered manually and can't be selected on the spreadsheet.

Monday, March 14, 2011

Background processes and file descriptors in shell scripts

Lately I stumbled upon an issue in a shell script which left me puzzled for a while.

Reduced to a simple example it goes like this:
imagine you have two files, wrapper.sh and script.sh, where wrapper.sh calls script.sh in backticks:
wrapper.sh
#!/bin/sh
x=`./script.sh`    # run script.sh and collect its output
echo x=$x
script.sh
#!/bin/sh
(sleep 60)&        # start a background process
echo pid=$!        # report the pid of the background process
exit 0

The expected output of wrapper.sh was x=pid=12345, more or less immediately after running it.

The unexpected behaviour I experienced was that wrapper.sh waited until the background process had finished. This defeated the purpose of the script, since in the original scenario wrapper.sh was supposed to manage the background process (e.g. send it signals) after doing some work in between.

Some experimenting with variations of the scripts and some reading finally revealed the clue to the issue.
  • Background (better: forked) processes inherit the file descriptors of their parent process
    i.e. the 'sleep' background process has the same open fds as script.sh
  • Running a command in backticks means collecting its stdout until that stdout is closed
    i.e. wrapper.sh waits until the stdout of script.sh is closed for good.
  • Since the 'sleep' background process shares the stdout of script.sh, the fd is kept open even after script.sh has finished.
    It does not matter whether 'sleep' actually writes anything or not; the point is that if it wrote something, it would write to the inherited open stdout.
    ('sleep' is just an example. In the real world it would very likely be another script with some complex tasks to fulfil).
  • The solution is to close stdout of the background process
    #!/bin/sh
    (exec >&-; sleep 60)&  # start a background process but close stdout first
    echo pid=$!            # report the pid of the background process
    exit 0
    

Some hints that explain the situation: the process table shows that wrapper.sh has a defunct sub process (the former script.sh) and that the 'sleep' process is a child of init (pid 1). Also, a slightly different sub process (echo sub; sleep 60)& leads to x=pid=12345 sub, showing that wrapper.sh gathered the output of script.sh plus the output of the sub process.
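To see the effect and the fix side by side in one file, here is a sketch that replaces script.sh with two hypothetical shell functions; a command substitution of a function collects stdout in the same way as one of a separate script. It assumes `date +%s`, which GNU and BSD date support but which is not POSIX. The first substitution should return almost immediately, the second only after the 2-second sleep.

```shell
#!/bin/sh
# keeps_stdout and closes_stdout are hypothetical stand-ins for two
# versions of script.sh.

keeps_stdout() {
    (sleep 2) &              # inherits stdout, i.e. the backtick pipe
    echo pid=$!
}

closes_stdout() {
    (exec >&-; sleep 2) &    # closes stdout before sleeping
    echo pid=$!
}

start=`date +%s`
x=`closes_stdout`
end=`date +%s`
fast=`expr $end - $start`
echo "closes_stdout: x=$x after ${fast}s"

start=`date +%s`
x=`keeps_stdout`             # waits until the sleep closes the pipe
end=`date +%s`
slow=`expr $end - $start`
echo "keeps_stdout:  x=$x after ${slow}s"
```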

I wonder how many people pay attention to this; it is an issue that can easily be overlooked. In essence, background processes in scripts like script.sh are daemons, since script.sh gives up control of the sub process by simply exiting at some point. So who controls the sub processes, and in particular, where should they write their output to? Rereading the essentials of a daemon process helps, and I will definitely pay more attention to this in the future.

An experiment for the curious:
what happens if stdout is redirected to a file and multiple sub processes are started, each writing to stdout, i.e. to the file? Would everything be written to the file? In which order?
#!/bin/sh
exec 1>/tmp/out
(for i in 1 2 3 4 5; do echo aaaaaaaa; sleep 1 ; done)&
(for i in 1 2 3 4 5; do echo bbbbbbbb; sleep 1 ; done)&
echo DONE

Friday, March 11, 2011

Traps and exit codes in shell scripts

Traps in shell scripts are a nice way to provide cleanup: first of all the removal of temporary files, but also any other kind of do-at-the-end tasks (see also the END blocks in awk and Perl).

So for general cleanup one would set a trap for signal 0, which isn't a signal but the indication that the script exited normally with exit code 0.
Other signals might need something in addition to the cleanup and would be caught by traps of their own.
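A small sketch of such a cleanup trap. It assumes a child `sh -c` plays the role of the script and uses a made-up temp file name under /tmp; the child reports the file name so the caller can check afterwards that the trap on 0 removed it when the child ended normally.

```shell
#!/bin/sh
# The child shell stands in for a script with a cleanup trap; the temp
# file name is made up for illustration.
tmpname=`sh -c '
    tmp=/tmp/cleanup_demo.$$
    trap "rm -f $tmp" 0      # cleanup, runs when this shell exits
    echo data > $tmp         # create the temporary file
    echo $tmp                # report its name to the caller
'`
if [ -f "$tmpname" ] ; then
    echo "$tmpname still exists"
else
    echo "$tmpname was removed by the trap"
fi
```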

The questions about signals to be discussed in this article:
  • How to ignore them: sometimes one does not want a script to be interrupted by certain signals
  • How to catch them, react accordingly and exit
  • If exiting after a signal: which exit code should be used
Assuming that we have a handler for exit 0 (when the script ends normally), there should also be a handler for signal INT. I want to discuss various setups and what happens after the script has received signal INT.
Catch signal INT (Ctrl-C) and ignore it
#!/bin/sh
trap "echo exiting" 0
trap "echo got INT" INT
...
This script
  • will echo 'got INT' and
  • will resume its operation and will not end

Catch signal INT (Ctrl-C) and exit
#!/bin/sh
trap "echo exiting" 0
trap "echo got INT ; exit 1" INT
...
This script
  • will echo 'got INT' and
  • will exit with exit code 1 to indicate that this was not a normal ending.
    Instead of 1 there could be any number.

There is a special case of the script above if one chooses to exit with 0 after catching a signal.
#!/bin/sh
trap "echo exiting" 0
trap "echo got INT ; exit 0" INT
...
This script
  • will echo 'got INT' and
  • will echo 'exiting' and
  • will exit with exit code 0.
Instead of 'echo ...' there should be some real action in a production script of course.

So if one decides that a signal should not be ignored, there is one big question to be answered: does the observer of the script (a calling script or a user) need to know that the script ended due to a signal, and which signal in particular? This question should be answered keeping in mind that scripts often exit with small exit codes when something goes wrong somewhere in the script.

  • All (or many) signals are mapped to the same non-zero exit code
    There is little room for variation here. The exit code could be any number. If the script uses a small number (e.g. exit code 1), that might be indistinguishable from other error-induced exit codes in the script. Alternatively one could use a high number (greater than 128) to distinguish endings caused by a signal from other endings in the script. But by mapping all signals to one exit code, the script does not give its observer a chance to find out exactly which signal led to its end (this can of course be a deliberate design decision).
    trap "echo got SIGNAL; exit 1" INT QUIT TERM
    (the message 'got SIGNAL' could be used to distinguish this type of exit from other exit 1 reasons in the script)
    or
    trap "echo got SIGNAL; exit 129" INT QUIT TERM

  • Signals should be distinguished from each other, i.e. mapped to different exit codes
    Same exit code:
    trap "echo got INT; exit 1" INT
    trap "echo got TERM; exit 1" TERM

    All signals lead to the same exit code.
    The echo statement is a differentiator but is probably not present in a real-life script.

    Different exit codes (small numbers):
    trap "echo got INT; exit 1" INT
    trap "echo got TERM; exit 2" TERM

    Here signals lead to different exit codes.
    Issue: they are probably not distinguishable from other points of exit in the script.

    Different exit codes (high numbers):
    trap "echo got INT; exit 130" INT
    trap "echo got TERM; exit 143" TERM

    In addition to being different, the exit codes have been set with the formula 128+signal_number (INT is signal 2, TERM is signal 15), which follows the convention of sh.


So if you are interested in capturing signals in scripts, ending the script, and also getting a meaningful exit code that tells you which signal ended it, then
  • capture each signal individually
  • explicitly put an 'exit n' into the signal handler
  • choose n to be 128+signal_number

This way a calling script can differentiate:

script.sh
ex=$?
if [ $ex -eq 0 ] ; then
  : # All ok
elif [ $ex -lt 128 ] ; then
  : # An error occurred in the script
else
  : # Script ended due to signal $ex-128
fi
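A self-contained sketch of both sides. A child `sh -c` stands in for script.sh and signals itself with TERM to simulate being killed; TERM is used rather than INT because INT may already be ignored when a script runs in the background. run_child and classify are hypothetical names, and the child's ok/error/term modes are made up.

```shell
#!/bin/sh
# run_child: hypothetical stand-in for script.sh. $1 selects the ending.
run_child() {
    sh -c '
        trap "echo got TERM >&2; exit 143" TERM   # 128 + 15
        case "$1" in
            ok)    exit 0 ;;
            error) exit 1 ;;                      # ordinary error exit
            term)  kill -TERM $$ ;;               # signal ourselves
        esac
        exit 0
    ' child "$1"
}

# classify: the caller-side decoding described above.
classify() {
    ex=$1
    if [ $ex -eq 0 ] ; then
        echo "All ok"
    elif [ $ex -lt 128 ] ; then
        echo "An error occurred in the script (exit code $ex)"
    else
        echo "Script ended due to signal `expr $ex - 128`"
    fi
}

for mode in ok error term
do
    run_child $mode
    classify $?
done
```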

Thursday, March 10, 2011

Counting lines in shell variables

Very often one stores the output of a command in a variable. Sometimes the output is a multiline string and one wants to count its lines (all of the below is in Bourne shell).

Example:
A=`who`
NUM_USERS=`echo "$A"|wc -l`
echo "Number of users: $NUM_USERS"
Unfortunately this approach is not correct.

If there are users on the system it works fine and the correct count is reported.
But if there are no users then the code above still reports a user count of 1.

Why?
Because echo of an empty variable still outputs a newline, which is counted by wc. Look at this example (assuming that you don't have a variable called avTyh):
echo "$avTyh" |wc -l
1

Replacing echo by printf does not help much.
printf '%s' "$A" | wc -l
works correctly for empty variables, but it counts one line too few for a multiline string, since the output lacks a final newline and wc -l counts newline characters.

Rather than using an if ... else ... construct, there is a more elegant solution. Look at this:
printf "%s${A:+\n}" "$A" | wc -l
The parameter substitution ${A:+\n} achieves the following:
If $A is set and non-null, substitute \n (which printf turns into a newline); otherwise substitute nothing.
So if $A is empty, the format is just %s with an empty argument: printf outputs nothing, which counts as zero lines.
If $A is non-empty, printf outputs its value plus a final newline.
Passing $A as the argument to %s, rather than as the format string itself, keeps any % or \ characters in the value from being interpreted by printf.
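The same ${...:+\n} trick wrapped into a small function, as a sketch. count_lines is a hypothetical name; the value is passed as an argument to %s rather than used as the format string, so characters like % in the data are not interpreted by printf. Note that wc may pad its number with spaces on some systems.

```shell
#!/bin/sh
# count_lines: hypothetical helper, prints the number of lines in $1
# (0 for an empty value).
count_lines() {
    # Append \n to the format only when the value is non-empty.
    printf "%s${1:+\n}" "$1" | wc -l
}

A=`who`
NUM_USERS=`count_lines "$A"`
echo "Number of users: $NUM_USERS"

empty=""
two="first
second"
count_lines "$empty"      # counts 0 lines
count_lines "$two"        # counts 2 lines
```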

Note: in the end the strange thing is that in an assignment like A=`who` the string is missing a final newline, which causes the issue in the first place. This is standard shell behaviour: command substitution removes trailing newlines from the command's output.

Friday, March 4, 2011

Signal handling in shell background processes

This article is about my learning experience in signal handling and monitoring sub processes.

Yesterday I got puzzled when a supposedly simple test program did not act as expected.
The shell script trap.sh below sets a trap to catch SIGINT (signal 2) and exit upon receiving it. It works as expected when run standalone, but it failed when invoked as a background process from another test script, trapWrapper.sh.

trap.sh
#!/bin/sh
# Catch signal 2
trap "echo trapped 2;I=1" 2
# Wait until var 'I' is set to something
while : ; do
  sleep 1; [ -n "$I" ] && break
done
echo DONE

trapWrapper.sh
#!/bin/sh
# Run trap.sh in the background
./trap.sh 2>&1  &
pid=$!
# Sleep 5 seconds
sleep 5
# ... and then kill the background process
kill -2 $pid
# Wait
wait $pid
# Exit with exit code of background process
exit $?

Running trapWrapper.sh will wait forever and never end.
When killed with Ctrl-C it goes away, but the trap.sh
process is left behind and needs to be killed manually.
(the bigger idea behind all this is to have a monitoring script which starts a number of background processes and kills them after a certain timeout period has passed).

So what's the difference when run in background?

The sh man page has the answer:

man sh
...
  Signals
     The INTERRUPT and QUIT signals for an  invoked  command  are
     ignored if the command is followed by &. Otherwise, signals
     have the values inherited by the shell from its parent, with
     the  exception  of  signal 11 (but see also the trap command
     below).

i.e. SIGINT in a background process is ignored (as well as SIGQUIT).

SIGTERM is not mentioned here, so the next idea is to enhance trap.sh by adding a signal handler for it, and to change trapWrapper.sh so that it sends SIGTERM to the background process.

trap.sh
#!/bin/sh
# Catch signal 2 (INT) and 15 (TERM)
trap "echo trapped 2;I=1" 2
trap "echo trapped 15;I=1" 15
# Wait until var 'I' is set to something
while : ; do
  sleep 1; [ -n "$I" ] && break
done
echo DONE

trapWrapper.sh
#!/bin/sh
# Run trap.sh in the background
./trap.sh 2>&1  &
pid=$!
# Sleep 5 seconds
sleep 5
# ... and then kill the background process
kill -15 $pid
# Wait
wait $pid
# Exit with exit code of background process
exit $?

After 5 seconds this results in what we wanted:
trapped 15
DONE

Something to remember: the supposedly stronger kill with SIGINT (SIGQUIT would be the same) does not work because the signal is ignored, whereas SIGTERM works fine.

So if you write a script which should act upon SIGINT or SIGQUIT, let it also act upon SIGTERM, just to be safe.
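A sketch to reproduce this in a single file. It assumes an inline `sh -c` child stands in for trap.sh (with a 10-second safety limit so nothing is left behind) and that the sketch itself runs in a non-interactive shell without job control. The child is started with &, so SIGINT is ignored on entry and, per POSIX, a non-interactive shell cannot trap a signal that was ignored on entry; SIGTERM still reaches the TERM trap.

```shell
#!/bin/sh
# Inline child standing in for trap.sh; gives up by itself after ~10s.
sh -c '
    trap "echo trapped 2;  exit 130" INT
    trap "echo trapped 15; exit 143" TERM
    i=0
    while [ $i -lt 10 ] ; do
        sleep 1
        i=`expr $i + 1`
    done
    echo "timed out"
    exit 1
' &
pid=$!

sleep 1
kill -2 $pid                   # ignored: INT was SIG_IGN on entry
sleep 2
if kill -0 $pid 2>/dev/null ; then
    int_result="still running after SIGINT"
else
    int_result="ended after SIGINT"
fi
echo "$int_result"

kill -15 $pid                  # caught by the TERM trap
wait $pid
term_status=$?
echo "exit code after SIGTERM: $term_status"
```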

Note: this might be different in other shells. When you test this interactively you'll see the difference:

sh: the background process ignores the signal
$ trap.sh&
8379
$ ptree 8379
    8360  sh
      8379  /bin/sh ./trap.sh
        8385  sleep 1
$ kill -2 8379
$ ptree 8379
    8360  sh
      8379  /bin/sh ./trap.sh
        8785  sleep 1
$ kill 8379
$ trapped 15
DONE

csh: the background process accepts SIGINT and exits
% trap.sh&
[1] 8116
% ptree 8116
    33465 -csh
      8116  /bin/sh trap.sh
        8207  sleep 1
% kill -2 8116
% trapped 2
DONE
In case you've wondered about ptree: this was tested on a Solaris box.