Friday, December 17, 2010

Rsync with 3 machines

In this article I will describe how to use rsync to keep a set of files in sync on three servers where the same list of files is maintained by users, i.e. a change to a file could happen on any machine (this is different from a hierarchical sync where files are pushed from a master server to a set of other machines).

Rsync at your company

I assume that rsync is installed on all three servers, though not necessarily in the same location; the rsync binary needs to exist on both the local and the remote machines which talk to each other.
My test case was two Solaris SPARC servers (they had rsync pre-installed) and one Solaris x86 server where I had to add it manually (I got it from a sunfreeware site and put it into /tools/rsync/bin/rsync, but I leave that to the user).

Rsync mirror design

In order to have proper control of what is synced, the main rsync process should run on one machine only and perform the following steps.

Assume you have 3 machines A, B, C.
  • A->B: update files on machine B which are newer on machine A.
  • B->A: update files on machine A which are newer on machine B.
At this stage A and B are in sync. Now do the same with machine C.
  • A->C: update files on machine C which are newer on machine A.
  • C->A: update files on machine A which are newer on machine C.
Now A and C are in sync but it could be that B is missing some updates from C so do:
  • A->B: update files on machine B which are now newer on A (and came from C)
(of course this is not a 100% safe algorithm since the whole sync scenario is not atomic: after syncing A and B and while A and C are syncing there could be a change happening on B which is not picked up in this round)
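The five passes above can be sketched as a small driver script. This is only an illustration: the host names serverB/serverC and the file list name are placeholders, and each command is echoed rather than executed (drop the echo wrapper on a live system):

```shell
#!/bin/sh
# Placeholder hosts and paths; 'run' echoes the command instead of
# executing it so the pass order can be inspected safely.
RSYNC="rsync --archive --update --rsh=ssh --files-from=rsync.files"
DIR=/app/foo/conf
B=serverB C=serverC

run() { echo "$RSYNC $1 $2"; }

run $DIR/ $B:$DIR    # A->B
run $B:$DIR/ $DIR    # B->A: now A and B are in sync
run $DIR/ $C:$DIR    # A->C
run $C:$DIR/ $DIR    # C->A: now A and C are in sync
run $DIR/ $B:$DIR    # A->B again: forward what came from C
```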

Here is the basic rsync command:
rsync --archive --update --verbose --stats --rsh=ssh --cvs-exclude 
        --files-from=somefile /app/foo/conf/ server2:/app/foo/conf
The --archive option is a summary option (recursion and preserving everything like timestamps, symbolic links etc. except hard links).
The --update option tells rsync to skip files which are newer on the target system.
The --verbose and --stats options are for reporting only.
The --rsh=ssh option means to use ssh as the login mechanism to the remote system.
The --cvs-exclude option excludes all CVS-related files from being checked. It is basically a filter for certain file names and file extensions.
The --files-from=somefile option names a file which contains all filenames to be checked (nothing else will be).
The first argument is the source directory to be checked and the second argument is the target machine server2 and the target directory (the two directories could differ: rsync does not require that /app/foo/conf is in the same place on both machines).

There are a couple of noteworthy additional options:
--rsync-path=/tools/rsync/bin/rsync tells rsync where to find rsync on the remote system.
--dry-run tells rsync to do a check only but not do a real file transfer.

Note also that if 2 files are equal but have different time stamps rsync will update the time stamps so that they are in sync.

Solution

The above design has been implemented with
  • the script is /tools/rsync/scripts/rsync.sh on server A
  • the list of files to be checked is in /tools/rsync/scripts/rsync.files (a list of config files)
  • password-free ssh access from machine A to the remote machines has been enabled by adding the public key (.ssh/id_rsa.pub) to each remote machine's .ssh/authorized_keys file

Cron (UNIX tool to run regular jobs)

There is a simple cron job on machine A:
7,17,27,37,47,57 * * * * cd /tools/rsync/scripts; ./rsync.sh | /usr/ucb/mail -s "rsync `date`"  foo@Bar.COM
i.e. it runs every 10 minutes; notification is via email to 'foo' (this could be improved).

Password free ssh

In order for the rsync hosts to communicate via ssh without a password, one needs to generate a public/private key pair on the central machine and add the public key to the remote machines.
  • On machine A: generate key files .ssh/id_rsa and .ssh/id_rsa.pub
ssh-keygen -t rsa        
Generating public/private rsa key pair.
Enter file in which to save the key (/app/foo/.ssh/id_rsa):  [Enter return]
Enter passphrase (empty for no passphrase):  [Enter return]
Enter same passphrase again:  [Enter return]
Your identification has been saved in /app/foo/.ssh/id_rsa.
Your public key has been saved in /app/foo/.ssh/id_rsa.pub.
The key fingerprint is:
24:ab:31:1e:f1:74:16:4d:0f:8e:70:19:1b:31:2e:db foo@machineA
  • Check the public key (this is one line which I wrapped for readability)
cat $HOME/.ssh/id_rsa.pub
ssh-rsa AABAB3NzaC2yc2EAAAABIwCAAIEAvpzxLumVmSRPKmgwSk9NGPUDcxfFpypUAdi3UGpZ2QSqoak
QaDQyp4RPVoLA2gADjW3Y132TJZLEBCmBaX7A588XGg/svXuCnXXXuRYL0wwO8iRCleCO50mzNfY4XcOxM
P62JIVdlDOMsnY/eSYpK+ex/9RomVRa/bMw9b/D/e0= foo@machineA

  • Enter this line into .ssh/authorized_keys on remote machines (B and C)
(Note: the keys above are fake, just in case someone wonders)
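The key distribution itself can be scripted. In the sketch below the host names are placeholders and push_key only prints the command, so it is safe to run as-is; drop the echo to execute for real:

```shell
#!/bin/sh
# machineB/machineC are placeholders; push_key prints the command that
# would append the public key to the remote authorized_keys file.
push_key() {
    echo "cat \$HOME/.ssh/id_rsa.pub | ssh $1 'cat >> ~/.ssh/authorized_keys'"
}
for host in machineB machineC; do
    push_key "$host"
done
```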

This scheme works nicely for systems where there aren't that many changes happening at the same time.
One can easily envision that if two users change the same file at the same time on machines B and C, the subsequent rsync will first copy B's version to A, then C's version to A, and then forward A's version (which is now equal to C's) to B, thus wiping out the original change on B, i.e. this scheme does not guarantee data consistency. I am using it to maintain certain config files where there are changes only once or twice per week and only a handful of users have access to the files.
There is also the issue of files being synced while updated by users at the same time. One would need much more clever file-locking-across-multiple-systems approaches to tackle this.


See also Rsync man page (lots of options)

Thursday, December 16, 2010

Gnuplot - Indexed Datafiles

If you don't want to plot all lines in a datafile, gnuplot has a feature to achieve that called indexed datafiles. It requires a certain datafile format though and is not always applicable.

Here I present an easy example of how to use indexed datafiles.

The main idea is that you want to split the rows in a datafile into chunks and plot only one or more chunks of these. The - probably not so easy to fulfill - requirement is that data chunks are separated by two empty lines. The chunks are indexed (starting at 0, thus the name indexed datafile) and can be used in a plot statement as described further down.

Consider this example (call it indexed.dat):
20
3
17


34
40
25


14
10
15

The gnuplot code and its generated graphs


set term png medium size 200,200

set yrange [0:*]
unset key

# generate a graph from the first chunk of data
set output 'indexed1.png'
set title 'first data set'
plot 'indexed.dat' index 0 using 1 with lines

# generate a graph from the second chunk of data
set output 'indexed2.png'
set title 'second data set'
plot 'indexed.dat' index 1 using 1 with lines

# generate a combined graph from the second 
# and third chunk of data
set output 'indexed3.png'
set title "second and third\n data set"
plot 'indexed.dat' index 1:2 using 1 with lines


Gnuplot - Stacked Histograms

Since gnuplot cannot generate pie charts, stacked histograms are an alternative.
In fact stacked histograms are even better in my mind, since one can put them next to each other and this allows better comparability than looking at a number of pie charts.
A single pie chart might make sense, but in reality the question is more often how the current chart compares to a previous one.

Here I present an easy example of how to generate stacked histograms (available in gnuplot since version 4.1).
For fancier examples go to the Gnuplot histogram demos.

Consider this example (call it stackedhisto.dat):
year foo bar rest
1900 20 10 20
2000 20 30 10
2100 20 10 10
We have 1 row with header information and 3 rows of data.
For each year we have measured 3 values foo, bar and rest which we want to show in graphs in two different ways.

The first graph shows the stacked histogram with the nominal values of the data i.e. the height of the first bar is 50 (=20+10+20).

The second graph shows the percentage distribution i.e. all values are scaled to 100.
The same nominal '20' in graph 1 leads to percentages 40, 33.3 and 50 in graph 2.
One box of this type of graph is often depicted as a pie chart, so rather than comparing 3 pie charts (one for each year) here we have 3 boxes in one graph, which is much easier to compare.
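The percentages quoted above can be verified with a quick awk pass over the data file, doing by hand what the plot expressions do inline:

```shell
#!/bin/sh
# Recompute each row of stackedhisto.dat as percentages of its row total.
cd "$(mktemp -d)"
cat > stackedhisto.dat <<'EOF'
year foo bar rest
1900 20 10 20
2000 20 30 10
2100 20 10 10
EOF
awk 'NR > 1 { t = $2+$3+$4
              printf "%s %.1f %.1f %.1f\n", $1, 100*$2/t, 100*$3/t, 100*$4/t
            }' stackedhisto.dat
# -> 1900 40.0 20.0 40.0
# -> 2000 33.3 50.0 16.7
# -> 2100 50.0 25.0 25.0
```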

The gnuplot code

#
# Stacked histograms
#
set term png size 300,300
set output 'stackedhisto.png'
set title "Stacked histogram\nTotals"

# Where to put the legend
# and what it should contain
set key invert reverse Left outside
set key autotitle columnheader

set yrange [0:100]
set ylabel "total"

# Define plot style 'stacked histogram'
# with additional settings
set style data histogram
set style histogram rowstacked
set style fill solid border -1
set boxwidth 0.75

# We are plotting columns 2, 3 and 4 as y-values,
# the x-ticks are coming from column 1
plot 'stackedhisto.dat' using 2:xtic(1) \
    ,'' using 3 \
    ,'' using 4


# New graph
# We keep the settings from above except:
set output 'stackedhisto1.png'
set title "Stacked histogram\n% totals"
set ylabel "% of total"

# We are plotting columns 2, 3 and 4 as y-values,
# the x-ticks are coming from column 1
# Additionally to the graph above we need to specify
# the titles via 't 2' aso.
plot 'stackedhisto.dat' using (100*$2/($2+$3+$4)):xtic(1) t 2\
    ,'' using (100*$3/($2+$3+$4)) t 3\
    ,'' using (100*$4/($2+$3+$4)) t 4

The generated graphs

Fast setup of a simple MySQL database on Solaris

This assumes that the MySQL software is installed already on your machine e.g. in /usr/sfw/bin on Solaris 10.

The recipe below will create a simple MySQL db on your machine in no time. Of course it will only be accessible on your machine (the socket will be local) and the data sit in your home directory, but it suffices to get a simple db to play with.

The initial setup is easy.
  • Choose a directory e.g. $HOME/mydata where to put the db.
  • Choose a socket e.g. /tmp/my.$USER.sock for the server/client connection.

In a Bourne or Korn shell these four commands get the db going:
DATADIR=$HOME/mydata; export DATADIR
MYSQL_UNIX_PORT=/tmp/my.$USER.sock; export MYSQL_UNIX_PORT
/usr/sfw/bin/mysql_install_db --datadir=$DATADIR
/usr/sfw/sbin/mysqld --datadir=$DATADIR &
and then run the following to connect to the db (like isql in Sybase):
/usr/sfw/bin/mysql
This will get you a prompt; do
use test;
which will connect you to the 'test' db and
create table aa (bb int);
will create a table 'aa' with one column 'bb' of type integer.

For more details use the online reference http://dev.mysql.com/doc/refman/5.0/en/

Have fun; MySQL has a huge number of tuning possibilities, though not necessarily for the beginner or for simple small dbs.

Note: this article was written in 2007 when MySQL 5.0 was hot. I haven't checked with later releases, in particular after Oracle took over Sun Microsystems, which had acquired MySQL earlier.

Wildcard subtleties in C- and Bourne shell

The idea for this article came when I encountered what I first thought was strange behaviour of a hanging nawk command; later I found out that I had stumbled upon a well documented feature of the C shell which I had not been aware of.
Here is the allegedly hanging command: nawk -f e*.awk aa* where e*.awk was supposed to match the awk script examples_bynumber.awk via wildcard and aa* was a typo; there were no files of that name.

The issue came up when I was trying to use multiple wildcards in one command where only some are matching. Here are the differences by shell.

Csh

When csh sees something like a* b* it tries to expand the patterns to filenames.
If at least one match can be found then a* b* expands to a non-empty string and the command is built successfully.
If there is neither a file beginning with a nor one beginning with b then the expanded string is empty and csh responds with No match.

Here is a set of possible translations for an nawk command executing an awk script e.awk where there are no files beginning with aa.
nawk -f e.awk aa    -> nawk -f e.awk aa  
    nawk is executed and will report an error about 'aa'
nawk -f e.awk aa*   -> No match          
    csh reports the error since aa* is empty
nawk -f e*.awk aa   -> nawk -f e.awk aa 
    the wildcard match 'e*.awk' is successful
    and nawk is executed and will report an error about 'aa'
nawk -f e*.awk aa*  -> nawk -f e.awk
    the wildcard match 'e*.awk aa*' is successfully translated to e.awk
    and nawk is executed waiting for stdin, thus seemingly 'hanging'; this was my case
and another variation if no e*.awk scripts exist but a file aa does:
nawk -f e*.awk aa*  -> nawk -f aa
    the wildcard match 'e*.awk aa*' is successfully translated to aa 
    and nawk is trying to execute aa as an awk script

Sh, ksh

In contrast, the Bourne and Korn shells leave a wildcard parameter as is if it cannot be expanded, and they do not report pattern mismatches (you can see this via truss -a).
nawk -f e.awk aa    -> nawk -f e.awk aa 
nawk -f e.awk aa*   -> nawk -f e.awk aa*

nawk -f e*.awk aa   -> nawk -f e.awk aa 
nawk -f e*.awk aa*  -> nawk -f e.awk aa*   

and again if aa exists but no e*.awk scripts:
nawk -f e*.awk aa*  -> nawk -f e*.awk aa   
i.e. nawk is always invoked and left to handle the issue of existing files.
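The sh/ksh behaviour is easy to verify in a scratch directory, with echo standing in for nawk:

```shell
#!/bin/sh
# An unmatched pattern is passed through literally in sh/ksh.
cd "$(mktemp -d)"
touch e.awk
set -- e*.awk aa*   # e*.awk matches the file, aa* matches nothing
echo "$@"           # -> e.awk aa*
```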

Conclusions

Wildcard handling differs vastly by shell type (I have not tested others like bash, zsh or tcsh) and should be used with caution.
In csh, if you imagine a command like some_command -f a* -m b* c* then this can lead to quite different executions depending on which files exist.
If there is no file beginning with a then the command will fail (-f being called without a parameter).
If there are two files beginning with b (say b1 and b2) then b1 will be passed as a parameter to -m and b2 will be part of the file list.

People who use wildcards frequently should be aware of this possible trap and treat them more cautiously (advice to myself :-) )

Arrays in nawk: test for existence and implicit creation of elements

When using arrays in nawk I still stumble over handling the test for existence wrongly.

Consider this example:

nawk '{for(i in u) print "1",i,u[i]; if(u["ab"]!="") print "ab exists"; for(i in u) print "2",i,u[i]}'

This script should print all entries in array u, then check whether u["ab"] is non-empty, and again print all entries in array u.

Since there is no obvious assignment to u[] the assumption is that neither of the loops will print anything, so it is somewhat unexpected that the first line of input triggers a line of output:

2 ab
i.e. the second loop finds an array element ab. Why?

Because the test u["ab"]!="" implicitly created the array entry u["ab"] with an empty content.

The correct way to test for existence, which does not create an entry implicitly, is this:
nawk '{for(i in u) print "1",i,u[i]; if("ab" in u) print "ab exists"; for(i in u) print "2",i,u[i]}'

Rule: never test with array[ind]=="". Always use ind in array.
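Both behaviours can be seen in one short run; plain awk is used here instead of nawk (same semantics as far as I know):

```shell
#!/bin/sh
# The comparison against "" silently creates u["ab"];
# the 'in' operator tests u["cd"] without creating it.
out=$(echo x | awk '{
    if (u["ab"] != "") print "never reached"
    r = ("ab" in u) ? "yes" : "no"; print "ab in u: " r
    if ("cd" in u) print "never reached"
    r = ("cd" in u) ? "yes" : "no"; print "cd in u: " r
}')
echo "$out"
# -> ab in u: yes
# -> cd in u: no
```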

Replace 'grep ...| awk ' by awk pattern matching

I've seen so many occurrences of 'grep ...|awk' in my life, with people missing out on awk's pattern matching capabilities, that I decided to blog about it in what will eventually become a scripting best practices series.

Very often people are tempted to use constructs like this:

grep somepattern filename | awk '{dosomething}'
e.g. find all lines starting with the digit 1
grep '^1' /etc/hosts | awk '{print $2}'

These are two processes connected with a pipe, and they can be simplified to just one awk process:

awk '/^1/ {print $2}' /etc/hosts

It makes even more sense if there are multiple greps in the pipe.

grep '^1' /etc/hosts | grep -v localhost | awk '{print $2}'
vs.
awk '/^1/ && !/localhost/ {print $2}' /etc/hosts
i.e. combining pattern matching with logical expressions is a useful construct.

This example also showed that an 'if' clause in awk can often be written more concisely as a pattern match.
The example above is nicer than the equivalent code

awk '/^1/ { if($2!="localhost") print $2}' /etc/hosts
though admittedly the two are not exactly equivalent:
the pattern version rejects any line containing the string 'localhost'
whereas the 'if' version rejects lines where the second field is equal to 'localhost'.
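The difference is easy to demonstrate with made-up sample data containing a 'localhost-backup' entry:

```shell
#!/bin/sh
# Sample data is invented for the demonstration.
cd "$(mktemp -d)"
cat > hosts.sample <<'EOF'
127.0.0.1   localhost
10.0.0.2    localhost-backup
10.0.0.3    web1
EOF
awk '/^1/ && !/localhost/ {print $2}' hosts.sample            # -> web1
awk '/^1/ { if ($2 != "localhost") print $2 }' hosts.sample   # -> localhost-backup, web1
```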

sed scripts

I always wanted to use complex sed scripts in parts of my work. I wrote some but always ended up replacing them with something else (mostly awk), so maybe this page is superfluous, but after having spent some time on it I wanted to keep a record.

sed usage in scripts is most of the time reduced to these cases:

  • simple substitution (one instance or whole line): 's/.../.../' or 's/.../.../g'
  • deletion of ranges or special matches eg. empty lines: '/^$/d'
  • multiple substitutions in sequential order: -e 's/.../.../' -e 's/.../.../'
  • capture partial strings: 's/...\(...\).../\1/'
  • output only matching lines: sed -n '....p'
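One runnable line per bullet (the patterns themselves are made up):

```shell
#!/bin/sh
echo "hello world world" | sed 's/world/there/'     # first instance only -> hello there world
printf 'a\n\nb\n' | sed '/^$/d'                     # delete empty lines -> a, b
echo "ab" | sed -e 's/a/x/' -e 's/b/y/'             # sequential substitutions -> xy
echo "foo=42;" | sed 's/.*=\([0-9]*\);/\1/'         # capture partial string -> 42
printf 'one\ntwo\n' | sed -n '/two/p'               # only matching lines -> two
```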

One can do much more complex things with sed by invoking its concept of hold space and pattern space, which are basically two internal buffers to store (multiple) lines.
The usage is a bit tricky though and thus has not found widespread use (no offense to dedicated sed scripters out there, but in my 20 years in the UNIX world I've seen a lot of scripts and very rare occurrences of this).

Imagine the following example:
you have the output of timex in a file with some text before and some text after.
You want to put a start tag just before the timex output and an ending tag right after.

some text might appear here
real        32.04
user        0.14
sys         0.22
... and some more text there

Your script should achieve the following output:

<start_timex>
real        32.04
user        0.14
sys         0.22
<end_timex>
some text might appear here
... and some more text there
The script below does it and it got complex because I wanted to handle a lot of cases:
  • there might be zero, one or multiple lines before the timex output
  • there might be zero, one or multiple lines after the timex output (especially the case of 'sys' being the last line made the script very complex and basically unreadable)
  • there might be one or many sections of timex output with arbitrary text in between which all should appear (separately tagged) at the beginning
i.e.
some text might appear here
real        32.04
user        0.14
sys         0.22
and something in between
real        5.00
user        0.01
sys         0.01
... and some more text there
should end up like this
<start_timex>
real        32.04
user        0.14
sys         0.22
<end_timex>
<start_timex>
real        5.00
user        0.01
sys         0.01
<end_timex>
some text might appear here
and something in between
... and some more text there

This script does it (and it still has one bug: when there is no surrounding text at all it will add an empty line to the bottom of the output, which is due to the last 'p' statement in the 'sys' block).

sed -n '
# We call sed with '-n' so that we control exactly what we print
/^real/ {
        # print start tag 
        i\
<start_timex>
        # print current line (in pattern space) and go to next line of input
        p
        d
}
/^user/ {
        # print current line (in pattern space) and go to next line of input
        p
        d
}
/^sys/ {
        # print current line (in pattern space)
        p
        # print end tag
        i\
<end_timex>
     $ { # Check if 'sys' is the last line
        # Exchange hold space and pattern space, there might be something left in hold space
        x
        # remove first newline in ex-hold space
        s/^\n//
        # print pattern space
        p
     }
     # Go to next line of input
     d
}
$ { # Last line
    # at the end check if something is still in hold space and print that too
    # Append last line to hold space
    H
    # Exchange hold space and pattern space
    x
    # remove first newline in ex-hold space
    s/^\n//
    # print pattern space
    p
    # Go to next line, there is none ie. end the program here
    d
}
{       # This action is processed for each(!) line (thus we needed to delete the previous matches)
        # add pattern space to hold space where it will stay until we call it back
        H
        # delete pattern space and start next cycle
        d
}'  filename

Compare this to the following nawk script:

/^real/   { print "<start_timex>"; print; next }
/^user/   { print; next }
/^sys/    {print; print "<end_timex>"; next }
{  # capture non-timex lines in buffer 'text'

   if(text=="") text=$0;
   else text=text "\n" $0;
}
END { if(text!="") print text }
This is - in my view - more readable and easier to understand, mainly because of the better control structures.

The hold and pattern space management (appending, exchanging, the automatic addition of newlines etc.) is confusing and adds so little value that there is no real reason to learn it, in my view.
Don't get me wrong: I use sed all the time with the usages outlined above. But anything more complex like rearranging lines should be left to other tools.
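The nawk version can be exercised end to end in a scratch directory (awk is used in place of nawk here; the script and input are the ones from above):

```shell
#!/bin/sh
cd "$(mktemp -d)"
cat > tag_timex.awk <<'EOF'
/^real/   { print "<start_timex>"; print; next }
/^user/   { print; next }
/^sys/    { print; print "<end_timex>"; next }
{  # capture non-timex lines in buffer 'text'
   if (text == "") text = $0;
   else text = text "\n" $0;
}
END { if (text != "") print text }
EOF
cat > input.txt <<'EOF'
some text might appear here
real        32.04
user        0.14
sys         0.22
... and some more text there
EOF
awk -f tag_timex.awk input.txt
```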

Find and -exec with backticks like `... {}`

In this article I'll explain the reason for the following rule.

Rule:
You can't use backticks in a find's exec.

First of all, in order to have a working example, I created a subdir with a couple of subdirs in it; the following find statement lists all subdirs named 'dd'.

mkdir -p aa/dd aa/bb/dd aa/cc/dd
find aa -name dd
aa/bb/dd
aa/cc/dd
aa/dd

A slightly more complex use of find is something like

find aa -name dd -exec dirname {} \;
which lists all directories which contain a 'dd' subdir or file
aa
aa/bb
aa/cc

My naive knowledge of find led me to believe that an equivalent command is

find aa -name dd -exec echo `dirname {}` \;
Very wrong. Here is the result:
.
.
.
So obviously '{}' is not supplied to 'dirname' as expected and none of the directory names is printed.

A little googling in comp.unix.shell revealed that I hadn't understood how 'exec' is invoked.
You cannot use any shell-type substitution or redirection in the 'exec' clause since no shell is
involved: find runs the command directly via the 'exec()' system call. (The backticks in the
example above were expanded once by the invoking shell before find even started; `dirname {}` on
the literal string '{}' yields '.', which explains the dots in the output.)

A hacker's way of doing it is to invoke another shell in the 'exec' clause.
Since the shell does not know about the '{}' parameter it has to be supplied as a positional parameter in a rather tricky way:

find aa -name dd -exec /bin/sh -c 'echo `dirname {}`' \;
does not work.

find aa -name dd -exec /bin/sh -c 'echo `dirname $0`' {} \;
works as expected but note that '{}' is seen as argument $0.

And with csh instead of sh you'll have to change the arg number to $1:

find aa -name dd -exec /bin/csh -c 'echo `dirname $1`' {} \;

Originally this problem came up when I wanted to rename all subdirs of a certain name to something else so

find aa -name dd -exec mv {} `dirname {}`/zz \;
would have been nice and short but it doesn't work.
find aa -name dd -exec /bin/sh -c 'mv $0 `dirname $0`/zz' {} \;
does it now for me.
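The whole rename can be checked in a scratch directory. Note the added -prune, which is not in the command above: without it find may complain when it tries to descend into a directory that has just been renamed:

```shell
#!/bin/sh
cd "$(mktemp -d)"
mkdir -p aa/dd aa/bb/dd aa/cc/dd
# -prune stops find from descending into the matched (and renamed) dirs
find aa -name dd -prune -exec /bin/sh -c 'mv $0 `dirname $0`/zz' {} \;
find aa -name zz | sort
# -> aa/bb/zz
# -> aa/cc/zz
# -> aa/zz
```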

Solutions with 'xargs'

Here is an invocation with 'xargs' (note that '-i' provides individual args (as {}) rather than one long arg list):
find aa -name dd -exec dirname {} \; | xargs -i mv {}/dd {}/zz

The general problem is that I need a 'dirname' to get the parent dir and I need a 'mv' to rename the dir.
Putting all of this into the 'xargs' section also means that I have to invoke another shell e.g.

find aa -name dd | xargs -i /bin/sh -c 'mv $1 `dirname $1`/zz' {} {}
which is of similar complexity to the 'find-only' solution.

Simpler but one more in the pipeline:

find aa -name dd | xargs -i dirname {} | xargs -i mv {}/dd {}/zz

Create a comma separated string in shell: a b c -> "(a,b,c)" (no, not sed)

Well, the title does not really explain the real issue but I couldn't find a better one.

Another try: assume you have a list of tokens in a shell script and you want to build a comma separated list out of those tokens in a loop; there is usually an issue with one comma too many at the beginning or at the end.

For simplicity I use a simple token list.
This example of course could be solved faster with sed.
I add a more complex example at the end.

A first approach is as follows and will create a list with an empty first element so to speak:
TOKENS="a b c"
LIST="("
for token in $TOKENS ; do
   LIST="${LIST},${token}"
done
LIST="${LIST})"
echo $LIST
(,a,b,c)
So how does one get rid of the extra empty element at the beginning of the list?

One might add an if-statement:
TOKENS="a b c"
LIST="("
for token in $TOKENS ; do
   if [ "x$LIST" = "x(" ] ; then
      LIST="${LIST}${token}"
   else
      LIST="${LIST},${token}"
   fi
done
LIST="${LIST})"
echo $LIST
(a,b,c)
In my search for a shorter one-line solution I found the following approach using shell parameter substitution.
TOKENS="a b c"
LIST=""
for token in $TOKENS ; do
      LIST="${LIST:-(}${LIST:+,}${token}"
done
LIST="${LIST})"
echo $LIST
(a,b,c)
This is admittedly not easy to understand at first glance unless you are very familiar with parameter substitution.
I'm using the complementary idea of :- and :+ .
${LIST:-(} will put out either the current value of LIST (if it exists and is not empty) or a ( .
So in the first invocation of the loop LIST is not yet set and thus ( is put out.
In the next rounds LIST is set and will be put out as is.
${LIST:+,} will put out either a comma (if LIST exists and is set) or nothing at all.
In the first invocation of the loop LIST is not yet set and nothing is put out.
In the next rounds LIST is set and a comma will be put out.

In all cases the token is appended at the end.

Here is an example with more complex tokens which contain a space and should be surrounded by quotes in the resulting list.
A="sam smith"
B="jane jones"
C="gabe miller"
LIST=""
for token in "$A" "$B" "$C" ; do
   LIST="${LIST:-(}${LIST:+,}'${token}'"
done
LIST="${LIST})"
echo $LIST
('sam smith','jane jones','gabe miller')
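The :- / :+ trick wraps up nicely in a function (the name join_parens is mine, not a standard utility); quoting "$@" keeps multi-word tokens intact:

```shell
#!/bin/sh
join_parens() {
    LIST=""
    for token in "$@"; do
        LIST="${LIST:-(}${LIST:+,}${token}"
    done
    echo "${LIST})"
}
join_parens a b c                      # -> (a,b,c)
join_parens "sam smith" "jane jones"   # -> (sam smith,jane jones)
```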

Wednesday, December 15, 2010

Echo multiline variables with and without quotes

When working with multiline variables in a Bourne or Korn shell there are certain subtleties to using quotes which - if used carelessly - can give quite different results.
A multiline variable is a variable which contains two or more lines of text.
I will discuss for a given multiline variable $T the differences between
  • echo $T
  • echo "$T"
  • printf $T
  • printf "$T"
  • printf "%s" $T
  • printf "%s" "$T"
If you know which of these commands show the same result you can skip this page.

Example: assume your /etc/hosts looks like this
#
# Internet host table
#
127.0.0.1       localhost
10.6.102.3      comp-nis
10.6.129.146    foo.Bar.COM loghost
192.168.127.1   foo.Bar.COM-ce1 # DO NOT MODIFY
and your variable $T should contain all lines with the string 'host' in them, i.e. lines 2, 4 and 6.
% T=`grep host /etc/hosts`
echo $T displays all 3 lines in one line
% echo $T
# Internet host table 127.0.0.1 localhost 10.6.129.146 foo.Bar.COM loghost

echo "$T" displays all 3 lines as 3 lines, i.e. truly showing the result of the grep command
% echo "$T"
# Internet host table
127.0.0.1       localhost
10.6.129.146    foo.Bar.COM loghost

Since the use of echo is discouraged these days one needs to look at printf.
printf $T displays only the first word in $T, which happens to be just # in our case.
% printf $T
#%

printf "$T" shows all 3 lines but without the final newline
% printf "$T"
# Internet host table
127.0.0.1       localhost
10.6.129.146    foo.Bar.COM loghost%

printf "%s" $T shows all 3 lines in one line by removing all white space (blanks and newlines) i.e. also without the final newline
% printf "%s" $T
#Internethosttable127.0.0.1localhost10.6.129.146foo.Bar.COMloghost%

printf "%s" "$T" shows all 3 lines but without the final newline (same output as printf "$T")
% printf "%s" "$T"
# Internet host table
127.0.0.1       localhost
10.6.129.146    foo.Bar.COM loghost%
So out of all versions only echo "$T" did what one would expect; it could be replaced by printf "$T\n" or printf "%s\n" "$T", i.e. printf with a final newline.
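The whole comparison fits into a few lines, here with a fixed two-line sample instead of the grep result (wc -l counts newlines, which is why the unterminated printf output counts one line less):

```shell
#!/bin/sh
T='# Internet host table
127.0.0.1       localhost'
one=$(echo $T)             # unquoted: whitespace collapsed
echo "$one"                # -> # Internet host table 127.0.0.1 localhost
echo "$T" | wc -l          # -> 2  (quoted echo keeps the newline and adds a final one)
printf "%s" "$T" | wc -l   # -> 1  (no final newline)
```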

The difference becomes important when you feed the contents of a variable to some command like U=`echo $T | some_cmd ...`

Assume you want to replace 'host' by 'HOST' in our example.
U=`echo "$T" | sed 's/host/HOST/'` is the right solution: 3 changed lines in $U
% echo "$U"
# Internet HOST table
127.0.0.1       localHOST
10.6.129.146    foo.Bar.COM logHOST
U=`echo $T | sed 's/host/HOST/'` will produce a one-line output and only the first occurrence of 'host' will be replaced
% echo "$U"
# Internet HOST table 127.0.0.1 localhost 10.6.129.146 foo.Bar.COM loghost
Sometimes, though, transferring a multiline output into a single line is exactly what is needed, so the difference in behaviour can be used to one's advantage.
Note that in csh there doesn't seem to be a way to preserve the multiline nature in a variable (at least I don't know of one).

Korn shell - Time (built-in command)

In this and some later blog posts I will look into a couple of Korn shell features; I'm only interested in Korn shell features for scripts though, not in interactive usage.
Today my topic is the

Time (built-in command)

The built-in time command has a nice feature which distinguishes it from the system commands /bin/time and /bin/timex and from the Bourne shell's time (I didn't check other shells' built-in time).

You can run it not just with an external command but also with a function.

Example:
#!/bin/ksh
f() {
    sleep $1
}
time f 62
will report
real    1m2.01s
user    0m0.00s
sys     0m0.00s
Note: the whitespace in the output is a tab (not a sequence of spaces as in timex).

A little sed editing will get the output format closer to timex:
(time f 62) 2>&1 | sed '/^real/,/^sys/ {
# this is: tab zero m to be replaced by tab
s/ 0m/ /
# replace m by colon
s/m/:/
# remove trailing s
s/s$//
}'

but the output is still not quite the same as timex's (timex runs e.g. on Solaris 10):
timex sleep 62

real        1:02.02
user           0.00
sys            0.00
In later posts I will look into coprocesses and job control.