Thursday, January 26, 2012

Text-to-Audio on my Mac

I recently experimented a little with creating speech from a given text.

First of all, I had not known that this functionality existed on my Mac. A little web search turned up a number of pages explaining it, but for the sake of the reader (and maybe even more for my own sake, so that I don't have to remember all this stuff) I'll describe it here.

There are two applications involved:
  1. TextEdit where you write the text to be converted to speech
  2. Automator which will do the conversion
So in the first step open TextEdit (Finder->Applications->TextEdit) and write some text which you would like to hear.
Then you need to start Automator (Finder->Applications->Automator).
  • In the first window choose the workflow Text.
  • Change the field Get content from and select TextEdit
  • Click the Choose button
  • In the lefthand column under Library click Text and in the next column double click Text to Audio File.
  • In the Text to Audio File frame you can choose a voice by selecting an entry in System Voice: I pick Alex.
  • Choose a filename (it will be saved in aiff format) and a location where to store the result.
  • Click the Run button in the upper righthand corner of Automator. Now your text should get transformed into speech and a file containing the output will be created.
  • Click the Results button in the Text to Audio File frame. Listen to the result by double clicking on the file icon.
The recipe above works fine if you do text-to-speech conversion once in a while.

Regular task:
If you do it regularly you can save the Automator workflow and reuse it whenever needed. Simply do a 'Save As...' and save the workflow under a recognizable name. Note that it will always use the same output filename and location, so each run overwrites the previous audio file.
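
For regular use, a command-line variant may be handy. The following is only a sketch: it assumes the say utility that ships with Mac OS X (with its -v, -f and -o options), and the function names aiff_name and text_to_aiff are made up for this example. It writes each result to a timestamped file so that earlier audio files are not overwritten.

```shell
#!/bin/sh
# aiff_name TEXTFILE STAMP: derive the output name from the input name,
# e.g. notes.txt + 20120126-101500 -> notes-20120126-101500.aiff
aiff_name() {
    printf '%s-%s.aiff\n' "${1%.*}" "$2"
}

# text_to_aiff TEXTFILE [VOICE]: speak TEXTFILE into a new AIFF file
# and print the name of that file. The voice defaults to Alex.
text_to_aiff() {
    out=$(aiff_name "$1" "$(date +%Y%m%d-%H%M%S)")
    # -v: voice, -f: read the text from a file, -o: write audio to a file
    say -v "${2:-Alex}" -f "$1" -o "$out" && echo "$out"
}
```

Calling text_to_aiff notes.txt would then create a file such as notes-20120126-101500.aiff next to the input file.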

More voices:
You can also download and install other voices if you're not happy with the standard ones. Good ones may need to be paid for; some vendors offer trials, e.g. InfoVox from Assistiveware.

Speech control:
You can insert certain control elements into the text to better control the speech, like volume changes for certain words, extra pauses etc. I have been using the silence element, e.g. a pause of 5 seconds can be achieved with [[slnc 5000]]
Here is a comprehensive list of speech commands from Apple (the page seems to be deprecated but the commands still work).
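
As a small sketch of the silence element in practice: the shell function below (add_pauses is a made-up name) appends a pause to every non-empty line of a text, so that the result can be pasted into TextEdit. The pause length is given in milliseconds, as in the example above.

```shell
#!/bin/sh
# add_pauses MILLISECONDS: copy stdin to stdout and append an embedded
# silence command to every non-empty line.
add_pauses() {
    ms=$1
    # /./ selects lines with at least one character; s/$/.../ appends to them
    sed "/./s/\$/ [[slnc $ms]]/"
}

printf 'First sentence.\n\nSecond sentence.\n' | add_pauses 5000
```

The first and third line of the output end in [[slnc 5000]]; the empty line in between is left untouched.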

Wednesday, January 18, 2012

How to track sub processes

I write a lot of scripts and one of the common problems (at least in my area of work) is how to keep track of all sub processes and how to clean up all processes which a script might have started.
(Note: this has been developed on a Solaris 10 system which features the particular ptree command to easily check the process tree of a given process)
In this article I will only deal with the tracking of sub processes. Eventually one would want to kill them if needed, which is either fairly easy with kill -9 (at the risk of leftovers like temporary files) or can become complex if a script spawns new processes when it receives a weaker kill signal.

So here is the scenario:
script a.sh runs another script b.sh.
Before a.sh exits it wants to ensure that b.sh has not left any processes behind i.e. it wants to identify b.sh and all of its child processes (so that they can be killed if still running).

Running another shell script in the background


Example 1: a.sh runs b.sh in the background

ptree suffices in such a case
b.sh:
#!/bin/sh
sleep 200

a.sh:
#!/bin/sh
b.sh&
ptree $!
ps -u $USER -o pid,ppid,args |grep $!
(ptree should show the process tree of the last background process.
The ps command should show the process id (pid), parent process id (ppid) and command with arguments (args) of the processes of user $USER.)

Output of ptree:
59691 /bin/csh -c a.sh
   59716 /bin/sh a.sh
     59717 /bin/sh b.sh
       59719 sleep 200

Output of ps:
59731 59716 grep 59716
59717 59716 /bin/sh b.sh
59719 59717 sleep 200

So both b.sh and its sub process 'sleep' are shown in the process list and one could get the pids and kill them if needed.
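
On systems without ptree the same information can be derived from the pid/ppid columns of ps. The following is a minimal sketch, not a replacement for ptree: the function name descendants is made up, and it assumes a ps that accepts -e and -o pid= -o ppid= as described by POSIX.

```shell
#!/bin/sh
# descendants PID: print the pids of all children, grandchildren, ...
# of PID, based on one snapshot of the process table.
descendants() {
    ps -e -o pid= -o ppid= | awk -v root="$1" '
        { parent[$1] = $2 }
        END {
            for (pid in parent) {
                # follow the ppid chain upwards until it reaches root
                # or ends (unknown parent, or a process that is its own parent)
                p = parent[pid]
                while (p != root && (p in parent) && parent[p] != p)
                    p = parent[p]
                if (p == root) print pid
            }
        }'
}

# Usage inside a.sh (b.sh as in Example 1):
#   b.sh &
#   descendants $!
```

Like ptree and the ps/grep combination, this only finds processes which are still connected to b.sh via their parent ids.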

There are more complex situations where ptree/ps don't help, and these are covered in the next parts.

Sub process detaching

This time we consider an example where the sub process detaches itself from the current process tree.

What do I mean by that?
Every process has a parent id, so if a process spawns a process which spawns another process they are all connected via their parent ids (the id of process 1 becomes the parent id of process 2, the id of process 2 becomes the parent id of process 3, and so on).
A process can break this chain though and can detach itself from its parent so that it gets the init pid 1 as parent id (all processes can be traced back to pid 1 in a UNIX system).

Example 2: here b.sh runs a process in the background itself
b.sh:
#!/bin/sh
sleep 200&

If you run this script and check your process list you will find something like this, a sleep process with ppid 1
7968     1 sleep 200

If you run a.sh from the previous example with the new b.sh your ptree and ps output will look as follows:
Output of ptree:
8932  /bin/sh ./a.sh
   8933  <defunct>
Output of ps:
8936  8932 grep 8933
i.e. ps does not show anything at all (apart from the grep itself) and ptree shows a.sh with a defunct sub process. This defunct sub process is the leftover of b.sh.
Why is it defunct? Because it has ended but its parent a.sh has not (yet) waited for it to finish.

Here is a new a.sh which solves that (remember this rule: a defunct process is always due to bad code in the parent, not the process which became defunct):
#!/bin/sh
./b.sh&
wait
ptree $!
ps -u $USER -o pid,ppid,args |grep $!
Running this a.sh will generate no ptree output at all:
b.sh has finished when running ptree, the sleep process is detached from the b.sh process hierarchy.

So how can we track down the 'sleep' process?
We need to use another process attribute: the process group id (pgid).

In the new a.sh I have removed the ptree call (since it won't return anything as shown above) and enhanced the ps command to also show the pgid, this time grepping for the pid of a.sh (rather than b.sh as before).
#!/bin/sh
./b.sh&
wait
ps -u $USER -o pid,ppid,pgid,args |grep $$
Output of ps:
18028 32741 18028 /bin/sh a.sh
18030     1 18028 sleep 200
18031 18028 18028 grep 18028
18032 18031 18028 ps -u andreash -o pid,ppid,pgid,args
So the sleep process can be found in the list of processes with pgid 18028 (the pid of a.sh) since all sub processes of a.sh seem to be grouped by pgid.
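
A plain grep for the number also matches lines where the digits appear in another column or as part of a longer number. A slightly stricter sketch compares the pgid column only (group_members is a made-up name; the ps options are the POSIX ones):

```shell
#!/bin/sh
# group_members PGID: list pid, pgid and command line of all processes
# whose process group id equals PGID.
group_members() {
    ps -e -o pid= -o pgid= -o args= | awk -v g="$1" '$2 == g'
}

# Usage at the end of a.sh, assuming a.sh leads its own process group:
#   group_members $$
```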

Happy? Not quite. The next part will show that this solution also might fail.

What if there is no pgid?

The former example does work under certain assumptions only:
you need to run a.sh from a shell which supports pgid creation (csh, ksh); it does not work if you run it from the Bourne shell.
(all the examples above were tested in csh, the standard user's working shell in our environment).

sunflower% sh
$ ./a.sh
27103 27099 27098 grep 27099
27099 27098 27098 /bin/sh ./a.sh
$ ps -o pid,ppid,pgid,args|grep sleep
27277 27098 27098 grep sleep
27102     1 27098 sleep 200
What you notice is that the sleep process has pgid 27098 which is also the parent pid of a.sh ie. a.sh did not create its own process group. Searching for processes with pgid equal to the pid of a.sh is futile.

The solution is to write a script which puts its sub processes into a process group of its own, and one way to do it is to use the monitor option of ksh:
set -m
will put b.sh (and all sub processes of b.sh) into a process group with pgid equal to b.sh's pid
i.e. again I'm grepping for $! (so I reverted the change to $$ again)

a.sh:
#!/bin/ksh
set -m
./b.sh&
wait
ps -u $USER -o pid,ppid,pgid,args |grep $!
will lead to output of ps:
31103     1 31102 sleep 200
31105 31101 31101 grep 31102
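
Once all leftovers of b.sh share one pgid, they can be signalled in one go, because kill treats a negative pid as a process group. The following is a hedged sketch only: kill_group is a made-up helper, and the safety checks are my own assumption about what one would want.

```shell
#!/bin/sh
# kill_group PGID [SIGNAL]: send SIGNAL (default TERM) to every process
# in process group PGID. Refuses non-numeric ids, groups 0 and 1 and
# the caller's own group; returns 2 in those cases.
kill_group() {
    pgid=$1
    sig=${2:-TERM}
    case $pgid in
        ''|*[!0-9]*) echo "kill_group: bad pgid '$pgid'" >&2; return 2 ;;
    esac
    own=$(ps -o pgid= -p $$ | tr -d ' ')
    if [ "$pgid" -le 1 ] || [ "$pgid" -eq "$own" ]; then
        echo "kill_group: refusing to signal group $pgid" >&2
        return 2
    fi
    # the leading minus addresses the whole group instead of one process
    kill -s "$sig" -- "-$pgid"
}

# Usage at the end of the ksh version of a.sh (after set -m and ./b.sh&):
#   kill_group $!
```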

This seemed to me a very nice solution until it dawned upon me how this could fail too.

Recursive use of pgid creation

Using the same technique as described in the last part a sub process can not just detach itself from the process hierarchy but can also create its own process group and thus the original script will have lost track completely.

Replace b.sh by the following code:
b.sh:
#!/bin/ksh
set -m
sleep 200&

Output of a.sh will look like this (just the grep command):
38331 38327 38327 grep 38328
and when you check the 'sleep' process it shows its pid also as pgid:
% ps -o pid,ppid,pgid,args |grep sleep
38329     1 38329 sleep 200

How can such a process be identified as being a grandchild of a.sh?

Up to now I don't have an answer. It seems to me that a process can completely hide its origins and thus cannot be tracked or followed.
(A long time ago I posted the question to comp.unix.shell but didn't receive an answer at the time.)

If you have wondered throughout the article why I bother at all:
very often I face the scenario that I have to write script a.sh (i.e. I own it and control what it does) but script b.sh comes from a colleague, a different department or even another company. I need/want to ensure that - if I start other scripts in my script - no processes are left behind when my script ends. This cannot be guaranteed.

Why it is impossible to track all sub processes

Over time I got suggestions to use newtask (and then kill off all processes found by pkill -T taskid), or to write a C program and use setsid, or a Perl program and use POSIX::setsid, to create a new session leader, so that basically all child processes are tagged with the same kind of attribute which can then be used to identify them and do something about them.

All of these suggestions have the same flaw as the one with pgid which I described above, and the following argument should prove that it is impossible to track all sub processes and their sub processes (if the sub processes can be any kind of process and their code is not controlled by you).

Assume that your flavour of UNIX supports a way that you can generate a sub process with a certain attribute which distinguishes the sub process and its offspring from the current process (and possible parent processes).
In the same fashion a sub process of the sub process can use this technique to distinguish itself from the sub process. The current process will find the sub process but it cannot find the sub process of the sub process anymore.

Solutions would be that the OS restricts the setting of that attribute in a way that the current process can set it for sub processes but sub processes of the sub process are blocked from setting it, or that processes need to notify their parent processes about attribute changes somehow. Neither is available/possible in any of the UNIXes I know.

Summary:
  • a process can track (and kill) all of its sub processes
  • a process can track (and kill) all of a sub process's descendants
    • if the sub process sets a certain attribute equal to its process id
    • if none of the sub process descendants changes that attribute

Even if you think you are in (code) control of all sub processes and their descendants you might not be aware of all side effects: a process might unknowingly start a daemon.
Just envision the calling of gconfd: it will be started if it is not running yet. The process which actually caused the start of gconfd will very likely have no idea that it is there since it is only trying to get a service. That the service required a daemon and that proper cleanup would mean the daemon to be killed and that the daemon maybe services other processes too (and thus should not be killed) are all considerations with no easy answers.

Tuesday, January 17, 2012

OpenOffice.org - copy subtotal cells only

Recently a question was raised how to copy the cells showing sub totals rather than copying data cells and subtotals.
I could not find an easy solution. Below I describe a two-step solution which basically consists of
  • Applying a filter to show only the sub total rows
  • Using Copy / Paste special to get a copy of the sub totals (without the formulas)


The data

Assume you have 2 columns of data like this:

The sub totals

Data -> Subtotals... and then ticking X and OK will result in adding extra rows for sub totals.


This was the starting point of the question being asked.

Applying a filter


Data -> Standard Filter and enter
  • 2 filter criteria for X .*Sum and .*Total in order to capture both the Sum and Total rows
  • Tick Regular expression
  • Tick Copy results to... and enter a cell on the sheet (A16 in this example)
    (this is important: don't copy to another sheet since the formula won't work)


Copy the result

The filter resulted in A16:B20. Two things to note:
  • There are no data rows anymore
  • Column B still contains the formulas


Paste special

Now use Paste special to paste the sub totals into a new position. Deselect Formulas in order to copy the data only. Ensure that everything else is ticked, in particular Numbers.


The result

A copy of the sub totals in D16:E20.
Note that column D does not contain formulas.

How to create a histogram in OpenOffice.org

The recipe below has been tested in OpenOffice 3.0.1.
It is unclear (though expected) whether it will also work in newer revisions.

The issue

Suppose you have a set of data, time data in my example, each representing when a certain event has happened. But rather than having to digest the detailed data you're only interested in high level information, like how often the measured event occurred per hour.

Here's the example data:
12:08
15:36
13:00
14:59
13:59
12:45
15:47
14:29
15:01
So you got a number of events at certain times, unsorted, uncounted.

Assume that these data are in column A in your spreadsheet, maybe labeled Time in the first row.

The goal

A histogram which depicts the frequency of the events per hour like this





The resulting histogram shows
3 events up to and including 13:00
1 event after 13:00 up to 14:00
2 events after 14:00 up to 15:00
3 events after 15:00
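
These counts can be cross-checked outside the spreadsheet. The sketch below is an assumption-laden stand-in, not the spreadsheet method: count_bins is a made-up shell function which treats each bin as an inclusive upper bound (13:00 itself counts towards the 13:00 bin) and adds one extra count for everything above the last bin.

```shell
#!/bin/sh
# count_bins BIN...: read HH:MM times on stdin and count them per bin.
# Each BIN is an inclusive upper bound; the last output line counts
# all times above the last bin. Lines that are not HH:MM are skipped.
count_bins() {
    awk -v bins="$*" '
        function minutes(t,   parts) {
            split(t, parts, ":")
            return parts[1] * 60 + parts[2]
        }
        BEGIN { n = split(bins, b, " ") }
        /^[0-9]+:[0-9]+$/ {
            m = minutes($1)
            for (i = 1; i <= n; i++)
                if (m <= minutes(b[i])) break
            count[i]++      # i == n + 1 means: above the last bin
        }
        END {
            for (i = 1; i <= n; i++) printf "%s %d\n", b[i], count[i]
            printf ">%s %d\n", b[n], count[n + 1]
        }'
}

printf '%s\n' 12:08 15:36 13:00 14:59 13:59 12:45 15:47 14:29 15:01 |
    count_bins 13:00 14:00 15:00
```

For the example data and the bins 13:00, 14:00 and 15:00 this prints the counts 3, 1, 2 and 3.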

How to get there


Identify the bins


In the example above identifying the bins for the histogram is rather easy: you pick full hours. It is also rather easy to find the minimum and maximum hours.
When your list of data is very long you might not easily see the minimum or the maximum, nor might it be obvious how to set the bin intervals; a little trial and error is necessary to get there.

So looking at the data all events are later than 12:00 and none is beyond 16:00, therefore I'm choosing these bins:
13:00
14:00
15:00

I'm entering the bins into column B so that the spreadsheet looks like this now:

Calculate the frequencies


OpenOffice.org (like StarOffice) contains a FREQUENCY function which takes two arrays as input and also returns an array of results (maybe something one has to get used to). The example will make it clear how to use it.
  • Enter Ticks (or any other describing string) into cell C1
  • Click into cell C2 and click on the functions icon.
  • Out of the list of functions select FREQUENCY
  • Enter your data range and your bin range into the resp. parameter fields so that it looks like this:

The spreadsheet should now look like this:

Create the chart

  • Mark columns B and C (click on B and drag towards C so that both are highlighted)
  • Insert -> Chart...
  • Step 1: leave the chart type at Column
  • Step 2: Data series in columns and First row as label should be ticked, additionally tick also First column as label
  • Step 3: simply click Next
  • Step 4: enter describing strings e.g. Histogram into Title, hours into X axis, frequency into Y axis

You're done.