Preventing Duplicate Runs In A UNIX Shell Script

There are several ways to prevent a UNIX shell script from running multiple times simultaneously.  This post will describe a couple of ways to accomplish this.  Naturally, the first inclination generally  is to simply use ‘ps -ef’ to look for the presence of your script and exit if a currently executing copy is found.  This has several drawbacks, one of which is the tendency towards confusion when another  process is editing, tailing, or otherwise interacting with the script, yet it’s not really being executed, which is what we’re trying to prevent.  The two methods described here are daemons and PID files.

If there is a script that needs to be executed every minute, such as a monitoring tool, it can be scheduled in cron and the functionality to prevent duplicate runs can be embedded, which we’ll cover later.  Another alternative is to execute the script as a UNIX daemon.  This sounds a lot more complicated that it really is, so let’s dig into it.  First, the script will need to execute in a continuous loop, with some sort of sleep in between runs to prevent it from spinning on a CPU.  This is accomplished by updating the logic inside the script to something like this:

while true
do
  call_your_main_function

  sleep 30
done

Now, add it to the ‘/etc/inittab’ and the script will run continuously, without running over top of itself.  In Linux, edit the ‘/etc/inittab’ file and add a new entry:

newscr:35:respawn:/sbin/runuser -s /bin/ksh oracle -c /path_to_your_script/your_script_name

Where:

newscr    Abbreviated name for your script
35        Linux run levels that should be executing this script (3 & 5)
respawn   Automatically respawn the script if it dies
name      The name of the script to execute, with a fully qualified path.
          Also include something similar to the 'runser' command
          if the script is to be executed by non-root UNIX acccount.

Issue the ‘init q’ command to reload the ‘/etc/inittab’ and the script should start executing as a daemon.  AIX requires the use of the ‘mkitab’ command to create a new entry, but otherwise it works in a similar fashion.  Another advantage of this technique over cron is that the ‘sleep’ setting can be adjusted, thereby running a script on intervals less than one minute.  This would depend on the length of time required to execute the function inside the loop.

Another method for preventing duplicate runs is to use the very common Linux/UNIX concept of a PID file.  This is simply a file, in a known location, that contains the process ID of the executing script.  This file can be used as a lookup device to make sure the script matches the PID and exiting the script if needed.

A basic implementation of this requires very little coding.  At the top of the script, capture the command line that was used when the script was called:

# Save off the current script name with all command-line parameters
if [[ "$*" != "" ]]
then
  CURRENT_SCRIPT="$(print "$_" | awk -F \/ '{print $NF}') $*"
else
  CURRENT_SCRIPT="$(print "$_" | awk -F \/ '{print $NF}')"
fi

Then, name the ‘PID’ file somewhere in the top of the script:

PIDFILE="/var/tmp/myscript.pid"

Add a function that can check the ‘PID’ file against the system:

function already_running {
  # No PID file means it's not running
  if [[ ! -f $PIDFILE ]]
  then
    return 1
  fi

  if [[ $(ps -p $(cat $PIDFILE) -o args | grep -c "${CURRENT_SCRIPT}") -ne 0 ]]
  then
    return 0
  else
    return 1
  fi
}

Before any action occurs in the script, check and exit if it’s already being executed:

# Prevent multiple copies from running
if already_running
then
  # Print the results in an interactive shell
  if [[ -t 1 ]]
  then
    echo "Another copy is already running, exiting."
  fi

  exit 1
fi

Now,  save off the current PID if it wasn’t already running:

# Save off the current PID
echo "$$" > ${PIDFILE}

Hopefully, one of these two techniques will allow your script to implement duplicate-run prevention.

Tagged , , , ,

LDAP Cutover and the ‘/etc/passwd’ File

We are currently testing an LDAP implementation at work and we ran into an interesting scenario.  It turns out that we had a script (written in C), that parsed the ‘/etc/passwd’ file in AIX to ensure that a particular UID belonged to the appropriate user.  Once we cut over to LDAP, this no longer worked since the entries are not kept in ‘/etc/passwd’ for non-system accounts.  To compensate for this change with the least amount of change to the C source, I wrote a Korn Shell script that created the same output as ‘cat /etc/passwd’ using the ‘lsuser’ command in AIX.  This wouldn’t work in Linux, but we only have a need in AIX at this point.  In other scripts, we have been able to convert to the ‘finger -m’ command because we are only interested in users that are currently logged into the box.

To obtain the information from the ‘lsuer’ command, we first obtain a list of all accounts on the box like this:

lsuser | awk '{print $1}'

We then parse through the results of this and query each account individually:

lsuser | awk '{print $1}' | while read NAME; do lsuser -f $NAME; done

Taking this output, it’s fairly straightforward to write a script that creates the same output as ‘cat /etc/passwd’.  The 2nd field is hardcoded in the output as an empty string since the O/S manipulates this value and it’s not important for this exercise.

Once a script had been written that would produce the same output, a change was made to the C source that used a ‘popen/pclose’ instead of an ‘fopen/fclose’.  It took some experimenting but it finally worked, after I remembered that the ‘rm’ command in UNIX requires an ‘-f’ switch if a setuid bit is set on the source binary; otherwise the ‘are you sure?’ prompt requires user input during the execution of the ‘popen’.

Oracle OID to TNS

If you ever need to dump the contents of an Oracle OID (LDAP) server to a tnsnames.ora file, there’s an easy method using an awk script I wrote.  The interesting thing about this script is that the output will be neatly formatted automatically.

The awk script consists of this:

**** Cut here ****

BEGIN {PAD="  ";CNT=0;FS="="}
{OUT=""; PC=1;CNT+=1}
{
  if (CNT == 1)
  {
    O=$2
    sub(",cn","",O)
    print O " = "
  }
  else if (CNT == 2)
  {
    sub("^orclnetdescstring=","",$0)
    {for (i=1;i<=length;i++)
      {
        CHR=substr($0,i,1)

        if (CHR == "(")
        {
          PC+=1

          WRK=OUT
          sub(/ */,"",WRK)
          if (WRK != "")
            printf OUT "\n"

          OUT=""
          for (p=1;p<PC;p++)
            OUT=PAD OUT

          OUT=OUT CHR
        }
        else if (CHR == ")")
        {
          OUT=OUT CHR
          printf OUT "\n"
          PC-=1
          OUT=""

          for (p=1;p<PC;p++)
            OUT=OUT PAD

        }
        else
          OUT=OUT CHR
      }
    }
  }
  else if (CNT == 3)
    CNT=0
}

**** Cut here ****

To use this awk script, paste everything  in between the ‘**** Cut here ****’ lines into a file and name it whatever you like, for example ‘oid2tns.awk’.  Once we have the script, we need to query the OID server and pass along the output.  This will require a server that has OID installed; in our case we use the OID built into OAS.  If you have your infrastructure and MT built on the same box, make sure you set the environment for infrastructure.

Here’s the syntax to combine the OID query and the awk script:

$ORACLE_HOME/bin/ldapsearch -h [SubYourOID] -b '' -s sub '(objectclass=orclNetService)' orclNetDescString | awk -f oid2tns.awk
Tagged

Data Dictionary Query

We have been working through all of the SQL with high buffer-gets in our Banner databases and we came across a data-dictionary query that is performing 300 million per day:

SELECT a.column_name
 FROM all_tab_columns a, all_ind_columns b
 WHERE a.table_name = b.table_name
 AND a.column_name = b.column_name
 AND a.owner = b.index_owner
 AND b.table_name = :1
 AND b.index_owner = :1
 AND b.index_name =
 (SELECT constraint_name
 FROM all_constraints
 WHERE table_name = :1
 AND owner = :1
 AND constraint_type = :1
 AND ROWNUM = 1)

So, we figured out that this query was coming from a forms library file (*.pll) and it’s used on a few forms that support user-defined columns.  This is similar to flex-fields in ERP.  Unfortunately, this query uses rule-based optimization so it can be a challenge to tune.  After some research, it was determined that this query could be rewritten to force cost-based optimization with a slight tweak:

SELECT a.column_name
 FROM all_tab_cols a, all_ind_columns b
 WHERE a.table_name = b.table_name
 AND a.column_name = b.column_name
 AND a.owner = b.index_owner
 AND b.table_name = :1
 AND b.index_owner = :1
 AND b.index_name =
 (SELECT constraint_name
 FROM all_constraints
 WHERE table_name = :1
 AND owner = :1
 AND constraint_type = :1
 AND ROWNUM = 1)

This query now performs 10,000 buffer-gets per run!  At least we can tune it knowing the cost.  Next, we added some hints:

SELECT /*+ NO_CPU_COSTING */ a.column_name
 FROM all_tab_cols a,
 all_ind_columns b
 WHERE a.table_name = b.table_name
 AND a.column_name = b.column_name
 AND a.owner = b.index_owner
 AND b.table_name = :1
 AND b.index_owner = :1
 AND b.index_name = ANY (SELECT constraint_name
 FROM all_constraints
 WHERE table_name = :1
 AND owner = :1
 AND constraint_type = 'P'
 AND ROWNUM = 1)

We finish this off by adding some sys stats:

EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;

At last!  This new, improved query creates 8 buffer-gets per execution.  A pretty good improvement for such a tiny fix.  We updated the ‘.pll’, recompiled all the forms, and we’re ready to move on to the next challenge.

Tagged , , , , , ,

Quest SQL Optimizer

We recently purchased licenses for the Quest tool SQL Optimizer.  This is a highly recommended piece of software if you can spare the money, since it easily allows different query strategies to iteratively  be tested on a problematic piece of SQL.  Everything was going just fine until I increased the optimizer and index  intelligence levels both to the max!  After the tool automatically  tested about different 40 different scenarios, it appeared to lose its mind slightly.  After trying to kill it gracefully, with no luck, I killed it using the Windows Task Manager.  I quickly forgot about it since I was leaving for the day.

The next day I started up SQL Optimizer and it looked different.  I could no longer enter in a query.  The screen looked almost blank.  So, I uninstalled and reinstalled with no luck.  I figured it automatically saved session information, and it turned out I was correct.  So, to solve this problem I did this:

  • Uninstalled SQL Optimizer.
  • Deleted the complete directory structure from this point down: ‘c:\Documents and Settings\[MyLogin]\Application Data\Quest Software\Quest SQL Optimizer for Oracle’.
  • Reinstalled SQL Optimizer.

Hopefully this will save you some time if you ever encounter this same type of situation.

ETA:  I’ve encountered this same scenario several more times, but I have figured out that the product doesn’t need to be uninstalled.  Just delete the directory and SQL Optimizer will recreate it once it relaunches.

Tagged , ,

Utilizing Large Memory Pages for Oracle on AIX

Recently, we were upgrading to ERP R12 and decided to utilize large memory pages in AIX since we had 196gb available during the upgrade and large memory pages would gain significant throughput.  There were a couple of lessons learned along the way, though, even if we had already successfully implemented this strategy in several AIX 10gR1 databases.

For the record, Metalink has a pretty good article (Note ID 372157.1) on setting up your system for large memory pages so there’s no need to regurgitate the entire thing, but here are the steps in a nutshell.  Be forewarned that Oracle Support will probably fight you every step of the way since this process negates the need for AMM, but 23% performance gains on our batch processes speaks for itself.

  1. Configure the AIX large page pool by calculating the number of large pages required for the SGA:
    • num_of_large_pages = INT((total_SGA_size-1)/16MB)+1
    • As root:
      • vmo -p -o lgpg_regions=num_of_large_pages -o lgpg_size=16777216
      • vmo -o lru_file_repage=0
    • Bear in mind that the PGA is not part of the SGA and will not utilize the large memory pages.  If not specified in the init file, the PGA defaults to 20% of the size of the SGA.
    • We always configure more large memory pages than we need.  If the box needs excessive pages for small memory segments it will effectively ‘eat’ pages to provide the minimum small segments.  This can be a cpu-intensive exercise so it’s best to avoid it altogether.
  2. Enable the Oracle UNIX account to access large memory pages.
    • As root, check the current settings for the user account:
      • lsuser -a capabilities oracle
    • Add the ‘CAP_BYPASS_RAC_VMM’ and ‘CAP_PROPAGATE’ if needed:
      • chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
    • We encountered a bug in Oracle 10.0.4.2 that prevented the database from using large memory pages.  There are two notes (372157.1 & ID 728444.1) that document the workarounds.  As oracle, edit the XCOFF file header in the oracle binary to enable it to use large page data and set an environmental variable:
      • export ORACLE_SGA_PGSZ=16m
      • cd $ORACLE_HOME/bin/oracle
      • cp –p oracle oracle.orig
      • ldedit -b lpdata oracle
  3. Configure the SGA to best utilize large memory pages.
    • Set ‘pre_page_sga=false’ so that the instance doesn’t timeout trying to pre-page all of those large memory pages.
    • Set ‘lock_sga=true’ so that you get a contiguous piece of memory that can’t be taken by another process on the box.
    • Set ‘sga_target=0’ so that automatic memory management is turned off.  There is no need to dynamically resize memory settings when a large SGA is in use.  By the time the instance figures out what to resize, it’s too late.
    • Set ‘db_block_hash_latches=32768’ to take advantage of the extra memory pages.
  4. Verify the use of large memory pages:
    • Get the PID of the ‘pmon’ process:
      • PID=$(ps –ef | grep pmon | grep ${ORACLE_SID})
    • Look at the memory usage of the PID:
      • svmon –P ${PID}
-------------------------------------------------------------------------------
Pid Command          Inuse      Pin     Pgsp  Virtual 64-bit Mthrd  16MB
7603124 oracle        13461204 13381632        0 13409786      Y     N     Y
PageSize                Inuse        Pin       Pgsp    Virtual
s    4 KB               56900          0          0       5482
m   64 KB                1929        512          0       1929
L   16 MB                3265       3265          0       3265
Vsid      Esid Type Description              PSize  Inuse   Pin Pgsp Virtual
1ef68ef  7000001a work default shmat/mmap           L     16    16    0      16
1ab3aab  70000012 work default shmat/mmap           L     16    16    0      16
1c14ec1  70000059 work default shmat/mmap           L     16    16    0      16
18f308f  70000080 work default shmat/mmap           L     16    16    0      16
      • svmon -G
size       inuse        free         pin     virtual   mmode

memory     20971520    19946466     1025054    15275762    19163953     Ded
pg space   25165824       40491
work        pers        clnt       other
pin        14627122           0           0      644544
in use     19163953           0      778417
PageSize   PoolSize       inuse        pgsp         pin     virtual
s    4 KB         -     5085122       40491      831554     4306705
m   64 KB         -       92738           0       66667       92738
L   16 MB      3266        3265           0        3266        3265

Korn Shell Functions

Korn Shell functions are an easy way to break larger Korn Shell scripts into more manageable chunks.  You have the option of using the function to return a true/false value as well as normal number or string output.  The basic format of a function in the Korn Shell is like this:

function test_function {

typeset local_variables

body_of_function

}

It’s always a good idea to declare your local variables in a function using the ‘typeset’ command so they don’t inherit values from calling functions or scripts.  The command-line parameters become localized in a function so ‘$1’ isn’t the same as the ‘$1’ that was passed to the originating script.  If you want to pass parameters to a function, call it with the values like this: “test_function parameter1 parameter2”.  If you want to pass on the original values passed to the script on the command line, pass them to the function like this: “test_function $1” or to pass all of the parameters, like this: “test_function $@”.  Please note use of the ‘shift’ statement will clear out the command line switches so this will not work if the command line switches have been parsed and removed from the environment.

The returning of true/false values from functions is counter-intuitive when you are more familiar with other languages.  In Korn Shell, a value of zero passed back from a function equals ‘true’ and a value of non-zero equals ‘false’.  So, here’s how this plays out:

function test_name {

typeset FIRST=”$1”

typeset LAST=”$2”

if [[ “$FIRST” = “$LAST” ]]

then

return 0

else

return 1

fi

}

read FIRST?”Enter first name: “

read LAST?”Enter last name: “

if test_name “$FIRST” “$LAST”

then

print “First name equals last”

else

print “First name does not equal last”

fi

Now, when this script is executed, it asks for a first and last name and indicates if they are the same.  This is not very useful as a script but it illustrates the true/false return quite nicely.  It should be noted that functions return the value of the last statement executed.  The ‘return’ statement exits the function as well as providing the appropriate feedback.  Also, note that double-quotes were used to supply the parameters to the function.  This ensures that the entire variable is passed as “$1” or “$2” instead of interpreting spaces as a separator in the call.

Here’s an example of a function returning a string value instead of Boolean:

function full_name {

typeset FIRST=”$1”

typeset LAST=”$2”

print “$FIRST $LAST”

}

read FIRST?”Enter first name: “

read LAST?”Enter last name: “

print “Full Name=$(full_name“$FIRST” “$LAST”)”

This brings up another interesting point.  You can call a function using the old-style of the grave symbols:

print “Full Name=`full_name“$FIRST” “$LAST”`”

Or, you can use the non-deprecated form presented in the previous example.

One other concern is how to successfully call Boolean functions in an ‘if’ call.  Here’s a simple example:

if [[ test_name “henry” “henry” && “$ME” = “henry” ]]

then

print “hi henry”

fi

This won’t work!  The Boolean and the non-boolean portions need to be separated.  So, here’s the correct way to combine the two:

if test_name “henry” “henry” && [[“$ME” = “henry” ]]

then

print “hi henry”

fi

Hopefully this blog entry has encouraged the use of functions in shell scripting.  Thanks.

Awk Tips

Return all fields to the right of a particular field:

echo “testing|1|2|3|4″ | awk -F \| ‘{for (i=3;i<NF+1;i++) o = (o FS $i)} END {print substr(o,2)}’

or

echo “testing|1|2|3|4″ | awk -F \| ‘{print substr($0, index($0,$3)) }’  (See original link)

results in: ’2|3|4′ or starting with the third field, all fields to the right.

To format an integer with comma-separators:

echo “123456″ | awk ‘{for (i=1;i<length($0)+1;i++) if ((!(i%3)) && (i!=length($0))) o = (“,” substr($0,length($0)-i+1,1) o); else o = (substr($0,length($0)-i+1,1) o)} END {print o}’

results in ’123,456′

Oracle RDBMS Tips

Two-phase commit views:

dba_2pc_neighbors

dba_2pc_pending

To kill all non-local database connections in UNIX:

ps -ef | grep LOCAL=NO | grep ${ORACLE_SID} | awk ‘{print $2}’ | while read pid; do kill -9 $pid; done

Follow

Get every new post delivered to your Inbox.