Informix Tech Notes article on Update Statistics
Overview
This article describes the recommended update procedure and presents a set of programs that automate it: a controlling shell script (updstat.sh), a 4GL program that generates the UPDATE STATISTICS commands (updstat.4ge, comprising globals.4gl, main.4gl, and build.4gl), and an awk program that splits the generated commands into multiple files.
If you have a database that is larger than a few megabytes in size, you quickly learn that updating the statistics for the database is absolutely essential to performance. Without the information generated by the UPDATE STATISTICS command, the optimizer has practically no information with which to develop a query plan. The result is a horrendous number of sequential scans and terrible performance. Updating the database statistics is the single most important thing you can do to get the database to function efficiently. Other tuning efforts can improve (or worsen) performance by a few percentage points; updating statistics will improve performance by orders of magnitude.
But you probably already know this. The purpose of this article is not to describe what the UPDATE STATISTICS command does, nor to quantify its impact. It is rather to address the problem of how to go about doing it in an efficient manner: efficient for both the database and the DBA.
You can find the procedure recommended by Informix for updating statistics in the Performance Guide for Informix Dynamic Server, Version 7.3, February 1998; the discussion starts on page 10-6. The recommendation for doing a thorough statistical update is, briefly:

Update small tables HIGH.
Update MEDIUM, DISTRIBUTIONS ONLY, all columns that do not head or differentiate an index, using one command per table (with resolution 1.0 and confidence 0.99 for very large tables).
Update HIGH each column that heads or differentiates an index, using one command per column.
Update LOW all columns in multicolumn indexes.
Based on the sampling techniques used by the UPDATE STATISTICS command, any table with more than 26,570 rows is considered a "very large" table. You may wish to adjust the value used in your installation after testing other values.
This, of course, is the general recommendation. You may find, if you have time to do extensive testing, that a different series of statistics will give better performance for your installation.
In any case, the problem for the DBA is generating all the statements needed to update the statistics for an entire database. In our situation, we have a database with 982 tables and 1,465 indexes. It requires 1,903 UPDATE STATISTICS commands to comply with the recommendations above. If you have a very stable system, you could build all these commands once and then run them as often as necessary. However, if you frequently change the structure of the database by adding or dropping tables and indexes, you would have to modify your massive SQL script every time you made a change. Just what you needed: more work!
There is also the problem of run time. It would be nice if updating the statistics did not take too long. But it does if the system is of any size, especially the "high" commands on very large tables. One way to speed up the process is to break the string of SQL statements into multiple files and run the files simultaneously. When I first started using this approach, it cut the run time on our system from about 7 hours to about 3 hours. But there is additional complexity here: ideally, each of the command files should be balanced so that they all complete at about the same time. They should also be balanced across the dbspaces to minimize I/O contention. How do you determine which commands to put in which files in order to achieve this balance?
The solution presented here is a shell script, an awk program, and an Informix® 4GL program which take care of all of the above complexities for you. They do not produce a perfect solution (and I will note the imperfections below), but the solution is much quicker to obtain and much more flexible than doing the job manually.
In order to create and run multiple SQL command files simultaneously, I took the following approach: a 4GL program writes all the UPDATE STATISTICS commands to a single output file, prefixing each command with the number of the command file it belongs to; an awk program then splits that file into the individual SQL command files; and a shell script runs those command files simultaneously.
(An easy modification to the 4GL program would be to write the commands to a temporary table instead of a single output file, and then sort the table in stream order and write the individual command files directly from the 4GL program. This would eliminate the awk program.)
The "brain" of the system is the 4GL program. This program reads the system catalog tables to get the necessary information for every table and index in a database.
Three options are available to control the number of streams generated. The first option, and simplest, is to add no qualifiers to the command that executes the program. This will place all UPDATE STATISTICS commands in a single file. The second choice is "automatic" stream generation. One stream or command file will be created for each dbspace that contains a table. All the commands for all the tables in one dbspace will be in one command file. The third option is to specify a fixed number of streams. In this case, the program will optimize the assignment of the commands to the streams based on the balancing criteria discussed above.
If there is any "magic" in this program, it is in the optimization function called when using a fixed number of streams. A cost for each UPDATE STATISTICS command is calculated based on the number of rows in the table using a formula derived from testing on my system. The parameters of the formula may not be accurate on your system, but they provide a starting point.
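The per-table cost arithmetic can be lifted straight from the listings. The following Python sketch mirrors the coefficients used in the small_table() and save_data() functions below (assuming the three terms are meant to accumulate); remember that these coefficients were fitted on the author's system and are only a starting point elsewhere:

```python
def table_cost(nrows, num_high_cols):
    """Estimated seconds for all the UPDATE STATISTICS commands
    issued against one table, using the coefficients from the
    small_table() and save_data() listings."""
    if nrows <= 1000:
        # Small table: one UPDATE STATISTICS HIGH for the whole table
        return .001 * nrows + .2952
    cost = (9e-5 * nrows - 9.5925) * num_high_cols  # one HIGH per column
    cost += 1e-5 * nrows + 27.336                   # one MEDIUM command
    cost += .0001 * nrows + .9661                   # one LOW command
    return cost
```

Note how the HIGH term dominates for large tables: it is the only term multiplied by a column count, which is why the number of indexed columns matters as much as the row count.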
The program collects all the information needed to create the SQL commands for each table, including the cost, and stores it in temp tables. When all the database tables have been processed, the program reads the temp tables and writes the actual commands to an output file.
Three programs are defined to implement this system: a 4GL program to create the actual UPDATE STATISTICS commands, an awk program to split the list of SQL commands into multiple files, and a shell script which runs the first two programs and then executes the SQL command files.
updstat.sh
This is the shell script that controls the whole process:
#!/usr/bin/ksh
#
# updstat.sh - Update statistics for a database
#
# Runs updstat.4ge to generate a list of sql statements that
# will properly update the statistics for the specified
# database. Splits these statements into one or more command
# files, then runs all the files simultaneously.
#
# ARGUMENTS:
#   -d dbname   database name (required)
#   <n>         number of simultaneous streams to generate
#   -a          auto-generate streams

# Determine if this is a single stream or multistream run
# (only "-d dbname" given means a single stream)
#
if [ $# -le 2 ]
then
    single_stream=Y
else
    single_stream=N
fi

# Get the database name from the arguments
#
arglist=""
while [ $# -gt 0 ]
do
    case $1 in
    -d) dbname=$2
        arglist="$arglist -d $dbname"
        shift
        shift
        ;;
    *)  arglist="$arglist $1"
        shift
        ;;
    esac
done

# Set environment
#
INFORMIXDIR=/usr/informix
ONCONFIG=onconfig.bp2
INFORMIXSERVER=bright2
INFORMIXSQLHOSTS=/etc/sqlhosts
PATH=$INFORMIXDIR/bin:$PATH
export INFORMIXDIR ONCONFIG INFORMIXSERVER INFORMIXSQLHOSTS PATH

# Move to work directory and clean up from previous runs
#
progdir=`pwd`
cd /work/informix/updstat
if [ -d $dbname ]
then
    cd $dbname
    rm -f *
else
    mkdir $dbname
    cd $dbname
fi

# Build the list of sql commands
# (If generating a single stream, the output file will be
# "updstat.<dbname>.sql". If generating multiple streams,
# the output file will be "updstat.<dbname>.tmp".)
#
$progdir/updstat.4ge $arglist
[ $? -gt 0 ] && exit

if [ $single_stream = "Y" ]
then
    #
    # Run single SQL file
    #
    dbaccess $dbname updstat.$dbname.sql > updstat.$dbname.out 2>&1
else
    #
    # Split commands into multiple command files
    # (The command files will be named
    # "updstat.<dbname>.<nnn>.sql" where <nnn> is a sequence
    # number, plus one additional file named
    # "updstat.<dbname>.PROC.sql".)
    #
    nawk -f $progdir/updstat.awk -v dbname=$dbname updstat.$dbname.tmp
    #
    # Run the sql command files simultaneously
    #
    for file in `ls updstat.$dbname.???.sql`
    do
        outfile=`echo $file | sed 's/sql/out/'`
        dbaccess $dbname $file > $outfile 2>&1 &
    done
    wait
    #
    # Run the command file for stored procedures last
    #
    dbaccess $dbname updstat.$dbname.PROC.sql > updstat.$dbname.PROC.out 2>&1
fi
Be careful of the number of simultaneous streams that you generate. There may be a limit defined in the UNIX kernel on the number of processes that can be generated by one user. Check with your UNIX system administrator if you get an error indicating too many processes.
The command line parameters of this script are passed to the 4GL program. Within the designated work directory, a subdirectory will be created (if it does not already exist) where all the work files associated with the database being processed are to be placed. This subdirectory will have the same name as the database.
updstat.4ge
The following is the 4GL program that generates the SQL UPDATE STATISTICS commands. In addition to the arguments accepted by the shell script, the 4GL program will also accept an argument of -t <table name>. This facilitates generating the commands for a single table when needed; for instance, after unloading and reloading a table.
The code is divided into three modules: globals.4gl, main.4gl, and build.4gl. globals.4gl contains definitions of the variables used globally throughout the program. main.4gl controls the flow of the processing. build.4gl contains the functions that gather the required information and then create the SQL statements. There are a few comments at the beginning of each module.
globals.4gl
There are some arrays defined here that are sized to handle a fairly large database, but they will not handle every situation. You will need to check the following array sizes to make sure they accommodate your system:
g_colname    maximum number of columns in a table
g_hi_list    maximum number of columns in a table needing HIGH update
g_med_list   maximum number of columns in a table needing MEDIUM update
g_low_list   maximum number of columns in a table needing LOW update
g_stream     maximum number of streams to be generated

globals
define scratch char(2048),              # Work area
    g_single smallint,                  # Single table?
    g_auto smallint,                    # Auto-generate streams?
    g_tabid integer,
    g_tabname char(18),
    g_nrows integer,
    g_dbspace char(3),                  # Dbspace (taken from partnum)
    g_colname array[500] of char(18),
    g_numcols smallint,
    g_idxrec array[100] of record
        col1 smallint,
        col2 smallint,
        col3 smallint,
        col4 smallint,
        col5 smallint,
        col6 smallint,
        col7 smallint,
        col8 smallint,
        col9 smallint,
        col10 smallint,
        col11 smallint,
        col12 smallint,
        col13 smallint,
        col14 smallint,
        col15 smallint,
        col16 smallint
    end record,
    g_numidx smallint,
    g_hi_list array[500] of smallint,
    g_med_list array[500] of smallint,
    g_low_list array[500] of smallint,
    g_numhi smallint,
    g_nummed smallint,
    g_numlow smallint,
    g_stream array[200] of record
        name char(3),
        cost float
    end record,
    g_numstrm smallint,                 # Number of output streams
    g_currstrm smallint                 # Current output file number
end globals

main.4gl
The last line written to the output file is an UPDATE STATISTICS for procedure command. Statistics for procedures are updated after statistics for all tables are updated so that the procedures are optimized using the latest statistical data for any tables that are referenced.
# updstat - Generate update statistics SQL files to update
# the statistics for a table or an entire database.
#
# Written by Rick Baker
# iirab@brightpoint.com
# Brightpoint, Inc.
# Indianapolis, IN
#
# Must be user informix or root to run.
#
# Written to follow the guidelines found in "Performance
# Guide for Informix Dynamic Server", Version 7.3, February
# 1998:
#
#   update small tables HIGH.
#   update MEDIUM DISTRIBUTIONS ONLY all columns that
#   do not head or differentiate an index. Use resolution 1.0,
#   .99 for very large tables. Use one command per table.
#   update HIGH all columns that head or differentiate an
#   index. Use one command for each column.
#   update LOW for all columns in a multicolumn index.
#
# ARGUMENTS:
#   -d <dbname>   database name (required)
#   -t <tabname>  table name (optional)
#   <n>           number of SQL files to write (optional)
#   -a            (auto) create one SQL file for each
#                 dbspace containing tables or indexes
#
# DISCUSSION
#
# Three different methods of processing the tables are provided.
#
# 1. Sequential processing. All tables are processed sequentially
#    in a single SQL command file.
#
# 2. Multi-stream processing. <n> SQL command files are created.
#    Each table will require one or more UPDATE STATISTICS
#    commands. All the commands for a table are sent to one command
#    file to avoid multiple commands being run against the same
#    table simultaneously. An attempt is made to balance the workload
#    for each command file by balancing the "cost" for each table.
#    "Cost" is based on the number of rows in the table and the
#    number of columns which must have UPDATE STATISTICS HIGH run
#    against them.
#
# 3. Auto-stream generation. An SQL command file is created for
#    each dbspace which contains tables. The SQL commands for
#    each table are written to the command file for the dbspace
#    which contains the table.
#
# All output from this program is actually written to a single
# file. Before each UPDATE STATISTICS command is written to this
# file, it is prepended with the number of the SQL command file.
# The output file must be read by an awk program and split into
# the actual SQL files.
#
# MODIFICATION HISTORY
#
# 1/14/99 rab  Change auto-stream to put all commands for one
#              table in the same command file.
#              Put the "procedures" command in stream 0 and modify
#              the awk program and shell script to run it after
#              all other streams have completed.
# 1/25/99 rab  Change logic for multi-stream: for each table,
#              evaluate the cost and store cost and commands in
#              temp tables. Select from the temp tables in order
#              of cost and assign to streams. (Almost 50% reduction
#              in overall run time.)
# 3/24/99 rab  Add -t option to process a single table.
# 8/19/99 rab  Use 7.3 recommendations.
#              Add nrows to temp table 2 so it is available at
#              write time.
##################################################################

database sysmaster

globals "globals.4gl"

####################################################################
main
####################################################################

    call init()
    call process()
    call finish()

end main

####################################################################
function init()
####################################################################
#
# This function:
#   Gets command line arguments:
#     database name (required)
#     optional arguments
#   Sets the database
#   Sets lock mode
#   Sets read mode
#   Calls function to create temp tables
#   Calls function to initialize the cursors
#   Calls function to initialize the cursors in build.4gl
#   Starts the report
#
define i smallint,
    dbname char(18)

    # Set database and selection criteria
    let dbname = null
    let g_single = false
    let g_auto = false
    let g_numstrm = 1

    for i = 1 to num_args()
        case
        when arg_val(i) = "-d"
            let i = i + 1
            let dbname = arg_val(i)
        when arg_val(i) = "-t"
            let i = i + 1
            let g_tabname = arg_val(i)
            let g_single = true
        when arg_val(i) = "-a"
            let g_auto = true
            let g_numstrm = 0
        otherwise
            if arg_val(i) >= 1 and arg_val(i) <= 999
            then
                let g_numstrm = arg_val(i)
            else
                let scratch = "Invalid argument: '",
                    arg_val(i) clipped, "'. Aborting..."
                display scratch
                exit program (1)
            end if
        end case
    end for

    if dbname is null
    then
        display "No database name specified. Aborting..."
        exit program (1)
    end if

    close database
    database dbname

    # Set parameters for reading the system tables
    set lock mode to wait 30
    whenever error continue
    set isolation to dirty read
    whenever error stop

    # Create temp tables
    call create_temp()

    # Prepare cursors
    call init_cursors()
    call init_tmp_cursors()

    # Check for multiple streams
    if g_numstrm > 1
    then
        call init_streamlist()
    end if

    # Start the report
    if g_single
    then
        let scratch = "updstat.", g_tabname clipped, ".sql"
    else
        let scratch = "updstat.", dbname clipped, ".tmp"
    end if
    start report report1 to scratch

end function # init()

##################################################################
function create_temp()
##################################################################
#
# This function creates the temp tables.
#
    create temp table updstat1
        ( level char(1),
          tabname char(18),
          colname char(18)
        ) with no log

    create temp table updstat2
        ( tabname char(18),
          dbspace char(3),
          cost float,
          nrows integer
        ) with no log

end function # create_temp()

##################################################################
function init_cursors()
##################################################################
#
# This function defines all the cursors used in this module.
#
    let scratch = "select a.tabid, a.tabname, a.nrows,",
        " hex(a.partnum), hex(b.partn)",
        " from systables a, outer sysfragments b",
        " where a.tabtype = 'T'",
        " and a.tabid = b.tabid",
        " and b.fragtype = 'T'"
    if g_single
    then
        let scratch = scratch clipped,
            " and a.tabname = '", g_tabname clipped, "'"
    else
        let scratch = scratch clipped, " order by tabid"
    end if
    prepare systables_spec from scratch
    declare systables_curs cursor for systables_spec

    let scratch = "select colno, colname",
        " from syscolumns",
        " where tabid = ?",
        " order by colno"
    prepare syscolname_spec from scratch
    declare syscolname_curs cursor for syscolname_spec

    let scratch = "select part1, part2, part3, part4,",
        " part5, part6, part7, part8,",
        " part9, part10, part11, part12,",
        " part13, part14, part15, part16",
        " from sysindexes",
        " where tabid = ?",
        " order by part1, part2, part3, part4,",
        " part5, part6, part7, part8"
    prepare sysindexes_spec from scratch
    declare sysindexes_curs cursor for sysindexes_spec

end function # init_cursors()

##################################################################
function init_streamlist()
##################################################################
#
# This function initializes the stream list. If a specified number
# of streams are to be used, the cost field is set to zero for
# the requested number of elements.
#
define i smallint

    for i = 1 to g_numstrm
        let g_stream[i].cost = 0
    end for

end function # init_streamlist()

##################################################################
function process()
##################################################################
#
# This function loops through each table in the database. It calls
# functions to get the column name and index data, identify which
# columns need to be specified in which SQL statements, determine
# the SQL file, and save the data in the temp tables. After all
# tables have been processed, a function is called to write the
# SQL command file.
#
define partnum_p char(10),    # Partnum from systables
    partnum_f char(10),       # Partnum from sysfragments
    prev_tabid integer

    let prev_tabid = 0

    # Loop through each table
    foreach systables_curs into g_tabid, g_tabname, g_nrows,
            partnum_p, partnum_f

        # Don't repeat a table
        if g_tabid = prev_tabid
        then
            continue foreach
        end if

        # Set the partnum
        if partnum_p = 0
        then
            let g_dbspace = partnum_f[3,5]
        else
            let g_dbspace = partnum_p[3,5]
        end if

        # Process small tables separately
        if g_nrows <= 1000
        then
            call small_table()
            continue foreach
        end if

        # Load the column names
        call get_colnames()

        # Load the index definitions
        call get_indexes()

        # Build the column lists
        call build_lists()

        # Save the data in the temp tables
        call save_data()

        # Save the tabid
        let prev_tabid = g_tabid

    end foreach

    # Write the SQL statements
    call write_stmts()

end function # process()

##################################################################
function get_colnames()
##################################################################
#
# This function loads the array g_colname with the names of each
# column in the current table.
#
define colno smallint

    let g_numcols = 1
    foreach syscolname_curs using g_tabid
            into colno, g_colname[g_numcols]
        let g_numcols = g_numcols + 1
    end foreach
    let g_numcols = g_numcols - 1

end function # get_colnames()

##################################################################
function get_indexes()
##################################################################
#
# This function loads the array g_idxrec with the definition of
# each index defined for the current table.
#
    let g_numidx = 1
    foreach sysindexes_curs using g_tabid into
            g_idxrec[g_numidx].col1,
            g_idxrec[g_numidx].col2,
            g_idxrec[g_numidx].col3,
            g_idxrec[g_numidx].col4,
            g_idxrec[g_numidx].col5,
            g_idxrec[g_numidx].col6,
            g_idxrec[g_numidx].col7,
            g_idxrec[g_numidx].col8,
            g_idxrec[g_numidx].col9,
            g_idxrec[g_numidx].col10,
            g_idxrec[g_numidx].col11,
            g_idxrec[g_numidx].col12,
            g_idxrec[g_numidx].col13,
            g_idxrec[g_numidx].col14,
            g_idxrec[g_numidx].col15,
            g_idxrec[g_numidx].col16

        # Increment the index to this array
        let g_numidx = g_numidx + 1

    end foreach
    let g_numidx = g_numidx - 1

end function # get_indexes()

##################################################################
function finish()
##################################################################
#
# This function updates statistics for all procedures and then
# closes the report file.
#
    if not g_single
    then
        # Update statistics for all procedures
        let g_currstrm = 0
        let scratch = "update statistics for procedure;"
        output to report report1()
    end if

    finish report report1

end function # finish()
build.4gl
As mentioned previously, allocating commands to one or more command files can follow one of three methods. The simple method is a single stream. In the auto-stream method, one command stream is created for each dbspace that contains a table. All the commands for all the tables that reside in one dbspace are put in the same stream. The logic behind this is that the UPDATE STATISTICS command reads from the dbspace where the table is located. Putting commands for tables in the same dbspace in the same stream eliminates I/O contention in each dbspace. In addition, if the space used in the dbspaces is reasonably well balanced, the time to execute each of the streams should be similarly balanced. (However, this reasoning does not consider row size. The number of rows in a table is much more important than the total size of the table in determining the time it takes to execute a command.)
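The auto-stream method is essentially a group-by on dbspace. A minimal Python sketch of the grouping (the table names and dbspace numbers here are hypothetical):

```python
from collections import defaultdict

# (table, dbspace) pairs -- hypothetical names and dbspace numbers
tables = [("orders", "001"), ("order_items", "001"), ("customers", "002")]

streams = defaultdict(list)          # one command stream per dbspace
for tabname, dbspace in tables:
    streams[dbspace].append(tabname)

# Every command for a table lands in the stream for its dbspace, so
# no two streams ever read the same dbspace at the same time.
```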
The third method uses a fixed number of streams. The number of streams to be used is input by the user on the command line. In this case, the "cost" of executing each command is estimated and the commands are placed into the streams so that the total cost is balanced across the streams. All of the commands for one table are placed together in the same stream in order to prevent two different streams from trying to read the same table at the same time. This method does not take into account the dbspace in which the tables reside. If two tables from the same dbspace were being processed at the same time, there could be some I/O contention.
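The assignment itself (done by a set_stream() function that is not shown in this excerpt) can be approximated by a simple greedy rule: take the tables in descending cost order, as the cost_curs cursor delivers them, and always append to the stream with the lowest accumulated cost. A sketch of that rule in Python:

```python
import heapq

def assign_streams(costs, nstreams):
    """Greedy balancing: append the next-most-expensive table to the
    stream with the smallest accumulated cost. Returns a list of
    (total_cost, stream_no, table_indexes) tuples."""
    heap = [(0.0, s, []) for s in range(nstreams)]
    heapq.heapify(heap)
    # Visit tables in descending cost order, largest first
    for idx in sorted(range(len(costs)), key=lambda i: -costs[i]):
        total, s, members = heapq.heappop(heap)
        members.append(idx)
        heapq.heappush(heap, (total + costs[idx], s, members))
    return sorted(heap)
```

Processing the expensive tables first is what makes the greedy rule work well: the small commands placed last fill in whatever imbalance the big ones left behind.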
The "cost" of executing each UPDATE STATISTICS command is computed in formulas that were determined from testing on our system. I timed the commands as they ran against tables with varying numbers of rows, then determined a formula which approximates the data observed. The results will vary on other systems. If you want to have very accurate cost formulas, you will need to do testing on your system and modify the formulas appropriately.
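One way to recalibrate the coefficients for your system is to time a few commands against tables of different sizes and fit a straight line, time = a * nrows + b, to the observations. A least-squares sketch (the timing data shown is made up for illustration):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical timings: (rows, seconds) for one UPDATE STATISTICS
# HIGH command; substitute measurements from your own system.
rows = [10_000, 100_000, 500_000, 1_000_000]
secs = [1.2, 9.5, 46.0, 91.0]
a, b = fit_line(rows, secs)
```

The fitted slope and intercept would then replace the hard-coded coefficients in save_data() and small_table().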
globals "globals.4gl" ################################################################## function init_tmp_cursors() ################################################################### # This function creates cursors to be used with the temp tables. # # Prepare insert statements let scratch = "insert into updstat1 values(?, ?, ?)" prepare ins_updstat1 from scratch let scratch = "insert into updstat2 values(?, ?, ?, ?)" prepare ins_updstat2 from scratch # Prepare select cursors let scratch = "select colname", " from updstat1", " where tabname = ?", " and level = ?" prepare tbl_col_stmt from scratch declare tbl_col_curs cursor for tbl_col_stmt let scratch = "select tabname, dbspace, cost, nrows", " from updstat2", " order by cost desc" prepare cost_stmt from scratch declare cost_curs cursor for cost_stmt end function # init_tmp_cursors() ################################################################## function build_lists() ################################################################### # This function creates the lists of columns in the current table # that need high, medium, and low commands. # Build the lists call build_hi_list() call build_med_list() call build_low_list() end function # build_lists() ################################################################## function build_hi_list() ################################################################### # This function creates the list of columns in the current table # that need HIGH statistics: those that head an index and those # that differentiate an index. # define i smallint # Reset the counter let g_numhi = 0 # Build list of columns that head an index for i = 1 to g_numidx call add_to_hi_list(g_idxrec[i].col1) end for # Add columns that differentiate indexes that have the same # starting column(s). # (Use the table's dbspace). 
for i = 2 to g_numidx case when g_idxrec[i].col1 != g_idxrec[i-1].col1 exit case when g_idxrec[i].col2 != g_idxrec[i-1].col2 call add_to_hi_list(g_idxrec[i].col2) when g_idxrec[i].col3 != g_idxrec[i-1].col3 call add_to_hi_list(g_idxrec[i].col3) when g_idxrec[i].col4 != g_idxrec[i-1].col4 call add_to_hi_list(g_idxrec[i].col4) when g_idxrec[i].col5 != g_idxrec[i-1].col5 call add_to_hi_list(g_idxrec[i].col5) when g_idxrec[i].col6 != g_idxrec[i-1].col6 call add_to_hi_list(g_idxrec[i].col6) when g_idxrec[i].col7 != g_idxrec[i-1].col7 call add_to_hi_list(g_idxrec[i].col7) when g_idxrec[i].col8 != g_idxrec[i-1].col8 call add_to_hi_list(g_idxrec[i].col8) when g_idxrec[i].col9 != g_idxrec[i-1].col9 call add_to_hi_list(g_idxrec[i].col9) when g_idxrec[i].col10 != g_idxrec[i-1].col10 call add_to_hi_list(g_idxrec[i].col10) when g_idxrec[i].col11 != g_idxrec[i-1].col11 call add_to_hi_list(g_idxrec[i].col11) when g_idxrec[i].col12 != g_idxrec[i-1].col12 call add_to_hi_list(g_idxrec[i].col12) when g_idxrec[i].col13 != g_idxrec[i-1].col13 call add_to_hi_list(g_idxrec[i].col13) when g_idxrec[i].col14 != g_idxrec[i-1].col14 call add_to_hi_list(g_idxrec[i].col14) when g_idxrec[i].col15 != g_idxrec[i-1].col15 call add_to_hi_list(g_idxrec[i].col15) when g_idxrec[i].col16 != g_idxrec[i-1].col16 call add_to_hi_list(g_idxrec[i].col16) end case end for end function # build_hi_list() #################################################################### function build_med_list() ################################################################## # # This function creates the list of columns in the current table # that need MEDIUM statistics: all columns in indexes that are # not already identified as needing HIGH statistics. 
# define i smallint # Reset the counter let g_nummed = 0 # Build the list for i = 1 to g_numidx call add_to_med_list(g_idxrec[i].col1) if g_idxrec[i].col2 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col2) if g_idxrec[i].col3 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col3) if g_idxrec[i].col4 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col4) if g_idxrec[i].col5 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col5) if g_idxrec[i].col6 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col6) if g_idxrec[i].col7 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col7) if g_idxrec[i].col8 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col8) if g_idxrec[i].col9 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col9) if g_idxrec[i].col10 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col10) if g_idxrec[i].col11 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col11) if g_idxrec[i].col12 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col12) if g_idxrec[i].col13 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col13) if g_idxrec[i].col14 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col14) if g_idxrec[i].col15 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col15) if g_idxrec[i].col16 = 0 then continue for end if call add_to_med_list(g_idxrec[i].col16) end for end function # build_med_list() ################################################################## function build_low_list() ################################################################## # # This function creates the list of columns in the current table # that need LOW statistics: all columns in multicolumn indexes. 
# define i smallint # Reset the counter let g_numlow = 0 # Build the list for i = 1 to g_numidx if g_idxrec[i].col2 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col1) call add_to_low_list(g_idxrec[i].col2) if g_idxrec[i].col3 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col3) if g_idxrec[i].col4 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col4) if g_idxrec[i].col5 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col5) if g_idxrec[i].col6 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col6) if g_idxrec[i].col7 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col7) if g_idxrec[i].col8 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col8) if g_idxrec[i].col9 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col9) if g_idxrec[i].col10 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col10) if g_idxrec[i].col11 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col11) if g_idxrec[i].col12 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col12) if g_idxrec[i].col13 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col13) if g_idxrec[i].col14 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col14) if g_idxrec[i].col15 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col15) if g_idxrec[i].col16 = 0 then continue for end if call add_to_low_list(g_idxrec[i].col16) end for end function # build_low_list() ################################################################## function small_table() ################################################################## # # This function creates the SQL statement for a small table. 
# define cost float # Insert record for HIGH command execute ins_updstat1 using "H", g_tabname, "" # Insert cost record let cost = .001 * g_nrows + .2952 execute ins_updstat2 using g_tabname, g_dbspace, cost, g_nrows end function # small_table() ################################################################## function add_to_hi_list(colnum) ################################################################## # # This function adds a column to the list of columns that are to # be updated HIGH if the column is not already in the list. # define colnum smallint, i smallint # Reverse sign if negative (negative column numbers in sysindexes # indicate that the column is sorted descending instead of ascending) if colnum < 0 then let colnum = - colnum end if # Check the current list for i = 1 to g_numhi if colnum = g_hi_list[i] then #Column is already in the list: exit the function return end if end for # Column not found: add to list let g_numhi = g_numhi + 1 let g_hi_list[g_numhi] = colnum end function # add_to_hi_list() ################################################################## function add_to_med_list(colnum) ################################################################## # # This function adds a column to the list of columns that are to # be updated MEDIUM if the column is not already in the HIGH list # and not already in the MEDIUM list. 
# define colnum smallint, i smallint # Reverse sign if negative (negative column numbers in sysindexes # indicate that the column is sorted descending instead of # ascending) if colnum < 0 then let colnum = - colnum end if # Check the HIGH list for i = 1 to g_numhi if colnum = g_hi_list[i] then # Column is already in the HIGH list: exit the function return end if end for # Check the MEDIUM list for i = 1 to g_nummed if colnum = g_med_list[i] then # Column is already in the MEDIUM list: exit the function return end if end for # Column not found: add to list let g_nummed = g_nummed + 1 let g_med_list[g_nummed] = colnum end function # add_to_med_list() ################################################################## function add_to_low_list(colnum) ################################################################## # # This function adds a column to the list of columns that are to # be updated LOW if the column is not already in the list. # define colnum smallint, i smallint # Reverse sign if negative (negative column numbers in sysindexes # indicate that the column is sorted descending instead of # ascending) if colnum < 0 then let colnum = - colnum end if # Check the current list for i = 1 to g_numlow if colnum = g_low_list[i] then # Column is already in the LOW list: exit the function return end if end for # Column not found: add to list let g_numlow = g_numlow + 1 let g_low_list[g_numlow] = colnum end function # add_to_low_list() ################################################################## function save_data() ################################################################## # # This function saves the info for one table in the temp tables. # The formulas used to compute costs are based on empirical data. 
#
define cost float,
       i smallint,
       colno smallint

    if g_numstrm > 1 then
        let cost = 0
    end if

    # Insert records for HIGH commands
    for i = 1 to g_numhi
        let colno = g_hi_list[i]
        execute ins_updstat1 using "H", g_tabname, g_colname[colno]
    end for
    let cost = cost + (9e-5 * g_nrows - 9.5925) * g_numhi

    # Insert records for MEDIUM commands
    for i = 1 to g_nummed
        let colno = g_med_list[i]
        execute ins_updstat1 using "M", g_tabname, g_colname[colno]
    end for
    let cost = cost + 1e-5 * g_nrows + 27.336

    # Insert records for LOW commands
    for i = 1 to g_numlow
        let colno = g_low_list[i]
        execute ins_updstat1 using "L", g_tabname, g_colname[colno]
    end for
    let cost = cost + .0001 * g_nrows + .9661

    # Insert cost record
    execute ins_updstat2 using g_tabname, g_dbspace, cost, g_nrows

end function # save_data()

##################################################################
function write_stmts()
##################################################################
#
# This function reads the info from the temp tables and creates
# the SQL statements.
#
define cost float

    # For each record in the cost table
    foreach cost_curs into g_tabname, g_dbspace, cost, g_nrows

        # Set the output stream
        let g_currstrm = set_stream(g_dbspace, cost)

        # Write the statements
        call write_medium()
        call write_high()
        call write_low()

    end foreach

end function # write_stmts()

##################################################################
function write_medium()
##################################################################
#
# This function writes the update statistics medium commands for
# all columns in the g_med_list. If the table is large, resolution
# and confidence are specified.
#
define first smallint,
       colname char(18),
       level char(1)

    # Start the update statistics statement
    let scratch = "update statistics medium for table ",
                  g_tabname clipped, " ("
    let first = true

    # Append the column names
    let level = "M"
    foreach tbl_col_curs using g_tabname, level into colname
        if first then
            let scratch = scratch clipped, colname
            let first = false
        else
            let scratch = scratch clipped, ", ", colname
        end if
    end foreach

    if first then
        # No records found
        return
    end if

    let scratch = scratch clipped, ")"

    # Check for "large" tables
    if g_nrows >= 100000 then
        let scratch = scratch clipped, " resolution 1.00 0.99"
    end if

    let scratch = scratch clipped, " distributions only;"
    output to report report1()

end function # write_medium()

##################################################################
function write_high()
##################################################################
#
# This function writes the update statistics high commands for
# all columns in the g_hi_list.
#
define colname char(18),
       level char(1)

    # Process the list of columns
    let level = "H"
    foreach tbl_col_curs using g_tabname, level into colname
        let scratch = "update statistics high for table ", g_tabname
        if colname is null then
            # Small table. Do the whole table.
            let scratch = scratch clipped, ";"
        else
            let scratch = scratch clipped, " (", colname clipped, ");"
        end if
        output to report report1()
    end foreach

end function # write_high()

##################################################################
function write_low()
##################################################################
#
# This function writes the update statistics low commands for
# all columns in the g_low_list.
#
define first smallint,
       colname char(18),
       level char(1)

    # Start the update statistics statement
    let scratch = "update statistics low for table ",
                  g_tabname clipped, " ("
    let first = true

    # Append the column names
    let level = "L"
    foreach tbl_col_curs using g_tabname, level into colname
        if first then
            let scratch = scratch clipped, colname
            let first = false
        else
            let scratch = scratch clipped, ", ", colname
        end if
    end foreach

    if first then
        # No records found
        return
    end if

    let scratch = scratch clipped, ");"
    output to report report1()

end function # write_low()

##################################################################
function set_stream(dbspace, cost)
##################################################################
#
# This function sets the global variable g_currstrm with the
# stream or output file number for the current statement. For
# auto_stream, the dbspace provided as a parameter determines the
# stream. For the fixed multi-stream case, the stream is selected
# by determining which stream has the least cost so far. The cost
# for the selected stream is updated to include the cost of the
# current table.
#
define dbspace char(3),
       cost float,             # Cost of this statement
       mincost float,
       mincost_strm smallint,
       i smallint

    # Handle auto-stream
    if g_auto = true then
        # See if the dbspace is in the array
        for i = 1 to g_numstrm
            if dbspace = g_stream[i].name then
                return i
            end if
        end for
        # Not found: add to the array
        let g_numstrm = g_numstrm + 1
        let g_currstrm = g_numstrm
        let g_stream[g_numstrm].name = dbspace
        return g_numstrm
    end if

    # Handle single stream
    if g_numstrm = 1 then
        return 1
    end if

    # Handle multi-stream
    let mincost = g_stream[1].cost
    let mincost_strm = 1
    for i = 2 to g_numstrm
        if g_stream[i].cost < mincost then
            let mincost = g_stream[i].cost
            let mincost_strm = i
        end if
    end for

    # Update the cost
    let g_stream[mincost_strm].cost = g_stream[mincost_strm].cost + cost
    return mincost_strm

end function # set_stream()

####################################################################
report report1()
####################################################################

define prepend char(4),
       len smallint,
       comma smallint,
       i smallint

    output
        top margin 0
        bottom margin 0
        left margin 0
        right margin 255
        page length 1

    format

    on every row

        # Build the prepend string
        if g_single then
            let prepend = ""
        else
            let prepend = g_currstrm using "<<<|"
        end if

        # Prepend the output file number
        let scratch = prepend clipped, scratch
        let len = length(scratch)

        # If the line is longer than 80 bytes, break it into 80-byte lines
        while len > 0
            if len <= 80 then
                print scratch clipped
                exit while
            else
                # Find break point
                let comma = 0
                for i = 81 to 1 step -1
                    if scratch[i] = " " then
                        print scratch[1, i-1]
                        let scratch = prepend clipped, scratch[i+1, len]
                        let len = length(scratch)
                        continue while
                    else
                        if scratch[i] = "," then
                            let comma = i
                        end if
                    end if
                end for
                # No space found
                if comma > 0 then
                    # Break at a comma
                    print scratch[1, comma]
                    let scratch = prepend clipped, scratch[comma+1, len]
                    let len = length(scratch)
                    continue while
                else
                    # No comma found: break on first space or comma
                    for i = 81 to len
                        if scratch[i] = " " then
                            print scratch[1, i-1]
                            let scratch = prepend clipped, scratch[i+1, len]
                            let len = length(scratch)
                            continue while
                        else
                            if scratch[i] = "," then
                                print scratch[1, i]
                                let scratch = prepend clipped, scratch[i+1, len]
                                let len = length(scratch)
                                continue while
                            end if
                        end if
                    end for
                end if
                # Nothing to break on: print the whole line
                print scratch clipped
                exit while
            end if
        end while

end report # report1()
updstat.awk
This is the awk program that splits the output file from the 4GL program into multiple SQL command files. Each line of the output file has a file number, followed by a pipe symbol ("|"), and then the SQL command. The awk program reads the file number and writes the command to the corresponding file; lines with an empty file number go to a separate .PROC.sql file.
BEGIN \
{
    FS = "|";
}
{
    close(outfile);
    if ($1 == "")
        outfile = sprintf("updstat.%s.PROC.sql", dbname);
    else {
        filenum = $1;

        # See if this is a new output file
        found = 0;
        for (i = 1; i <= imax; i++)
            if (filenum == filelist[i]) {
                found = 1;
                break;
            }

        if (found == 0) {
            # Start a new output file
            filelist[++imax] = filenum;
            outfile = sprintf("updstat.%s.%03d.sql", dbname, filenum);
            print "set lock mode to wait;" > outfile;
        }
        else
            outfile = sprintf("updstat.%s.%03d.sql", dbname, filenum);
    }
    print $2 >> outfile;
}
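You can see the splitting idea at work without running the 4GL program at all. The sketch below is a simplified, hand-rolled stand-in for updstat.awk: the database name "stores", the /tmp file paths, and the three sample commands are all invented for the demonstration, and it assumes the database name is supplied with awk's -v option.

```shell
# Create a tiny stand-in for the 4GL program's output file:
# stream number, a pipe, then the SQL command.
cat > /tmp/updstat.out <<'EOF'
1|update statistics low for table customer;
2|update statistics high for table orders (order_num);
1|update statistics medium for table items (item_num);
EOF

# Simplified version of updstat.awk: route each command to the
# SQL file named by the stream number before the pipe.
awk -v dbname=stores 'BEGIN { FS = "|" }
{
    outfile = sprintf("/tmp/updstat.%s.%03d.sql", dbname, $1)
    print $2 >> outfile
}' /tmp/updstat.out
```

Stream 1's file ends up with the customer and items commands, stream 2's with the orders command. The real program additionally tracks which files it has started so it can open each one with a "set lock mode to wait;" line.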
A final word of caution: if you load a new table (or an entire database), run UPDATE STATISTICS LOW for the new or refreshed tables before running this program. The reason is that this program determines the number of rows in each table from the nrows column of the system catalog table systables. A new table has 0 in this column until some form of the UPDATE STATISTICS command is run against it. If the program finds 0 in this column, it treats the table as a "small" table and simply runs UPDATE STATISTICS HIGH, and you probably do not want a HIGH command run on every table in a database, least of all on a very large one.
While this is a fairly complex set of programs, I have found that its benefits far outweigh the cost of setting it up. It produces a very thorough statistical analysis of the database for the optimizer to use, and the multi-stream options execute much more quickly than any method that uses only a single process. It is also reasonably easy to revise, whether you want to simply change some of the empirical values or change the whole approach to updating statistics (Informix has been known to change its recommendations on this subject). I hope you will find the program as valuable as I have.
The original idea for this program came from Jerry Van Zant, who now works for Aggreso, Inc. Jerry provided his version of the program to me several years ago. I have modified it extensively since then, including adding the concept of multiple streams, but the basic idea of extracting table information from the system catalog tables and creating the UPDATE STATISTICS commands programmatically was his.
Rick Baker is the Systems and Database Manager for Brightpoint, Inc. in Indianapolis. He has worked with Informix DBMS since 1989. Rick can be reached via email at rick.baker@brightpoint.com.