Thursday, November 12, 2009
Recently, I have been learning lots of interesting stuff not directly related to db2 or database administration. I have been poking around ruby, reading about JSON, and getting curious about couchdb. I wanted to test all these things out in a mini project.
Rather than just trying things out with no practical use in mind, I decided to make a db2 analytics tool. Yes, yet another db2 analytics tool, but it lets me tie ruby/JSON/couchdb back to my database administration work. Couchdb is a schemaless database, and its map/reduce makes it very strong for analytics. It will be the datastore for the JSON values that ruby converts from a datasource. The datasource in this case is db2pd output (it could also be get snapshot output, sql tables, db2diag logs, etc.).
The workflow basically starts with db2pd output, which is then converted to JSON and stored in couchdb. Converting to JSON makes it very flexible: it can easily be turned into an object for your object-oriented programs, or stored in a plain old database table using an object-relational data mapper of some kind. There are a lot of things you can convert it to once it is in JSON.
The first db2pd output I will try to do analytics on will be the db2pd -appl sampling output. My objective is to determine the connection states (UOW Waiting, Lock-wait, Executing, Compiling, etc.) by day and by hour, and maybe break down the statuses by day of the week or hour of the day. This should be a good way to learn map/reduce.
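As a rough preview of the shape this could take (the actual parser and converter will be in ruby, which is what the next post is about), here is a minimal shell sketch using curl against couchdb. The database name db2mon, the document fields, and the sample values are all placeholders I made up:
#-- create the couchdb database (only needed once)
curl -X PUT http://localhost:5984/db2mon

#-- store one sampled application status as a JSON document
curl -X POST http://localhost:5984/db2mon \
     -H "Content-Type: application/json" \
     -d '{"sampled_at":"2009-11-12 14:00:00","db":"SAMPLE","status":"UOW Waiting"}'

#-- a map/reduce view that counts documents by status and hour of the day
curl -X PUT http://localhost:5984/db2mon/_design/status \
     -H "Content-Type: application/json" \
     -d '{"views":{"by_hour":{"map":"function(doc){ emit([doc.status, doc.sampled_at.substr(11, 2)], 1); }","reduce":"function(keys, values){ return sum(values); }"}}}'

#-- query the grouped counts
curl 'http://localhost:5984/db2mon/_design/status/_view/by_hour?group=true'
Querying the view with group=true should give a count per [status, hour] pair, which is basically the breakdown I am after.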
On the next post, I will try to create a db2pd parser and converter. If there is anyone who is familiar with ruby and wants to help me out with tips on how to accomplish this, please don't hesitate to make yourself known in the comments.
Saturday, June 20, 2009
db2advis workload
db2advis is very helpful in recommending indexes. The utility is usually given a single SQL statement, for which it runs a permutation of what-if scenarios to find the best index.
Giving it a complete workload, including not just the selects but also the deletes/inserts/updates, will make its trade-off calculations more complete.
Here is a simple way of generating a workload file from the dynamic SQL cache. The output can be used to feed the db2advis utility.
db2pd -d sample -dyna | \
awk -f dynamicSQL.awk | \
grep -v "NumRef NumExe Text" | \
grep -v "Dynamic SQL Statements:" | \
grep -v "^$" | \
awk 'BEGIN { FS="0x"; RS="0x"; } {print}' | \
awk 'BEGIN { RS=""; } {print "--#SET FREQUENCY " $7 "\n" substr($0,71) ";\n" }' > workload.out
db2advis -d sample -i workload.out
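For reference, workload.out ends up in db2advis's workload format, roughly like this (the statements and frequencies depend entirely on what happens to be in your dynamic SQL cache):
--#SET FREQUENCY 25
select count(*) from employee;
--#SET FREQUENCY 1
SET CURRENT LOCALE LC_CTYPE = 'en_US';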
This should hopefully be a good starting point.
# dynamicSQL.awk can be found on the previous post.
Labels:
db2advis,
db2pd,
shell scripting
Friday, June 19, 2009
grep -p (paragraph) on linux using awk
AIX grep has a pretty nifty way of matching a paragraph using grep -p. It does not exist on my linux box, making parsing with ksh and bash a little irritating.
Here is an awk file that will simulate what we have on AIX (just replace the "Dynamic SQL Statements:" pattern with whatever paragraph you want to match).
#-- -----
#-- dynamicSQL.awk
#-- -----
BEGIN {
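#-- FS="\n" makes each line a field; RS="" puts awk in paragraph mode,
#-- so each blank-line-separated db2pd section becomes one record (like grep -p)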
FS="\n";
RS="";
}
/Dynamic SQL Statements:/ { print }
#-- -----
Here is how to get the Dynamic SQL Statements paragraph on a db2pd output.
db2pd -d sample -dynamic sql | awk -f dynamicSQL.awk
or you could do it this way too:
db2pd -d sample -dynamic sql | awk 'BEGIN { FS="\n"; RS=""; } /Dynamic SQL Statements:/ { print }'
producing this output:
Dynamic SQL Statements:
Address AnchID StmtUID NumEnv NumVar NumRef NumExe Text
0xA7C62B50 76 2 1 1 1 1 select count(*) from employee
0xA7C627B0 76 1 0 0 0 0 SET CURRENT LOCALE LC_CTYPE = 'en_US'
parse away!
Labels:
db2pd,
shell scripting
Wednesday, June 17, 2009
Database Configuration Versioning
The crontab takes care of grabbing a copy of my database manager and database configuration files (db2 dbm and db cfg). A regular "get dbm cfg" and "get db cfg" over time will generate a lot of files. Multiply these by the number of databases and environments and you have a lot more.
Making the scripts smarter will make the number of files manageable. Saving configs only when there are changes should do the trick.
Now how to implement it?
. get the cfg (filter out the active log entries, S0000nnn.LOG; they change on their own and are not really cfg changes)
. if there are changes, timestamp and keep the file
Using this method, one can call the script manually even if it is scheduled, and the number of files will only grow if there are changes.
This is also very friendly with code versioning tools (cvs/svn/git diff), so you can check them in if you have them.
The timestamp on the file makes it easier to determine what changed when.
scripts:
compare.sh = diff of file, needed by getDBCFG and getDBMCFG
getDBCFG.sh = ./getDBCFG.sh DBNAME
getDBMCFG.sh = ./getDBMCFG.sh
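For the curious, here is a rough sketch of what getDBCFG.sh could look like. The directory, the file naming, and the log filter pattern are just placeholders, and I am inlining the diff instead of calling compare.sh:
#-- -----
#-- getDBCFG.sh (sketch) -- usage: ./getDBCFG.sh DBNAME
#-- -----
DB=$1
CFGDIR=$HOME/cfg
NEW=$CFGDIR/$DB.db.cfg.new
CUR=$CFGDIR/$DB.db.cfg.current

#-- grab the db cfg, dropping the active log entries (S0000nnn.LOG)
#-- since they change on their own and are not real cfg changes
db2 get db cfg for $DB | grep -v "S[0-9]*\.LOG" > $NEW

#-- keep a timestamped copy only when something actually changed
if ! diff $CUR $NEW > /dev/null 2>&1
then
    cp $NEW $CFGDIR/$DB.db.cfg.$(date +%Y%m%d.%H%M%S)
    cp $NEW $CUR
fi
rm -f $NEW
#-- -----
getDBMCFG.sh would be the same idea using "db2 get dbm cfg". When this runs from cron, remember that the instance environment (db2profile) has to be sourced first so the db2 command is available.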
Labels:
database administration