Pardy DBA

EM12c “Supplied date is in the future” event post-DST fixed w/agent clearstate and restart


Since about an hour before the transition from daylight savings time back to standard time, I’ve noticed some events showing up in the incident manager that make it appear as though my agents are time traveling.

The events are showing up as metric evaluation errors on database instance and agent targets stating, for example: “Row(8): Supplied date is in the future : now = Mon Nov 05 08:42:41 EST 2012 supplied value = Mon Nov 05 09:38:42 EST 2012”. Sometimes the times reported are exactly one hour apart: “Row(1): Supplied date is in the future : now = Sun Nov 04 01:59:52 EDT 2012 supplied value = Sun Nov 04 01:59:52 EST 2012”. I see these from the repository database instance, several monitored database instances, and four different agents.

I’m seeing cases where a monitored database instance has this event and the monitoring agent does NOT, and I’m also seeing cases where both a monitored database and the agent performing the monitoring both show the event.

It seems that stopping the agent, running an emctl clearstate agent, then starting the agent back up on each affected server will, after a few minutes, cause these events to clear. I only noticed this right when it started because I was logged in to take down SAP instances that can’t handle the time change; I wasn’t expecting to hit a related problem with EM12c.

emctl stop agent ; emctl clearstate agent ; emctl start agent

If you’re impatient, you can force the EMDStatus metric collection to run, which speeds up clearing the events:

emctl control agent runCollection agenthost.domain.com:3872:oracle_emd EMDStatusCollection


Filed under: Cloud Control

EM12cR2 ORA-20233: Agent targets cannot be directly blacked out


Here’s an annoying change Oracle implemented in EM12cR2. You can no longer directly black out an agent target through the OEM GUI. Attempting to do so results in an unhelpful error message stating simply: Error editing the blackout “Blackout-Jan 14 2013 10:13:59 AM”. This worked in EM12cR1, where I made use of it very frequently, and I’ve only now dug in a bit to figure out why it stopped.

Apparently Oracle has decided to remove the ability to black out an agent target. You couldn’t tell that from the error on the screen, but OEM follows up that error by generating an incident and a problem, and picking through the incident packaging directory reveals the actual error message:

[2013-01-14T10:12:43.245-05:00] [EMGC_OMS1] [ERROR] [EM-02916] [oracle.sysman.eml.admin.rep.blackout.BlackoutEvents] [host: omshost.domain.com] [nwaddr: x.x.x.x] [tid: EMUI_10_12_43_/console/admin/rep/blackout/blackoutConfig] [userId: USERNAME] [ecid: 004oo4JeaWFDwW95zf8DyW0007vt0004WR,0:1] [APP: emgc] [LOG_FILE: /oracle/oem/Middleware12cR2/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/sysman/log/emoms.log] [URI: /em/console/admin/rep/blackout/blackoutConfig] Submit of Blackout Failed.[[
java.sql.SQLException: ORA-20233: Agent targets cannot be directly blacked out
ORA-06512: at “SYSMAN.MGMT_BLACKOUT_ENGINE”, line 2368
ORA-06512: at “SYSMAN.MGMT_BLACKOUT_ENGINE”, line 2395
ORA-06512: at “SYSMAN.MGMT_BLACKOUT”, line 39
ORA-06512: at “SYSMAN.MGMT_BLACKOUT_UI”, line 1403
ORA-06512: at line 1

It seems like it was a late decision to remove this functionality, since it’s still a bit rough around the edges. If you’re logged in as SYSMAN and attempt to create a blackout against an agent target, the “Search and Select: Targets” screen will not even contain ‘Agent’ in the drop-down selection list for target type. But if you follow Oracle’s security recommendations and log in as a regular user account rather than SYSMAN, you will still see ‘Agent’ available in the selection list, but attempting to use it throws the error and incident described above.

The workaround appears to be to select a HOST target to black out, rather than an agent. This seems to replicate the functionality (blacking out everything on that host) that was previously achieved by blacking out an agent.
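
If you script your blackouts, the same host-level workaround works from emcli as well. Here’s a minimal sketch; the target name, duration and reason are placeholders, and the exact schedule syntax should be checked against emcli help create_blackout on your version:

emcli login -username=MYADMIN
emcli create_blackout -name="dbhost01 maintenance" \
  -add_targets="dbhost01.domain.com:host" \
  -reason="OS patching" \
  -schedule="duration:2:00"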


Filed under: Cloud Control

EM12cR2 PSU1 (12.1.0.2.1) Patch 14840279 Now Available


I just noticed that the first PSU for EM12cR2 is out. It’s under patch number 14840279, and gives us a new version for the EM12cR2 setup: 12.1.0.2.1.

I’ve applied this patch without any trouble. I did so on top of the Dec 2012 performance bundle (patch 14807119). The PSU is a superset of the performance bundle so some patches were rolled back, but everything applied cleanly and my OMS came up fine. It also includes the fix for the EM_JOB_METRICS issue I posted about before, so if you aren’t comfortable applying one-off patches and have been tolerating the increased redo while waiting for a bundled patch, this PSU is for you.

The only minor issue I had was in the post-patch application step when running post_deploy.sh. The file wasn’t executable, so I simply had to chmod +x post_deploy.sh before running ./post_deploy.sh.


Filed under: Cloud Control

Benchmarking filesystem performance for Oracle using SLOB


Update: Please see Part 1.5 for updated scripts and part 2 for results.

This post will cover techniques I’ve used to run SLOB (see http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/) to benchmark the performance of various filesystems and raw devices when used for Oracle datafiles and redo logs. I will not write about the actual results, since they’re only relevant to my specific hardware and software setup (spoiler alert: raw wins), but instead discuss the methodology and database parameters and configurations I used in hopes that they will help others to run their own tests, or that others will notice flaws I can remedy to produce more correct results. Likewise, this post is not about specific tuning parameters for individual filesystems, but instead about a way to run the tests to compare performance from one set of tuning parameters to another.

After you have downloaded and installed SLOB, you need to get a good idea about exactly what you want to test and which metrics you will look at to determine the results. In my case we have been having some OS issues resulting in unclean shutdowns that lead to long fsck times on our Oracle data volumes, so I am investigating alternative filesystems to try to find something with performance matching or beating ext3 that won’t be subject to fsck on boot. I also chose to run tests to compare the redo logging subsystem’s performance when redo is stored on various filesystems or raw devices.

So I am running three different tests:

  1. Pure read-only I/O performance
  2. Concurrent read-write I/O performance
  3. Write-only redo I/O performance

For each test I first needed to identify an appropriate metric. For read-only performance the obvious choice is physical reads per second. For concurrent read-write performance I measured the sum of physical reads per second and physical writes per second. For redo performance I measured redo generated per second.

After selecting your metrics you next need to determine how to configure the database to make sure you are testing what you wanted to test. To that end, I configured the database as described below. If you want to use SLOB to test other aspects of performance you need to monitor the wait events noted in your AWR reports to be sure that the database isn’t doing something you don’t really want to test. For example, if you are running a ton of SLOB tests overnight and the autotask window kicks in and the SQL Tuning Advisor or Segment Advisor start running, those will skew your results so you may wish to disable those tasks.

Setting Up Test Parameters

Each test requires a different set of initialization parameters and other configuration to isolate the desired variable (filesystem or raw device, in my case). I think the parameters I used are valid for the tests I ran, but I am very interested in any comments from others. For each of the various filesystems I wanted to test, the storage admin created a LUN, mounted it to the server, and created the filesystem (or configured the raw device). I put a separate tablespace on each LUN, each containing a single 10GB datafile.

Read-Only I/O Performance

This is the easiest item to force. I want the database to fulfill read requests from disk rather than from the buffer cache, so I simply took my existing baseline configuration (as used for our SAP systems) and set db_cache_size to 64M. With such a small buffer cache only minimal data will be cached, and the majority of reads will come from disk. You can confirm this in the SLOB-generated AWR report by verifying that the number of physical reads per second is relatively close to the number of logical reads per second. For example, if you show 20,000 logical reads per second and only 100 physical reads per second, you haven’t done it right as most gets are coming from cache. You may need to lie to Oracle about your CPU_COUNT to get your buffer cache small enough to force physical read I/O.
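
Outside of the AWR report, you can also spot-check this during a run against v$sysstat: sample it twice a minute or so apart and compare the deltas rather than the absolute values. A quick sketch using standard statistic names:

-- 'physical reads' should be a large fraction of 'session logical reads'
-- for the read-only test to be exercising the disks rather than the cache.
select name, value
  from v$sysstat
 where name in ('physical reads', 'session logical reads');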

To run these tests in SLOB, I used 8, 16, and 32 concurrent user read sessions and zero writers.

Concurrent Read-Write I/O Performance

Coming up with a setup that measured what I wanted took a lot of effort for this one. Initial efforts showed significant waits on log buffer space or cache buffers chains and a lot of redo generation without corresponding physical writes, so I had to tweak things until I found a setup that (mostly) produced AWR reports with “db file sequential read”, “db file parallel read”, “write complete waits” and “DB CPU” within the top waits.

I eventually settled on a db_cache_size of 128M to force read I/O to physical while allowing writes to occur without waiting on cache buffer chains. I set log_buffer=1073741824 to reduce log buffer waits on writes, though Oracle seems to have limited the log buffer to 353,560K as reported in the AWR reports. I created three 64M redo logs on raw devices and ran the database in NOARCHIVELOG mode to force frequent checkpoints and corresponding physical writes to datafiles and exclude any I/O spent on archiving redo logs.
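
For reference, here is roughly that configuration expressed as a sketch; the raw device paths are placeholders for whatever your storage admin presents, and with scope=spfile both parameters take effect at the next restart:

alter system set db_cache_size=128M scope=spfile;
alter system set log_buffer=1073741824 scope=spfile;
-- three small redo logs on raw devices (paths are hypothetical)
alter database add logfile group 4 '/dev/raw/raw1' size 64m;
alter database add logfile group 5 '/dev/raw/raw2' size 64m;
alter database add logfile group 6 '/dev/raw/raw3' size 64m;
-- switch to NOARCHIVELOG mode (database must be mounted, not open)
shutdown immediate
startup mount
alter database noarchivelog;
alter database open;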

To run these tests in SLOB, I used 8, 16, and 32 concurrent user sessions, with half of them as readers and half of them as writers.

Write-Only Redo I/O Performance

For this test I wanted to purely measure the amount of redo generated while excluding datafile checkpoints. I set db_cache_size to 32G to allow reads to come from cache and created three 4G redo log groups on each tested filesystem, intending the full test block to run without causing a log switch that would force a checkpoint.
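
To confirm a run really did complete without a log switch (and therefore without the checkpoint I’m trying to exclude), you can note the sequence number of the CURRENT group before and after the run; this simple check uses standard v$log columns:

-- If SEQUENCE# of the CURRENT group is unchanged after the run,
-- no log switch occurred during the test.
select group#, sequence#, bytes/1024/1024 as mb, status
  from v$log
 order by group#;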

To run these tests in SLOB, I used 8, 16, and 32 concurrent user write sessions, with zero readers.

A Harness For SLOB

Setting up each SLOB run involves running the provided setup.sh script to create the users, schemata and data for each test, then running the provided runit.sh script with parameters to indicate the desired number of readers and writers. For example, to use 32 different users with their data in tablespace TABLESPACENAME, and then run SLOB with 16 writers and 16 readers, you would run:

./setup.sh TABLESPACENAME 32 ; ./runit.sh 16 16

After the run SLOB will produce an AWR report which you should review to see the results of your test run. SLOB also produces a drop_users.sql script to clear out the users generated by setup.sh, and you should run that script and re-create your users anytime you change the user count. A benchmark run only once has no validity, and a benchmark with nothing to compare to is useless, so you’ll want to create some scripts to run SLOB repeatedly, saving the AWR reports in between so that you can review the overall results.

Here is the content of the script I used to automate individual SLOB runs, with discussion after:

#!/bin/sh
# harness to run SLOB

FS=$1
NUMUSERS=$2
NUMWRITERS=$3
NUMREADERS=$4
RUNNUM=$5
RUNDIR="run${RUNNUM}"

echo "Dropping users and bouncing DB"

sqlplus -s / as sysdba<<EOF
@drop_users
alter system switch logfile;
alter system checkpoint;
shutdown immediate;
exit;
EOF

rm /oracle/SLOB/arch/SLOB*dbf

sqlplus -s / as sysdba<<EOF2
startup;
exit;
EOF2

echo "Setting up users for $FS $NUMUSERS"

./setup.sh $FS $NUMUSERS

echo "Running SLOB: $FS $NUMUSERS $NUMWRITERS $NUMREADERS (run $RUNNUM)"

./runit.sh $NUMWRITERS $NUMREADERS

echo "Renaming AWR report"

mv awr.txt $RUNDIR/SLOB-AWR-$FS-$NUMUSERS-$NUMWRITERS-$NUMREADERS-$RUNDIR.txt

To provide a consistent environment for each test, I drop the previous test’s users, force a redo log switch and checkpoint, and then bounce the database. I also remove any archived logs generated by the previous run so I don’t run out of space (this is a test DB I don’t need; don’t do that on any DB you can’t afford to lose). This script takes five arguments: the tablespace name, the number of concurrent users, the number of writers, the number of readers, and the run number for each combination. I’ve named my tablespaces after the filesystem (“BTRFS”, “EXT3”, “EXT4”, “XFS”, “RAWTS”, and so on) so when the script calls setup.sh the user data is created within the appropriate tablespace on the filesystem to be tested. The last line renames SLOB’s awr.txt to reflect the filesystem, user count, number of readers/writers and run number.

I save this script as mydoit.sh and create another script to call it repeatedly. This example is for the read-only testing:

#!/bin/sh
echo "Starting runs for 8 readers"
./mydoit.sh EXT3 8 0 8 1
./mydoit.sh EXT4 8 0 8 1
./mydoit.sh BTRFS 8 0 8 1
./mydoit.sh XFS 8 0 8 1
./mydoit.sh RAWTS 8 0 8 1

./mydoit.sh EXT3 8 0 8 2
./mydoit.sh EXT4 8 0 8 2
./mydoit.sh BTRFS 8 0 8 2
./mydoit.sh XFS 8 0 8 2
./mydoit.sh RAWTS 8 0 8 2

./mydoit.sh EXT3 8 0 8 3
./mydoit.sh EXT4 8 0 8 3
./mydoit.sh BTRFS 8 0 8 3
./mydoit.sh XFS 8 0 8 3
./mydoit.sh RAWTS 8 0 8 3

echo "Starting runs for 16 readers"
./mydoit.sh EXT3 16 0 16 1
./mydoit.sh EXT4 16 0 16 1
./mydoit.sh BTRFS 16 0 16 1
./mydoit.sh XFS 16 0 16 1
./mydoit.sh RAWTS 16 0 16 1

./mydoit.sh EXT3 16 0 16 2
./mydoit.sh EXT4 16 0 16 2
./mydoit.sh BTRFS 16 0 16 2
./mydoit.sh XFS 16 0 16 2
./mydoit.sh RAWTS 16 0 16 2

./mydoit.sh EXT3 16 0 16 3
./mydoit.sh EXT4 16 0 16 3
./mydoit.sh BTRFS 16 0 16 3
./mydoit.sh XFS 16 0 16 3
./mydoit.sh RAWTS 16 0 16 3

# and so on for 32 concurrent sessions...

After this script runs I have a pile of AWR reports whose read performance I can review and compare. For each metric I’m interested in, I saved the results from each run into a spreadsheet and generated average results for each set. That gives me an average across three runs of each filesystem’s physical read I/O per second for 8, 16, and 32 concurrent users. Similar scripts run for the redo-only testing and for the read/write testing, with the results logged in the same way. I then generate charts within the spreadsheet to visually compare the results. Some performance trends are very obvious visually, and we are already following up with our OS vendor for some more information.
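
Rather than copying numbers by hand, a small shell loop can pull the per-second Load Profile lines out of each saved report for pasting into the spreadsheet. This is a rough sketch; the label text ("Physical reads:", "Redo size:") matches my 11.2.0.3 AWR text reports, so adjust the pattern if yours differ:

#!/bin/sh
# Summarize the per-second I/O lines from each saved SLOB AWR report.
for f in run*/SLOB-AWR-*.txt; do
    echo "== $f"
    grep -E 'Physical reads:|Physical writes:|Redo size:' "$f"
done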


Filed under: Database

Oracle 11.2.0.3 and up now certified on Oracle Linux 6 (note 1304727.1)


According to MOS note 1304727.1, Oracle Database 11.2.0.3 and up is now certified on Oracle Linux 6.x!  I’ve been waiting for this.


Filed under: Database

Benchmarking filesystem performance for Oracle using SLOB (Part 1.5): Improved test scripts


Minor update 20130207: Adjusted read-write testing script for three redo logs, not two.

In Part 1 I described a starter methodology for using SLOB to benchmark filesystem performance for Oracle. After many SLOB runs and much valuable advice from Kevin Closson, I now have an improved method. My next post will contain the results of these tests, with all numbers scaled against RAW performance for the sake of comparison. Since my hardware and network don’t match yours, my specific numbers aren’t very relevant; but for a DBA or SA considering changing the filesystem they use for Oracle without a priori knowledge of how to best tune it, this will hopefully present a useful baseline.

Previously I was bouncing the database, dropping and recreating the SLOB users before every test run. Now I am initializing the test batches for each filesystem/volume by bouncing the database, creating the users and executing a throwaway SLOB run to prime the database.

Test Scripts

Below are the scripts I am using to test read-only, redo generation, and read-write performance. If you use any of them, edit the scripts to reflect your base directory (the oracle user’s home), SLOB install directory and desired location for SLOB AWR output. Before running any of the scripts, prepare your environment by creating a tablespace on each filesystem/device type that you wish to test. Name the tablespace something that will make it clear to you which filesystem you are testing. I used a single 16GB datafile for each tablespace. Each script runs SLOB setup.sh to create the users and then SLOB runit.sh to execute the tests. After an initial throwaway run to warm things up, they will run SLOB three times each with 8, 16, and 32 sessions. You can adjust the session count by changing the for loops. At the end of each script the awr.txt file generated by SLOB is renamed and moved to the SLOBAWR directory, using the filename convention expected by the awr_info.sh script (found in the latest SLOB release). All tests are performed in NOARCHIVELOG mode; don’t do that on a database that you need. The scripts assume that you are executing them from within the SLOB directory.

Read-Only Performance

Before running the read-only test, I created two 32G redo logs on raw devices. The read-only test should not hit redo at all, so where you place them doesn’t really matter. For read-only testing you should use a very small db_cache_size; I used 64M. The small cache will make sure that reads are fulfilled through physical I/O rather than from cache.

#!/bin/sh
#
# Usage: ./testread.sh FS RUNCOUNT
#
# Script assumes a tiny SGA to force physical reads
# Script will create SLOB schema users in tablespace named $FS, and run each test $RUNCOUNT times

TYPE=$1
NUMRUNS=$2
BASEDIR=/oracle/SLOB
SLOBDIR=$BASEDIR/SLOB
DROPUSERS=$SLOBDIR/drop_users.sql
AWRDIR=$SLOBDIR/SLOBAWR
COUNTER=1

if [ -z "$TYPE" -o -z "$NUMRUNS" ]; then
    echo "Usage: $0 FS RUNCOUNT"
    exit 1
fi

mkdir $AWRDIR >& /dev/null

echo "Starting SLOB read-only performance testing for $TYPE ($NUMRUNS runs)"

echo "Dropping existing users and bouncing database"

sqlplus -s / as sysdba<<EOF
@$DROPUSERS;
shutdown immediate;
startup;
exit;
EOF

echo "Setting up SLOB user schemata"
$SLOBDIR/setup.sh $TYPE 32

echo "Performing N+1 logswitch"
sqlplus -s / as sysdba<<EOF2
alter system switch logfile;
alter system switch logfile;
alter system switch logfile;
exit;
EOF2

echo "Throwaway SLOB run to prime database"
$SLOBDIR/runit.sh 0 8

for i in 8 16 32; do (
    while [ $COUNTER -le $NUMRUNS ]; do
        echo "Running SLOB for $i readers (run #$COUNTER)"

        $SLOBDIR/runit.sh 0 $i
        echo "Renaming AWR report"
        mv awr.txt $AWRDIR/SLOB-AWR-$TYPE-$COUNTER.0.$i
        COUNTER=$((COUNTER+1))
    done )
done

Redo Performance

Before running the redo generation test, I created two 32G redo logs on the filesystem whose performance I wish to test. For redo testing you should use a large db_cache_size; I used 32G. I also increased log_buffer to reduce log buffer waits, but I have not been able to eliminate them entirely. SAP requires specific log_buffer settings so I don’t want to deviate from them too much, as I want this test to have at least some similarity to our production performance. Prior to each test run I performed N+1 log switches (where N is the number of configured redo logs) to clear out any pending redo.

One possible improvement others may wish to consider when testing redo generation performance would be to configure your UNDO tablespace to use the same filesystem type as your redo logs. I did not do so, so my max performance is somewhat constrained by writes to UNDO. Each test should be similarly constrained so I am not concerned about that at the moment.

#!/bin/sh
#
# Usage: ./testredo.sh FS RUNCOUNT
#
# Script assumes you have pre-created exactly two redo logs on the filesystem/device to be tested
# Script will create SLOB schema users in tablespace named $FS, and run each test $RUNCOUNT times

TYPE=$1
NUMRUNS=$2
BASEDIR=/oracle/SLOB
SLOBDIR=$BASEDIR/SLOB
DROPUSERS=$SLOBDIR/drop_users.sql
AWRDIR=$SLOBDIR/SLOBAWR
COUNTER=1

if [ -z "$TYPE" -o -z "$NUMRUNS" ]; then
    echo "Usage: $0 FS RUNCOUNT"
    exit 1
fi

mkdir $AWRDIR >& /dev/null

echo "Starting SLOB redo generation performance testing for $TYPE ($NUMRUNS runs)"
echo "Assuming two redo logs exist"

echo "Dropping existing users and bouncing database"

sqlplus -s / as sysdba<<EOF
@$DROPUSERS;
shutdown immediate;
startup;
exit;
EOF

echo "Setting up SLOB user schemata"
$SLOBDIR/setup.sh $TYPE 32

echo "Throwaway SLOB run to prime database"
$SLOBDIR/runit.sh 8 0

for i in 8 16 32; do (
    while [ $COUNTER -le $NUMRUNS ]; do
        echo "Running SLOB for $i writers (run #$COUNTER)"
        echo "Performing N+1 logswitch"
        sqlplus -s / as sysdba<<EOF2
        alter system switch logfile;
        alter system switch logfile;
        alter system switch logfile;
        exit;
EOF2

        $SLOBDIR/runit.sh $i 0 
        echo "Renaming AWR report"
        mv awr.txt $AWRDIR/SLOB-AWR-$TYPE-$COUNTER.$i.0
        COUNTER=$((COUNTER+1))
    done )
done

Read-Write Performance

Before running the read-write performance test, I replaced my redo logs with three 64M files on raw devices. I also decreased db_cache_size to 128M to help make sure reads are fulfilled from physical disk instead of cache. I left log_buffer as-is from the redo testing.

#!/bin/sh
#
# Usage: ./testrw.sh FS RUNCOUNT
#
# Script assumes you have pre-created exactly three small (64M) redo logs 
# Script will create SLOB schema users in tablespace named $FS, and run each test $RUNCOUNT times

TYPE=$1
NUMRUNS=$2
BASEDIR=/oracle/SLOB
SLOBDIR=$BASEDIR/SLOB
DROPUSERS=$SLOBDIR/drop_users.sql
AWRDIR=$SLOBDIR/SLOBAWR
COUNTER=1

if [ -z "$TYPE" -o -z "$NUMRUNS" ]; then
    echo "Usage: $0 FS RUNCOUNT"
    exit 1
fi

mkdir $AWRDIR >& /dev/null

echo "Starting SLOB read/write performance testing for $TYPE ($NUMRUNS runs)"
echo "Assuming three redo logs exist"

echo "Dropping existing users and bouncing database"

sqlplus -s / as sysdba<<EOF
@$DROPUSERS;
shutdown immediate;
startup;
exit;
EOF

echo "Setting up SLOB user schemata"
$SLOBDIR/setup.sh $TYPE 32

echo "Throwaway SLOB run to prime database"
$SLOBDIR/runit.sh 4 4

for i in 8 16 32; do (
    while [ $COUNTER -le $NUMRUNS ]; do
        echo "Running SLOB for $i read/write sessions (run #$COUNTER)"
        echo "Performing N+1 logswitch"
        sqlplus -s / as sysdba<<EOF2
        alter system switch logfile;
        alter system switch logfile;
        alter system switch logfile;
        alter system switch logfile;
        exit;
EOF2

        READERS=$((i/2))
        WRITERS=$READERS
        $SLOBDIR/runit.sh $WRITERS $READERS
        echo "Renaming AWR report"
        mv awr.txt $AWRDIR/SLOB-AWR-$TYPE-$COUNTER.$WRITERS.$READERS
        COUNTER=$((COUNTER+1))
    done )
done

Filed under: Database

Benchmarking filesystem performance for Oracle using SLOB (Part 2) – Results!


Update 20130212: Please consider this a work in progress. I am re-running most of these tests now using the same LUN for every filesystem to exclude storage-side variance. I am also using some additional init parameters to disable Oracle optimizations to focus the read tests more on db file sequential reads and avoid db file parallel reads. My posts so far document the setup I’ve used and I’m already noting flaws in it. Use this and others’ SLOB posts as a guide to run tests on YOUR system with YOUR setup, for best results.

Update 20130228: I have re-run my tests using the same storage LUN for each test.  I have updated the charts, text, and init parameters to reflect my latest results.  I have also used a 4k BLOCKSIZE for all redo log testing.  Further, instead of running each test three times and taking the average, I have run each test ten times, dropped the highest and lowest values, and averaged the remaining eight, in hopes of gaining a more accurate result.  Where I have changed my setup compared to the original post, I have left the old text in, struck out ~~like this~~.

In Part 1 and Part 1.5 I covered the setup and scripts I’ve used to benchmark performance of various filesystems (and raw devices) using SLOB for Oracle datafiles and redo logs. In this post I will provide more details, including the results and database init parameters in use for my tests. This is an SAP shop, so I have several event and _fix_control parameters in place that may have performance impacts. SAP says “use them”, so I wanted my tests to reflect the performance I could hope to see through the SAP applications.

I do not intend these tests to demonstrate the maximum theoretical I/O that our hardware can push using each filesystem. I intend them to demonstrate what performance I can expect if I were to change the databases to use each listed filesystem while making minimal (preferably zero) other changes to our environment. I know, for example, that adding more network paths or faster ethernet or splitting the bonded paths will improve Oracle direct NFS performance, but I am testing with what we have available at the moment, not what we COULD have.

Now that 11g is certified on Oracle Linux 6, we are building out an OL6.3 box and I will be running identical tests on that server in a couple weeks. I’m expecting to see some significant improvement in BTRFS performance there.

With that said, some information on the test environment:

  • Server OS: SLES 11SP2 (kernel 3.0.51-0.7.9-default)
  • Oracle version: 11.2.0.3.2 (CPUApr2012 + SAP Bundle Patch May2012)
  • 2 CPU sockets, 8 cores per socket, with hyper-threading (Oracle sees CPU_COUNT = 32 but I have set CPU_COUNT to 2 to allow me to create a very small buffer cache)
  • All filesystems and raw devices created within a single volume in a single NetApp storage aggregate
  • I use only single instance databases so no effort was made to test cluster filesystems or RAC performance

Some notes about each test:

  1. All tests run in NOARCHIVELOG mode — obviously you should not do this in production
  2. Each filesystem was tested by creating SLOB users in a single tablespace containing a single ~~16GB~~ 4GB datafile on the indicated filesystem
  3. All redo logs created with 4k blocksize
  4. All datafiles created with default parameters, I did not specify anything other than size and filename
  5. Read-only tests run with a ~~64M~~ 32M db_cache_size to force PIO
  6. Redo generation tests run with a large log_buffer and two 32GB redo logs on the listed filesystem, with a 32G db_cache_size. I also set log_checkpoint_timeout to 999999999 to avert time-based checkpoints.
  7. ~~XFS redo testing performed with filesystemio_options=none, as otherwise I could not create redo logs on XFS without using an underscore parameter and specifying the sector size when creating the logs (see http://flashdba.com/4k-sector-size/ for more information on this issue). All other tests used filesystemio_options=setall; only XFS redo required none.~~ All tests run with filesystemio_options=setall.
  8. Read-write tests run with three 64MB redo logs stored on raw devices and 128M db_cache_size
  9. Every test was run ~~three~~ ten times with 8 sessions, ~~three~~ ten times with 16 sessions, and ~~three~~ ten times with 32 sessions, and an overall average was taken after dropping high and low values. Prior to each batch of tests the database was bounced. Per Kevin Closson’s recommendation, I executed a throwaway SLOB run after each bounce, discarding the results.
  10. ~~“NFS” refers to Oracle direct NFS~~ I dropped the NFS tests, as our ethernet setup is currently only 1Gb and does not perform as well as FC
  11. Automated tasks like the gather stats job, segment and tuning advisors were disabled
  12. The BTRFS volume was mounted with nodatacow to avoid copy-on-write

Results

Oracle’s standard license prohibits customer publication of benchmark results without express written consent. To avoid any confusion that these results represent a benchmark, I am NOT publishing any of the absolute numbers. Instead I have scaled all results such that the performance of a RAW device is equal to 1, and all other results are reported relative to RAW. I make no claims that these results represent the best possible performance available from my software/hardware configuration. What they should accurately reflect, though, is what someone with a similar setup could expect to see if they were to create redo logs or datafiles on the indicated filesystem without performing any tuning to optimize the database, OS or filesystem.

Comments

I am disappointed with the performance of BTRFS. Our OS vendor is deprecating ext4 in favor of BTRFS, so if we’re going to abandon ext3 due to fsck issues, BTRFS is the path of least resistance. Ext4 appears to provide performance similar to ext3 and should reduce fsck runtimes, so if we stay on cooked devices that looks like the way to go. Raw performance won overall (though storage write caching appears to have made ext3 look better than raw), but it has management issues such as the 255-device limit and the inability to extend a datafile on a raw device. ASM should provide the same overall performance as raw without those limitations, but it adds additional management overhead with the need to install Grid Infrastructure, and that just gives one more target to keep up to date on patches. XFS had poor performance for redo logs in my environment but good performance elsewhere, and it should entirely avoid fsck-on-boot problems.

Direct NFS seemed to deliver the most consistent performance from test to test, with a smaller standard deviation than any of the others. This might be relevant for anyone who requires consistent performance more than maximal performance.

Read Performance

[Chart: scaled read performance]

Redo Generation Performance

[Chart: scaled redo generation performance]

Read/Write Performance

[Chart: scaled read/write performance]

Init Parameters

Below are the init parameters used for each test. Note that filesystemio_options had to be set to none for the XFS redo generation testing, but other than that these parameters are accurate for all the tests.

Read Performance

SLOB.__db_cache_size=34359738368
SLOB.__oracle_base='/oracle/SLOB'#ORACLE_BASE set from environment
SLOB.__shared_pool_size=704643072
*._db_block_prefetch_limit=0
*._db_block_prefetch_quota=0
*._db_file_noncontig_mblock_read_count=0
*._disk_sector_size_override=TRUE
*._fix_control='5099019:ON','5705630:ON','6055658:OFF','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6972291:ON','8937971:ON','9196440:ON','9495669:ON','13077335:ON'#SAP_112031_201202 RECOMMENDED SETTINGS
*._mutex_wait_scheme=1#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._mutex_wait_time=10#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optim_peek_user_binds=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_adaptive_cursor_sharing=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_extended_cursor_sharing_rel='NONE'#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_use_feedback=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*.audit_file_dest='/oracle/SLOB/saptrace/audit'
*.compatible='11.2.0.2'
*.control_file_record_keep_time=30
*.control_files='/oracle/SLOB/cntrl/cntrlSLOB.dbf','/oracle/SLOB/cntrlm/cntrlSLOB.dbf'
*.cpu_count=1
*.db_block_size=8192
*.db_cache_size=33554432
*.db_files=500
*.db_name='SLOB'
*.db_recovery_file_dest_size=8589934592
*.db_recovery_file_dest='/oracle/SLOB/fra'
*.db_writer_processes=2
*.diagnostic_dest='/oracle/SLOB/saptrace'
*.event='10027','10028','10142','10183','10191','10995 level 2','38068 level 100','38085','38087','44951 level 1024'#SAP_112030_201112 RECOMMENDED SETTINGS
*.filesystemio_options='SETALL'
*.local_listener='LISTENER_SLOB'
*.log_archive_dest_1='LOCATION=/oracle/SLOB/arch/SLOBarch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=14221312
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=1600
*.optimizer_dynamic_sampling=6
*.parallel_execution_message_size=16384
*.parallel_max_servers=0
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=10737418240
*.processes=800
*.query_rewrite_enabled='FALSE'
*.recyclebin='off'
*.remote_login_passwordfile='EXCLUSIVE'
*.remote_os_authent=TRUE#SAP note 1431798
*.replication_dependency_tracking=FALSE
*.resource_manager_plan=''
*.sessions=800
*.sga_max_size=10737418240
*.shared_pool_size=5242880000
*.star_transformation_enabled='TRUE'
*.undo_retention=432000
*.undo_tablespace='PSAPUNDO'

Redo Generation Performance

SLOB.__db_cache_size=34359738368
SLOB.__oracle_base='/oracle/SLOB'#ORACLE_BASE set from environment
SLOB.__shared_pool_size=704643072
*._db_block_prefetch_limit=0
*._db_block_prefetch_quota=0
*._db_file_noncontig_mblock_read_count=0
*._disk_sector_size_override=TRUE
*._fix_control='5099019:ON','5705630:ON','6055658:OFF','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6972291:ON','8937971:ON','9196440:ON','9495669:ON','13077335:ON'#SAP_112031_201202 RECOMMENDED SETTINGS
*._mutex_wait_scheme=1#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._mutex_wait_time=10#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optim_peek_user_binds=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_adaptive_cursor_sharing=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_extended_cursor_sharing_rel='NONE'#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_use_feedback=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*.audit_file_dest='/oracle/SLOB/saptrace/audit'
*.compatible='11.2.0.2'
*.control_file_record_keep_time=30
*.control_files='/oracle/SLOB/cntrl/cntrlSLOB.dbf','/oracle/SLOB/cntrlm/cntrlSLOB.dbf'
*.cpu_count=1
*.db_block_size=8192
*.db_cache_size=34359738368
*.db_files=500
*.db_name='SLOB'
*.db_recovery_file_dest_size=8589934592
*.db_recovery_file_dest='/oracle/SLOB/fra'
*.db_writer_processes=2
*.diagnostic_dest='/oracle/SLOB/saptrace'
*.event='10027','10028','10142','10183','10191','10995 level 2','38068 level 100','38085','38087','44951 level 1024'#SAP_112030_201112 RECOMMENDED SETTINGS
*.filesystemio_options='SETALL'
*.local_listener='LISTENER_SLOB'
*.log_archive_dest_1='LOCATION=/oracle/SLOB/arch/SLOBarch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=268427264
*.log_checkpoint_timeout=99999999
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=1600
*.optimizer_dynamic_sampling=6
*.parallel_execution_message_size=16384
*.parallel_max_servers=0
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=10737418240
*.processes=800
*.query_rewrite_enabled='FALSE'
*.recyclebin='off'
*.remote_login_passwordfile='EXCLUSIVE'
*.remote_os_authent=TRUE#SAP note 1431798
*.replication_dependency_tracking=FALSE
*.resource_manager_plan=''
*.sessions=800
*.sga_max_size=68719476736
*.shared_pool_size=5242880000
*.star_transformation_enabled='TRUE'
*.undo_retention=432000
*.undo_tablespace='PSAPUNDO'

Read/Write Performance

SLOB.__db_cache_size=34359738368
SLOB.__oracle_base='/oracle/SLOB'#ORACLE_BASE set from environment
SLOB.__shared_pool_size=704643072
*._db_block_prefetch_limit=0
*._db_block_prefetch_quota=0
*._db_file_noncontig_mblock_read_count=0
*._disk_sector_size_override=TRUE
*._fix_control='5099019:ON','5705630:ON','6055658:OFF','6399597:ON','6430500:ON','6440977:ON','6626018:ON','6972291:ON','8937971:ON','9196440:ON','9495669:ON','13077335:ON'#SAP_112031_201202 RECOMMENDED SETTINGS
*._mutex_wait_scheme=1#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._mutex_wait_time=10#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optim_peek_user_binds=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_adaptive_cursor_sharing=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_extended_cursor_sharing_rel='NONE'#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*._optimizer_use_feedback=FALSE#RECOMMENDED BY ORACLE/SAP FOR 11.2.0 - SAP note 1431798
*.audit_file_dest='/oracle/SLOB/saptrace/audit'
*.compatible='11.2.0.2'
*.control_file_record_keep_time=30
*.control_files='/oracle/SLOB/cntrl/cntrlSLOB.dbf','/oracle/SLOB/cntrlm/cntrlSLOB.dbf'
*.cpu_count=1
*.db_block_size=8192
*.db_cache_size=134217728
*.db_files=500
*.db_name='SLOB'
*.db_recovery_file_dest_size=8589934592
*.db_recovery_file_dest='/oracle/SLOB/fra'
*.db_writer_processes=2
*.diagnostic_dest='/oracle/SLOB/saptrace'
*.event='10027','10028','10142','10183','10191','10995 level 2','38068 level 100','38085','38087','44951 level 1024'#SAP_112030_201112 RECOMMENDED SETTINGS
*.filesystemio_options='SETALL'
*.local_listener='LISTENER_SLOB'
*.log_archive_dest_1='LOCATION=/oracle/SLOB/arch/SLOBarch'
*.log_archive_format='%t_%s_%r.dbf'
*.log_buffer=268427264
*.log_checkpoint_timeout=99999999
*.log_checkpoints_to_alert=TRUE
*.max_dump_file_size='20000'
*.open_cursors=1600
*.optimizer_dynamic_sampling=6
*.parallel_execution_message_size=16384
*.parallel_max_servers=0
*.parallel_threads_per_cpu=1
*.pga_aggregate_target=10737418240
*.processes=800
*.query_rewrite_enabled='FALSE'
*.recyclebin='off'
*.remote_login_passwordfile='EXCLUSIVE'
*.remote_os_authent=TRUE#SAP note 1431798
*.replication_dependency_tracking=FALSE
*.resource_manager_plan=''
*.sessions=800
*.sga_max_size=34359738368
*.shared_pool_size=5242880000
*.star_transformation_enabled='TRUE'
*.undo_retention=432000
*.undo_tablespace='PSAPUNDO'

Supplementary Data

I have removed the content of this section on 20130228 as it related to my previous results and not the updated results now contained in this posting.


Filed under: Database

Collection of links, tips and tools for running SLOB


This post contains a collection of various tips and tools I have found useful for running SLOB. Where possible I’ve provided attribution for where I first learned of each tip, but some items are simply common sense or have already been posted by multiple people, so I can’t be sure that I have linked every possible source. If you have a write up out there that I’ve missed that you would like linked, please let me know in the comments. I have only focused on running SLOB to test physical read performance, redo generation performance, and mixed workload read/write performance so these tips will only cover those use cases, not other potential uses like testing logical I/O performance. I would be happy to include LIO testing tips if anyone shares some in a comment.

Getting Started – Things You Need

  1. First you’ll need to have an installation of the Oracle database licensed to run AWR reports.
  2. Next, go to the OakTable site and download the SLOB distribution. SLOB includes a simple database creation script in the misc/create_database_kit/ directory, along with a README describing how to use it, or you can use your existing database. I extract the kit into my oracle user’s home directory, but you can put it anywhere.
  3. You should also get the simple SLOB init.ora file for read IOPS testing and start your database with those parameters to get a quick start on testing physical read performance. If you’re using Oracle 11.2.0.3 you really should use the underscore parameters in this init.ora for accurate results.
  4. Read the SLOB readme if you haven’t already.

General SLOB Tips For Best Results

  1. Disable automatic generation of AWR snapshots. SLOB relies on the differences between an AWR snapshot taken at the start of the test run and another snapshot taken at the end of the test run, so if an automated AWR snapshot occurs in between it will throw off your results (see the snippet after this list).
  2. Disable backups, autotasks, resource manager, and so on. You want the database to be as idle as possible, other than SLOB. See Yury’s SLOB index page which includes these suggestions.
  3. Save the awr.txt file that SLOB generates after each run so you can compare performance with previous runs. Use the awr_info.sh script included with SLOB to summarize your collected AWR reports for easy reading.
  4. Review your results. Check for unexpected differences from run to run; don’t simply run it once and expect meaningful results. You want to see consistent performance to have confidence your tests accurately reflect your setup.
  5. A throwaway SLOB run after creating the users with setup.sh (or bouncing the DB to change init parameters) will help with repeatability.
  6. Start small, with 4 or 8 sessions, and then try again with a few more sessions to find the sweet spot with the best performance for your hardware. Once you hit a session count where performance starts to degrade, don’t bother running more sessions than that. On high-powered hardware you might be able to run 128 sessions, but a smaller server might work better with 32.
  7. If you aren’t trying to test variances in your storage, keep your storage consistent from test to test. To put this another way, if (like me) you are using SLOB to test the performance of various filesystems, don’t create all your filesystems on separate LUNs. You probably don’t know how each LUN might be striped across the storage backend. Use the same LUN for every test, running mkfs as necessary to change filesystems or fdisk to reformat for raw usage. (Kevin Closson suggested this in email, I can’t find a public posting to link to.)
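
For tips 1 and 2 above, this is roughly what I mean by quieting the database; a sketch using standard packages (run as a DBA user, and remember to re-enable everything once you are done benchmarking):

-- Disable automatic AWR snapshots (interval is in minutes; 0 turns them off).
exec dbms_workload_repository.modify_snapshot_settings(interval => 0)

-- Disable all automated maintenance tasks (stats gathering, advisors).
exec dbms_auto_task_admin.disable

-- Clear any active resource manager plan.
alter system set resource_manager_plan='' scope=both;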

Physical Read Testing Tips

  1. Disable Oracle IO optimization that turns your desired db file sequential reads into db file parallel reads. Kevin Closson has also recommended use of these parameters in the simple init.ora file for read testing. I consider this a requirement, not a recommendation. I already mentioned this above but it’s worth repeating.

Redo Generation Testing Tips

  1. When testing M writers with N redo logs configured, preface your test with (N*2)+1 log switches followed by a throwaway SLOB run using M readers (a scripted version of the switches appears below). See this tweet from Kevin Closson. This will produce more consistent results from run to run.
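
Here is one way to script that preface for a database with three redo logs, so (3*2)+1 = 7 switches; adjust the loop bound to match your redo log count:

-- Perform (N*2)+1 log switches before the throwaway run (N = 3 here).
begin
  for i in 1 .. 7 loop
    execute immediate 'alter system switch logfile';
  end loop;
end;
/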

Read/Write Testing Tips

  1. If your buffer cache is small enough (and it should be), running SLOB with zero readers and some writers will produce enough random reads to represent a mixed workload. See this comment from Kevin Closson. Including dedicated read sessions is not necessary.

SLOB Tools

  1. The SLOB distribution contains a script called awr_info.sh in the misc/ directory that will summarize the most important parts of your AWR report for easy reading.
  2. My SLOB.R script to interrogate SLOB awr_info.sh output in R.
  3. Yury Velikanov’s SLOB On Steroids, which I haven’t yet tried but is worth a look.

Advanced Tips

  1. SLOB is simple, yet powerful, and people are always finding new uses for it. So read what others write about SLOB, like Yury’s SLOB index linked above, my other SLOB posts, and posts about SLOB from Karl Arao, flashdba, Martin Bach and of course Kevin Closson himself.
  2. Make mistakes and learn from them.

Further Reading

  1. Added 20130220: Check out the new SLOB mind map!  Thanks to Martin Berger for the great idea.
  2. Added 20130226: Lies, Damned Lies, and I/O Statistics by Matt Hayward.  While not directly related to SLOB, this is worthwhile reading for anyone involved in any sort of I/O benchmarking.
  3. Added 20130320: Using SLOB for PIO Testing by FlashDBA, including a configuration-file driven SLOB harness that you will probably like more than the simple ones I put out there.


Filed under: Database

SLOB.R v0.6: An R script for analyzing SLOB results from repeated runs


This post covers usage of my SLOB.R script, used to analyze SLOB results from repeated test runs.  The contents of the SLOB.R script are at the bottom of this post, but first I will show the contents of a sample SLOB.R session, followed by how you can produce similar results.

A Sample SLOB.R Session

The first step is to start R.  On Windows you’ll double-click an icon, on Linux you’ll just type R and hit enter.  Once R has started, you will be in the interactive R interface and it will display some introductory text standard in R.

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-w64-mingw32/i386 (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

Once you are in R, you need to load the SLOB.R script and then load your SLOB data.  SLOB data is produced by running SLOB repeatedly, saving the awr.txt output in between each run, then running the SLOB/misc/awr_info.sh script to summarize the collected AWR reports. I have saved the SLOB.R script in my Cygwin home directory, and saved the awr_info.sh output in a file called “awrinfo” in the same directory.

> source('C:/cygwin/home/brian/SLOB.R') 
> SLOB <- loadSLOB('C:/cygwin/home/brian/awrinfo')

Now you have the contents of your awr_info.sh output in an R variable called SLOB.  You can call this variable anything you wish, but I used SLOB.

To use SLOB.R you need to tell it a little bit about your AWR data: specifically which variables you are trying to test and how many sessions you used. In this example I am comparing performance of the XFS and EXT3 filesystems using the SLOB physical read, redo, and physical write models.  The variables to be tested (EXT3 and XFS) are embedded in the filenames I used when saving the awr.txt report between SLOB runs, and the session counts are the numbers I used when running SLOB’s runit.sh script to put load on the database.  We tell SLOB.R about this by setting a few R variables containing this data.  You can call them anything you wish; you just need to know their names.

> SLOB.sessions <- c(8, 16, 32) 
> SLOB.read.vars <- c('XFS', 'EXT3') 
> SLOB.redo.vars <- c('XFS', 'EXT3') 
> SLOB.readwrite.vars <- c('XFS', 'EXT3')

As you can see, variable assignment in R uses the <- operator.  I’m using the R built-in c() function (concatenate) to create vector variables that contain multiple values (you can think of them as arrays for now).  The SLOB.sessions variable contains three integer values: 8, 16, and 32; the other three variables each contain two string values: ‘XFS’ and ‘EXT3’.  For this demo I am only including two variables but it works fine with many more than that.  I have been using about 7 of them.  I am comparing filesystems, but you might be comparing storage vendor 1 vs storage vendor 2, or fibre channel vs dNFS, or Oracle 11.2.0.1 vs Oracle 11.2.0.3.  As long as the variables are identifiable from your AWR filenames, they will work.

You can view the contents of any R variable just by typing its name.

> SLOB.sessions
[1]  8 16 32
> SLOB.read.vars
[1] "XFS"  "EXT3"

With these variables set up you can now use the main SLOB.R driver functions to summarize the contents of your SLOB AWR reports.  First I’ll call the SLOBreads() function to summarize physical read performance.  This function summarizes the PREADS column from awr_info.sh output, by variable and session count.  To produce a better average it discards the lowest value and highest value from each combination of variable and session count.  Other SLOB.R driver functions are SLOBredo() and SLOBreadwrite().

> SLOBreads(SLOB, SLOB.read.vars, SLOB.sessions)
            8       16       32  Overall
XFS  27223.89 46248.05 61667.21 44886.22
EXT3 30076.77 49302.59 59113.00 46094.39

So this indicates that for 8 reader sessions, the XFS filesystem gave me an average of 27,223.89 physical reads per second, while EXT3 gave me 30,076.77 physical reads per second.  The columns for 16 and 32 sessions have the same meaning as the 8 session column.  The ‘Overall’ column is an average of ALL the data points, regardless of session count.

You don’t have to use variables when calling the SLOB.R driver functions.  You can specify the variables or session counts directly in the call; this is just how R works.  Here’s another example, showing how you get the same output whether you call the function with variables or with literal values:

> SLOBredo(SLOB, SLOB.redo.vars, SLOB.sessions)
             8        16        32   Overall 
XFS  326480426 336665385 321823426 326425272 
EXT3 304188997 325307026 326618609 317991362 
> SLOBredo(SLOB, c('XFS', 'EXT3'), c(8, 16, 32)) 
             8        16        32   Overall 
XFS  326480426 336665385 321823426 326425272 
EXT3 304188997 325307026 326618609 317991362

The numbers above would indicate that when storing my redo logs on XFS, SLOB could push 326,480,426 bytes of redo per second with 8 sessions.  On EXT3 with 8 sessions I saw 304,188,997 bytes of redo per second.  The 16, 32 and Overall columns have meanings similar to what I showed before.

The SLOBreadwrite() function reports the sum of physical reads and physical writes, with the columns all meaning the same as they do for the other functions.

> SLOBreadwrite(SLOB, SLOB.readwrite.vars, SLOB.sessions)
            8       16       32  Overall 
XFS  18520.44 20535.41 20728.07 19823.37 
EXT3 19568.04 21730.94 22641.14 21203.78

How To Create Output For SLOB.R

SLOB.R is smart enough to figure out which of your runs are testing reads, which are testing redo, and which are testing readwrite performance.  But for this to work you have to follow the naming convention defined in the SLOB/misc/README file when renaming your awr.txt files for processing by awr_info.sh:  [whatever].[number of SLOB writers].[number of SLOB readers] — SLOB.R expects your variables to be uniquely identifiable strings in the ‘whatever’ part.

I recommend using scripts to run SLOB repeatedly and save the awr.txt output in between.  I provided some scripts in a prior post, but you can use your own scripts as long as your filenames match the required format.

Once you have your AWR output collected, run SLOB/misc/awr_info.sh on all the files, and save its output.  This is the file you will load into R.

SLOB.R Script (v0.6)

Save this as SLOB.R.  You may find it easier to use the pastebin version.

# SLOB.R version 0.6
#   BJP 2013 - Twitter @BrianPardy - http://pardydba.wordpress.com/
# See http://wp.me/p2Jp2m-4i for more information
#
#
# Assumes three possible SLOB test models: READS, REDO, WRITES
#   READS are readers-only
#   REDO and WRITES are writers only, differing in size of buffer_cache (small = REDO, large = WRITES)
#
#   Reports PREADS in SLOB.R READS model
#   Reports REDO in SLOB.R REDO model
#   Reports PWRITES in SLOB.R WRITES model
#   Use SLOB.R meta-model READWRITE to report PREADS+PWRITES in WRITES model
#
# Setup:
#   Run SLOB as usual, with at least three runs for each variable tested for each model
#   Save awr.txt between runs in filename matching [something]-variable.writers.readers
#     Example:  AWR-FCSAN-run1.8.0, AWR-FCSAN-run2.8.0 (...) AWR-FCSAN-run10.8.0
#               AWR-ISCSINAS-run1.8.0, AWR-ISCSINAS-run2.8.0 (...) AWR-ISCSINAS-run10.8.0
#       (In this case, the variables would be "FCSAN" and "ISCSINAS", comparing fibrechannel SAN to NAS)
#   Run awr_info.sh from SLOB distribution against all AWR files at the end and save the output
#   Load awr_info.sh output into R with:  SLOB <- loadSLOB("filename")
#
# Hints:
#   Best results achieved with more SLOB runs - myavg() drops high and low values per set, averages remaining
# 
# Detailed example usage:
#   Testing SLOB read, redo and readwrite models 10 times each with 8, 16, and 32 sessions on EXT3 vs EXT4
#   Used a tablespace on EXT3 for EXT3 testing and a tablespace on EXT4 for EXT4 testing
#   Used redo logs on EXT3 for EXT3 REDO testing and redo logs on EXT4 for EXT4 REDO testing
#   Ran SLOB/misc/awr_info.sh on all awr.txt reports generated from these 60 SLOB runs
#   Saved awr_info.sh output as filename "awrinfo"
#
#  (Start R)
#  > source("SLOB.R")
#  > SLOB <- loadSLOB("awrinfo")
#  > SLOB.sesscounts <- c(8, 16, 32)           # Specify the number of sessions used in tests
#  > SLOB.read.vars <- c('EXT3', 'EXT4')       # Specify the variables for READ testing: EXT3 vs EXT4
#  > SLOB.redo.vars <- SLOB.read.vars          # Same variables for REDO testing as for READ testing
#  > SLOB.readwrite.vars <- SLOB.read.vars     # Same variables for READWRITE testing as for READ testing
#  > SLOB.reads <- SLOBreads(SLOB, SLOB.read.vars, SLOB.sesscounts)
#  > SLOB.redo <- SLOBredo(SLOB, SLOB.redo.vars, SLOB.sesscounts)
#  > SLOB.readwrite <- SLOBreadwrite(SLOB, SLOB.readwrite.vars, SLOB.sesscounts)
### Previous three lines populate SLOB.reads, SLOB.redo and SLOB.readwrite variables with AWR results
### You can then interrogate those variables by typing their names
# > SLOB.reads
#             8       16       32  Overall
# XFS  27223.89 46248.05 61667.21 44886.22
# EXT3 30076.77 49302.59 59113.00 46094.39
#
#
# Usage variants for more detailed investigation.  Consider this advanced usage.
#   Most people should just use SLOBreads(), SLOBredo() and SLOBreadwrite()
#
#
#   Get average REDO bytes for variable 'foo' across all sessions: 
#     avg(SLOB, 'REDO', 'foo')
#     redoavg(SLOB, 'foo')
#
#   Get average REDO bytes for variable 'foo' with 8 sessions:
#     avg(SLOB, 'REDO', 'foo', 8)
#     redoavg(SLOB, 'foo', 8)
#
#   Get average PREADS (physical read) for variable 'foo' across all sessions:
#     avg(SLOB, 'READS', 'foo')
#     readavg(SLOB, 'foo')
#
#   Get average PWRITES (physical writes) for variable 'foo' with 16 sessions:
#     avg(SLOB, 'WRITES', 'foo', 16)
#     writeavg(SLOB, 'foo', 16)
#
#   Get sum of PREADS and PWRITES for variable 'foo' with 32 sessions:
#     avg(SLOB, 'READWRITE', 'foo', 32)
#     readwriteavg(SLOB, 'foo', 32)
#
#   Get average REDO bytes for multiple variables ('foo' and 'bar') across all sessions:
#     sapply(c('foo', 'bar'), redoavg, dat=SLOB)
#       or for 16 sessions:
#     sapply(c('foo', 'bar'), redoavg, dat=SLOB, sessioncount=16)
#     alternate:   sapply(c('foo', 'bar'), avg, dat=SLOB, sessioncount=16, model='READS')
#     (Note: This returns separate results for each variable, it does not combine and average them)
#
#   Get sum of PREADS and PWRITES for multiple variables ('XFS' and 'EXT3') across 16 sessions:
#     sapply(c('XFS', 'EXT3'), avg, dat=SLOB, sessioncount=16, model='READWRITE')
#
#   View the top waits in XFS READ model with 16 sessions:
#     waits(SLOB, str='XFS', model='READS', sessioncount=16)
#
#   View all data for a particular model with a specific session count:
#     SLOB[intersect(REDO(SLOB), sessions(16, SLOB)),]
#
#

getruns <- function(dat, model, str, sessioncount) {
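  # Return the row indices in the awr_info.sh data frame whose FILE column
  # matches str and whose run fits the given model (READS/REDO/WRITES/READWRITE),
  # optionally restricted to a specific session count.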
  tmp <- intersect(grep(str, dat$FILE), eval(parse(text=paste(model, '(dat)', sep=''))))           
  if(missing(sessioncount)) { return(tmp)} 
  else { intersect(tmp, sessions(sessioncount, dat))}
}

loadSLOB <- function(filename) {
  read.table(file=filename, sep="|", header=TRUE)
}

# heuristically identify REDO model runs - if this sucks for you, comment it out
# and uncomment the alternate REDO function below.  It expects your filenames to include
# the string '-REDO-' when testing REDO performance.
REDO <- function(dat) {
  setdiff(which(dat$REDO/(dat$PWRITES*8192) > 2), READS(dat))
}

#REDO <- function(dat) {
#  grep('-REDO-', dat$FILE)
#}

WRITES <- function(dat) {
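  # Writer runs that were not classified as REDO-model runs; used for the
  # physical write and read/write models.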
  setdiff(which(dat$WRITERS > 0), REDO(dat))
}

READS <- function(dat) {
  which(dat$READERS > 0)
}

READWRITE <- function(dat) {
  WRITES(dat)
}

sessions <- function(n, dat) {
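  # Row indices where either the writer count or the reader count equals n.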
  union(which(dat$WRITERS == n), which(dat$READERS == n))
}

myavg <- function(z) {
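  # Trimmed mean: drop the lowest and highest observed values, average the rest.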
  mean(z[!z %in% c(min(z), max(z))])
}

getavg <- function(mode, dat) {
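  # Apply myavg to the named column of dat (e.g. 'REDO', 'PREADS', 'PWRITES').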
   myavg(eval(parse(text=paste('dat$', mode, sep=''))))
}

redoavg <- function(dat, ...) {
  getavg('REDO', dat[getruns(dat, 'REDO', ...),])
}

readavg <- function(dat, ...) {
  getavg('PREADS', dat[getruns(dat, 'READS', ...),])
}

writeavg <- function(dat, ...) {
  getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),])
}

readwriteavg <- function(dat, ...) {
  getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),]) + getavg('PREADS', dat[getruns(dat, 'WRITES', ...),])
}

avg <- function(dat, model, ...) {
  if(model=='REDO') {
    getavg('REDO', dat[getruns(dat, 'REDO', ...),])
  } else if(model=='READS') {
    getavg('PREADS', dat[getruns(dat, 'READS', ...),])
  } else if (model=='WRITES') {
    getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),])
  } else if (model=='READWRITE') {
    getavg('PWRITES', dat[getruns(dat, 'WRITES', ...),]) + getavg('PREADS', dat[getruns(dat, 'WRITES', ...),])
  }
}

waits <- function(dat, ...) {
  as.character(dat[getruns(dat, ...), 'TOP.WAIT'])
}

testdata <- function(dat, model, str, ...) {
  if(model=='REDO') {
    sapply(str, avg, dat=dat, model='REDO', ...)
  } else if(model=='READS') {
    sapply(str, avg, dat=dat, model='READS', ...)
  } else if(model=='WRITES') {
    sapply(str, avg, dat=dat, model='WRITES', ...)
  } else if (model=='READWRITE') {
    sapply(str, avg, dat=dat, model='READWRITE', ...)
  }
}

readdata <- function(dat, str, ...) {
  sapply(str, avg, dat=dat, model='READS', ...)
}

redodata <- function(dat, str, ...) {
  sapply(str, avg, dat=dat, model='REDO', ...)
}

readwritedata <- function(dat, str, ...) {
  sapply(str, avg, dat=dat, model='READWRITE', ...)
}

SLOBreads <- function(dat, strs, sessioncounts) {
  z <- data.frame()
  for(i in sessioncounts) { 
    z <- rbind(z, readdata(dat, strs, i))
  }
  z <- rbind(z, readdata(dat, strs))
  z <- t(z)
  colnames(z) <- c(sessioncounts, 'Overall')
  rownames(z) <- strs
  return(z)
}

SLOBredo <- function(dat, strs, sessioncounts) {
  z <- data.frame()
  for(i in sessioncounts) { 
    z <- rbind(z, redodata(dat, strs, i))
  }
  z <- rbind(z, redodata(dat, strs))
  z <- t(z)
  colnames(z) <- c(sessioncounts, 'Overall')
  rownames(z) <- strs
  return(z)
}

SLOBreadwrite <- function(dat, strs, sessioncounts) {
  z <- data.frame()
  for(i in sessioncounts) { 
    z <- rbind(z, readwritedata(dat, strs, i))
  }
  z <- rbind(z, readwritedata(dat, strs))
  z <- t(z)
  colnames(z) <- c(sessioncounts, 'Overall')
  rownames(z) <- strs
  return(z)
}

That’s it for now.  I welcome comments and questions.


Filed under: Database

Why your EM12cR2 FMW stack probably needs patch 13490778 to avoid OHS down/up events

MOS note 1496775.1 describes a situation with EM12cR2 where OEM will falsely report the Oracle HTTP Server instance (ohs1) as down, even though it is up.  This is due to some changes in FMW 11.1.1.6.  If you don’t have any incident rules or notifications set up that would catch this event, it’s easy to miss it and not know that it is happening.  I had run into this note a couple times before but ignored it, since I had never seen any open events complaining about OHS being down so I figured I just wasn’t hitting the bug.

This morning I caught one of the events.  I found myself wondering how often this had been happening — was it an issue once every couple days, every few hours, or what?

SQL> col msg format a45
SQL> select msg, count(*) from sysman.mgmt$events
  2  where closed_date >= sysdate - 1 and msg like '%HTTP Server instance%'
  3  group by msg;

MSG                                             COUNT(*)
--------------------------------------------- ----------
CLEARED - The HTTP Server instance is up             430
The HTTP Server instance is down                     430

Turns out it had been happening a LOT.  If you’ve followed Oracle’s recommendations and set up target lifecycle status priorities (see my post on doing so) you’ve probably set your OEM targets up with “MissionCritical” priority.  That means your OMS has been burning a lot of CPU to process all these up/down events on a mission critical target with high priority, potentially delaying processing of other events elsewhere in your environment.

Applying patch 13490778, with ORACLE_HOME set to $MW_HOME/oracle_common, should resolve this issue.  For best results, stop all OEM components prior to patch application and restart them when complete.

To convince yourself that applying the patch helped, re-run that query about 15 minutes after applying the patch and you should see the count decrease.
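
For example, a variation of the same query limited to the last 15 minutes (just a sketch based on the query above; adjust the time window to suit) should come back empty, or nearly so, once the patch has taken effect:

SQL> select msg, count(*) from sysman.mgmt$events
  2  where closed_date >= sysdate - 15/1440 and msg like '%HTTP Server instance%'
  3  group by msg;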


Filed under: Cloud Control

First Thoughts: BlueMedora’s Oracle Enterprise Manager Plugin for VMware

This post reviews the Oracle Enterprise Manager Plugin for VMware from BlueMedora.  This commercial (for-fee) plugin integrates into OEM 12c to provide visibility into your VMware environment for Cloud Control users.  I have only had the plugin installed for two days so this will serve as more of a “first thoughts” report than as a full review of the product and all of its capabilities.

Prerequisites

The plugin, license and support are available directly from BlueMedora.  They are also currently offering a fully-functional 30-day free evaluation copy.  You will need to meet the following prerequisites in your environment:

  1. An active, functional installation of Enterprise Manager 12c (I used 12.1.0.2.0 + PSU2, but any EM12c release should work)
  2. A functional emcli installation.  I run emcli out of the $OMS_HOME on my OMS server.
  3. An OEM user account with appropriate permissions to import and deploy plugins (I simply used SYSMAN)
  4. An Oracle Management Agent installed on some server other than your OMS server to host the plugin (per BlueMedora’s recommendation, which I followed)
  5. A login to your VMware environment with read-only access to vCenter, your cluster, datastores, ESX hosts, and virtual machines.  If you use an account with permission to start/stop VMs that functionality is available for use in EM12c via the plugin, but I am using a read-only account and have not tested that part of the product
  6. At install time you will also need the hostname of your vCenter server and the SDK port if it has been changed from the default 443.

Installation and Configuration

After downloading the plugin from BlueMedora, installation is as simple as any other plugin you may already be using.  Their support team will walk you through the install and configuration (over a webinar in my case), but the steps are just what you would expect: copy the plugin .opar file to a location accessible by your OMS and import it using the emcli import_plugin verb, then after the import completes, deploy the plugin to your OMS and finally deploy the plugin to a management agent.

Configuring the plugin (adding targets) is also simple.  For the first step you will run an  “Add Target Manually” step with the “VMware vSphere” target type to let the plugin know about your vSphere environment.  Here is where you will provide your VMware login credentials, hostname and port (if non-default) in order to begin monitoring.  After adding the vSphere target, access it through the All Targets view, and go to the “All Metrics” link to validate that your VMware login credentials worked and metrics are being collected.  As an aside, a particularly interesting feature here that I have not seen in other plugins is that there is a metric group called “Collection Trigger” that, when clicked, triggers a collection.  This is handy and I would like to see this implemented elsewhere; I find it much easier than going over to an agent and running emctl to force a metric collection.

Once that is done you will see a new auto-discovery module called “vSphere Discovery Module”, and after configuring that discovery module with the name of your vSphere target you will run auto-discovery on the management agent to which you have deployed the plugin.  Auto-discovery identified our VMware cluster, ESX hosts (hypervisors), datastores and all of our virtual machines.  From the auto-discovery results you then promote any targets you wish to monitor through Enterprise Manager.  You may not want to promote all of your VMs (for management reasons, or to comply with your license terms) but you should promote all clusters, hypervisors and datastores, for the best overall view of your environment.

As with the other plugins I’ve used, after promoting targets you’ll need to wait a little while before metrics are collected and screens populate with data.  In my case I had useful data appearing on screen within about 10 minutes after promoting the first VM.

What’s In It For Me?

For a good high-level overview of what the plugin can do, take a look at the white paper from BlueMedora, the product overview PDF and the product datasheet.  I’ll only cover some of the additional items that these sources do not go into in detail.

Visuals

The plugin provides several nice visualizations.  On top of the “CPU and memory usage over time” graphs you would expect to find, I particularly like the bar graphs that group your hypervisor CPU and memory load into quantiles, making it easy to see, for example, that one host has 75-100% CPU usage while the other hosts are all in the 0-25% bucket, indicating that you may want to allocate your VM load a bit more evenly.  The datastore visualization showing the fill percentage on each datastore is also nice. Samples of these can be found in the product overview PDF.

Another useful visual is the integrated view of an Oracle database along with the VM on which it runs, the datastore(s) assigned to that VM, and the hypervisor on which the VM is running.  You can quickly see if, for example, the hypervisor is under memory pressure even if the VM does not appear to be, along with the executions and IOPS per second and average active sessions metric for the database.

Centralized Data

I find it hard to click through the vSphere client to find information I’m looking for.  I’m always in the wrong inventory view, or seeing only a subset of the data since I have a host selected rather than the entire cluster, and so on.  This plugin provides an easy-to-use, centralized view of information across the VMware environment.  By going to the “Virtual Machines” view from the main vSphere target I can see a data grid showing each VM’s power status, provisioned disk, consumed CPU and memory, guest memory usage percentage, and more.  Even better, the grid includes a column indicating whether or not VMware tools are installed and running.  There’s also an uptime column but I’m not sure how to parse it.  I think it represents the VM’s uptime on the specific hypervisor currently running it, but I’ll be asking support to clarify that for me.

New Job Types

These will only be useful to you if you want to automate your VMware environment from within OEM, and if you grant permissions beyond read-only to your monitoring user.  I do not expect to make use of this feature. But if you choose to do so, the plugin adds several new job types you can use for OEM jobs:

  • vSphere Hypervisor Enter Maintenance
  • vSphere Hypervisor Exit Maintenance
  • vSphere VM Power Off
  • vSphere VM Power On
  • vSphere VM Reset
  • vSphere VM Restart Guest
  • vSphere VM Shutdown Guest
  • vSphere VM Suspend

Metrics/Alerts

What use is an OEM plugin without metrics and alerting?  Very little.  This plugin provides a ton of metrics for your VMware environment.  The list is too long to include here, but see this on pastebin for a quick view produced from the MGMT_METRICS table.  You can also set warning and critical thresholds for many (although not all) metrics, and those alerts will go through the normal EM12c event framework to create incidents and/or notifications if configured to do so through incident rules. You can also view the same metric-over-time graphs as you can with the out-of-the-box EM metrics.
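
If you would rather produce a list like that from your own repository than rely on my pastebin link, a query along these lines against SYSMAN.MGMT_METRICS should get you close.  Treat the target_type filter as an assumption on my part; check the target type names the plugin actually registers in your All Targets view and adjust accordingly:

SQL> select distinct target_type, metric_label, column_label
  2  from sysman.mgmt_metrics
  3  where lower(target_type) like '%vsphere%'
  4  order by 1, 2, 3;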

Other Items

The overview states that the plugin includes some integration with BI Publisher for reports.  I do not have BI Publisher installed with my EM12c environment so I can’t speak to this feature.

Disclaimer

I am not employed by BlueMedora, VMware or Oracle and neither I nor my employer received any consideration or compensation from those vendors.


Filed under: Cloud Control

Using EM12c to set up a Data Guard physical standby database

This post will cover using EM12cR2 to create a Data Guard configuration including one primary database and one physical standby server, making use of the Data Guard broker.  The application software we use does not support logical standby databases so I have not attempted to do so and will not document that here.

Prerequisites

As this post focuses specifically on creating a new Data Guard configuration, I will assume you have an existing functional EM12c environment in place along with two servers to use for Data Guard and an existing database which you wish to protect running on one of those servers.

Both servers should exist as promoted targets within EM12c.  The existing database (along with its listener and ORACLE_HOME) should exist as promoted targets within EM12c. The standby server should have a software-only installation of the same version and patch level of the Oracle Database (Enterprise Edition only) as exists on the primary server. For simplicity I suggest using the same filesystem paths on both servers, although Data Guard does allow rewrite rules if necessary.

Note that Data Guard is a feature included with an Enterprise Edition license but running a physical standby will require a license for the database software on the standby server.  Contact your sales representative for full details.  For the purposes of this post I will assume you are using a copy of the database downloaded from OTN for prototyping purposes.

Configure the database as you wish.  One point I recommend is to make sure that your redo logs are appropriately sized and that you have enough of them, as adding or resizing redo logs after Data Guard is operational requires some special care.
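
As a quick sanity check before you begin, you can confirm from SQL*Plus that the primary is in archivelog mode and review your current redo log configuration.  This is only a sketch of the kind of check I mean; run it as SYSDBA on the primary:

SQL> select log_mode, force_logging from v$database;

SQL> select group#, thread#, bytes/1024/1024 as size_mb, members, status from v$log;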

Adding A Physical Standby

Now that your environment is set up with a working EM12c installation and one active database that you wish to protect with a Data Guard physical standby, you can proceed. Start by going to the database home page and select ‘Add Standby Database’ from the drop-down menu under ‘Availability’.

Data Guard - Step 1

On the next page, select ‘Create a new physical standby database’ and click continue.

Data Guard - Step 2

On the next page you select a method to instantiate the physical standby database.  Select ‘Online Backup’ and ‘Use Recovery Manager (RMAN) to copy database files’, then click Next.

Data Guard - Step 3

On the next page you specify the degree of parallelism for the RMAN backup, provide operating system credentials for the user owning the Oracle installation on the primary server (oradgd, in my case) and define the locations of the standby redo logs.  The degree of parallelism is up to you and depends on how many CPUs you have and how quickly you need the backup to run.  I specify new named credentials here and save them as preferred database host credentials.  I recommend clicking the ‘Test’ button to validate the supplied credentials.  I do not use Oracle-Managed Files so I have unchecked the box to use them for standby redo log files, which allows me to specify a location for the standby redo logs if I do not like the default.  I left these locations at their default.  After making your entries on this page, click Next.

Data Guard - Step 4

On the next page you will specify configuration details for the standby database.  All entries on this page relate to the STANDBY server, not the primary.  Enter the hostname of the standby server and the path to the Oracle home installation you will use for the standby database.  Enter credentials for the Oracle home’s software owner, and again I recommend saving them as preferred credentials and clicking the Test button.  The instance name you provide must not already exist on the standby server.  I used the same instance name on the standby as on the primary.  Click Next after entering all required information.

Data Guard - Step 5

The next page allows you to select the file locations for the standby database and define the listener parameters.  I want to keep things simple with my standby using the same file paths as the primary so I select the radio button labeled “Keep file names and locations the same as the primary database“.  If you wish, you can click the ‘Customize’ button and specify alternate file locations for data files, control files, temp files, directories and external files, but keeping everything identical between the two servers will simplify things greatly. I will also use the default listener name and port.  Click Next once you have made your selections here.

Data Guard - Step 6

On this page you specify some final parameters for the standby database such as the DB_UNIQUE_NAME, the name EM12c will use for the standby database target, the location for archived redo log files received from the primary, the size of your FRA and the deletion policy for archived redo log files.  For the best monitoring experience, check the ‘Use SYSDBA monitoring credentials’ box.  I also suggest you leave the option checked to use the Data Guard Broker. Click Next once you have made your selections here.

Data Guard - Step 7

The final page you see here will show a review of the settings you have selected through the process.  You can see here that I am setting up a standby on Oracle Enterprise Linux while my primary runs on SUSE; this is not a problem for Data Guard. Double check everything to make sure it is all correct, and once you are satisfied, click the Finish button.

Data Guard - Step 8

As soon as you click Finish on the previous screen, EM12c will start setting up your standby database.  You should quickly see a screen like the one below showing that a job has been submitted to create the Data Guard configuration.

Data Guard - Step 9

If you click on the ‘View Job’ text, you will see the execution log from the job run.

Data Guard - Step 10

To monitor the job as it proceeds, you can click on the ‘Expand All’ text and then set an auto-refresh interval from the drop-down at the top right. Depending on the size of your database and your server performance, and assuming everything went well, you should soon see that the job has completed successfully.

Data Guard - Step 11

Validating Data Guard Configuration

Once you see the setup job has succeeded, your Data Guard physical standby is now up and running, actively processing redo from your source database.  You can verify this by returning to the primary database’s home page and clicking the ‘Availability’ menu, which now has additional options such as ‘Data Guard Administration’, ‘Data Guard Performance’ and ‘Verify Data Guard Configuration’.  Click on ‘Data Guard Administration’.

Data Guard - Step 12

The Data Guard administration page shows a summary of your setup.  You can see the host running the primary database, the status of your standby(s) and various metrics like the current and last-applied archived log numbers.  The various links on this page can then be used to change the protection mode, enable/disable fast start failover and so on.  You can also use the ‘Failover’ and ‘Switchover’ buttons to initiate a role transition.  Read the documentation so that you understand the difference and know which to use in which situations.

Data Guard - Step 13
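
If you want to confirm the roles outside of the GUI, a quick query run against both the primary and the standby will show the same information (a simple sketch, nothing EM12c-specific):

SQL> select db_unique_name, database_role, protection_mode, open_mode from v$database;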

To help convince yourself that all is working properly, set the auto-refresh interval to 30 seconds and leave this page up.  Open a sqlplus session on your primary database as sysdba and run “alter system switch logfile”.  You should see the log numbers increment once the refresh interval has passed, as shown below.

Data Guard - Step 14
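
You can make the same check from the command line if you prefer.  As a rough sketch, switch a log on the primary:

SQL> alter system switch logfile;

and then check the lag statistics reported on the standby:

SQL> select name, value from v$dataguard_stats
  2  where name in ('transport lag', 'apply lag');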

As a final test, attempt a switchover operation.  This will leave your current primary database running as a standby, while your current standby database takes over the primary role.  Click on the ‘Switchover’ button. Here you are prompted for credentials on the standby database, which is why I suggested saving them as preferred credentials during the setup process.  If you did not do so then, provide appropriate credentials now, then click Continue.

Data Guard - Step 15

Next you’ll be prompted for credentials for the primary server.  Provide those credentials, if necessary, and then click Continue.

Data Guard - Step 16

Next you will have one final screen to click through to start the switchover process.  There is a checkbox to choose whether or not you want to swap monitoring settings between the primary and standby databases.  I check the box as this is a good thing, but as the text says you have the option to NOT swap the monitoring settings and instead use your own detailed monitoring templates for each system and apply them after the switchover.  I prefer to keep it simple. Once you are ready to go, click Yes, but be aware this will disconnect any sessions active in your primary database.

Data Guard - Step 17

You will see a progress screen as the switchover occurs.

Data Guard - Step 18

Once the switchover completes EM12c will return you to the Data Guard Administration page, where you should see that your primary and standby servers have switched roles.

Data Guard - Step 19

Conclusion

If you have been following along, you now have a functional Data Guard system with a physical standby and have successfully completed one switchover operation.  You can repeat this process to add another physical standby database on a third server if you wish.  As you look around you’ll also notice a few other changes, such as the additional targets that EM12c added for the standby database, or that the Databases list view has some extra text added that indicates which instance is running as primary and which is running as a standby.  Now it’s time to research your needs for Data Guard and get all the remaining bits configured to best support your users. Good luck!


Filed under: Cloud Control, Database

Stale EM12c patch recommendations? Get patch 14822626

I don’t use the automated patching functionality provided by EM12c.  I do, however, get value out of the patch recommendations since they serve as a good reminder when I’ve missed a patch that should be applied to one of my targets.  For this reason I was disappointed when, after upgrading from EM12cR1 to EM12cR2, the patch recommendations it gave me became stale and stopped getting updated when I loaded the em_catalog.zip file.

If you DO use the automated patching functionality, you have probably already followed all of the advice and installed the required patches documented in MOS note 427577.1, “Enterprise Manager patches required for setting up Provisioning, Patching and Cloning (Deployment Procedures)”.  In that case you already have this patch installed and don’t need to read any further, but if not, read on.

After upgrading to EM12cR2, I also upgraded several databases from 10gR2 to 11gR2.  Months passed, and yet the patch recommendations EM12c gave me continued to refer to 10gR2 patches which I knew weren’t applicable as I was running 11gR2.  I tried several things, like setting EM12c to offline mode, to online mode, loading em_catalog.zip, re-running the various “Refresh From My Oracle Support” jobs, all without ever receiving fresh patch recommendations.

So to sort this out, I did what I usually do, and asked about it on Twitter.  Big thanks to Sudip Datta, Vice President of Product Management at Oracle, who pointed me to bug 14822626 and its associated patch.  The bug does not appear to be public, but MOS note 1522918.1, “12C – Patch Recommendations Not Updating After Upgrade To 12.1.0.2 Cloud Control – ‘…Patch Recommendations Computation is disabled … skipping …’” documents the problem as a known issue after upgrading from 12.1.0.1 to 12.1.0.2 and clearly matches the behavior I saw.

As soon as I applied patch 14822626, the old stale patch recommendations were cleared out, and once I loaded the current em_catalog.zip file, I had accurate patch recommendations for my environment that I can now use to make decisions going forward.

Thank you, Sudip!


Filed under: Cloud Control

How to get started with genetic genealogy

This is a departure from what I usually write about, but technically it’s also about databases: GEDCOMs and genetic ones. This post will cover a general strategy to get started doing your own genetic genealogy work. I appreciate any comments you may have. If anyone is interested, I may write future posts on suggested tools and other tips.

Briefly, genetic genealogy is the act of supplementing traditional paper genealogy with genetic information. By doing so you can extend your family tree further, find distant (sometimes extremely distant) relatives and help confirm the details found in your genealogy research. If you were adopted or have known NPEs (non-paternity events) in your line back a few generations, this may be the only way to track down your real ancestors.

Overview

  1. Do as much genealogy as you can on paper
  2. Get yourself, and possibly other close relatives tested, by one of the well known companies whose tests enable this work
  3. Make contact with your matches as identified by those companies
  4. Compare family trees with your matches
  5. Share your genetic ancestry data in other places to broaden the scope of potential matches
  6. Extend your tree with the results of research done by your matches on your shared lines
  7. Make more contacts and use your previously confirmed ancestors to triangulate on your unknown matches

Step 1: Do Genealogy

So many others have written so much about getting started with and getting better at genealogy that I’m not going to cover this step in very much detail here. Do a few web searches, read what others have to say, and check for “how to” articles on any commercial genealogy sites you join.

The best way, in my opinion, to get started with genealogy is to stand on the shoulders of giants. Someone in your family, maybe a grandparent or second cousin probably already does genealogy research and would be happy to share their data. But in case you can’t find someone like that or just want to get started on your own, here’s a little advice.

Make an account on ancestry.com. They simply have one of the best, easiest to use archives of vital records, wills, immigrant entries, military records, newspaper articles and so on. You can start with a 14 day free full access subscription and try to nail down as much as possible, then choose to subscribe or not depending on how much progress you’re making.

The mid term goal of this genealogy work is to produce a GEDCOM file, which is a database of people, their relationships, and source citations back to primary documents that confirm the relationship claims made in the file. You will then upload this file to various sites to share your research and help others find their match to you. You can optionally privatize the file so that people born after 1900 have their names hidden to avoid revealing information about other people that may not share your enthusiasm for finding your roots.

While you work on your genealogy, proceed with DNA testing, the next step, because it takes a while and you’ll be spending a while waiting for your results.

Step 2: Get tested

You have several choices for testing. The big three companies are 23andMe, FamilyTreeDNA and AncestryDNA, but several other options exist for specialized use. I highly recommend 23andMe, for reasons I’ll explain below, but I’ll give some information about each. All three are based in the USA so the longer your family has been in the US, the more matches you will find (see digression below).

23andMe

Simply your best choice. For the same price, $99, you will receive genetic information about your health at the same time you receive information useful for genetic genealogy. 23andMe has busy community forums covering health, ancestry and genealogy, but the best part for our purposes is that they test more markers than the other options (since the other companies specifically do not test anything implicated in human health) and you can download your raw genetic data and have it processed by FamilyTreeDNA for a lower fee than having FTDNA test you directly.

The 23andMe test is a saliva test. They will send you a kit including a tube, into which you spit about a teaspoon of saliva, close the top, snap the paraffin seal to release the stabilization/lysing buffer solution and then send it back in a prepaid package. Totally painless unless you have trouble producing saliva or you are trying to test an infant.

FamilyTreeDNA

The strong point of FTDNA is 23andMe’s weak point. You only sign up for FTDNA if you are interested in genealogy, but many of 23andMe’s users are only there for health information and have zero interest in genealogy or helping you to research yours. The other strong point is their “transfer family finder” service which allows you to upload your 23andMe data file to FTDNA for a better price than testing directly with them. You’ll still receive all the same matches and benefits as if you had tested there directly.

Further, FTDNA has some test offerings the others don’t provide. While 23andMe will test enough single nucleotide polymorphisms (SNPs) on your Y-DNA and mitochondrial DNA to assign a high level haplogroup, FTDNA provides full mitochondrial sequencing and Y-DNA short terminal repeat (STR) testing. The Y-DNA test can help confirm genealogy along your direct male ancestor line, but the mitochondrial sequence is relatively useless for this kind of genealogy. I’ve had a 67-marker Y-DNA STR test done along with a full mitochondrial sequence, plus the family finder transfer of my 23andMe data.

FamilyTreeDNA does provide a way for you to download your raw test results. Their test is done by scraping a cotton swab on the inside of your cheek.

AncestryDNA

They are the newest provider of these tests. I have not used their testing service so I have no first-hand knowledge of it. As I understand it they will scan your tree to find genealogical matches with your DNA matches and simplify the process of identifying your common ancestors. This sounds great, and it may be the best choice for those who can’t invest much time in this work, but the downside is that Ancestry has many users who aren’t as careful about validating and sourcing the data in their trees as a serious genealogist needs to be. You really have to double-check your match’s work more carefully than on other sites. Being new, their database is currently the smallest of the big three, but it is growing rapidly.

AncestryDNA does support user download of their raw test result data file. As with 23andMe, their test is performed with a saliva sample.

A Digression

The quality and number of matches you will find on any of these sites depend significantly on your family background and the backgrounds of others who have elected to test. The majority of users on these sites are American, so if you are the second generation of an immigrant family, new to the US, you will find only a few matches. But if you can trace your lines to ancestors in the early US, you’re going to have hundreds or even thousands of matches. Or if you come from a highly endogamous population like the Ashkenazi Jews, you will have a lot of matches but they will be so far back in time that you’ll have a lot of difficulty finding on-paper genealogical links.

Step 3: Make contact

I should call this step “wait”, since no matter which company you use, it will take a few weeks or months to get your results back. Use this time to work on your family tree some more.

Once you do receive your results, the fun starts. If you don’t check your email very frequently or your results have been in a while, you may already have matches starting to contact you. FTDNA contacts are generally made directly through email to the address you share when signing up. For 23andMe users, you can send or receive a “sharing request”, which if accepted allows you and your match to compare your results to each other and your other matches with whom you have an accepted sharing request.

How do you find your matches? On FTDNA you go to the Family Finder Matches tool and review the list of names, their family trees, and the significance of your match. I’ll cover significance later. On 23andMe you go to the DNA Relatives tool and do the same thing, except most of your matches will have chosen not to reveal their name and family tree, so you’ll need to send them one of the sharing requests I mentioned and hope they accept. I imagine the process on AncestryDNA is both similar to and different from the way it works elsewhere.

Discuss your background with your matches and find out what surnames, locations, or other details your families may have in common. You may find a connection immediately, or there may be nothing obvious. File all this information away for later because you never know when you or they will update their family tree and your connection will suddenly be staring you in the face.

What does a match mean anyway?

The simple answer is that they share a portion of your DNA, based on both of you having inherited that portion from a common ancestor. The significance of the match is generally evaluated in terms of four variables:

  1. How many segments? A person with whom you match five segments on five different chromosomes is likely to be a much closer relative than someone with whom you match one segment on one chromosome.
  2. How long is the match? You measure the length of a match by examining the start and end positions on the chromosome where a segment matches. A match may be, for example, from position 16 million to position 50 million on chromosome 12. The longer the match, the closer it generally is, but see below.
  3. How densely tested is the match region? This is reported as a SNP count, the number of consecutive polymorphisms you share with your match on a segment. The more SNPs tested on a matching segment, the closer it generally is, but see below.
  4. How variable is the genomic region where your matching segment exists? Fortunately you don’t have to calculate this yourself. 23andMe and FTDNA will give you a number to represent this value for your matches. The variability of the region, the length of a match and the number of tested SNPs all combine to give you a number of centiMorgans (cM) representing the significance of your match. Researchers disagree on how many cM a matching segment should have to be useful for genealogy, but bigger is definitely better. 5cM and 7cM are common minimum cutoffs. Anything larger than 10cM is quite useful in my opinion.

Long Technical Digression

The detailed answer is much more complex. Feel free to skip this part. I’m skipping over some details but what I’ve described below is accurate enough for genetic genealogy.

Each of our DNA sequences is unique, unless you have an identical twin. Our DNA is composed of 23 chromosomes, and we all have two of each (except in cases like trisomy where an individual has a third copy of a chromosome). One copy of each chromosome is inherited from your father and the other copy is inherited from your mother. Chromosomes 1-22 are the autosomes, while chromosome 23 is the sex chromosome. Women have two copies of the X sex chromosome, designated XX, while men have one copy of the X and one copy of the Y chromosome, designated XY.

Now, when you inherit one copy of each autosome from your two parents, you don’t inherit an exact copy. The autosomes split and recombine. To give an example, you have two copies of chromosome 1. One copy may have only one third of the genetic sequence come from your father’s chromosome 1, with two thirds of your mother’s chromosome 1. But your other copy of chromosome 1 may then have one third from your mother and two thirds from your father. Which of those two copies your child inherits will determine how much they received on chromosome 1 from your mother versus your father. Repeat this over many generations, and sequences break up and rejoin repeatedly over time. Because of this, the fundamental unit of genetic genealogy is the “half IBD segment”, which means “half identical by descent”. The half signifies that half of the segment — the half from one of your chromosomes, but not the other — is identical to one of someone else’s chromosomes, and that the segments being identical is due to both of you having inherited them from a common ancestor. The alternative is an “IBS”, or “identical by state” segment, in which case you and this other individual happened to randomly inherit sequences that match, but did NOT come from a common ancestor. You can’t easily identify these false positives in advance, so some proportion of your matches will be type 1 errors like this. You won’t ever find a genealogical link for such a match.

It gets even more complicated though. The commercial testing companies generally do not phase your genetic data. Instead they report the results of your SNP test at a position from both copies of your chromosomes, but they cannot tell if a given sequence of consecutive SNPs came from copy A or copy B of your chromosome. This will also contribute to false positive matches. There are ways around this, and if you phase your data you will have much better results with genetic genealogy. To phase your data you need to have both of your parents tested with the same test you take. That will allow comparison of your father’s DNA to yours, and your mother’s to yours, and you will have a much more accurate vision of your DNA. There are tools online to automate the process for you (such as GEDMatch), but you need to have at least one parent tested. Two are even better.

Unlike the autosomes, the sex chromosomes (X and Y) are inherited nearly unchanged from each parent. With detailed Y-DNA testing you can compare your direct male ancestor line back thousands of years. My Y-DNA test helped confirm that my male line descends from Pierre Paradis (1604 – 1675), of Montagne-au-Perche, France, who immigrated to Quebec in 1651, even though my genealogy on that line hits a brick wall with my fifth great grandfather Henry H Paradis, born around 1847 in Riviere-du-Loupe, Quebec. See this link on Paradis history if you’re interested in the line.

For various reasons, particularly the fact that women inherit one X from their mother and one X from their father, the X chromosome is not as useful for genetic genealogy as the Y chromosome. It does not travel an unbroken line of the same sex like the Y does.

Mitochondrial DNA, on the other hand, is passed only along the maternal line. Whether male or female, you inherited it from your mother. Unfortunately mitochondrial DNA changes so slowly that even if someone has an exact match to your full mitochondrial sequence, that could still be 20 generations back and extremely difficult to find. My mitochondrial haplotype, U2e1*, points to early European ancestry and then further back to the Indian subcontinent, but this is somewhere along the lines of 5000+ years ago and not useful for what I’m trying to do.

Complicating this further, we’re all related to each other somewhere. The hope is that you find people related closely enough that you can identify your genealogical link. But if, for example, you are of European descent, there’s a better than 95% chance that you descend from Charlemagne, probably along several lines (he was my 38th, 39th, and 40th great grandfather — yours too). Or if you trace back to early Quebec settlers, then you are probably related to 95% of French-Canadians.

Step 4: Compare family trees

I believe AncestryDNA does this for you automatically which is a huge point in their favor. Otherwise you need to review your matches’ surname lists and compare them to yours to find your common link. Sometimes this is easy, if you’ve both done a lot of genealogy work, and sometimes it’s difficult, like if one of you was adopted or has large gaps in their tree, or simply hasn’t done much genealogical research. There are some third party ways to simplify this process which I will get to later.

Step 5: Share your ancestry information

The easiest thing to do here is make sure you fully fill out your user profile on the testing site you use. This will help your matches to do some of the matching work for you, and make them more likely to get in contact with you.

The best thing you can do, though, is upload your raw data to GEDMatch. This is a third party tool run by volunteers for free (they accept donations if you find it useful) that allows users from 23andMe, FTDNA and AncestryDNA to all put their data in one place so that you can compare across vendors. Otherwise you can never be sure if this one guy on FTDNA that you match also matches this one woman from 23andMe and so on.

I can’t reiterate enough how useful GEDMatch is, and how much you’ll help other genetic genealogists by uploading your data there. The service they provide is in many ways superior to that offered by the commercial testing companies. They also support uploading your GEDCOM and doing the family tree matching for you, but that feature is unavailable for now due to the huge influx of data submitted recently. It will be back someday. Once you’ve used it, it is tough to do this work without it.

Step 6: Extend your tree

If you’re lucky you’ve been able to identify common ancestors with some of your matches by now. Look through their trees, and if they have any details about your ancestors that you don’t, add them to your tree. If they have the line traced back farther, extend the line in your tree. Add the other descendants of your common ancestors to your tree. You’re related to them, if only distantly, and having those surnames in your tree may help you track down your other matches.

I’ve confirmed via paper genealogy matches as close as third cousins and as far back as ninth cousins. I have documented ancestors going back to early New World settlers so that means I have a LOT of matches and finding the link with other people that have old confirmed lineages eventually gets quite easy. But there are many more people who descend from these early settlers than there are people that can document their ancestry back to them, so sometimes it can be frustrating.

My easiest matches go back to colonial days in the US, particularly some of the early Connecticut settlers like Eleazer Beecher and Phebe Prindle. Early Quebec settlers like Nicholas Pelletier and Jean de Vouzy are another great source for confirmed matches. I also have some large clusters from early French settlers in Louisiana, as well as Quebec French who immigrated to Louisiana later.

As a reference point, I am sharing with nearly 100 matches on 23andMe. I have confirmed genealogical ancestry with somewhere around ten of them. Your results will vary. One of my most recent matches had a detailed family tree and I found our ancestors in 1780s Louisiana after only about ten minutes of work. I was the first person she shared with, so while I only have a 10% success rate she’s at 100%.

Step 7: Triangulate!

The only way to do this is to share with as many people as possible on 23andMe, manually collate your matches from FTDNA or use GEDMatch. Share with people even if you see no obvious connection besides your matching segment. As you accumulate matches, you will eventually discover multiple people that you match in the same region of the same chromosome.

Once you have a list of two or more people you match in the same region, compare them to each other. If you match person A at a particular region, and you match person B at the same spot, compare A to B. If they match each other at the same spot, congratulations. All three of you very likely share a common ancestor. If A and B do not match each other, then most likely you match A on the copy of the chromosome you inherited from your mother and you match B on the other copy, inherited from your father, so that can help you track down the common ancestor you have with each, even though A and B are not related.

Where it gets really interesting is when you have a cluster of several people who all match you and each other but that stubbornly resists identification. Then you find a new match who matches all of them, and you find your common ancestor with this new match based on the quality of their genealogical research. That allows you to positively assign a spot in history to the rest of your cluster and may help with future identification. This was the case for me with the recent Louisiana match I mentioned. This match was on a cluster including a woman in Italy that had only one known ancestor who went to the US. We were quite sure our match was somewhere along this American immigrant’s line, but since my new match places a portion of this segment in 1788 Louisiana, that means my match with the Italian woman dates back earlier than that, likely somewhere in France, Germany or Luxembourg in the 1600s or earlier, based on the ancestors of this specific Louisiana settler family.

I’m planning another blog post later on ways to leverage the clusters you’ve identified using 23andMe’s Ancestry Finder tool and GEDMatch. The method will be obvious to anyone who has done this for a while, but I haven’t seen anybody write it up yet.

Additional Resources

Here are links to the companies and sites I’ve mentioned along with a few other reference materials on genetic genealogy.

Disclaimer

Other than the 23andMe referral link, I have no employment relationship with any of the sites mentioned or linked, nor have I received any compensation for this post. I am a happy user/member/reader of many of the sites and I will get only the indirect benefit of having your DNA tested and potentially matched to mine.


Filed under: Genetic Genealogy

Using EM12c Compliance Rules, Standards, and Frameworks

I recently reviewed SAP note 740897 and discovered that the application-specific full use license SAP customers receive when they purchase the Oracle database through SAP includes the Database Lifecycle Management Pack.  This means I can make use of, among other things, the compliance checking capabilities provided by Oracle Enterprise Manager 12c.

Many of the posts I put up here serve as “how to” documents, explaining how I do something so that others can decide how they would like to do something.  This post is slightly different.  I will be describing how I currently use the compliance rules, but in addition to simply providing a “how to”, this is more of a plea for anyone who finds this to tell me how this can be done more easily and efficiently.  The compliance functionality in EM12c appears to be much more configurable than that provided by EM11g, but one key piece that existed in EM11g appears to be gone. That key piece is the ability to ignore/suppress a particular key value from a compliance check. I would love to have someone tell me that I’m just not finding that function in EM12c.

As I recall, in EM11g, when you had compliance checks enabled you could ignore a single key value.  As an example, perhaps you had the rule to flag users with access to select from DBA_* views. That is great, except that my account has the DBA role, so my account appeared as a violation.  But I had the ability to ignore any violations on that rule where the key value was my account name.  This does not seem to be the case with EM12c.  Hence this post, where I describe how I’m achieving similar functionality in a very different way, hoping someone else knows a better way to do it.

Getting Started

The first step to using the EM12c compliance functionality for your databases is to have a license for the Database Lifecycle Management Pack.  If you don’t have one already, contact your Oracle sales representative.  Note that if you purchased your licenses before Oracle 11g was released, you may have a license to some retired management packs such as the Configuration Management Pack, Change Management Pack, or the Provisioning and Patch Automation Pack.  These three legacy packs combined seem to provide most/all of the functionality included in the Database Lifecycle Management Pack and according to the EM12c documentation grant you a license to use the functionality provided by the Database Lifecycle Management Pack.  Don’t take my word for it, review the Oracle Enterprise Manager Licensing Information document, particularly sections 2.3, 2.6, 2.7 and 2.8, then consult with your sales contact if you have questions.

Once you have confirmed your entitlement to use this feature, enable the Database Lifecycle Management Pack in EM12c as follows:

  1. Login to EM12c as the repository owner (SYSMAN)
  2. Navigate to the Management Pack Access screen via the Setup menu, then the Management Packs submenu
  3. If not selected already, select the “Target Based” Pack Access radio button
  4. If not selected already, select “Database” from the search drop-down
  5. Click the Go button
  6. Check the box in the Database Lifecycle Management Pack column for each database where you have this pack licensed and then click the Apply button
Management Pack Access screen

This setup step enables the compliance functionality, but to make use of it you will need to first enable collection of some additional information about your databases, then “attach” your database targets to a “compliance standard”.

Collecting Data Needed For Compliance Monitoring

Presumably to reduce load on systems where people don’t use the compliance functionality, EM12c does not collect the information needed to make full use of the compliance standards out of the box.  You need to enable this collection yourself.  To do so:

  1. Click on the Enterprise menu, then the Monitoring submenu, then Monitoring Templates
  2. Check the box next to “Display Oracle Certified templates”
  3. Click the Go button
  4. Select the radio button next to “Oracle Certified-Enable Database Security Configuration Metrics”
  5. Click the Apply button
  6. On the next page, click the Add button to select the database targets for which you will use the compliance functionality
  7. Click the OK button
  8. Repeat these steps for the “Oracle Certified-Enable Listener Security Configuration Metrics” and your listener targets if you intend to monitor listener compliance
Applying out-of-box templates to enable security configuration metrics

Compliance Frameworks vs Compliance Standards vs Compliance Rules

EM12c uses a three-tier approach to compliance monitoring.  For a full understanding of how this works you should read the Oracle Enterprise Manager Cloud Control Oracle Database Compliance Standards documentation, but to summarize it briefly, a compliance rule checks a particular compliance item (like permissions on a certain file, or a specific database role), while a compliance standard groups multiple compliance rules into a set to which you then attach the targets you want to have monitored.  A compliance framework then groups multiple compliance standards into a superset for reporting/auditing purposes.  This gives you a single view of your overall compliance when you have multiple compliance standards applying to different target types, as a compliance standard only applies to one target type — that is, you use a separate compliance standard for your listeners than for your databases, but you then include both standards in your compliance framework for a view of your entire environment.  EM12c comes with a large number of pre-built compliance rules, standards and frameworks which you can use as-is if you wish, but read on to find out why I prefer to customize them.

Working With Compliance Standards

To get started with compliance standards, click the Enterprise menu, then the Compliance submenu, and then click on Library.  This will take you to a screen with tabs to move between compliance frameworks, standards and rules.  For your first foray into compliance checking, start with one of the simpler Oracle-provided templates, like the “Storage Best Practices for Oracle Database” applicable to Database Instance targets.  To find it, click on the Compliance Standards tab, then the little triangle next to the word “Search” at the top of the screen.  Type “Storage Best Practices” into the Compliance Standard field, and select Database Instance from the Applicable To drop down, then click the Search button.  Once you see that standard on your screen, click on that row of the table (NOT the name of the standard), then click the “Associate Targets” button.  This will bring up a screen where you can then click the ‘Add’ button to select one or more of your database instances to attach to the standard.  After adding a target, click the OK button.  One more pop-up window will appear asking you to confirm that you are ready to deploy the association; go ahead and click Yes on this screen.

Searching for a compliance standard and associating targets

You now have at least one target associated to a compliance standard.  So what now?

Viewing Compliance Results

Once you have a target associated to a compliance standard, the main Enterprise Summary page will show an overview of the compliance check results along with a list of your least compliant targets.

Compliance region on Enterprise Summary page

The Compliance Summary region also has a Compliance Frameworks tab which provides another way of viewing the same information — further down I will cover how to set up a framework.

Compliance Summary region, Compliance Framework tab on Enterprise Summary page

For another view, you can also use the Compliance Dashboard, reached via the Enterprise menu, then the Compliance sub-menu, then Dashboard.

Compliance Dashboard

Compliance violations are grouped into minor warnings, warnings, and critical violations, based on the configuration of each compliance rule contained in a standard. Depending on your needs, you can change the significance of a violation as appropriate for your environment.  I will cover this later as well.

To get more information about the specific violations Enterprise Manager has found, click on the name of your compliance standard from one of those screens and you will see more details about what is contained in the compliance standard and the status of your targets.  For further detail, click on the name of a compliance rule on the left-hand side.  Pardon the blurred text in these images; I have already customized some rules and standards and included my employer name, which I highly recommend doing to distinguish your customizations from the out-of-the-box configuration.

View of compliance standard check details

Drill down into compliance rule details

This page shows that of the three database instances I have associated with this compliance standard, I have only one violation, and that violation is a minor warning associated with the “Non-System Data Segments in System Tablespaces” compliance rule.  Because SAP requires that users create some particular segments in the SYSTEM tablespace, this is a good one to work through as an example to show how to customize compliance monitoring to fit your environment.
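
Before customizing anything, it helps to see exactly which segments are triggering the violation.  The query below is only a rough approximation of what I understand the rule to check, not the rule’s actual SQL, but it will list the non-Oracle-owned segments sitting in the SYSTEM tablespace:

SQL> select owner, segment_name, segment_type
  2  from dba_segments
  3  where tablespace_name = 'SYSTEM'
  4  and owner not in ('SYS', 'SYSTEM', 'OUTLN', 'DBSNMP')
  5  order by owner, segment_name;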

Customizing Compliance Monitoring

There are a few different ways to customize your compliance monitoring beyond the high-level decision of which specific targets you associate to each specific standard.  One way is to create your own compliance standards, selecting the compliance rules you want and excluding those that are not relevant in your environment — this way, for example, you can completely disable the check for “Non-System Data Segments in System Tablespaces” if you choose to (I wouldn’t, but you might want to).  Another way is to customize the specific compliance rules contained in your compliance standards.  I do both.

I highly recommend not attempting to edit any of the Oracle-provided compliance frameworks, standards, or rules.  The “Create Like” button in the compliance library will be very helpful to you here.

The "Create Like..." button is your friend

The “Create Like…” button is your friend

First create your own compliance standard by selecting an existing one (I’ll continue to demonstrate this with the “Storage Best Practices for Oracle Database” standard) and clicking on the “Create Like…” button.  EM will prompt you to provide a name for the new standard.  For simplicity I prefer to use some indicator like my employer’s name followed by the name of the original standard.  Click Continue once you have named your new standard and you will proceed to the compliance standard editing page.

Here you specify the rules to include or exclude from your compliance standard

From this page you can add or remove compliance rules from your newly-created compliance standard.  To remove a rule, right-click on it in the region on the left and choose “Remove Rule Reference”, then click OK.

You can remove individual rules or groups of rules from this screen

The rules in the pre-defined standards are grouped into “rule folders”.  Instead of removing a single rule, you can remove an entire rule folder if you wish by right-clicking and selecting “Remove Rule Folder” and then clicking OK.  You can also create a new rule folder by right-clicking on the name of the compliance standard on the left and selecting “Create Rule Folder”, providing a name, then clicking OK.

Add or remove rule folders to group compliance rules

The compliance standard we’re working with has only a few rules.  If you wish, you can add one of the many other rules that are contained in other compliance standards.  Right-click on the compliance standard name or a rule folder, and select “Add Rules”.  A screen will appear allowing you to select one or more rules to add to the standard.  You can scroll through to select your rules or search by name or keyword.  Once you click OK, the selected rule(s) will be added to your compliance standard.

Select as many rules to add to your standard as you wish

The compliance standard editing screen is also where you can change the importance of a compliance rule violation.  To change the importance of the “Insufficient Redo Log Size” rule from “Normal” to “High”, click on that rule, then the drop-down box next to “Importance” and select a new value.

I guess "Low", "Medium" and "High" correspond to "Minor Warning", "Warning" and "Critical"

I guess “Low”, “Normal” and “High” correspond to “Minor Warning”, “Warning” and “Critical”

Finally, click the Save button to save your new compliance standard.  At this point your new standard will not have any targets associated with it, so you should click on it and then on the “Associate Targets” button to do so.  You may also wish to remove the association of those targets with the original standard you used to create this new standard.  Once you finish in this screen, you can return to the Enterprise Summary or Compliance Dashboard, refresh the page, and you should see the results of the checks run by this new rule.

Changing A Compliance Rule

That is all useful, but what if you want to change the actual details behind a rule?  I want to eliminate the complaints about non-system data segments in the system tablespace so that I don’t see any more violations for the SAP-required segments I have in there, but I don’t want to remove the entire rule because I do want to be notified if other segments show up in there that I wasn’t aware of.  The solution is to create a new rule based on the rule you want to change, edit it (finally we get to write some SQL) and then remove the old rule from your compliance standard and replace it with the new rule.

Go back to the Compliance Dashboard and click the Compliance Standard Rules tab.  Open up the search widget and search for “Non-System Data Segments” for target type “Database Instance”.  Click on the offending rule and then the “Create Like” button.

The lock icon shows that you can’t edit the default rules but you can duplicate them

Provide a title for your new rule following whatever scheme you like.  I will call it “DEMO Non-System Data Segments in System Tablespaces”.  Click Continue and you will see the edit screen for Compliance Standard Rules.

You can change the text here if you wish, or add keywords

Click Next to go to step 2 where you can edit the rule SQL.

Finally, SQL!

This screen allows you to edit the rule SQL.  If you aren’t familiar with the EM12c repository, this can be difficult.  I recommend pulling up a SQL*Plus window connected to your repository database as SYSMAN, then copy/pasting the SQL text into the query window so that you can see the results that it returns.  In my case I want to exclude violations for the “SAPUSER” table that SAP requires us to create in the SYSTEM tablespace, so I just add the text “and OBJECT_NAME not like ‘%SAPUSER%’” to the end of the SELECT statement.
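
To make the shape of the change concrete, here is a stand-in version of that edit written against DBA_SEGMENTS; the real rule queries EM repository views, so treat everything except the final predicate as an assumption used only to prototype the idea in your SQL*Plus session.  The column aliases mirror the columns the rule exposes, and the last line is the entire edit.

-- Stand-in only: the actual rule SQL selects from EM repository views, not
-- DBA_SEGMENTS.  This approximation just demonstrates appending an exclusion
-- predicate to the end of the rule's SELECT statement.
SELECT tablespace_name,
       owner        AS object_owner,
       segment_type AS object_type,
       segment_name AS object_name
  FROM dba_segments
 WHERE tablespace_name IN ('SYSTEM', 'SYSAUX')
   AND owner NOT IN ('SYS', 'SYSTEM')
   -- the appended filter: ignore the SAP-required SAPUSER objects
   AND segment_name NOT LIKE '%SAPUSER%';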

Anything you can do in SQL, you can do here

Click Next once you have edited the SQL to your liking.  This will bring you to a new screen where you specify the key values and violation conditions.  This is one of the clunky parts of working with compliance rules, because the pre-defined violation condition is lost when you “Create Like” on a built-in rule.

What now?

If you just proceed with finishing the rule from here, you’ll have a problem.  Every single segment in the SYSTEM and SYSAUX tablespaces will be flagged as a violation.  You need a where clause.  But what should it be?  What was it in the original rule?  Here I typically open up a second browser window, navigate to the original rule in the Compliance Library, click the “Show Details” button and then scroll down to the bottom, which brings up the following screen:

At least there’s a way to get the configuration of the original rule

The lucky part here is that, even though the area is grayed out, you can select and copy the text from the original rule’s where clause, then paste that into your new rule’s where clause, as shown below.  I’ve also checked the “Key” checkboxes for TABLESPACE_NAME, OBJECT_OWNER, and OBJECT_TYPE, because I suspect (but haven’t yet confirmed) that these key values determine how many individual violation events you will receive.

You can always re-edit this later if you don’t get it perfectly right the first time

Once you click Next on that screen you’ll be presented with step 4, where you can test your new compliance rule against a specific target.  You can type in the target’s name or click the magnifying glass to select the target, as with the other target selection screens in EM12c.  Click Run Test after you have selected a target and confirm that the results you see are the results you wanted.

Run tests against all your targets one at a time to see what will happen when your rule goes live

If you are satisfied with the test results, click Next.  Otherwise click Back and try again with your SQL code and where clause.  Once you click Next you will see step 5, which is just a summary page displaying your rule’s details.  Click Finish when you are done.

All done, can I go home now?

Now that you clicked Finish, your new compliance standard rule is saved in the repository and available for use.  You will need to attach it to a compliance standard, as described above, before it will do anything useful, and you probably want to detach the original rule that you used as the source to create this one.

Repeat these steps for every rule you wish to edit.  This is the part I referred to at the beginning of the post where I hoped someone could suggest a better way.  As I recall, in EM Grid Control 11g, an admin could simply select a specific compliance violation and choose to suppress it for that key value with a couple of clicks, as compared to this long process needed to duplicate and edit a rule.  EM12c compliance rules are very customizable, just not quite as easy to work with — sort of like incident rules and notifications.  You need to learn a new way of doing things, but it can do a lot.

Creating A Compliance Framework

Finally, you should create a custom compliance framework.  This follows essentially the same process as creating a standard and attaching rules, but instead you create a framework and attach standards.  Go to the Compliance Frameworks tab on the Compliance Library page and click “Create”.  Give your framework a name and click Continue, and the Compliance Framework edit screen should look familiar.

Where have I seen this before?

Right-click on the compliance framework’s name in the left bar, and select “Add Standards”.  A screen will pop up from which you can select the standards you created previously, just like when you add a rule.  You can also add standard subgroups, which work much like rule folders.  Click on your new standards and then OK.

Easy enough, right?

Click Save and you’ll be returned to the framework tab.  At this point your new framework is in “Development” state, and you will NOT see it in the Enterprise Summary page.  Click on the framework, then click “Edit”.  Change the Compliance Framework State to Production and click Save.

Finally done!

You’re done!  You now have a custom compliance framework, one or more custom compliance standards within that framework, and several rules in your standards, including some you have edited to meet your needs.  Go back to the Enterprise Summary page, wait a minute or two, click the refresh button and then admire your work.

Time for a cold beer…

Conclusion

The compliance functions in EM12c are extremely customizable and capable.  There are some rough spots where I prefer EM11g’s functionality, and a couple of spots where I need to open another browser window or SQL*Plus connection to get things set up the way I want, but that’s a small inconvenience compared to their power.

So now that you have these compliance evaluations staring you in the face every time you visit the Enterprise Summary page, get to work fixing those violations!


Filed under: Cloud Control, Database

My Production EM12c Upgrade From R2 (12.1.0.2) to R3 (12.1.0.3)

This post covers my production upgrade from EM12c R2 to EM12c R3 on Linux x86-64 (SLES11 SP2).  To stress test the upgrade and keep it interesting, this system also has BI Publisher integrated into OEM, along with the plugin from NetApp (version 12.1.0.1.0), the VMware plugin from BlueMedora (version 12.1.0.5.0), and the MySQL plugin from Pythian (version 12.1.0.1.1).  The repository database is running on version 11.2.0.3.

I’m feeling lucky today.  So this is just going straight into production.  Famous last words…

Preparation

  1. Go to edelivery and search for the Oracle Enterprise Manager product pack and platform Linux x86-64
  2. Follow the link titled “Oracle Enterprise Manager Cloud Control 12c Release 3 (12.1.0.3) Media Pack for Linux x86-64”
  3. Download the three files needed for EM12c R3: V38641-01.zip, V38642-01.zip, and V38643-01.zip
  4. Download patch 13349651 for WebLogic, you will need it during the post-upgrade steps
  5. View the digest and run md5sum against each downloaded file to confirm that the files downloaded correctly
  6. Transfer the EM12c R3 files to a staging area on your Oracle Enterprise Manager server and unzip all three of them
  7. Delete the three downloaded .zip files if you are short on space (but don’t just “rm *.zip” or you’ll remove the necessary WT.zip file at the top level of your staging directory)
  8. Review the upgrade guide
  9. Create backups of your current OMS home, agent home, software library and Oracle inventory
  10. Create a backup of your current repository database that you can restore from if necessary
  11. If you use a dedicated filesystem for your Oracle Management Agents, make sure that dedicated filesystem has enough free space.  I used a small, 2GB filesystem, and this was barely large enough, except for the one sandbox server where it was too small and I was not able to complete the agent upgrade without adding space
  12. Stop the BI Publisher WebLogic Server (if you have it installed) via the WebLogic Admin Console
  13. Make sure your repository database does not have snapshots created on any tables by running “select master, log_table from all_mview_logs where log_owner = ‘SYSMAN’” (the query is reproduced below this list) — if any snapshots are found, follow the instructions in the upgrade guide to drop them
  14. If your repository database is version 11.1.0.7 or 11.2.0.1, follow the steps in the upgrade guide to apply the prerequisite patches needed to proceed
  15. Copy the emkey from the existing OMS to the existing management repository by running “$OMS_HOME/bin/emctl config emkey -copy_to_repos” and enter the SYSMAN password when prompted
  16. Confirm the emkey was copied by running “$OMS_HOME/bin/emctl status emkey” and enter the SYSMAN password when prompted
  17. Stop every OMS in your environment by running “$OMS_HOME/bin/emctl stop oms -all”
  18. Stop the management agent monitoring the management services and repository target by running “$AGENT_HOME/bin/emctl stop agent”
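
For convenience, here is the snapshot check from step 13, formatted for pasting into a SQL*Plus session connected to the repository database.  If it returns any rows, drop the snapshot logs by following the upgrade guide rather than improvising:

-- Check for materialized view (snapshot) logs on SYSMAN-owned tables.
-- The upgrade guide describes how to drop any that this query returns.
SELECT master,
       log_table
  FROM all_mview_logs
 WHERE log_owner = 'SYSMAN';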

Running The Upgrade

Due to some issues I had with the upgrade from EM12c R1 to R2, I highly recommend that you do NOT use Cygwin to ssh to your OMS host and display the installer over ssh using Cygwin’s X server.  Use VNC instead.  I’m not going to try Cygwin this time through.

First, I start vncserver on the OMS host.  Then I connect to it to using TightVNC from my desktop machine.

  1. Navigate to the staging directory where you unzipped the EM12c R3 distribution files and run “./runInstaller” as your oracle software owner
  2. As an SAP customer, we are not allowed to use OCM, so I skipped the steps involving entering my My Oracle Support credentials
  3. I also skipped the search for updates, as this release is new enough that no updates should be necessary yet
  4. The prerequisite checks run.  I received a warning about libstdc++43-4.3 not being found, but libstdc++43-devel-4.3.3-x86_64 fulfills this need, so I clicked Ignore and then Next to continue
  5. Only the one-system upgrade is supported when upgrading from EM12cR2, so I selected “Upgrade an existing system”, “One system upgrade”, and my existing middleware home.  Click Next
  6. This out-of-place upgrade will go into a new middleware home.  I am using /oracle/oem/Middleware12cR3.  Click Next and the installer will confirm that you have enough free space available
  7. Here I had to pause and request more space from my storage admin, as the installer wants at least 14.0GB of free space.  Once that was done, I proceeded
  8. Enter the SYS and SYSMAN passwords and check the box to confirm that you have backed up the repository (you should have your OMS, agent, etc all backed up as well), then click Next
  9. The installer will check various parameters on your repository database and offer the chance to fix them if any need to be changed.  I accepted the fixes and clicked Yes
  10. The installer checks some additional settings and notes that they should be reviewed after the installation or fixed now.  I explicitly granted execute on DBMS_RANDOM to DBSNMP and then clicked OK
  11. The installer lists the plugin versions that will change and the plugins that will migrate.  Confirm this all looks right and then click Next
  12. The installer lists additional plugins you can choose to deploy at install time.  I do not use any of these so I left them all unchecked and clicked Next
  13. The installer requests the password for your WebLogic adminserver and confirmation of the hostname, port and username.  Provide the password and click Next.  You may be able to change the OMS instance directory here but I do not suggest doing so
  14. You now have a chance to review your settings, then click Install to proceed with the upgrade
  15. Installation proceeds
  16. You are then prompted to run the allroot.sh file.  Login to the server and execute it as root or via sudo
  17. Once the install/upgrade is complete, the installer will display an installation summary.  Review it, save the URLs it gives you for the OMS and adminserver, then click Close

Overall, the upgrade installation steps took 1 hour and 15 minutes in my environment.  This was on a physical server with 126GB RAM, 16 dual core processors and 200 managed targets.  This does not include the post-upgrade steps shown below.

Post Upgrade Steps

Now that the upgrade is complete, return to the upgrade guide to complete post upgrade steps.  Your environment may differ from mine, but these are the steps I had to follow.

  1. Start your central agent by running “$AGENT_HOME/bin/emctl start agent”.  At this point the load on my system went up very high and the server began responding very slowly.  I walked away for 10 minutes to let things settle down
  2. Open your web browser and go to the URL provided at the end of the installation and login as SYSMAN.  When I first tried to do so using Firefox I received an error indicating an invalid certificate.  I had to delete the old certificates and authorities from my previous installation and restart Firefox before it would allow me in.  MSIE worked fine though
  3. Update the central agent (the management agent installed on the OMS host) by clicking on the Setup menu, then Manage Cloud Control, then Upgrade Agents.  Click the Add button and select your central agent.  I chose “Override preferred credentials” since I have not configured sudo.  Click Submit to continue, then OK when warned that you may have to run root.sh manually
  4. My first upgrade attempt on the central agent failed due to a prerequisite check for package libstdc++-43.  The easy thing to do here is expand the Additional Inputs region and provide “-ignorePrereqs” as an additional parameter, but I chose to complete this agent upgrade using emcli and describe that process in the next two steps
  5. First run “$OMS_HOME/bin/emcli login -username=sysman” and enter the SYSMAN password when prompted.  Then run “$OMS_HOME/bin/emcli sync”
  6. Upgrade the agent by running: $OMS_HOME/bin/emcli upgrade_agents -additional_parameters="-ignorePrereqs" -agents="hostname.domain.com:3872"
  7. Wait while the upgrade proceeds.  You can view upgrade progress by running “$OMS_HOME/bin/emcli get_agent_upgrade_status”, or in the GUI by clicking on “Agent Upgrade Results” in the “Upgrade Agents” page
  8. Click Done once the central agent upgrade completes.  Go to the new agent home and run root.sh as root or via sudo
  9. Then repeat this process for the rest of your agents.  Try to upgrade them first WITHOUT using the “-ignorePrereqs” flag, because if there are missing prerequisites you need to identify the issue and find out if it is something that is appropriate to ignore, as the libstdc++ version was in my case.  Execute root.sh for each agent afterwards, unless you have sudo configured in which case it will happen automatically
  10. Two of my agents that run on different platforms could not be upgraded right away.  The new versions of the agent software needed to be downloaded from Self Update.  I am skipping them for now
  11. Next, apply patch 13349651 to WebLogic, following the instructions in the README file.  I attempted to do so, but the patch was already installed so I skipped this step
  12. There are a few other optional, post-upgrade steps like deleting obsolete targets.  These are documented in the upgrade guide and I will not note them here
  13. As a final step, make sure you update your Oracle user’s environment variables to reflect the new middleware home, OMS home, agent home, and so on
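
As an illustration of that last step, a profile snippet along the following lines would cover the homes referenced in this post; the paths match the middleware home and agent locations used above, so adjust them for your own layout:

# Example only -- substitute your own paths if your layout differs.
export MW_HOME=/oracle/oem/Middleware12cR3
export OMS_HOME=$MW_HOME/oms
export AGENT_HOME=/oracle/oem/agent12c/core/12.1.0.3.0
export AGENT_INSTANCE_DIR=/oracle/oem/agent12c/agent_inst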

Conclusion

At this point my EM12cR3 production upgrade is complete!  Everything I have checked so far appears fully functional.  The only problem I had was a small filesystem for the management agent on my sandbox server causing the agent upgrade to run out of space, forcing manual intervention to resolve.  Don’t be stingy with space like I am and you should be fine.

I haven’t taken any time to investigate the new features yet, but I will be now.

(EDITED TO ADD: I forgot to mention the steps to get BI Publisher working again.  Please refer to the EM12cR3 Advanced Installation and Configuration Guide, chapter 15.  Essentially you will need to perform a software-only installation into the new middleware home, then execute the configureBIP script with an -upgrade flag to complete the BIP setup.)


Filed under: Cloud Control

How to connect to the default EM12c R3 self-signed WebLogic SSL port with WLST

After upgrading to Oracle Enterprise Manager 12c R3, I decided it was time to get roles configured properly for BI Publisher so that I can use it under my regular account rather than only permitting SYSMAN to access it.  Adeesh Fulay (@AdeeshF) helpfully provided me with a link to the documentation about setting up BI Publisher for EM12c.  The first step to perform the configuration involves connecting to the secured WebLogic adminserver via wlst.sh, but I immediately encountered an error:

wls:/offline> connect('weblogic', 'password', 't3s://host.domain.com:7103')
Connecting to t3s://host.domain.com:7103 with userid weblogic ...
<Jul 19, 2013 9:41:15 AM EDT> <Warning> <Security> <BEA-090542> <Certificate chain received from host.domain.com - x.x.x.x was not trusted causing SSL handshake failure. Check the certificate chain to determine if it should be trusted or not. If it should be trusted, then update the client trusted CA configuration to trust the CA certificate that signed the peer certificate chain. If you are connecting to a WLS server that is using demo certificates (the default WLS server behavior), and you want this client to trust demo certificates, then specify -Dweblogic.security.TrustKeyStore=DemoTrust on the command line for this client.> 
Traceback (innermost last):
  File "<console>", line 1, in ?
  File "<iostream>", line 22, in connect
  File "<iostream>", line 648, in raiseWLSTException
WLSTException: Error occured while performing connect : Error getting the initial context. There is no server running at t3s://host.domain.com:7103 
Use dumpStack() to view the full stacktrace

I could not find any obvious reference in the documentation on how to add the “-Dweblogic.security.TrustKeyStore=DemoTrust” option on the command line.  I attempted to just run wlst.sh with that parameter but I also received an error.

After a little searching I found a fix and figured I would post it.

In the documentation for the WebLogic 10.3.6 Oracle WebLogic Scripting Tool, section “Invoking WLST”, an example is included where it shows how to provide a different command line option to the WLST tool, by setting the environment variable CONFIG_JVM_ARGS. (EDITED 20130719: Adeesh has let me know that the preferred environment variable to use for this string is WLST_PROPERTIES, not CONFIG_JVM_ARGS.  Both work at the moment, but the documentation will be updated to refer to WLST_PROPERTIES so I advise you to use that one.)

I tried that before making my wlst.sh call, and everything worked successfully:

oracle@host:~> export WLST_PROPERTIES=-Dweblogic.security.TrustKeyStore=DemoTrust
oracle@host:~> /oracle/oem/Middleware12cR3/oracle_common/common/bin/wlst.sh 
[...]
Initializing WebLogic Scripting Tool (WLST) ...

Welcome to WebLogic Server Administration Scripting Shell

Type help() for help on available commands

wls:/offline> connect('weblogic', 'password', 't3s://host.domain.com:7103')
[...]
Successfully connected to Admin Server 'EMGC_ADMINSERVER' that belongs to domain 'GCDomain'.
wls:/GCDomain/serverConfig>

So if you are having trouble connecting to your WebLogic admin server using the default self-signed certificate via wlst.sh, this environment variable is the answer.  I was now able to proceed with granting my account access to BI Publisher, and now I am able to access BI Publisher features as needed without using the SYSMAN account.

wls:/GCDomain/serverConfig> grantAppRole(appStripe="obi",appRoleName="EMBIPViewer",principalClass="weblogic.security.principal.WLSUserImpl",principalName="USERNAME")    
Location changed to domainRuntime tree. This is a read-only tree with DomainMBean as the root. 
For more help, use help(domainRuntime)

wls:/GCDomain/serverConfig> grantAppRole(appStripe="obi",appRoleName="EMBIPAdministrator",principalClass="weblogic.security.principal.WLSUserImply", principalName="USERNAME")                                                
Already in Domain Runtime Tree

wls:/GCDomain/serverConfig> grantAppRole(appStripe="obi",appRoleName="EMBIPScheduler",principalClass="weblogic.security.principal.WLSUserImply", principalName="USERNAME")
Already in Domain Runtime Tree

wls:/GCDomain/serverConfig> grantAppRole(appStripe="obi",appRoleName="EMBIPAuthor",principalClass="weblogic.security.principal.WLSUserImply", principalName="USERNAME")
Already in Domain Runtime Tree

wls:/GCDomain/serverConfig> exit()

Exiting WebLogic Scripting Tool.

Filed under: Cloud Control

How to migrate EM12c R3 OMS and repository to a new host

In order to save power in our data center, I need to migrate my EM12c R3 environment from the host where it currently runs to a new host.  I have a simple configuration, with a single OMS, no load balancer, and the repository database runs on the same host as EM12c R3 itself.  I also have BI Publisher installed and integrated with EM12c, and a few third party plugins as I’ve detailed elsewhere on this blog.  If you use an OS other than Linux x86-64 I suggest you research thoroughly as this procedure may or may not apply to your environment.  Further, if you have a multi-OMS setup or use a load balancer, you must read the documentation and adapt the process accordingly to match your system’s needs.  Note that I wrote this as I did the migration, live, on my production system, so I have text in a few places showing where I would have done things differently if I had known what to expect in the first place.  It all ended up working, but it could have been simpler.

Oracle documents the procedure for this migration in the EM12c Administrator’s Guide, Part VII, section 29, “Backing Up and Recovering Enterprise Manager“.  As a first step, my system administrator installed SLES 11 SP3 on the new server and created an account for me along with the ‘oracle’ account for EM12c. I have a 70GB volume to use for the database and OEM binaries, a 1GB volume for the DB control files and a 2GB volume for redo logs supplemented with a 15GB FRA volume to support flashback.  Due to our tape backup strategy I use the FRA only for flashback, which we don’t wish to backup, and use a separate volume for RMAN backupsets.  To avoid a backup/restore cycle, the volumes holding the database datafiles will just be moved over to the new host on the storage side.

First I will relocate the management repository database to the new host, then complete the process by relocating the OMS.

Relocating the Management Repository Database

I run Oracle Database 11.2.0.3, Enterprise Edition, plus PSU Jul 2013.  Rather than installing the database software from scratch and patching it, I will clone the existing Oracle home to the new server.  Unfortunately I cannot use EM12c to do the cloning, as cloning via EM12c requires a management agent on the new host.  The software-only install of EM12c that I will run later installs a management agent as part of the process and I do not wish these two to conflict, so I do not want to install an agent on the new host at this time.

I will clone the database home according to the procedure in Appendix B of the 11gR2 database documentation.  You should review the documentation for full details.

Cloning the Database Home

Stop the OMS, database and management agent before cloning the existing Oracle home.

oracle$ $OMS_HOME/bin/emctl stop oms -all ; $AGENT_HOME/bin/emctl stop agent ; $ORACLE_HOME/bin/dbshut $ORACLE_HOME

Create a zip file of the existing database home.  Run this step as root (or using sudo) to make sure that you get all the files.

oracle$ sudo zip -r dbhome_1.zip /oracle/oem/product/11.2.0/dbhome_1

Now I will start the original database back up so that OEM continues running while I prepare the cloned Oracle home.  I will perform this migration over a few days, as I have time, so I need to keep OEM up and running as much as possible to support and manage my other databases.

oracle$ $ORACLE_HOME/bin/dbstart $ORACLE_HOME ; sleep 10 ; $OMS_HOME/bin/emctl start oms ; sleep 10 ; $AGENT_HOME/bin/emctl start agent

Copy this zip file to the new host.

oracle$  scp dbhome_1.zip oracle@newhost:/oracle/oem

On the new host, extract this zip file to the target directory.

oracle@newhost$ unzip -d / dbhome_1.zip

Remove all “*.ora” files from the extracted $ORACLE_HOME/network/admin directory.

oracle@newhost$  rm /oracle/oem/product/11.2.0/dbhome_1/network/admin/*.ora

Execute clone.pl from $ORACLE_HOME/clone/bin.

oracle@newhost$ export ORACLE_HOME=/oracle/oem/product/11.2.0/dbhome_1
oracle@newhost$ $ORACLE_HOME/perl/bin/perl clone.pl ORACLE_BASE="/oracle/oem" ORACLE_HOME="/oracle/oem/product/11.2.0/dbhome_1" OSDBA_GROUP=dba OSOPER_GROUP=oper -defaultHomeName

Unfortunately this creates an oraInventory directory in the oracle user’s home directory.  I prefer to keep oraInventory under ORACLE_BASE, so I moved it and edited the generated files to change the path from /home/oracle/oraInventory to /oracle/oem/oraInventory.  Most likely setting an environment variable, or having a pre-existing /etc/oraInst.loc, would have made this optional step unnecessary.

oracle@newhost$ cp -a ~/oraInventory /oracle/oem
oracle@newhost$ cd /oracle/oem/oraInventory
oracle@newhost$ perl -pi.bak -e 's#/home/oracle#/oracle/oem#' oraInst.loc orainstRoot.sh

Complete the cloning steps by running the orainstRoot.sh and root.sh scripts.

oracle@newhost$ sudo /oracle/oem/oraInventory/orainstRoot.sh
Changing permissions of /oracle/oem/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /oracle/oem/oraInventory to dba.
The execution of the script is complete.
oracle@newhost$ sudo /oracle/oem/product/11.2.0/dbhome_1/root.sh
Check /oracle/oem/product/11.2.0/dbhome_1/install/root_newhost_2013-08-27_13-04-51.log for the output of root script

I do not want to use netca to configure the listener, so I will just copy the $ORACLE_HOME/network/admin/*.ora files back over from the original server to the new server, and edit them accordingly.

oracle$ scp *.ora oracle@newhost:/oracle/oem/product/11.2.0/dbhome_1/network/admin/ 

oracle@newhost$ cd $ORACLE_HOME/network/admin
oracle@newhost$ perl -pi.bak -e 's#oldhost#newhost#' *.ora

This completes the database cloning.

Start Management Repository Database On New Host

At this point you will probably use RMAN to create a backup of your original repository database, then restore that backup onto the new host.  Instead, I will cheat a bit, shut down OEM and the database, and ask my sysadmin to move the repository database’s datafile LUN over to the new host and mount it at the same location.

Before moving the LUN, create directories that the database needs for a successful startup.  These include the admin/SID/adump directory, and in my case, the /oracle/mirror/SID/cntrl and /oracle/mirror/SID/log directories where I keep the multiplexed copies of my redo logs and controlfiles.

oracle@newhost$ mkdir -p /oracle/oem/admin/emrep/adump
oracle@newhost$ mkdir -p /oracle/mirror/emrep/cntrl ; mkdir -p /oracle/mirror/emrep/log

As a sanity check, you should try starting up the listener on the new server and starting the database in NOMOUNT mode before proceeding.  This will help catch any issues that may exist in your environment before you start the outage on your original server.  Investigate and resolve any issues found before proceeding.
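
A minimal version of that sanity check, assuming the listener is named LISTENER and the repository SID is emrep as elsewhere in this post, looks like the following; shut both down again afterwards so the later startup steps proceed as written:

oracle@newhost$ export ORACLE_SID=emrep
oracle@newhost$ lsnrctl start LISTENER
oracle@newhost$ sqlplus / as sysdba
SQL> startup nomount;
SQL> shutdown immediate;
SQL> exit
oracle@newhost$ lsnrctl stop LISTENER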

Shutdown the OMS, agent and database on the original server.

oracle$ $OMS_HOME/bin/emctl stop oms -all ; $AGENT_HOME/bin/emctl stop agent ; $ORACLE_HOME/bin/dbshut $ORACLE_HOME

Copy the controlfiles and redo logs from the original server to the new server.

oracle$ scp /oracle/oem/cntrl/control01.ctl oracle@newhost:/oracle/oem/cntrl/control01.ctl
oracle$ scp /oracle/mirror/emrep/cntrl/control02.ctl oracle@newhost:/oracle/mirror/emrep/cntrl/control02.ctl
oracle$ scp /oracle/oem/log/redo* oracle@newhost:/oracle/oem/log
oracle$ scp /oracle/mirror/emrep/log/redo* oracle@newhost:/oracle/mirror/emrep/log

Back on the new server, start up the listener, then the database.  I probably should have disabled flashback first.

oracle@newhost$ lsnrctl start LISTENER
oracle@newhost$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Wed Aug 28 10:09:01 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup;
ORACLE instance started.

Total System Global Area 9620525056 bytes
Fixed Size                  2236488 bytes
Variable Size            6241128376 bytes
Database Buffers         3355443200 bytes
Redo Buffers               21716992 bytes

Database mounted.
ORA-38760: This database instance failed to turn on flashback database
SQL> select open_mode from v$database;

OPEN_MODE
--------------------
MOUNTED

SQL> alter database flashback off;

Database altered.

SQL> alter database open;

Database altered.

Reconfigure Existing OMS For New Repository Database

Start the OMS and agent on the original server.  OMS startup will fail, as you have not yet reconfigured the repository.

oracle$ $OMS_HOME/bin/emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server is not functioning because of the following reason:
Failed to connect to repository database. OMS will be automatically restarted once it identifies that database and listener are up.
Check EM Server log file for details: /oracle/oem/gc_inst/user_projects/domains/GCDomain/servers/EMGC_OMS1/logs/EMGC_OMS1.out
oracle$ $AGENT_HOME/bin/emctl start agent

Reconfigure the OMS repository database connection.  Provide SYSMAN’s password when prompted.

oracle$ $OMS_HOME/bin/emctl config oms -store_repos_details -repos_conndesc "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=newhost)(PORT=1521)))(CONNECT_DATA=(SID=emrep)))" -repos_user sysman
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Enter Repository User's Password : 
Successfully updated datasources and stored repository details in Credential Store.
If there are multiple OMSs in this environment, run this store_repos_details command on all of them.
And finally, restart all the OMSs using 'emctl stop oms -all' and 'emctl start oms'.
It is also necessary to restart the BI Publisher Managed Server.

Stop, then restart the OMS.

oracle$ $OMS_HOME/bin/emctl stop oms -all ; sleep 5 ; $OMS_HOME/bin/emctl start oms
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Stopping WebTier...
WebTier Successfully Stopped
Stopping Oracle Management Server...
Oracle Management Server Successfully Stopped
AdminServer Successfully Stopped
Oracle Management Server is Down
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Starting Oracle Management Server...
Starting WebTier...
WebTier Successfully Started
Oracle Management Server Successfully Started
Oracle Management Server is Up

Login to OEM and confirm proper operation of the system.  I had a lot of alerts for failed backup jobs since my repository database hosts my RMAN catalog.  These can wait for now.  Also expect your repository target to show as down, since you have not yet updated the monitoring configuration.  Reconfigure it now, providing the SYSMAN password when prompted.

oracle$ $OMS_HOME/bin/emctl config emrep -conn_desc "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=newhost)(PORT=1521)))(CONNECT_DATA=(SID=emrep)))"
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Please enter repository password:                                    Enter password :                                               Login successful
Target "Management Services and Repository:oracle_emrep" modified successfully
Command completed successfully!

At this point you have successfully moved your repository database.  Don’t worry about any errors for now, though if you rely on an RMAN catalog and stored scripts for your backups, and these all live in your OEM repository database, you should go through now and update the monitoring configuration for the repository database and listener so that backups of your other databases do not fail.  I had to edit the recovery catalog and specify the host, port, and SID manually, since for some reason when I told it to use the repository database it kept trying to use the old hostname.  I will fix this after I complete the rest of the migration.

IMPORTANT NOTE: Since you have not yet migrated the repository database target to an agent local to that machine, backups of your repository database may not run.  Monitor your archived log directory on this system until you complete the rest of the migration, and manually run backups when necessary.

Installing OMS On A New Host

To install the OMS on a new host, perform a software-only installation from the same EM12c R3 installer that was used to install on the original host.  You will need to identify and retrieve all of the plugins that you have installed on the current OMS, as well as any patches that are currently installed on the OMS.  You must also make sure to use the same directory layout as on the original OMS.

Identifying Installed Patches

oracle$ $OMS_HOME/OPatch/opatch lsinv -oh $OMS_HOME
[...]
Interim patches (1) :

Patch  13983293     : applied on Thu Jul 11 09:56:16 EDT 2013
Unique Patch ID:  14779750
   Created on 25 Apr 2012, 02:18:06 hrs PST8PDT
   Bugs fixed:
     13587457, 13425845, 11822929

This patch gets installed by the EM12c R3 installer, so no need to bother with it any further.  If you have other patches installed, go fetch them, and install them after you have completed the plugin installation (see below).

Identifying Installed Plugins

Identify all plugins installed on your system using the query provided in the documentation, run as SYSMAN against your repository database.

SELECT epv.display_name, epv.plugin_id, epv.version, epv.rev_version,decode(su.aru_file, null, 'Media/External', 'https://updates.oracle.com/Orion/Services/download/'||aru_file||'?aru='||aru_id||chr(38)||'patch_file='||aru_file) URL
FROM em_plugin_version epv, em_current_deployed_plugin ecp, em_su_entities su
WHERE epv.plugin_type NOT IN ('BUILT_IN_TARGET_TYPE', 'INSTALL_HOME')
AND ecp.dest_type='2'
AND epv.plugin_version_id = ecp.plugin_version_id
AND su.entity_id = epv.su_entity_id;

Oracle-provided plugins will show a URL from which you must download the plugin.  Third-party plugins will not; you will need to make sure you have the appropriate downloaded plugin install .opar file from when you initially installed it.  Gather up all of these plugin files into a single directory on your NEW OMS host, changing the “.zip” filename extension to “.opar” for the Oracle-provided plugins.  You need EVERY plugin returned by this query or else your installation will NOT work.  I placed mine in /oracle/oem/migration/plugins.

You also need to copy over the three .zip files containing the OEM 12cR3 distribution: V38641-01.zip, V38642-01.zip and V38643-01.zip.  Save them into a convenient staging area on the new server (I use /oracle/oem/stage).

Perform Software-Only Installation Of EM12c R3

Go to the staging area on the new server and extract the three .zip files containing the EM12c R3 distribution, then start the installer.

oracle@newhost$ unzip V38641-01.zip ; unzip V38642-01.zip ; unzip V38643-01.zip 
[...]
oracle@newhost$ ./runInstaller

You can follow my previous post about upgrading EM12c R2 to R3 for more information about the installation process, just make sure you run it as a software only install and use the exact same path names as configured on the original OMS.  In my case this means a middleware home of /oracle/oem/Middleware12cR3 and an agent base directory of /oracle/oem/agent12c.

While the software installation proceeds, you should run an exportconfig on your current OMS to produce the configuration backup file you will need to use to reconfigure the new one.  Enter the SYSMAN password when prompted.

oracle$ $OMS_HOME/bin/emctl exportconfig oms
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Enter Enterprise Manager Root (SYSMAN) Password : 
ExportConfig started...
Machine is Admin Server host. Performing Admin Server backup...
Exporting emoms properties...
Exporting secure properties...

Export has determined that the OMS is not fronted 
by an SLB. The local hostname was NOT exported. 
The exported data can be imported on any host but 
resecure of all agents will be required. Please 
see the EM Advanced Configuration Guide for more 
details.

Exporting configuration for pluggable modules...
Preparing archive file...
Backup has been written to file: /oracle/oem/gc_inst/em/EMGC_OMS1/sysman/backup/opf_ADMIN_20130828_120424.bka

The export file contains sensitive data. 
 You must keep it secure.

ExportConfig completed successfully!

Copy that backup file to the new server.

oracle$  scp /oracle/oem/gc_inst/em/EMGC_OMS1/sysman/backup/opf_ADMIN_20130828_120424.bka oracle@newhost:/oracle/oem

Once the software-only install finishes, it will prompt you to run allroot.sh.  Do so.

oracle@newhost$ sudo /oracle/oem/Middleware12cR3/oms/allroot.sh 

Starting to execute allroot.sh ......... 

Starting to execute /oracle/oem/Middleware12cR3/oms/root.sh ......
Running Oracle 11g root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /oracle/oem/Middleware12cR3/oms

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n) 
[n]: 
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n) 
[n]: 
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n) 
[n]: 

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
/etc exist

Creating /etc/oragchomelist file...
/oracle/oem/Middleware12cR3/oms
Finished execution of  /oracle/oem/Middleware12cR3/oms/root.sh ......

Starting to execute /oracle/oem/agent12c/core/12.1.0.3.0/root.sh ......
Finished product-specific root actions.
/etc exist
/oracle/oem/agent12c/core/12.1.0.3.0
Finished execution of  /oracle/oem/agent12c/core/12.1.0.3.0/root.sh ......

After running allroot.sh, you need to run the PluginInstall.sh script with the path where you saved the .opar files.  Make sure you select every plugin listed when you ran the query to retrieve the plugin list earlier, then hit install.

oracle@newhost$ /oracle/oem/Middleware12cR3/oms/sysman/install/PluginInstall.sh -pluginLocation /oracle/oem/migration/plugins
This must match the list you generated previously

Prepare the Software Library

Go to the original server, and copy the contents of the software library to the new server.

oracle$ scp -r /oracle/oem/software_library/ oracle@newhost:/oracle/oem

Recreate the OMS with OMSCA

Shut everything down on your old server.

oracle$ $OMS_HOME/bin/emctl stop oms -all ; sleep 5 ; $AGENT_HOME/bin/emctl stop agent

Run OMSCA using the exportconfig backup file you generated earlier.  Enter the administration server, node manager, repository database user and agent registration passwords when prompted.

oracle@newhost$ $OMS_HOME/bin/omsca recover -as -ms -nostart -backup_file /oracle/oem/opf_ADMIN_20130828_120424.bka
Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.3.0
Copyright (c) 1996, 2013, Oracle. All rights reserved.

OS check passed.
OMS version check passed.
Performing Admin Server Recovery...
Retrieved Admin Server template.
Source Instance Host name where configuration is exported : [deleted]
Populated install params from backup...
Enter Administration Server user password:
Confirm Password:
Enter Node Manager Password:
Confirm Password:
Enter Repository database user password:
Enter Agent Registration password:
Confirm Password:
Doing pre requisite checks ......
Pre requisite checks completed successfully

Checking Plugin software bits
Proceed to recovery
Setting up domain from template...
Setup EM infrastructure succeeded!
Admin Server recovered from backup.
Now performing cleanup of OMS EMGC_OMS1...
Now launching DeleteOMS...
OMS Deleted successfully

Delete finished successfully
Now launching AddOMS...
Infrastructure setup of EM completed successfully.

Doing pre deployment operations ......
Pre deployment of EM completed successfully.

Deploying EM ......
Deployment of EM completed successfully.

Configuring webtier ......
Configuring webTier completed successfully.

Importing OMS configuration from recovery file...

If you have software library configured 
please make sure it is functional and accessible 
from this OMS by visiting:
 Setup->Provisioning and Patching->Software Library

Securing OMS ......
Adapter already exists: emgc_USER
Adapter already exists: emgc_GROUP
Post "Deploy and Repos Setup" operations completed successfully.

Performing Post deploy operations ....
Total 0 errors, 78 warnings. 0 entities imported.
pluginID:oracle.sysman.core
Done with csg import
pluginID:oracle.sysman.core
Done with csg import
No logging has been configured and default agent logging support is unavailable.
Post deploy operations completed successfully.

EM configuration completed successfully.
EM URL is:https://newhost:7803/em

Add OMS finished successfully
Recovery of server EMGC_OMS1 completed successfully
OMSCA Recover completed successfully

Start the OMS on the new server.

oracle@newhost$ $OMS_HOME/bin/emctl start oms

Configure the central agent on the new server, then run the root.sh script.

oracle@newhost$ /oracle/oem/agent12c/core/12.1.0.3.0/sysman/install/agentDeploy.sh AGENT_BASE_DIR=/oracle/oem/agent12c AGENT_INSTANCE_HOME=/oracle/oem/agent12c/agent_inst AGENT_PORT=3872 -configOnly OMS_HOST=newhost EM_UPLOAD_PORT=4902 AGENT_REGISTRATION_PASSWORD=password
[...]
oracle@newhost$ sudo /oracle/oem/agent12c/core/12.1.0.3.0/root.sh

Relocate the oracle_emrep target to the new OMS host.

oracle@newhost$ $OMS_HOME/bin/emcli login -username=sysman
Enter password : 

Login successful
oracle@newhost$ $OMS_HOME/bin/emcli sync
Synchronized successfully
oracle@newhost$ $OMS_HOME/bin/emctl config emrep -agent newhost:3872
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Please enter repository password: 
Enter password :                                                               
Login successful
Moved all targets from oldhost:3872 to newhost:3872
Command completed successfully!

Step through each of your existing agents to re-secure them against the new OMS.  Provide the OMS HTTP port (not HTTPS) in this command, and enter the agent registration password when prompted.

$ $AGENT_INSTANCE_DIR/bin/emctl secure agent -emdWalletSrcUrl "http://newhost:4890/em"
Oracle Enterprise Manager Cloud Control 12c Release 3  
Copyright (c) 1996, 2013 Oracle Corporation.  All rights reserved.
Agent successfully stopped...   Done.
Securing agent...   Started.
Enter Agent Registration Password : 
Agent successfully restarted...   Done.
EMD gensudoprops completed successfully
Securing agent...   Successful.

Start the agent on the old OMS server.  You should not need to do this, but I could not update the WebLogic Domain monitoring configuration without doing so first.  Also re-secure this agent to point to the new OMS.

oracle$ $AGENT_HOME/bin/emctl start agent
oracle$ $AGENT_INSTANCE_DIR/bin/emctl secure agent -emdWalletSrcUrl "http://newhost:4890/em"

Login to the OEM GUI running on the new server and navigate to the WebLogic Domain target for the Cloud Control domain.  In the Target Setup -> Monitoring Credentials section, update the Administration server host value to the new server name, then hit OK.  Then execute a Refresh WebLogic Domain, selecting Add/Update Targets, to move all WebLogic targets to the new central agent.

I use third-party plugins to monitor VMWare targets, NetApp storage and MySQL servers.  I had many of them set up to run from the OMS agent (except for the VMWare ones, since Blue Medora helpfully advised not to use the OMS agent for this — great advice).  I now need to relocate each of these targets to the new central agent using emcli.  You won’t need to do this step unless you also have things set up this way.  If I had to do this again, I would not use the OMS agent for these targets, since I would not need to change anything if I just had these on some other agent.

oracle@newhost$ ./emcli relocate_targets -src_agent=oldhost:3872 -dest_agent=newhost:3872 -copy_from_src -target_name=nameoftarget -target_type=typeoftarget

Final Cleanup Steps

By now you have completed the bulk of the work necessary to migrate your EM12c stack to a new server.  Only a few steps remain.  If you use any utility scripts on the old server, go ahead and copy those over now.  I have scripts to automate starting/stopping the OMS and agent, so I’ve copied those over.  Also make sure the oracle user on the new server has all the environment variables set up in their shell initialization files.

oracle$ scp ~/bin/CCstart ~/bin/CCstop oracle@newhost:bin/

The GCDomain Oracle WebLogic Domain target did not get moved to my new agent.  If this happened to you, go to the target home page and select the Modify Agents menu item.  Click Continue, then find GCDomain in the list, scroll to the right, and assign the new OMS server’s agent as the monitoring agent for this target, then click the Modify Agents button.

Reinstall BI Publisher

Since I had BI Publisher installed on the old server, I need to install it again on the new one.  Retrieve the 11.1.1.6.0 BI Publisher installation files used previously, and copy them to your staging area.  Run the “runInstaller” program from bishiphome/Disk1, and perform a software-only installation with the middleware home set to your EM12c installation middleware home, and leave the Oracle home as Oracle_BI1.

Instead of running the configureBIP script as you normally would to integrate BI Publisher with EM12c, just go to the WebLogic administration console after the software-only install completes, and navigate to the BIP server configuration page.  Lock the configuration for editing, and edit the configuration to change the listen address to reference the new server’s hostname and change the machine to the machine name where the admin server runs (in my case it showed up as EMGC_MACHINE2).  Save and activate the changes, then start the BIP server.

After the server has started, return to the WebLogic Domain page and re-run the Refresh WebLogic Domain step, again with Add/Update targets, to move BIP to your new OMS agent.

I actually had to do the Refresh WebLogic Domain step here twice.  I may have simply not waited long enough after starting BIP before I ran it, but I do not know for sure.

Update EM Console Service

I have only one target showing down at this point, the EM Console Service.  Go to the target, and click on the Monitoring Configuration tab.  Click on Service Tests and Beacons.  Select the EM Console Service Test, and click the Edit button.  Make sure you have the “Access Login page” step selected, and click Edit.  Change the URL to reflect your new OEM server, and save the changes.

Remove Previous OMS Server From OEM

Stop the agent on your original OMS server.

oracle$ $AGENT_HOME/bin/emctl stop agent

Remove the host target where your original OMS ran.  Then remove the agent target.

One Last Bounce

Finally, bounce the whole thing one last time, then start it back up.  All green.

Conclusion

I would prefer a simpler process to migrate the EM12c stack to a new server, but this works.  If you find yourself in a similar position to mine, I hope this helps you.  I’ve spent a lot of time working in EM12c so I feel capable of diagnosing and resolving issues encountered during the process, but if you run into problems do not hesitate to contact Oracle Support and file a service request.  If you want your system to stay supportable, stick with the experts and just use blogs as a guide to get started.  Good luck.


Filed under: Cloud Control

SQL to query table size and DBMS_REDEFINITION progress

Like so many other Oracle DBAs, I need a script to query the total disk space used by an individual table, including the data, indexes and LOBs, that works whether or not the table uses partitioning.  I also wanted a script to monitor the progress of DBMS_REDEFINITION actions.  Here I provide a single script that does both.

Sample output during a DBMS_REDEFINITION run, with my SAP system name redacted:

SQL> @s
Enter value for segment: reposrc

ACTION          TARGET                              REMAINS  PROGRESS
--------------- ----------------------------------- -------- ---------------
Table Scan      SAP***.REPOSRC                      00:08:45 4.89%

SEGTYPE         SEGMENT                               SIZEMB TABLESPACE
--------------- ----------------------------------- -------- ---------------
1-TABLE         SAP***.REPOSRC                          3230 PSAP***702
                SAP***.REPOSRC#$                         160 PSAP***702
***************                                     --------
sum                                                     3390

2-INDEX         SAP***.REPOSRC^0                         136 PSAP***702
                SAP***.REPOSRC^SPM                       136 PSAP***702
***************                                     --------
sum                                                      272

3-LOBDATA       DATA:SAP***.REPOSRC                     3365 PSAP***702
                DATA:SAP***.REPOSRC#$                    192 PSAP***702
***************                                     --------
sum                                                     3557

4-LOBINDEX      DATA:SAP***.REPOSRC                        0 PSAP***702
                DATA:SAP***.REPOSRC#$                      0 PSAP***702
***************                                     --------
sum                                                        0

                                                    --------
sum                                                     7219

The first result block shows the current action (a table scan, in this instance), the name of the table, time remaining in hours:minutes:seconds format and the completion percentage from V$SESSION_LONGOPS.  As a side benefit, if you run this against a table that has some other long operation running against it, you will see that here as well.  It works for more than just table redefinitions.

The second result block displays the space used by the original table (REPOSRC) and the intermediate table used during DBMS_REDEFINITION (REPOSRC#$), along with all segment types in use by both tables (table data, indexes, LOB data and LOB indexes).  For the LOB data and indexes, the “SEGMENT” column shows the LOB column name followed by the table name.
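
For reference, an interim table like REPOSRC#$ comes from an online redefinition run.  A minimal sketch of the DBMS_REDEFINITION calls involved follows, where the schema name, the CONS_USE_PK option and the absence of a column mapping are illustrative assumptions rather than details of the run shown above:

-- Minimal online redefinition sketch that produces an interim table such as
-- REPOSRC#$.  Schema name and options are illustrative assumptions only, and
-- the interim table must already exist with the desired new definition.
BEGIN
  DBMS_REDEFINITION.CAN_REDEF_TABLE(
    uname        => 'SAPOWNER',
    tname        => 'REPOSRC',
    options_flag => DBMS_REDEFINITION.CONS_USE_PK);

  DBMS_REDEFINITION.START_REDEF_TABLE(
    uname        => 'SAPOWNER',
    orig_table   => 'REPOSRC',
    int_table    => 'REPOSRC#$',
    options_flag => DBMS_REDEFINITION.CONS_USE_PK);
END;
/

DECLARE
  num_errors PLS_INTEGER;
BEGIN
  -- Copy indexes, constraints, triggers, grants and statistics to the
  -- interim table, then sync and swap the definitions.
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS(
    uname        => 'SAPOWNER',
    orig_table   => 'REPOSRC',
    int_table    => 'REPOSRC#$',
    copy_indexes => DBMS_REDEFINITION.CONS_ORIG_PARAMS,
    num_errors   => num_errors);

  DBMS_REDEFINITION.SYNC_INTERIM_TABLE(
    uname        => 'SAPOWNER',
    orig_table   => 'REPOSRC',
    int_table    => 'REPOSRC#$');

  DBMS_REDEFINITION.FINISH_REDEF_TABLE(
    uname        => 'SAPOWNER',
    orig_table   => 'REPOSRC',
    int_table    => 'REPOSRC#$');
END;
/

While START_REDEF_TABLE or SYNC_INTERIM_TABLE is copying rows, the script below will show the corresponding long operation from V$SESSION_LONGOPS, as in the first sample output.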

Another example of output from the same script, this time for a partitioned table with no LOBs and no redefinition running, from my EM12c repository database:

SQL> @s
Enter value for segment: em_metric_values_daily

SEGTYPE         SEGMENT                               SIZEMB TABLESPACE
--------------- ----------------------------------- -------- ---------------
1-TABLE         SYSMAN.EM_METRIC_VALUES_DAILY            327 MGMT_TABLESPACE
***************                                     --------
sum                                                      327

2-INDEX         SYSMAN.EM_METRIC_VALUES_DAILY_PK          48 MGMT_TABLESPACE
***************                                     --------
sum                                                       48

                                                    --------
sum                                                      375

The script:

SET PAGES 30
SET VERIFY OFF
SET FEEDBACK OFF

COLUMN ACTION FORMAT A15
COLUMN TARGET FORMAT A35
COLUMN PROGRESS FORMAT A15
COLUMN REMAINS FORMAT A8

SELECT
  OPNAME ACTION,
  TARGET,
  TO_CHAR(TO_DATE(TIME_REMAINING, 'sssss'), 'hh24:mi:ss') REMAINS,
  TO_CHAR(TRUNC(ELAPSED_SECONDS/(ELAPSED_SECONDS+TIME_REMAINING)*100,2))
  || '%' PROGRESS
FROM
  V$SESSION_LONGOPS
WHERE
  TIME_REMAINING != 0
AND TARGET LIKE UPPER('%&&segment%');

COLUMN SEGTYPE FORMAT A15
COLUMN SEGMENT FORMAT A35
COLUMN SIZEMB FORMAT 9999999
COLUMN TABLESPACE FORMAT A15

BREAK ON SEGTYPE SKIP 1 ON REPORT

COMPUTE SUM OF SIZEMB ON SEGTYPE
COMPUTE SUM OF SIZEMB ON REPORT

SELECT
  SEGTYPE,
  SEG SEGMENT,
  SIZEMB,
  TABLESPACE_NAME TABLESPACE
FROM
  (
    SELECT
      '1-TABLE' SEGTYPE,
      S.OWNER
      || '.'
      || S.SEGMENT_NAME SEG,
      TRUNC(SUM(BYTES)/1024/1024) SIZEMB,
      S.TABLESPACE_NAME
    FROM
      DBA_SEGMENTS S
    WHERE
      (
        S.SEGMENT_NAME = UPPER('&&segment')
      OR S.SEGMENT_NAME LIKE UPPER('&&segment#%')
      )
    AND S.SEGMENT_TYPE LIKE 'TABLE%'
    GROUP BY
      S.OWNER
      || '.'
      || SEGMENT_NAME,
      TABLESPACE_NAME
    UNION
    SELECT
      '2-INDEX' SEGTYPE,
      S.OWNER
      || '.'
      || S.SEGMENT_NAME SEG,
      TRUNC(SUM(S.BYTES)/1024/1024) SIZEMB,
      S.TABLESPACE_NAME
    FROM
      DBA_SEGMENTS S,
      DBA_INDEXES I
    WHERE
      S.SEGMENT_NAME = I.INDEX_NAME
    AND S.SEGMENT_TYPE LIKE 'INDEX%'
    AND S.OWNER = I.OWNER
    AND
      (
        I.TABLE_NAME = UPPER('&&segment')
      OR I.TABLE_NAME LIKE UPPER('&&segment#%')
      )
    GROUP BY
      S.OWNER
      || '.'
      || S.SEGMENT_NAME,
      S.TABLESPACE_NAME
    UNION
    SELECT
      '3-LOBDATA' SEGTYPE,
      L.COLUMN_NAME
      || ':'
      || S.OWNER
      || '.'
      || L.TABLE_NAME SEG,
      TRUNC(SUM(S.BYTES)/1024/1024) SIZEMB,
      S.TABLESPACE_NAME
    FROM
      DBA_SEGMENTS S,
      DBA_LOBS L
    WHERE
      S.SEGMENT_NAME = L.SEGMENT_NAME
    AND
      (
        S.SEGMENT_TYPE = 'LOBSEGMENT'
      OR S.SEGMENT_TYPE LIKE 'LOB %'
      )
    AND S.OWNER = L.OWNER
    AND
      (
        L.TABLE_NAME = UPPER('&&segment')
      OR L.TABLE_NAME LIKE UPPER('&&segment#%')
      )
    GROUP BY
      L.COLUMN_NAME
      || ':'
      || S.OWNER
      || '.'
      || L.TABLE_NAME,
      S.TABLESPACE_NAME
    UNION
    SELECT
      '4-LOBINDEX' SEGTYPE,
      L.COLUMN_NAME
      || ':'
      || S.OWNER
      || '.'
      || L.TABLE_NAME SEG,
      TRUNC(SUM(S.BYTES)/1024/1024) SIZEMB,
      S.TABLESPACE_NAME
    FROM
      DBA_SEGMENTS S,
      DBA_LOBS L
    WHERE
      S.SEGMENT_NAME   = L.INDEX_NAME
    AND S.SEGMENT_TYPE = 'LOBINDEX'
    AND S.OWNER        = L.OWNER
    AND
      (
        L.TABLE_NAME = UPPER('&&segment')
      OR L.TABLE_NAME LIKE UPPER('&&segment#%')
      )
    GROUP BY
      L.COLUMN_NAME
      || ':'
      || S.OWNER
      || '.'
      || L.TABLE_NAME,
      S.TABLESPACE_NAME
  )
ORDER BY
  SEGTYPE,
  SEG ;
UNDEFINE segment;

I based this on a script I initially found on Stack Overflow.


Filed under: Database

Improving security in your web browsers: Firefox


Your web browsers implement poor security by default.  They do this, in large part, for interoperability reasons; if your just-downloaded new browser can’t connect to the sites you like to use, you either won’t use the browser or you’ll complain to the developers, and they don’t want to spend the time walking you through how to disable the specific security settings keeping you from using some random website that hasn’t upgraded their SSL implementation since 2002.

With effort and testing, you can significantly improve your security.  Don’t hold me responsible if this breaks your favorite site or eats all the food in your fridge, but if you want to step up and accept that security and convenience don’t go together, consider trying some or all of these steps to secure your Firefox browser.  I have Windows in front of me at the moment, but if you use a real operating system you can figure out how to perform the appropriate changes there.  Consider the fact that using Windows represents a greater security threat than almost anything else you can do.

Do note that even if you follow every suggestion I make on this page, you have not guaranteed security for yourself.  These steps cannot protect you from foolish decisions.  If, after doing all of this, you then proceed to visit some shady site and download a cracked version of some commercial software product, then execute it, you will get hacked, you will get compromised, you will get malware.

Why Security?

Only you know the adversaries you may have.  The malware spewed across the internet presents a risk to us all and these steps can help protect you from it.  But beyond that point, if you want to protect yourself from a determined adversary, then please only consider the steps I describe as a start.  If you work with confidential corporate documents, or if you work to promote human rights in repressive countries, or if you write news articles disclosing secret government projects, or if you run a hidden site selling drugs for bitcoins, you have a threat model much more complex than the average user.

Security Defined

One could write a book to define the word security.  Many have.  For the purposes of this post, I define security as protection against your own accidental mistakes, protection against common malware techniques and protection against an attacker with access to your network or the internet path between you and the sites you visit.  Further, I consider security to include not leaking unnecessary information about yourself or your browsing habits to third parties that want that information, such as advertisers.

Run A Current Browser

Using an old browser begs for trouble.  Just don’t do it.  For now I have Firefox 25 installed and everything I write here applies to this version and hopefully future versions.  Go to the Tools menu, select Options, then click on Advanced and select the Update tab.  Enable the radio button next to “Automatically install updates”.

Simple Steps

The steps described here shouldn’t significantly degrade your web browser experience but will improve your security quite a bit.  Everything in this section lives in the Tools->Options dialog box.  Open it up now.

Options: Tabs

If checked, uncheck the box next to “Show tab previews in the Windows taskbar”.  Windows has a history of buffer overflows in graphics handlers, and a specially crafted tab preview could potentially exploit this.  I do not know of this ever happening but no need to take the risk simply for some eye candy.

Options: Content

Check the box next to “Block pop-up windows”.  Compromised or otherwise malicious sites love to put up confusing pop-up windows saying “your computer has a virus” and other such nonsense.  The next time you go to a site that attempts to raise a pop-up window, Firefox will ask if you wish to allow an exception for that site.  If this happens on a site you need, allow the exception.  If a bad site can’t pop up a window to attempt to fool you, you won’t click on their shady links.

Click the “Choose…” button next to “Choose your preferred language for displaying pages”.  Make sure the contents of the language dialog box reflect only those languages you wish to read.

Options: Applications

Click through every row of this screen and use the drop-down menu on the right-hand side to select “Always ask”, so that Firefox will prompt to ask how (and more importantly, if) you wish to access embedded content like videos, music, PDF documents, etc.  This may get inconvenient over time if you access a lot of media, so later on, when prompted to select an application to view media, you may choose to select the “Do this automatically for files like this from now on” checkbox in the prompt but know that this reduces your overall security slightly.

Options: Privacy

Enable the radio button next to “Tell sites that I do not want to be tracked”.  This will cause your browser to send the Do-Not-Track header. Few webservers will respect this setting, but some will, so you get some small value here.

In the History section, select “Use custom settings for history” from the “Firefox will:” dropdown menu.  For the sake of convenience, go ahead and leave the checkboxes enabled for “Remember my browsing and download history” and “Remember search and form history”.  Disabling them would be marginally more secure, but the convenience of having recently visited sites available outweighs the risk of having to search for a site repeatedly and possibly clicking on a malicious search engine result.

Go ahead and leave the checkbox enabled for “Accept cookies from sites”, or very few websites will work.  Set the “Accept third-party cookies” dropdown menu to “From visited”, NOT to “Always”.  Many sites will not work if you set it to “Never”, while nearly every site will still work fine with it set to “From visited”.  “Always”, in this case, begs to be tracked by marketers.

In the “Keep until:” dropdown menu, select “they expire”.  Some people would recommend deleting cookies every time the browser closes, but you will lose the convenience of having sites recognize you when you want them to.  If you can tolerate that loss of convenience go ahead and select “I close Firefox”.

Check out the “Exceptions…” button near the “Accept cookies from sites” checkbox.  Here you can add exceptions to specify sites always allowed to set cookies, or never allowed to set cookies.  I love this feature.  I coded this feature into the text-based Lynx web browser back in 1999 and it pleases me that the GUI browsers picked it up.

Options: Security

Check the checkboxes next to “Warn me when sites try to install add-ons”, “Block reported attack sites” and “Block reported web forgeries”.

Uncheck the “Remember passwords for sites” checkbox.  If you permit the browser to store your passwords, anyone with access to your browser can retrieve your passwords.  I suggest only enabling this if you have taken the further step of encrypting your hard drive.  If you do enable it, make sure you also enable the “Use a master password” option and select a strong password.

Options: Sync

Do not use Firefox Sync.  This will simply spread your information out over more devices, increasing your risk.

Options: Advanced

On the “General” tab, check the box next to “Warn me when websites try to redirect or reload the page”.

On the “Data Choices” tab, uncheck everything.  All of these options share information with Mozilla and you do not want that to happen.

On the “Network” tab, check the box next to “Tell me when a website asks to store data for offline use”.  Most likely you do not actually want any sites to do this.

On the “Certificates” tab, click the “Validation” button and enable the checkboxes to use the Online Certificate Status Protocol to confirm certificate validity and to treat certificates as invalid when an OCSP server connection fails.  While not foolproof, this can help protect against invalid or compromised server certificates.

Intermediate Steps

If you have followed everything so far, you have improved your browser security.  Not enough, in my opinion, but perhaps enough if you plan to hand this browser off to your tech-challenged grandparents to use to look up recipes and email pictures of their grandkids.  If you have a decent comfort level with basic internet and browser concepts, continue on.

Install Add-Ons

Numerous add-ons available for Firefox can further enhance your security.  Here I will list the ones I consider most critical, along with some comments on configuration/usage for each of them.

Adblock Plus

Install Adblock Plus.  Ads on webpages may not represent an obvious security issue, but I still consider blocking them appropriate for a secured browser.  When your browser loads an ad from a page, the advertiser will know that somebody from your IP address viewed a page containing that ad, and depending on how the ad gets served up they may also learn which page you intended to view at the same time.  Further, traffic analysis of specially placed ads may reveal information about the sites you visit, since ads typically do not use https connections.  If somebody with access to your network sees that you repeatedly load some specific ad that only appears on a particular site, they then have strong evidence that you visit that site.

Within the Adblock Plus options, subscribe to EasyList, and uncheck the “Allow some non-intrusive advertising” checkbox.  If you live outside the USA, subscribe to some of the additional filter lists dedicated to your region.

BetterPrivacy

Install BetterPrivacy. This add-on removes persistent Flash cookies, for which browsers generally provide no control mechanism.  Within the options screen, select the radio button for “Delete Flash cookies on Firefox exit”.  Select the checkboxes for “Auto protect LSO sub-folders” and “Notify if new LSO is stored”.  Check the box for “Disable Ping Tracking”.

Certificate Patrol

Install Certificate Patrol. This add-on stores all SSL certificates you encounter when accessing https sites, and notifies you when a site you connect to has changed certificates since your last visit.  A changed certificate may indicate an attempted man-in-the-middle attack that would compromise your encrypted session.  I receive a lot of false positives with this add-on, which defeats its utility somewhat, but I review every single change.  If you want to skip one of these add-ons, make it this one.  I haven’t convinced myself that I take enough care to actually identify a man-in-the-middle attack, and I can’t exactly call someone at Google every time their cert changes to confirm they meant to do so.

Ghostery

Install Ghostery. This add-on identifies and blocks various web trackers embedded throughout the sites you visit.  Mostly analytics and marketing, rather than anything truly security related, but you don’t want any part of those either.  Unfortunately some sites will not function properly with Ghostery installed, but it provides options to whitelist those sites or temporarily pause blocking so that you can easily determine if Ghostery has caused the page to fail.  I end up having to whitelist bank sites, WordPress, a few others, but for just clicking through search results, I love it.  It also has the ability to block advertising cookies.

Long URL Please Mod

Install Long URL Please Mod.  Shortened URLs suck.  You don’t know where they will lead, and if you take security seriously you probably won’t click on them.  This add-on expands short URLs for you so that you know where they lead and can make an educated decision as to whether or not you want to follow that link.

NoScript

Install NoScript. Perhaps the most important add-on to use. This add-on provides the ability to permit or reject active scripting to run on a per-domain or per-host basis.  It will, initially, block all JavaScript on every site, which will break large portions of the web for you.  In this case, as you find sites that don’t work, you use the button it adds to the browser bar to enable scripting (temporarily or permanently) for that particular site, reload the page, and everything should then function as intended.  Sites get classified into trusted (whitelisted), untrusted, and those that you haven’t yet evaluated.

As a bonus, it also provides protection against cross-site-scripting and clickjacking (where a malicious site overlays an invisible object over a page element, intercepting a click on that element as a click directed at the malicious site, allowing it to load a page/code/etc).

NoScript has numerous configuration options.  I recommend the following:

Do NOT check the “Scripts Globally Allowed” box, as this essentially disables the add-on and leaves you back in the usual situation of freely running all JavaScript submitted to your browser.

On the “Embeddings” tab, you can specify restrictions for untrusted sites that do not apply to whitelisted sites.  This gives you a chance to use paranoid settings, as you can always whitelist a site later.  I don’t want to make them so restrictive that I end up whitelisting every other site, so I don’t block frames, but I do block: Java, Flash, Silverlight, other plugins, audio/video tags, and font-face, and I also block every object coming from sites marked as untrusted.  I also enable “Show placeholder icon”, “No placeholder for objects coming from sites marked as untrusted”, “Ask for confirmation before temporarily unblocking an object” and “Collapse blocked objects”.  I also check the box for ClearClick (clickjacking) protection on untrusted pages.  Some whitelisted pages don’t work if I enable ClearClick protection for trusted pages, so I leave that one off.

In the “Advanced” tab, on the “Untrusted” sub-tab, check “Forbid <a ping…>”, “Forbid META redirections inside <NOSCRIPT> elements”, “Forbid XSLT” and “Attempt to fix JavaScript links”.  On the “XSS” tab, I check “Sanitize cross-site suspicious requests” and “Turn cross-site POST requests into data-less GET requests”.

NoScript can do even more than this, and you should look into the other options.  The configuration set I have described works well for my browsing habits.

WOT

Install WOT. This add-on uses a crowdsourced set of website rankings to provide you with a simple red (bad) / yellow (maybe bad) / green (good) ranking for every site you visit and all sites that appear in search results from Google.  It further takes advantage of blacklists published by anti-virus vendors and other independent sources to identify malicious sites.  You do not have to do so, but if you choose to create an account with them you can submit your own ratings.  WOT uses a complex reputation mechanism to determine how much weight to give your ratings when compiling them with others’ to determine a site’s overall rating; this helps prevent malicious individuals from installing the add-on and voting up a bunch of malware infested sites.

Expert Steps

Doing everything, or even some of the things, that I’ve listed to this point will greatly improve your browser security.  But you can do more.  At this point I will get into the weeds a bit and make some significant changes to browser operation.  These changes may (and probably will) cause problems accessing poorly configured sites, but if you use sites configured so poorly, maybe you shouldn’t.  I recommend, if you follow these suggestions, that you implement them one at a time, and test all the sites you consider most important.  If you change a dozen things and suddenly some page stops working, you won’t know what to undo to restore it to functionality.  As an example, while writing up this post I noticed that addons.mozilla.org started to throw intermittent SSL errors when I tried to connect to it.  Hitting reload would usually load the page just fine.  It turned out that disabling RC4 cipher suites for SSL negotiation caused that problem: apparently not all of the servers behind their load balancer have the same configuration, and some of them just don’t work if the client browser does not accept RC4.

about:config

Everything else happens in the about:config screen.  If you haven’t used it before, type “about:config” into your address bar and hit enter.  Click through the warning that says it might break stuff, but recognize they put it there for a reason.

Disable RC4

The RC4 symmetric cipher contains significant failings.  You should not use it.  In fact, if you admin any webservers, leave this blog now and go figure out how to disable RC4 on them.  Then come back and finish securing your browser.  If you need convincing, read this: “Attack of the week: RC4 is kind of broken in TLS”.

In the about:config page, type “rc4” into the search bar and press enter.  You will see several cipher suites listed (with names like “security.ssl3.rsa_rc4_128_sha”).  Double-click on each of them so that the value field on the right reads “false”.  Your browser will no longer advertise willingness to accept RC4 as a component in an SSL connection.

Require TLS

Type “tls” into the about:config search bar and press enter.  Find the “security.tls.version.min” key, which defaults to 0, and change it to 1.  Set the “security.tls.version.max” key, which defaults to 1, to 3. [EDIT 20131112: I previously recommended 2 here, for TLS 1.1, thinking it would cause fewer connection failures than 3 for TLS 1.2. This won't be a problem once Firefox has fallback code from TLS 1.2. But if you are following these steps you should know how to debug and fix any connection problems you have.] For more information on these settings and what they do, see this link.

Other Settings

Type “security” into the about:config search bar and press enter.  Find the “security.ssl.enable_false_start” key and double-click it to set the value to true.  Do the same for “security.ssl.false_start.require-forward-secrecy”, “security.ssl.require_safe_negotiation”, and “security.ssl.treat_unsafe_negotiation_as_broken”.  Read this link for more information about these settings.

Conclusion

If most of your web browsing still works after configuring all this stuff, congratulations.  You probably browse safely enough that you don’t have much to worry about.  If you run into sites that don’t work with these settings, consider whether or not you really need to visit them.  Good luck!


Filed under: Security