Yann Neuhaus

dbi services technical blog

What you can do when your Veritas cluster shows interfaces as down

Sat, 2018-02-17 07:40

Recently we had a situation where the Veritas cluster (InfoScale 7.3) showed interfaces as down on the two RedHat 7.3 nodes. This can happen, for example, when you change hardware. Although all service groups were up and running, this is a situation you usually want to avoid, as you never know what happens while the cluster is in such a state. So when you have something like this:

[root@xxxxx-node1 ~]$ lltstat -nvv | head
LLT node information:
Node State Link Status Address
  * 0 xxxxx-node1 OPEN
      eth3 UP yy:yy:yy:yy:yy:yy
      eth1 UP xx:xx:xx:xx:xx:xx
      bond0 UP rr:rr:rr:rr:rr:rr
    1 xxxxx-node2 OPEN
      eth3 UP ee:ee:ee:ee:ee:ee
      eth1 DOWN tt:tt:tt:tt:tt:tt
      bond0 DOWN qq:qq:qq:qq:qq:qq

… what can you do?

In our configuration eth1 and eth3 are used for the interconnect and bond0 is the public network. As you can see above, eth1 and bond0 are reported as down for the second node. Of course, the first thing to do is to check the interface status at the operating system level, but that was fine in our case.

Veritas comes with a tiny little utility (dlpiping) you can use to check connectivity on the Veritas level. Using the information from the lltstat command you can start dlpiping in “send” mode on the first node:

[root@xxxxx-node1 ~]$ /opt/VRTSllt/dlpiping -vs eth1

While that is running (it will not detach from the terminal) you start dlpiping in “receive” mode on the second node:

[root@xxxxx-node2 ~]$ /opt/VRTSllt/dlpiping -vc eth1 xx:xx:xx:xx:xx:xx
using packet size = 78
dlpiping: sent a request to xx:xx:xx:xx:xx:xx
dlpiping: received a packet from xx:xx:xx:xx:xx:xx

This confirms that connectivity is fine for eth1. When you repeat that for the remaining interfaces (eth3 and bond0) and all is fine, then you can proceed. If not, you have a different issue from the one we faced.

The next step is to freeze all your service groups so the cluster will not touch them:

[root@xxxxx-node1 ~]$ haconf -makerw
[root@xxxxx-node1 ~]$ hagrp -freeze SERVICE_GROUP -persistent # do that for all service groups you have defined in the cluster
[root@xxxxx-node1 ~]$ haconf -dump -makero

Now the magic:

[root@xxxxx-node1 ~]$ hastop -all -force 

Why magic? This command will stop the cluster stack on all nodes BUT it will leave all the resources running. So you can do that without shutting down any user defined cluster services (Oracle databases in our case). Once the stack is down on all the nodes, stop gab and llt on both nodes as well:

[root@xxxxx-node1 ~]$ systemctl stop gab
[root@xxxxx-node1 ~]$ systemctl stop llt

Having stopped llt and gab you just need to start them again in the correct order on both systems:

[root@xxxxx-node1 ~]$ systemctl start llt
[root@xxxxx-node1 ~]$ systemctl start gab

… and after that start the cluster:

[root@xxxxx-node1 ~]$ systemctl start vcs

In our case that was enough to make llt work as expected again and the cluster was fine:

[root@xxxxx-node1 ~]$ gabconfig -a
GAB Port Memberships
===============================================================
Port a gen f44203 membership 01
Port h gen f44204 membership 01
[root@xxxxx-node1 ~]#

[root@xxxxx-node1 ~]$ lltstat -nvv | head
LLT node information:
   Node State Link Status Address
    * 0 xxxxx-node1 OPEN
      eth3 UP yy:yy:yy:yy:yy:yy
      eth1 UP xx:xx:xx:xx:xx:xx
      bond0 UP rr:rr:rr:rr:rr:rr
    1 xxxxx-node2 OPEN
      eth3 UP ee:ee:ee:ee:ee:ee
      eth1 UP qq:qq:qq:qq:qq:qq
      bond0 UP tt:tt:tt:tt:tt:tt 
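
Since we froze the service groups persistently at the beginning, do not forget to unfreeze them once the cluster is healthy again. A minimal sketch, assuming the same service group names as used for the freeze:

[root@xxxxx-node1 ~]$ haconf -makerw
[root@xxxxx-node1 ~]$ hagrp -unfreeze SERVICE_GROUP -persistent # repeat for every service group you froze before
[root@xxxxx-node1 ~]$ haconf -dump -makero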

Hope that helps …

 


CPUs: Cores versus Threads on an Oracle Server

Sat, 2018-02-17 06:49

When doing a performance review I often talk with the DBA about the CPU utilization of the server. How reliable is the server CPU utilization reported by tools like top, or the host CPU utilization in the AWR report? E.g. on a Linux Intel x86-64 server with 8 cores and 16 logical CPUs (Intel Hyperthreading), what does a utilization of 50% mean?
As I had an ODA X7-M available in a test lab, I thought I would do some tests on it.

In my old days at Oracle Support we used a small script to test the CPU single thread performance of an Oracle DB-server:


set echo on
set linesize 120
set timing on time on
with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 60 )
select /*+ ALL_ROWS */ count(*) from t,t,t,t,t
/

The SQL just burns a CPU core for around 20 seconds. Depending on your CPU single thread performance it may take a bit longer or complete faster.

On the ODA X7-M I have 16 cores enabled and, as hyperthreading is enabled, I get 32 CPUs in /proc/cpuinfo:


oracle@dbi-oda01:/home/oracle/cbleile/ [CBL122] grep processor /proc/cpuinfo | wc -l
32
oracle@dbi-oda01:/home/oracle/cbleile/ [CBL122] lscpu | egrep "Thread|Core|Socket|Model name"
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
Model name: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz

The CPU speed was at 2.3 GHz all the time:


[root@dbi-oda01 ~]# for a in `ls -l /sys/devices/system/cpu/cpu*/cpufreq | grep cpufreq | cut -d "/" -f6 | cut -d "u" -f2`; do echo "scale=3;`cat /sys/devices/system/cpu/cpu${a}/cpufreq/cpuinfo_cur_freq`/1000000" | bc; done
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301
2.301

The CPU is capable of running at up to 3.7 GHz, but that did not happen on my machine.

Running my SQL-script on the ODA X7-M actually took 17.49 seconds:


18:44:00 SQL> with t as ( SELECT rownum FROM dual CONNECT BY LEVEL <= 60 )
18:44:00 2 select /*+ ALL_ROWS */ count(*) from t,t,t,t,t
18:44:00 3 /
 
COUNT(*)
----------
777600000
 
Elapsed: 00:00:17.49

I continued with the following tests (a job means running the above SQL script; a sketch of how the concurrent jobs can be launched follows the list):
– 1 Job alone
– 2 Jobs concurrently
– 4 Jobs concurrently
– 8 Jobs concurrently
– 16 Jobs concurrently
– 24 Jobs concurrently
– 32 Jobs concurrently
– 40 Jobs concurrently
– 50 Jobs concurrently
– 60 Jobs concurrently
– 64 Jobs concurrently
– 128 Jobs concurrently
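
A minimal sketch of how such a test can be driven, assuming the statement above is saved as cpu_test.sql (a hypothetical file name) and that a local "/ as sysdba" connection is acceptable for the test:

N=16   # number of concurrent jobs
for i in $(seq 1 $N); do
  # each job runs the CPU-burning script and logs its elapsed time
  ( echo exit | sqlplus -s "/ as sysdba" @cpu_test.sql > job_${i}.log ) &
done
wait
grep Elapsed job_*.log   # min/max/avg elapsed times can be derived from these lines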

Here is the result:


Jobs  Min Time  Max Time  Avg Time  Jobs/Cores  Jobs/Threads  Avg/Single-Time  Thread utilization

   1     17.49     17.49     17.49        0.06          0.03             1.00                1.00
   2     17.51     17.58     17.55        0.13          0.06             1.00                1.00
   4     17.47     17.86     17.62        0.25          0.13             1.01                0.99
   8     17.47     17.66     17.55        0.50          0.25             1.00                1.00
  16     17.64     21.65     18.50        1.00          0.50             1.06                0.95
  24     18.00     27.38     24.20        1.50          0.75             1.38                0.72
  32     32.65     34.57     33.21        2.00          1.00             1.90                0.53
  40     34.76     42.74     40.31        2.50          1.25             2.30                0.54
  50     48.26     52.64     51.21        3.13          1.56             2.93                0.53
  60     52.40     63.60     60.63        3.75          1.88             3.47                0.54
  64     54.20     68.40     64.27        4.00          2.00             3.67                0.54
 128    119.49    134.34    129.01        8.00          4.00             7.38                0.54

When running with 16 jobs, top showed a utilization of around 50-52%. However, running more than 16 jobs increased the average time a job takes, i.e. with 16 jobs the 16-core server is already almost fully utilized. Running with 32 jobs results in an average elapsed time 1.9 times that of running 16 jobs (or less) concurrently. As it is 1.9 times and not 2 times, I can conclude that there is an advantage in running with hyperthreading enabled, but it is only around 5-10%.

So when calculating the utilization of your server, base it on the number of cores and not on the number of threads. When looking at the host CPU utilization in top or in the AWR report on a hyperthreading-enabled server, it is a good idea to multiply the reported utilization by 1.9.

 


Duplex RMAN backups between disk and tape

Fri, 2018-02-16 09:46

Below is a workaround that shows how to “duplex” archivelog backups between disk and tape.

Backup on disk (normal way):

backup device type disk archivelog all;

 

Immediately afterwards, back up to tape:

backup device type sbt archivelog until time 'sysdate' not backed up 2 times;

 

This second command backs up all archived logs that have not yet been backed up twice, i.e. exactly those backed up by the first command. Because the first backup command includes a log switch, no log switch should occur between the two commands, otherwise the “duplexing” does not work. The UNTIL TIME clause in the second command is there to prevent RMAN from performing another log switch, which would lead to different contents in the two backups. This clause does not filter anything, because SYSDATE evaluates to the date and time at which the command is issued.
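
To double-check that the duplexing worked, something like the following can be run afterwards (a sketch; each archived log should then appear in two backup sets, one on disk and one on tape):

list backup of archivelog all summary;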

 


SQL Server Management Studio 17.4: Vulnerability Assessment

Thu, 2018-02-15 10:01

SQL Server Management Studio is a well-known integrated environment used to manage a SQL Server infrastructure.
The new version 17.4 supports SQL Server from 2008 up to 2017. It enhances existing features like Showplan and XE Profiler (complete list here), but also adds an interesting new one: Vulnerability Assessment.

Vulnerability Assessment scans a database to help you track security holes and deviations. The rules defining those deviations are based on Microsoft SQL Server best practices.
Let's have a look at this new feature.

Once SSMS 17.4 is installed, just choose the database you want to scan, right-click on it and select Tasks, Vulnerability Assessment and Scan For Vulnerabilities…:

VAss1

Select where you want to save the scan report and click OK:

VAss2

Once you click OK the scan starts:

VAss3

At the end of the scan the Vulnerability Assessment results are displayed:

VAss4

The report is displayed in a Management Studio pane with the number of checks that have been run and how many issues have been found, with different levels of risk from Low to High, and it also provides some links about SQL Server security best practices.
Review all failed checks to validate that they are really security issues for your environment.
For each failed check you will see a description of the issue, its impact, the rule query applied and a possible remediation script:

VAss5

It is also possible to accept results as a baseline, even if they are considered a potential risk. Subsequent scans will then treat results matching the baseline as passing.

VAss6

Once issues are solved or a baseline is set, the Vulnerability Assessment can be run again to see the result of the performed actions:

VAss7

This new feature integrated in Management Studio gives you the ability to check that all your databases have a good level of security, and also to keep that level over time.
Great new feature ;-)

 


Backup and Restore PostgreSQL with PgBackRest II

Thu, 2018-02-15 09:41

In a previous blog I showed a basic use of PgBackRest, which is a tool to back up and restore PostgreSQL databases. In this blog I am going to talk about some useful features of this tool and, with practical examples, show some tasks we can do with it. Of course the official documentation remains the best source of knowledge.

Encryption
Nowadays encryption of backups is critical and even mandatory for many companies. PgBackRest allows us to encrypt the repository where backups are stored. A passphrase is used to encrypt/decrypt the files of the repository. As you may already know, it is recommended to use a strong passphrase. In the following demonstration we use openssl to generate one.

[postgres@pgserver ~]$ openssl rand -base64 48
FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
[postgres@pgserver ~]$

Once the passphrase is generated, we can update the PgBackRest configuration file with two options: repo-cipher-pass and repo-cipher-type.

[postgres@pgserver clustpgserver]$ cat /etc/pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2

The next step is to create the stanza

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-13 13:54:50.447 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:55:04.520 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

As we can see, the system automatically detects that the repository is encrypted and rewrites the command to include the --repo-cipher-pass and --repo-cipher-type options. After the creation of the stanza we can check its status.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-13 13:56:08.999 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-13 13:57:08.026 P00 INFO: WAL segment 00000002000000000000004C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000200000000/00000002000000000000004C-f5ced60cd351d74a91c9ce2e913b761144165e28.gz'
2018-02-13 13:57:08.030 P00 INFO: check command end: completed successfully

Everything seems fine, so let’s run a backup. Note that outputs are truncated

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-13 14:01:40.012 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-13 14:01:54.118 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-13 14:01:52": backup begins after the next regular checkpoint completes
...
type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-13 14:35:08.281 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-13 14:35:08.801 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

In a non-encrypted repository, the file backup.info can be read. Now, with encryption, if we try to read the file backup.info in the repository, we cannot.

[postgres@pgserver clustpgserver]$ less /var/lib/pgbackrest/backup/clustpgserver/backup.info
"/var/lib/pgbackrest/backup/clustpgserver/backup.info" may be a binary file. See it anyway?

And using the command strings, we can see that the file is encrypted.

[postgres@pgserver clustpgserver]$ strings /var/lib/pgbackrest/backup/clustpgserver/backup.info
Salted__Fx
.;Ru
cz4@
do:t
\pi3"E
VUSO
}a.R*
Wx5M
,?,W
3CXWB
[postgres@pgserver clustpgserver]$

From now on, backups cannot be used unless the passphrase is provided.

Restore in another location
PgBackRest allows restoring to another location. This can be useful if we want to duplicate our cluster on the same server or on another server. In the following demonstration, let's duplicate it on the same server.
The data directory of the source cluster is /var/lib/pgsql/10/data

postgres=# show data_directory;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

To duplicate to a new data directory, /u01/devdata for example, the option --db-path is used:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata restore


2018-02-14 09:40:05.755 P01 INFO: restore file /u01/devdata/base/1/13657 (0B, 100%)
2018-02-14 09:40:05.773 P01 INFO: restore file /u01/devdata/base/1/13652 (0B, 100%)
2018-02-14 09:40:05.811 P01 INFO: restore file /u01/devdata/base/1/13647 (0B, 100%)
2018-02-14 09:40:05.983 P01 INFO: restore file /u01/devdata/base/1/13642 (0B, 100%)
2018-02-14 09:40:06.067 P00 INFO: write /u01/devdata/recovery.conf
2018-02-14 09:40:14.403 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-14 09:40:30.187 P00 INFO: restore command end: completed successfully

After the duplicate, don't forget to change the port (as we are on the same server) and then start your new cluster.
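
A minimal sketch of these two steps, using the port 5436 and the path /u01/devdata from this demo (adapt them to your environment):

[postgres@pgserver ~]$ echo "port = 5436" >> /u01/devdata/postgresql.conf
[postgres@pgserver ~]$ /usr/pgsql-10/bin/pg_ctl -D /u01/devdata start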

postgres=# show data_directory ;
data_directory
----------------
/u01/devdata
(1 row)
postgres=#

Restore specific databases
With PgBackRest, we can restore specific user databases. Note that built-in databases (template0, template1 and postgres) are always restored.
Let's show an example. In our source cluster we currently have two user databases: test and sandbox.

sandbox=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

In sandbox we have a table mytab with 2 rows

sandbox=# \c sandbox
You are now connected to database "sandbox" as user "postgres".
sandbox=# table mytab;
id
----
1
2
(2 rows)

Now let's restore the cluster but only with the test database; the option --db-include will be used.

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-path=/u01/devdata --db-include=test restore
2018-02-14 10:11:00.948 P00 INFO: restore command begin 1.28: --db-include=test=1 --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-14 10:11:05.137 P00 INFO: restore backup set 20180214-095439F_20180214-100446I
2018-02-14 10:11:25.110 P00 INFO: remap $PGDATA directory to /u01/devdata
...

After the restore completes, let's start the new cluster and verify which databases are present.

[postgres@pgserver devdata]$ psql -p 5436
psql (10.1)
Type "help" for help.
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
sandbox | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
(5 rows)

What?! The sandbox database is still present despite the use of the option --db-include=test. But if we try to connect to the sandbox database, we get an error.

postgres=# \c sandbox
FATAL: relation mapping file "base/24581/pg_filenode.map" contains invalid data
Previous connection kept
postgres=#

And if we compare, at OS level, the size of the database files in the source cluster and in the target:

[postgres@pgserver log]$ du -sh /var/lib/pgsql/10/data/base/24581
7.8M /var/lib/pgsql/10/data/base/24581
[postgres@pgserver log]$ du -sh /u01/devdata/base/24581
16K /u01/devdata/base/24581
[postgres@pgserver log]$

We can see that at the target cluster, sandbox uses less disk space during the selective restore than it would have if the entire database had been restored. To finish the selective restore, we have to manually drop the sandbox database. Indeed PgBackRest cannot automatically drop the database because the cluster is not accessible until the recovery process finishes.

postgres=# drop database sandbox;
DROP DATABASE
postgres=#

Automatic cleanup of expired backups
Another nice feature of PgBackRest is that expired backups are automatically removed.
If we check our pgbackrest.conf file, we see that the retention-full is set to 2. This means that 2 full backups will be maintained. So if we do a third full backup, the first full backup and all corresponding incremental and differential backups will be expired and removed

[postgres@pgserver log]$ cat /etc/pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
repo-cipher-pass=FhXg7oW2pZb9UICZ4iYZPn3X4I6fF0ni7IL6QjaB1IL8qz4LIrP+GW+XqCZqIi3w
repo-cipher-type=aes-256-cbc

[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
[postgres@pgserver log]$

Let's do a quick demonstration. Currently we have 2 full backups:

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000002000000000000004E / 000000020000000000000056
full backup: 20180213-140152F
timestamp start/stop: 2018-02-13 14:01:52 / 2018-02-13 14:32:00
wal start/stop: 00000002000000000000004E / 00000002000000000000004E
database size: 577MB, backup size: 577MB
repository size: 28.8MB, repository backup size: 28.8MB
incr backup: 20180213-140152F_20180213-152509I
timestamp start/stop: 2018-02-14 09:31:03 / 2018-02-14 09:33:17
wal start/stop: 000000020000000000000052 / 000000020000000000000052
database size: 30.7MB, backup size: 285.3KB
repository size: 3.6MB, repository backup size: 24.3KB
backup reference list: 20180213-140152F
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
[postgres@pgserver log]$

And we can confirm this by executing a simple ls in the repository where the backups are stored:

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 13 14:32 20180213-140152F
drwxr-x---. 3 postgres postgres 69 Feb 14 09:33 20180213-140152F_20180213-152509I
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info
-rw-r-----. 1 postgres postgres 2992 Feb 14 10:08 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 33 Feb 14 10:08 latest -> 20180214-095439F_20180214-100446I

Now let’s do a third full backup

[postgres@pgserver clustpgserver]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-14 10:55:52.250 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full

2018-02-14 11:19:02.001 P00 INFO: backup command end: completed successfully
2018-02-14 11:19:02.107 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-14 11:19:02.928 P00 INFO: expire full backup set: 20180213-140152F, 20180213-140152F_20180213-152509I
2018-02-14 11:22:08.759 P00 INFO: remove expired backup 20180213-140152F_20180213-152509I
2018-02-14 11:22:09.000 P00 INFO: remove expired backup 20180213-140152F

2018-02-14 11:22:49.387 P00 INFO: expire command end: completed successfully
[postgres@pgserver clustpgserver]$

We can see that at the end of the backup, the old backups are expired and removed. We can also confirm this by listing the files in the repository:

[postgres@pgserver clustpgserver]$ ls -ld *
drwxr-x---. 3 postgres postgres 69 Feb 14 09:59 20180214-095439F
drwxr-x---. 3 postgres postgres 69 Feb 14 10:07 20180214-095439F_20180214-100446I
drwxr-x---. 3 postgres postgres 69 Feb 14 11:13 20180214-105603F
drwxr-x---. 3 postgres postgres 17 Feb 13 14:33 backup.history
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:19 backup.info
-rw-r-----. 1 postgres postgres 2320 Feb 14 11:20 backup.info.copy
lrwxrwxrwx. 1 postgres postgres 16 Feb 14 11:14 latest -> 20180214-105603F
[postgres@pgserver clustpgserver]$

Point-in-Time Recovery
PgBackRest can also do a point-in-time recovery. Let’s drop table article in the database test

test=# table article;
nom
---------
printer
(1 row)
.
test=# select now();
now
-------------------------------
2018-02-14 11:39:28.024378+01
(1 row)
.
test=# drop table article;
DROP TABLE
.
test=# table article;
ERROR: relation "article" does not exist
LINE 1: table article;
^
test=#

And now let's restore to just before the table was dropped, say 2018-02-14 11:39:28.
As we have several backup sets, we have to restore from a backup taken before the table was dropped.
If we check our backups, that is the full backup 20180214-105603F, which was taken before the table article was dropped.

[postgres@pgserver devdata]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 000000020000000000000054 / 00000002000000000000005A
full backup: 20180214-095439F
timestamp start/stop: 2018-02-14 09:54:39 / 2018-02-14 09:58:53
wal start/stop: 000000020000000000000054 / 000000020000000000000054
database size: 30.7MB, backup size: 30.7MB
repository size: 3.6MB, repository backup size: 3.6MB
incr backup: 20180214-095439F_20180214-100446I
timestamp start/stop: 2018-02-14 10:04:46 / 2018-02-14 10:07:43
wal start/stop: 000000020000000000000056 / 000000020000000000000056
database size: 38.3MB, backup size: 7.6MB
repository size: 4.5MB, repository backup size: 928.5KB
backup reference list: 20180214-095439F
full backup: 20180214-105603F
timestamp start/stop: 2018-02-14 10:56:03 / 2018-02-14 11:12:26

wal start/stop: 000000020000000000000058 / 000000020000000000000058
database size: 38.3MB, backup size: 38.3MB
repository size: 4.5MB, repository backup size: 4.5MB
incr backup: 20180214-105603F_20180214-121044I
timestamp start/stop: 2018-02-14 12:10:44 / 2018-02-14 12:15:14
wal start/stop: 00000002000000000000005A / 00000002000000000000005A
database size: 38.3MB, backup size: 1.1MB
repository size: 4.5MB, repository backup size: 140.8KB
backup reference list: 20180214-105603F
[postgres@pgserver devdata]$

For the restore we use the option --set, which allows us to specify the backup set we want to restore from. Note also the use of --type=time and --target.

[postgres@pgserver log]$ pgbackrest --stanza=clustpgserver --log-level-console=info --type=time "--target=2018-02-14 11:39:28.024378+01" --db-path=/u01/devdata --set=20180214-105603F restore
2018-02-14 13:36:50.848 P00 INFO: restore command begin 1.28: --db1-path=/u01/devdata --log-level-console=info --repo-cipher-pass= --repo-cipher-type=aes-256-cbc --repo-path=/var/lib/pgbackrest --set=20180214-105603F --stanza=clustpgserver "--target=2018-02-14 11:39:28.024378+01" --type=time
2018-02-14 13:37:03.406 P00 INFO: restore backup set 20180214-105603F
...

At the end of the restore let's look at the contents of the generated recovery.conf file.

[postgres@pgserver devdata]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --db1-path=/u01/devdata --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'
recovery_target_time = '2018-02-14 11:39

If we start our new cluster, we can see in the log files that PITR is starting:
2018-02-14 13:54:23.824 CET [10049] LOG: starting point-in-time recovery to 2018-02-14 11:39:28.024378+01
And once the recovery is finished, we can verify that the table article is present:

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# \d article
Table "public.article"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
nom | character varying(50) | | |
.
test=# table article;
nom
---------
printer
(1 row)
test=#

Conclusion:
In this blog I talked about some features of PgBackRest. But as already mentioned, there are many other options with this tool, and the official documentation gives more information. In future blogs we will explore more advanced configurations.

 


(DESCRIPTION_LIST=(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=NO)

Thu, 2018-02-15 00:02

Do you have complex connection strings with DESCRIPTION_LIST, DESCRIPTION, ADDRESS_LIST, ADDRESS and a nice combination of FAILOVER and LOAD_BALANCE? You probably checked the documentation, which tells you that FAILOVER=YES is the default at all levels, but that LOAD_BALANCE=YES is the default only for DESCRIPTION_LIST. But when disaster recovery and availability are concerned, the documentation is not sufficient: I want to test it. And here is how I do it.

I don’t want to test it with the real configuration and stop the different instances. And I don’t need to. My way to test an address list is to define a tnsnames.ora with the connection string, such as the following:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=
(DESCRIPTION=
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

I used localhost because I know it's there and I don't want to wait for the TCP timeout. But I use fake ports, which do not exist. So, finally, a connection will never be established, but I will be able to see all the addresses that are tried. I check them with strace on the connect() system call, with the following script:


for i in {1..10}
do
TNS_ADMIN=/tmp strace -T -e trace=connect sqlplus -s -L sys/oracle@NET_SERVICE_NAME as sysdba <<< "" 2>&1 | awk '
/sa_family=AF_INET, sin_port=htons/{
gsub(/[()]/," ") ; printf "%s ",$5
}
END{
print ""
}
'
done | sort | uniq

So, I used meaningful numbers for my fake ports: 101 and 102 for the addresses in the first description of the description list, and 201 and 202 for the address list in the second description. The awk script shows the sequence that was tried. And, because of the random round robin, I run them in a loop several times to see all patterns, aggregated by sort|uniq

So here is the result from the connection string above using the defaults for load balancing and failover:

101 102 201 202
201 202 101 102

The sequence within each address list is always in order (101,102 and 201,202) because LOAD_BALANCE=NO is the default there. But I have two combinations for the descriptions because LOAD_BALANCE=YES is the default in DESCRIPTION_LIST. Finally, all addresses are tried because FAILOVER=YES is the default at all levels.

LOAD_BALANCE

If I define LOAD_BALANCE at all levels, such as:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(DESCRIPTION=(FAILOVER=YES)(LOAD_BALANCE=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=(FAILOVER=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

The result shows that all combinations can be tried in any order:

101 102 201 202
101 102 202 201
102 101 201 202
102 101 202 201
201 202 101 102
201 202 102 101
202 201 101 102
202 201 102 101

By running it in a large loop you will confirm that any address will be tried at most once.

FAILOVER

Now, If I set FAILOVER=NO within the first description:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE= NO)
(DESCRIPTION=(FAILOVER= NO)(LOAD_BALANCE=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=(LOAD_BALANCE=NO )
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

the first attempt can be 101 or 102 (because of load balancing), but only one of them will be tried in this description, because failover is disabled there. Then, the second description is attempted (because FAILOVER=YES at the description list level) with all its addresses (because of LOAD_BALANCE=YES). The result of all possible combinations is:


101 201 202
102 201 202
102 202 201

So here it is. You can test any complex connection description to check what the possible connections will be and in which order they will be tried. From this, you can infer what will happen with a real configuration: the wait for TCP timeout for addresses tested on hosts that are not up, and the load balancing given by the different possible combinations.

 


Full page logging in Postgres and Oracle

Wed, 2018-02-14 14:13

In my opinion, the volume of logging (aka redo log, aka xlog, aka WAL) is the most important factor for OLTP performance, availability and scalability, for several reasons:

  • This is the only structure where disk latency is a mandatory component of response time
  • This is a big part of the total volume of backups
  • This is sequential by nature, and very difficult to scale by parallelizing

In this post, I look at the volume of logging generated by some DML in Postgres and Oracle. I know Oracle quite well and am just starting to look at Postgres. The comparison here is not a contest but a way to better understand. For example, the default behavior of Postgres, with full_page_writes=on, is very similar to Oracle 'begin backup' mode. The comparison makes no sense for most Postgres DBAs, but probably helps Oracle DBAs to understand it.

Measure WAL segment writes

Here is how I measured the volume of transaction log written: start the Postgres server with ‘strace -f’ and parse with ‘awk’ the open(), write() and close() calls:

sudo su postgres <<'END'
export PGDATA=/u01/pgdata
/usr/pgsql-10/bin/pg_ctl stop
strace -e trace=open,close,write,recvfrom -f /usr/pgsql-10/bin/pg_ctl start 2>&1 | awk '
/^[^[]/{
$0="[pid MAIN] "$0
}
/strace: Process [0-9][0-9]* attached/{
sub(/^.*strace: /,"strace: ") ; "ps -o cmd -hp " $3 |& getline proc[$3"]"] ; print "" ; print $0,proc[$3"]"] }
/open[(].*pg_wal[/].* = [0-9]*$/{
z=$0 ; gsub(qq," ") ; fd_wal[$2 $NF]=$4
}
/checkpoint;/{
total_written_wal=0
}
/write[(]/{
#pid=$2 ; sub("]","",$2) ; "ps -o cmd -hp " p |& getline proc[p"]"]
z=$0 ; gsub("[(,]"," ") ; if ( fd_wal[$2 $4]>0 ) { written_wal[$2 $4]=written_wal[$2 $4]+$NF ; total_written_wal=total_written_wal+$NF } next
}
/close[(]/{
pid=$2 ; sub("[^0-9]","",pid) ;
z=$0 ; gsub("[()]"," ") ; if ( ( fd_wal[$2 $4]!="" ) && ( written_wal[$2 $4] > 0 ) ) {
printf " ( written %d bytes to %s -> total WAL segments: %.2f MB ) cmd=%s\n",written_wal[$2 $4],fd_wal[$2 $4],total_written_wal/1024/1024 , proc[$2] ; fd_wal[$2 $4]=""
} next
}
' qq='"'
END

Do not do that in production. This is experimentation in a lab. Do not attach strace to a critical process in production.

There's probably an easier way to get the same information, maybe with the postgres activity statistics, or through a size-counting archive_command, so please don't hesitate to comment. Anyway, from the 'write()' calls I am sure that I'm counting exactly what I want: the volume of logging written to disk. As an Oracle DBA used to LogWriter and its slave threads managing all writes, I started to trace only the WAL writer process but quickly realized that part of the logging is directly written by my server process.
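
For example, a less intrusive sketch (not what I used here) is to diff the WAL insert location around the statement, with the PostgreSQL 10 functions pg_current_wal_insert_lsn() and pg_wal_lsn_diff(), from a psql session:

select pg_current_wal_insert_lsn() as before_lsn \gset
-- run the statement to measure here, for example one of the DML below
select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(),:'before_lsn')) as wal_written;

This measures the WAL generated between the two points by the whole cluster, so it is only meaningful on an otherwise idle test instance, and it does not show the individual write() calls as strace does.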

Postgres: insert

I create a table with some numbers and a 100-byte character string.

create table demo as select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(0,0);

The first operation I test is the insert of 1 million rows.

insert into demo select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);
( written 4349952 bytes to pg_wal/000000010000000A0000005F -> total WAL segments: 4.16 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 8192 bytes to pg_wal/000000010000000A0000005F -> total WAL segments: 9.00 MB ) cmd=postgres: wal writer process
( written 17735680 bytes to pg_wal/000000010000000A00000060 -> total WAL segments: 20.07 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 3309568 bytes to pg_wal/000000010000000A00000060 -> total WAL segments: 31.40 MB ) cmd=postgres: wal writer process
( written 33783808 bytes to pg_wal/000000010000000A00000061 -> total WAL segments: 36.03 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 3997696 bytes to pg_wal/000000010000000A00000061 -> total WAL segments: 39.80 MB ) cmd=postgres: wal writer process
( written 49676288 bytes to pg_wal/000000010000000A00000062 -> total WAL segments: 51.19 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 65273856 bytes to pg_wal/000000010000000A00000063 -> total WAL segments: 66.06 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 79364096 bytes to pg_wal/000000010000000A00000064 -> total WAL segments: 82.04 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 6660096 bytes to pg_wal/000000010000000A00000064 -> total WAL segments: 82.39 MB ) cmd=postgres: wal writer process
( written 88285184 bytes to pg_wal/000000010000000A00000065 -> total WAL segments: 98.02 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 14491648 bytes to pg_wal/000000010000000A00000065 -> total WAL segments: 106.82 MB ) cmd=postgres: wal writer process
( written 101703680 bytes to pg_wal/000000010000000A00000066 -> total WAL segments: 113.99 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 17825792 bytes to pg_wal/000000010000000A00000066 -> total WAL segments: 117.19 MB ) cmd=postgres: wal writer process
( written 115769344 bytes to pg_wal/000000010000000A00000067 -> total WAL segments: 128.20 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 18661376 bytes to pg_wal/000000010000000A00000067 -> total WAL segments: 135.09 MB ) cmd=postgres: wal writer process
( written 19824640 bytes to pg_wal/000000010000000A00000068 -> total WAL segments: 144.17 MB ) cmd=postgres: wal writer process
( written 131350528 bytes to pg_wal/000000010000000A00000068 -> total WAL segments: 148.16 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 27435008 bytes to pg_wal/000000010000000A00000069 -> total WAL segments: 159.80 MB ) cmd=postgres: wal writer process
( written 140132352 bytes to pg_wal/000000010000000A00000069 -> total WAL segments: 159.80 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
INSERT 0 1000000

You can see that my 'strace|awk' script is running in the background and has counted about 160 MB of logging, partly from the 'postgres: wal writer process' and partly from 'postgres: demo demo 192.168.56.122(38013)' serving my connection.

The relation size as stored on disk is about 150 MB;

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 19231 | 1e+06 | 0 | 150
(1 row)

This makes sense. An insert has to write all new data into the log in order to be able to recover the pages until they are checkpointed.

Note that I have no index on this table for this test.

Postgres: update

I’m now updating one column for all rows.

update demo set b=b+1;
( written 150528000 bytes to pg_wal/000000010000000A0000006A -> total WAL segments: 4.01 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 162693120 bytes to pg_wal/000000010000000A0000006B -> total WAL segments: 17.84 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 29769728 bytes to pg_wal/000000010000000A0000006B -> total WAL segments: 28.44 MB ) cmd=postgres: wal writer process
...
( written 84287488 bytes to pg_wal/000000010000000A00000081 -> total WAL segments: 343.65 MB ) cmd=postgres: wal writer process
( written 453705728 bytes to pg_wal/000000010000000A00000082 -> total WAL segments: 347.36 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
UPDATE 1000001

I touched only a small part of the volume in bytes, but I touched all rows and all pages. And, even if only a few bytes are modified, Postgres logs the whole page to protect from fractured blocks in case of a crash (pages partially written). So that's about 150 MB. But Postgres does not update rows in-place. The whole row is inserted in its new version, which means the whole volume again: another 150 MB. If we look at the size of the table, we can see 300 MB of pages:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-------------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 38462 | 1.21882e+06 | 0 | 300
(1 row)

So this update has generated even more logging: 347 MB.

Postgres: sparse update

Now updating only 1 row out of ten, still one column only:

update demo set b=b+1 where mod(a,10)=1;
( written 89923584 bytes to pg_wal/000000010000000A00000083 -> total WAL segments: 13.88 MB ) cmd=postgres: wal writer process
( written 469123072 bytes to pg_wal/000000010000000A00000084 -> total WAL segments: 22.98 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
...
( written 563576832 bytes to pg_wal/000000010000000A0000008D -> total WAL segments: 151.07 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 130940928 bytes to pg_wal/000000010000000A0000008D -> total WAL segments: 151.27 MB ) cmd=postgres: wal writer process
UPDATE 100000
analyze demo;

So, 10% of the rows had to be copied to their new version, which adds another 15 MB to the table size.

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-------------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 40385 | 1.07267e+06 | 0 | 315
(1 row)

For these additional 15 MB, half of the table pages had to be modified (the current version having to point to the new version), and the logging generated was 150 MB. Because of MVCC at tuple level, doing something similar to 'chained rows' and 'row migration' for all updates, and because of full page logging, even sparse updates generate a lot of log writes.

Postgres: delete

Here is a delete of those million rows:

delete from demo;
( written 576364544 bytes to pg_wal/000000010000000A0000008E -> total WAL segments: 6.44 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 134930432 bytes to pg_wal/000000010000000A0000008E -> total WAL segments: 6.73 MB ) cmd=postgres: wal writer process
( written 589225984 bytes to pg_wal/000000010000000A0000008F -> total WAL segments: 18.70 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
...
( written 162054144 bytes to pg_wal/000000010000000A00000099 -> total WAL segments: 184.70 MB ) cmd=postgres: wal writer process
( written 740352000 bytes to pg_wal/000000010000000A0000009A -> total WAL segments: 189.80 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
DELETE 1000001
( written 163217408 bytes to pg_wal/000000010000000A0000009A -> total WAL segments: 196.22 MB ) cmd=postgres: wal writer process

Marking tuples as deleted does not increase the table:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 40385 | 275837 | 0 | 315
(1 row)

But all current tuples have to be marked as deleted and not visible once the transaction is committed. This touches all pages for the current version, which is more than 150 MB of logging here.

Postgres: vacuum

After two updates and a delete, I have old tuples in this table. It seems that VACUUM does not generate much logging:

vacuum demo;
( written 762445824 bytes to pg_wal/000000010000000A0000009B -> total WAL segments: 14.67 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
VACUUM

My guess (but remember that I am a newbie in Postgres) is that in case of a crash occurring before the next checkpoint we will just have to vacuum again. But this is not what was answered in the postgres-general list a few years ago.

Note that full page logging is not necessary for all changes, but only for the first change to a page after a checkpoint. This is sufficient to cover future write failures because recovery will start from there. Once the full page is logged, the change vector is sufficient for further recovery. However, I had the same amount of WAL, 15 MB, when vacuuming right after a checkpoint.

Oracle: insert

Let's do some similar things in Oracle, whose MVCC implementation is completely different: at block level, with undo logging.

SQL> create table demo as select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('0 to 0');
Table created.

I have shown in a previous post how I get the delta values from V$MYSTAT joined to V$STATNAME using (STATISTIC#) for 'redo size', so there is no need to strace here. But we could see the same result by measuring the writes to the redo log groups (without double counting the multiplexed members).
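
I do not reproduce _mystat_diff.sql here, but the kind of query behind it looks like this (a sketch only; the actual script stores the previous values and displays the delta between two executions):

select name, value
from v$mystat join v$statname using(statistic#)
where name in ('db block changes','redo size','undo change vector size','redo entries');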

SQL> insert into demo select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('1 to 1000000');
1000001 rows inserted.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
141,342 155,218,876 4,380,448 104,411

This is about 150MB, which is the volume of the table:

SQL> exec dbms_stats.gather_table_stats(user,'DEMO');
PL/SQL procedure successfully completed.
SQL> select table_name,num_rows,blocks,(blocks*block_size/1024/1024) MB, avg_row_len from user_tables join dba_tablespaces using(tablespace_name) where table_name='DEMO';
 
TABLE_NAME NUM_ROWS BLOCKS MB AVG_ROW_LEN
------------------------------ ---------- ---------- ---------- -----------
DEMO 1000001 19280 150.625 131

Conclusion for inserts: all databases have to log the whole data inserted in order to be protected from instance crash. Note that Oracle has a way to insert directly into the file, bypassing the buffer cache, and then reduce the logging required for crash recovery. But I’m not doing bulk inserts here.

Oracle: update

The update in Oracle is done in-place. There is no need to copy the whole row (except in the rare cases where the row grows and does not fit into the block anymore). However, the old value of the column must be copied into the UNDO segment for MVCC. This is why we see 46 MB of 'undo change vector size' here.

SQL> update demo set b=b+1;
1000001 rows updated.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
170,777 105,301,308 48,641,772 82,221

The UNDO is only the change vector, not the full block. If you read about copies of full blocks to rollback segments, it is a confusion either from veterans of Oracle 5, or a misunderstanding of flashback features. The UNDO being stored in segments, written first into buffer cache, it is protected by redo logging, so about 46 MB of redo is actually the redo vector of undo vectors. The other 54 MB of redo is the new value of the update.

Oracle: sparse update

The logging of change vectors rather than full pages is even cheaper with sparse updates:

SQL> update demo set b=b+1 where mod(a,10)=1;
100001 rows updated.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
56,583 15,414,328 6,111,608 36,921

The volume of undo and redo generated is only 15 MB here, including 6 MB of undo vectors. This is really optimized and this is one reason why you should update only the columns changed (and not use the default non-dynamic update of Hibernate for example).

Oracle: delete

The delete has to mark all rows as deleted, and because the space can immediately be reused, the whole row must be logged into the UNDO, which in turn has to be logged into the REDO, so the delete generates a lot of logging:

SQL> delete from demo;
1000001 rows deleted.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
2,124,823 403,755,892 240,302,088 1,093,821

I have no indexes here. With indexes, all index entries have to be marked as deleted, and this generates undo and redo vectors because MVCC in Oracle is at block level: each block modification, for table or index, has to be logged.

Deleting a lot of rows is an expensive operation in Oracle. For bulk purges, it is often better to truncate and insert /*+ append */ when possible (as in a non-atomic materialized view refresh). Partitioning also helps, for example to purge old data when the table is partitioned by date.

Postgres without full page logging

Given the huge overhead, is full page logging really required? There are plans to avoid it, mentioned in the Postgres ToDo wiki, or at least to keep it only short term for crash recovery and not for media recovery. Another possibility is to implement a checksum on the blocks so that fractured blocks can be detected. Then, when detected, the fractured blocks may not need full page logging to recover them if we can restore a previous backup. This takes longer to recover, but can be acceptable given the low probability of this kind of failure. In addition to that, when you have a physical standby synchronized with log-shipping, you have an easy way to recover without having to restore files. But you need a checksum to detect the problem.

Without a checksum, the problem is the detection of partial writes. But if you trust your storage, and if you fail over to the standby in case of a crash, you may accept setting full_page_writes=off, and this is what I did here.
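
For reference, full_page_writes is a reloadable parameter, so a minimal sketch of turning it off (as a superuser) is:

alter system set full_page_writes = off;
select pg_reload_conf();   -- it can also be set in postgresql.conf; a reload is enough, no restart needed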


insert into demo select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);
...
( written 125255680 bytes to pg_wal/000000010000000A000000E3 -> total WAL segments: 140.65 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
INSERT 0 1000000

The insert still has to log all the new data: 140 MB.


update demo set b=b+1;
...
( written 72613888 bytes to pg_wal/000000010000000A000000F2 -> total WAL segments: 213.02 MB ) cmd=postgres: wal writer process
UPDATE 1000001

The update has to log only what is modified, but because of Postgres MVCC implementation, the whole row has to be written in its new version, and the old ones have their pointer updated: 210 MB here.


update demo set b=b+1 where mod(a,10)=1;
( written 305709056 bytes to pg_wal/000000010000000A000000F3 -> total WAL segments: 1.96 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 72613888 bytes to pg_wal/000000010000000A000000F3 -> total WAL segments: 5.62 MB ) cmd=postgres: wal writer process
( written 75718656 bytes to pg_wal/000000010000000A000000F4 -> total WAL segments: 9.65 MB ) cmd=postgres: wal writer process
( written 310665216 bytes to pg_wal/000000010000000A000000F4 -> total WAL segments: 9.65 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
UPDATE 100000

The sparse update benefits from logging only the changed rows: 10 MB here. This one is even smaller than with Oracle because there’s no UNDO to write here: the old values stay in-place.


delete from demo;
( written 323256320 bytes to pg_wal/000000010000000A000000F5 -> total WAL segments: 11.27 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 338829312 bytes to pg_wal/000000010000000A000000F6 -> total WAL segments: 26.92 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 76562432 bytes to pg_wal/000000010000000A000000F6 -> total WAL segments: 31.41 MB ) cmd=postgres: wal writer process
( written 345415680 bytes to pg_wal/000000010000000A000000F7 -> total WAL segments: 39.73 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 83410944 bytes to pg_wal/000000010000000A000000F7 -> total WAL segments: 40.41 MB ) cmd=postgres: wal writer process
DELETE 1000001

The delete is cheap when full_page_writes=off because only the visibility is changed while the data remains (until the delete is committed and the table is vacuumed). If you have a lot of rows to delete, then consider setting full_page_writes=off and be sure to have a backup to restore in case of a crash.

Oracle full page logging in backup mode

So, Oracle by default does not need to protect from fractured blocks, because they can be detected. If the storage crashes while a block is partially written, the block is corrupt. Thanks to the checksum, this corruption will be detected during recovery (or even earlier depending on DB_BLOCK_CHECKSUM and DB_LOST_WRITE_PROTECT). The redo is not sufficient, as it contains only change vectors, but you can recover from the last backup and Oracle can do a simple block recover. This recovery can also be done from the standby database.

However, full page logging exists in Oracle. When running a backup with a non-Oracle tool, not aware of block checksums, you need to enclose the copy or snapshot between 'begin backup' and 'end backup'. You do this because an online backup may read partially updated blocks and, without the checksum, cannot detect it. A corrupt backup is not very useful, and this is why this backup mode generates more redo, to be able to recover such blocks. This is very similar to full page logging: the redo generated for the first modification of the buffer stores the whole block. The next modifications, until the buffer is checkpointed, need only the change vectors.
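
As a reminder, this is the bracketing I am talking about, run around the storage snapshot or file copy (a sketch, assuming the database is in archivelog mode):

alter database begin backup;
-- take the storage snapshot or copy the datafiles with the non-Oracle tool here
alter database end backup;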

I think the first article I’ve ever written was a description of the Oracle backup mode. And it is still visible thanks to archive.org only because it was published on… Google Knol!

So, here is the same run with Oracle in backup mode.

Insert does not change a lot as it fills full blocks:

SQL> insert into demo select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('1 to 1000000');
1000000 rows created.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
141,376 156,527,072 4,380,448 124,195

The full update of one column generates the same undo, but more than 2x the redo because of full page logging:

SQL> update demo set b=b+1;
1000001 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
170,778 238,317,632 48,641,772 104,640

The sparse update is nearly the same as the full update because those 10% of the rows touch all pages:

SQL> update demo set b=b+1 where mod(a,10)=1;
100001 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
319,622 240,502,284 17,832,196 192,815

The delete generates even more because there's all the UNDO in addition to all the data pages:

SQL> delete from demo;
1000001 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
2,125,285 558,510,928 240,303,768 1,143,131

So what?

Beyond the very different implementations of Postgres and Oracle, we can see that we have some flexibility: the large amount of logging generated by Postgres by default may be reduced in some cases, and the minimal logging which is the default for Oracle may be larger in some situations. The most important thing, as with all technologies, is to understand how it works. Only then can you make the right choice to balance performance, availability, and cost. Understanding how it works means reading the docs (how it is supposed to work) and testing (how it actually works). With Oracle there is additional information from a huge community testing and using it for decades. With Postgres, as with all open source projects, the source code with its comments is an amazing documentation.

 

The article Full page logging in Postgres and Oracle appeared first on Blog dbi services.

Backup and Restore PostgreSQL with PgBackRest I

Wed, 2018-02-14 09:58

Many tools can be used to back up PostgreSQL databases. In this blog I will talk about PgBackRest, a simple tool that can be used to back up and restore a PostgreSQL database. Full, differential, and incremental backups are supported.
In this first blog I will present a basic configuration of PgBackRest. Our configuration is composed of only one cluster, and PgBackRest is installed on the server hosting the database. The goal is to explain a first use of PgBackRest.
Below is our configuration:
• Server with Oracle Linux 7
• PostgreSQL 10.1
• PgBackRest 1.28
We assume that the Linux box and PostgreSQL 10.1 are already installed. So let's install PgBackRest.

[root@pgserver ~]# yum search pgbackrest
Loaded plugins: langpacks, ulninfo
=========================== N/S matched: pgbackrest ============================
pgbackrest.noarch : Reliable PostgreSQL Backup & Restore
pgbackrest.x86_64 : Reliable PostgreSQL Backup & Restore
Name and summary matches only, use "search all" for everything

And then we can install PgBackRest:
[root@pgserver ~]# yum install pgbackrest.x86_64
Afterwards we can check the installation using the pgbackrest command:

[postgres@pgserver ~]$ /usr/bin/pgbackrest
pgBackRest 1.28 - General help
Usage:
pgbackrest [options] [command] Commands:
archive-get Get a WAL segment from the archive.
archive-push Push a WAL segment to the archive.
backup Backup a database cluster.
check Check the configuration.
expire Expire backups that exceed retention.
help Get help.
info Retrieve information about backups.
restore Restore a database cluster.
stanza-create Create the required stanza data.
stanza-delete Delete a stanza.
stanza-upgrade Upgrade a stanza.
start Allow pgBackRest processes to run.
stop Stop pgBackRest processes from running.
version Get version.
Use 'pgbackrest help [command]' for more information.

The configuration of PgBackRest is very easy: it consists of a pgbackrest.conf configuration file that must be edited. In my case the file is located in /etc. As mentioned, we will use a very basic configuration file.
Below are the contents of my configuration file:

[root@pgserver etc]# cat pgbackrest.conf
[global]
repo-path=/var/lib/pgbackrest
[clustpgserver]
db-path=/var/lib/pgsql/10/data
retention-full=2
[root@pgserver etc]#

In the file above,
• repo-path is where the backups will be stored,
• clustpgserver is the name of my cluster stanza (you are free to choose the name). A stanza is the configuration for a PostgreSQL database cluster that defines where it is located, how it will be backed up, archiving options, etc.
• db-path is the path of my database files,
• retention-full : configures the retention to 2 full backups.
A complete list can be found here
Once the configuration file is done, we can create the stanza with the stanza-create command. Note that my PostgreSQL cluster is using port 5435.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 stanza-create
2018-02-08 14:01:49.293 P00 INFO: stanza-create command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:01:50.707 P00 INFO: stanza-create command end: completed successfully
[postgres@pgserver ~]$

After creating the stanza, we can verify that the configuration is fine using the check command

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 check
2018-02-08 14:03:42.095 P00 INFO: check command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:03:48.805 P00 INFO: WAL segment 00000001000000000000000C successfully stored in the archive at '/var/lib/pgbackrest/archive/clustpgserver/10-1/0000000100000000/00000001000000000000000C-c387b901a257bac304f27865478fd9f768de83d6.gz'
2018-02-08 14:03:48.808 P00 INFO: check command end: completed successfully
[postgres@pgserver ~]$

Since we have not yet taken any backup with PgBackRest, the info command returns an error

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info info
stanza: clustpgserver
status: error (no valid backups)
db (current)
wal archive min/max (10-1): 00000001000000000000000C / 00000001000000000000000C
[postgres@pgserver ~]$

Now let’s take a backup

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-08 14:06:52.706 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
WARN: no prior backup exists, incr backup has been changed to full
2018-02-08 14:06:54.734 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:06:53": backup begins after the next regular checkpoint completes
2018-02-08 14:06:55.159 P00 INFO: backup start archive = 00000001000000000000000E, lsn = 0/E000060
2018-02-08 14:07:09.867 P01 INFO: backup file /var/lib/pgsql/10/data/base/13805/1255 (592KB, 2%) checksum 61f284092cabf44a30d1442ef6dd075b2e346b7f


2018-02-08 14:08:34.709 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-08 14:08:34.895 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-08 14:08:34.932 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

We can see that by default PgBackRest will try to do an incremental backup. But as there is no full backup yet, a full backup is done instead. Once the full backup is done, all future backups will be incremental unless we specify the type of backup.

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 backup
2018-02-08 14:26:25.590 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver
2018-02-08 14:26:29.314 P00 INFO: last backup label = 20180208-140653F, version = 1.28
2018-02-08 14:26:30.135 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:26:26": backup begins after the next regular checkpoint completes
...
2018-02-08 14:27:01.408 P00 INFO: expire command begin 1.28: --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-archive=2 --retention-full=2 --stanza=clustpgserver
2018-02-08 14:27:01.558 P00 INFO: full backup total < 2 - using oldest full backup for 10-1 archive retention
2018-02-08 14:27:01.589 P00 INFO: expire command end: completed successfully
[postgres@pgserver ~]$

If we want to perform another full backup we can specify the option --type=full

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=full backup
2018-02-08 14:30:05.961 P00 INFO: backup command begin 1.28: --db1-path=/var/lib/pgsql/10/data --db1-port=5435 --log-level-console=info --repo-path=/var/lib/pgbackrest --retention-full=2 --stanza=clustpgserver --type=full
2018-02-08 14:30:08.472 P00 INFO: execute non-exclusive pg_start_backup() with label "pgBackRest backup started at 2018-02-08 14:30:06": backup begins after the next regular checkpoint completes
2018-02-08 14:30:08.993 P00 INFO: backup start archive = 000000010000000000000012, lsn = 0/12000028
….
….
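
In the same way, a differential backup (which contains everything changed since the last full backup) can be requested with --type=diff; here is a minimal sketch reusing the same stanza and options:

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --db-port=5435 --type=diff backup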

To get info about our backups:
[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver info
stanza: clustpgserver
status: ok
db (current)
wal archive min/max (10-1): 00000001000000000000000E / 000000010000000000000012
full backup: 20180208-140653F
timestamp start/stop: 2018-02-08 14:06:53 / 2018-02-08 14:08:19
wal start/stop: 00000001000000000000000E / 00000001000000000000000E
database size: 23.2MB, backup size: 23.2MB
repository size: 2.7MB, repository backup size: 2.7MB
incr backup: 20180208-140653F_20180208-142626I
timestamp start/stop: 2018-02-08 14:26:26 / 2018-02-08 14:26:52
wal start/stop: 000000010000000000000010 / 000000010000000000000010
database size: 23.2MB, backup size: 8.2KB
repository size: 2.7MB, repository backup size: 472B
backup reference list: 20180208-140653F
full backup: 20180208-143006F
timestamp start/stop: 2018-02-08 14:30:06 / 2018-02-08 14:31:30
wal start/stop: 000000010000000000000012 / 000000010000000000000012
database size: 23.2MB, backup size: 23.2MB
repository size: 2.7MB, repository backup size: 2.7MB
[postgres@pgserver ~]$

Now that we have seen how to perform a backup with PgBackRest, let's see how to restore.
First let's identify the directory of our database files.

[postgres@pgserver ~]$ psql
psql (10.1)
Type "help" for help.
postgres=# show data_directory ;
data_directory
------------------------
/var/lib/pgsql/10/data
(1 row)
postgres=#

And let’s remove all files in the directory

[postgres@pgserver data]$ pwd
/var/lib/pgsql/10/data
[postgres@pgserver data]$ ls
base pg_dynshmem pg_notify pg_stat_tmp pg_wal postmaster.pid
current_logfiles pg_hba.conf pg_replslot pg_subtrans pg_xact
global pg_ident.conf pg_serial pg_tblspc postgresql.auto.conf
log pg_logical pg_snapshots pg_twophase postgresql.conf
pg_commit_ts pg_multixact pg_stat PG_VERSION postmaster.opts
[postgres@pgserver data]$ rm -rf *
[postgres@pgserver data]$

Now if we try to connect, of course we will get errors

[postgres@pgserver data]$ psql
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5435"?
[postgres@pgserver data]$

So let's restore with the PgBackRest restore command

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info restore
2018-02-08 14:52:01.845 P00 INFO: restore command begin 1.28: --db1-path=/var/lib/pgsql/10/data --log-level-console=info --repo-path=/var/lib/pgbackrest --stanza=clustpgserver
2018-02-08 14:52:03.490 P00 INFO: restore backup set 20180208-143006F
2018-02-08 14:52:21.904 P01 INFO: restore file /var/lib/pgsql/10/data/base/13805/1255 (592KB, 2%) checksum 61f284092cabf44a30d1442ef6dd075b2e346b7f
….
….
2018-02-08 14:53:21.186 P00 INFO: write /var/lib/pgsql/10/data/recovery.conf
2018-02-08 14:53:23.948 P00 INFO: restore global/pg_control (performed last to ensure aborted restores cannot be started)
2018-02-08 14:53:28.258 P00 INFO: restore command end: completed successfully
[postgres@pgserver ~]$

At the end of the restore, a recovery.conf file is created in the data directory

[postgres@pgserver data]$ cat recovery.conf
restore_command = '/usr/bin/pgbackrest --log-level-console=info --stanza=clustpgserver archive-get %f "%p"'

Now we can restart the PostgreSQL cluster

[postgres@pgserver data]$ pg_ctl start
waiting for server to start....2018-02-08 14:57:06.519 CET [4742] LOG: listening on IPv4 address "0.0.0.0", port 5435
2018-02-08 14:57:06.522 CET [4742] LOG: listening on IPv6 address "::", port 5435
2018-02-08 14:57:06.533 CET [4742] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5435"
2018-02-08 14:57:06.551 CET [4742] LOG: listening on Unix socket "/tmp/.s.PGSQL.5435"
2018-02-08 14:57:06.645 CET [4742] LOG: redirecting log output to logging collector process
2018-02-08 14:57:06.645 CET [4742] HINT: Future log output will appear in directory "log".
...... done
server started

And then connect

[postgres@pgserver data]$ psql
psql (10.1)
Type "help" for help.
postgres=#
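
As a side note, we wiped the whole data directory before restoring. When the data directory still contains files, pgBackRest also provides a delta restore which only rewrites the files that differ from the backup; a minimal sketch (to be run while the cluster is stopped):

[postgres@pgserver ~]$ pgbackrest --stanza=clustpgserver --log-level-console=info --delta restore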

Conclusion
In this blog we have shown, with a simple configuration, how to perform backups using PgBackRest. This basic configuration can help for a first use of PgBackRest. In future articles we will go further into more advanced use of this tool.

 

The article Backup and Restore PostgreSQL with PgBackRest I appeared first on Blog dbi services.

SQL Server on Docker and network bridge considerations

Wed, 2018-02-14 06:46

Let's continue with this blog post series about SQL Server and Docker. A couple of days ago, I was at a customer site that had already implemented SQL Server 2017 on Linux as Docker containers. It was definitely a very interesting day with a lot of customer experience and feedback. We discussed a lot of architecture scenarios with them.

The interesting point here is that I was able to compare with a previous customer who had used Docker containers for a while in a completely different way. Indeed, my new customer implemented a Docker infrastructure exclusively based on SQL Server containers whereas the older one had already containerized its applications, which were connected to an external and non-containerized SQL Server environment.

Use case 1 – Containerized apps and virtualized SQL Server environments
Use case 2 – SQL Server containers and virtualized applications
blog 128 - 1- docker archi 2
blog 128 - 1 - docker archi 1

 

In this blog post I want to focus on the first use case in terms of networks.

Connecting to an outside SQL Server (from a Docker perspective) is probably an intermediate solution for many customers who already deal with mission-critical environments with very restrictive high-availability requirements, and when very high performance is required as well. Don't get me wrong: I'm not saying Docker is not designed for mission-critical scenarios, but let's say that fear of unknown things, as with virtualization before, is still predominant, at least for this kind of scenario. I always keep in mind the recurring customer question: is Docker ready for production and for databases? Connecting to a non-containerized SQL Server environment may make sense here, at least to speed up container adoption. That's my guess but feel free to comment with your thoughts!

So, in this context we may use different Docker network topologies. I spent some time studying and discussing with customers the network topologies implemented in their context. For simple Docker infrastructures (without orchestrators like Swarm or Kubernetes) Docker bridges seem to be predominant, either the Docker0 bridge or user-defined bridges.

 

  • Docker default bridge (Docker0)

For very limited Docker topologies, the default network settings with the Docker0 bridge will probably be sufficient. That is probably the case for my latest customer with only 5 SQL Server containers on top of one Docker engine. By default, each container created without any network specification (and without any Docker engine setting customization) will have one network interface sitting on the docker0 bridge with an IP from the 172.17.0.0/16 CIDR, or whichever CIDR you have configured Docker to use. But did you ever wonder what a bridge exactly is in the Docker world?
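
Before going further, a quick way to see which subnet and which containers are attached to the default bridge on your own engine (just a sketch, output omitted here):

$ sudo docker network inspect bridge      # shows the subnet (e.g. 172.17.0.0/16) and the attached containers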

Let's have a deeper look at it with a very simple example: one Docker engine that includes two containers, each based on microsoft/mssql-tools, and one outside SQL Server that runs on top of a Hyper-V virtual machine. The picture below shows some network details that I will explain later in this blog post.

blog 128 - 3 - docker network bridge

My 2 containers can communicate with each other because they are sitting on the same network bridge, and they are also able to communicate with my database server through the NAT mechanism. IP masquerading and IP forwarding are enabled on my Docker host.
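
Both settings can be verified quickly on the Docker host; a small sketch (the exact output depends on your host):

$ sysctl net.ipv4.ip_forward               # returns 1 when IP forwarding is enabled
$ sudo iptables -t nat -L POSTROUTING -n   # should list a MASQUERADE rule for 172.17.0.0/16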

$ sudo docker run -tid --name docker1 microsoft/mssql-tools
77b501fe29af322dd2d1da2824d339a60ba3080c1e61a2332b3cf563755dd3e3

$ sudo docker run -tid --name docker2 microsoft/mssql-tools
3f2ba669591a1889068240041332f02faf970e3adc85619adbf952d5c135d3f4

$ sudo docker ps
CONTAINER ID        IMAGE                   COMMAND                  CREATED             STATUS              PORTS               NAMES
3f2ba669591a        microsoft/mssql-tools   "/bin/sh -c /bin/bash"   7 seconds ago       Up 6 seconds                            docker2
77b501fe29af        microsoft/mssql-tools   "/bin/sh -c /bin/bash"   11 seconds ago      Up 10 seconds                           docker1

 

Let’s take a look at the network configuration of each container. As a reminder, each network object represents a layer 2 broadcast domain with a layer 3 subnet as shown below. Each container is attached to a network through a specific endpoint.

$ sudo docker inspect docker1
[
"Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:02",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "985f25500e3d0c55d419790f1ac446f92c8d1090dddfd69987a52aab0717e630",
                    "EndpointID": "bd82669031ad87ddcb61eaa2dad823d89ca86cae92c4034d4925009aae634c14",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:02",
                    "DriverOpts": null
                }
            }
]

$sudo docker inspect docker2
[
"Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.3",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:03",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "985f25500e3d0c55d419790f1ac446f92c8d1090dddfd69987a52aab0717e630",
                    "EndpointID": "140cd8764506344958e9a9725d1c2513f67e56b2c4a1fc67f317c3e555764c1e",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:03",
                    "DriverOpts": null
                }
            }
]

 

To summarize, two IP addresses have been assigned, one for the Docker1 container (172.17.0.2) and one for the Docker2 container (172.17.0.3), in the IP address range defined by the Docker0 bridge by the Docker internal IPAM module. Each network interface is created with its own MAC address, and the gateway IP address (172.17.0.1) for both containers corresponds to the Docker0 bridge interface.

$ sudo ip a show docker0
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:2a:d0:7e:76 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:2aff:fed0:7e76/64 scope link
       valid_lft forever preferred_lft forever

 

Let's try to connect from both containers to my SQL Server database:

$ sudo docker exec -it docker1
…
$ sudo docker exec -it docker2
...

 

Then on each container let’s run the following sqlcmd command:

sqlcmd -S 192.168.40.30,1450 -Usa -Ptoto

 

Finally let's switch to the SQL Server instance and get a picture of the existing connections (IP address 192.168.40.30 and port 1450).

SELECT 
	c.client_net_address,
	c.client_tcp_port,
	c.local_net_address,
	c.protocol_type,
	c.auth_scheme,
	s.program_name,
	s.is_user_process
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s ON c.session_id = s.session_id
WHERE client_net_address <> '<local machine>'

 

blog 128 - 4 - docker network bridge sqlcmd

We may notice that the client IP address is the same for both connections (192.168.40.50), indicating that we are using NAT to connect from each container.

Let’s go back to the Docker engine network configuration. After creating my 2 containers, we may notice the creation of 2 additional network interfaces.

$ ip a show | grep veth*
12: veth45297ff@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP
14: veth46a8316@if13: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP

 

What are they? At this point, we are entering the Linux network namespace world. You can read further technical details on the internet but, to keep network namespace concepts simple, I would say they allow running different and separate network instances (including routing tables) that operate independently of each other. In other words, it is a way to isolate different networks from each other based on the same physical network device. Assuming we are using Docker bridge type networks, when creating a container, in the background we create a dedicated network namespace that includes a virtual Ethernet interface, which comes in interconnected pairs. In fact, a virtual Ethernet interface acts as a tube that connects a Docker container namespace (in this context) to the outside world via the default / global namespace where the physical interface exists.

Before digging further into details about virtual interfaces, let's say that by default Docker doesn't expose network namespace information because it uses its own libcontainer, and the microsoft/mssql-tools Docker image is based on a simplified Linux image that doesn't include network tools to easily show virtual interface information. So, a workaround is to expose a Docker container namespace into the host.

First we have to find out the process id of the container and then link its corresponding proc namespace to /var/run/netns host directory as shown below:

$ sudo docker inspect --format '{{.State.Pid}}' docker1
2094
$ sudo ln -s /proc/2094/ns/net /var/run/netns/ns-2094

 

Then we may use the ip netns command to extract the network information

$ sudo ip netns
ns-2094 (id: 0)
$ sudo ip netns exec ns-2094 ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
11: eth0@if12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0

 

Here we go. The interesting information is the container network interface 11: eth0@if12

So, the first end of the pair is the eth0 interface in the Docker container and the "outside" end corresponds to interface number 12. On the host, interface 12 corresponds to the virtual Ethernet adapter veth45297ff. Note we may also find out the pair corresponding to the container interface (@if11).

$ ip a | grep "^12"
12: veth45297ff@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP

 

Finally, let’s take a look at the bridge used by the virtual ethernet adapter veth45297ff

$ sudo brctl show
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02422ad07e76       no              veth45297ff
                                                        veth46a8316

 

The other veth (46a8316) corresponds to my second docker2 container.

 

  • User-defined network bridges

But as said previously, using the Docker0 bridge is only suitable for very limited scenarios. User-defined bridges are more prevalent in more complex scenarios like microservice applications because they offer better isolation between containers and the outside world as well as better manageability and customization. At this stage we could also introduce macvlan networks, but probably in the next blog post …

For example, let's say you want to create 2 isolated network bridges for a 3-tier application. The users will access the web server (through the exposed port) via the first network (frontend-server). But at the same time, you also want to prevent containers that sit on this network from making connections to the outside world. The second network (backend-server) will host containers that must have access to both the outside SQL Server database and the web server.

blog 128 - 5 - docker network bridge segregation

User-defined networks are a good solution to address these requirements. Let's create two user-defined networks. Note that by default containers may make connections to the outside world, but the outside world is not able to make connections to the containers without exposed listen ports. This is why I disabled IP masquerading (com.docker.network.bridge.enable_ip_masquerade=false) for the frontend-server network to meet the above requirements.

$sudo docker network create \
    --driver bridge \
    --subnet 172.20.0.0/16 \
  --gateway 172.20.0.1 \
  backend-server  
$sudo docker network create \
    --driver bridge \
    --subnet 172.19.0.0/16 \
    --gateway 172.19.0.1 \
    --opt com.docker.network.bridge.enable_ip_masquerade=false \
  frontend-server
$ sudo docker network ls 
NETWORK ID          NAME                DRIVER              SCOPE
5c6f48269d2b        backend-server      bridge              local
985f25500e3d        bridge              bridge              local
b1fbde4f4674        frontend-server     bridge              local
ad52b859e3f9        host                host                local
1beda56f93d3        none                null                local

 

Let’s now take a look at the corresponding iptables masquerading rules on my host machine:

$ sudo iptables -t nat -L -n | grep -i "masquerade"
MASQUERADE  all  --  172.20.0.0/16        0.0.0.0/0
MASQUERADE  all  --  172.17.0.0/16        0.0.0.0/0

 

You may notice only the Docker0 (172.17.0.0/16) and backend-server (172.20.0.0/16) bridges are allowed for ip masquerading.

Then let's create 3 containers: the first two (docker1 and docker2) will sit on the frontend-server network and the third one (docker3) on the backend-server network. For convenience, I set up fixed hostnames for each container. I also used a different Ubuntu image that this time provides all the necessary network tools, including the ping command.

$ sudo docker run -d --rm --name=docker1 --hostname=docker1 --net=frontend-server -it smakam/myubuntu:v6 bash

$ sudo docker run -d --rm --name=docker2 --hostname=docker2 --net=frontend-server -it smakam/myubuntu:v6 bash

$sudo docker run -d --rm --name=docker3 --hostname=docker3 --net=backend-server -it smakam/myubuntu:v6 bash

$ sudo docker ps
CONTAINER ID        IMAGE                COMMAND             CREATED             STATUS              PORTS               NAMES
225ee13c38f7        smakam/myubuntu:v6   "bash"              2 minutes ago       Up 2 minutes                            docker3
d95014602fe2        smakam/myubuntu:v6   "bash"              4 minutes ago       Up 4 minutes                            docker2
1d9645f61245        smakam/myubuntu:v6   "bash"              4 minutes ago       Up 4 minutes                            docker1

 

First, probably one of the biggest advantages of using user-defined networks (unlike the Docker0 bridge) is the ability to use automatic DNS resolution between containers on the same user-defined subnet on the same host (this is the default behavior but you can override the DNS settings by specifying the --dns parameter at container creation time). In fact, Docker updates the /etc/hosts file of each container when adding / deleting containers.
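
For instance, a hypothetical fourth container could be started with custom DNS settings (docker4 and the DNS values below are only examples, a sketch):

$ sudo docker run -d --rm --name=docker4 --hostname=docker4 --net=frontend-server \
     --dns=8.8.8.8 --dns-search=example.local \
     -it smakam/myubuntu:v6 bash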

As expected, I may ping the docker2 container from the docker1 container and vice-versa, but the same doesn't apply between docker1 and docker3, nor between docker2 and docker3, because they are not sitting on the same network bridge.

$ sudo docker exec -ti docker1 ping -c2 docker2
PING docker2 (172.19.0.3) 56(84) bytes of data.
64 bytes from docker2.frontend-server (172.19.0.3): icmp_seq=1 ttl=64 time=0.088 ms
64 bytes from docker2.frontend-server (172.19.0.3): icmp_seq=2 ttl=64 time=0.058 ms
…
$ sudo docker exec -ti docker2 ping -c2 docker1
PING docker1 (172.19.0.2) 56(84) bytes of data.
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=1 ttl=64 time=0.084 ms
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=2 ttl=64 time=0.054 ms
...
$ sudo docker exec -ti docker1 ping -c2 docker3
ping: unknown host docker3
...

 

From a network perspective, on the host we may notice the creation of two additional bridge interfaces and 3 virtual Ethernet adapters after the creation of the containers.

$ brctl show
bridge name     bridge id               STP enabled     interfaces
br-5c6f48269d2b         8000.0242ddad1660       no              veth79ae355
br-b1fbde4f4674         8000.02424bebccdd       no              vethb66deb8
                                                        vethbf4ab2d
docker0         8000.02422ad07e76       no
$ ip a | egrep "^[1-9][1-9]"
25: br-5c6f48269d2b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
28: br-b1fbde4f4674: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
58: vethb66deb8@if57: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-b1fbde4f4674 state UP
64: veth79ae355@if63: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-5c6f48269d2b state UP

 

If I want to make the docker3 container reachable from docker2 container I may simply connect the latter to the corresponding network as shown below:

$ sudo docker network connect backend-server docker2

$ sudo docker inspect docker2
[
"Networks": {
                "backend-server": {
                    "IPAMConfig": {},
                    "Links": null,
                    "Aliases": [
                        "d95014602fe2"
                    ],
                    "NetworkID": "5c6f48269d2b752bf1f43efb94437957359c6a72675380c16e11b2f8c4ecaaa1",
                    "EndpointID": "4daef42782b22832fc98485c27a0f117db5720e11d806ab8d8cf83e844ca6b81",
                    "Gateway": "172.20.0.1",
                    "IPAddress": "172.20.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:14:00:03",
                    "DriverOpts": null
                },
                "frontend-server": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "d95014602fe2"
                    ],
                    "NetworkID": "b1fbde4f4674386a0e01b7ccdee64ed8b08bd8505cd7f0021487d32951035570",
                    "EndpointID": "651ad7eaad994a06658941cda7e51068a459722c6d10850a4b546382c44fff86",
                    "Gateway": "172.19.0.1",
                    "IPAddress": "172.19.0.3",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:13:00:03",
                    "DriverOpts": null
                }
            }
]

 

You may notice the container is now connected to both the frontend-server and the backend-server, thanks to an additional network interface created at the same time.

$ sudo docker exec -it docker2 ip a show | grep eth
59: eth0@if60: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:13:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.3/16 brd 172.19.255.255 scope global eth0
68: eth2@if69: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:14:00:03 brd ff:ff:ff:ff:ff:ff
    inet 172.20.0.3/16 brd 172.20.255.255 scope global eth2

 

Pinging both the docker1 and docker3 containers from the docker2 container is now successful.

$ sudo docker exec -it docker2 ping -c2 docker1
PING docker1 (172.19.0.2) 56(84) bytes of data.
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=1 ttl=64 time=0.053 ms
64 bytes from docker1.frontend-server (172.19.0.2): icmp_seq=2 ttl=64 time=0.052 ms
…
$ sudo docker exec -it docker2 ping -c2 docker3
PING docker3 (172.20.0.2) 56(84) bytes of data.
64 bytes from docker3.backend-server (172.20.0.2): icmp_seq=1 ttl=64 time=0.082 ms
64 bytes from docker3.backend-server (172.20.0.2): icmp_seq=2 ttl=64 time=0.054 ms
…

 

In this blog post, we surfaced Docker network bridges and the use cases we may have to deal with for SQL Server instances depending on the context. As a reminder, user-defined networks allow defining fine-grained policy rules to interconnect containers on different subnets. This is basically what we may want to achieve with microservice applications. Indeed, such applications include some components that need to span multiple networks (backend and frontend networks) whereas others should be isolated (even from the outside) depending on their role.

Happy containerization!

 

The article SQL Server on Docker and network bridge considerations appeared first on Blog dbi services.

Could you trust option_packs_usage_statistics.sql ?

Wed, 2018-02-14 04:17
Introduction

As a former Oracle LMS qualified auditor, my opinion is sometimes requested before/during/after an Oracle LMS audit, or simply to assure a customer that his Oracle database is 100% in conformity with the Oracle Licensing Policy. Even if

“The Oracle License Management Services (LMS) Group is the only Oracle group authorized to review and provide opinions on compliance status and will provide guidance, education and impartial opinions on a customer or partner’s compliance state. For more information please visit the following website: http://www.oracle.com/corporate/lms.”

I very much hope that you will find interesting tips in this blog.

Most of the time, when a customer would like to check which Oracle options are used by his database infrastructure, he uses the well-known script "option_packs_usage_statistics.sql". dbi services checked the options detected by this script provided by Oracle (My Oracle Support DOC ID 1317265.1). Depending on your database usage this script will detect the usage of different options, but can you really trust the output of this script, and how should you interpret it?

Could we trust the output of option_packs_usage_statistics.sql ?

The answer is quite easy and short: NO, you can't!

Why? Because, as with any software, there are bugs, and these bugs lead to false positive detections. The good news is that some of these false positives are documented on My Oracle Support. Indeed the script option_packs_usage_statistics.sql used and provided by Oracle has 14 documented bugs (My Oracle Support Doc ID 1309070.1) and some other non-documented bugs (e.g. My Oracle Support BUG 17164904). These bugs are related to:

1.    Bug 11902001 – Exclude default users for Feature usage tracking for Securefiles option
2.    Bug 11902142 – Exclude default users for Feature usage tracking for Advanced Compression option
3.    Bug 19618850 – SOLUTION TO PREVENT UNINTENTED ORACLE OPTION USAGE
4.    Query against DBA_FEATURE_USAGE_STATISTICS is not a true test for use of SDO
5.    Bug 16088534 : RMAN default Backup BZIP2 Compression feature is wrongly reported as an Advanced Compression feature
6.    Bug 22122625 – GETTING FALSE POSITIVES ON USAGE OF ADVANCED INDEX COMPRESSION
7.    Bug 24844549 – ADVANCED INDEX COMPRESSION SHOWS USAGE IN DBA_FEATURE_USAGE_STATISTICS WITH HCC
8.    Bug 16859747 – DBA_FEATURE_USAGE_STATISTICS SHOWS INCORRECT USAGE FOR HEAPCOMPRESSION
9.    Bug 16563444 – HEAT MAP FEATURE USAGE TRACKING IS NOT CORRECT
10.    Bug 19317899 – IMC: IN-MEMORY OPTION IS REPORTED AS BEING USED EVEN INMEMORY_SIZE IS 0
11.    Bug 19308780 – DO NOT FEATURE TRACK OBJECTS FOR IM WHEN INMEMORY_SIZE = 0
12.    Bug 21248059 – DBA_FEATURE_USAGE_STATISTICS BUG IN TRACKING “HYBRID COLUMNAR COMPRESSION” FEAT
13.    Bug 25661076 – DBA_FEATURE_USAGE_STATISTICS INCORRECTLY SHOWS SPATIAL USAGE IN 12C
14.    Bug 23734270 – DBA_FEATURE_USAGE_STATISTICS SHOWS PERMANENT USAGE OF REAL-TIME SQL MONITORING

These bugs may lead to the detection of features such as: Automatic Maintenance – SQL Tuning Advisor & Automatic SQL Tuning Advisor, Real-Time SQL Monitoring, Advanced Security – Oracle Utility Datapump (Export) and Oracle Utility Datapump (Import), Advanced Compression – Heat Map, Advanced Compression – Oracle Utility Datapump (Export) and Oracle Utility Datapump (Import), and so on.

Of course these bugs make the analysis of real option usage especially difficult, even for an experienced Database Administrator. Additionally, an Oracle database in version 12 may use options in maintenance windows without any manual activation. That is the case, for instance, for options such as: Automatic Maintenance – SQL Tuning Advisor and Automatic SQL Tuning Advisor.

14. Bug 23734270 – DBA_FEATURE_USAGE_STATISTICS SHOWS PERMANENT USAGE OF REAL-TIME SQL MONITORING
On a freshly created 12c database, DBA_FEATURE_USAGE_STATISTICS shows usage of Real-Time SQL Monitoring even if no reports have been run from OEM pages or with DBMS_SQL_MONITOR.
Reason :SQL Monitor reports are automatically generated and saved in AWR but should be considered as system usage.
This behavior is the same for all 12 releases and is not present in 11.2. – Extract of My Oracle Support Bug 23734270

Even if the LMS team is not using the option_packs_usage_statistics.sql script, the output of the LMS_Collection_Tool (ReviewLite.sql) is quite the same. The direct consequence, in case of an Oracle LMS audit, is that the auditor could detect options that you simply never used, and you will have to prove the non-usage… if not, you will have to pay the invoice following the final LMS report, as stated in your LMS preliminary/final report.

“Based upon the information provided to License Management Services, the following licensing issues need to be resolved within 30 days from the date of the Final Report.”

“In accordance to Oracle compliance policies, backdated support charges are due for the period of unlicensed usage of Oracle Programs.
Please provide your feedback on this preliminary report within 10 days from the presentation of this report.”- extract of an Oracle LMS audit

Even if I do not have hundreds of cases where the LMS department made a wrong detection, I have concrete stories where the LMS team detected some false positives. The last case was related to the detection of more than 700 usages of Advanced Compression due to unpublished BUG 17164904. Thanks to the Metalink Doc ID 1993134.1, the bug is explained:

In 12.1.0.1,  the compression counter is incorrectly incremented (COMPRESSCNT=1) for compression=metadata_only (either explicitly or by default) due to unpublished BUG 17164904 – INCORRECT FEATURE USAGE STATISTICS FOR DATA PUMP COMPRESSION, fixed with 12.1.0.2.

How to interpret the output of option_packs_usage_statistics.sql ?

Sometimes this script could report nonsensical option usage. That is the case, for instance, for features provided only since database version 12c but detected on your old database version 11g. In such a case simply edit the option_packs_usage_statistics.sql script and have a look at the comments. A perfect example of that is the detection of Heat Map usage in database version 11g whereas this option has only been available since version 12c. You can see below another example of wrong option detection related to "Automatic Maintenance – SQL Tuning Advisor" and "Automatic SQL Tuning Advisor":


SELECT 'Tuning Pack'                                         , 'Automatic Maintenance - SQL Tuning Advisor'              , '^12\.'                      , 'INVALID' from dual union all  -- system usage in the maintenance window
SELECT 'Tuning Pack'                                         , 'Automatic SQL Tuning Advisor'                            , '^11\.2|^12\.'               , 'INVALID' from dual union all  -- system usage in the maintenance window
SELECT 'Tuning Pack'                                         , 'Real-Time SQL Monitoring'                                , '^11\.2'                     , ' '       from dual union all

This INVALID condition explains that the detection of this option is due to system usage in the maintenance window in version 12 (Automatic Maintenance – SQL Tuning Advisor) and in versions 11.2 and 12 for Automatic SQL Tuning Advisor. This is also explained a few lines further down in the option_packs_usage_statistics.sql script:


where nvl(CONDITION, '-') != 'INVALID'                   -- ignore features for which licensing is not required without further conditions

    and not (CONDITION = 'C003' and CON_ID not in (0, 1))  -- multiple PDBs are visible only in CDB$ROOT; PDB level view is not relevant
)

In such a case the option does not have to be considered, since the normal behavior of an Oracle database in version 12 is to use this option in the maintenance window. This is just an example to illustrate that some detected options do not have to be licensed, as explained in the script.
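
When in doubt, it can also help to look at the raw data behind the script before drawing any conclusion. A minimal sketch, assuming SYSDBA access on the instance:

sqlplus -s / as sysdba <<'EOF'
set lines 200 pages 200
column name format a55
select name, version, detected_usages, currently_used, last_usage_date
from   dba_feature_usage_statistics
where  detected_usages > 0
order  by name, version;
EOF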

Conclusion

I very much hope that this blog helps you to better understand how to detect what your database infrastructure really uses in terms of Oracle options. Anyway, if you are convinced that you do not use an Oracle database option despite the output of scripts such as option_packs_usage_statistics.sql or ReviewLite.sql which proves the opposite, have a look on My Oracle Support. Look for bugs related to wrong detection of this feature, and with a little bit of luck you will find something interesting. Oracle is definitely engineered for heroes…

Oracle Engineered For Heroes

 

The article Could you trust option_packs_usage_statistics.sql ? appeared first on Blog dbi services.

How we build our customized PostgreSQL Docker image

Tue, 2018-02-13 13:21

Docker is becoming more and more popular these days and a lot of companies are starting to really use it. For one project we decided to build our own customized Docker image instead of using the official PostgreSQL one. The main reason for that is that we wanted to compile from source so that we only get what is really required. Why have PostgreSQL compiled with tcl support when nobody will ever use it? Here is how we did it …

To dig in right away, this is the simplified Dockerfile:

FROM debian

# make the "en_US.UTF-8" locale so postgres will be utf-8 enabled by default
ENV LANG en_US.utf8
ENV PG_MAJOR 10
ENV PG_VERSION 10.1
ENV PG_SHA256 3ccb4e25fe7a7ea6308dea103cac202963e6b746697366d72ec2900449a5e713
ENV PGDATA /u02/pgdata
ENV PGDATABASE="" \
    PGUSERNAME="" \
    PGPASSWORD=""

COPY docker-entrypoint.sh /

RUN set -ex \
        \
        && apt-get update && apt-get install -y \
           ca-certificates \
           curl \
           procps \
           sysstat \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           bison \
           flex \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           make \
           gcc \
           unzip \
           python \
           locales \
        \
        && rm -rf /var/lib/apt/lists/* \
        && localedef -i en_US -c -f UTF-8 en_US.UTF-8 \
        && mkdir /u01/ \
        \
        && groupadd -r postgres --gid=999 \
        && useradd -m -r -g postgres --uid=999 postgres \
        && chown postgres:postgres /u01/ \
        && mkdir -p "$PGDATA" \
        && chown -R postgres:postgres "$PGDATA" \
        && chmod 700 "$PGDATA" \
        \
        && curl -o /home/postgres/postgresql.tar.bz2 "https://ftp.postgresql.org/pub/source/v$PG_VERSION/postgresql-$PG_VERSION.tar.bz2" \
        && echo "$PG_SHA256 /home/postgres/postgresql.tar.bz2" | sha256sum -c - \
        && mkdir -p /home/postgres/src \
        && chown -R postgres:postgres /home/postgres \
        && su postgres -c "tar \
                --extract \
                --file /home/postgres/postgresql.tar.bz2 \
                --directory /home/postgres/src \
                --strip-components 1" \
        && rm /home/postgres/postgresql.tar.bz2 \
        \
        && cd /home/postgres/src \
        && su postgres -c "./configure \
                --enable-integer-datetimes \
                --enable-thread-safety \
                --with-pgport=5432 \
                --prefix=/u01/app/postgres/product/$PG_VERSION \
                --with-ldap \
                --with-python \
                --with-openssl \
                --with-libxml \
                --with-libxslt" \
        && su postgres -c "make -j 4 all" \
        && su postgres -c "make install" \
        && su postgres -c "make -C contrib install" \
        && rm -rf /home/postgres/src \
        \
        && apt-get update && apt-get purge --auto-remove -y \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           gcc \
           make \
           unzip \
        && apt-get install -y libxml2 \
        && rm -rf /var/lib/apt/lists/*

ENV LANG en_US.utf8
USER postgres
EXPOSE 5432
ENTRYPOINT ["/docker-entrypoint.sh"]

We based the image on the latest Debian image (line 1). The following lines define the PostgreSQL version we will use and some environment variables we will use later. What follows is basically installing all the packages required for building PostgreSQL from source, adding the operating system user and group, preparing the directories, fetching the PostgreSQL source code, then configure, make and make install. Pretty much straightforward. Finally, to shrink the image, we remove all the packages that are not required any more after PostgreSQL was compiled and installed.
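
Building the image is then nothing special; a minimal sketch (the image name and tag are just examples), run from the directory containing the Dockerfile and docker-entrypoint.sh:

docker build -t dbi/postgres-custom:10.1 .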

The final setup of the PostgreSQL instance happens in the docker-entrypoint.sh script which is referenced at the very end of the Dockerfile:

#!/bin/bash

# this are the environment variables which need to be set
PGDATA=${PGDATA}/${PG_MAJOR}
PGHOME="/u01/app/postgres/product/${PG_VERSION}"
PGAUTOCONF=${PGDATA}/postgresql.auto.conf
PGHBACONF=${PGDATA}/pg_hba.conf
PGDATABASENAME=${PGDATABASE}
PGUSERNAME=${PGUSERNAME}
PGPASSWD=${PGPASSWORD}

# create the database and the user
_pg_create_database_and_user()
{
    ${PGHOME}/bin/psql -c "create user ${PGUSERNAME} with login password '${PGPASSWD}'" postgres
    ${PGHOME}/bin/psql -c "create database ${PGDATABASENAME} with owner = ${PGUSERNAME}" postgres
}

# start the PostgreSQL instance
_pg_prestart()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} -w start
}

# start postgres and do not disconnect
# required for docker
_pg_start()
{
    ${PGHOME}/bin/postgres "-D" "${PGDATA}"
}

# stop the PostgreSQL instance
_pg_stop()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} stop -m fast
}

# initdb a new cluster
_pg_initdb()
{
    ${PGHOME}/bin/initdb -D ${PGDATA} --data-checksums
}


# adjust the postgresql parameters
_pg_adjust_config() {
    # PostgreSQL parameters
    echo "shared_buffers='128MB'" >> ${PGAUTOCONF}
    echo "effective_cache_size='128MB'" >> ${PGAUTOCONF}
    echo "listen_addresses = '*'" >> ${PGAUTOCONF}
    echo "logging_collector = 'on'" >> ${PGAUTOCONF}
    echo "log_truncate_on_rotation = 'on'" >> ${PGAUTOCONF}
    echo "log_filename = 'postgresql-%a.log'" >> ${PGAUTOCONF}
    echo "log_rotation_age = '1440'" >> ${PGAUTOCONF}
    echo "log_line_prefix = '%m - %l - %p - %h - %u@%d '" >> ${PGAUTOCONF}
    echo "log_directory = 'pg_log'" >> ${PGAUTOCONF}
    echo "log_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "log_autovacuum_min_duration = '60s'" >> ${PGAUTOCONF}
    echo "log_min_error_statement = 'NOTICE'" >> ${PGAUTOCONF}
    echo "log_min_duration_statement = '30s'" >> ${PGAUTOCONF}
    echo "log_checkpoints = 'on'" >> ${PGAUTOCONF}
    echo "log_statement = 'none'" >> ${PGAUTOCONF}
    echo "log_lock_waits = 'on'" >> ${PGAUTOCONF}
    echo "log_temp_files = '0'" >> ${PGAUTOCONF}
    echo "log_timezone = 'Europe/Zurich'" >> ${PGAUTOCONF}
    echo "log_connections=on" >> ${PGAUTOCONF}
    echo "log_disconnections=on" >> ${PGAUTOCONF}
    echo "log_duration=off" >> ${PGAUTOCONF}
    echo "client_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "wal_level = 'replica'" >> ${PGAUTOCONF}
    echo "hot_standby_feedback = 'on'" >> ${PGAUTOCONF}
    echo "max_wal_senders = '10'" >> ${PGAUTOCONF}
    echo "cluster_name = '${PGDATABASENAME}'" >> ${PGAUTOCONF}
    echo "max_replication_slots = '10'" >> ${PGAUTOCONF}
    echo "work_mem=8MB" >> ${PGAUTOCONF}
    echo "maintenance_work_mem=64MB" >> ${PGAUTOCONF}
    echo "wal_compression=on" >> ${PGAUTOCONF}
    echo "max_wal_senders=20" >> ${PGAUTOCONF}
    echo "shared_preload_libraries='pg_stat_statements'" >> ${PGAUTOCONF}
    echo "autovacuum_max_workers=6" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_scale_factor=0.1" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_threshold=50" >> ${PGAUTOCONF}
    # Authentication settings in pg_hba.conf
    echo "host    all             all             0.0.0.0/0            md5" >> ${PGHBACONF}
}

# initialize and start a new cluster
_pg_init_and_start()
{
    # initialize a new cluster
    _pg_initdb
    # set params and access permissions
    _pg_adjust_config
    # start the new cluster
    _pg_prestart
    # set username and password
    _pg_create_database_and_user
}

# check if $PGDATA exists
if [ -e ${PGDATA} ]; then
    # when $PGDATA exists we need to check if there are files
    # because when there are files we do not want to initdb
    if [ -e "${PGDATA}/base" ]; then
        # when there is the base directory this
        # probably is a valid PostgreSQL cluster
        # so we just start it
        _pg_prestart
    else
        # when there is no base directory then we
        # should be able to initialize a new cluster
        # and then start it
        _pg_init_and_start
    fi
else
    # initialize and start the new cluster
    _pg_init_and_start
    # create PGDATA
    mkdir -p ${PGDATA}
    # create the log directory
    mkdir -p ${PGDATA}/pg_log
fi
# restart and do not disconnect from the postgres daemon
_pg_stop
_pg_start

The important point here is that PGDATA is a persistent volume that is mounted into the Docker container. When the container comes up we need to check if something that looks like a PostgreSQL data directory is already there. If yes, then we just start the instance with what is there. If nothing is there, we create a new instance. Remember: this is just a template and you might need to do more checks in your case. The same is true for what we add to pg_hba.conf here: this is nothing you should do on real systems, but it can be handy for testing.
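
To illustrate, here is a minimal sketch of how such a container could be started (the image, volume and credential values are just examples) with a named volume for PGDATA and the variables evaluated by docker-entrypoint.sh:

docker volume create pgdatavol
docker run -d --name pg10 \
    -v pgdatavol:/u02/pgdata \
    -e PGDATABASE=mydb -e PGUSERNAME=myuser -e PGPASSWORD=secret \
    -p 5432:5432 dbi/postgres-custom:10.1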

Hope this helps …

 

The article How we build our customized PostgreSQL Docker image appeared first on Blog dbi services.

Is Switzerland concerned by the GDPR?

Tue, 2018-02-13 05:23

On 25 May 2018 the General Data Protection Regulation (GDPR, known in French as RGPD) will come into application. It will impact every company that processes personal data of European residents.

As this deadline approaches, a lot of misinformation is circulating within Swiss companies. The goal of this blog is to shed some light on a few of the questions that come up. To do so, we asked Me Metille, a lawyer specialized in data protection, to answer the questions a Swiss company may ask itself when facing this regulation.

1. Q. Is a processor working for a company subject to the GDPR automatically subject to the GDPR itself, given the chain of responsibility between the data controller and its suppliers?
A. > Some specialists consider that the processor is subject to it, but most hold that a processor is not subject to the GDPR merely because it is a processor. However, a processor that intentionally targets European customers will be subject to it. In any case, the processor will have obligations related to the GDPR, in particular those of Article 28.

2. Q. Is a Swiss company that employs people in the European Union automatically subject to the GDPR?
A. > No, a company is not subject to the GDPR just because it employs European Union nationals.

3. Q. Is a Swiss company that employs people in the European Union subject to the GDPR if its employees carry out commercial activity in the European Union?
A. > It depends on several factors, such as where the invoices are issued, where the activity is initiated, or an intention to develop business in Europe, but there is a risk that the employed persons are qualified as an establishment and that the Swiss company is treated as a company present in the European Union.

4. Q. As a processor, is there a duty to advise with regard to the data being processed?
A. > The processor must mainly ensure the security of the processed data and compliance with the contract. If the processor becomes aware of something that could potentially endanger data security, it must report it (e.g. an unsecured VPN connection giving access to sensitive data). In some cases it will be its duty to refuse to carry out the tasks it has been assigned (e.g. making an unencrypted export of sensitive data and sending it to a recipient outside the European Union). The data controller is the one who bears the most responsibility and who must ensure that the processing is lawful.

5. Q. Does the right to erasure, or right to be forgotten, specified in Article 17 of the GDPR conflict with legal provisions requiring data to be kept for a fixed period?
A. > If a Swiss law states that information must be kept for a fixed period, that law prevails.

6. Q. How can the right to be forgotten be reconciled with a data backup policy?
A. > Backups must not be used for another purpose, and in particular not to retrieve data that one would not have the right to keep. The necessary measures must be taken so that, on the one hand, only authorized people can access the backup media and, on the other hand, when a backup set is restored only the data that may be used is exploited (private data, data that was deleted from the original media after the backup, or data that the right to be forgotten requires not to keep, will be removed).

7. Q. How long can personal data be kept?
A. > It must not be kept longer than necessary to achieve the purpose for which it was collected. A reasonable safety margin can nevertheless be added. A period of a few months can be considered a reasonable safety margin.

8. Q. Can an employee's job application be kept forever?
A. > No, if the candidate is not selected, the application file must be returned or destroyed. Keeping it for 3-4 months, to be able to justify oneself in case of complaints related to the non-hiring, is accepted.
If you want to keep the file for another position, the candidate's explicit consent is required (and in that case it should be limited to 2-3 years).

9. Q. Which technical means can concretely contribute to data protection?
A. > Software for encryption, data availability and data integrity. Everything that makes it possible to trace what has been done (access control, logs, documentation, hardening, etc.).

10. Q. Do the GDPR sanctions apply only in case of a security breach or data theft?
A. > The sanctions apply in case of a violation of the GDPR rules, so not only in case of a security breach. There could even be a double sanction in case of a security breach if the company does not report it. First, the company subject to the GDPR could be sanctioned for not having ensured the security of the data, and second, because the company that suffered a data theft did not report it. Even without having data stolen, it could already be fined for not having taken elementary measures to ensure security.

11. Q. Which European Union country will enforce the sanctions on Swiss territory?
A. > Most likely each national authority of a European Union country whose residents are affected. It is not certain that the investigative acts and the sanction would be validly recognized by a Swiss judge, which would make collection difficult.

12. Q. Are there insurance products to cover GDPR-related risks?
A. > Yes, there are products covering certain risks, but the offering is very mixed and some products are fanciful. It is important to distinguish what is covered from what is not, the exclusion conditions, the deductibles and the coverage limits.

13. Q. Is a company subject to the GDPR obliged to inform a processor that it is working on personal data?
A. > The company subject to the GDPR must not only tell its processor that it is working on personal data, but also tell it precisely what it has to do.

14. Q. Should the DPO be external or internal?
A. > The DPO is only required in limited cases (public authority, large-scale systematic monitoring, sensitive data). It can be internal or external.
Independently of the legal obligation, the important thing is that the DPO has a good knowledge of the company and is available to employees and project managers. If the DPO does not have the legal or technical knowledge, he can ask for external help, but if he does not know the company well, he will struggle to get help on the business aspects.
The DPO role can be taken on by several people (a committee), but the responsibilities must be clearly defined beforehand and there must be a single external point of contact.

15. Q. Quel lien doit-il y avoir entre le DPO et la Direction ?
>A. Le DPO doit faire rapport au plus haut niveau de la Direction. Il exerce ses tâches de manière indépendante et ne reçoit pas d’instruction. Il peut aussi assumer d’autres tâches et être rattachés administrativement à un service et pas directement à la Direction.

16. Q. Existe-t-il un/des liens entre GDPR et ISO 27000 ?
A. > Les thèmes tel que Identification des traitements, audit, data maping, contrôle d’accès et documentation sont très semblables.

Vous trouverez d’autres informations en lien avec le RGPD sur le blog de Me Metille à l’adresse suivante : https://smetille.ch/2017/12/14/la-suisse-et-le-rgpd/ ou encore dans l’article du temps, Le RGPD, la révolution du consentement, daté du 12 février 2018.

En espérant que ce blog ait pu répondre à certaines de vos questions.

 

The post La Suisse concernée par le RGPD ? appeared first on Blog dbi services.

V$MYSTAT delta values

Mon, 2018-02-12 14:32

Here is a little script I use from time to time to look at V$MYSTAT values, displaying on one line a set of statistics with their delta values between two calls.

The first script, _mystat_init.sql, initializes the variables. The second one displays the values, such as:

SQL> @ _mystat_diff.sql
 
db block changes        redo size undo change vector size     redo entries
---------------- ---------------- ----------------------- ----------------
          57,371       15,445,852               6,111,608           37,709

Those two scripts are generated by defining the statistics:

define names="'redo size','redo entries','undo change vector size','db block changes'"

and running the following to spool the two scripts:

sqlplus -s / as sysdba <<'END'
set pagesize 0 feedback off linesize 1000 trimspool on verify off echo off
with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)
select 'define LAG'||stat_id||'=0' from stats
union all
select 'column "CUR'||stat_id||'" new_value '||'LAG'||stat_id||' noprint' from stats
union all
select 'column "DIF'||stat_id||'" heading '''||name||''' format 999G999G999G999' from stats
.
spool _mystat_init.sql
/
spool off
with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)
select 'set termout off verify off' from dual
union all
select 'select ' from dual
union all
select ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"' from stats
union all
select ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"' from stats
union all
--select ','''||'&'||'1'' comments' from dual
--union all
select q'[from (select stat_id,value from v$mystat join v$statname using(statistic#) where name in (&names)) pivot (avg(value)for stat_id in (]' from dual
union all
select ' '||decode(n,1,' ',',')||stat_id||' as "CUR'||stat_id||'"' from stats
union all
select '))' from dual
union all
select '.' from dual
union all
select 'set termout on' from dual
union all
select '/' from dual
.
spool _mystat_diff.sql
/
spool off
END

Then, in sqlplus or SQLcl, you run:

SQL> @ _mystat_init.sql

to initialize the values to 0 and:

SQL> @ _mystat_diff.sql

each time you want to display the deltas since the last call.
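
To make the mechanics a bit more concrete, here is roughly what the two generated scripts look like for a single statistic. This is only a sketch: the stat_id value 4994195 below is a placeholder, as the real identifiers come from V$STATNAME and vary between versions.

-- _mystat_init.sql (sketch, one statistic only)
define LAG4994195=0
column "CUR4994195" new_value LAG4994195 noprint
column "DIF4994195" heading 'redo size' format 999G999G999G999

-- _mystat_diff.sql (sketch): shows current minus previous value and re-captures the current one
set termout off verify off
select
  "CUR4994195" - &LAG4994195 "DIF4994195"
 ,nvl("CUR4994195",0) "CUR4994195"
from (select stat_id,value from v$mystat join v$statname using(statistic#) where name in ('redo size')) pivot (avg(value)for stat_id in ( 4994195 as "CUR4994195"))
.
set termout on
/

Running @ _mystat_diff.sql twice in a row then shows only the work done between the two calls.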

 

The post V$MYSTAT delta values appeared first on Blog dbi services.

Index seek operator and residual IO

Mon, 2018-02-12 09:00

This blog post draws on the previous article about index seek and gotchas. I encountered another interesting case, and I decided to write this article not so much because of the nature of the problem itself but rather because of the different ways that exist to troubleshoot it.

Firstly, let’s set the scene:

A simple update query and its corresponding execution plan, which at first sight suggests the plan is efficient in terms of performance:

declare @P0 datetime2
       ,@P1 int,
       @P2 nvarchar(4000)

set @P0 = GETDATE()
set @P1 = 1
set @P2 = '005245'

UPDATE
	[TABLE]
SET
	[_DATE] = @P0
WHERE
	[_ID] = @P1
    AND
    [_IDENTIFIER] = @P2
;

 

blog 129 - 1- query plan before optimization

An Index Seek – with a cardinality of 1 – and then a clustered index update operation. A simple case, as you may notice. But although the customer noticed a warning icon in the execution plan, he was confident enough in this plan not to care about it.

But if we look closer at the warning icon, it concerns a CONVERT_IMPLICIT operation (a hidden conversion added by the optimizer) that may affect the SeekPlan, as stated by the pop-up:

blog 129 - 2- query plan warning

In fact, in this customer's context, the query became an issue when it was executed a thousand times in a very short period, triggering lock timeout issues and consuming an entire vCPU of the server as well.

Let’s jump quickly to the root cause and the solution. Obviously the root cause was the @P2 parameter type – NVARCHAR(4000) – which forced the optimizer to add a CONVERT_IMPLICIT operation, making the predicate non-sargable and the index seek operation inefficient. This is particularly true when the CONVERT_IMPLICIT operation concerns the column in the predicate rather than the parameter. Indeed, in this case the @P2 type is NVARCHAR(4000) whereas the _IDENTIFIER column type is VARCHAR(50), so the optimizer has to convert the _IDENTIFIER column (the lower precedence type) to match the @P2 parameter (the higher precedence type). Note also that the @P1 parameter is not selective enough to prevent a range scan operation in this case.
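
The fix therefore has to happen where the application binds @P2 (for example by sending it as a non-Unicode string). Purely as an illustration, the sketch below shows the same statement with @P2 declared with the column's own type, so that the implicit conversion on _IDENTIFIER disappears and the predicate becomes sargable again:

declare @P0 datetime2
       ,@P1 int
       ,@P2 varchar(50)   -- same type as the _IDENTIFIER column: no CONVERT_IMPLICIT on the column side

set @P0 = GETDATE()
set @P1 = 1
set @P2 = '005245'

UPDATE
	[TABLE]
SET
	[_DATE] = @P0
WHERE
	[_ID] = @P1
    AND
    [_IDENTIFIER] = @P2
;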

Although the actual execution plan shows an index seek with a cardinality of 1, the IO and TIME statistics (SET STATISTICS IO, TIME ON) quickly allowed me to show my customer that the real amount of work was far bigger than the plan suggested.

  CPU time = 0 ms,  elapsed time = 0 ms.
Table 'TABLE'. Scan count 1, logical reads 17152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 609 ms,  elapsed time = 621 ms.

 

Well, do you really think that a true index seek would read that number of pages and consume that amount of CPU per execution? :) … I let you imagine how CPU-intensive this query can be when executed a lot of times in a very short period …

Another point that quickly puts us on the right track is that the comparison is done in the predicate section of the index seek operator, meaning it is checked row by row after the seek predicate has been applied. Thus the index seek operator behaves much like a scan: it reads through the whole range that satisfies the seek predicate and evaluates the residual predicate on every row.

blog 129 - 3- query plan index seek operator

Referring to the IO statistics output above, we can see the operator read 17152 pages, and a quick look at the corresponding index physical stats allowed us to safely assume the storage engine read a big part of the index in this case.

SELECT 
	index_id,
	partition_number,
	index_level,
	page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('TABLE'), NULL, NULL, 'DETAILED')
WHERE index_id = 10

 

Ultimately, we may also benefit from the undocumented trace flag 9130, which highlights the issue in an obvious way by showing a Filter operator (the predicate section of the index seek pop-up) after the index seek operator itself (reading from right to left). From a storage perspective we are far from what the plan first suggested.
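
As a side note — and only as a sketch to try on a test system, since QUERYTRACEON requires elevated privileges and trace flag 9130 is undocumented — the flag can be enabled for a single statement with a query hint:

-- same parameter declarations as in the original statement
UPDATE [TABLE]
SET    [_DATE] = @P0
WHERE  [_ID] = @P1
  AND  [_IDENTIFIER] = @P2
OPTION (QUERYTRACEON 9130);  -- shows the residual predicate as a separate Filter operator in the plan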

blog 129 - 5- TF9130

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'table'. Scan count 1, logical reads 6, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

 

OK, my problem was fixed, but let's jump quickly to another interesting point. Previously I used different tools and tricks to identify the "residual" part of the seek operation. But do I really have to go through all this stuff to identify this kind of issue? Is there an easier way? The good news is yes, if you run SQL Server 2012 SP3+, SQL Server 2014 SP2+ or SQL Server 2016+, because the SQL Server team added runtime information to the showplan.
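
For example, simply capturing the actual execution plan (with SET STATISTICS XML ON, or the actual plan button in SSMS) is enough to get these per-operator runtime counters:

SET STATISTICS XML ON;
-- re-run the statement (same parameter declarations as above);
-- the actual plan returned now contains a <RunTimeInformation> element per operator
UPDATE [TABLE] SET [_DATE] = @P0 WHERE [_ID] = @P1 AND [_IDENTIFIER] = @P2;
SET STATISTICS XML OFF;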

In my context, it drastically simplifies the analysis because runtime statistics are directly exposed per operator. I got interesting information such as ActualRowsRead (3104706 rows) and ActualLogicalReads (17149 pages).

Note that if we refer to the IO statistics from the first query where I got 17152 pages for the table TABLE, the missing 3 pages are in fact on the update index operator.

<RelOp AvgRowSize="36" 
						   EstimateCPU="3.28721" 
						   EstimateIO="12.1472" 
						   EstimateRebinds="0" 
						   EstimateRewinds="0" 
						   EstimatedExecutionMode="Row" 
						   EstimateRows="1" 
						   LogicalOp="Index Seek" 
						   NodeId="3" 
						   Parallel="false" 
						   PhysicalOp="Index Seek" 
						   EstimatedTotalSubtreeCost="15.4344" 
						   TableCardinality="5976470">
                      <OutputList>
                        <ColumnReference 
							Database="[DB]" 
							Schema="[dbo]" 
							Table="[TABLE]" 
							Column="_DELIVERED" />
                      </OutputList>
                      <RunTimeInformation>
                        <RunTimeCountersPerThread Thread="0" 
												  ActualRows="1" 
												  ActualRowsRead="3104706" 
												  ActualEndOfScans="1" 
												  ActualExecutions="1" 
												  ActualElapsedms="1209" 
												  ActualCPUms="747" 
												  ActualScans="1" 
												  ActualLogicalReads="17149" 
												  ActualPhysicalReads="1" 
												  ActualReadAheads="17141" 
												  ActualLobLogicalReads="0" 
												  ActualLobPhysicalReads="0" 
												  ActualLobReadAheads="0" />
                      </RunTimeInformation>

 

Finally, if like me you’re an enthusiast of the famous Plan Explorer tool for troubleshooting execution plans, runtime statistics were also added to this tool. The warning icons and corresponding pop-ups alert us very quickly about the residual IO issue, as shown below:

blog 129 - 6- PlanExplorer execution plan

blog 129 - 6- PlanExplorer

Happy troubleshooting!

 

The post Index seek operator and residual IO appeared first on Blog dbi services.

Windows Server – Service not starting with ‘Error 1067: The process terminated unexpectedly’

Sat, 2018-02-10 02:30

Some time ago, we were installing Migration Center (from fmeAG) on a Windows Server. At the end of the installation, the service named Migration Center Job Server is configured and finally started. Unfortunately this didn’t go well and the start command wasn’t working at all. We were using a dedicated technical account (an AD account) to do the installation and to run this service. This is the error we got:

fmeAG_MC_Service_Start_Fail_Error

 

The error code 1067 means ‘ERROR_PROCESS_ABORTED’. This was the first time I saw this error, ever, so I wanted to know more about it. I performed a lot of tests on this environment to try to narrow down the issue and the following tests in particular were quite relevant:

  • Changing the Log On As user to something else
    • The service is able to start
    • It means that the issue is somehow linked to this particular technical user
  • Checking the ‘Log On As a service’ Local Policies
    • The technical user is listed and allowed to start the service so there is no issue there
  • Using the technical user to start a service on another Windows Server
    • The service is able to start
    • It means that the issue is not linked to the technical user globally

 

So with all this information, it appeared that the issue was linked to this particular technical user, but only locally on this Windows Server… When working with AD accounts, it is always possible to face issues with the local profile vs the domain profile (thanks Stéphane Haby for pointing that out to me, I’m not a Windows expert ;)), so I worked on that and after some time I found a workaround to this issue. The workaround is simply to delete the local profile…

Since this is an AD account (which I will call <SYS_AD_ACCOUNT> below), it is possible to remove the local profile: it will simply be recreated automatically when needed. You just need to remove it properly and, for that, there are specific steps:

  1. Login to the Windows Server with another administrator account
  2. Log out all <SYS_AD_ACCOUNT> sessions: stop all services running with this account, all processes (from the task manager, there should be nothing on the “users” tab), aso… Alternatively, you can also disable the services (or manual mode) and then reboot the Windows Server
  3. Delete the complete folder C:\Users\<SYS_AD_ACCOUNT>. If you already tried this but did not remove the registry properly, you might end up with a C:\Users\TEMP folder… If this is the case, remove it now too.
  4. Delete the registry key matching <SYS_AD_ACCOUNT> (check the “ProfileImagePath” parameter) under “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList”
  5. Reboot the Windows Server
  6. Login to the Windows Server with the administrator account used for the step 1.
  7. Set <SYS_AD_ACCOUNT> for the “Log On As” of the target Service
  8. Start the target Service => It should now be working
  9. Login to the Windows Server with the <SYS_AD_ACCOUNT> account
  10. Open a cmd prompt => The current folder should be C:\Users\<SYS_AD_ACCOUNT> and not C:\Users\TEMP

 

I am not saying that this workaround will fix each and every issue linked to error code 1067… Sometimes this error is simply due to the fact that the service cannot find its executable, for example. But in my case it fixed the issue, so if you are running out of options, maybe you can just give it a try ;).

 

 

The post Windows Server – Service not starting with ‘Error 1067: The process terminated unexpectedly’ appeared first on Blog dbi services.

Documentum – DA 7.3 showing java.lang.NullPointerException on every action

Sat, 2018-02-10 01:45

Last year, we started to upgrade some Documentum Administrator installations from 7.2 to 7.3 and, directly after, we started to see NullPointerExceptions in the log files. We are using DA 7.3 on recent WebLogic Server versions (12.1.3, 12.2.1.2). We usually deploy DA as a WAR file (so not exploded) with just the dfc.properties, keystores and logs outside of it. This is the kind of error we started to see in the startup log file (the nohup log file in our case) as soon as it was upgraded to 7.3:

java.lang.NullPointerException
        at java.io.FileInputStream.<init>(FileInputStream.java:130)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at com.documentum.web.form.WebformTag.fetchExtnNativeVersion(WebformTag.java:282)
        at com.documentum.web.form.WebformTag.renderExtnJavaScript(WebformTag.java:268)
        at com.documentum.web.form.WebformTag.doStartTag(WebformTag.java:159)
        at jsp_servlet._custom._jsp.__loginex._jsp__tag3(__loginex.java:1687)
        at jsp_servlet._custom._jsp.__loginex._jspService(__loginex.java:272)
        at weblogic.servlet.jsp.JspBase.service(JspBase.java:35)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:286)
        at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:260)
        at weblogic.servlet.internal.StubSecurityHelper.invokeServlet(StubSecurityHelper.java:137)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:350)
        at weblogic.servlet.internal.ServletStubImpl.onAddToMapException(ServletStubImpl.java:489)
        at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:376)
        ...

 

Lines 7 and 8 of the stack changed depending on the action being performed. For example, an access to the login page would print the following:

        at jsp_servlet._custom._jsp.__loginex._jsp__tag3(__loginex.java:1687)
        at jsp_servlet._custom._jsp.__loginex._jspService(__loginex.java:272)

 

Then once logged in, you would see each component printing the same kind of stack, with the following lines 7 and 8:

        at jsp_servlet._webtop._main.__mainex._jsp__tag0(__mainex.java:286)
        at jsp_servlet._webtop._main.__mainex._jspService(__mainex.java:116)

 

        at jsp_servlet._webtop._classic.__classic._jsp__tag0(__classic.java:408)
        at jsp_servlet._webtop._classic.__classic._jspService(__classic.java:112)

 

        at jsp_servlet._webtop._titlebar.__titlebar._jsp__tag0(__titlebar.java:436)
        at jsp_servlet._webtop._titlebar.__titlebar._jspService(__titlebar.java:175)

 

        at jsp_servlet._webtop._messagebar.__messagebar._jsp__tag0(__messagebar.java:145)
        at jsp_servlet._webtop._messagebar.__messagebar._jspService(__messagebar.java:107)

 

aso…

We were working with OpenText on this issue. As mentioned in the stack trace, this happens because DA is trying to fetch the “ExtnNativeVersion” property file, located at: wdk/extension/client/EMC/ContentXfer/com.emc.wdk.native/1. Unfortunately, when DA 7.3 tries to locate this file it fails, even though the file is really present… It fails because DA is deployed as a WAR file (an archive) and therefore the path returned for the file is wrong. I suspect this is something that Documentum changed recently, using getRealPath(). To change the behavior of getRealPath(), you have to set the property “Archived Real Path Enabled” to true so that it returns the canonical path of the file…

So to remove these exceptions, you have two options:

I. At the domain level:
  1. Login to the WebLogic Administration Console using your weblogic account
  2. Navigate to the correct page: DOMAIN > Configuration > Web Applications
  3. Click on the ‘Lock & Edit’ button
  4. Check the ‘Archived Real Path Enabled’ checkbox ( = set it to true)
  5. Click on the ‘Save’ and then ‘Activate Changes’ buttons

This will add the configuration to the global config.xml file so it is enabled for the whole domain. As often, I would not recommend that; configure it at the application level instead, because you might have other applications that do NOT want this setting set to true… That is why you have option number two:

II. At the application level:

The high level steps to do that would be the following ones:

  1. Extract the weblogic.xml file from the application war file
  2. Add the ‘Archived Real Path Enabled’ property in it and set it to true
  3. Repackage the war file and redeploy it

This is pretty simple:

[weblogic@weblogic_server_01 ~]$ cd $APPS_HOME
[weblogic@weblogic_server_01 apps]$ jar -xvf da.war WEB-INF/weblogic.xml
 inflated: WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ vi WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ tail -6 WEB-INF/weblogic.xml

   <container-descriptor>
      <show-archived-real-path-enabled>true</show-archived-real-path-enabled>
   </container-descriptor>

</weblogic-web-app>
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ jar -uvf da.war WEB-INF/weblogic.xml
adding: WEB-INF/weblogic.xml(in = 989) (out= 398)(deflated 59%)
[weblogic@weblogic_server_01 apps]$

 

Then you just need to update the deployment in the WebLogic Administration Console and that’s it: the exceptions should be gone. As far as I’m aware, these exceptions did not have any impact on the proper behavior of Documentum Administrator, but it is still very ugly to have hundreds of them in the log file…

 

 

The post Documentum – DA 7.3 showing java.lang.NullPointerException on every action appeared first on Blog dbi services.

WebLogic – SSO/Atn/Atz – Infinite loop

Sat, 2018-02-10 01:00

This blog will be the last of my series around WebLogic SSO (how to enable logs, 403 N°1, 403 N°2) which I started some weeks ago. Several months ago, on a newly built High Availability environment (2 WebLogic Servers behind a Load Balancer), an application team was deploying their D2 client as usual. After the deployment, it appeared to the tester that the Single Sign-On was not working when using the SSO URL through the Load Balancer: for this user, the browser was stuck in an infinite loop between the LB URL and the SAML2 Partner URL. When I tried to replicate the issue, the SSO was working fine for me…

The only possible reason for an infinite loop affecting some users but not all is an issue with the SSO setup/configuration on only one of the two WebLogic Servers… Indeed, the Load Balancer (with sticky sessions) probably redirected the tester to the WLS with the issue while my session was redirected to the other one. Enabling the debug logs (on both WLS) quickly confirmed that all communications going through the first WebLogic Server were working properly (with SSO) while the second WebLogic Server had the infinite loop issue.

The logs from WebLogic Servers 1 and 2 were identical up to a certain point (basically all the SAML2 part), except for some elements such as the local hostname, the date/hour and some IDs (java class instances, aso…). This is the content common to both WLS logs:

<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Filter: Processing request on URI '/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is not a service URI>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'SPinitiator'>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: processing>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: partner id is null>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: use partner binding HTTP/POST>
<Nov 21, 2017 9:25:39 AM UTC> <Debug> <SecuritySAML2Service> <post from template url: null>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Servlet: Processing request on URI '/saml2/sp/acs/post'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/saml2/sp/acs/post'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): service URI is '/sp/acs/post'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'ACS'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Assertion consumer service: processing>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <get SAMLResponse from http request:PBvbnNSJ1cXMlFtZzaXM6bmHhtbG5zm46NhwOlJlOnNhbWxHbWxIb2Fc3wP6dGM6
U0FiB4bWxuczp4NTAwPSJ1cm46b2FzNTDoyLjA6YXNzZXJ0aW9uIaXM6bmFtZXM6
U0FdG9jb2wiIHhtbG5zOmRzaWc9Imh0dHA6Ly93NTDoyLjA6cHJvd3cudzMub3Jn
aHR0cDoa5vcmcvMjAwMS9W5zdGFuY2vL3d3dy53MyYTUxTY2hlbWEtUiIERlc3Rp
MWNxM2FjNzI1ZDjYmIVhNDM1Zlzc3VlSW5zdGFudD0ijhlNjc3OTkiIEMjAxNy0x
LzINpZyMiIHhtwMDAvMDkveG1sZHbG5DovL3d3dy53My5vczOmVuYz0iaHR0cmcv
...
YWRpb24+P1sOkF1ZGllbHJpY3mPHNhNlUmVzdC9zYW1sOkNvbmRpdGlvbnM+bWw6
YXNzUzpjbYW1YXNpczpuIuMlmVmPnVybjpvDphYxzp0YzpTQU1MOjGFzc2VzOlBh
OjAXh0Pj0OjUyWiI+PHNsOkF1dGhuQhxzYW129udGV4bWw6QXV0aG5Db250ZdENs
UmVnRlepBdXRobkNvbzYWmPjwvc2FtbDF1dGhuU1sOkHQ+PC93RhdGVtZW50Pjwv
c2F9zY25zZtbDpBcW1scDpSZXNwb3NlcnRpb24+PCT4=
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <BASE64 decoded saml message:<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:enc="http://www.w3.org/2001/04/xmlenc#" xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" xmlns:x500="urn:oasis:names:tc:SAML:2.0:profiles:attribute:X500" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Destination="https://weblogic_server_01/saml2/sp/acs/post" ID="id-HpDtn7r5XxxAQFYnwSLXZmkuVgIHTExrlreEDZ4-" InResponseTo="_0x7258edc52961ccbd5a435fb13ac67799" IssueInstant="2017-11-21T9:25:55Z" Version="2.0"><saml:Issuer Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://idp_partner_01/fed/idp</saml:Issuer><dsig:Signature><dsig:SignedInfo><dsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><dsig:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><dsig:Reference URI="#id-HpDtn7r5XxxAQFYnwSLXZmkuVgIHTExrlreEDZ4-"><dsig:Transforms><dsig:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><dsig:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/></dsig:Transforms><dsig:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><dsig:DigestValue>YGtUZvsfo3z51AsBo7UDhbd6Ts=</dsig:DigestValue></dsig:Reference></dsig:SignedInfo><dsig:SignatureValue>al8sJwbqzjh1qgM3Sj0QrX1aZjwyI...JB6l4jmj91BdQrYQ7VxFzvNLczZ2brJSdLLig==</dsig:SignatureValue><dsig:KeyInfo><dsig:X509Data><dsig:X509Certificate>MIwDQUg+nhYqGZ7pCgBQAwTTELMAkGA1UEBhMCQk1ZhQ...aATPRCd113tVqsvCkUwpfQ5zyUHaKw4FkXmiT2nzxxHA==</dsig:X509Certificate></dsig:X509Data></dsig:KeyInfo></dsig:Signature><samlp:Status><samlp:StatusCode Value="urn:oasis:names:tc:SAML:2.0:status:Success"/></samlp:Status><saml:Assertion ID="id-0WrMNbOz6wsuZdFPhfjnw7WIXXQ6k89-1AgHZ9Oi" IssueInstant="2017-11-21T9:25:55Z" Version="2.0"><saml:Issuer Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://idp_partner_01/fed/idp</saml:Issuer><dsig:Signature><dsig:SignedInfo><dsig:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/><dsig:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/><dsig:Reference URI="#id-0WrMNbOz6wsuZdFPhfjnw7WIXXQ6k89-1AgHZ9Oi"><dsig:Transforms><dsig:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><dsig:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/></dsig:Transforms><dsig:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/><dsig:DigestValue>7+jZtq8SpY3BKVaFjIFeEJm51cA=</dsig:DigestValue></dsig:Reference></dsig:SignedInfo><dsig:SignatureValue>GIlXt4B4rVFoDJRxidpZO73gXB68Dd+mcpoV9DKrjBBjLRz...zGTDcEYY2MG8FgtarZhVQGc4zxkkSg8GRT6Wng3NEuTUuA==</dsig:SignatureValue><dsig:KeyInfo><dsig:X509Data><dsig:X509Certificate>MIwDQUg+nhYqGZ7pCgBQAwTTELMAkGA1UEBhMCQk1ZhQ...aATPRCd113tVqsvCkUwpfQ5zyUHaKw4FkXmiT2nzxxHA==</dsig:X509Certificate></dsig:X509Data></dsig:KeyInfo></dsig:Signature><saml:Subject><saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified">PATOU_MORGAN</saml:NameID><saml:SubjectConfirmation Method="urn:oasis:names:tc:SAML:2.0:cm:bearer"><saml:SubjectConfirmationData InResponseTo="_0x7258edc52961ccbd5a435fb13ac67799" NotOnOrAfter="2017-11-21T9:30:55Z" Recipient="https://weblogic_server_01/saml2/sp/acs/post"/></saml:SubjectConfirmation></saml:Subject><saml:Conditions NotBefore="2017-11-21T9:25:55Z" 
NotOnOrAfter="2017-11-21T9:30:55Z"><saml:AudienceRestriction><saml:Audience>SAML2_Entity_ID_01</saml:Audience></saml:AudienceRestriction></saml:Conditions><saml:AuthnStatement AuthnInstant="2017-11-21T9:25:55Z" SessionIndex="id-oX9wXdpGmt9GQlVffvY4hEIRULEd25nrxDzE8D7w" SessionNotOnOrAfter="2017-11-21T9:40:55Z"><saml:AuthnContext><saml:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport</saml:AuthnContextClassRef></saml:AuthnContext></saml:AuthnStatement></saml:Assertion></samlp:Response>>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <<samlp:Response> is signed.>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionServiceImpl.assertIdentity(SAML2.Assertion.DOM)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionTokenServiceImpl.assertIdentity(SAML2.Assertion.DOM)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2IdentityAsserterProvider: start assert SAML2 token>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2IdentityAsserterProvider: SAML2IdentityAsserter: tokenType is 'SAML2.Assertion.DOM'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion signature>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: The assertion is signed.>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion signature>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion attributes>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion attributes>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion issuer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion issuer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion conditions>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion conditions>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: Start verify assertion subject>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert: End verify assertion subject>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2NameMapperCache.getNameMapper: Found name mapper in the cache>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAML2Assert.processAttributes - processAttrs: false, processGrpAttrs: true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <DefaultSAML2NameMapperImpl: mapName: Mapped name: qualifier: null, name: PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAMLIACallbackHandler: SAMLIACallbackHandler(true, PATOU_MORGAN, null)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAMLIACallbackHandler: callback[0]: NameCallback: setName(PATOU_MORGAN)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity returning PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.getCachedIdentity(idm:_def_Idmusr:PATOU_MORGAN)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.getCachedIdentity(idm:_def_Idmusr:PATOU_MORGAN) returning null>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityAssertionCallbackServiceImpl.assertIdentity did not find a cached identity.>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.constructor>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl.getJAASIdentityAssertionConfigurationName()>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSJAASLoginServiceImpl$ServiceImpl.authenticate>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login ClassLoader=sun.misc.Launcher$AppClassLoader@7d4991ad>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login ThreadContext ClassLoader Original=com.bea.common.security.utils.SAML2ClassLoader@625ecb90>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl.getAppConfigurationEntry(com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl0)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl.getAppConfigurationEntry>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getAssertionModuleConfiguration>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl$JAASProviderImpl.getProviderAppConfigurationEntry returning LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl, ControlFlag=LoginModuleControlFlag: sufficient>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getClassLoader>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ClassLoader=java.net.URLClassLoader@108c7fe5>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ControlFlag=LoginModuleControlFlag: sufficient>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getAssertionModuleConfiguration>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASIdentityAssertionConfigurationServiceImpl$JAASProviderImpl.getProviderAppConfigurationEntry returning LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl, ControlFlag=LoginModuleControlFlag: required>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.legacy.service.JAASIdentityAssertionProviderImpl$V1Wrapper.getClassLoader>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ClassLoader=java.net.URLClassLoader@108c7fe5>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.wrap ControlFlag=LoginModuleControlFlag: required>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login created LoginContext>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login ThreadContext ClassLoader Current=com.bea.common.security.utils.SAML2ClassLoader@625ecb90>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize LoginModuleClassName=weblogic.security.providers.authentication.LDAPAtnLoginModuleImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize ClassLoader=java.net.URLClassLoader@108c7fe5>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize created delegate login module>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP ATN LoginModule initialized>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.initialize delegated>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.login>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Login>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle callbcacks[0] will be delegated>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle callbcacks[0] will use NameCallback to retrieve name>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle will delegate all callbacks>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Atn> <SAMLIACallbackHandler: callback[0]: NameCallback: setName(PATOU_MORGAN)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle delegated callbacks>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.CallbackHandlerWrapper.handle got username from callbacks[0], UserName=PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Login username: PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getUserDNName? user:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getDNForUser search("ou=people,ou=internal,dc=company,dc=com", "(&(uid=PATOU_MORGAN)(objectclass=companyperson))", base DN & below)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getConnection return conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <ConnSetupMgr><Connecting to host=ldap.company.com, ssl port=3636>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <ConnSetupMgr><Successfully connected to host=ldap.company.com, ssl port=3636>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved guid:c6e6f021-2b29d761-8284c0aa-df200d0f>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <DN for user PATOU_MORGAN: uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <found user from ldap, user:PATOU_MORGAN, userDN=uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved username from LDAP :PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <returnConnection conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Asserted Identity for PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <List groups that member: PATOU_MORGAN belongs to>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <DN for user PATOU_MORGAN: uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved dn:uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com for user:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <search("ou=bind_user,ou=applications,ou=internal,dc=company,dc=com", "(&(uniquemember=uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com)(objectclass=groupofuniquenames))", base DN & below)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <getConnection return conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Result has more elements: false>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <search("ou=bind_user,ou=applications,ou=internal,dc=company,dc=com", "(objectclass=groupofURLs)", base DN & below)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <returnConnection conn:LDAPConnection {ldaps://ldap.company.com:3636 ldapVersion:3 bindDN:"ou=bind_user,ou=applications,ou=internal,dc=company,dc=com"}>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <DN for user PATOU_MORGAN: uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Retrieved dn:uid=PATOU_MORGAN,ou=people,ou=internal,dc=company,dc=com for user:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <login succeeded for username PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.login delegated, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.commit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Commit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <LDAP Atn Principals Added>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.LoginModuleWrapper.commit delegated, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login logged in>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login subject=Subject:
        Principal: PATOU_MORGAN
        Private Credential: PATOU_MORGAN
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSIdentityServiceImpl.getIdentityFromSubject Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principals)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) Principal=PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) PrincipalClassName=weblogic.security.principal.WLSUserImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) trying PrincipalValidator for interface weblogic.security.principal.WLSPrincipal>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) PrincipalValidator handles this PrincipalClass>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Generated signature and signed WLS principal PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) PrincipalValidator signed the principal>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.sign(Principal) All required PrincipalValidators signed this PrincipalClass, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.JAASLoginServiceImpl.login identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSJAASLoginServiceImpl$ServiceImpl.authenticate authenticate succeeded for user PATOU_MORGAN, Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <weblogic.security.service.internal.WLSJAASLoginServiceImpl$ServiceImpl.authenticate login succeeded and PATOU_MORGAN was not previously locked out>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.cachedIdentity(Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
),cacheKe y= idm:_def_Idmusr:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Using redirect URL from request cache: 'https://d2.company.com:443/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Redirecting to URL: https://d2.company.com:443/D2/X3_Portal.jsp>

 

At this point, the SAML2 exchange is successful and the user has been found in the LDAP. The only thing remaining is the validation of the user on the WebLogic side as well as the access to the application (D2 in this case). That’s where the logs of the two WLS started to diverge.

On the WebLogic Server 1 (no issue), we can see WebLogic validating the user and finally granting access to the application:

<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.cachedIdentity(Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
),cacheKe y= idm:_def_Idmusr:PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Using redirect URL from request cache: 'https://d2.company.com:443/D2/X3_Portal.jsp'>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecuritySAML2Service> <Redirecting to URL: https://d2.company.com:443/D2/X3_Portal.jsp>

<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <PrincipalAuthenticator.validateIdentity>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <PrincipalAuthenticator.validateIdentity will use common security service>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principals)>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) Principal=PATOU_MORGAN>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) PrincipalClassName=weblogic.security.principal.WLSUserImpl>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) trying PrincipalValidator for interface weblogic.security.principal.WLSPrincipal>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) PrincipalValidator handles this PrincipalClass>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <Validate WLS principal PATOU_MORGAN returns true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) PrincipalValidator said the principal is valid>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principal) One or more PrincipalValidators handled this PrincipalClass, returning true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.PrincipalValidationServiceImpl.validate(Principals) validated all principals>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <AuthorizationManager will use common security for ATZ>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <weblogic.security.service.WLSAuthorizationServiceWrapper.isAccessAllowed>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Roles=[ "Anonymous" "consumer" ]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Resource=type=<url>, application=D2, contextPath=/D2, uri=/X3_Portal.jsp, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Direction=ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): input arguments:>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Subject: 1
        Principal = weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Roles:Anonymous, consumer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Resource: type=<url>, application=D2, contextPath=/D2, uri=/X3_Portal.jsp, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Direction: ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> < Context Handler: >
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Accessed Subject: Id=urn:oasis:names:tc:xacml:2.0:subject:role, Value=[Anonymous,consumer]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Evaluate urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of([consumer,consumer],[Anonymous,consumer]) -> true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <primary-rule evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <urn:bea:xacml:2.0:entitlement:resource:type@E@Furl@G@M@Oapplication@ED2@M@OcontextPath@E@UD2@M@Ouri@E@U@K@M@OhttpMethod@EGET, 1.0 evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): returning PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AuthorizationServiceImpl.isAccessAllowed returning adjudicated: true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <AuthorizationManager will use common security for ATZ>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <weblogic.security.service.WLSAuthorizationServiceWrapper.isAccessAllowed>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Roles=[ "Anonymous" "consumer" ]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Resource=type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/x3_portal.nocache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Direction=ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): input arguments:>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Subject: 1
        Principal = weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Roles:Anonymous, consumer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Resource: type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/x3_portal.nocache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Direction: ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Context Handler: >
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Accessed Subject: Id=urn:oasis:names:tc:xacml:2.0:subject:role, Value=[Anonymous,consumer]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Evaluate urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of([consumer,consumer],[Anonymous,consumer]) -> true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <primary-rule evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <urn:bea:xacml:2.0:entitlement:resource:type@E@Furl@G@M@Oapplication@ED2@M@OcontextPath@E@UD2@M@Ouri@E@U@K@M@OhttpMethod@EGET, 1.0 evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): returning PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AuthorizationServiceImpl.isAccessAllowed returning adjudicated: true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <AuthorizationManager will use common security for ATZ>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <weblogic.security.service.WLSAuthorizationServiceWrapper.isAccessAllowed>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Identity=Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Roles=[ "Anonymous" "consumer" ]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Resource=type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/79E87564DA9D026D7ADA9326E414D4FC.cache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed Direction=ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): input arguments:>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Subject: 1
        Principal = weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Roles:Anonymous, consumer>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Resource: type=<url>, application=D2, contextPath=/D2, uri=/x3_portal/79E87564DA9D026D7ADA9326E414D4FC.cache.js, httpMethod=GET>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Direction: ONCE>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <        Context Handler: >
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Accessed Subject: Id=urn:oasis:names:tc:xacml:2.0:subject:role, Value=[Anonymous,consumer]>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <Evaluate urn:oasis:names:tc:xacml:1.0:function:string-at-least-one-member-of([consumer,consumer],[Anonymous,consumer]) -> true>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <primary-rule evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <urn:bea:xacml:2.0:entitlement:resource:type@E@Furl@G@M@Oapplication@ED2@M@OcontextPath@E@UD2@M@Ouri@E@U@K@M@OhttpMethod@EGET, 1.0 evaluates to Permit>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <XACML Authorization isAccessAllowed(): returning PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AccessDecisionServiceImpl.isAccessAllowed AccessDecision returned PERMIT>
<Nov 21, 2017 9:25:55 AM UTC> <Debug> <SecurityAtz> <com.bea.common.security.internal.service.AuthorizationServiceImpl.isAccessAllowed returning adjudicated: true>


On the other hand, the WebLogic Server 2 (the one with the issue) didn’t even start validating the user; it just restarted the full SAML2 process:

<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecurityAtn> <com.bea.common.security.internal.service.IdentityCacheServiceImpl.cachedIdentity(Subject: 1
        Principal = class weblogic.security.principal.WLSUserImpl("PATOU_MORGAN")
),cacheKey= idm:_def_Idmusr:PATOU_MORGAN>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <Using redirect URL from request cache: 'https://d2.company.com:443/D2/X3_Portal.jsp'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <Redirecting to URL: https://d2.company.com:443/D2/X3_Portal.jsp>

<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Filter: Processing request on URI '/D2/X3_Portal.jsp'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/D2/X3_Portal.jsp'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is not a service URI>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'SPinitiator'>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: processing>
<Nov 21, 2017 10:34:53 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: partner id is null>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <SP initiating authn request: use partner binding HTTP/POST>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <post from template url: null>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <SAML2Servlet: Processing request on URI '/saml2/sp/acs/post'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): request URI is '/saml2/sp/acs/post'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): service URI is '/sp/acs/post'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <getServiceTypeFromURI(): returning service type 'ACS'>
<Nov 21, 2017 10:34:54 AM UTC> <Debug> <SecuritySAML2Service> <Assertion consumer service: processing>


This WebLogic Server 2 is acting as if this were a completely new request and not an existing one… That was exactly my reaction when I saw this log, so I checked the session identifier!

Whenever you configure an application to support SAML2 SSO, the session identifier must be the default one. If you set anything else, you end up with this infinite authentication loop, because WebLogic considers a successful authentication as a non-authenticated one. This is described in the Oracle documentation.
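
For illustration, this is the kind of non-default session-descriptor in WEB-INF/weblogic.xml that triggers the loop (the cookie name below is a made-up example, not necessarily the value you will find):

  <session-descriptor>
    <cookie-name>D2SESSIONID</cookie-name>
    <cookie-http-only>false</cookie-http-only>
  </session-descriptor>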

Solving this issue is pretty simple… Either comment out the cookie-name entry or set it back to the default value (below I’m just resetting it to the default value):

[weblogic@weblogic_server_01 ~]$ cd $APPS_HOME
[weblogic@weblogic_server_01 apps]$ jar -xvf D2.war WEB-INF/weblogic.xml
 inflated: WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ sed -i 's,<cookie-name>[^<]*<,<cookie-name>JSESSIONID<,' WEB-INF/weblogic.xml
[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ grep -A3 -B2 cookie-name WEB-INF/weblogic.xml

  <session-descriptor>
    <cookie-name>JSESSIONID</cookie-name>
    <cookie-http-only>false</cookie-http-only>
  </session-descriptor>

[weblogic@weblogic_server_01 apps]$
[weblogic@weblogic_server_01 apps]$ jar -uvf D2.war WEB-INF/weblogic.xml
adding: WEB-INF/weblogic.xml(in = 901) (out= 432)(deflated 52%)
[weblogic@weblogic_server_01 apps]$


Then you obviously need to redeploy your application and it will work again. :)

The article WebLogic – SSO/Atn/Atz – Infinite loop appeared first on Blog dbi services.

Server process name in Postgres and Oracle

Fri, 2018-02-09 16:01

Every database analysis should start with system load analysis. If the host is starved for CPU, then looking at other statistics can be pointless. With ‘top’ on Linux, or an equivalent such as Process Explorer on Windows, you see the processes (and threads). If the name of a process is meaningful, you already have a clue about the active sessions. Postgres goes further by showing the operation (which SQL command), the state (running or waiting), and the identification of the client.

Postgres

By default ‘top’ displays the program name (like ‘comm’ in /proc or in ‘ps’ format), which will be ‘postgres’ for all PostgreSQL processes. But you can also display the command line with ‘c’ in interactive mode, or by starting directly with ‘top -c’, which shows the same as /proc/$pid/cmdline, or ‘cmd’ / ‘args’ in ‘ps’ format.
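
The same difference is easy to see with ps itself; a quick sketch, assuming a host with running Postgres backends:

# 'comm' is the program name only; 'args' is the full command line / process title
ps -C postgres -o pid,comm,args | head

The ‘comm’ column shows only ‘postgres’ for every backend, while ‘args’ shows the full process title described below.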


top -c
 
Tasks: 263 total, 13 running, 250 sleeping, 0 stopped, 0 zombie
%Cpu(s): 24.4 us, 5.0 sy, 0.0 ni, 68.5 id, 0.9 wa, 0.0 hi, 1.2 si, 0.0 st
KiB Mem : 4044424 total, 558000 free, 2731380 used, 755044 buff/cache
KiB Swap: 421884 total, 418904 free, 2980 used. 2107088 avail Mem
 
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20347 postgres 20 0 394760 11660 8696 S 7.6 0.3 0:00.49 postgres: demo demo 192.168.56.125(37664) DELETE
20365 postgres 20 0 393816 11448 8736 S 6.9 0.3 0:00.37 postgres: demo demo 192.168.56.125(37669) idle
20346 postgres 20 0 393800 11440 8736 S 6.6 0.3 0:00.37 postgres: demo demo 192.168.56.125(37663) UPDATE
20356 postgres 20 0 396056 12480 8736 S 6.6 0.3 0:00.42 postgres: demo demo 192.168.56.125(37667) INSERT
20357 postgres 20 0 393768 11396 8736 S 6.6 0.3 0:00.40 postgres: demo demo 192.168.56.125(37668) DELETE waiting
20366 postgres 20 0 394728 11652 8736 S 6.6 0.3 0:00.35 postgres: demo demo 192.168.56.125(37670) UPDATE
20387 postgres 20 0 394088 11420 8720 S 6.6 0.3 0:00.41 postgres: demo demo 192.168.56.125(37676) UPDATE
20336 postgres 20 0 395032 12436 8736 S 6.3 0.3 0:00.37 postgres: demo demo 192.168.56.125(37661) UPDATE
20320 postgres 20 0 395032 12468 8736 R 5.9 0.3 0:00.33 postgres: demo demo 192.168.56.125(37658) DROP TABLE
20348 postgres 20 0 395016 12360 8736 R 5.9 0.3 0:00.33 postgres: demo demo 192.168.56.125(37665) VACUUM
20371 postgres 20 0 396008 12708 8736 R 5.9 0.3 0:00.40 postgres: demo demo 192.168.56.125(37673) INSERT
20321 postgres 20 0 396040 12516 8736 D 5.6 0.3 0:00.31 postgres: demo demo 192.168.56.125(37659) INSERT
20333 postgres 20 0 395016 11920 8700 R 5.6 0.3 0:00.36 postgres: demo demo 192.168.56.125(37660) UPDATE
20368 postgres 20 0 393768 11396 8736 R 5.6 0.3 0:00.43 postgres: demo demo 192.168.56.125(37671) UPDATE
20372 postgres 20 0 393768 11396 8736 R 5.6 0.3 0:00.36 postgres: demo demo 192.168.56.125(37674) INSERT
20340 postgres 20 0 394728 11700 8736 S 5.3 0.3 0:00.40 postgres: demo demo 192.168.56.125(37662) idle
20355 postgres 20 0 394120 11628 8672 S 5.3 0.3 0:00.32 postgres: demo demo 192.168.56.125(37666) DELETE waiting
20389 postgres 20 0 395016 12196 8724 R 5.3 0.3 0:00.37 postgres: demo demo 192.168.56.125(37677) UPDATE
20370 postgres 20 0 393768 11392 8736 S 4.6 0.3 0:00.34 postgres: demo demo 192.168.56.125(37672) DELETE
20376 postgres 20 0 393816 11436 8736 S 4.6 0.3 0:00.37 postgres: demo demo 192.168.56.125(37675) DELETE waiting
20243 postgres 20 0 392364 5124 3696 S 1.0 0.1 0:00.06 postgres: wal writer process

This is very useful information. Postgres changes the process title when it executes a statement. In this example:

  • ‘postgres:’ is the name of the process
  • ‘demo demo’ are the database name and the user name
  • ‘192.168.56.125(37664)’ are the IP address and port of the client.
  • DELETE, UPDATE… are the commands. They are more or less the command name shown in the feedback after the command completes
  • ‘idle’ is for sessions not currently running a statement
  • ‘waiting’ is added when the session is waiting on a blocker session (enqueued on a lock for example)
  • ‘wal writer process’ is a background process

This is especially useful because we get, in the same sample, the Postgres session state (idle, waiting, or running an operation) together with the Linux process state (S when sleeping, R when runnable or running, D when in I/O, …).
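
If you prefer a one-shot sample rather than an interactive top, something like this gives the same picture (a minimal sketch; pick the columns you like):

# Linux process state (stat) plus the Postgres process title, busiest backends first
ps -C postgres -o pid,stat,pcpu,args --sort=-pcpu | head -20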

Oracle

With Oracle, you have ASH to sample the session state, but being able to see it at the OS level would be great. It would also be a safeguard if we ever need to kill a process.

But the Oracle process names do not change while the session is running: they are set at connection time.

The background processes mention the Oracle process name and the Instance name:

[oracle@VM122 ~]$ ps -u oracle -o pid,comm,cmd,args | head
 
PID COMMAND CMD COMMAND
1873 ora_pmon_cdb2 ora_pmon_CDB2 ora_pmon_CDB2
1875 ora_clmn_cdb2 ora_clmn_CDB2 ora_clmn_CDB2
1877 ora_psp0_cdb2 ora_psp0_CDB2 ora_psp0_CDB2
1880 ora_vktm_cdb2 ora_vktm_CDB2 ora_vktm_CDB2
1884 ora_gen0_cdb2 ora_gen0_CDB2 ora_gen0_CDB2

The foreground processes mention the instance name and the connection type: LOCAL=YES for bequeath connections, LOCAL=NO for remote connections through the listener.


[oracle@VM122 ~]$ ps -u oracle -o pid,comm,cmd,args | grep -E "[ ]oracle_|[ ]PID"
 
PID COMMAND CMD COMMAND
21429 oracle_21429_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21431 oracle_21431_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21451 oracle_21451_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21517 oracle_21517_cd oracleCDB1 (LOCAL=NO) oracleCDB1 (LOCAL=NO)

You need to join V$PROCESS with V$SESSION on (V$PROCESS.ADDR=V$SESSION.PADDR) to find the state, operation, and client information.
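
A minimal example of that join (adapt the column list to what you need):

-- map the OS process id (SPID) to the session state, current SQL and client
select p.spid, s.sid, s.serial#, s.username, s.status, s.sql_id, s.event, s.machine
  from v$process p
  join v$session s on s.paddr = p.addr
 where s.type = 'USER';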

Just for fun, you can change the program name (ARGV0) and the arguments (ARGS).

For local connections, you can change the name in the BEQueath connection string:


sqlplus -s system/oracle@"(ADDRESS=(PROTOCOL=BEQ)(PROGRAM=$ORACLE_HOME/bin/oracle)(ARGV0=postgres)(ARGS='(DESCRIPTION=(LOCAL=MAYBE)(ADDRESS=(PROTOCOL=BEQ)))')(ENVS='ORACLE_HOME=$ORACLE_HOME,ORACLE_SID=CDB1'))" <<< "host ps -u oracle -o pid,comm,cmd,args | grep -E '[ ]oracle_|[ ]PID'"
 
PID COMMAND CMD COMMAND
21155 oracle_21155_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21176 oracle_21176_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21429 oracle_21429_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21431 oracle_21431_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21451 oracle_21451_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21517 oracle_21517_cd oracleCDB1 (LOCAL=NO) oracleCDB1 (LOCAL=NO)
22593 oracle_22593_cd postgres (DESCRIPTION=(LOCA postgres (DESCRIPTION=(LOCAL=MAYBE)(ADDRESS=(PROTOCOL=BEQ)))

The remote connections can have the name changed through static registration, by adding an ARGV0 value on the listener side:


LISTENER=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=0.0.0.0)(PORT=1521)))
SID_LIST_LISTENER=(SID_LIST=
(SID_DESC=(GLOBAL_DBNAME=MYAPP)(ARGV0=myapp)(SID_NAME=CDB1)(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1))
(SID_DESC=(GLOBAL_DBNAME=CDB1_DGMGRL)(SID_NAME=CDB1)(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1))
(SID_DESC=(GLOBAL_DBNAME=CDB2_DGMGRL)(SID_NAME=CDB2)(ORACLE_HOME=/u01/app/oracle/product/12.2.0/dbhome_1))
)

After reloading the listener with this (ARGV0=myapp) to identify connections coming through the MYAPP service, the new name shows up:

[oracle@VM122 ~]$ sqlplus -s system/oracle@//localhost/MYAPP <<< "host ps -u oracle -o pid,comm,cmd,args | grep -E '[ ]oracle_|[ ]PID'"
PID COMMAND CMD COMMAND
21155 oracle_21155_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21176 oracle_21176_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21429 oracle_21429_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21431 oracle_21431_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21451 oracle_21451_cd oracleCDB2 (LOCAL=NO) oracleCDB2 (LOCAL=NO)
21517 oracle_21517_cd oracleCDB1 (LOCAL=NO) oracleCDB1 (LOCAL=NO)
24261 oracle_24261_cd myapp (LOCAL=NO) myapp (LOCAL=NO)

However, I would not recommend changing the defaults. This can be very confusing for people expecting ora_xxxx_SID and oracleSID process names.

The article Server process name in Postgres and Oracle appeared first on Blog dbi services.

12cR2 PDB archive

Wed, 2018-02-07 13:50

In 12.1 we had the possibility to unplug a PDB by closing it and generating a .xml file that describes the PDB metadata required to plug the datafiles into another CDB.
In 12.2 we got the additional possibility to have this .xml file zipped together with the datafiles, for easy transport. But that was not working for ASM files.
The latest Release Update (Oct 2017) includes the patch that fixes this issue, which is a good occasion to show the PDB archive feature.

Here is Oracle 12.2.0.1 with the Oct 2017 Release Update (https://updates.oracle.com/download/26737266.html) applied (it needs the latest OPatch: https://updates.oracle.com/download/6880880.html).
With a PDB1 pluggable database:

[oracle@VM106 ~]$ rman target /
 
Recovery Manager: Release 12.2.0.1.0 - Production on Wed Oct 18 16:16:41 2017
 
Copyright (c) 1982, 2017, Oracle and/or its affiliates. All rights reserved.
 
connected to target database: CDB1 (DBID=920040307)
 
RMAN> report schema;
 
using target database control file instead of recovery catalog
Report of database schema for database with db_unique_name CDB1
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 810 SYSTEM YES /acfs/oradata/CDB1/datafile/o1_mf_system_dmrbv534_.dbf
3 540 SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrbxvds_.dbf
4 70 UNDOTBS1 YES /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrbz8mm_.dbf
5 250 PDB$SEED:SYSTEM NO /acfs/oradata/CDB1/datafile/o1_mf_system_dmrc52tm_.dbf
6 330 PDB$SEED:SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrc52t9_.dbf
7 5 USERS NO /acfs/oradata/CDB1/datafile/o1_mf_users_dygrpz79_.dbf
8 100 PDB$SEED:UNDOTBS1 NO /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrc52x0_.dbf
21 250 PDB1:SYSTEM YES /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_system_dygrqqq2_.dbf
22 350 PDB1:SYSAUX NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_sysaux_dygrqqs8_.dbf
23 100 PDB1:UNDOTBS1 YES +ASM1/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/DATAFILE/undotbs1.257.957719779
 
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 33 TEMP 32767 /acfs/oradata/CDB1/datafile/o1_mf_temp_dmrc4wlh_.tmp
2 64 PDB$SEED:TEMP 32767 /acfs/oradata/CDB1/pdbseed/temp012017-06-10_19-17-38-745-PM.dbf
3 64 PDB1:TEMP 32767 /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_temp_dygrqqsh_.dbf

I have moved one file to ASM to show that it is now handled correctly.

The pluggable database must be closed before we can unplug it. Nothing changes in the unplug syntax except the extension of the file: if the file mentioned is a .pdb instead of a .xml, then it is a PDB archive:

RMAN> alter pluggable database PDB1 unplug into '/var/tmp/PDB1.pdb';
 
RMAN> alter pluggable database PDB1 close;
 
Statement processed
 
RMAN> alter pluggable database PDB1 unplug into '/var/tmp/PDB1.pdb'
2> ;
 
Statement processed
 
RMAN> exit

Actually it is just a zip file containing the datafiles (without their full paths) together with the .xml manifest:

[oracle@VM106 ~]$ unzip -t /var/tmp/PDB1.pdb
Archive: /var/tmp/PDB1.pdb
testing: o1_mf_system_dygrqqq2_.dbf OK
testing: o1_mf_sysaux_dygrqqs8_.dbf OK
testing: undotbs1.257.957719779 OK
testing: /var/tmp/PDB1.xml OK
No errors detected in compressed data of /var/tmp/PDB1.pdb.

You can see that the ASM file is handled no differently from the others.

I drop the pluggable database:

RMAN> drop pluggable database PDB1 including datafiles;
 
using target database control file instead of recovery catalog
Statement processed
 

And I plug PDB1 back, as PDB2, using the zip file:

RMAN> create pluggable database PDB2 using '/var/tmp/PDB1.pdb';
 
Statement processed
 
RMAN> report schema;
 
Report of database schema for database with db_unique_name CDB1
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1 810 SYSTEM YES /acfs/oradata/CDB1/datafile/o1_mf_system_dmrbv534_.dbf
3 540 SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrbxvds_.dbf
4 70 UNDOTBS1 YES /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrbz8mm_.dbf
5 250 PDB$SEED:SYSTEM NO /acfs/oradata/CDB1/datafile/o1_mf_system_dmrc52tm_.dbf
6 330 PDB$SEED:SYSAUX NO /acfs/oradata/CDB1/datafile/o1_mf_sysaux_dmrc52t9_.dbf
7 5 USERS NO /acfs/oradata/CDB1/datafile/o1_mf_users_dygrpz79_.dbf
8 100 PDB$SEED:UNDOTBS1 NO /acfs/oradata/CDB1/datafile/o1_mf_undotbs1_dmrc52x0_.dbf
24 250 PDB2:SYSTEM NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_system_dygwt1lh_.dbf
25 350 PDB2:SYSAUX NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_sysaux_dygwt1lm_.dbf
26 100 PDB2:UNDOTBS1 NO /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_undotbs1_dygwt1lo_.dbf
 
List of Temporary Files
=======================
File Size(MB) Tablespace Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1 33 TEMP 32767 /acfs/oradata/CDB1/datafile/o1_mf_temp_dmrc4wlh_.tmp
2 64 PDB$SEED:TEMP 32767 /acfs/oradata/CDB1/pdbseed/temp012017-06-10_19-17-38-745-PM.dbf
4 64 PDB2:TEMP 32767 /acfs/oradata/CDB1/5BD3ED9D73B079D2E0536A4EA8C0967B/datafile/o1_mf_temp_dygwt1lp_.dbf

All the files are there, created in the db_create_file_dest.

File name convert

When you create a pluggable database and you are not using OMF, you need to add a FILE_NAME_CONVERT to map the source file names to the destination file names. When the files are referenced by a .xml file, the .xml references the paths as they were in the source database. If you move them, you can either update the .xml file or use SOURCE_FILE_NAME_CONVERT to mention the new location. With a .pdb archive, the .xml inside still contains the original paths, but they are not what is used: the path of the .pdb itself is used, as if the files were unzipped at that place.
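
As an illustration, a plug from a .xml manifest with both converts could look like this (a sketch only; the paths are placeholders to adapt):

create pluggable database PDB2 using '/var/tmp/PDB1.xml'
  source_file_name_convert = ('/oradata/CDB1/PDB1/','/var/tmp/')
  file_name_convert = ('/var/tmp/','/u01/oradata/CDB1/PDB2/');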

If you use Oracle Managed Files, you don’t care about the file names and you don’t need any of those file name converts.

The article 12cR2 PDB archive appeared first on Blog dbi services.

JAN18: Database 11gR2 PSU, 12cR1 ProactiveBP, 12cR2 RU

Tue, 2018-02-06 15:32

If you want to apply the latest patches (and you should), you can go to the My Oracle Support Recommended Patch Advisor. But sometimes it is not up to date. For example, for 12.1.0.2 only the PSU is displayed and not the Proactive Bundle Patch, which is highly recommended. And across releases, the names have changed and can be misleading: 11.2.0.4 has the PSU (no Proactive Bundle Patch except for Engineered Systems). 12.1.0.2 can have the SPU, the PSU, or the Proactive BP, and the latter is highly recommended, especially now that it includes the adaptive statistics patches. 12.2.0.1 introduces the new RUR and RU, the RU being the recommended one.

To make things clear, there’s also the Master Note for Database Proactive Patch Program, which references one note per release. This blog post is my own master note, linking directly to the recommended updates for Oracle Database.

Master Note for Database Proactive Patch Program (Doc ID 756671.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=756671.1

11.2.0.4 – PSU

Database 11.2.0.4 Proactive Patch Information (Doc ID 2285559.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=2285559.1
Paragraph -> 11.2.0.4 Database Patch Set Update

Latest as of Q1 2018 -> 16-Jan-2018 11.2.0.4.180116 (Jan 2018) Database Patch Set Update (DB PSU) 26925576 (Windows: 26925576)

12.1.0.2  – ProactiveBP

Database 12.1.0.2 Proactive Patch Information (Doc ID 2285558.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=2285558.1
Paragraph -> 12.1.0.2 Database Proactive Bundle Patches (DBBP)

Latest as of Q1 2018 -> 16-Jan-2018 12.1.0.2.180116 Database Proactive Bundle Patch (Jan 2018) 12.1.0.2.180116 27010930

12.2.0.1 – RU

Database 12.2.0.1 Proactive Patch Information (Doc ID 2285557.1)
https://support.oracle.com/epmos/faces/DocContentDisplay?id=2285557.1
Paragraph -> 12.2.0.1 Database Release Update (Update)

Latest as of Q1 2018 -> 16-Jan-2018 12.2.0.1.180116 (Jan 2018) Database Release Update 27105253 (Windows: 12.2.0.1.180116 WIN DB BP 27162931)
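
Once downloaded, applying the Release Update is the usual OPatch workflow. A rough sketch (always follow the patch README; the zip file name depends on your platform):

# stop the listeners and instances running from this Oracle Home first
unzip p27105253_122010_Linux-x86-64.zip
cd 27105253
$ORACLE_HOME/OPatch/opatch apply
# then restart the databases and load the SQL changes into each of them
$ORACLE_HOME/OPatch/datapatch -verbose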

Don’t forget SQL Developer

SQL Developer is installed in the 12c Oracle Home, but you should update it to the latest version.
Download the following from http://www.oracle.com/technetwork/developer-tools/sql-developer/downloads/index.html
– The SQL Developer zip for ‘Other Platforms’, currently 17.4
– The SQLcl zip for ‘All Platforms’, currently 17.4

On the server, remove or rename the original directory:
mv $ORACLE_HOME/sqldeveloper $ORACLE_HOME/sqldeveloper.orig

Unzip what you have downloaded:
unzip -d $ORACLE_HOME/ sqldeveloper-*-no-jre.zip
unzip -d $ORACLE_HOME/sqldeveloper sqlcl-*-no-jre.zip

I suggest having a login.sql that sets the beautiful ansiconsole output format for SQLcl:

echo "set sqlformat ansiconsole" > $ORACLE_HOME/sqldeveloper/sqlcl/login.sql

On 12.2 you can run SQLcl just with ‘sql’ (and the same arguments as sqlplus: / as sysdba, /nolog, …) because this is what is defined in $ORACLE_HOME/bin.
However, that wrapper changes the current working directory, and I prefer to keep the current one, as it is probably where I want to run scripts from.

Then I add the following aliases to .bashrc:

alias sqlcl='JAVA_HOME=$ORACLE_HOME/jdk SQLPATH=$ORACLE_HOME/sqldeveloper/sqlcl bash $ORACLE_HOME/sqldeveloper/sqlcl/bin/sql'
alias sqldev='$ORACLE_HOME/sqldeveloper/sqldeveloper.sh'

When running SQL Developer for the first time, you can automatically create a ‘/ as sysdba’ connection (but remember that connecting like this is not a good practice) and a connection for each user declared in the database: right-click on Connections and choose Create Local Connections.

The article JAN18: Database 11gR2 PSU, 12cR1 ProactiveBP, 12cR2 RU appeared first on Blog dbi services.
