Friday, April 6, 2012

Polling with the DbAdapter in a clustered environment

Introduction

Most customers use a clustered production environment. The development environment is often not clustered. There are several things to consider for developers when the software developed will eventually run in a clustered environment. It would be a shame if the software has been developed, unit tested, system tested, accepted by the users and then breaks on the production system.

I will first discuss the DbAdapter and polling in this post. This is not a complete description of all the settings which can influence this behavior, just some things I've tried and problems I've encountered.

I've used the (active-active cluster) setup as described in;
http://javaoraclesoa.blogspot.com/2012/03/oracle-soa-suite-cluster-part-1.html
http://javaoraclesoa.blogspot.com/2012/03/oracle-soa-suite-cluster-part-2.html

This article is about the DbAdapter. An error which can occur when using the AqAdapter when dequeueing in a clustered environment is that a message is queued once and dequeued more then once. This can occur in 11.2 databases. Look at; http://www.oracle.com/technetwork/middleware/docs/aiasoarelnotesps5-1455925.html for a description on how to fix this. Below has been copied from the mentioned document;

Bug: 13729601
Added: 20-February-2012
Platform: All
The dequeuer returns the same message in multiple threads in high concurrency environments when Oracle database 11.2 is used. This means that some messages are dequeued more than once. For example, in Oracle SOA Suite, if Service 1 suddenly raises a large number of business events that are subscribed to by Service 2, duplicate instances of Service 2 triggered by the same event may be seen in an intermittent fashion. The same behavior is not observed with a 10.2.0.5 database or in an 11.2 database with event10852 level 16384 set to disable the 11.2 dequeue optimizations.

Workaround: Perform the following steps:

    Log in to the 11.2 database:
    CONNECT /AS SYSDBA

    Specify the following SQL command in SQL*Plus to disable the 11.2 dequeue optimizations:
    SQL> alter system set event='10852 trace name context forever,
    level 16384'scope=spfile;

Polling setup

In an active/active cluster configuration, a deployed process will have two instances of a process polling on the same table. In this case it is important to consider if it will be a problem if more then one instance picks up the same entry in the table.

I used the following database setup to simulate and log the test

The script contains three tables;
POLLING_TEST_CLUSTER
- this table will be used by the DbAdapter for polling
POLLING_TEST_LOG
- this table will log status changes in POLLING_TEST_CLUSTER (POLLING_TEST_CLUSTER has a before update trigger)
POLLING_TEST_OUTPUT
- a BPEL process will read from POLLING_TEST_CLUSTER and put entries in this table. this table has a unique constraint on the ID column. the same ID is used as in POLLING_TEST_CLUSTER thus if the same entry from the  POLLING_TEST_CLUSTER table is picked up twice by BPEL, it will cause a unique key constraint when it tries to insert the entry in POLLING_TEST_CLUSTER

I've used a 'pragma autonomous_transaction' in the logging procedure. This will fail (with an ORA-06519: active autonomous transaction detected and rolled back) if I don't end the procedure with an explicit commit.

Next configure a datasource and the database adapter in the Weblogic Console so you can use them in BPEL. Don't create an XA datasource! It will cause problems with autonomous transactions such as java.sql.SQLException: Cannot call rollback when using distributed transactions (XA datasources can also cause problems with database links; http://javaoraclesoa.blogspot.com/2012/02/exception-occured-when-binding-was.html)

When configuring the DbAdapter, keep in mind that you have to copy the Plan.xml file (deployment plan for the DbAdapter) to the other managed server if you have not configured a shared storage for this file (which is suggested in the Enterprise Deployment Guide, http://docs.oracle.com/cd/E17904_01/core.1111/e12036/extend_soa.htm, paragraph 5.21.1). If you don't do this, the connection factory will not be available in the other managed server.

Polling test

You can download the processes here

I created a small process to insert a record in the POLLING_TEST_CLUSTER table with a status NEW so it would directly be picked up. I used SOAPUI (http://www.soapui.org/) to do a stress test and call this process a large number of times.

I was able to produce the error (a small number of times at high loads) that two instances of the adapter, running on different servers in the cluster, picked up and processed a message at the same time. I have also seen this happening at a customer.

In my setup this situation would cause a unique constraint violation as shown below;

<bpelFault><faultType>0</faultType><bindingFault xmlns="http://schemas.oracle.com/bpel/extension"><part name="summary"><summary>Exception occured when binding was invoked. Exception occured during invocation of JCA binding: "JCA Binding execute of Reference operation 'insert' failed due to: DBWriteInteractionSpec Execute Failed Exception. insert failed. Descriptor name: [write_textline_DB.PollingTestOutput]. Caused by java.sql.BatchUpdateException: ORA-00001: unique constraint (TESTUSER.POLLING_TEST_OUTPUT_PK) violated . Please see the logs for the full DBAdapter logging output prior to this exception. This exception is considered not retriable, likely due to a modelling mistake. To classify it as retriable instead add property nonRetriableErrorCodes with value "-1" to your deployment descriptor (i.e. weblogic-ra.xml). To auto retry a retriable fault set these composite.xml properties for this invoke: jca.retry.interval, jca.retry.count, and jca.retry.backoff. All properties are integers. ". The invoked JCA adapter raised a resource exception. Please examine the above error message carefully to determine a resolution. </summary></part><part name="detail"><detail>ORA-00001: unique constraint (TESTUSER.POLLING_TEST_OUTPUT_PK) violated </detail></part><part name="code"><code>1</code></part></bindingFault></bpelFault>

This occurred even with the NumberOfThreads value set to 1 (this is the default);


Below I will describe two possible solutions for this issue and my experience with it. Distributed polling and using a Reserved Value.

Distributed Polling

This is also described more extensively in; http://www.oracle.com/technetwork/database/features/availability/maa-soa-assesment-194432.pdf

It is possible to set the DbAdapter property to do distributed polling;


Distributed Polling means that when a record is read, it is locked by the reading instance. Another instance which wants to pickup the record skips locked records. This can however cause problems with locks which could originate from different sources then the processes; records which would require processing, could be skipped.

Also, a BPEL process is by default invoked asynchronously by the DbAdapter;


This causes the lock to be released right after the DbAdapter is done with it and the BPEL process is started. This makes a case for using the logical delete provided in the DbAdapter if you want to use this mechanism and not update the field later in the BPEL process.

Using distributed polling in combination with logical delete is however not recommended by Oracle; from the manual (Help button in the JDeveloper wizard); A better alternative is to set either NumberOfThreads or MarkReservedValue for logical delete or delete strategies.

I tested the same process with distributed polling enabed. Still a small number of processes failed with a unique key constraint thus this mechanism is not 100% safe. I did get a bit better results however.

Unread and reserved value

Unread Value

Setting the Unread value causes the select query used for polling to contain a where clause matching the field to the Unread value. During my test I found that setting the Unread value in the DbAdapter configuration wizard caused my process not to pickup records with the set value. I have however seen at customers that this value was used succesfully to limit the records being picked up.


The help documentation says the following (which made me doubt the purpose of this field); Unread Value
(Optional) Enter an explicit value to indicate that the row does not need to be read. During polling, this row is skipped.


The below image showed a setting that did work.


Reserved value

In the release notes of SOA Suite 11.1.1.4 (https://supporthtml.oracle.com/epmos/faces/ui/km/SearchDocDisplay.jspx?_afrLoop=3308018508015000&type=DOCUMENT&id=1290512.1&displayIndex=3&_afrWindowMode=0&_adf.ctrl-state=y8cqyff7j_134), the following is documented;

18.1.5.1 Distributed Polling Using MarkReservedValue Disabled by Default
In this release, Oracle recommends that you use the new distributed polling approach based on skip locking. When editing an Oracle Database Adapter service which has a MarkReservedValue set, that value will be removed to enable the new best practice. To use the old distributed polling approach based on a reserved value, select the value from the drop down menu.

In the help in JDeveloper, the following is documented for distributed polling (skip locking as mentioned above); Distributed Polling. Select this checkbox if distributed polling is required. However, this implementation uses a SELECT FOR UPDATE command. A better alternative is to set either NumberOfThreads or MarkReservedValue for logical delete or delete strategies.


As you can see in the above screenshots, it is possible to set a reserved value.


This reserved value (MarkReservedValue in *_db.jca) causes an instance of the process to set an identifier. This identifier is skipped by the other polling instances in the cluster.

When I however tried to use this setting (Reserved Value) in 11.1.1.6 (of course setting the Unread value to ''), I noticed the DbAdapter did not pickup any messages. When changing the Read Value and redeploying (after emptying the reserved value), it did pickup messages again immediately. I'm not sure why it didn't work in this test. My guess is for this to work, an additional setting is required. If I've found this setting, I will update this post. Also notice the wizard empties the Reserved Value if you go through it again. For the time being, I'll use the skip locking setting (distributed polling).

Singleton DbAdapter

Based on a suggestion done in the comments of this post, there is also the option to configure the DbAdapter as a singleton with a JCA property.

See the documentation for more information on this;
http://docs.oracle.com/cd/E23943_01/integration.1111/e10231/life_cycle.htm#BABDAFBH

The behavior of this property is described in the following post; http://ayshaabbas.blogspot.nl/2012/11/db-adapter-singleton-behaviour-in-high.html

5 comments:

  1. Thanks for Nice Article. How about the File Polling in Clustered envionrment. Is there any way apart from deleting/archiving the file for avoiding it to be picked up by another managed server.

    ReplyDelete
  2. For inbound operations I would suggest using a shared storage and the archive/delete options.

    As an alternative, I've seen a customer cluster implementation where there was no shared storage and inbound files were put on different servers (each running a managed server) in the same directory. The distribution of where (on which server) to put the inbound files, was done externally.

    Another option is to keep track of processed files in a BPEL process (by using for example a database table) and check if the file is already being processed by the another managed server. I have not tried this and if you're considering this, I would suggest a stress test to ensure this mechanism works under high loads and to maybe tweak the DbAdapter used for querying/updating the table in which you keep track of processed files.

    Outbound FileAdapter high availability configuration can be found at http://docs.oracle.com/cd/E25178_01/core.1111/e12036/extend_soa.htm#SOEDG237.

    ReplyDelete
  3. I am having a problem with BPEL process (Database adapter) on clustered environment (2 nodes). The adapter is supposed to poll a database table for changes after 8 minutes interval (PollingInterval=480) but it is working in a different manner.

    1) BPEL instance is created and it polls at table (time 05:13:21)
    2) BPEL instance is created and it polls at table (time 05:14:54) Another instance is created that only fetches single records from table even if there are many records.
    3) BPEL instance is created and it polls at table (time 05:14:53)
    4) BPEL instance is created and it polls at table (time 05:21:21)
    5) BPEL instance is created and it polls at table (time 05:22:54)

    Just wanted to know if you faced similar situation or have an idea what I am doing wrong?

    ReplyDelete
  4. When deploying a process to a cluster, usually you first deploy the process to the first node. The cluster then deploys the process at a later time to the other nodes of the cluster. The time at which a process at a specific node starts polling can differ because of this. In your case I think the process at the first node starts polling at 05:13:21 and after 8 minutes; 05:21:21. The process on the other node starts polling at 05:14:54 and after 8 minutes at 05:22:54. I can't explain instance 3).

    If only one record is fetched, you should probably tweak the DbAdapter settings. The following post might help; http://www.albinsblog.com/2012/10/db-adapter-message-throttling-in-oracle.html#.UJoGpIaJXsc

    ReplyDelete
  5. Thanks Maarten!

    Now I am singleton property to true in composite.xml and it is running fine.

    ReplyDelete