SQL Server Indonesia User Groups Community January 2009 - Posts - kasim.wirama

January 2009 - Posts

Kinds of Storage Testing in SQL Server
29 January 09 09:18 PM | kasim.wirama | with no comments

Kinds of Storage Testing in SQL Server

By : Kasim Wirama, MCDBA, MCITP, MVP SQL Server

 

Before a new database is deployed into production server, it is strongly suggested that storage of the server box is tested against performance perspective to know how far the storage capability to handle normal load and under heavy load. There are 3 kinds of test, i.e. performance test, stress test and real life performance.

 

Performance test is a test to some or all database objects to get how fast a thing gets done. Sometimes performance test is more like stress test but the objective between them are different. In performance test, the objective is how fast you can get something done, not how many you have it done.

 

Stress test is used to get information in what extent a database server could hold up for concurrent request. Stress test could be said parallel test while performance test is called sequential test. Result correctness would be in consideration when doing both kind of test.

 

Real life performance test is a mixed test between performance test and stress test. Usually use case is used to get information for the test. What is difference between real life performance test and 2 previous tests? Real life performance refers to actual unit of works a user does while stress/performance test refers to smaller server subsystem such as single disk’s I/O or few disk I/O’s together. Talking about use case, you can devise what kind operation might be involved whether they are random/sequential read or random/sequential write for each activities in a use case.

 

For example of sequence of customer billing info activities in a use case:

  1. query customer for general info

operation : random/sequential read.

 
  1. get customer detail info

operation : random/sequential read

 
  1. get customer billing summary

operation : random/sequential read

 
  1. get customer billing history

operation : random/sequential read

 
  1. update customer data

operation : random write

 
Filed under:
Measuring I/O Performance with System Monitor
29 January 09 09:18 PM | kasim.wirama | with no comments

Measuring I/O Performance with System Monitor

By : Kasim Wirama, MCDBA, MCITP, MVP SQL Server

 

System monitor has a bunch of counters for measuring environment subsystem such as processor, memory, disk and SQL Server subsystems. To measure how well your storage subsystem performs to serve application request into its database server, system monitor provides list of informative system counters. Here they are :

  1. Avg. Disk Queue Length, Avg. Disk Read Queue Length, and Avg. Disk Write Queue Length

These performance counters are used to indicate how many queues regarding to disk requests. High values indicate that the disk is over stressed. Recommended value for them is at most one or zero is even better.

 
  1. Avg. Disk Sec/Read, Avg. Disk Sec/Transfer, and Avg. Disk Sec/Write

They give information about latency of I/O. the results is average over a period. When it is converted into mili second, you can compare them to results from SQLIO test result.

 
  1. Disk Bytes/Sec, Disk Read Bytes/Sec, and Disk Write Bytes/Sec

They give information about throughput in Bytes/Sec. when it is converted into MB/Sec, you can compare them to results from SQLIO result.

 
  1. Disc Reads/Sec, Disk Transfers/Sec and Disk Writes/Sec

They give information how many I/O operations per second (IOPS). IOPS from them could be used to compare IOPS optimum capacity resulted from SQLIO test.

 

For daily storage monitoring, I recommend you monitor by using these 2 performance counters here :

  1. %Disk read time, %Disk time, %Disk write time and %Disk idle time
  2. Avg. Disk Bytes/Read, Avg. Disk Bytes/Transfer and Avg. Disk Bytes/Write.
Filed under:
Concatenating String in Transact SQL
29 January 09 09:17 PM | kasim.wirama | with no comments

Concatenating String in Transact SQL

By : Kasim Wirama, MCDBA, MCITP, MVP SQL Server

 

I would like to discuss about concatenating string in Transact SQL (T-SQL). T-SQL is scripting language in SQL world and has powerful performance when it is used in set-based oriented way. One common requirement is to display result of some values coming from a column and they are separated by certain character, such as comma. For example : I would like to display customer list served by each employees in orders table database Northwind. Let’s take a look how to achieve this in SQL Server 2000 and then compare with SQL Server 2005-later.

 In SQL Server 2000 and previous version 

You can output customer list for a given employee by the following T-SQL

DECLARE @customer varchar(8000);

 

SELECT @customer = COALESCE(@customer + ‘, ‘, ‘’) + CAST(CustomerID AS VARCHAR(5))
FROM dbo.Orders
WHERE EmployeeID = @EmployeeID

 

RETURN @customer;

 

You can encapsulate the script above into function here :

 

CREATE FUNCTION dbo.GetCustomerList
(@EmployeeID INT)
RETURNS VARCHAR(8000)

AS
BEGIN

 

DECLARE @customer varchar(8000);

 

SELECT @customer = COALESCE(@customer + ‘, ‘, ‘’) + CAST(CustomerID AS VARCHAR(5))
FROM dbo.Orders
WHERE EmployeeID = @EmployeeID

 

RETURN @customer;

 

END;

 

After creating the function, you can called the function for getting customerlist for each employee.

 

SELECT EmployeeID, dbo.GetCustomerList(EmployeeID)

FROM dbo.Employees;

 In SQL Server 2005 and later version

You can use combination of STUFF and FOR XML PATH combination as shown here :

DECLARE @customerList varchar(8000);

 

SET @ customerList = STUFF((select + ',  ' + customerID FROM dbo.Orders WHERE employeeID = @employeeID FOR XML PATH ('') ),1,2,'');

 

RETURN @customerList ;

 

With similar way, encapsulate it into function and it is ready to be call by other query.

 Which one is recommended way?

Both returns same result but concatenation in SQL Server 2000 harness undocumented recursive call. It is side effect from query engine and it is not an intended capability from the engine itself. So it is not guaranteed it will be continually supported in future version of SQL Server. If you use SQL Server 2005 or later, it is better you change such a recursive call into more elegant solution using proprietary STUFF and FOR XML PATH though it is not in ANSI standard.

Filed under:
Service Broker
19 January 09 09:26 AM | kasim.wirama | with no comments

Service Broker

By : Kasim Wirama, MCDBA, MCITP, MVP SQL Server

 

This posting, I would like to unveil basic knowledge of service broker. Service broker has been introduced since SQL Server 2005. And I observe that it is still not widely used by database application. I think it is because no user interface to setup and monitor service broker application. In fact, service broker, a database level messaging system, offers reliable database scalability and high availability because of transactional and asynchronous nature. Service broker application is about communication between services matter, those services could be in one database, across database in one SQL Server instance, across database in across SQL Server instances and across database in across SQL Server instance in different physical database machine. Communication in service broker is based on TCP/IP binary mode. It means that case sensitivity in service broker objects does matter so I recommend you create service broker objects (described further below) with url schema type enclosed by [] sign.

By default service broker is not enabled in your database, to check whether your database is broker enabled you could issue this statement :

SELECT IS_BROKER_ENABLED FROM SYS.DATABASES WHERE NAME = ‘Adventureworks2008’

I use sample database in SQL Server 2008, AdventureWorks2008.

If it is enable state then it returns 1, otherwise 0. If you have 0 , you need to enable it by issuing TSQL command :

ALTER DATABASE Adventureworks2008
SET ENABLE_BROKER;

You need to remember that when a database is restored from backup, by default service broker is not enabled. In this case make sure you check broker ability state by doing steps shown previously.

In this posting, I would describe each components of service broker, and I show you example of creating those components with T-SQL command.

 

Message type

Smallest unit of service broker is message type. What contains in message type is information that you want to deliver to destination. It is optional if you want to include your data in message type. Here is declaration to create message type :

CREATE MESSAGE TYPE [//test/ItemAdded]
 this is most simple one without any data in the message type.

Another variation of message type creation is (remember that each time you try to create variation of message type with same name, don’t forget to drop them first by issuing :

DROP MESSAGE TYPE  [//test/ItemAdded];

)

CREATE MESSAGE TYPE [//test/ItemAdded]
VALIDATION = NONE;

VALIDATION = NONE here means no validation applied to the message type when a data is included in it.

Another variation of message type creation is

CREATE MESSAGE TYPE [//test/ItemAdded]
VALIDATION = EMPTY;

VALIDATION = EMPTY here means you are not expected to put any data in it. If you put data in it, service broker will raise error.

Another variation of message creation is

CREATE MESSATE TYPE [//test/ItemAdded]
VALIDATION = WELL_FORMED_XML

VALIDATION = WELL_FORMED_XML here means you are expected to input standard and well formed common XML.

Another variation of message creation is

CREATE MESSATE TYPE [//test/ItemAdded]
VALIDATION = VALID_XML WITH XML SCHEMA COLLECTION <xml schema collection name>

VALIDATION = VALID_XML WITH XML SCHEMA COLLECTION here means that input data should be conformed with XSD (XML schema definition) which is defined in xml schema collection name database object.

You can query list of message type in your database by referencing dynamic management view (DMV) named SYS.SERVICE_MESSAGE_TYPES.

Message contract

This service broker object will bind one or more message types to message sender (called INITIATOR) and message receiver (called TARGET). I show you syntax how to create message contract.

CREATE CONTRACT [//test/ItemAddedContract]
([//test/ItemAdded] SENT BY ANY);

[//test/ItemAddedContract] is contract name and [//test/ItemAdded] is message type that you create earlier. Both message type could be used by INITIATOR and TARGET.

You can drop message contract by issuing :

DROP CONTRACT [//test/ItemAddedContract];

You can query DMV before deletion by checking through this query :

SELECT * FROM SYS.SERVICE_CONTRACTS;

If you want INITIATOR and TARGET issue different message type you could create the following contract here :

CREATE CONTRACT [//test/ItemAddedContract]
([//test/AddItem] SENT BY INITIATOR
,[//test/ItemAdded] SENT BY TARGET);

Before you execute above statement, make sure those message types exist first ([//test/AddItem] and [//test/ItemAdded]).

 

Queue

Like MSMQ, service broker implements queue. Queue is a place where a message (encapsulated by message type, enforced by message type) is put into it before the message is taken out by destination. Physically queue is database table created by service broker that has already has its own schema predefined. It is recommended you put queue in other filegroup. Here is the queue creation syntax :

CREATE QUEUE dbo.ItemQueue
WITH STATUS = OFF, ACTIVATION
(
     PROCEDURE_NAME = dbo.myprocedure,
     MAX_QUEUE_READERS = 10,
     EXECUTE AS SELF
);

If you don’t specify STATUS = OFF, by default it will be ON immediately after creation. MAX_QUEUE_READERS here is maximum number of activation stored procedure instances when queue starts at the same time.

To set the queue status from OFF to ON issue this TSQL command :

ALTER QUEUE dbo.ItemQueue
WITH STATUS = ON;

To get information about queues in your database, issue statement to SYS.SERVICE_QUEUES dynamic management view.

Service

Service is actually queue. So what is difference between service and queue? Service is queue in logical term, or I can say service is logical queue whereas queue is physical one. With service, it is possible to decouple queue to its physical database instance (activation stored procedure and physical filegroup). Service also binds queue with service contract. More interesting here is that a service can only have one queue but one or more service contract.

Here is syntax to create service :

CREATE SERVICE [//test/AddItemService]
ON QUEUE dbo.ItemQueue
([//test/ItemAddedContract]);

Establish dialog between services

Now, basic service broker infrastructure have been created, it is now ready to be used in service broker conversation. To begin conversation issue this statement :

BEGIN DIALOG CONVERSATION @conversation_handle
FROM SERVICE [//test/AddItemService]
TO SERVICE ‘//test/AddItemService’
ON CONTRACT [//test/ItemAddedContract]
WITH ENCRYPTION = OFF, LIFETIME = 3600;

A conversation session has been established through @conversation_handle, an uniqueidentifier variable type. LIFETIME here denotes maximum time an conversation is held (in seconds).

A DMV for knowing status of conversation state is issuing :

SELECT STATE, STATE_DESC FROM SYS.CONVERSATION_ENDPOINTS;

To close conversation, issue the following TSQL statement :

END CONVERSATION @conversation_handle;

Monitor service broker

The most possible service broker application is in trouble when a message has not been sent out successfully to destination. It might be because of destination is unreachable. To monitor message status of running service broker, you can issue this statement :

SELECT * FROM SYS.TRANSMISSION_QUEUE;

You can check TRANSMISSION_STATUS column in SYS.TRANSMISSION_QUEUE;

Once you master basic of service broker application objects, you have foundation to build next stage of reliable service broker application.

Filed under:
Differential Backup in SQL Server
19 January 09 08:12 AM | kasim.wirama | with no comments

Differential Backup in SQL Server

By : Kasim Wirama, MCDBA, MCITP, MVP SQL Server

 

This article I would like to share about differential database backup in SQL Server. SQL Server gives 3 kinds of database backup options, i.e. full database backup, differential database backup, and transaction log backup. It is not incorrect decision to use all of them, it depends on your SLA (Service Level Agreement), in what extent your organization allows data loss in case there happens database disaster. Let’s read through for database differential backup discussion.

Differential database backup will backup last changed data since full database backup. How does SQL Server recognize data changes since last full backup? Here is the secret under the hood. During full backup, SQL Server will reset bitmap for each extent. An extent contains 8 contiguous 8 KB-size pages. When a data changes in one extent, a bitmap will be changed and when differential database backup happens, it will check current bitmap, when it is changed since last full backup, the extent will be backup.

How do you use differential backup for restore purpose? If you have last full database backup and several differential database backup, you just need to restore last full database backup and follow by only last differential database backup to have your data back as recent as possible.

Other things you need to consider differential backup in your backup strategy is that the size of differential backup is typically smaller than full database backup. It is possible that size of differential backup is same as full database backup. When it happens, you need to revise the length period between full database backup and next full database backup. Differential database backup time is shorter than full database backup. It is recommended especially you have limited backup space.

Filed under:
How SQL Server Full Backup Works
19 January 09 07:48 AM | kasim.wirama | with no comments

How SQL Server Full Backup Works

By : Kasim Wirama, MCDBA, MCITP, MVP SQL Server

 

I would like to share with you about inner working of SQL Server full backup in this posting. As you probably know that full backup in SQL Server will backup data into device. Why don’t I say backup to file? Let’s read through the remainder of this article.

Device in terms of SQL Server backup is backup media. It might be tape or file. A device might contain one or more files inside.  You can backup multiple times into same device or you want to correspond one to one relationship between backup file and device. Honestly speaking, it is easy to find out or track backup file with one to one relationship.

From SQL Server 2005, two backup activities could run parallel. It is good capability but it is necessary and also creates high I/O demand.

Now, what happened when full backup runs? Here is the steps SQL Server will be doing :

1.       Checkpoint database means that SQL Server writes all dirty pages (pages that have changed since loaded into memory) to disk to ensure only committed transactions included in full backup.

2.       Mark beginning of transaction log to validate backup file(s) whether it could be used in restore process.

3.       Read data file and write into backup device.

4.       Mark end of transaction log when full backup has finished.

At point 3 I mention data file, why don’t I mention entire database (including transaction log file)? Full backup doesn’t include transaction log file, meaning that transaction log file is neither truncated nor backed up. Full backup doesn’t backup server login and job as well.

When you have only full backup database, you just only restore last transaction which is included on the last full backup device.

Grasp this basic important understanding how full backup works as DBA basic task, an exceptional data guard.

Filed under:
Concatenation, Merge Join Concatenation and Hash Match Union
17 January 09 05:04 AM | kasim.wirama | with no comments

Concatenation, Merge Join Concatenation and Hash Match Union

By : Kasim Wirama, MCDBA

Other operators that you might encounter in execution plan are concatenation, merge join concatenation and hash match union. Usually these operators are generated when your query contains UNION or UNION ALL. Let’s look each of them below:

Concatenation

 

The operator happens when your query contains UNION ALL. For example :

SELECT lastname , city, country FROM employees
UNION ALL
SELECT contactname, city, country FROM customers;

The corresponding execution plan is :

  |--Concatenation

       |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Employees].[PK_Employees]))

       |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Customers].[PK_Customers]))

Notice that two clustered index scans are join with concatenation operator with any attributes on it. It just joins two input sets become one without eliminating duplicate rows from both inputs.

Merge Join Concatenation

 

Merge join operator happens when select list columns get indexed and sorted. The example query is :

SELECT lastname , city, country FROM employees
UNION ALL
SELECT contactname , city, country FROM customers
ORDER BY lastname;

Before executing above query, create index lastname, city, country on employes table and contactname, city, country on customers table.

Execution plan for above query is :

  |--Merge Join(Concatenation)

       |--Index Scan(OBJECT:([Northwind].[dbo].[Employees].[ix_1]), ORDERED FORWARD)

       |--Index Scan(OBJECT:([Northwind].[dbo].[Customers].[ix_1]), ORDERED FORWARD)

Hash Match Union

 

Other possible operator for UNION query is hash match. Hash match happens when you join a huge table with many duplicate with a table with unique column.

For example :

CREATE TABLE hugetable
(cola INT PRIMARY KEY ,
colb INT,
colc CHAR(1000));

CREATE TABLE uniquetable
(cola INT PRIMARY KEY,
colb INT UNIQUE,
colc CHAR(1000));

Populate hugetable with 500000 rows (large number of rows).

If you issue query below, you will get hash match operator :

SELECT colb, colc FROM hugetable
UNION
SELECT colb, colc FROM uniquetable

The execution plan is :

  |--Hash Match(Union)

       |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[hugetable].[PK__hugetable__4BAC3F29]))

       |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[uniquetable].[PK__uniquetable__5070F446]))

With characteristic of your table design, query statement, and row distribution, you will predict what optimizer will likely choose between these operators to produce feasible execution plan.

Filed under:
Dynamic Index Seek Review
17 January 09 05:03 AM | kasim.wirama | with no comments

Dynamic Index Seek Review

By : Kasim Wirama, MCDBA

 

Other execution plan pattern that is noticeable is dynamic index seek. A plan is called dynamic index seek when the optimizer doesn’t know variable values at compile time, and the plan generates index seek operator and merge interval operator. Merge interval operator? It sounds like uncommon operator. I would like to describe more about the operator with following query example.

See the execution plan for this query at Northwind database :

SELECT ordered
FROM orders
WHERE shippostalcode IN ('02389-673','01307')

Its execution plan is :

SELECT orderid  FROM orders  WHERE shippostalcode IN ('02389-673','01307')
  |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShipPostalCode]), SEEK:([Northwind].[dbo].[Orders].[ShipPostalCode]=N'01307' OR [Northwind].[dbo].[Orders].[ShipPostalCode]=N'02389-673') ORDERED FORWARD)

Because the index is highly selective, optimizer choose index seek. Now compare execution on the similar query but parameterized one below :

DECLARE @var1 nvarchar(20), @var2 nvarchar(20);
SELECT @var1='02389-673', @var2='01307';
SELECT ordered
FROM orders
WHERE shippostalcode IN (@var1,@var2);

The execution plan is :

SELECT orderid  FROM orders  WHERE shippostalcode IN (@var1,@var2);

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1009], [Expr1010], [Expr1011]))

       |--Merge Interval

       |    |--Sort(TOP 2, ORDER BY:([Expr1012] DESC, [Expr1013] ASC, [Expr1009] ASC, [Expr1014] DESC))

       |         |--Compute Scalar(DEFINE:([Expr1012]=((4)&[Expr1011]) = (4) AND NULL = [Expr1009], [Expr1013]=(4)&[Expr1011], [Expr1014]=(16)&[Expr1011]))

       |              |--Concatenation

       |                   |--Compute Scalar(DEFINE:([@var2]=[@var2], [@var2]=[@var2], [Expr1003]=(62)))

       |                   |    |--Constant Scan

       |                   |--Compute Scalar(DEFINE:([@var1]=[@var1], [@var1]=[@var1], [Expr1006]=(62)))

       |                        |--Constant Scan

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShipPostalCode]), SEEK:([Northwind].[dbo].[Orders].[ShipPostalCode] > [Expr1009] AND [Northwind].[dbo].[Orders].[ShipPostalCode] < [Expr1010]) ORDERED FORWARD)

The second execution plan is quite different with the first one. Recall the first passage why optimizer generates complex execution plan. Merge interval is to remove duplication between 2 values, but actually optimizer sees the 2 values as 2 ranges. Overlapping is possible between 2 ranges, and that’s the task of Merge Interval operator to make sure no duplication after collapsing 2 ranges. But the question probably arises from your thought, which is wondering why SORT operator doesn’t handle such a duplication. In dynamic index seek, sort operator which is executed right before merge interval just make sure a value is adjacent to its neighbor.

Second execution plan above is similar to execution plan of the following range query:

DECLARE @orderdate1 datetime, @orderdate2 datetime;

SELECT @orderdate1 = '19970101', @orderdate2 = '19970104';

SELECT ordered
FROM orders
WHERE orderdate BETWEEN @orderdate1 AND dateadd(DAY, 7, @orderdate1)
OR orderdate BETWEEN @orderdate2 AND dateadd(DAY,7,@orderdate2);

Merge interval operator is clearly shown at the above range query.

Filed under:
Index Union Analysis
17 January 09 05:02 AM | kasim.wirama | with no comments

Index Union Analysis

By : Kasim Wirama, MCDBA

 

When there are 2 or more where criteria which are joined with OR, each columns is different among others, they are indexed and search selectivity is high, optimizer could consider multiple indexes and join the end result with concatenation or merge join concatenation operator. Such a pattern of execution plan is called index union. Let’s explore what possibilities of execution plan generated from index union of a query statement.

 

Here is the query to get any ordered which is shipped or created between 1 January 2007 and 7 January 2007.

 

SELECT orderid
FROM orders
WHERE shippeddate BETWEEN '19970101' AND '19970107'
OR orderdate BETWEEN '19970101' AND '19970107';

 

Because there is index for shippeddate and orderdate column respectively, optimizer decides to use each indexes to get result from shippeddate BETWEEN '19970101' AND '19970107' and orderdate BETWEEN '19970101' AND '19970107'. Execution plan for the query is :

 

  |--Sort(DISTINCT ORDER BY:([Northwind].[dbo].[Orders].[OrderID] ASC))

       |--Concatenation

            |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShippedDate]), SEEK:([Northwind].[dbo].[Orders].[ShippedDate] >= 1997-01-01 00:00:00.000' AND [Northwind].[dbo].[Orders].[ShippedDate] <= 1997-01-07 00:00:00.000') ORDERED FORWARD)

            |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]), SEEK:([Northwind].[dbo].[Orders].[OrderDate] >= 1997-01-01 00:00:00.000' AND [Northwind].[dbo].[Orders].[OrderDate] <= 1997-01-07 00:00:00.000') ORDERED FORWARD)

 

Notice that result from 2 index seeks is joined with concatenation operator. Because the result might be duplicate, optimizer implement SORT DISTINCT operator to remove duplication. The query above could be rewritten as :

 

SELECT ordered
FROM orders
WHERE shippeddate BETWEEN '19970101' AND '19970107'
UNION
SELECT ordered
FROM orders
WHERE orderdate BETWEEN '19970101' AND '19970107';

 

Query rewrite above is valid because orderid is primary key which is also considered as unique key.

 

Now consider another query below :

 

SELECT orderid
FROM orders
WHERE shippeddate = '19970101' OR orderdate = '19970101';

 

Query plan for the query is :

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Northwind].[dbo].[Orders].[OrderID]))

       |--Stream Aggregate(GROUP BY:([Northwind].[dbo].[Orders].[OrderID]))

       |    |--Merge Join(Concatenation)

       |         |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShippedDate]), SEEK:([Northwind].[dbo].[Orders].[ShippedDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

       |         |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]), SEEK:([Northwind].[dbo].[Orders].[OrderDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

       |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]), SEEK:([Northwind].[dbo].[Orders].[OrderID]=[Northwind].[dbo].[Orders].[OrderID]) LOOKUP ORDERED FORWARD)

 

Notice that stream aggregate and merge join concatenation appears instead of concatenation and SORT DISTINCT operator. Why does optimizer choose merge join and stream aggregate? It’s because that the query uses equality instead of inequality. Recall that merge operator will probably appear when a predicate contains at least one equality operator. From efficiency of resource utilization perspective, merge join and stream aggregate is better than sort operator because they do not require memory rather than SORT operator that might introduce data spill out to tempdb database when memory is under pressure.

 

The query above could be written as :

 

SELECT orderid
FROM orders
WHERE shippeddate = '19970101'
UNION
SELECT orderid
FROM orders
WHERE orderdate = '19970101';

 

Query plan for this query is :

   |--Merge Join(Union)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[ShippedDate]), SEEK:([Northwind].[dbo].[Orders].[ShippedDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[OrderDate]), SEEK:([Northwind].[dbo].[Orders].[OrderDate]='1997-01-01 00:00:00.000') ORDERED FORWARD)

 

Merge join (Union) operator replaces Stream Aggregate and Merge join (concatenation).

Query rewriting is always generated from query plan analysis, and the process is fun and rewarding.

Filed under:
Peer-to-Peer Replication in SQL Server 2008
17 January 09 05:02 AM | kasim.wirama | with no comments

Peer-to-Peer Replication in SQL Server 2008

By : Kasim Wirama, MCDBA

 

One of interesting enhancements made in SQL Server 2008 February CTP is in area of replication, peer to peer replication particularly. Now with SQL Server 2008 February CTP, a SQL Server 2008 new node could join to existing peer-to-peer existing topology without having to stop database activity on the topology, and also visual designer is introduced in this upcoming SQL Server version. With visual designer, I just only to visually design, implement and review peer-to-peer easily especially when I have more than 3 nodes to be implemented, which are more complex peer-to-peer topology.

 

Peer-to-peer replication is located under transactional replication. By default, transactional replication turns off peer-to-peer replication. After you have set up publication with transactional replication, you need to open the publication property, and set value to true for enable peer-to-peer replication. Once you set it to true, you couldn’t revert back to transactional replication (by setting the value from true to false), the only way is to re-create the publication. Not like with other type of replication, peer-to-peer replication doesn’t need snapshot publication, uniqueidentifier column and trigger created on underlying table of an article. It doesn’t support re-initialization and timestamp type column because timestamp column is generated automatically by SQL Server so it couldn’t replicate to its peer. Another restriction of peer-to-peer replication is that it doesn’t support horizontal and vertical filtering. Besides that, peer-to-peer replication is not designed to handle conflict resolution even though it is able to able to detect conflict by turning on conflict detection. When conflict happens, data would not replicate to its peer node, thus manual intervention is needed to manually resolve conflict resolution. If you would like to automate conflict resolution with possible sophisticated scenario, merge transactional replication is the best replication option rather than peer-to-peer replication.

 

The least performance impact is belonged to peer-to-peer replication, and for high availability, better fault tolerance, and read performance enhancement in OLTP scenario, it is one of the viable options besides log shipping, and database mirroring.

Filed under:
Rebind and Rewind on Non Equal Correlated Subquery
17 January 09 05:01 AM | kasim.wirama | with no comments

Rebind and Rewind on Non Equal Correlated Subquery

By : Kasim Wirama, MCDBA

This posting, I would show other execution plan operator that has caching capability regarding to non correlated subquery that involved non equality relationship between inner and outer query.

What is the behaviour difference for optimizer to generate plan between correlated and noncorrelated subquery? In noncorrelated subquery, inner query doesn’t depend on outer query thus optimizer compute inner query first then outer query, whereas in correlated subquery, optimizer compute every row in outer query then to inner query because inner query depends on value of outer query.

Example of nonequality-correlated subquery is :

SELECT

orderid

FROM orders AS o1

WHERE freight <

(

            SELECT avg(freight)

            FROM orders AS o2

            WHERE o2.orderdate < o1.orderdate

);

Execution plan for this query is :

  |--Filter(WHERE:([Northwind].[dbo].[Orders].[Freight] as [o1].[Freight]<[Expr1004]))

       |--Nested Loops(Inner Join, OUTER REFERENCES:([o1].[OrderDate]))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]))

            |--Index Spool (SEEK:([o1].[OrderDate]=[Northwind].[dbo].[Orders]. [OrderDate] as [o1].[OrderDate]))

                 |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012]/CONVERT_IMPLICIT(money,[Expr1011],0) END))

                      |--Stream Aggregate (DEFINE:([Expr1011]=COUNT_BIG ([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1012]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))

                           |--Index Spool(SEEK:([o2].[OrderDate] < [Northwind].[dbo]. [Orders].[OrderDate] as [o1].[OrderDate]))

                                |--Clustered Index Scan (OBJECT:([Northwind].[dbo]. [Orders].[PK_Orders] AS [o2]))

  

First index spool is lazy spool and second one is eager spool (look at logicalOp column). For first access into inner query, optimizer scan orders table then create eager index spool (second spool). Eager index spool is also called as index on the fly spool because it eagerly build temporary index on entire outer input on first access. On subsequent access from outer query, average aggregation is handled by computer scalar and stream aggregate operator and then it is cached by first index spool (lazy index spool). It is called lazy index spool, because it cache result from immediate previous execution. When next input row has the same orderdate, then the result is taken from lazy index spool only. This behaviour is called rewind. But if next input row has different orderdate, then the result is taken from accumulated value in lazy index spool or if it is not in lazy index spool, the operator request input from stream aggregate and computer scalar operator. Such behaviour is called rebind because it rebinds new value into lazy index spool.

If you check number of rows of rebind and rewind at property of lazy index spool, their total will be equal to total number of rows in orders table.

Optimizer tends to create index spool operator when there are many duplicate value on orderdate column and there are a lot of input rows from outer query. So if you reduce the number of input column, index spool operator will be disappear. Look at this example and its execution plan below :

SELECT

orderid

FROM orders AS o1

WHERE freight <

(

            SELECT avg(freight)

            FROM orders AS o2

            WHERE o2.orderdate < o1.orderdate

)

AND shipcity = 'Caracas';

Execution plan of the query is :

  |--Filter(WHERE:([Northwind].[dbo].[Orders].[Freight] as [o1].[Freight]<[Expr1004]))

       |--Nested Loops(Inner Join, OUTER REFERENCES:([o1].[OrderDate]))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]), WHERE:([Northwind].[dbo].[Orders].[ShipCity] as [o1].[ShipCity]=N'Caracas'))

            |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012]/CONVERT_IMPLICIT(money,[Expr1011],0) END))

                 |--Stream Aggregate(DEFINE:([Expr1011]=COUNT_BIG([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1012]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))

                      |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o2]), WHERE:([Northwind].[dbo].[Orders].[OrderDate] as [o2].[OrderDate]<[Northwind].[dbo].[Orders].[OrderDate] as [o1].[OrderDate]))

Observe that there is no index spool operator.

Filed under:
Execution Plan for Noncorrelated Query
17 January 09 05:01 AM | kasim.wirama | with no comments

Execution Plan for Noncorrelated Query

By : Kasim Wirama, MCDBA

 

This posting, I would like to show various possible execution plan for non correlated query and possible action to rewrite query for better execution plan. Let’s run the query at Northwind database that displays list of orderid whose freight cost is under average freight cost of all orders

 

SELECT orderid FROM dbo.orders

WHERE freight < (SELECT AVG(freight) FROM dbo.orders)

 

Execution plan for the non correlated and scalar query is :

  |--Nested Loops(Inner Join, WHERE:([Northwind].[dbo].[Orders].[Freight]<[Expr1006]))

       |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1013]=(0)

       |    |--Stream Aggregate(DEFINE:([Expr1013]=COUNT_BIG([|        

|--Clustered Index Scan (OBJECT:([Northwind].[dbo].

       |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders]))

 

The above execution plan calculate average value of freight and then verify all records in orders database which has freight value below average value of freight. It’s straightforward description from the execution plan. Now let’s look into another noncorrelated subquery

 

SELECT

            orderid

FROM orders

WHERE customerid =

(

            SELECT customerid

            FROM customers

            WHERE contactname = 'Antonio Moreno'

);

 

Execution plan for the query is :

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008]))

       |--Assert(WHERE:(CASE WHEN [Expr1007]>(1) THEN (0)

       |    |--Stream Aggregate(DEFINE:([Expr1007]=Count(*), [Expr1008]=ANY([Northwind].[dbo].[Customers].[CustomerID])))

       |         |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Customer WHERE:([Northwind].[dbo].[Customers].[ContactName]=N'Antonio Moreno'))

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID]),   SEEK:([Northwind].[dbo].[Orders].[CustomerID]=[Expr1008]) ORDERED FORWARD)

 

The query above could probably returns error because inner query might return non scalar value (more than 1 row) and contactname is not unique, so probably 1 contactname serves several customerid. The way optimizer checking whether inner query returns more than 1 rows is by adding ASSERT operator. If it is true, then the query raises error.

 

Another noticeable operator is ANY operator. It is internal operator that counts the number of record for each customerID. But because Stream Aggregate expects customerid to be aggregated or to be put in GROUP BY clause, so it requires ANY as internal aggregate operator, so the query :

 

SELECT COUNT(*), customerid

FROM customers

WHERE contactname = 'antonio moreno'

 

is not parsed by optimizer except you define GROUP BY for customerid.

 

As long as, you are aware that contactname could have more than one customerid, the following query is valid and it is considered valid to previous non correlated query :

 

SELECT

            orderid

FROM orders JOIN customers

ON orders.customerid = customers.customerid

WHERE customers.contactname = 'Antonio Moreno'

 

Execution plan is :

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Northwind].[dbo].[Customers].[CustomerID]))

       |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Customers].[PK_Customers]), WHERE:([Northwind].[dbo].[Customers].[ContactName]=N'Antonio Moreno'))

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID]), SEEK:([Northwind].[dbo].[Orders].[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID]) ORDERED FORWARD)

 

This execution plan is simpler than previous one.

 

Now create unique index on contactname column at customers table and execute previous non correlated subquery, the execution plan is :

 

  |--Nested Loops(Inner Join, OUTER REFERENCES:([Northwind].[dbo].[Customers].[CustomerID]))

       |--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[ix_1]), SEEK:([Northwind].[dbo].[Customers].[ContactName]=N'Antonio Moreno') ORDERED FORWARD)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID]), SEEK:([Northwind].[dbo].[Orders].[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID]) ORDERED FORWARD)

 

For this case, proper index selection will remove some operator and execution plan will be simpler and more efficient because I tell optimizer that there is only one contactname for each customerid, so optimizer decides to choose index seek to get customer record with contactName = ‘Antonio Moreno’ and optimizer stops searching to other customerid with same contactName.

Filed under:
Skip Stored Procedure Recompilation
17 January 09 05:00 AM | kasim.wirama | with no comments

Skip Stored Procedure Recompilation

By : Kasim Wirama, MCDBA

 

Stored procedure (and also function) is preferable to be implemented in SQL Server because its compilation code is cached, so next execution will take the compilation form in cache. Imagine that you have a complex stored procedure and you found that it recompile for each time it is called through SP:Recompile and SQL:StmtRecompile. In this case SQL Server have some options that you might to consider to avoid the problematic stored procedure from being recompiled.

 

If you want to make your query got cached in cache store, there is some ways to achieve it :

  1. make your tables which are involved in query to read only.
  2. turn off automatic update statistics, it is applicable when there is little database update activity that impacted on index update. But it is recommended that you turn on auto update statistics.
  3. if your column is unique, make it unique constraint or unique index and your select query use the column in WHERE clause.
  4. if your column is not unique, you can consider use query hints : Option (Keepfixed Plan), it is available on SQL Server 2005.
 

For overall recompilation monitoring for SQL Server, performance monitor give 2 relevant counters under performance object : MSSQLServer : SQL Statisctics, i.e. SQL Re-compilation/sec and SQL:Batch Requests/sec. If ratio between them is high, your SQL Server has recompilation issue that might degrade database performance.

Filed under:
Degree of Parallelism, Max Degree of Parallelism, Affinity Mask and Thread Allocation
17 January 09 04:59 AM | kasim.wirama | with no comments

Degree of Parallelism, Max Degree of Parallelism,

Affinity Mask and Thread Allocation

By : Kasim Wirama, MCDBA

 

You can view parallelism in your execution plan by viewing either graphically or textually. Optimizer decides between parallelism and serial plan on cost-based. Parallelism itself is determined by SQL Server advanced configuration setting and number of processor of SQL Server resides. One of advanced configuration item relating to parallelism is Max Degree of Parallelism and Affinity Mask.

 

If you don’t specify MAXDOP N query hints, by default max degree of parallelism is equal to number of processor that is allowed to give the thread to run corresponding parallelism operator. The number of allowed processor is determined through Affinity Mask in advanced configuration setting. If you specify MAXDOP N, it will used N DOP, but if you put MAXDOP 0, the max DOP will be equal to number of allowed processor in Affinity Mask.

 

DOP reflects number of thread that will execute on an operator. Even though you have max DOP more than allowed processor, it doesn’t mean that you can have all thread on the max DOP. It depends on the available thread which is specified in Max Worker Thread (found in advanced configuration setting). Max Worker Thread itself depends on number of processor and kinds of platform (32 bit or 64 bit). You can view how much query request takes number of thread through querying to Sys.dm_os_tasks. The information difference between sys.dm_os_tasks and parallelism in execution plan is that sys.dm_os_tasks gives information all threads in a query while parallelism in execution plan gives information about how much thread taken by each operators. Generally parallelism among operators could be executed at the same time, but exception applies here is that, if you found that there is blocking operator(s), the execution between operator before and after blocking operator is not done at the same time, so those operators can share resources such as memory and threads.

Filed under:
Decorrelating Correlated Subquery
17 January 09 04:59 AM | kasim.wirama | with no comments

Decorrelating Correlated Subquery

By : Kasim Wirama, MCDBA

I have written posting about correlated and noncorrelated subquery. Now let’s take a look correlated subquery but optimizer generated noncorrelated query-like execution plan.

Here is query to display order for each customer who lives in Madrid

select orderid

from orders as o

where exists

     (

          select *

          from Customers as c

          where c.CustomerID = o.CustomerID

          and c.City = 'Madrid'        

     )

Execution plan for the correlated subquery is :

  |--Nested Loops(Inner Join, OUTER REFERENCES:(Coffee.[CustomerID]))

       |--Index Seek(OBJECT:([Northwind].[dbo].[Customers].[City] AS Coffee), SEEK:(Coffee.[City]=N'Madrid') ORDERED FORWARD)

       |--Index Seek(OBJECT:([Northwind].[dbo].[Orders].[CustomerID] AS Time), SEEK:(Time.[CustomerID]=[Northwind].[dbo].[Customers].[CustomerID] as Coffee.[CustomerID]) ORDERED FORWARD)

The execution above is same as execution plan for the following query :

select orderid

from orders as o

join Customers as c

on o.CustomerID = c.CustomerID

where c.City = 'Madrid'

Second example here is query returning freight and its average from all orders within same customer.

select o1.OrderID, o1.Freight,

     (select AVG(freight) from orders as o2

      where o2.CustomerID = o1.CustomerID

     ) as AverageFreight

from dbo.orders as o1

Its execution plan is :

  |--Compute Scalar(DEFINE:([Expr1006]=[Expr1004]))

       |--Hash Match(Right Outer Join, HASH:([o2].[CustomerID])=([o1].[CustomerID]), RESIDUAL:([Northwind].[dbo].[Orders].[CustomerID] as [o2].[CustomerID]=[Northwind].[dbo].[Orders].[CustomerID] as [o1].[CustomerID]))

            |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1013]=(0) THEN NULL ELSE [Expr1014]/CONVERT_IMPLICIT(money,[Expr1013],0) END))

            |    |--Stream Aggregate(GROUP BY:([o2].[CustomerID]) DEFINE:([Expr1013]=COUNT_BIG([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight]), [Expr1014]=SUM([Northwind].[dbo].[Orders].[Freight] as [o2].[Freight])))

            |         |--Sort(ORDER BY:([o2].[CustomerID] ASC))

            |              |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o2]))

            |--Clustered Index Scan(OBJECT:([Northwind].[dbo].[Orders].[PK_Orders] AS [o1]))

The item immediately below Hash Match operator is scanning all orders and sort by customerid before they are computed for its average freight with grouping by CustomerId. To get freight column, optimizer decides to do clustered index scan, if you remove freight column, optimizer will choose index scan rather than clustered index scan, which is more efficient. There is possible that one customer may have more than one orders. For example, a customer has 5 orders. In this case optimizer will calculate average freight once for the customer, not 5 times for the customer. It is reflected with calculation of aggregation first before matched with the inner execution of clustered index scan of alias O1. The more orders for each customers, the more efficient execution plan, because it just calculates average value once for each customers.

Most cases, noncorrelated subquery is more efficient than correlated subquery. If you have options to change correlated subquery into non correlated subquery, it is better to do so.

Filed under:
More Posts Next page »