Showing posts with label Database Testing. Show all posts
Showing posts with label Database Testing. Show all posts

How to test dynamic reports - part 2

There is more to say about testing dynamic reports.

In part 1, the focus was on describing how manual testing is done.

But sometimes, more than manual testing is possible, is it not?

Let's have another look at the same report dashboard.

Mousing over a part of a sub-report will display a popup with various info:


As discussed in part 1, you will need a stored procedure or more (with parameters) to verify all information from the popup.

But is it possible to automate this testing?

A closer look at the report shows that the report data can be exported in a cross tab format (CSV):




The export file is very simple (the highlighting is mine):



What about creating a Java app that 

- reads the content of the CSV file and stores it in some sort of a list

- runs the stored procedures for all elements of the report dashboard and stores the results in another list

- compares the results of the 2 lists (one from CSV, the second from stored procedures)

Is there a better way for regression testing of dynamic reports?

How to test dynamic reports

Reports testing is something that I did rarely in my career.

I validated simple reports here and there but the work was always short and easy.

The reports were simple, displayed data in a tabular format, with some sorting, filtering and grouping.

The validation consisted in checking that

- the report displays the correct data; I would do this by cross checking the report data with the data from an existing app

- the report's details are correct: filtering, sorting, grouping


A recent reports testing assignment was much different.


The reports were far from simple in this case.

Look, for example, at the Retail Site Openings report from this page:





This is the report content:


This is not a simple report but a dashboard with 4 sub-reports.

Play around with the dashboard a little bit and you will see that clicking parts of every sub-report updates the content in the remaining sub-reports.

The same happens when changing the year or selecting categories.

Even more, multiple selections are possible in the bottom tree map reports.

How do you test this type of report?



Before discussing how to test the report, lets see how the report is built.



It is made with Tableau Desktop and published on a website.

Its data comes from an OLTP database populated through the company's main system.

The OLTP database is first transformed into a simpler database through ETL packages.

The simpler database is then used as the source of an OLAP cube.



The report uses the OLAP cube's data. 

The cube does not include the raw OLTP data but data aggregations (sums, counts, etc).



How do you test this type of report?


It is difficult to cross check the report data using the main OLTP system (the main OLTP system has raw data while the OLAP cube has data aggregations).

How else can the reports data be validated?

The answer is through SQL queries executed on the OLTP database.

If the SQL queries' results match the reports data, this means that:

- the relational warehouse was created correctly from the OLTP database

- the OLAP cube was created correctly from the relational warehouse

- the dashboards were created correctly from the OLAP cube


Creating the queries is not complicated as soon as the database schema is understood.


But with so many reports in a dashboard, interactions between the reports and caused by filters, the number of SQL queries can be high.


Running the SQL queries is not easy since the parameters values have to be changed every time and the SQL queries are difficult to maintain.


To solve all these problems (not easy to run, difficult to maintain, high number of queries), the SQL queries can be  changed into stored procedures with multiple parameters.


The stored procedures resolve all existing issues:

- simplify the execution since only the stored procedure header (name and parameter values) is needed.

- the maintenance is easy as the code is stored in one place.

- one stored procedure replaces a few SQL queries.


The stored procedures come with performance problems because of the execution plans.

In some situations, running a SQL script returns results in 10 seconds but running the same script as a stored procedure takes 30 mins.


What can be done in this case?


I tried multiple things like dynamic where clauses, parameter sniffing, etc.

Nothing helped.


The only idea that worked was to build the SQL script in each stored procedure based on the values of the parameters and then execute it.

See below a sample stored procedure and the SQL script used for executing it.

ALTER PROCEDURE [dbo].[getResultCountForReport1] 

@Code int
, @Layer varchar(50)
, @SubLayer varchar(50)
, @Filter1 varchar(50)
, @Filter2 varchar(3)
, @Filter3 as integer
, @Filter4 as integer
, @ResultValue as integer OUTPUT

AS
BEGIN

SET NOCOUNT ON;
   
DECLARE @LayerCode integer, @SubLayerCode integer       

IF (@Layer <> 'All')
BEGIN

SET @LayerCode = dbo.getLayerCode(@Code, @Layer)

SET @SubLayerCode = dbo.getSubLayerCode(@Code, @Layer, @SubLayer)

END

DECLARE @SqlQuery nvarchar(4000)
       
SET @SqlQuery = 

' SELECT
SUM(CASE 
WHEN xxxxxx = ''abcdef'' THEN 1 
WHEN xxxxxx = ''ghijkl''   THEN 2 
ELSE 0 
 END)    

 FROM xxxxxxxx AS www

 LEFT JOIN xxxxx AS www   ON xxxxxx = xxxxxxx
 LEFT JOIN xxxxx    AS www   ON xxxxxx = xxxxxxx
 LEFT JOIN xxxxx AS www   ON xxxxxx = xxxxxxx
 LEFT JOIN xxxxx AS www   ON xxxxxx = xxxxxxx
 LEFT join xxxxx AS www ON xxxxxx = xxxxxxx

 WHERE xxxxxxxxxx = ''AAA'' AND  '

IF @Code <> 0
BEGIN

IF @Code = 123      
BEGIN

SET @SqlQuery = @SqlQuery + ' add WHERE clause '
 
IF (@Layer <> 'All')
SET @SqlQuery = @SqlQuery + ' add WHERE clause AND '

END

ELSE

IF @Code = 456
BEGIN

 SET @SqlQuery = @SqlQuery + ' add WHERE clause AND '      
 
 IF (@Layer <> 'All')

SET @SqlQuery = @SqlQuery + ' add WHERE clause AND ' 
END

END
                
SET @SqlQuery = @SqlQuery +

'    mmmmmmmmmmmmm = ''11111'' 
AND mmmmmmmmmmmmm IN (1, 2, 3)        
AND mmmmmmmmmmmmm = 1  
AND mmmmmmmmmmmmm IS NOT NULL  
AND xxxxxxxxxx.Date between @Filter3 and @Filter4 '



IF @Filter1 != 'All'
SET @SqlQuery = @SqlQuery + ' AND xxxxxxx = @Filter1  '


IF @Filter2 != 'All' 
SET @SqlQuery = @SqlQuery + ' AND xxxxxxx = @Filter2 '



DECLARE @ResultTable TABLE (countvalue int)

INSERT INTO @ResultTable (countvalue)
EXEC sp_executesql @SqlQuery, 
                               N'@Code int, 
                                  @Layer varchar(50), 
                                  @SubLayer varchar(50), 
          @Filter1 varchar(50), 
                                  @Filter2 varchar(3), 
                                  @Filter3 integer, 
                                  @Filter4 integer',
  @Code, 
                                  @Layer, 
                                  @SubLayer, 
                                  @Filter1, 
                                  @Filter2, 
                                  @Filter3, 
                                  @Filter4

SET @ResultValue =  (SELECT countvalue FROM @ResultTable)
 
END

------------------------------------------------------


DECLARE @Code int = 11

DECLARE @Layer varchar(50)        = 'layer1', 
                  @SubLayer varchar(50)  = 'layer2'

DECLARE @Filter1 varchar(50) = 'value1', 
                  @Filter2 varchar(3)   = 'value2', 
                  @Filter3 integer        = 33,
                  @Filter4 integer        = 44

DECLARE @ResultValue integer


EXEC getResultCountForReport1  @Code, 
                                                        @Layer, 
                                                        @SubLayer, 
                                                        @Filter1,  
                                                        @Filter2,  
                                                        @Filter3, 
                                                        @Filter4, 
                                                        @ResultValue OUTPUT


SELECT @ResultValue


So, to summarize, for effective reports testing, the following are needed:

- better than average SQL scripting knowledge

- good understanding of the OLTP database schema

- SQL scripts for verifying each element of the report

- change the SQL scripts into stored procedures

- build the SQL scripts in the stored procedure and then execute it

Dynamic WHERE clauses for SQL queries

Let's assume that the following example query is needed for getting the number or orders and the sum of order values from some tables:

select count(orders.orderid), sum(orders.value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = 'wholesale'
and customers.id is not null
and cities.name is not null
and products.type = 'fruit'
and orders.value > 1000000

The query uses only data from wholesale stores, fruit products and orders with value greater than 1000000.

Since it is possible that this query will run for different parameter values, let's change it in a stored procedure:

create procedure getInfo
@StoreType as varchar(50),
@ProductType as varchar(50)

as

begin

select count(*), sum(value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = @StoreType
and customers.id is not null
and cities.name is not null
and products.type = @ProductType
and orders.value > 1000000

end

Having the stored procedure, it is very easy to run it

declare @StoreType as varchar(50) = 'wholesale'
declare @ProductType as varchar(50) = 'fruit'

exec getInfo @StoreType, @ProductType



Let's assume now that for a certain product type (vegetable), we need a very similar query that has the operator in the last where clause different (< instead of >):

select count(*), sum(value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = 'wholesale'
and customers.id is not null
and cities.name is not null
and products.type = 'vegetable'
and orders.value < 1000000


Since it is not a good idea to create another stored procedure, lets see how we can change the existing one:

alter procedure getInfo
@StoreType as varchar(50),
@ProductType as varchar(50)

as

begin

if @ProductType = 'fruit'

select count(*), sum(value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = @StoreType
and customers.id is not null
and cities.name is not null
and products.type = @ProductType
and orders.value > 1000000

else
if @ProductType = 'vegetable'

select count(*), sum(value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = @StoreType
and customers.id is not null
and cities.name is not null
and products.type = @ProductType
and orders.value < 1000000

else

return


end


This works but it is not very good.

What happens if we will have yet another special case as follows?

select count(*), sum(value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = 'wholesale'
and customers.id is not null
and cities.name is not null
and products.type = 'herb'
and orders.value < 1000000
and orders.paid > 1000000

The stored procedure will become very complicated with lots of conditional statements and almost identical queries.



What is needed is a way of selecting the correct clause depending on the value of the Product Type parameter.

Please see below the final version of the stored procedure:


alter procedure getInfo
@StoreType as varchar(50),
@ProductType as varchar(50)

as

begin

select count(*), sum(value)
from orders
join customers on orders.customerid = customers.customerid
join stores on orders.storeid = stores.storeid
join cities on customers.cityid = cities.cityid
join products on orders.productid = products.productid
where orders.value > 0
and stores.type = @StoreType
and customers.id is not null
and cities.name is not null
and products.type = @ProductType
and
(
(isnull(@ProductType, 0) = 'fruit' and orders.value > 1000000)
OR
(isnull(@ProductType, 0) = 'vegetable' and orders.value < 1000000)
OR
(isnull(@ProductType, 0) = 'herb' and orders.value < 1000000 and orders.paid > 1000000)
)

end

Using the new technique, the stored procedure remains compact and efficient.

Why does this work?

This new clause uses the rules of Boolean algebra.








Lets see how the condition evaluates for @ProductType = 'fruit':

(ISNULL('fruit', 0) = 'fruit' AND orders.value > 1000000)
OR
(ISNULL('fruit', 0) = 'vegetable' AND orders.value < 1000000)
OR
(ISNULL('fruit', 0) = 'herb' AND orders.value < 1000000 and orders.paid > 1000000)

becomes

'fruit' = 'fruit' AND orders.value > 1000000
OR
'fruit' = 'vegetable' AND orders.value < 1000000)
OR
'fruit' = 'herb' AND orders.value < 1000000 and orders.paid > 1000000)

becomes

true AND orders.value > 1000000
OR
false AND orders.value < 1000000)
OR
false AND orders.value < 1000000 and orders.paid > 1000000)

becomes

true AND orders.value > 1000000
OR
false
 OR
false

becomes

true AND orders.value > 1000000 

becomes

orders.value > 1000000

The condition is evaluated similarly for the other values of the parameters.

Why are stored procedures sometimes so slow compared with plain SQL statements?

I have been doing lately lots of database testing by checking if data displayed in business intelligence reports is correct.

The reports data goes from the original database to an intermediate database (through some processing), then a cube is generated and finally, the reports.

The testing consists in writing SQL queries against the original database and comparing the query results with the BI reports data.

So far so good.

Some of the queries were very similar so I tried to re-use the code by creating stored procedures that have parameters.

For example, a simple SELECT statement like this

SELECT table1.field1, table2.field2
FROM table1
JOIN table2 ON table1.key = table2.key
WHERE table1.filter1 = 'value1' and table2.filter2 = 'value2'

was converted in the following stored procedure:

CREATE PROCEDURE doSomething

@Filter1Value as varchar(10)
, @Filter2Value as varchar(10)

AS
BEGIN

SET NOCOUNT ON;

        SELECT table1.field1, table2.field2
        FROM table1
        JOIN table2 ON table1.key = table2.key
        WHERE table1.filter1 = @Filter1Value and table2.filter2 = @Filter2Value

END

The stored procedure can then be executed as follows instead of running directly the SQL statement:

EXEC doSomething 'value1', 'value2'


So easy, right?

The problem is that the performance of the stored procedure was terrible compared with the performance of the SQL statement.

In some cases, the results of the SQL statement would be generated in 10 seconds and I would still wait for the results of the stored procedure after 15 minutes!

The SQL code could not be the cause of the performance problem since it is identical.

Why is the performance of the stored procedure so bad then?

The answer is that SQL Server does something called parameter sniffing for the parameters of stored procedures for performance optimization.

How can this be fixed?

See below the same stored procedure with a minor change:

CREATE PROCEDURE doSomething

@Filter1Value as varchar(10)
, @Filter2Value as varchar(10)

AS
BEGIN

       DECLARE @LocalFilter1Value as varchar(10)
       DECLARE @LocalFilter2Value as varchar(10)

       SET @LocalFilter1Value = @Filter1Value
       SET @LocalFilter2Value = @Filter2Value

SET NOCOUNT ON;

        SELECT table1.field1, table2.field2
        FROM table1
        JOIN table2 ON table1.key = table2.key
        WHERE table1.filter1 = @LocalFilter1Value and table2.filter2 = @LocalFilter2Value



END

The change consists in creating local variables in the stored procedure, initialize them with the values of the parameters and then use the local variables in the SQL statement.


As soon as this change is done, the performance of the stored procedure is the same with the performance of the plain SQL statement.

T- SQL checklist

I used this checklist recently as a T-SQL refresher.

The last job where I actually worked with this language was a few years back.

Since I interviewed for a position that requires T-SQL experience, a quick crash "course" was needed.



SELECT DISTINCT = selects distinct values for a column



WHERE clause

   - LIKE operator: for partial matches; the % wildcard is useful in these cases

  - IN operator: the column value is from a list of values

  - BETWEEN operator: the column value is in a range


ORDER BY - sorts the results of the query

GROUP BY - groups the results

HAVING - where clause for groups



|| - operator used for concatenating the values of 2 fields

UNION queries - has the results of both queries, minus the duplicated values

INTERSECT queries - has only the results that exist in both queries

MINUS queries - has only the results from the 1st query that do not exist in the 2nd query

EXISTS - returns true if the subquery has at least 1 result


FUNCTIONS

    COUNT - returns the number of rows that satisfy the WHERE clause

    SUM - sums all values for a column

    AVG - computes the average of all values for a column

    MAX - returns the maximum value for a column

    ADD_MONTHS - add a number of months to a date field

    LAST_DAY - returns the last day of a month

    STRING functions: LTRIM, RTRIM, REPLACE, SUBSTR, LENGTH

    TYPE CONVERSION functions




EMBEDDED QUERIES

JOIN TABLES

DML: INSERT INTO/UPDATE/DELETE




ADVANCED CONCEPTS

      VIEWS

     TRANSACTIONS: BEGIN, SAVEPOINT, ROLLBACK, COMMIT

     CURSORS

     STORED PROCEDURES

     TRIGGERS

      TEMPORARY TABLES