Aggregates across multiple joins

by kalai 2010-01-12 15:21:22

Given a parent table and two child tables, a query which sums values in both child tables, grouping on a parent table column, returns sums that are exactly twice as large as they should be. In this example from the MySQL General Discussion list:

DROP TABLE IF EXISTS packageItem,packageCredit,packageItemTax;
CREATE TABLE packageItem (
packageItemID INT,
packageItemName CHAR(20),
packageItemPrice DECIMAL(10,2)
);
INSERT INTO packageItem VALUES(1,'Delta Hotel',100.00);

CREATE TABLE packageCredit (
packageCreditID INT,
packageCreditItemID INT,
packageItemType CHAR(10),
packageCreditAmount DECIMAL(10,2)
);
INSERT INTO packageCredit VALUES
(1,1,'Deposit',25.00),
(2,1,'Balance',92.00);

CREATE TABLE packageItemTax (
packageItemTaxID INT,
packageItemTaxItemID INT,
packageItemTaxName CHAR(5),
packageItemTaxAmount DECIMAL(10,2)
);
INSERT INTO packageItemTax VALUES
(1,1,'GST',7.00),
(2,1,'HST',10.00);

The query ...

SELECT
packageItemID AS Item,
SUM(packageItemPrice) AS Price,
SUM(packageItemTaxAmount) AS Tax,
SUM(packageCreditAmount) AS Credit
FROM packageItem
LEFT JOIN packageCredit ON packageItemID=packageCreditItemID
LEFT JOIN packageItemTax ON packageItemTaxItemID=packageItemID
GROUP BY packageItemID
ORDER BY packageItemID;

returns this incorrect result ...

+------+--------+-------+--------+
| Item | Price | Tax | Credit |
+------+--------+-------+--------+
| 1 | 400.00 | 34.00 | 234.00 |
+------+--------+-------+--------+

With three child tables, the sums are tripled. Why? The query aggregates across each join. How then to get the correct results? One way is with correlated subqueries:

SELECT
packageItemID AS Item,
SUM(packageItemPrice) AS Price,
( SELECT SUM(c.packageCreditAmount)
FROM packageCredit c
WHERE c.packageCreditItemID = packageItemID
) AS Credit,
( SELECT SUM(t.packageItemTaxAmount)
FROM packageItemTax t
WHERE t.packageItemTaxItemID = packageItemID
) AS Tax
FROM packageItem
GROUP BY packageItemID;
+------+--------+--------+-------+
| Item | Price | Credit | Tax |
+------+--------+--------+-------+
| 1 | 100.00 | 117.00 | 17.00 |
+------+--------+--------+-------+

Moving the subquery logic to the JOIN level may speed up performance considerably:

SELECT
i.packageItemID AS Item,
SUM(i.packageItemPrice) AS Price,
c.Credit,
t.Tax
FROM packageItem i
JOIN (
SELECT packageCreditItemID, SUM(packageCreditAmount) AS Credit
FROM packageCredit
) c ON i.packageItemID = c.packageCreditItemID
JOIN (
SELECT packageItemTaxItemID, SUM(t.packageItemTaxAmount) AS Tax
FROM packageItemTax t
) t ON i.packageItemID = t.packageItemTaxItemID
GROUP BY packageItemID;

Tagged in:

1196
like
0
dislike
0
mail
flag

You must LOGIN to add comments