Correlated Subqueries, ranking, Stardog Arrays

Hi,
I have been playing with an e-commerce sample database I migrated from SQL Server to Stardog.
I have been translating SQL queries to SPARQL and have a few of them already. However, I'm finding difficult to do things like calculating running totals, moving averages, row numbers and ranking, etc.
Before I explain the two subqueries below, here is a snippet of the model:
product -> hasCategory -> category
order -> hasCustomer -> customer
The first query is working fine. However, I need to find an alternative for the second one, where I need to return the two most recent orders for each customers in the outer query. In SQL, one of the solutions would be to use a correlated query (pass customerID to the inner query for each record of the outer query).

I have notice a few issues with SPARQL, logged below:

I see you guys have proposed Arrays:
Extending the Solution | Stardog

Query 1: Select all products that belong to the Seafood category

SELECT
  ?productName
  ?unitPrice
  ?unitsInStock
WHERE { # outer query
  ?product a :product ;
             :productName ?productName ;
             :unitPrice ?unitPrice ;
             :unitsInStock ?unitsInStock ;
             :hasCategory ?category . 
    { # inner query
      SELECT 
        ?category
      WHERE {
        ?category a :category ;
                    :categoryID ?categoryID ;
                    :name "Seafood" .
      }
    }
  }
  ORDER BY
    ?productName

Query 2 (doesn't work): Select the two most recent orders of each customer

SELECT DISTINCT 
  ?customerID 
  ?city
WHERE { # outer query
  ?customer a :customer ;
              :customerID ?customerID ;
              :city ?city ;
              ^:hasCustomer ?order .
  { # inner query
    SELECT
        ?order
    WHERE {
      ?order a :order ;
               :orderID ?orderID ;
               :orderDate ?orderDate ;
               :hasCustomer ?customer .
    }
      ORDER BY
        DESC(?orderDate)
      LIMIT 2
    }
  }
ORDER BY
  ?customerID 
  ?city 
  DESC(?orderDate)

There are a bunch of questions here so I'll just answer a couple of the easy ones and then myself or someone can follow up on the query question.

The issues you posted from the github.com/s3c/sparql-12 aren't really issues but are actually proposals for a sparql 1.2 recommendation.

Stardog has implemented arrays. See the Stardog documentation here

Do you have some sample data you can share for these queries?

Yes, please just load the data in the zip file into Stardog and you will be able to execute the queries.
dumpdataNTRIPLE7.nt.zip (217.6 KB)

I have this same database on a SQL Server, so, i'm just leaving here the things I can do using SQL that I'm finding difficult to implement in SPARQL.

-- Query: Select the 3 most recent orders of each customer
-- For each customer record, go and get the two most recent orders.
-- An INNER JOIN could've been used, however, CROSS APPLY is more efficient when combined with SELECT TOP.

SELECT 
    cst.CustomerID, 
    cst.City,
    cpp.OrderID, 
    cpp.OrderDate   
FROM 
    Customer AS cst
CROSS APPLY (
    SELECT TOP 3 
        ord.OrderID, ord.OrderDate, cst.CustomerID
    FROM 
        [Order] AS ord
    WHERE 
        ord.customerid = cst.customerid -- reference to the outer query (correlated subquery)
    ORDER BY 
        ord.OrderDate DESC
) AS cpp
ORDER BY 
    cst.CustomerID, 
    cst.City,
    cpp.OrderDate DESC

-- Windowed Functions

-- Calculating row numbering and ranking, quantiles, moving averages, and running totals.
-- Reference: OVER Clause (Transact-SQL) - SQL Server | Microsoft Learn

-- Query: Select the 3 most recent orders of each customer
-- This query replaces the previous one by using the more efficient Windowed Function.

SELECT 
    ptt.*
FROM
(
    SELECT
    cst.CustomerID, 
    cst.City,
    ord.OrderID, 
    ord.OrderDate, 
    ROW_NUMBER() OVER(PARTITION BY cst.CustomerID ORDER BY ord.OrderDate DESC) AS [RowNumber]
    FROM Customer AS cst
    INNER JOIN [Order] AS ord 
    ON cst.CustomerID = ord.CustomerID
) ptt
WHERE 
    ptt.[RowNumber] <= 3

-- Query: Top 3 most expensive product in each product category

SELECT 
    ptt.*
FROM
(
    SELECT
        ctg.CategoryName,
        prd.ProductName, 
        prd.UnitPrice,
        ROW_NUMBER() OVER(PARTITION BY ctg.CategoryID ORDER BY prd.UnitPrice DESC) AS [RowNumber]
    FROM 
        Product prd
        INNER JOIN Category ctg 
        ON prd.CategoryID = ctg.CategoryID  
) ptt
WHERE 
    ptt.[RowNumber] <= 3

-- Query: Order total quantity and percentage by product

SELECT 
    ord.OrderID, 
    ord.ProductID, 
    ord.Quantity,  
    SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) AS Total,  
    CAST(1. * ord.Quantity / SUM(ord.Quantity) OVER(PARTITION BY ord.OrderID) * 100 AS DECIMAL(5,2)) AS "PercByProduct"  
FROM 
    OrderDetail ord 
WHERE 
    ord.OrderID IN(10248,10249, 10250); 

GO

A solution could be to run the following query for each customer in a loop and union the result sets in the end.

However, there is no loop functions in SPARQL either. Only way is to code it in the application.

SELECT *
WHERE {
  ?orderDetail :hasProduct ?product ; 
               :belongsToOrder ?order .
  ?order       :hasCustomer ?customer ;
               :orderDate ?orderDate .
  FILTER (?customer = :customer-ALFKI)
}
ORDER BY
  ?customer
  DESC(?orderDate)
LIMIT 3

Hi Marcelo,

Yes, you are correct about correlated subqueries, their absence in SPARQL causes this sort of issues. For now your application would have to loop externally.

The Stardog feature which comes closest is the recent Stored Query Service: Home | Stardog Documentation Latest . Right now its executes subqueries in the uncorrelated way, i.e. as normal subqueries in SPARQL, but it'd be relatively easy for us to support correlated evaluation, i.e. as LATERAL joins in Postgres.

We're monitoring sparql-12/100 and generally welcome it as a candidate for SPARQL 1.2.

Cheers,
Pavel

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.