How to stream query results?

crapthings · March 1, 2021, 1:14pm

Is there such a feature in the JavaScript API?

I got this error after querying all triples:

Cannot create a string from buffer longer than 0xffffff0 characters

zachary.whitley · March 1, 2021, 2:28pm

It's a JavaScript thing. Strings can't be larger than about 256 MB (the 0xffffff0 in the error message is roughly 268 million characters).
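
The number in the error can be checked against Node.js itself. A minimal sketch, assuming a Node.js runtime; `buffer.constants.MAX_STRING_LENGTH` is Node's documented maximum string length, and its value differs between versions, so the last line may not match the error above:

```javascript
// Node.js exposes the engine's maximum string length through the buffer module.
const { constants } = require('buffer');

// The limit quoted in the error message, as a decimal number.
const limitFromError = 0xffffff0;

// At one byte per character, that works out to roughly 256 MiB.
const approxMiB = limitFromError / (1024 * 1024);

console.log('limit from the error message:', limitFromError, 'characters');
console.log('approximate size in MiB:', approxMiB.toFixed(1));
console.log('limit on this runtime:', constants.MAX_STRING_LENGTH, 'characters');
```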

crapthings · March 3, 2021, 4:30am

Is there a way to stream a query?

Or do limit and offset work if I run the query repeatedly?

This is my query:

SELECT
    (stardog:functions:localname(?labelUri) as ?label)
    ?definition
    (stardog:functions:localname(?broaderUri) as ?broader)
    (stardog:functions:localname(?narrowerUri) as ?narrower)
    (stardog:functions:localname(?relatedUri) as ?related)
    (stardog:functions:localname(?similarUri) as ?similar)
  WHERE {
    {
      ?labelUri skos:definition ?definition .
    } UNION {
      ?labelUri skos:broader ?broaderUri .
    } UNION {
      ?labelUri skos:narrower ?narrowerUri .
    } UNION {
      ?labelUri skos:related ?relatedUri .
    }  UNION {
      ?labelUri skosextends:similar ?similarUri .
    }
  }

jess · March 3, 2021, 4:55am

The result is streamed from the server so it's up to the client to avoid buffering if you need that behavior.
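
To make "avoid buffering" concrete: the client handles each chunk as it arrives instead of concatenating the whole body into one string. A minimal sketch using only Node.js core modules; `copyWithoutBuffering` is a hypothetical helper name, and `source` stands in for any readable stream, such as an HTTP response body:

```javascript
const fs = require('fs');
const { pipeline, Transform } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Copies `source` to `filePath` chunk by chunk and resolves with the byte count.
// Chunks are passed along as they arrive; no single giant string is ever built.
async function copyWithoutBuffering(source, filePath) {
  let bytes = 0;
  const counter = new Transform({
    transform(chunk, encoding, callback) {
      bytes += chunk.length;
      callback(null, chunk); // pass the chunk through unchanged
    },
  });
  await pipelineAsync(source, counter, fs.createWriteStream(filePath));
  return bytes;
}
```

Memory use stays roughly constant regardless of how large the result is, because `pipeline` also applies backpressure when the file cannot keep up.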

crapthings · March 3, 2021, 5:06am

What is the "client" here? The JavaScript stardog.js client?

const fs = require('fs-extra')
const _ = require('lodash')

global.constants = require('../constants')

const sparqlQuery = require('../utils/stardog')

const query = `
  SELECT
    (stardog:functions:localname(?labelUri) as ?label)
    ?definition
    (stardog:functions:localname(?broaderUri) as ?broader)
    (stardog:functions:localname(?narrowerUri) as ?narrower)
    (stardog:functions:localname(?relatedUri) as ?related)
    (stardog:functions:localname(?similarUri) as ?similar)
  WHERE {
    {
      ?labelUri skos:definition ?definition .
    } UNION {
      ?labelUri skos:broader ?broaderUri .
    } UNION {
      ?labelUri skos:narrower ?narrowerUri .
    } UNION {
      ?labelUri skos:related ?relatedUri .
    }  UNION {
      ?labelUri skosextends:similar ?similarUri .
    }
  }
`

async function queryResult ({ limit, offset }) {
  const resp = await sparqlQuery(query, {
    reasoning: true,
    limit,
    offset,
  })
  return _.get(resp, 'body.results.bindings', [])
}

if (require.main === module) {
  async function run () {
    const data = {}

    let count = 10000
    let offset = 0
    let limit = count

    while (true) {
      console.log('query', limit)
      const result = await queryResult({ limit, offset })
      if (!result.length) break
      // if (limit > 200000) break

      for (const node of result) {
        const label = node?.label?.value
        if (!data[label]) {
          data[label] = {
            uri: label,
            definition: '',
            broader: [],
            narrower: [],
            related: [],
            similar: [],
          }
        }
        const definition = node?.definition?.value
        if (definition) data[label]['definition'] = definition
        const broader = node?.broader?.value
        if (broader) data[label]['broader'].push(broader)
        const narrower = node?.narrower?.value
        if (narrower) data[label]['narrower'].push(narrower)
        const related = node?.related?.value
        if (related) data[label]['related'].push(related)
        const similar = node?.similar?.value
        if (similar) data[label]['similar'].push(similar)
      }
      limit += count
      offset = limit
    }

    fs.writeFileSync('./test.json', JSON.stringify(data, null, 2))
  }
  run()
}

I have about 17M triples.

I want to query them all and build a static page.

But this never seems to finish, and it gets slower with each iteration.

Is this the right approach for such a scenario?

Or maybe there's a DB cursor, so I can stream the results somehow?

jason · March 3, 2021, 4:40pm

What is the "client" here? The JavaScript stardog.js client?

You don't need to use stardog.js for this, but we recommend it, and it does support streaming of query results. You just need to pass an onResponseStart handler to the query.execute method of stardog.js. When the response from Stardog begins, that handler receives the response as its only argument, with the stream available as response.body, and you can then do what you wish with it.

If the handler returns false, stardog.js does no further processing, which sounds like what you're going for in this case. If the handler does not return false, stardog.js will also try to buffer the response as usual. That is not typically the behavior you want, because the stream can't be read twice, but it can be useful if you only need to know when the response has started and otherwise want stardog.js to proceed as usual.

Here's an example of what this might look like using the latest version of stardog.js (the last argument to query.execute here is the important part):

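const fs = require('fs');
const { query } = require('stardog');

// `conn` is an existing stardog.js Connection; `someFilePath` is the output path.
query.execute(
  conn,
  'myDatabaseName',
  'select distinct ?s where { ?s ?p ?o }',
  undefined,
  {
    limit: 10,
  },
  {
    onResponseStart: (response) => {
      // Stream the response body directly to file. No buffering or other processing.
      const stream = fs.createWriteStream(someFilePath);
      response.body.pipe(stream);
      // Returning false tells stardog.js not to buffer the response itself.
      return false;
    },
  }
);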
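
If the goal is to process rows rather than save the raw body, the stream handed to onResponseStart could also be read line by line. A minimal sketch, assuming the results come back as tab-separated values (for example by passing 'text/tab-separated-values' instead of undefined as the accept argument; whether your stardog.js version accepts that value is an assumption to verify). `forEachTsvRow` is a hypothetical helper, not part of stardog.js, and works on any Node.js readable stream:

```javascript
const readline = require('readline');

// Reads a SPARQL TSV result stream line by line and calls `onRow` once per row.
// The first line holds the variable names (e.g. "?label\t?definition").
// Cells are passed through in their raw TSV encoding; unbound cells become ''.
async function forEachTsvRow(readable, onRow) {
  const lines = readline.createInterface({ input: readable, crlfDelay: Infinity });
  let header = null;
  let count = 0;
  for await (const line of lines) {
    const cells = line.split('\t');
    if (header === null) {
      // Strip the leading "?" from each variable name.
      header = cells.map((name) => name.replace(/^\?/, ''));
      continue;
    }
    const row = {};
    header.forEach((name, i) => { row[name] = cells[i] || ''; });
    await onRow(row);
    count += 1;
  }
  return count;
}
```

Inside the handler this would be called as forEachTsvRow(response.body, handleRow) before returning false, where handleRow is whatever per-row work you need. Splitting on tabs is safe for the SPARQL TSV format because tabs and newlines inside values are escaped.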

A couple of other comments about the code you just posted. I haven't read it extremely carefully (and this isn't the place to diagnose end-user code generally), but I did notice a couple of things:

First, it seems that you are incrementing the limit every time you query, so each query has a larger limit than the previous one. This is almost certainly not what you want to be doing, unless I really don't understand your intentions. You probably want the query to have the same limit every time, just starting from a different offset. (You can see in your console log that the limit is becoming quite enormous!)

Second, the call in your code to JSON.stringify is synchronous and forces the JavaScript engine to hold the entirety of your data in memory at once, so you lose most of the benefits of streaming (you ultimately have to "stop the world" while all data is buffered). Instead, you probably want to stream results directly to file in the way that I indicated in the example above, or use some kind of streaming JSON stringifier.
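
Both points can be sketched together: keep the page size fixed and advance only the offset, then write each row as soon as it arrives instead of calling JSON.stringify on everything at the end. This is an illustration under assumptions, not a drop-in replacement. `fetchPage` is a hypothetical function standing in for the `queryResult({ limit, offset })` helper in the posted code, and newline-delimited JSON is a swapped-in output format, chosen because it can be written one row at a time. Also note that SPARQL only makes pages predictable across limit/offset queries when the query has an ORDER BY.

```javascript
const fs = require('fs');
const { once } = require('events');

// Yields every row, page by page. `limit` stays constant; only `offset` moves.
async function* paginate(fetchPage, pageSize) {
  let offset = 0;
  while (true) {
    const rows = await fetchPage({ limit: pageSize, offset });
    if (rows.length === 0) return; // an empty page means we are done
    yield* rows;
    offset += pageSize;
  }
}

// Writes one JSON document per line, so no single huge string is ever built.
async function writeNdjson(rows, filePath) {
  const out = fs.createWriteStream(filePath);
  let count = 0;
  for await (const row of rows) {
    // Respect backpressure: wait for 'drain' when the internal buffer is full.
    if (!out.write(JSON.stringify(row) + '\n')) await once(out, 'drain');
    count += 1;
  }
  out.end();
  await once(out, 'finish');
  return count;
}
```

With the posted code this might be used as writeNdjson(paginate(queryResult, 10000), './test.ndjson'). Grouping rows by label, as the original loop does, would still need the grouped data in memory, so that step may be better done afterwards or pushed into the query.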

I hope this helps.

Jason

crapthings · March 4, 2021, 2:39am

Oh, incrementing the limit was a mistake in my code.
Thanks!

I will try onResponseStart.

system · March 18, 2021, 2:39am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
