SQLite3 Information

Over the last two years, some of us have spread knowledge across several posts. I think it is time we had a consolidated and persistent place where we can keep info, how-tos, and usage examples up to date in a single spot. Anyone should feel free to update erroneous info.

Why use SQLite

  1. SQLite's footprint is ridiculously small, which is great for demos and development, as it will not kill your laptop.
  2. It is really easy to use.
  3. It has a CLI; yes, some of us still like those things.
  4. It is scriptable, which can make starting a project from CSVs a breeze.

We will describe different ways it can be used later.

How to install SQLite on your machine

Mac

macOS already comes with SQLite installed. However, if you want a more up-to-date version, simply run the command

brew install sqlite3

Linux

There are several flavors of Linux, but they usually fall into two families

apt-get install sqlite3
or
yum install sqlite
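Whichever route you take, you can confirm the binary is on your PATH and check which version you got:

```shell
# Print the installed SQLite version; any 3.x release is fine for what follows
sqlite3 --version
```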

Installing the JDBC driver in Stardog

Download the driver and unzip it

export STARDOG_EXT=$STARDOG_HOME/lib 
mkdir $STARDOG_EXT
cp sqlite-jdbc-3.34.0.jar $STARDOG_EXT     # adjust version

Restart the server. If you are using a Docker install, you will need to adjust the start command slightly to

docker run -it -e STARDOG_EXT="/var/opt/stardog/lib" -v $STARDOG_HOME:/var/opt/stardog -p 5820:5820 stardog/stardog

Using with SMS2

Now you can create mapping files that use SQLite just like you would with any other database. As with any other database, you will need a properties file. Here is what we recommend

jdbc.url=jdbc:sqlite:/var/opt/stardog/database.db
jdbc.username=whatever
jdbc.password=whatever
jdbc.driver=org.sqlite.JDBC
sql.default.schema=main
sql.dialect=POSTGRESQL
  • For jdbc.url, you can use a relative path, which will look for the database relative to the STARDOG_HOME environment variable. I like to use an absolute path, as it is clear where the file is located.
  • jdbc.username and jdbc.password are not used by SQLite, but they must be present in the properties file.
  • jdbc.driver must be org.sqlite.JDBC; do not change it.
  • sql.default.schema=main is required to make it work.

:warning: Putting the database file in STARDOG_HOME is only a recommendation, but it does make life easier if you are sometimes working with Docker, since the same procedure can be used.

There is no official sql.dialect for SQLite, and you may need to adjust it based on your needs. Here is a table we started in order to keep track of which one to use.

============================================
| Dialect    | LIMIT | CONCAT | Need Quote |
============================================
|            | No    | Yes    | Yes        |
| ORACLE     | No    | Yes    | Yes        |
| HIVE       | Yes   | No     | No         |
| MYSQL      | No    | Yes    | ?          |
| ATHENA     | Yes   | Yes    | No         |
| POSTGRESQL | Yes   | Yes    | No         |
--------------------------------------------
  • To use SQLite with stardog-admin virtual add, you must pick a dialect with Yes in the LIMIT column.
  • To use the SQLite concatenation feature (field1 || ' ' || field2), you must pick a dialect with Yes in the CONCAT column.
  • The Need Quote column just indicates whether you have to quote your column or table names because they are not in upper case; No is nicer.

For now we recommend POSTGRESQL; however, please update the table, adding a column if necessary, if you run into an issue with a specific feature.
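Both features the dialects differ on are native to SQLite itself, which is why a dialect with Yes in the LIMIT and CONCAT columns works. A quick sanity check from the shell, using a throwaway in-memory database:

```shell
# SQLite natively supports both LIMIT and the || concatenation operator
sqlite3 :memory: "SELECT 'a' || 'b' LIMIT 1;"
# → ab
```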

Example of how it can be used

Manipulating CSV

SMS2 support for CSV is great and I use it. However, sometimes you are given a slew of CSV files and you need to filter the data before you load it. An obvious solution is to load all the data into a staging area in Stardog, drop the data that is not wanted, and move the rest to your curated area.

The issue I have with that approach is that the logic for transforming and dropping the data now lives in two different locations. Enter SQLite.

Loading the CSV is very easy. You can do it via the CLI
sqlite> .mode csv
sqlite> .import ~/file.csv cities
or via the SQLite Studio IDE (which I have personally never used)
or script it

(echo .separator ,; echo .import file1.csv table1) | sqlite3 database.db
(echo .separator ,; echo .import file2.csv table2) | sqlite3 database.db
(echo .separator ,; echo .import file3.csv table3) | sqlite3 database.db
(echo .separator ,; echo .import file4.csv table4) | sqlite3 database.db
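If your files follow a predictable naming pattern, the repetition above collapses into a loop. A minimal sketch; the file names, table names, and sample rows below are made up for illustration:

```shell
# Create four hypothetical CSV files with a header row (illustration only)
for i in 1 2 3 4; do
  printf 'id,name\n%s,row%s\n' "$i" "$i" > "file${i}.csv"
done

# Import each CSV into its own table; with .mode csv the header row
# becomes the column names when the table does not already exist
for i in 1 2 3 4; do
  printf '.mode csv\n.import file%s.csv table%s\n' "$i" "$i" | sqlite3 database.db
done
```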

Now remember to copy the database file into your STARDOG_HOME directory.

Now you have a self-contained SQL database. In the SMS file you can use SQL (joins) to select the data you want, and you also have access to some transform functions. For example, CONCAT works for CSV in SMS2 mapping files, but SQLite does not really support it and it will not work in the SQL; use the || operator instead.
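Before wiring a query into an SMS2 mapping, it is easy to try it directly in the sqlite3 shell. A sketch with a tiny made-up dataset (the table and column names mirror the mapping example further down, but the rows are invented):

```shell
# Build a throwaway database and test the join, filter, and || concatenation
sqlite3 demo.db <<'SQL'
CREATE TABLE employee (id INTEGER, first_name TEXT, last_name TEXT, email TEXT);
CREATE TABLE status (customer_id INTEGER, statusCode INTEGER);
INSERT INTO employee VALUES (1, 'Ada', 'Lovelace', 'ada@example.org');
INSERT INTO employee VALUES (2, 'Alan', 'Turing', 'alan@example.org');
INSERT INTO status VALUES (1, 1);
INSERT INTO status VALUES (2, 0);
SELECT first_name || ' ' || last_name AS FullName
FROM employee e
INNER JOIN status s ON e.id = s.customer_id
WHERE s.statusCode != 0;
SQL
# → Ada Lovelace
```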

Once you have tested your query, you can import the data just like you normally would
stardog-admin virtual import example ex.properties ex4.sms

Now all your data logic around loading the CSV files is encapsulated in one location: your SMS file. An example

prefix ex: <https://example.org/>
prefix vcard: <http://www.w3.org/2006/vcard/ns#>
prefix owl: <http://www.w3.org/2002/07/owl#>

MAPPING
FROM SQL {
        SELECT
                first_name || ' ' || last_name AS FullName,
                CASE s.statusCode WHEN 1 THEN 'Full Time' ELSE 'Consultant' END Type,
                *
        FROM employee e
        INNER JOIN status s
        ON e.id = s.customer_id
        WHERE s.statusCode != 0
}
TO {
    ?Employee a ex:Employee;
        ex:employeeId ?id;
        ex:type ?Type;
        ex:fullName ?FullName;
        ex:givenName ?first_name;
        ex:familyName ?last_name;
        ex:email ?email;
}
WHERE {
    BIND(template("https://example.org/employee#{id}") AS ?Employee )
}

Looking at the mapping file, you can readily see all the mappings, transforms, and filtering that were performed on the CSV dataset, just as if you had created them over a real relational database.

Demoing virtual graph capabilities

As we mentioned, SQLite is very light and therefore perfect for demos on a laptop, especially one without much power. Best of all, you can package the demo in a nice, tight zip file.

Getting CSV-based projects started fast

Eventually you may want to look into something like NiFi, but right now you have several CSV files and you need to deliver.

As mentioned a couple of times, you can easily script the process of creating the DB from the CSV files. Therefore, it is relatively easy to integrate with your DevOps pipeline.