Over the last two years, some of us have spread knowledge across several posts. I think it is time we had a consolidated and persistent place where we can keep info, how-tos, and usage examples updated in a single location. Anyone should feel free to correct erroneous info.
Why use SQLite
- SQLite's footprint is ridiculously small, which is great for demos and development, as it will not kill your laptop.
- It is really easy to use.
- It has a CLI; yes, some of us still like those things.
- It is scriptable, which can make starting a project from CSV files a breeze.
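As a small illustration of how lightweight and scriptable it is, SQLite even ships inside Python's standard library, so a throwaway database is only a few lines away:

```python
import sqlite3

# An in-memory database: nothing to install and nothing left on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (name TEXT)")
conn.execute("INSERT INTO demo VALUES ('hello')")
print(conn.execute("SELECT name FROM demo").fetchone()[0])  # hello
```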
We will describe different ways it can be used later.
How to install SQLite on your machine
Mac
macOS already comes with SQLite installed. However, if you want a more up-to-date version, simply run the command
brew install sqlite3
Linux
There are several flavors of Linux, but they usually fall into two families:
apt-get install sqlite3
or
yum install sqlite
Installing the JDBC driver for Stardog
export STARDOG_EXT=$STARDOG_HOME/lib
mkdir $STARDOG_EXT
cp sqlite-jdbc-3.34.0.jar $STARDOG_EXT # adjust version as needed
Restart the server. If you are using a Docker install, you will need to adjust the start command slightly to
docker run -it -e STARDOG_EXT="/var/opt/stardog/lib" -v $STARDOG_HOME:/var/opt/stardog -p 5820:5820 stardog/stardog
Using with SMS2
Now you can create mapping files that use SQLite just like you would for any other database. And just like any other database, you will need a properties file. Here is what we recommend:
jdbc.url=jdbc:sqlite:/var/opt/stardog/database.db
jdbc.username=whatever
jdbc.password=whatever
jdbc.driver=org.sqlite.JDBC
sql.default.schema=main
sql.dialect=POSTGRESQL
- For the `jdbc.url`, you can use a relative path, which will simply look for the database under the `STARDOG_HOME` environment variable. I like to use an absolute path, as it is clear where the database is located.
- `jdbc.username` and `jdbc.password` are not required, but must be present in the properties file.
- `jdbc.driver` must be `org.sqlite.JDBC`; do not change it.
- `sql.default.schema=main` is required to make it work.
Putting the database file in `STARDOG_HOME` is simply a recommendation, but it does make life easier when you are working with Docker, since the same procedure can be used.
There is no official `sql.dialect` for SQLite, so you may need to adjust it based on your needs. Here is a table we started in order to keep track of which one to use:
=============================================
| Dialect    | LIMIT | CONCAT | Need Quote  |
=============================================
|            | No    | Yes    | Yes         |
| ORACLE     | No    | Yes    | Yes         |
| HIVE       | Yes   | No     | No          |
| MYSQL      | No    | Yes    | ?           |
| ATHENA     | Yes   | Yes    | No          |
| POSTGRESQL | Yes   | Yes    | No          |
=============================================
- To use SQLite with `stardog-admin virtual add`, the dialect must have Yes in the LIMIT column.
- To use SQLite's concatenation feature (`field1 || ' ' || field2`), the dialect must have Yes in the CONCAT column.
- The quote column simply indicates whether you have to quote column or table names that are not in upper case; it is nice not to have to.
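The concatenation behavior is easy to check for yourself; SQLite's native string concatenation is the `||` operator, which you can try out with Python's stdlib sqlite3 module:

```python
import sqlite3

# || is SQLite's native string concatenation operator.
conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT 'John' || ' ' || 'Doe'").fetchone()
print(row[0])  # John Doe
```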
For now we recommend POSTGRESQL; however, please update the table by adding a column if you run into an issue with a specific feature.
Example of how it can be used
Manipulating CSV
SMS2 support for CSV is great and I use it. However, sometimes you are given a slew of CSV files where you need to filter the data before loading it. An obvious solution is to load all the data into Stardog in a staging area, drop the data that is not wanted, and move the rest to your curated area.
The issue I have with that approach is that the logic for transforming and dropping the data now lives in two different locations. Enter SQLite.
Loading the CSV is very easy. You can do it via the CLI:
sqlite>.import ~/file.csv cities
or
the SQLite Studio IDE (which I personally have never used)
or script it:
(echo .separator ,; echo .import file1.csv table1) | sqlite3 database.db
(echo .separator ,; echo .import file2.csv table2) | sqlite3 database.db
(echo .separator ,; echo .import file3.csv table3) | sqlite3 database.db
(echo .separator ,; echo .import file4.csv table4) | sqlite3 database.db
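If you would rather not shell out to the sqlite3 CLI, the same bulk import can be sketched in Python with the stdlib csv and sqlite3 modules. This is a hypothetical helper, not part of any tool mentioned here; it assumes the first row of each CSV is a header and creates every column untyped:

```python
import csv
import sqlite3

def import_csv(db_path, csv_path, table):
    """Load one CSV file into a table, creating the table from the header row."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(db_path)
    cols = ", ".join('"%s"' % c for c in header)  # untyped columns
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, cols))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), data)
    conn.commit()
    conn.close()

# e.g. import_csv("database.db", "file1.csv", "table1")
```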
Now remember to copy the database file into your STARDOG_HOME directory.
Now you have a self-contained SQL database, and in the SMS file you can use SQL (joins) to select the data you want. You also have access to some transformation functions. For example, CONCAT
works for CSV in SMS2 mapping files, but it is not really supported in SQL and will not work there.
Once you have tested your query, you can import the data just like you normally would:
stardog-admin virtual import example ex.properties ex4.sms
Now all of your data logic around loading the CSV files is encapsulated in one location: your SMS file. An example:
prefix ex: <https://example.org/>
prefix vcard: <http://www.w3.org/2006/vcard/ns#>
prefix owl: <http://www.w3.org/2002/07/owl#>
MAPPING
FROM SQL {
SELECT
first_name || ' ' || last_name AS FullName,
CASE s.statusCode WHEN 1 THEN 'Full Time' ELSE 'Consultant' END Type,
*
FROM employee e
INNER JOIN status s
ON e.id = s.customer_id
WHERE s.statusCode != 0
}
TO {
?Employee a ex:Employee;
ex:employeeId ?id;
ex:type ?Type;
ex:fullName ?FullName;
ex:givenName ?first_name;
ex:familyName ?last_name;
ex:email ?email;
}
WHERE {
BIND(template("https://example.org/employee#{id}") AS ?Employee )
}
Looking at the mapping file, you can readily see all the mappings, transforms, and filtering that were performed on the CSV dataset, just as if you had created them over a real relational database.
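Since the FROM SQL block is plain SQL, you can sanity-check it directly against SQLite before wiring it into Stardog. Here is a sketch using Python's stdlib sqlite3 module; the employee/status schemas and sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas matching the columns the mapping expects.
conn.executescript("""
CREATE TABLE employee (id INTEGER, first_name TEXT, last_name TEXT, email TEXT);
CREATE TABLE status   (customer_id INTEGER, statusCode INTEGER);
INSERT INTO employee VALUES (1, 'Jane', 'Doe', 'jane@example.org');
INSERT INTO status   VALUES (1, 1);
""")

# The SELECT from the mapping's FROM SQL block, run as-is.
row = conn.execute("""
    SELECT
        first_name || ' ' || last_name AS FullName,
        CASE s.statusCode WHEN 1 THEN 'Full Time' ELSE 'Consultant' END Type,
        *
    FROM employee e
    INNER JOIN status s
    ON e.id = s.customer_id
    WHERE s.statusCode != 0
""").fetchone()
print(row[:2])  # ('Jane Doe', 'Full Time')
```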
Demoing virtual graph capability
As we mentioned, SQLite is very light and therefore perfect for demos on a laptop, especially one without much power. Best of all, you can package the demo in a nice, tight zip file.
Getting CSV based project started fast
Eventually you may want to look into something like NiFi, but in the meantime you have several CSV files and you need to deliver.
As mentioned a couple of times, you can easily script the process of creating the database the CSV files need. It is therefore relatively easy to integrate with your DevOps pipeline.