Hosted with nbsanity. See source notebook on GitHub.

The sqlite-utils tutorial

sqlite-utils is a Python library (and command-line tool for quickly creating and manipulating SQLite database files.

This tutorial will show you how to use the Python library to manipulate data.

Installation

To install the library, run:

pip install sqlite-utils

You can run this in a Jupyter notebook cell by executing:

%pip install sqlite-utils

Or use pip install -U sqlite-utils to ensure you have upgraded to the most recent version.

%pip install -U sqlite_utils

Requirement already satisfied: sqlite_utils in /usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-packages (3.14)
Requirement already satisfied: click-default-group in /usr/local/lib/python3.9/site-packages (from sqlite_utils) (1.2.2)
Requirement already satisfied: sqlite-fts4 in /usr/local/lib/python3.9/site-packages (from sqlite_utils) (1.0.1)
Requirement already satisfied: click in /Users/simon/Library/Python/3.9/lib/python/site-packages (from sqlite_utils) (7.1.2)
Requirement already satisfied: tabulate in /usr/local/lib/python3.9/site-packages (from sqlite_utils) (0.8.7)
Requirement already satisfied: python-dateutil in /usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-package (from sqlite-utils) (2.8.1)
Requirement already satisfied: six>=1.5 in /usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-package (from python-dateutil->sqlite-utils) (1.16.0)
WARNING: You are using pip version 21.1.1; however, version 21.2.2 is available.
You should consider upgrading via the '/usr/local/Cellar/jupyterlab/3.0.16_1/libexec/bin/python3.9 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.

import sqlite_utils

You can use the library with a database file on disk by running:

db = sqlite_utils.Database("path/to/my/database.db")

In this tutorial we will use an in-memory database. This is a quick way to try out new things, though you should note that when you close the notebook the data store in the in-memory database will be lost.

db = sqlite_utils.Database(memory=True)
db

<Database <sqlite3.Connection object at 0x139a16300>>

Creating a table

We are going to create a new table in our database called creatures by passing in a Python list of dictionaries.

db[name_of_table] will access a database table object with that name.

Inserting data into that table will create it if it does not already exist.

db["creatures"].insert_all([{
    "name": "Cleo",
    "species": "dog",
    "age": 6
}, {
    "name": "Lila",
    "species": "chicken",
    "age": 0.8,
}, {
    "name": "Bants",
    "species": "chicken",
    "age": 0.8,
}])

<Table creatures (name, species, age)>

Let’s grab a table reference to the new creatures table:

table = db["creatures"]

sqlite-utils automatically creates a table schema that matches the keys and data types of the dictionaries that were passed to .insert_all().

We can see that schema using table.schema:

print(table.schema)

CREATE TABLE [creatures] (
   [name] TEXT,
   [species] TEXT,
   [age] FLOAT
)

Accessing data

The table.rows property lets us loop through the rows in the table, returning each one as a Python dictionary:

for row in table.rows:
    print(row)

{'name': 'Cleo', 'species': 'dog', 'age': 6.0}
{'name': 'Lila', 'species': 'chicken', 'age': 0.8}
{'name': 'Bants', 'species': 'chicken', 'age': 0.8}

The db.query(sql) method can be used to execute SQL queries and return the results as dictionaries:

list(db.query("select * from creatures"))

[{'name': 'Cleo', 'species': 'dog', 'age': 6.0},
 {'name': 'Lila', 'species': 'chicken', 'age': 0.8},
 {'name': 'Bants', 'species': 'chicken', 'age': 0.8}]

Or in a loop:

for row in db.query("select name, species from creatures"):
    print(f'{row["name"]} is a {row["species"]}')

Cleo is a dog
Lila is a chicken
Bants is a chicken

SQL parameters

You can run a parameterized query using ? as placeholders and passing a list of variables. The variables you pass will be correctly quoted, protecting your code from SQL injection vulnerabilities.

list(db.query("select * from creatures where age > ?", [1.0]))

[{'name': 'Cleo', 'species': 'dog', 'age': 6.0}]

As an alternative to question marks we can use :name parameters and feed in the values using a dictionary:

list(db.query("select * from creatures where species = :species", {"species": "chicken"}))

[{'name': 'Lila', 'species': 'chicken', 'age': 0.8},
 {'name': 'Bants', 'species': 'chicken', 'age': 0.8}]

Primary keys

When we created this table we did not specify a primary key. SQLite automatically creates a primary key called rowid if no other primary key is defined.

We can run select rowid, * from creatures to see this hidden primary key:

list(db.query("select rowid, * from creatures"))

[{'rowid': 1, 'name': 'Cleo', 'species': 'dog', 'age': 6.0},
 {'rowid': 2, 'name': 'Lila', 'species': 'chicken', 'age': 0.8},
 {'rowid': 3, 'name': 'Bants', 'species': 'chicken', 'age': 0.8}]

We can also see that using table.pks_and_rows_where():

for pk, row in table.pks_and_rows_where():
    print(pk, row)

1 {'rowid': 1, 'name': 'Cleo', 'species': 'dog', 'age': 6.0}
2 {'rowid': 2, 'name': 'Lila', 'species': 'chicken', 'age': 0.8}
3 {'rowid': 3, 'name': 'Bants', 'species': 'chicken', 'age': 0.8}

Let’s recreate the table with our own primary key, which we will call id.

table.drop() drops the table:

table.drop()

table

<Table creatures (does not exist yet)>

We can see a list of tables in the database using db.tables:

db.tables

[]

We’ll create the table again, this time with an id column.

We use pk="id" to specify that the id column should be treated as the primary key for the table:

db["creatures"].insert_all([{
    "id": 1,
    "name": "Cleo",
    "species": "dog",
    "age": 6
}, {
    "id": 2,
    "name": "Lila",
    "species": "chicken",
    "age": 0.8,
}, {
    "id": 3,
    "name": "Bants",
    "species": "chicken",
    "age": 0.8,
}], pk="id")

<Table creatures (id, name, species, age)>

print(table.schema)

CREATE TABLE [creatures] (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT,
   [species] TEXT,
   [age] FLOAT
)

Inserting more records

We can call .insert_all() again to insert more records. Let’s add two more chickens.

table.insert_all([{
    "id": 4,
    "name": "Azi",
    "species": "chicken",
    "age": 0.8,
}, {
    "id": 5,
    "name": "Snowy",
    "species": "chicken",
    "age": 0.9,
}], pk="id")

<Table creatures (id, name, species, age)>

list(table.rows)

[{'id': 1, 'name': 'Cleo', 'species': 'dog', 'age': 6.0},
 {'id': 2, 'name': 'Lila', 'species': 'chicken', 'age': 0.8},
 {'id': 3, 'name': 'Bants', 'species': 'chicken', 'age': 0.8},
 {'id': 4, 'name': 'Azi', 'species': 'chicken', 'age': 0.8},
 {'id': 5, 'name': 'Snowy', 'species': 'chicken', 'age': 0.9}]

Since the id column is an integer primary key, we can insert a record without specifying an ID and one will be automatically added.

Since we are only adding one record we will use .insert() instead of .insert_all().

table.insert({"name": "Blue", "species": "chicken", "age": 0.9})

<Table creatures (id, name, species, age)>

We can use table.last_pk to see the ID of the record we just added.

table.last_pk

Here’s the full list of rows again:

list(table.rows)

[{'id': 1, 'name': 'Cleo', 'species': 'dog', 'age': 6.0},
 {'id': 2, 'name': 'Lila', 'species': 'chicken', 'age': 0.8},
 {'id': 3, 'name': 'Bants', 'species': 'chicken', 'age': 0.8},
 {'id': 4, 'name': 'Azi', 'species': 'chicken', 'age': 0.8},
 {'id': 5, 'name': 'Snowy', 'species': 'chicken', 'age': 0.9},
 {'id': 6, 'name': 'Blue', 'species': 'chicken', 'age': 0.9}]

If you try to add a new record with an existing ID, you will get an IntegrityError:

table.insert({"id": 6, "name": "Red", "species": "chicken", "age": 0.9})

---------------------------------------------------------------------------
IntegrityError                            Traceback (most recent call last)
<ipython-input-24-4222c6abc759> in <module>
----> 1 table.insert({"id": 6, "name": "Red", "species": "chicken", "age": 0.9})

/usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-packages/sqlite_utils/db.py in insert(self, record, pk, foreign_keys, column_order, not_null, defaults, hash_id, alter, ignore, replace, extracts, conversions, columns)
   2027         columns=DEFAULT,
   2028     ):
-> 2029         return self.insert_all(
   2030             [record],
   2031             pk=pk,

/usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-packages/sqlite_utils/db.py in insert_all(self, records, pk, foreign_keys, column_order, not_null, defaults, batch_size, hash_id, alter, ignore, replace, truncate, extracts, conversions, columns, upsert)
   2143             first = False
   2144 
-> 2145             self.insert_chunk(
   2146                 alter,
   2147                 extracts,

/usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-packages/sqlite_utils/db.py in insert_chunk(self, alter, extracts, chunk, all_columns, hash_id, upsert, pk, conversions, num_records_processed, replace, ignore)
   1955             for query, params in queries_and_params:
   1956                 try:
-> 1957                     result = self.db.execute(query, params)
   1958                 except OperationalError as e:
   1959                     if alter and (" column" in e.args[0]):

/usr/local/Cellar/jupyterlab/3.0.16_1/libexec/lib/python3.9/site-packages/sqlite_utils/db.py in execute(self, sql, parameters)
    255             self._tracer(sql, parameters)
    256         if parameters is not None:
--> 257             return self.conn.execute(sql, parameters)
    258         else:
    259             return self.conn.execute(sql)

IntegrityError: UNIQUE constraint failed: creatures.id

You can use replace=True to replace the matching record with a new one:

table.insert({"id": 6, "name": "Red", "species": "chicken", "age": 0.9}, replace=True)

<Table creatures (id, name, species, age)>

list(table.rows)

[{'id': 1, 'name': 'Cleo', 'species': 'dog', 'age': 6.0},
 {'id': 2, 'name': 'Lila', 'species': 'chicken', 'age': 0.8},
 {'id': 3, 'name': 'Bants', 'species': 'chicken', 'age': 0.8},
 {'id': 4, 'name': 'Azi', 'species': 'chicken', 'age': 0.8},
 {'id': 5, 'name': 'Snowy', 'species': 'chicken', 'age': 0.9},
 {'id': 6, 'name': 'Red', 'species': 'chicken', 'age': 0.9}]

Updating a record

We will rename that row back to Blue, this time using the table.update(pk, updates) method:

table.update(6, {"name": "Blue"})

<Table creatures (id, name, species, age)>

list(db.query("select * from creatures where id = ?", [6]))

[{'id': 6, 'name': 'Blue', 'species': 'chicken', 'age': 0.9}]

Extracting one of the columns into another table

Our current table has a species column with a string in it - let’s pull that out into a separate table.

We can do that using the table.extract() method.

table.extract("species")

<Table creatures (id, name, species_id, age)>

We now have a new table called species, which we can see using the db.tables method:

db.tables

[<Table species (id, species)>, <Table creatures (id, name, species_id, age)>]

Our creatures table has been modified - instead of a species column it now has species_id which is a foreign key to the new table:

print(db["creatures"].schema)
print(list(db["creatures"].rows))

CREATE TABLE "creatures" (
   [id] INTEGER PRIMARY KEY,
   [name] TEXT,
   [species_id] INTEGER,
   [age] FLOAT,
   FOREIGN KEY([species_id]) REFERENCES [species]([id])
)
[{'id': 1, 'name': 'Cleo', 'species_id': 1, 'age': 6.0}, {'id': 2, 'name': 'Lila', 'species_id': 2, 'age': 0.8}, {'id': 3, 'name': 'Bants', 'species_id': 2, 'age': 0.8}, {'id': 4, 'name': 'Azi', 'species_id': 2, 'age': 0.8}, {'id': 5, 'name': 'Snowy', 'species_id': 2, 'age': 0.9}, {'id': 6, 'name': 'Blue', 'species_id': 2, 'age': 0.9}]

The new species table has been created and populated too:

print(db["species"].schema)
print(list(db["species"].rows))

CREATE TABLE [species] (
   [id] INTEGER PRIMARY KEY,
   [species] TEXT
)
[{'id': 1, 'species': 'dog'}, {'id': 2, 'species': 'chicken'}]

We can use a join SQL query to combine data from these two tables:

list(db.query("""
    select
      creatures.id,
      creatures.name,
      creatures.age,
      species.id as species_id,
      species.species
    from creatures
      join species on creatures.species_id = species.id
"""))

[{'id': 1, 'name': 'Cleo', 'age': 6.0, 'species_id': 1, 'species': 'dog'},
 {'id': 2, 'name': 'Lila', 'age': 0.8, 'species_id': 2, 'species': 'chicken'},
 {'id': 3, 'name': 'Bants', 'age': 0.8, 'species_id': 2, 'species': 'chicken'},
 {'id': 4, 'name': 'Azi', 'age': 0.8, 'species_id': 2, 'species': 'chicken'},
 {'id': 5, 'name': 'Snowy', 'age': 0.9, 'species_id': 2, 'species': 'chicken'},
 {'id': 6, 'name': 'Blue', 'age': 0.9, 'species_id': 2, 'species': 'chicken'}]