Storing and Querying JSON in a Postgres Database with Exposed

A picture of a me, a man wearing black shorts, a red T-shirt and a blue life jacket. I'm standing on a paddle in the middle of a body of water. In the background is a grey sky, some clouds and mountains.

A long long time ago, in a galaxy country far away, a young student was doing his final group presentation. They’d built a multiplayer fighting game that would run on the Nintendo DS (it was a long time ago) and web/desktop using Flash (long long time ago!). On one of the presentation slide, they mentioned that they’d used an XML database to store games data, which one of the judge took interest in. That student was quick on his feet, so he replied that it was super convenient because you could just query data, get pre-formatted XML back and send it directly to the clients. That was me, I was that student and it was complete bullshit. We’d given up on using an XML database a long time ago and decided to go back to MySQL, I’d just forgotten to update the schema in our slides. This blogpost has nothing to do with XML, but I’ve used Postgres to store JSON data recently, and it made me think about this story. Anyway, let’s dig in.

Context

Why would you store json data into a table? Good question! In my situation, I wanted to store (and query) additional data in a table’s column. The complicated alternative would have been to either add a bunch of nullable columns, or multiple tables with 1-1 relationships. It’s hard to come up with examples when JSON felt like the perfect solution here, but the main idea was being able to store and query evolving data without having to run migrations on the schema every time.

Storing Webhook Events in a Postgres table

I’ve been trying to come up with a simple example, because I don’t want to reuse what I’ve been working on for obvious secret reasons, so picture this:

  • You’re working on a developer tool that integrates with GitHub, where something happens every time someone push new changes to a repository
  • You already log every webhook event but you want to display them somewhere in a UI, as part of a tool for your customers support team
  • As a result, you decide to store them in a database and provide simple ways to look for specific webhook events based on the content of the request sent to your server. This is completely hypothetical of course, but as someone who did developer support for years, this kind of tool made our lives so much easier.

Let’s start by creating a simple table with an auto-incremented identifier and the content of the request, then import a few events’ payload into the freshly created table.

CREATE TABLE webhook_events (
    // Some other fields like an identifier, creation dates...
    payload json NOT NULL
);

The content of the payloads doesn’t matter, but you can find some example on GitHub’s documentation. In my case, because of another side project, I had a bunch of workflow_run events available.

A JSON column is presented like any others in Postico, but you’re able to pretty-print and check syntax
A JSON column is presented like any others in Postico, but you’re able to pretty-print and check syntax

Now, let’s say I want to find a webhook event for a specific repository, I can do:

SELECT * 
	FROM webhook_events 
	WHERE payload->'repository'->>'full_name' = 'Palleas/Agamotto'

Here -> extracts the content of a json’s column based on a key while ->> extracts content as text. Postgresql supports more operators, but this is all I need here so I haven’t looked into it yet.

It looks like Postgres doesn’t care if you ask for completely keys that do not exist. For example this query simply yields no results:

SELECT * FROM webhook_events WHERE payload->'blup'->>'bloop' = 'Palleas/Agamotto';

… and you can cast the content of a field to use it. For example, instead of comparing the content of a column with the ’true’ string, it’s possible to cast it first to real boolean, like so:

SELECT * FROM webhook_events WHERE (payload->'repository'->>'allow_forking')::boolean = true

Once More, With Exposed

I’ve mentioned that I really enjoyed working with Ktor so many times already that I’m considering tagging my posts to make it easier to find them all. While I’m not the biggest fan of most ORM, I’ve found that Exposed makes it really easy to interact with PostgreSQL. Let’s recreate our example, using Kotlin and Exposed.

// Create the Table
object WebhookEvents: UUIDTable() {
    val payload = json<WebhookPayload>("payload", Json.Default)
}

You can then query the content of this table using the extract function, like so:

val repoIdentifiers = newSuspendedTransaction {
    WebhookEvents
        .selectAll()
        .where { WebhookEvents.payload.extract<String>("repository", "full_name") eq "Palleas/Agamotto" }
        .map { it[WebhookEvents.payload].repository.id }
}

Probable Caveats

There are a few things to consider before storing JSON data and a lot more than I haven’t thought about yet I’m sure:

  • It’s probably very easy to shoot yourself in the foot with minor changes like formatting the JSON differently (going from snake case to camel case for the keys, for example)
  • Postgres doesn’t enforce any schema by default and doesn’t care if you’re trying to query keys that do not exist (as I mentioned earlier). This makes me a little nervous, as someone who likes to know he can trust the database’s schema, but that’s where a good testing coverage helps. Note that I’ve seen ways to add schema validation, but I haven’t looked into it yet.
  • I’m not sure how (or even if) indexes work with a JSON columns, so that’s something I’m planning to look into next.

PostgreSQL is cool and I love learning new ways to use it.