skyfirehose.com

Query the Bluesky Jetstream with DuckDB

Getting started

1. Get some credentials

Before you can query the Bluesky Jetstream data, you need temporary credentials. Run the following command in your terminal:

duckdb -c "$(curl -s https://skyfirehose.com/bootstrap.sql)"

This prints a SQL statement (CREATE SECRET s3secret (...);) that contains the temporary credentials. Copy this statement for the next step.

2. Start your local DuckDB and create secret

The credentials from step 1 are valid for 15 minutes. Start a local DuckDB session with the command below, then paste the copied statement and execute it. After that, you can access the Bluesky Jetstream data.

duckdb
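
Once the statement from step 1 has been executed inside the DuckDB shell, a quick check like the following sketch can confirm that the secret is registered. It relies on DuckDB's built-in duckdb_secrets() table function; the name s3secret comes from the statement printed in step 1. This only shows that the secret exists in the session, not that the credentials are still within their 15 minute window.

```sql
-- List the secrets known to the current DuckDB session.
-- After pasting the statement from step 1, a row named s3secret should appear.
SELECT name, type, persistent
FROM duckdb_secrets();
```

If the credentials expire mid-session, rerunning step 1 yields a fresh statement. Because a secret named s3secret already exists at that point, changing CREATE SECRET to CREATE OR REPLACE SECRET before pasting should avoid a name conflict.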

3. Attach the remote database

Attach the remote database to your local DuckDB session so that you can query it under the alias bluesky.

ATTACH 'https://skyfirehose.com/database' AS bluesky;

4. Have a look at the schema

You can inspect the schema of the remote database with the query below.

SELECT * FROM bluesky.schema;
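
For a closer look at a single table, DuckDB's DESCRIBE statement lists its columns and types. The sketch below assumes that the attach from step 3 succeeded and that likes is reachable under the bluesky alias, as the example query in step 5 suggests.

```sql
-- Show column names and types for one table of the attached database.
DESCRIBE bluesky.likes;

-- List the tables and views visible in the session, including attached databases.
SHOW ALL TABLES;
```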

5. Run queries

The schema contains five tables: jetstream, likes, follows, posts, and reports.

All of them are partitioned by event_dt and event_hour.

SELECT count(*) FROM bluesky.likes WHERE event_dt = '2024-11-18' AND event_hour = '12';
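
Because the tables are partitioned by event_dt and event_hour, filtering on those columns should let DuckDB skip partitions that do not match. As a sketch building on the query above, the following counts likes per hour for a single day. It uses only the table and partition columns named in this section, and the date is simply the example date from above.

```sql
-- Count likes per hour for one day.
-- The event_dt filter restricts the scan to that day's partitions.
SELECT event_hour, count(*) AS like_count
FROM bluesky.likes
WHERE event_dt = '2024-11-18'
GROUP BY event_hour
ORDER BY event_hour;
```

Keeping a filter on event_dt in every query is a sensible default, since a query without one may have to read the whole table.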