Hypertables
Hypertables are usually used to ingest time-series data. They are a high-performance version of regular Postgres tables focussed on time-based bucketting, chunking, and partitioning.
Hypertables make less sense as dbt models to store transformed data. However, you can still use them as such. A more useful version right now is the empty
option, which will create empty hypertables.
Look into virtual hypertables
If you're looking to use dbt to configure your leverage pre-existing hypertables, check out the virtual hypertables guide.
Only run hypertable models once
dbt will always recreate your entire model. This means that all existing data in your hypertables will be lost when you run them again. If you're using hypertables for ingesting time-series data, you probably don't want this.
Info
Consult the Timescale docs for more information regarding hypertables.
Usage
To materialize a model as a hypertable, simply set its materialization
in the config to hypertable
. Every hypertable also requires you to set the name of the time column.
{{
config(
materialized='hypertable',
main_dimension='time_column'
)
}}
select current_timestamp as time_column
models:
your_project_name:
folder_containing_the_hypertables:
+materialized: hypertable
model_one:
+main_dimension: time_column # (1)!
model_two:
+main_dimension: time_column_in_model_two
# ...
- While you can set the
hypertable
materialization for multiple models, you'll still have to configure themain_dimension
for each model individually.
Configuration options
dbt-specific options
The following options are not taken from the TimescaleDB APIs, but are specific to this adapter.
empty_hypertable
: If set totrue
, the hypertable will be truncated right after creation (as a regular table) and right before converting it into a hypertable. Defaults tofalse
.
TimescaleDB hypertable options
The TimescaleDB option create_default_indexes
can be set to true
or false
. It defaults to true
.
Dimensions
Hypertables have one or more dimensions, defined upon creation of the hypertable. The main dimension of a hypertable is provided using the main_dimension
configuration option. Additional dimensions can be added to the hypertable using the dimensions
configuration option.
In this adapter, dimensions can be provided as a dictionary with the following options:
column_name
type
:by_hash
orby_range
(default isby_range
)partition_interval
(only forby_range
)number_partitions
(only forby_hash
)partitioning_func
Since most dimensions will probably be by_range
dimensions with a column name, you can also provide the name of the column as a shorthand instead of a dictionary.
Empty hypertable required
You can only add dimensions to an empty hypertable.
Info
Consult the Timescale docs for more information regarding adding dimensions or the documentation on dimension builders.
{{ config(
materialized = 'hypertable',
main_dimension = 'time_column',
dimensions=[
{"column_name": "id", "type": "by_hash", "number_partitions": 5},
{"column_name": "col_1", "type": "by_range", "partition_interval": "interval '1 day'"},
{"column_name": "another_column", "type": "by_range"}
]
}}
select
current_timestamp as time_column,
1 as id,
2 as col_1,
3 as another_column
models:
your_project_name:
model_name:
+materialized: hypertable
+main_dimension:
column_name: time_column
type: by_range
# the above would be equivalent to +main_dimension: time_column
+dimensions:
- column_name: id
type: by_hash
number_partitions: 5
- column_name: another_time_column
type: by_range
partition_interval: interval '1 day'
# ...
integer_now_func
The following 2 options are available for (virtual) hypertables where the time column is not a timestamp:
integer_now_func
(string): name of a function to be used to generate the current time as an integer.integer_now_func_sql
(string, optional): SQL code for the function mentioned above. If provided, the function with the name set ininteger_now_func
will be created. If not provided, an error will be thrown if the function does not exist already.
Use a macro
You could also call a macro for your integer_now_func_sql
.
Idempotent
The integer_now_func_sql
is idempotent and will replace an existing function if a function with the given name already exists. So while it may cause some overhead during the dbt run, it doesn't matter if you share this config across multiple models.
The name is enough
You don't have to provide the SQL code for the function if you already have a function with the name set in integer_now_func
in your database. You could create the function once in a single model or with dbt run-operation
and then reuse it in all other models.
Info
Consult the Timescale docs for more information regarding this functionality.
{{
config(
materialized='hypertable',
main_dimension='time_column',
integer_now_func='my_hypertable_int_to_now',
integer_now_func_sql='select extract(epoch from now())::bigint'
)
}}
select 1::bigint as time_column
chunk_time_interval
The chunk_time_interval
config option allows you to set the interval at which TimescaleDB will chunk your (virtual) hypertable. This is useful for optimizing query performance and storage efficiency. The default value is 1 week
.
Note that the type of the interval depends on the type of your time column and has to match.
```sql+jinja title="models/my_hypertable.sql" {{ config( materialized='hypertable', main_dimension='time_column', chunk_time_interval="interval '1 day'" ) }}