TimescaleDB in Rails: A Practical Implementation Guide



Implementing TimescaleDB in a Rails application - from migration to production

My previous post covered when TimescaleDB makes sense in Rails - and when it doesn’t. Readers wanted the “how.” This is that post.

I’m going to walk through a complete TimescaleDB implementation in a Rails application, from initial setup through compression, retention, continuous aggregates, and the testing gotchas that nobody warns you about. The scenario is a real one I’ve encountered multiple times: an analytics events table that has outgrown plain PostgreSQL and needs time-series treatment.

If you haven’t read the previous post, the short version is this: TimescaleDB is the right choice when your data is append-only, time-indexed, and growing predictably. If that describes your situation, read on.

Hypertable vs Regular Table: What Changes

| | Regular PostgreSQL table | TimescaleDB hypertable |
|---|---|---|
| Primary key | Auto-incrementing id | No surrogate key (id: false); time column is the primary dimension |
| Partitioning | Manual or none | Automatic time-based chunks |
| Writes | INSERT, UPDATE, DELETE equally supported | Optimized for INSERT; UPDATE/DELETE slower, impossible on compressed chunks |
| Compression | None built-in | Columnar compression with 90%+ storage reduction |
| Retention | Manual DELETE queries (slow, locks table) | add_retention_policy drops entire chunks instantly |
| Aggregation | Computed on every query | Continuous aggregates precompute and auto-refresh |
| ActiveRecord | Full compatibility | Works, but update/destroy should be avoided; some raw SQL needed |
| Schema dumps | schema.rb works perfectly | Requires structure.sql or test helpers to preserve hypertable metadata |

Choosing Your Gem

Use the timescaledb gem for most Rails projects. Maintained by Timescale themselves, it provides acts_as_hypertable, schema dumper integration, continuous aggregate helpers, and a rich model-level DSL. It’s opinionated in the right places and handles the ActiveRecord integration thoughtfully.

The alternative is timescaledb-rails, which extends the ActiveRecord PostgreSQL adapter directly. It provides migration helpers like create_hypertable and enable_hypertable_compression as first-class migration methods. It’s lighter, closer to raw SQL, and better if you want minimal abstraction.

| | timescaledb gem | timescaledb-rails gem |
|---|---|---|
| Maintainer | Timescale (official) | Crunchloop (community) |
| Approach | Model-level DSL | ActiveRecord adapter extension |
| Key features | acts_as_hypertable, continuous aggregate helpers, schema dumper | create_hypertable migration method, compression helpers |
| Abstraction | Higher - more Rails-like | Lower - closer to raw SQL |
| Best for | Most teams; full-featured integration | Teams wanting minimal abstraction |

For this guide, I’m going to use the timescaledb gem because it covers more ground. But the underlying concepts are the same regardless of which gem you choose, and I’ll note where the approaches diverge.

Installation and Database Setup

Start with the gem:

# Gemfile
gem 'timescaledb'

TimescaleDB is a PostgreSQL extension, not a separate database. If you’re running PostgreSQL locally, you’ll need to install the extension. On macOS with Homebrew:

brew install timescaledb
timescaledb-tune --quiet --yes

On Ubuntu/Debian:

sudo apt install timescaledb-2-postgresql-16
sudo timescaledb-tune --quiet --yes
sudo systemctl restart postgresql

Then enable the extension in your database. You can do this via a migration:

class EnableTimescaledb < ActiveRecord::Migration[7.2]
  def change
    enable_extension 'timescaledb' unless extension_enabled?('timescaledb')
  end
end

If you’re running on a managed PostgreSQL service, check whether TimescaleDB is supported. Some providers bundle it, some offer it as an add-on, and some don’t support it at all. This is worth verifying before you commit to anything.

The Initializer

Set up the gem in an initializer so your models have access to the TimescaleDB macros:

# config/initializers/timescaledb.rb
ActiveSupport.on_load(:active_record) do
  extend Timescaledb::ActsAsHypertable
end

This makes acts_as_hypertable available on all models. If you’d prefer to be explicit, you can skip the initializer and extend individual models instead.

Creating Your First Hypertable

Here’s where things diverge from standard Rails patterns. A hypertable is a PostgreSQL table that TimescaleDB automatically partitions by time. The key difference from a normal table: you typically don’t want an auto-incrementing id column. TimescaleDB partitions by time, and a sequential integer primary key fights that partitioning.

Let’s create an analytics_events hypertable for tracking user activity:

class CreateAnalyticsEvents < ActiveRecord::Migration[7.2]
  def up
    hypertable_options = {
      time_column: 'occurred_at',
      chunk_time_interval: '1 day',
      compress_segmentby: 'event_type',
      compress_orderby: 'occurred_at DESC',
      compress_after: '7 days'
    }

    create_table(:analytics_events, id: false, hypertable: hypertable_options) do |t|
      t.timestamptz :occurred_at, null: false
      t.references :user, null: false, foreign_key: true
      t.string :event_type, null: false
      t.string :resource_type
      t.bigint :resource_id
      t.jsonb :properties, default: {}
      t.inet :ip_address
      t.string :user_agent
    end

    add_index :analytics_events, [:event_type, :occurred_at]
    add_index :analytics_events, [:user_id, :occurred_at]
  end

  def down
    drop_table :analytics_events
  end
end

A few things to notice here.

The id: false is intentional. Hypertables don’t need surrogate keys because rows are identified by their time dimension plus whatever natural key makes sense for your data. If you absolutely need a unique identifier per row, use a UUID column instead of an auto-incrementing integer, but consider whether you need it.

The chunk_time_interval determines how TimescaleDB partitions data. One day per chunk is reasonable for most Rails applications writing thousands to hundreds of thousands of events per day. If you’re writing millions per day, consider a shorter interval. The goal is chunks that are large enough to be worth partitioning but small enough that the planner can skip irrelevant ones efficiently.
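If you misjudge the interval, it can be adjusted later. A sketch of the raw SQL (note that TimescaleDB applies the new interval only to chunks created after the call; existing chunks keep their old boundaries):

```sql
-- Affects only chunks created from this point on
SELECT set_chunk_time_interval('analytics_events', INTERVAL '1 hour');
```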

The compression settings are declared upfront. compress_segmentby tells TimescaleDB which column to use for grouping compressed data, and compress_after defines how soon chunks become eligible for compression. Seven days is a conservative starting point: recent data stays uncompressed for fast writes and queries, while older data gets compressed for storage savings.
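Under the hood, those migration options translate to roughly the following raw SQL, which is what you would run by hand without the gem:

```sql
ALTER TABLE analytics_events SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'event_type',
  timescaledb.compress_orderby = 'occurred_at DESC'
);

SELECT add_compression_policy('analytics_events', INTERVAL '7 days');
```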

The Model

# app/models/analytics_event.rb
class AnalyticsEvent < ApplicationRecord
  acts_as_hypertable time_column: 'occurred_at'

  belongs_to :user
  belongs_to :resource, polymorphic: true, optional: true

  validates :event_type, presence: true
  validates :occurred_at, presence: true

  scope :of_type, ->(type) { where(event_type: type) }
  scope :for_user, ->(user) { where(user_id: user.id) }
  scope :in_range, ->(range) { where(occurred_at: range) }
end

The acts_as_hypertable macro gives your model awareness of its hypertable nature. It adds scopes like last_week, this_month, yesterday, and today automatically. It also provides access to hypertable metadata through AnalyticsEvent.hypertable, which returns information about chunks, dimensions, and compression state.

One thing to internalize: this model does not behave like a typical ActiveRecord model in some important ways. You should avoid calling update or update! on records. Hypertables are optimized for append-only workloads. Updates work, but they’re slower than on regular tables, and they become significantly slower once compression is enabled. If you need to correct data, prefer deleting and reinserting over updating in place.

Recording Events

The write path should be straightforward:

class EventTracker
  def self.track(user:, event_type:, resource: nil, properties: {}, request: nil)
    AnalyticsEvent.create!(
      user: user,
      event_type: event_type,
      occurred_at: Time.current,
      resource: resource,
      properties: properties,
      ip_address: request&.remote_ip,
      user_agent: request&.user_agent
    )
  end
end

For high-throughput scenarios, consider batching inserts. ActiveRecord’s insert_all works with hypertables:

class EventTracker
  def self.track_batch(events)
    AnalyticsEvent.insert_all(
      events.map do |event|
        {
          user_id: event[:user].id,
          event_type: event[:event_type],
          occurred_at: event[:occurred_at] || Time.current,
          resource_type: event[:resource]&.class&.name,
          resource_id: event[:resource]&.id,
          properties: (event[:properties] || {}).to_json
        }
      end
    )
  end
end

Batch inserts bypass validations, so make sure your data is clean before it gets here. In production, I typically run validation logic in the caller and use insert_all as a dumb pipe.
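One lightweight way to do that caller-side validation is to partition the batch into clean and rejected rows before handing it to insert_all. A minimal sketch; EventBatchValidator is a hypothetical helper, not part of either gem:

```ruby
# Hypothetical caller-side check: insert_all skips ActiveRecord
# validations, so reject incomplete rows before they reach the database.
module EventBatchValidator
  REQUIRED_KEYS = %i[user_id event_type occurred_at].freeze

  # Returns [valid_rows, rejected_rows]; insert the first batch,
  # log or dead-letter the second.
  def self.partition(rows)
    rows.partition do |row|
      REQUIRED_KEYS.all? { |key| !row[key].nil? }
    end
  end
end
```

EventTracker.track_batch can then insert only the valid slice and report the rest, keeping the insert path a dumb pipe.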

Querying with time_bucket

TimescaleDB’s time_bucket function is the workhorse of time-series queries. It groups rows into fixed time intervals, which is exactly what you need for dashboards, trend analysis, and aggregation.

ActiveRecord can express these queries, but you’ll be writing some SQL. This is one of those places where dropping below the abstraction is the right call:

# Events per hour over the last 24 hours
AnalyticsEvent
  .select("time_bucket('1 hour', occurred_at) AS bucket, count(*) AS total")
  .where(occurred_at: 24.hours.ago..Time.current)
  .group('bucket')
  .order('bucket')

For more complex aggregations, wrap them in scopes:

class AnalyticsEvent < ApplicationRecord
  acts_as_hypertable time_column: 'occurred_at'

  scope :hourly_counts, -> {
    select("time_bucket('1 hour', occurred_at) AS bucket, event_type, count(*) AS total")
      .group('bucket, event_type')
      .order('bucket')
  }

  scope :daily_counts, -> {
    select("time_bucket('1 day', occurred_at) AS bucket, event_type, count(*) AS total")
      .group('bucket, event_type')
      .order('bucket')
  }

  scope :daily_unique_users, -> {
    select("time_bucket('1 day', occurred_at) AS bucket, count(DISTINCT user_id) AS unique_users")
      .group('bucket')
      .order('bucket')
  }
end

These scopes compose naturally with other scopes:

AnalyticsEvent.of_type('page_view').in_range(1.week.ago..Time.current).daily_counts

The query planner is doing real work here. Because the data is partitioned by time, queries with time predicates only scan the relevant chunks. A query for the last 24 hours against a table with a year of data will not touch 364 days worth of partitions. This is the core performance benefit, and it happens automatically.

Compression in Practice

Compression is where TimescaleDB starts saving you real money. Uncompressed time-series data at scale is expensive, and TimescaleDB’s columnar compression can reduce storage by 90% or more depending on your data.

If you declared compression settings in your migration (as shown above), TimescaleDB will automatically compress chunks older than the compress_after interval. But you can also manage compression manually through the model:

# Check compression stats
AnalyticsEvent.hypertable.compression_stats

# See which chunks are compressed
AnalyticsEvent.hypertable.chunks.each do |chunk|
  puts "#{chunk.chunk_name}: compressed=#{chunk.is_compressed}"
end

# Manually compress old chunks
AnalyticsEvent.hypertable.chunks
  .where(is_compressed: false)
  .where('range_end < ?', 1.week.ago)
  .each(&:compress!)

One critical gotcha: you cannot update or delete individual rows in compressed chunks. If you need to modify compressed data, you must decompress the chunk first, make your changes, and then recompress. This is another reason to treat hypertable data as immutable.
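When a correction is unavoidable, the decompress-modify-recompress cycle looks roughly like this in raw SQL. The chunk name and the DELETE predicate here are illustrative; find the actual chunk via show_chunks or the gem's chunk helpers:

```sql
-- Decompress the chunk containing the rows to fix
SELECT decompress_chunk('_timescaledb_internal._hyper_1_42_chunk');

-- Make the correction (illustrative predicate)
DELETE FROM analytics_events
WHERE occurred_at = '2024-06-01 12:00:00+00'
  AND event_type = 'bad_event';

-- Recompress once done
SELECT compress_chunk('_timescaledb_internal._hyper_1_42_chunk');
```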

Data Retention Policies

Retention policies let you automatically drop old data. This is essential for tables that grow indefinitely but where historical data beyond a certain age has no value.

You can set a retention policy in your migration:

class AddRetentionPolicyToAnalyticsEvents < ActiveRecord::Migration[7.2]
  def up
    execute "SELECT add_retention_policy('analytics_events', INTERVAL '6 months');"
  end

  def down
    execute "SELECT remove_retention_policy('analytics_events');"
  end
end

This automatically drops chunks older than six months. The key insight is that TimescaleDB drops entire chunks, not individual rows. This means retention cleanup is nearly instantaneous regardless of data volume, because it’s just dropping child tables. Compare that to a DELETE FROM events WHERE created_at < ? on a regular PostgreSQL table, which can lock the table and take hours on large datasets.
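The same chunk-dropping mechanism is available on demand via drop_chunks, which is useful for one-off cleanups outside the policy schedule:

```sql
SELECT drop_chunks('analytics_events', older_than => INTERVAL '6 months');
```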

If you need to keep aggregated data longer than raw data, the pattern is: set a short retention policy on the hypertable, and use continuous aggregates (covered next) to preserve rolled-up summaries indefinitely.

Continuous Aggregates

Continuous aggregates are materialized views that TimescaleDB automatically refreshes. They solve the “dashboard query” problem: instead of aggregating millions of rows on every page load, you precompute the aggregation and query the result.

The timescaledb gem provides a DSL for defining continuous aggregates directly in your model:

class AnalyticsEvent < ApplicationRecord
  extend Timescaledb::ActsAsHypertable
  include Timescaledb::ContinuousAggregatesHelper

  acts_as_hypertable time_column: 'occurred_at'

  scope :events_by_type, -> {
    select("event_type, count(*) AS total")
      .group(:event_type)
  }

  scope :unique_users, -> {
    select("count(DISTINCT user_id) AS unique_users")
  }

  continuous_aggregates(
    scopes: [:events_by_type, :unique_users],
    timeframes: [:hour, :day, :month],
    refresh_policy: {
      hour: {
        start_offset: '4 hours',
        end_offset: '1 hour',
        schedule_interval: '1 hour'
      },
      day: {
        start_offset: '3 days',
        end_offset: '1 day',
        schedule_interval: '1 day'
      },
      month: {
        start_offset: '3 months',
        end_offset: '1 day',
        schedule_interval: '1 day'
      }
    }
  )
end

Then create them via a migration:

class CreateAnalyticsEventContinuousAggregates < ActiveRecord::Migration[7.2]
  def up
    AnalyticsEvent.create_continuous_aggregates
  end

  def down
    AnalyticsEvent.drop_continuous_aggregates
  end
end

This creates materialized views like analytics_events_events_by_type_per_hour, analytics_events_events_by_type_per_day, and so on. Each view is itself backed by a hypertable, so it inherits the same chunking and compression benefits.

Querying them feels natural:

# Hourly event breakdown, last 24 hours
AnalyticsEvent::EventsByTypePerHour
  .where(occurred_at: 24.hours.ago..Time.current)
  .all

# Daily unique users, last 30 days
AnalyticsEvent::UniqueUsersPerDay
  .where(occurred_at: 30.days.ago..Time.current)
  .all

The refresh policies control how often TimescaleDB updates the aggregate. The start_offset and end_offset define the time window that gets refreshed on each run. The schedule_interval controls how frequently the refresh job runs. These jobs run inside the database itself, so there’s no cron job or Sidekiq/Solid Queue worker to manage.
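You can also refresh an aggregate on demand, which is handy after a backfill. Note that refresh_continuous_aggregate is a procedure, so it needs CALL rather than SELECT, and it cannot run inside an explicit transaction (in a migration, use disable_ddl_transaction!):

```sql
CALL refresh_continuous_aggregate(
  'analytics_events_events_by_type_per_hour',
  now() - INTERVAL '1 day',
  now()
);
```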

One architectural decision to make early: continuous aggregates can survive data retention. If you drop raw data after six months but your monthly aggregate has been running the whole time, you’ll still have monthly summaries going back to the beginning, provided the aggregate’s refresh window (its start_offset) never reaches back into the dropped region. This is the right pattern for most analytics: keep raw data for short-term debugging and detailed queries, keep aggregates for long-term trends.

The Schema Dump Problem

This is the part that catches most Rails teams off guard. TimescaleDB’s hypertable metadata doesn’t survive a standard db:schema:dump and db:schema:load cycle cleanly. Your schema.rb won’t include the create_hypertable calls, compression policies, or continuous aggregates.

The timescaledb gem has improved this significantly with its custom schema dumper, but there are still edge cases. Here’s what I recommend:

Option 1: Use structure.sql instead of schema.rb. Set config.active_record.schema_format = :sql in your application config. This dumps the actual SQL structure, which preserves TimescaleDB metadata more faithfully. The downside is that structure.sql files are harder to read and diff.

Option 2: Stick with schema.rb but handle hypertable setup in test helpers. This is the approach I usually take:

# spec/support/timescaledb.rb
RSpec.configure do |config|
  config.before(:suite) do
    ActiveRecord::Base.connection.execute(
      "SELECT create_hypertable('analytics_events', 'occurred_at', if_not_exists => TRUE, migrate_data => TRUE);"
    )
  end
end

Or, if you’re using the timescaledb gem’s built-in support:

# spec/spec_helper.rb
RSpec.configure do |config|
  config.before(:suite) do
    hypertable_models = ActiveRecord::Base.descendants.select { |m|
      m.respond_to?(:acts_as_hypertable?) && m.acts_as_hypertable?
    }

    hypertable_models.each do |klass|
      next if klass.try(:hypertable).present?

      ApplicationRecord.connection.create_hypertable(
        klass.table_name,
        time_column: klass.hypertable_options[:time_column],
        chunk_time_interval: '1 day'
      )
    end
  end
end

This ensures your test database has proper hypertables even when loaded from schema.rb.

Using a Separate Database

For applications where TimescaleDB is handling a specific concern (analytics, metrics, audit logs) while the rest of the app uses plain PostgreSQL, the multi-database approach is clean and keeps concerns isolated.

Rails supports multiple databases natively since version 6:

# config/database.yml
development:
  primary:
    adapter: postgresql
    database: myapp_development
  timescale:
    adapter: postgresql
    database: myapp_timescale_development
    migrations_paths: db/timescale_migrate

Then create an abstract base class for your time-series models:

# app/models/timescale_record.rb
class TimescaleRecord < ApplicationRecord
  self.abstract_class = true

  connects_to database: { writing: :timescale, reading: :timescale }

  extend Timescaledb::ActsAsHypertable
end

# app/models/analytics_event.rb
class AnalyticsEvent < TimescaleRecord
  acts_as_hypertable time_column: 'occurred_at'
  # ...
end

This isolates TimescaleDB concerns from your main database entirely. Migrations for time-series tables go in db/timescale_migrate/, and you run them with rails db:migrate:timescale. The main benefit is operational: you can scale, tune, and manage your TimescaleDB instance independently from your primary database.

The trade-off is that you lose foreign key constraints across databases. The user_id column in analytics_events can’t have a real foreign key to the users table. You’ll need to enforce referential integrity at the application level instead. For append-only analytics data, this is usually an acceptable trade-off.

Converting an Existing Table

If you already have a large events table in PostgreSQL and want to convert it to a hypertable, TimescaleDB can do this, but you need to be careful about downtime and data migration.

The simplest path for small-to-medium tables (under 10 million rows):

class ConvertEventsToHypertable < ActiveRecord::Migration[7.2]
  def up
    # Remove the primary key if it exists
    execute "ALTER TABLE analytics_events DROP CONSTRAINT IF EXISTS analytics_events_pkey;"

    # Convert to hypertable with data migration
    execute <<-SQL
      SELECT create_hypertable(
        'analytics_events',
        'occurred_at',
        migrate_data => TRUE,
        chunk_time_interval => INTERVAL '1 day'
      );
    SQL
  end

  def down
    # There is no clean way to revert a hypertable to a regular table
    raise ActiveRecord::IrreversibleMigration
  end
end

For larger tables, migrate_data => TRUE can take a long time and will lock the table. The alternative is to create a new hypertable, backfill data in batches, then swap:

class MigrateEventsToHypertable < ActiveRecord::Migration[7.2]
  def up
    # Create new hypertable
    create_table(:analytics_events_new, id: false, hypertable: {
      time_column: 'occurred_at',
      chunk_time_interval: '1 day'
    }) do |t|
      t.timestamptz :occurred_at, null: false
      t.bigint :user_id, null: false
      t.string :event_type, null: false
      t.jsonb :properties, default: {}
    end

    # Backfill (shown as a single statement for brevity; for very
    # large tables, loop over time windows so each statement stays short)
    execute <<-SQL
      INSERT INTO analytics_events_new
      SELECT occurred_at, user_id, event_type, properties
      FROM analytics_events
      ORDER BY occurred_at;
    SQL

    # Swap tables
    rename_table :analytics_events, :analytics_events_old
    rename_table :analytics_events_new, :analytics_events
  end

  def down
    rename_table :analytics_events, :analytics_events_new
    rename_table :analytics_events_old, :analytics_events
    drop_table :analytics_events_new
  end
end

The backfill approach takes longer overall but doesn’t hold locks for the duration. You’ll need to handle any events written during the migration window, which typically means a brief maintenance window or a dual-write strategy.
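Batching the backfill means walking fixed time windows rather than OFFSET pagination, so each INSERT...SELECT stays short-lived and chunk-aligned. A pure-Ruby sketch of the window arithmetic; each_backfill_window is a hypothetical helper:

```ruby
# Hypothetical helper: split [from, to) into fixed-size windows.
# Each window becomes one
#   INSERT ... WHERE occurred_at >= ? AND occurred_at < ?
def each_backfill_window(from, to, step_seconds = 86_400)
  windows = []
  cursor = from
  while cursor < to
    upper = [cursor + step_seconds, to].min
    windows << [cursor, upper]
    cursor = upper
  end
  windows
end
```

A day-sized step mirrors the chunk interval above, so each batch touches at most two chunks.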

Production Monitoring

Once TimescaleDB is running in production, you’ll want visibility into how it’s performing. A few queries worth running periodically or wiring into your monitoring:

# Check hypertable sizes
ActiveRecord::Base.connection.execute(
  "SELECT hypertable_name, pg_size_pretty(hypertable_size(format('%I', hypertable_name)::regclass)) AS size
   FROM timescaledb_information.hypertables;"
).to_a

# Check chunk compression status
ActiveRecord::Base.connection.execute(
  "SELECT chunk_name,
          pg_size_pretty(before_compression_total_bytes) AS before,
          pg_size_pretty(after_compression_total_bytes) AS after
   FROM chunk_compression_stats('analytics_events')
   ORDER BY chunk_name DESC
   LIMIT 10;"
).to_a

# Check running background jobs
ActiveRecord::Base.connection.execute(
  "SELECT * FROM timescaledb_information.jobs
   WHERE hypertable_name = 'analytics_events';"
).to_a

Wire these into your existing monitoring. If compression jobs start failing or chunks start growing unexpectedly, you want to know before disk space becomes a problem.

What I’d Do Differently

Having implemented TimescaleDB in Rails applications across analytics, IoT, and audit logging, a few lessons stand out.

First, start with compression from day one. I’ve seen teams skip it during initial development and then struggle to retrofit it later. Declaring compression settings in your initial migration costs nothing and saves significant operational pain later.

Second, be deliberate about chunk intervals. The default of seven days is often too large for high-volume tables. I’ve found one day to be a better starting point for most Rails applications, with the option to adjust based on actual data volume.

Third, don’t fight ActiveRecord. Accept that some queries will involve raw SQL. The time_bucket function, compression management, and continuous aggregate queries don’t map cleanly to ActiveRecord’s query builder, and trying to force them through scopes and where clauses just produces confusing code. Write the SQL, put it in a scope, and move on.

Finally, keep hypertable concerns isolated. Whether through a separate database, an abstract base class, or simply careful naming conventions, make it obvious which parts of your system are using time-series patterns. The ActiveRecord conventions that work for regular CRUD tables (updates, deletes, point lookups by id) will mislead developers who don’t know they’re working with a hypertable.

Limitations and Trade-offs

Every decision in this guide has a cost. Before adopting TimescaleDB, be aware of these:

  • Compressed chunks are read-only. You cannot update or delete individual rows without decompressing first. This means your application must treat hypertable data as immutable, or accept the overhead of decompress-modify-recompress cycles.
  • No cross-database foreign keys. If you use the separate database approach, you lose referential integrity between TimescaleDB tables and your primary database. Application-level enforcement is less reliable than database constraints.
  • Schema dumps are fragile. Neither schema.rb nor structure.sql perfectly captures hypertable state. Every new developer and CI environment needs extra setup to work correctly.
  • ActiveRecord friction. time_bucket, compression management, and continuous aggregate queries require raw SQL. Teams uncomfortable dropping below ActiveRecord’s abstraction will find this frustrating.
  • Operational complexity. Compression jobs, retention policies, and continuous aggregate refreshes all run as background jobs inside the database. When they fail, debugging requires TimescaleDB-specific knowledge that most Rails developers don’t have.
  • Not worth it for small tables. If your table has fewer than 10 million rows and grows slowly, plain PostgreSQL with proper indexes will perform just as well with none of the added complexity.

For a deeper discussion of when TimescaleDB is the wrong choice, see the previous post’s section on bad fits.

TimescaleDB is one of those tools that rewards deliberate adoption. When applied to the right problem with the right expectations, it just works. When applied broadly or carelessly, it creates a new class of problems. This guide gives you the mechanics. The previous post gives you the judgment to apply them well.


Need help with database architecture? I help teams with PostgreSQL optimization, schema design, and scaling decisions. If you’re implementing TimescaleDB or evaluating whether it’s the right fit, reach out at nikita.sinenko@gmail.com.

Further Reading