
AI Coding Assistants: Honest Productivity Data and Limits

- 15 min read

Measured productivity gains from AI coding assistants across real projects. Specific workflows, prompt patterns, failure modes, and when AI slows you down.


AI coding assistants cut my typical feature development time by roughly 50%. Not on everything - some tasks see 3x speedup, others see zero improvement or even slowdown. The difference depends entirely on what kind of work you’re doing.

This post breaks down where AI tools actually help, where they fail, and how to tell the difference - with specific examples from production projects.

| Task Type | AI Speed Gain | AI Quality | Review Effort | Net Benefit |
|---|---|---|---|---|
| Boilerplate/CRUD | 3-4x faster | High | Low | Large |
| Test writing | 2-3x faster | Medium-high | Medium | Large |
| Documentation | 3-5x faster | High | Low | Large |
| Bug investigation | 2x faster | Medium | Medium | Medium |
| Refactoring known patterns | 2-3x faster | Medium | Medium | Medium |
| Complex business logic | 1-1.5x faster | Low-medium | High | Small |
| System architecture | No speedup | Low | High | Negative |
| Debugging race conditions | No speedup | Low | High | Negative |

The biggest mistake I see developers make is assuming AI gives a uniform speedup. It doesn’t. The gain is task-dependent, and knowing which tasks to delegate to AI versus do yourself is the actual skill.

Where AI Tools Deliver Real Gains

Boilerplate generation, test writing, and documentation account for 60-70% of AI productivity gains. These tasks share a pattern: well-defined inputs, predictable outputs, and low ambiguity. AI handles them reliably.

Here’s what AI-assisted development looks like in practice for a typical Rails feature:

Example: Building a Reporting Endpoint

Without AI, this is roughly a 5-hour task. With AI, it’s about 2 hours - but the time savings come from specific subtasks, not from AI doing everything.

What AI handles well (saved roughly 3 hours):

# AI generates the migration, model, controller, serializer,
# request specs, and API documentation from a single prompt:
#
# "Create a monthly_reports endpoint that aggregates orders
# by status for a given date range. Include pagination,
# date validation, and JSON:API serialization."

# The generated migration is usually correct:
class CreateMonthlyReports < ActiveRecord::Migration[7.1]
  def change
    create_table :monthly_reports do |t|
      t.date :report_month, null: false
      t.string :status, null: false
      t.integer :order_count, default: 0
      t.decimal :total_revenue, precision: 12, scale: 2, default: 0
      t.references :organization, null: false, foreign_key: true

      t.timestamps
    end

    add_index :monthly_reports, [:organization_id, :report_month, :status],
              unique: true,
              name: "idx_monthly_reports_org_month_status"
  end
end

What AI gets wrong (required 30+ minutes of manual fixes):

# AI generated this aggregation query:
def self.generate_for_month(organization, month)
  orders = organization.orders.in_month(month)
  # BUG: .in_month doesn't exist. AI hallucinated a scope that
  # sounds plausible. The working version uses ActiveSupport's
  # Date#all_month range:
  #   organization.orders.where(created_at: month.all_month)

  orders.group(:status).select(
    "status, COUNT(*) as order_count, SUM(total) as total_revenue"
  )
  # BUG: assumes the column is named "total" - in our schema it's
  # "amount_cents". AI doesn't know your column names unless
  # you feed it the schema.
end

This is the pattern I see on every AI-assisted task: the scaffolding is excellent, but the business-specific details are wrong. You save time on structure and lose some on debugging AI’s assumptions.

Example: Writing Request Specs

Test generation is where AI provides the most consistent value. Hand it a controller and it produces comprehensive specs faster than I can write them.

# Prompt: "Write request specs for this ReportsController.
# Cover happy path, authorization, pagination, invalid dates,
# and edge cases. Use FactoryBot and follow RSpec conventions."

# AI generates 15-20 test cases in seconds. Most are correct.
# The ones that need fixing:

RSpec.describe "GET /api/v1/reports", type: :request do
  # AI wrote this - works fine
  it "returns paginated reports for the organization" do
    create_list(:monthly_report, 30, organization: organization)
    get "/api/v1/reports", headers: auth_headers, params: { page: 2, per: 10 }
    expect(response).to have_http_status(:ok)
    expect(json_body["data"].size).to eq(10)
  end

  # AI wrote this - WRONG. It assumed a 403, but our app
  # returns 404 for resources outside your organization
  # (to avoid leaking existence of other orgs' data).
  it "rejects access to other organization reports" do
    other_report = create(:monthly_report)
    get "/api/v1/reports/#{other_report.id}", headers: auth_headers
    expect(response).to have_http_status(:forbidden) # Should be :not_found
  end

  # AI missed this edge case entirely - what happens when
  # the date range spans a timezone boundary? Our app stores
  # dates in UTC but users are in Dubai (UTC+4).
end

I typically accept 70-80% of AI-generated tests as-is, fix 15-20%, and add 2-3 edge cases the AI missed. The total time is still less than half of writing everything by hand.

The Tools and When to Use Each

Four tools dominate my workflow. Each has a specific role, and using the wrong one for the wrong task wastes time.

| Tool | Best For | Worst For | Monthly Cost |
|---|---|---|---|
| Claude | Complex reasoning, large refactors, explaining tradeoffs | Quick inline completions | $20 |
| GitHub Copilot | Real-time autocomplete, learning new APIs | Multi-file refactoring | $10-20 |
| Cursor | Codebase-aware edits, multi-file changes | Quick one-off questions | $20 |
| ChatGPT | Fast debugging, code explanation | Deep context retention | $20 |

Claude

I use Claude for anything requiring reasoning across multiple files or understanding tradeoffs. When I need to refactor a service object that touches five models and three controllers, Claude handles the full context better than alternatives.

Where it excels: Architecture discussions, explaining why code works a certain way, generating complete implementations from detailed specs.

Where it falls short: Slower than Copilot for quick inline completions. Sometimes over-engineers simple solutions.

GitHub Copilot

Copilot’s inline completion is hard to beat for flow state. It predicts the next line based on context and gets it right often enough that typing feels 2x faster.

Where it excels: Autocompleting patterns you’ve already established in the file. Learning unfamiliar APIs by suggesting correct usage.

Where it falls short: Limited reasoning. Can’t explain why it suggests something. Repeats patterns even when they’re wrong.

Cursor

Cursor is the newest in my toolkit and handles codebase-aware edits - tell it what to change and it modifies the right files. Useful for cross-cutting refactors.

Where it excels: Multi-file edits, finding relevant code across the project, applying consistent changes.

Where it falls short: Newer ecosystem, occasionally applies changes to the wrong files. Requires learning its specific workflow.

ChatGPT

My go-to for quick questions and debugging. Paste an error, get an explanation. Fast and reliable for simple problems.

Where it excels: Speed. Error diagnosis. Explaining unfamiliar code.

Where it falls short: Loses context in long conversations. Less code-aware than Claude or Cursor.

The total monthly cost for all four is about $70. The productivity gains justify the cost many times over, but you don’t need all four - Claude plus Copilot covers 80% of use cases.

Measured Impact on Project Timelines

AI tools compress development timelines unevenly. Routine work shrinks dramatically; complex work barely changes.

Feature Development: SaaS Reporting Dashboard

Without AI: 3 weeks estimated, 15 working days.

  • Data modeling and migrations: 2 days
  • API endpoints and serializers: 3 days
  • Business logic and aggregations: 4 days
  • Test suite: 3 days
  • Documentation and code review: 3 days

With AI: 8 working days.

  • Data modeling and migrations: 0.5 days (AI-generated, minor fixes)
  • API endpoints and serializers: 0.5 days (AI-generated, review needed)
  • Business logic and aggregations: 3.5 days (mostly manual, AI helped with SQL)
  • Test suite: 1.5 days (AI-generated bulk, manual edge cases)
  • Documentation and code review: 2 days (AI-generated docs, but review took longer because I had to verify AI-written code more carefully)

Notice where the time savings actually came from: boilerplate and tests. The business logic barely changed. And code review actually took longer - AI-generated code needs more careful review than human-written code because the failure modes are different. A human developer writes bugs you expect. AI writes bugs that look correct.

API Integration: Third-Party Payment Provider

Without AI: 2 days. With AI: 4 hours.

This is the ideal AI use case - well-documented APIs with standard patterns. I described the integration requirements, AI generated the client, error handling, webhook processing, and tests. First deployment worked correctly.

The reason this worked so well: payment APIs are heavily represented in AI training data. The AI had seen thousands of Stripe/payment integrations and knew the patterns cold.

Legacy Codebase Debugging

Without AI: Estimated 2-3 days of investigation. With AI: 2 hours to identify and fix.

I pasted a slow controller action into Claude and asked “why is this slow?” It identified three N+1 queries, a missing index, and an unnecessary eager load in about 10 minutes. The fix itself took another hour to implement and test.

This works because database optimization patterns are well-established - AI has seen thousands of examples in its training data and recognizes them instantly.
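The N+1 shape Claude flagged is easy to reproduce outside Rails. Here's a minimal plain-Ruby sketch (hypothetical models, with a fake query log standing in for ActiveRecord) of why the loop was slow:

```ruby
# Hypothetical sketch: a fake query log stands in for the database,
# making the N+1 shape visible without Rails or ActiveRecord.
QUERY_LOG = []

Customer = Struct.new(:name)

class Order
  def customer
    QUERY_LOG << "SELECT * FROM customers WHERE ..."  # one query per order
    Customer.new("Acme")
  end
end

orders = Array.new(50) { Order.new }   # 1 query loads the orders...
orders.each { |o| o.customer.name }    # ...then 50 more: the N in N+1

QUERY_LOG.size  # => 50
```

In real ActiveRecord, `includes(:customer)` collapses those 50 lookups into a single eager-load query, which is exactly the fix that took an hour to implement and test.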

Where AI Tools Fail - Specific Failure Modes

AI coding tools fail predictably on four categories: hallucinated APIs, project-specific conventions, stateful debugging, and architectural decisions. Understanding these failure modes is more valuable than knowing where AI succeeds, because the failures waste time.

Failure Mode 1: Hallucinated Methods and APIs

AI invents methods that look plausible but don’t exist. This is the most common failure and the hardest to catch because the code reads naturally.

# AI suggested this for a Rails 7.1 app:
class Order < ApplicationRecord
  encrypts :credit_card_number, deterministic: true

  scope :recent, -> { where(created_at: 1.week.ago..) }

  def self.revenue_summary
    # .summarize doesn't exist on ActiveRecord::Relation.
    # AI hallucinated a method that sounds like it should exist.
    select(:status).group(:status).summarize(:total_cents, :avg)
  end
end

# The correct version:
def self.revenue_summary
  select("status, AVG(total_cents) as avg_total, COUNT(*) as count")
    .group(:status)
end

I now keep a mental list of “things AI likes to hallucinate” for each framework I use. For Rails, it frequently invents scoping methods, configuration options, and ActiveSupport extensions.

Failure Mode 2: Ignoring Project Conventions

AI doesn’t know your team’s patterns unless you tell it. It generates code that works but doesn’t match your codebase.

# Your codebase uses service objects with a .call convention:
class CreateOrder
  def self.call(params)
    new(params).call
  end

  def call
    # ...
  end
end

# AI generates a completely different pattern:
class OrderCreationService
  def initialize(params)
    @params = params
  end

  def execute  # Not .call
    # ... uses different error handling pattern ...
  end
end

This isn’t wrong - it’s just inconsistent. And inconsistent code creates maintenance burden over time. You either refactor the AI output to match your patterns (eating into the time savings) or you end up with a codebase that has two different styles.

Failure Mode 3: Stateful Debugging

AI can’t reproduce or observe runtime behavior. When a bug depends on state - race conditions, order of operations, caching issues - AI can only guess.

I’ve wasted hours on prompts like “this test passes individually but fails when run with the full suite” where AI suggests increasingly wrong fixes because it can’t observe the actual test execution. For stateful bugs, traditional debugging tools (byebug, pry, logging) are faster.
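The "passes alone, fails in the suite" bug usually comes down to leaked state, which is trivial to see once you can observe it and nearly impossible to diagnose from a prompt. A hypothetical sketch of the shape:

```ruby
# Hypothetical sketch of an order-dependent failure: a class-level
# cache that one test populates and a later test assumes is cold.
class PriceCache
  @cache = {}

  def self.store(key, value)
    @cache[key] = value
  end

  def self.fetch(key)
    @cache[key]
  end
end

# "Test A" runs first and leaves state behind:
PriceCache.store(:widget, 100)

# "Test B" passes when run alone (cold cache, fetch returns nil) but
# sees stale data in the full suite - AI can only guess at this,
# because the failure depends on execution order it cannot observe.
PriceCache.fetch(:widget)  # => 100, not nil
```

A debugger or a `puts` inside `fetch` surfaces this in minutes; a chat session cannot.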

Failure Mode 4: Architecture Decisions

AI can generate any architecture you describe, but it can’t tell you which one is right for your situation. Ask it “should I use microservices or a monolith?” and it gives a balanced answer that doesn’t help you decide.

Architecture requires understanding your team size, deployment constraints, traffic patterns, and organizational structure. AI has none of this context.

The Economics for Development Teams

AI tools cut effective development costs on routine work by 50-60%. The overall project cost reduction is smaller - typically 30-40% - because complex work (where AI helps least) makes up a significant portion of any real project.

Cost Breakdown

| | Without AI | With AI | Savings |
|---|---|---|---|
| Tool cost per developer/month | $0 | $40-70 | -$40-70 |
| Routine work hours (per sprint) | 40 hrs | 15 hrs | 62% |
| Complex work hours (per sprint) | 40 hrs | 35 hrs | 12% |
| Code review hours (per sprint) | 10 hrs | 14 hrs | -40% |
| Total effective hours | 90 hrs | 64 hrs | 29% |
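The arithmetic behind those totals is worth spelling out, because it shows how the headline routine-work gains get diluted at the project level:

```ruby
# Sanity-check the sprint-hour figures quoted above.
hours = {
  routine: { without: 40, with: 15 },
  complex: { without: 40, with: 35 },
  review:  { without: 10, with: 14 },
}

total_without = hours.values.sum { |h| h[:without] }  # 90 hrs
total_with    = hours.values.sum { |h| h[:with] }     # 64 hrs

pct_saved = ->(h) { (h[:without] - h[:with]) * 100.0 / h[:without] }
pct_saved.call(hours[:review])  # => -40.0 (review gets slower, not faster)

# Overall saving lands near 29% - well below the 62% saving on
# routine work alone, because complex work barely moves.
overall = (total_without - total_with) * 100.0 / total_without
```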

The increase in code review hours is real and often ignored in AI productivity claims. AI-generated code needs more careful review because:

  1. It looks correct even when it isn’t
  2. It follows generic patterns that may not fit your use case
  3. It can introduce subtle security issues (like the deterministic encryption example above - using deterministic encryption on a credit card number is a security problem, but the code compiles fine)
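Point 3 is worth making concrete. Deterministic encryption maps the same plaintext to the same ciphertext every time, so equality - and therefore repeated card numbers - leaks without any decryption. A minimal OpenSSL sketch, using a fixed IV to stand in for a deterministic scheme (illustration only, not production code):

```ruby
require "openssl"

# A fixed IV makes AES-CBC behave deterministically: same plaintext
# in, same ciphertext out. Never do this with real card data.
def encrypt(plaintext, key, iv)
  cipher = OpenSSL::Cipher.new("aes-256-cbc").encrypt
  cipher.key = key
  cipher.iv  = iv
  cipher.update(plaintext) + cipher.final
end

key      = OpenSSL::Random.random_bytes(32)
fixed_iv = "\x00" * 16

a = encrypt("4242424242424242", key, fixed_iv)
b = encrypt("4242424242424242", key, fixed_iv)
a == b  # => true: an attacker can match records without decrypting

# A fresh random IV per record - what non-deterministic encryption
# does - breaks the equality leak:
c = encrypt("4242424242424242", key, OpenSSL::Random.random_bytes(16))
a == c  # => false
```

This is why the `deterministic: true` in the earlier `Order` example compiles and runs fine yet fails review: the bug is a property of the ciphertext, not a crash.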

When the Economics Don’t Work

AI tools provide negative ROI when:

  • The codebase is highly unconventional - AI can’t learn your patterns mid-session
  • The domain is niche - If AI hasn’t seen many examples of your problem space, its suggestions are worse than useless
  • Regulatory review is required - If every line of code needs compliance review, AI-generated code doubles the review burden
  • The team is very senior - Senior developers writing in their area of expertise often code as fast as they can review AI output, so the net gain is minimal

How to Actually Use AI Tools Effectively

The difference between developers who get 3x gains and those who get minimal benefit comes down to three practices:

1. Feed It Context

Don’t just describe what you want. Give AI your schema, your existing patterns, and your constraints.

Bad prompt:  "Create a user registration endpoint"

Good prompt: "Create a user registration endpoint for this Rails 7.1 API.
Here's the User model [paste]. Here's how our other endpoints
look [paste controller]. We use Pundit for authorization,
Blueprinter for serialization, and raise custom ApiError
exceptions. Include request specs using our test helpers
[paste spec_helper excerpt]."

The good prompt takes 2 minutes longer to write and saves 30 minutes of fixing output.

2. Use AI for the Right Tasks

Delegate boilerplate, tests, documentation, and well-known patterns. Do architecture, business logic, and debugging yourself.

3. Review AI Output Like You Would a Junior Developer’s Pull Request

Don’t rubber-stamp AI code. Read every line. Question every assumption. Check that method names actually exist. Verify business logic matches requirements.

Limitations and When NOT to Use AI Coding Tools

AI tools are not appropriate for every situation, and overselling their capabilities leads to worse outcomes:

Don’t use AI for security-critical code. AI-generated authentication, encryption, and authorization logic should be treated as a starting draft that requires expert review, not production-ready code.

Don’t use AI as a substitute for understanding. If you don’t understand the code AI generates, you can’t maintain it, debug it, or extend it. Copy-pasting AI output without comprehension creates technical debt faster than writing bad code manually.

Don’t expect uniform gains across a team. Senior developers get more value from AI because they can evaluate output quality. Junior developers using AI tools may ship faster but accumulate hidden bugs and inconsistencies.

Don’t use AI for compliance-heavy work without adjusting your review process. AI doesn’t understand GDPR, PCI-DSS, or HIPAA requirements - it generates code that might violate them while looking correct.

The Bottom Line

AI coding tools are a genuine productivity multiplier for experienced developers working on routine tasks. The gains are real - I ship features measurably faster than I did two years ago. But the gains are uneven, the failure modes are specific, and the tools require skill to use effectively.

The developers who benefit most are those who understand where AI helps and where it doesn’t, and who adjust their workflow accordingly rather than trying to AI-ify everything.


Need help integrating AI tools into your development workflow? I help teams adopt AI-assisted development practices, optimize productivity, and ship products faster. Reach out at nikita.sinenko@gmail.com.
