AI Coding Assistants: Honest Productivity Data and Limits
Measured productivity gains from AI coding assistants across real projects. Specific workflows, prompt patterns, failure modes, and when AI slows you down.
AI coding assistants cut my typical feature development time by roughly 50%. Not on everything - some tasks see 3x speedup, others see zero improvement or even slowdown. The difference depends entirely on what kind of work you’re doing.
This post breaks down where AI tools actually help, where they fail, and how to tell the difference - with specific examples from production projects.
| Task Type | AI Speed Gain | AI Quality | Review Effort | Net Benefit |
|---|---|---|---|---|
| Boilerplate/CRUD | 3-4x faster | High | Low | Large |
| Test writing | 2-3x faster | Medium-high | Medium | Large |
| Documentation | 3-5x faster | High | Low | Large |
| Bug investigation | 2x faster | Medium | Medium | Medium |
| Refactoring known patterns | 2-3x faster | Medium | Medium | Medium |
| Complex business logic | 1-1.5x faster | Low-medium | High | Small |
| System architecture | No speedup | Low | High | Negative |
| Debugging race conditions | No speedup | Low | High | Negative |
The biggest mistake I see developers make is assuming AI gives a uniform speedup. It doesn’t. The gain is task-dependent, and knowing which tasks to delegate to AI versus do yourself is the actual skill.
Where AI Tools Deliver Real Gains
Boilerplate generation, test writing, and documentation account for 60-70% of AI productivity gains. These tasks share a pattern: well-defined inputs, predictable outputs, and low ambiguity. AI handles them reliably.
Here’s what AI-assisted development looks like in practice for a typical Rails feature:
Example: Building a Reporting Endpoint
Without AI, this is roughly a 5-hour task. With AI, it’s about 2 hours - but the time savings come from specific subtasks, not from AI doing everything.
What AI handles well (saved roughly 3 hours):
# AI generates the migration, model, controller, serializer,
# request specs, and API documentation from a single prompt:
#
# "Create a monthly_reports endpoint that aggregates orders
# by status for a given date range. Include pagination,
# date validation, and JSON:API serialization."
# The generated migration is usually correct:
class CreateMonthlyReports < ActiveRecord::Migration[7.1]
  def change
    create_table :monthly_reports do |t|
      t.date :report_month, null: false
      t.string :status, null: false
      t.integer :order_count, default: 0
      t.decimal :total_revenue, precision: 12, scale: 2, default: 0
      t.references :organization, null: false, foreign_key: true
      t.timestamps
    end

    add_index :monthly_reports, [:organization_id, :report_month, :status],
              unique: true,
              name: "idx_monthly_reports_org_month_status"
  end
end
What AI gets wrong (required 30+ minutes of manual fixes):
# AI generated this aggregation query:
def self.generate_for_month(organization, month)
  orders = organization.orders.where(created_at: month.all_month)
  # SUBTLE: `month.all_month` works only because ActiveSupport
  # defines Date#all_month. It reads like plain Ruby but isn't -
  # pass anything other than a Date/Time and this raises
  # NoMethodError. The explicit equivalent makes the assumption
  # visible: month.beginning_of_month..month.end_of_month
  orders.group(:status).select(
    "status, COUNT(*) as order_count, SUM(total) as total_revenue"
  )
  # BUG: assumes the column is named "total" - in our schema it's
  # "amount_cents". AI doesn't know your column names unless
  # you feed it the schema.
end
This is the pattern I see on every AI-assisted task: the scaffolding is excellent, but the business-specific details are wrong. You save time on structure and lose some on debugging AI’s assumptions.
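To make the intent of that aggregation concrete, here's what it should compute, sketched in plain Ruby over an in-memory collection. The struct fields (amount_cents) mirror our schema; all names here are illustrative, not a general convention:

```ruby
require "date"

Order = Struct.new(:status, :amount_cents, :created_at)

# Explicit month range without ActiveSupport - Date.new accepts
# day -1 as "last day of the month".
def month_range(year, month)
  Date.new(year, month, 1)..Date.new(year, month, -1)
end

# Group orders in the range by status, counting and summing amounts.
def revenue_by_status(orders, range)
  orders.select { |o| range.cover?(o.created_at) }
        .group_by(&:status)
        .transform_values do |group|
          { order_count: group.size, total_revenue: group.sum(&:amount_cents) }
        end
end

orders = [
  Order.new("paid",     1000, Date.new(2024, 1, 5)),
  Order.new("paid",     2500, Date.new(2024, 1, 20)),
  Order.new("refunded",  500, Date.new(2024, 1, 31)),
  Order.new("paid",     9999, Date.new(2024, 2, 1)), # outside January
]

revenue_by_status(orders, month_range(2024, 1))
# paid: 2 orders totaling 3500 cents; refunded: 1 order totaling 500
```

Writing the expected result down like this before prompting is also a cheap way to catch AI's schema assumptions during review.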
Example: Writing Request Specs
Test generation is where AI provides the most consistent value. Hand it a controller and it produces comprehensive specs faster than I can write them.
# Prompt: "Write request specs for this ReportsController.
# Cover happy path, authorization, pagination, invalid dates,
# and edge cases. Use FactoryBot and follow RSpec conventions."
# AI generates 15-20 test cases in seconds. Most are correct.
# The ones that need fixing:
RSpec.describe "GET /api/v1/reports", type: :request do
  # AI wrote this - works fine
  it "returns paginated reports for the organization" do
    create_list(:monthly_report, 30, organization: organization)
    get "/api/v1/reports", headers: auth_headers, params: { page: 2, per: 10 }
    expect(response).to have_http_status(:ok)
    expect(json_body["data"].size).to eq(10)
  end

  # AI wrote this - WRONG. It assumed a 403, but our app
  # returns 404 for resources outside your organization
  # (to avoid leaking existence of other orgs' data).
  it "rejects access to other organization reports" do
    other_report = create(:monthly_report)
    get "/api/v1/reports/#{other_report.id}", headers: auth_headers
    expect(response).to have_http_status(:forbidden) # Should be :not_found
  end

  # AI missed this edge case entirely - what happens when
  # the date range spans a timezone boundary? Our app stores
  # dates in UTC but users are in Dubai (UTC+4).
end
I typically accept 70-80% of AI-generated tests as-is, fix 15-20%, and add 2-3 edge cases the AI missed. The total time is still less than half of writing everything by hand.
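One of those missed edge cases - the timezone boundary - is cheap to demonstrate in plain Ruby. The Dubai offset is the only assumption:

```ruby
# An order recorded late on Jan 31 in UTC belongs to February
# from the point of view of a user in Dubai (UTC+4).
utc_order_time = Time.utc(2024, 1, 31, 22, 30)
dubai_view     = utc_order_time.getlocal("+04:00")

utc_order_time.strftime("%Y-%m-%d")  # "2024-01-31"
dubai_view.strftime("%Y-%m-%d")      # "2024-02-01"
```

A report that buckets by UTC date puts this order in January while the customer expects it in February - exactly the kind of spec the AI won't write unless you ask for it.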
The Tools and When to Use Each
Four tools dominate my workflow. Each has a specific role, and using the wrong one for the wrong task wastes time.
| Tool | Best For | Worst For | Monthly Cost |
|---|---|---|---|
| Claude | Complex reasoning, large refactors, explaining tradeoffs | Quick inline completions | $20 |
| GitHub Copilot | Real-time autocomplete, learning new APIs | Multi-file refactoring | $10-20 |
| Cursor | Codebase-aware edits, multi-file changes | Quick one-off questions | $20 |
| ChatGPT | Fast debugging, code explanation | Deep context retention | $20 |
Claude
I use Claude for anything requiring reasoning across multiple files or understanding tradeoffs. When I need to refactor a service object that touches five models and three controllers, Claude handles the full context better than alternatives.
Where it excels: Architecture discussions, explaining why code works a certain way, generating complete implementations from detailed specs.
Where it falls short: Slower than Copilot for quick inline completions. Sometimes over-engineers simple solutions.
GitHub Copilot
Copilot’s inline completion is hard to beat for flow state. It predicts the next line based on context and gets it right often enough that typing feels 2x faster.
Where it excels: Autocompleting patterns you’ve already established in the file. Learning unfamiliar APIs by suggesting correct usage.
Where it falls short: Limited reasoning. Can’t explain why it suggests something. Repeats patterns even when they’re wrong.
Cursor
Cursor is the newest in my toolkit and handles codebase-aware edits - tell it what to change and it modifies the right files. Useful for cross-cutting refactors.
Where it excels: Multi-file edits, finding relevant code across the project, applying consistent changes.
Where it falls short: Newer ecosystem, occasionally applies changes to the wrong files. Requires learning its specific workflow.
ChatGPT
My go-to for quick questions and debugging. Paste an error, get an explanation. Fast and reliable for simple problems.
Where it excels: Speed. Error diagnosis. Explaining unfamiliar code.
Where it falls short: Loses context in long conversations. Less code-aware than Claude or Cursor.
The total monthly cost for all four is $70-80. The productivity gains justify the cost many times over, but you don't need all four - Claude plus Copilot covers 80% of use cases.
Measured Impact on Project Timelines
AI tools compress development timelines unevenly. Routine work shrinks dramatically; complex work barely changes.
Feature Development: SaaS Reporting Dashboard
Without AI: 3 weeks estimated, 15 working days.
- Data modeling and migrations: 2 days
- API endpoints and serializers: 3 days
- Business logic and aggregations: 4 days
- Test suite: 3 days
- Documentation and code review: 3 days
With AI: 8 working days.
- Data modeling and migrations: 0.5 days (AI-generated, minor fixes)
- API endpoints and serializers: 0.5 days (AI-generated, review needed)
- Business logic and aggregations: 3.5 days (mostly manual, AI helped with SQL)
- Test suite: 1.5 days (AI-generated bulk, manual edge cases)
- Documentation and code review: 2 days (AI-generated docs, but review took longer because I had to verify AI-written code more carefully)
Notice where the time savings actually came from: boilerplate and tests. The business logic barely changed. And code review actually took longer - AI-generated code needs more careful review than human-written code because the failure modes are different. A human developer writes bugs you expect. AI writes bugs that look correct.
API Integration: Third-Party Payment Provider
Without AI: 2 days. With AI: 4 hours.
This is the ideal AI use case - well-documented APIs with standard patterns. I described the integration requirements, AI generated the client, error handling, webhook processing, and tests. First deployment worked correctly.
The reason this worked so well: payment APIs are heavily represented in AI training data. The AI had seen thousands of Stripe/payment integrations and knew the patterns cold.
Legacy Codebase Debugging
Without AI: Estimated 2-3 days of investigation. With AI: 2 hours to identify and fix.
I pasted a slow controller action into Claude and asked “why is this slow?” It identified three N+1 queries, a missing index, and an unnecessary eager load in about 10 minutes. The fix itself took another hour to implement and test.
This works because database optimization patterns are well-established - AI has seen thousands of examples in its training data and recognizes them instantly.
Where AI Tools Fail - Specific Failure Modes
AI coding tools fail predictably on four categories: hallucinated APIs, project-specific conventions, stateful debugging, and architectural decisions. Understanding these failure modes is more valuable than knowing where AI succeeds, because the failures waste time.
Failure Mode 1: Hallucinated Methods and APIs
AI invents methods that look plausible but don’t exist. This is the most common failure and the hardest to catch because the code reads naturally.
# AI suggested this for a Rails 7.1 app:
class Order < ApplicationRecord
  encrypts :credit_card_number, deterministic: true

  scope :recent, -> { where(created_at: 1.week.ago..) }

  def self.revenue_summary
    # .summarize doesn't exist on ActiveRecord::Relation.
    # AI hallucinated a method that sounds like it should exist.
    select(:status).group(:status).summarize(:total_cents, :avg)
  end
end

# The correct version (inside the Order class):
def self.revenue_summary
  select("status, AVG(total_cents) as avg_total, COUNT(*) as count")
    .group(:status)
end
I now keep a mental list of “things AI likes to hallucinate” for each framework I use. For Rails, it frequently invents scoping methods, configuration options, and ActiveSupport extensions.
Failure Mode 2: Ignoring Project Conventions
AI doesn’t know your team’s patterns unless you tell it. It generates code that works but doesn’t match your codebase.
# Your codebase uses service objects with a .call convention:
class CreateOrder
  def self.call(params)
    new(params).call
  end

  def call
    # ...
  end
end

# AI generates a completely different pattern:
class OrderCreationService
  def initialize(params)
    @params = params
  end

  def execute # Not .call
    # ... uses different error handling pattern ...
  end
end
This isn’t wrong - it’s just inconsistent. And inconsistent code creates maintenance burden over time. You either refactor the AI output to match your patterns (eating into the time savings) or you end up with a codebase that has two different styles.
Failure Mode 3: Stateful Debugging
AI can’t reproduce or observe runtime behavior. When a bug depends on state - race conditions, order of operations, caching issues - AI can only guess.
I’ve wasted hours on prompts like “this test passes individually but fails when run with the full suite” where AI suggests increasingly wrong fixes because it can’t observe the actual test execution. For stateful bugs, traditional debugging tools (byebug, pry, logging) are faster.
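A minimal, hypothetical sketch of why these bugs resist prompting: the failure exists only as runtime state - here, a class-level cache that one "test" leaks into the next. Nothing in the static code distinguishes the passing run from the failing one:

```ruby
# Hypothetical: a memoized class-level cache, the classic source of
# "passes alone, fails in the full suite".
class PriceLookup
  @cache = {}

  class << self
    attr_reader :cache

    def price_for(sku)
      @cache[sku] ||= 100 # pretend this hits a database
    end

    def reset!
      @cache.clear
    end
  end
end

# "Test B" checks the cache is empty - it passes in isolation...
PriceLookup.reset!
b_passes_alone = PriceLookup.cache.empty?   # true

# ...but fails when "Test A" ran first and left state behind.
PriceLookup.price_for("sku-1")              # Test A populates the cache
b_passes_after_a = PriceLookup.cache.empty? # false
```

Both checks read identically on the page; only execution order separates them, and execution order is precisely what a language model cannot observe.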
Failure Mode 4: Architecture Decisions
AI can generate any architecture you describe, but it can’t tell you which one is right for your situation. Ask it “should I use microservices or a monolith?” and it gives a balanced answer that doesn’t help you decide.
Architecture requires understanding your team size, deployment constraints, traffic patterns, and organizational structure. AI has none of this context.
The Economics for Development Teams
AI tools cut effective development costs on routine work by 50-60%. The overall project cost reduction is smaller - typically 30-40% - because complex work (where AI helps least) makes up a significant portion of any real project.
Cost Breakdown
| Metric | Without AI | With AI | Savings |
|---|---|---|---|
| Tool cost per developer/month | $0 | $40-70 | -$70 |
| Routine work hours (per sprint) | 40 hrs | 15 hrs | 62% |
| Complex work hours (per sprint) | 40 hrs | 35 hrs | 12% |
| Code review hours (per sprint) | 10 hrs | 14 hrs | -40% |
| Total effective hours | 90 hrs | 64 hrs | 29% |
The increase in code review hours is real and often ignored in AI productivity claims. AI-generated code needs more careful review because:
- It looks correct even when it isn’t
- It follows generic patterns that may not fit your use case
- It can introduce subtle security issues (like the deterministic encryption example above - using deterministic encryption on a credit card number is a security problem, but the code compiles fine)
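The table's 29% blended figure follows directly from the per-category hours - worth sanity-checking, because the headline "50-60% on routine work" is often misread as overall savings:

```ruby
# Sprint hours from the cost breakdown table above.
hours_without = { routine: 40, complex: 40, review: 10 }
hours_with_ai = { routine: 15, complex: 35, review: 14 }

total_without = hours_without.values.sum # 90
total_with    = hours_with_ai.values.sum # 64

# Review hours go UP, so the blended saving lands well below
# the routine-work saving.
overall_savings_pct = ((1.0 - total_with.fdiv(total_without)) * 100).round
# => 29
```

This is Amdahl's law applied to sprints: the work AI barely touches (complex development, review) caps the overall speedup no matter how fast the routine work gets.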
When the Economics Don’t Work
AI tools provide negative ROI when:
- The codebase is highly unconventional - AI can’t learn your patterns mid-session
- The domain is niche - If AI hasn’t seen many examples of your problem space, its suggestions are worse than useless
- Regulatory review is required - If every line of code needs compliance review, AI-generated code doubles the review burden
- The team is very senior - Senior developers writing in their area of expertise often code as fast as they can review AI output, so the net gain is minimal
How to Actually Use AI Tools Effectively
The difference between developers who get 3x gains and those who get minimal benefit comes down to three practices:
1. Feed It Context
Don’t just describe what you want. Give AI your schema, your existing patterns, and your constraints.
Bad prompt: "Create a user registration endpoint"
Good prompt: "Create a user registration endpoint for this Rails 7.1 API.
Here's the User model [paste]. Here's how our other endpoints
look [paste controller]. We use Pundit for authorization,
Blueprinter for serialization, and raise custom ApiError
exceptions. Include request specs using our test helpers
[paste spec_helper excerpt]."
The good prompt takes 2 minutes longer to write and saves 30 minutes of fixing output.
2. Use AI for the Right Tasks
Delegate boilerplate, tests, documentation, and well-known patterns. Do architecture, business logic, and debugging yourself.
3. Review AI Output Like You Would a Junior Developer’s Pull Request
Don’t rubber-stamp AI code. Read every line. Question every assumption. Check that method names actually exist. Verify business logic matches requirements.
Limitations and When NOT to Use AI Coding Tools
AI tools are not appropriate for every situation, and overselling their capabilities leads to worse outcomes:
Don’t use AI for security-critical code. AI-generated authentication, encryption, and authorization logic should be treated as a starting draft that requires expert review, not production-ready code.
Don’t use AI as a substitute for understanding. If you don’t understand the code AI generates, you can’t maintain it, debug it, or extend it. Copy-pasting AI output without comprehension creates technical debt faster than writing bad code manually.
Don’t expect uniform gains across a team. Senior developers get more value from AI because they can evaluate output quality. Junior developers using AI tools may ship faster but accumulate hidden bugs and inconsistencies.
Don’t use AI for compliance-heavy work without adjusting your review process. AI doesn’t understand GDPR, PCI-DSS, or HIPAA requirements - it generates code that might violate them while looking correct.
The Bottom Line
AI coding tools are a genuine productivity multiplier for experienced developers working on routine tasks. The gains are real - I ship features measurably faster than I did two years ago. But the gains are uneven, the failure modes are specific, and the tools require skill to use effectively.
The developers who benefit most are those who understand where AI helps and where it doesn’t, and who adjust their workflow accordingly rather than trying to AI-ify everything.
Need help integrating AI tools into your development workflow? I help teams adopt AI-assisted development practices, optimize productivity, and ship products faster. Reach out at nikita.sinenko@gmail.com.