The Strangler Fig Pattern: How to Modernize Legacy Systems Without Downtime
A Fortune 500 fintech came to us with a problem that will sound familiar: a monolithic payment processing system built 12 years ago, downtime costing $4.2M per quarter, and an engineering team that was afraid to deploy.
The CEO wanted a rewrite. We talked them out of it.
Why Big-Bang Rewrites Fail
The industry data is clear: approximately 70% of large-scale system rewrites fail to deliver on their original scope, budget, or timeline. The reasons are predictable:
- You're building two systems at once. The legacy system still needs maintenance while the new one is under construction. Engineering capacity is split, and neither system gets the attention it needs.
- Requirements drift. A rewrite that takes 18 months means 18 months of business evolution that the new system hasn't accounted for.
- The "last mile" problem. The first 80% of a rewrite is relatively straightforward. The last 20% — the edge cases, integrations, and undocumented business logic buried in legacy code — takes as long as everything else combined.
The Strangler Fig Approach
Named after the tropical fig vine that gradually envelops and replaces its host tree, the strangler fig pattern works by:
- Placing a facade (API gateway or reverse proxy) in front of the legacy system
- Building new components behind the facade, one at a time
- Routing traffic to new components as they're ready
- Decommissioning legacy components once their replacements are proven
The key insight: at every point during the migration, the system is fully functional. There's no "cutover day" where everything has to work perfectly on the first try.
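The routing step can be sketched in a few lines. This is a minimal illustration, not production gateway code; the component names and weights are hypothetical, and hashing the request ID makes each request's path sticky, so retries take the same route during a partial rollout:

```python
import hashlib

# Hypothetical routing table: the fraction of traffic each extracted
# component receives. 0.0 = all legacy, 1.0 = fully migrated.
MIGRATION_WEIGHTS = {
    "fraud-detection": 1.0,   # fully cut over
    "payment-routing": 0.25,  # mid-migration
    "reconciliation": 0.0,    # not started
}

def route(component: str, request_id: str) -> str:
    """Decide whether a request goes to the new service or the
    legacy monolith. A hash of the request ID buckets the request
    into 0-99; requests below the weighted threshold go new."""
    weight = MIGRATION_WEIGHTS.get(component, 0.0)
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "new-service" if bucket < weight * 100 else "legacy-monolith"
```

In practice this decision lives in the gateway's configuration rather than application code, but the logic is the same: a per-component dial you can turn up, and turn back down, without a deploy.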
How We Applied It: The Fintech Case Study
Phase 1: The Facade (Weeks 1-2)
We deployed an API gateway (Kong) in front of the monolith. Initially, it did nothing — 100% of traffic passed through to the legacy system. But it gave us:
- Request/response logging for every API call
- Traffic analysis to identify the highest-value migration targets
- A routing layer we could configure without redeploying anything
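A pass-through facade like this amounts to one service and one catch-all route in Kong's Admin API. The payloads below are an illustrative sketch (the hostnames and names are assumptions, not our client's actual configuration); the point is that day one changes nothing about traffic flow:

```python
import json

# Hypothetical Kong Admin API payloads. Initially the single
# catch-all route sends every request to the legacy monolith, so the
# gateway is a pure pass-through; later, narrower routes can be
# added in front of it to peel traffic off to extracted services.
monolith_service = {
    "name": "legacy-monolith",
    "url": "http://monolith.internal:8080",  # assumed internal host
}

catch_all_route = {
    "name": "all-traffic",
    "paths": ["/"],
    "service": {"name": "legacy-monolith"},
}

# These would be POSTed to Kong's Admin API, e.g.:
#   POST http://kong-admin:8001/services   (monolith_service)
#   POST http://kong-admin:8001/routes     (catch_all_route)
print(json.dumps([monolith_service, catch_all_route], indent=2))
```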
Phase 2: Event Extraction (Weeks 3-4)
Before decomposing the monolith, we needed to understand data flows. We introduced Kafka as an event bus and began publishing domain events from the monolith:
- payment.initiated
- payment.processed
- payment.failed
- fraud.detected
The monolith still handled everything — but now other services could listen to what was happening.
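Publishing a domain event from the monolith is mostly about agreeing on an envelope every consumer can rely on. Here's a minimal sketch; the field names and the `payments` topic are illustrative, and the (commented-out) produce call assumes a confluent-kafka style producer:

```python
import json
import uuid
from datetime import datetime, timezone

def build_event(event_type: str, payload: dict) -> bytes:
    """Wrap a domain event in a standard envelope so downstream
    consumers get consistent metadata regardless of event type."""
    return json.dumps({
        "event_id": str(uuid.uuid4()),        # for deduplication
        "event_type": event_type,             # e.g. "payment.initiated"
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    }).encode("utf-8")

# Inside the monolith, after the existing logic runs (assuming a
# confluent-kafka Producer named `producer`):
#   producer.produce("payments",
#       build_event("payment.initiated",
#                   {"payment_id": "p-123", "amount_cents": 4999}))
event = json.loads(build_event("payment.failed", {"payment_id": "p-123"}))
```

Because the monolith only publishes and nothing yet consumes, this step carries essentially no production risk while building the visibility the later extractions depend on.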
Phase 3: Service Extraction (Weeks 5-12)
We extracted services in order of business impact:
1. Fraud detection — moved to a standalone service with ML-based anomaly detection. High value, relatively isolated, and the legacy implementation was the single biggest source of false positives.
2. Payment routing — the logic for selecting payment processors based on merchant, currency, and transaction type. Complex business rules that were buried in if-else chains spanning thousands of lines.
3. Reconciliation — automated matching of transaction records across systems. Previously a manual process that required a dedicated team.
Each extraction followed the same pattern:
- Build the new service
- Run it in shadow mode (processes real data, but results aren't used)
- Compare outputs with the legacy system
- Gradually shift traffic (10% → 25% → 50% → 100%)
- Decommission the legacy component
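The shadow-mode step above can be sketched as a thin wrapper around both implementations. This assumes synchronous calls and illustrative function names; in a real deployment the comparison usually happens asynchronously so the new service can't add latency:

```python
import logging

log = logging.getLogger("shadow")

def shadow_compare(request, legacy_fn, new_fn):
    """Run both implementations on the same request. The legacy
    result is always what the caller receives; the new service's
    result is only compared and logged, so a bug or crash in the
    new code cannot affect production traffic."""
    legacy_result = legacy_fn(request)
    try:
        new_result = new_fn(request)
        if new_result != legacy_result:
            log.warning("shadow mismatch for %r: legacy=%r new=%r",
                        request, legacy_result, new_result)
    except Exception:
        log.exception("shadow service failed for %r", request)
    return legacy_result
```

A stretch of production traffic with zero logged mismatches is the evidence that justifies the first real traffic shift; any mismatch is a bug report delivered before the new code ever served a customer.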
Results
After 16 weeks:
- 99.99% uptime (from ~97%)
- 340ms average response time (from ~1.2s)
- 62% infrastructure cost reduction
- Zero production incidents during migration
When Strangler Fig Doesn't Work
This pattern isn't universal. It's a poor fit when:
- The legacy system has no API layer. If you can't intercept and route requests, you can't strangle incrementally. (Solution: build the facade first, even if it means modifying the legacy system.)
- Data schemas are deeply coupled. If every component reads and writes the same database tables, you need a data migration strategy alongside the service extraction.
- The team lacks operational maturity. Running two architectures simultaneously requires solid monitoring, deployment automation, and incident response. If the team can't manage one system reliably, managing two during a transition will be harder.
The Playbook
For teams considering this approach:
1. Start with the audit. You can't plan a migration without understanding what you have. Our 5-Day Technical Audit maps every dependency, data flow, and risk surface.
2. Choose your first extraction carefully. Pick something high-value but low-risk. A successful first extraction builds confidence and validates the approach.
3. Invest in observability before you start. You need to see what's happening in both systems, in real time, before you route any traffic.
4. Plan for rollback at every step. Every traffic shift should be reversible within minutes, not hours.
Dealing with a legacy system that's blocking your AI adoption or creating operational risk? Book a 5-Day Technical Audit — we'll map your architecture and build a sequenced migration roadmap.