Event-driven architecture adoption
A document that captures important architectural decisions and their context
ADR-001: Adoption of Event-Driven Architecture for FlowMart E-commerce Platform
Status
Approved (2024-07-15)
Context
FlowMart is building a new e-commerce platform to replace our legacy monolithic application. The current system faces several challenges:
- Scalability Issues: During peak shopping periods (e.g., Black Friday, holiday season), the system struggles to handle increased traffic, resulting in degraded performance and occasional outages.
- Maintenance Complexity: Adding new features or modifying existing ones requires extensive regression testing and often leads to unexpected side effects.
- Technology Constraints: The monolithic architecture limits our ability to adopt new technologies or update components independently.
- Data Consistency: Ensuring data consistency across different parts of the application has become increasingly difficult.
- Team Independence: Multiple development teams working on different aspects of the application frequently block each other.
We need an architecture that addresses these challenges while enabling rapid innovation and scaling to meet our projected growth over the next 3-5 years.
Decision
We will adopt an Event-Driven Architecture (EDA) using a microservices approach for the new FlowMart e-commerce platform. Specifically:
- Domain-Driven Design (DDD): We will organize our services around business domains (Orders, Inventory, Payment, Shipping, etc.) with clearly defined bounded contexts.
- Event Sourcing: Critical business transactions will be stored as a sequence of immutable events that can be used to reconstruct the system state at any point in time.
- Command Query Responsibility Segregation (CQRS): We will separate read and write operations where appropriate to optimize for different performance and scaling requirements.
- Apache Kafka will serve as our primary event streaming platform for asynchronous communication between services.
- Eventual Consistency Model: We acknowledge that the system will prioritize availability and partition tolerance over immediate consistency (following the CAP theorem), with mechanisms to ensure eventual consistency.
- Service Autonomy: Each service will:
  - Have its own database
  - Be independently deployable
  - Have well-defined APIs and event contracts
  - Be responsible for publishing domain events when state changes occur
- Choreography Over Orchestration: Services will primarily react to events rather than being orchestrated by a central coordinator, though we will use orchestration for complex workflows when necessary.
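To make the event sourcing decision concrete, here is a minimal sketch of rebuilding state by replaying immutable events, optionally up to a point in time. The event names (`OrderPlaced`, `OrderPaid`, `OrderCancelled`), the `OrderState` shape, and the function names are illustrative assumptions, not part of the agreed FlowMart event contracts.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class OrderEvent:
    """An immutable fact about an order (hypothetical shape)."""
    order_id: str
    kind: str  # assumed kinds: "OrderPlaced", "OrderPaid", "OrderCancelled"
    occurred_at: datetime


@dataclass(frozen=True)
class OrderState:
    order_id: str
    status: str


def apply_event(state: Optional[OrderState], event: OrderEvent) -> OrderState:
    """Pure transition: current state plus one event gives the next state."""
    if event.kind == "OrderPlaced":
        return OrderState(event.order_id, "PLACED")
    if state is None:
        raise ValueError(f"{event.kind} received before OrderPlaced")
    if event.kind == "OrderPaid":
        return OrderState(state.order_id, "PAID")
    if event.kind == "OrderCancelled":
        return OrderState(state.order_id, "CANCELLED")
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events: Iterable[OrderEvent],
           as_of: Optional[datetime] = None) -> Optional[OrderState]:
    """Fold events in time order; stop at `as_of` to see a past state."""
    state: Optional[OrderState] = None
    for event in sorted(events, key=lambda e: e.occurred_at):
        if as_of is not None and event.occurred_at > as_of:
            break
        state = apply_event(state, event)
    return state


# Usage: the same log answers "what is the state now?" and "what was it then?"
log = [
    OrderEvent("o-1", "OrderPlaced", datetime(2024, 7, 1, 10, 0)),
    OrderEvent("o-1", "OrderPaid", datetime(2024, 7, 1, 10, 5)),
]
current = replay(log)
earlier = replay(log, as_of=datetime(2024, 7, 1, 10, 1))
```

Because the events are never mutated, the current state is always derivable from the log, which is what makes point-in-time reconstruction possible.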
Consequences
Positive
- Improved Scalability: Individual services can scale independently based on demand, allowing us to allocate resources more efficiently.
- Better Fault Isolation: Failures in one service are less likely to cascade across the entire system, improving overall reliability.
- Technology Flexibility: Teams can choose the most appropriate technologies for their specific domains, allowing for incremental adoption of new technologies.
- Team Autonomy: Domain-aligned teams can develop, test, and deploy their services independently, reducing cross-team dependencies.
- Enhanced Auditability: Event sourcing provides a complete audit trail of all system changes, which is valuable for debugging, compliance, and business analytics.
- Improved Extensibility: New capabilities can be added by creating new consumers of existing events without modifying the original producers.
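The extensibility benefit can be illustrated with a toy in-process event bus. This is only a sketch: `InMemoryEventBus` stands in for a Kafka topic, and `place_order`, the `OrderPlaced` event name, and its payload fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class InMemoryEventBus:
    """Toy stand-in for a broker topic: fans each event out to all subscribers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        for handler in list(self._handlers[event_type]):
            handler(event)


def place_order(bus: InMemoryEventBus, order_id: str, total_cents: int) -> None:
    """Producer: knows the event contract, not who consumes it."""
    bus.publish("OrderPlaced", {"order_id": order_id, "total_cents": total_cents})


bus = InMemoryEventBus()
reserved: List[object] = []
emails: List[str] = []

# Existing consumer: reserves stock when an order is placed.
bus.subscribe("OrderPlaced", lambda e: reserved.append(e["order_id"]))
# New capability added later: a confirmation email, with no change to place_order.
bus.subscribe("OrderPlaced", lambda e: emails.append(f"Thanks for order {e['order_id']}"))

place_order(bus, "o-42", 1999)
```

The producer publishes the same event before and after the second subscription is added, which is the property the bullet above describes.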
Negative
- Increased Complexity: Distributed systems are inherently more complex to develop, test, debug, and operate compared to monolithic applications.
- Learning Curve: The team will need to learn new patterns, technologies, and operational practices, which may slow initial development.
- Eventual Consistency Challenges: Business operations and UI design must account for data that might not be immediately consistent across services.
- Operational Overhead: Managing multiple services, event streams, and databases requires more sophisticated monitoring, deployment, and operational tools.
- Transaction Management: Ensuring transactional integrity across service boundaries requires careful design and implementation of compensation patterns.
- Testing Complexity: End-to-end testing becomes more challenging, requiring new testing strategies and tools.
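As an illustration of the compensation patterns mentioned under Transaction Management, the sketch below runs a sequence of steps and, if one fails, undoes the completed steps in reverse order. It is a single-process simplification with hypothetical step names; in the choreographed design described above, each compensation would more likely be triggered by a failure event than by a central loop.

```python
from typing import Callable, List, Tuple

# (name, action, compensation); all three are illustrative assumptions
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name} compensated")
            return log
        completed.append(step)
        log.append(f"{name} done")
    return log


# Usage: stock is reserved, payment fails, so the reservation is released.
stock = {"sku-1": 5}


def reserve_stock() -> None:
    stock["sku-1"] -= 1


def release_stock() -> None:
    stock["sku-1"] += 1


def charge_payment() -> None:
    raise RuntimeError("card declined")


saga_log = run_saga([
    ("reserve_stock", reserve_stock, release_stock),
    ("charge_payment", charge_payment, lambda: None),
])
```

A production version would also need to handle a compensation that itself fails, which this sketch deliberately leaves out.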
Compliance Requirements
Our implementation must adhere to the following requirements:
- Data Privacy: Personal customer data must be handled in compliance with GDPR, CCPA, and other applicable regulations.
- PCI DSS: Payment processing components must comply with the Payment Card Industry Data Security Standard (PCI DSS).
- Audit Trail: All critical business transactions must be traceable and auditable for a minimum of 7 years.
- Security: Authentication, authorization, and data encryption standards must be consistently applied across all services.
Implementation Details
Phase 1: Core Domain Decomposition (Q3 2024)
- Identify and define core domain boundaries
- Establish event schemas and contracts
- Implement Kafka infrastructure and operational tooling
- Migrate the first domain (Orders) to the new architecture
- Set up CI/CD pipelines and monitoring
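One way the event schemas and contracts from Phase 1 might look is a versioned envelope that is validated before it is serialized. This is a sketch under assumptions: the envelope fields, the `REQUIRED_FIELDS` registry, and the `OrderPlaced` payload are hypothetical, and a real setup might use a schema registry with Avro or JSON Schema instead.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventEnvelope:
    """Common wrapper for every domain event (hypothetical contract)."""
    event_type: str
    schema_version: int
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Assumed contract: required payload fields per (event type, schema version).
REQUIRED_FIELDS = {
    ("OrderPlaced", 1): {"order_id", "customer_id", "total_cents"},
}


def encode(event: EventEnvelope) -> bytes:
    """Validate against the contract, then serialize to UTF-8 JSON bytes."""
    key = (event.event_type, event.schema_version)
    if key not in REQUIRED_FIELDS:
        raise ValueError(f"no contract registered for {key}")
    missing = REQUIRED_FIELDS[key] - set(event.payload)
    if missing:
        raise ValueError(f"missing payload fields: {sorted(missing)}")
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> EventEnvelope:
    """Deserialize bytes produced by `encode` back into an envelope."""
    return EventEnvelope(**json.loads(raw.decode("utf-8")))


# Usage: a valid event survives a round trip unchanged.
order_placed = EventEnvelope(
    "OrderPlaced", 1,
    {"order_id": "o-1", "customer_id": "c-9", "total_cents": 1999},
)
round_tripped = decode(encode(order_placed))
```

Carrying an explicit `schema_version` lets consumers keep handling older events while producers evolve the payload.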
Phase 2: Domain Expansion (Q4 2024)
- Migrate Inventory and Payment domains
- Implement event sourcing for critical domains
- Develop read models for reporting and analytics
- Establish cross-domain consistency patterns
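The read models planned for Phase 2 can be thought of as projections: consumers that fold events into a query-friendly view, separate from the write side. The sketch below assumes a hypothetical `OrderPaid` event with `event_id`, `occurred_at` (ISO 8601 string), and `total_cents` fields, and assumes at-least-once delivery, where a consumer may see the same event more than once.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class DailyRevenueProjection:
    """Read model for reporting: revenue per calendar day, built from events."""

    def __init__(self) -> None:
        self._revenue_cents_by_day: DefaultDict[str, int] = defaultdict(int)
        self._seen_event_ids: Set[str] = set()

    def handle(self, event: dict) -> None:
        # Skip duplicates so redelivered events do not double-count revenue.
        if event["event_id"] in self._seen_event_ids:
            return
        self._seen_event_ids.add(event["event_id"])
        if event["event_type"] == "OrderPaid":
            day = event["occurred_at"][:10]  # "YYYY-MM-DD" prefix of the timestamp
            self._revenue_cents_by_day[day] += event["total_cents"]

    def revenue_for(self, day: str) -> int:
        """Query side: a cheap lookup with no joins against the write model."""
        return self._revenue_cents_by_day.get(day, 0)


# Usage: the second event is a redelivery of the first and is ignored.
projection = DailyRevenueProjection()
paid = {
    "event_id": "e-1",
    "event_type": "OrderPaid",
    "occurred_at": "2024-10-01T09:30:00+00:00",
    "total_cents": 1999,
}
projection.handle(paid)
projection.handle(paid)
projection.handle({
    "event_id": "e-2",
    "event_type": "OrderPaid",
    "occurred_at": "2024-10-01T11:00:00+00:00",
    "total_cents": 501,
})
```

Since the view is derived entirely from events, it can be dropped and rebuilt by replaying the stream if the reporting requirements change.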
Phase 3: Legacy Decommissioning (Q1-Q2 2025)
- Migrate remaining domains
- Implement advanced monitoring and alerting
- Gradually decommission legacy system components
- Complete performance tuning and optimization
Considered Alternatives
1. Modular Monolith
Pros: Simpler development model, transactional integrity, easier testing
Cons: Limited independent scaling, technology constraints, deployment coupling
This approach would address some concerns (maintainability, modularity) but would not solve our scalability and team independence challenges.
2. Microservices with REST-only Communication
Pros: Well-understood patterns, synchronous communication simplicity
Cons: Tighter coupling, limited resilience, cascading failures
This approach would improve modularity but would not adequately address resilience and scalability concerns.
3. Serverless Architecture
Pros: Minimal infrastructure management, high elasticity, pay-per-use model
Cons: Vendor lock-in, cold start latency, limited control over infrastructure
While appealing for certain scenarios, this approach would not provide the control and predictability needed for our core business functions.
References
- Building Event-Driven Microservices (Adam Bellemare, O’Reilly)
- Domain-Driven Design (Eric Evans, Addison-Wesley)
- Enterprise Integration Patterns (Gregor Hohpe, Bobby Woolf, Addison-Wesley)
- Kafka Documentation
- CQRS Pattern (Martin Fowler)
- Event Sourcing Pattern (Martin Fowler)
Decision Record History
Date | Version | Description | Author |
---|---|---|---|
2024-06-22 | 0.1 | Initial draft | David Boyne |
2024-06-30 | 0.2 | Incorporate feedback from architecture review | Amy Smith |
2024-07-10 | 0.3 | Added implementation phasing and compliance requirements | Kiran Patel |
2024-07-15 | 1.0 | Approved by Architecture Board | Architecture Board |
Appendix A: High-Level Architecture Diagram
[Diagram not rendered: high-level architecture]
Appendix B: Key Event Flows
Order Placement Flow
[Diagram not rendered: order placement flow]
Out-of-Stock Handling Flow
[Diagram not rendered: out-of-stock handling flow]
Appendix C: Service Ownership
Domain | Service | Team |
---|---|---|
Customer | Customer Service | Full Stack Team |
Customer | Authentication Service | Security Team |
Order | Order Service | Order Management Team |
Order | Order History Service | Order Management Team |
Inventory | Inventory Service | Full Stack Team |
Inventory | Stock Management | Full Stack Team |
Payment | Payment Service | Payment Team |
Payment | Refund Service | Payment Team |
Shipping | Shipping Service | Logistics Team |
Shipping | Tracking Service | Logistics Team |