ShippingService Runbook
Operational runbook for troubleshooting and maintaining the ShippingService
This runbook provides operational procedures for the ShippingService, which is responsible for managing shipping options, carrier integration, and delivery tracking in the FlowMart e-commerce platform.
Architecture
The ShippingService is responsible for:
- Calculating shipping costs and delivery estimates
- Managing shipping carriers and their API integrations
- Generating shipping labels
- Tracking shipments
- Handling delivery exceptions and returns
Service Dependencies
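The procedures in this runbook rely on the following dependencies:
- MongoDB replica set (app=mongodb in the data namespace)
- Carrier APIs (FedEx, UPS, USPS)
- AWS Secrets Manager (carrier API credentials)
- Amazon S3 (shipping rate tables and MongoDB backups)
- shipping-webhook-service (tracking webhooks from carriers)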
Monitoring and Alerting
Key Metrics
Metric | Description | Warning Threshold | Critical Threshold |
---|---|---|---|
shipping_rate_calculation_rate | Rate calculations per minute | < 10 | < 2 |
shipping_label_generation_success | Label generation success % | < 98% | < 95% |
carrier_api_response_time | Carrier API response time | > 2s | > 5s |
carrier_api_error_rate | Carrier API errors % | > 2% | > 5% |
tracking_update_processing_rate | Tracking updates processed per minute | < 50 | < 10 |
shipment_tracking_lag | Delay in tracking information | > 15m | > 1h |
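For scripted checks, the thresholds in the table above can be encoded in a small shell function. This is a sketch and not an existing FlowMart tool. The name classify_metric is hypothetical. The function assumes percentages are passed as plain numbers (97.5 for 97.5%), carrier_api_response_time in seconds, and shipment_tracking_lag in minutes (so 1h becomes 60).

```bash
#!/usr/bin/env bash
# classify_metric METRIC VALUE
# Hypothetical helper: prints ok, warning or critical using the Key Metrics thresholds.
# Assumed units: percentages as plain numbers, carrier_api_response_time in seconds,
# shipment_tracking_lag in minutes.
classify_metric() {
  local metric="$1" value="$2" dir warn crit
  if [[ ! "$value" =~ ^[0-9]+([.][0-9]+)?$ ]]; then
    echo "value must be a non-negative number: $value" >&2
    return 2
  fi
  case "$metric" in
    # "low" metrics alert when the value drops below the threshold
    shipping_rate_calculation_rate)    dir=low;  warn=10; crit=2  ;;
    shipping_label_generation_success) dir=low;  warn=98; crit=95 ;;
    tracking_update_processing_rate)   dir=low;  warn=50; crit=10 ;;
    # "high" metrics alert when the value rises above the threshold
    carrier_api_response_time)         dir=high; warn=2;  crit=5  ;;
    carrier_api_error_rate)            dir=high; warn=2;  crit=5  ;;
    shipment_tracking_lag)             dir=high; warn=15; crit=60 ;;
    *) echo "unknown metric: $metric" >&2; return 2 ;;
  esac
  # awk compares fractional values such as 97.5, which bash arithmetic cannot
  awk -v v="$value" -v w="$warn" -v c="$crit" -v d="$dir" 'BEGIN {
    if (d == "low") {
      if (v < c) print "critical"; else if (v < w) print "warning"; else print "ok"
    } else {
      if (v > c) print "critical"; else if (v > w) print "warning"; else print "ok"
    }
  }'
}
```

For example, classify_metric carrier_api_error_rate 3.1 prints warning. A value exactly at a threshold (2% errors, 98% label success) counts as ok, because the table uses strict < and > comparisons.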
Dashboards
Common Alerts
Alert | Description | Troubleshooting Steps |
---|---|---|
ShippingServiceHighErrorRate | Shipping API error rate above threshold | See High Error Rate |
ShippingCarrierAPIDown | Carrier API connection issues | See Carrier API Issues |
ShippingServiceHighLatency | Shipping service latency issues | See High Latency |
ShippingServiceDatabaseIssues | Database connection issues | See Database Issues |
Troubleshooting Guides
High Error Rate
If the service is experiencing a high error rate:
Check application logs for error patterns:
kubectl logs -l app=shipping-service -n shipping --tail=100
Check specific error types:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/error-analyzer.jar --last-hour
Check for patterns in failed shipments:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/failed-shipments-analyzer.jar
Check for recent deployments that might have introduced issues:
kubectl rollout history deployment/shipping-service -n shipping
Determine whether the issue is specific to one carrier (FedEx, UPS, etc.):
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/carrier-success-rates.jar
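The commands above look up the pod inline with items[0]. When no pod matches, that lookup yields an empty name, and kubectl exec then fails with a confusing error. It can also pick a pod that is not Running. Below is a minimal sketch of a reusable helper, assuming bash and kubectl on the operator's machine. The names pick_running_pod and run_in_pod are hypothetical, not existing tooling. The helper omits -it, which is only needed for interactive sessions.

```bash
#!/usr/bin/env bash
# pick_running_pod NAMESPACE APP_LABEL
# Prints the name of the first Running pod with label app=APP_LABEL, or fails.
pick_running_pod() {
  local ns="$1" app="$2" pod
  pod=$(kubectl get pods -n "$ns" -l "app=$app" \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
  if [ -z "$pod" ]; then
    echo "no Running pod with app=$app in namespace $ns" >&2
    return 1
  fi
  printf '%s\n' "$pod"
}

# run_in_pod NAMESPACE APP_LABEL COMMAND [ARGS...]
# Runs a non-interactive command in that pod (no -it, so it also works in scripts).
run_in_pod() {
  local ns="$1" app="$2" pod
  shift 2
  pod=$(pick_running_pod "$ns" "$app") || return 1
  kubectl exec -n "$ns" "$pod" -- "$@"
}
```

For example: run_in_pod shipping shipping-service java -jar /app/tools/carrier-success-rates.jar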
Carrier API Issues
If there are issues with carrier APIs:
Check carrier API connectivity:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/carrier-health-check.jar
Check carrier API credentials and rotation status:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/check-carrier-credentials.jar
Check each carrier's public status page for announced outages.
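Check for carrier timeouts in the application logs. With a label selector, kubectl logs returns only the last 10 lines per pod unless --tail is set:
kubectl logs -l app=shipping-service -n shipping --prefix --tail=1000 | grep "carrier timeout"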
Enable a fallback carrier (this example routes FedEx traffic to UPS):
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/shipping/enable-fallback-carrier -H "Content-Type: application/json" -d '{"primaryCarrier": "fedex", "fallbackCarrier": "ups", "reason": "FedEx API outage"}'
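Hand-editing the JSON body in the command above is error-prone during an incident. The sketch below is a hypothetical helper, fallback_payload, that builds the same body and rejects obvious mistakes. It assumes the endpoint accepts the lowercase carrier identifiers fedex, ups and usps; only fedex and ups appear in this runbook, so usps is an assumption. Rather than attempting JSON escaping, it refuses reasons that contain quotes, backslashes or newlines.

```bash
#!/usr/bin/env bash
# fallback_payload PRIMARY FALLBACK REASON
# Hypothetical helper: prints the JSON body for enable-fallback-carrier.
fallback_payload() {
  local primary="$1" fallback="$2" reason="$3" c
  for c in "$primary" "$fallback"; do
    case "$c" in
      fedex|ups|usps) ;;                       # assumed carrier identifiers
      *) echo "unsupported carrier: '$c'" >&2; return 1 ;;
    esac
  done
  if [ "$primary" = "$fallback" ]; then
    echo "fallback carrier must differ from the primary carrier" >&2
    return 1
  fi
  case "$reason" in
    ''|*'"'*|*'\'*|*$'\n'*)
      echo "reason must be non-empty and free of quotes, backslashes and newlines" >&2
      return 1 ;;
  esac
  printf '{"primaryCarrier": "%s", "fallbackCarrier": "%s", "reason": "%s"}\n' \
    "$primary" "$fallback" "$reason"
}
```

The output can replace the literal body in the curl call above, for example -d "$(fallback_payload fedex ups 'FedEx API outage')".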
High Latency
If the service is experiencing high latency:
Check system metrics:
kubectl top pods -n shipping
Check JVM memory and GC metrics:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/jvm-metrics.jar
Check MongoDB performance:
kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.currentOp()"
Check carrier API response times:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/carrier-response-times.jar
Scale the service if needed:
kubectl scale deployment shipping-service -n shipping --replicas=5
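Before scaling out, it can help to check whether latency correlates with a few saturated pods. The kubectl top pods output can be filtered by CPU for this. The sketch below uses a hypothetical helper, pods_over_cpu, and the threshold is an operator choice. It relies on kubectl top reporting CPU in millicores (for example 250m) and also accepts whole-core values.

```bash
#!/usr/bin/env bash
# pods_over_cpu THRESHOLD_MILLICORES < kubectl-top-pods-output
# Hypothetical helper: prints "POD CPU" for pods above the threshold.
pods_over_cpu() {
  local limit="$1"
  awk -v limit="$limit" '
    NR == 1 && $1 == "NAME" { next }             # skip the header line if present
    NF >= 2 {
      cpu = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); milli = cpu + 0 }
      else            { milli = cpu * 1000 }      # plain value means whole cores
      if (milli > limit) print $1, milli "m"
    }'
}
```

For example: kubectl top pods -n shipping | pods_over_cpu 800. If only one or two pods are hot, investigate them individually before adding replicas.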
Database Issues
If there are MongoDB issues:
Check MongoDB status:
kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "rs.status()"
Check for slow queries (the --eval script is single-quoted so the local shell does not expand $gt):
kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval 'db.currentOp({ "active": true, "secs_running": { "$gt": 5 } })'
Check database connection pool:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/db-pool-stats.jar
Refresh the service's database connections if needed:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/refresh-db-connections
Common Operational Tasks
Managing Carrier API Credentials
Rotating Carrier API Keys
Generate new API keys in the carrier portal:
- FedEx Developer Portal: https://developer.fedex.com
- UPS Developer Portal: https://developer.ups.com
- USPS Web Tools: https://www.usps.com/business/web-tools-apis
Store the new keys in AWS Secrets Manager:
aws secretsmanager update-secret --secret-id flowmart/shipping/fedex-api-key --secret-string '{"api_key": "NEW_KEY", "password": "NEW_PASSWORD", "account_number": "ACCOUNT_NUMBER"}'
Trigger key rotation in the service:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/reload-carrier-credentials
Verify the new keys are working by testing label generation:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/test-label-generation.jar --carrier=fedex
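Passing keys inline to --secret-string, as in the Secrets Manager step above, leaves them in shell history. The AWS CLI can instead read a parameter value from a file using the file:// prefix. The sketch below is a hypothetical helper, write_carrier_secret, that writes the secret JSON to an owner-only file. It uses the same fields as the example above. Values containing quotes or backslashes are rejected rather than escaped. The helper should be pointed at a fresh path, because an existing file keeps its current permissions.

```bash
#!/usr/bin/env bash
# write_carrier_secret OUT_FILE API_KEY PASSWORD ACCOUNT_NUMBER
# Hypothetical helper: writes the carrier secret JSON to a file readable only by the owner.
write_carrier_secret() {
  local out="$1" v
  shift
  if [ "$#" -ne 3 ]; then
    echo "usage: write_carrier_secret OUT API_KEY PASSWORD ACCOUNT_NUMBER" >&2
    return 2
  fi
  for v in "$@"; do
    case "$v" in
      ''|*'"'*|*'\'*)
        echo "values must be non-empty and contain no quotes or backslashes" >&2
        return 1 ;;
    esac
  done
  # umask 077 inside a subshell: a newly created file gets mode 600,
  # and the caller's umask is left untouched
  ( umask 077
    printf '{"api_key": "%s", "password": "%s", "account_number": "%s"}\n' \
      "$1" "$2" "$3" > "$out" )
}
```

For example, read the values with read -rs and create a directory with tmp=$(mktemp -d). Call write_carrier_secret "$tmp/fedex.json" "$KEY" "$PASSWORD" "$ACCOUNT", then run aws secretsmanager update-secret --secret-id flowmart/shipping/fedex-api-key --secret-string "file://$tmp/fedex.json". Delete the directory afterwards.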
Managing Shipping Rates
Updating Shipping Rate Tables
When carrier rates change:
Prepare the new rate table in the required JSON format.
Upload the rate table to S3:
aws s3 cp new-fedex-rates.json s3://flowmart-configs/shipping/rates/
Trigger rate table reload:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/shipping/reload-rate-tables -H "Content-Type: application/json" -d '{"carrier": "fedex"}'
Verify rate calculations with test scenarios:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/test-rate-calculation.jar
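The required rate-table schema is not documented here, so only generic checks are possible before uploading. The sketch below uses a hypothetical function, validate_rate_table, which is not an existing tool. It assumes python3 is available for the JSON syntax check. It also assumes, as a hypothetical convention, that file names contain the carrier name, as new-fedex-rates.json does.

```bash
#!/usr/bin/env bash
# validate_rate_table FILE CARRIER
# Hypothetical pre-upload check: non-empty file, named after the carrier, valid JSON.
validate_rate_table() {
  local file="$1" carrier="$2"
  if [ -z "$carrier" ] || [ ! -s "$file" ]; then
    echo "usage: validate_rate_table FILE CARRIER (file must exist and be non-empty)" >&2
    return 1
  fi
  case "$(basename "$file")" in
    *"$carrier"*) ;;
    *) echo "file name does not mention carrier '$carrier': $file" >&2; return 1 ;;
  esac
  # Syntax check only; the schema the service expects is not verified here
  if ! python3 -m json.tool "$file" > /dev/null; then
    echo "invalid JSON: $file" >&2
    return 1
  fi
}
```

Running validate_rate_table new-fedex-rates.json fedex before the aws s3 cp step catches truncated or malformed files before the service tries to reload them.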
Shipping Label Generation Troubleshooting
Debugging Failed Label Generation
If labels are failing to generate:
# Find recent failed label generation attempts
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/find-failed-labels.jar --hours=2
# Get detailed error for a specific shipment
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/label-error-details.jar --shipment-id=SHIP123456
Manual Label Generation
For special cases requiring manual intervention (set SHIPMENT_ID and ADMIN_TOKEN first):
curl -X POST "https://api.internal.flowmart.com/shipping/shipments/${SHIPMENT_ID}/generate-label" \
-H "Authorization: Bearer $ADMIN_TOKEN" \
-H "Content-Type: application/json" \
-d '{"carrier": "fedex", "service": "PRIORITY_OVERNIGHT", "forceGeneration": true}'
Tracking Updates
Triggering Manual Tracking Updates
To manually trigger tracking updates:
# For a single shipment
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/sync-tracking.jar --shipment-id=SHIP123456
# For bulk tracking updates
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/sync-tracking.jar --status=in_transit --hours=24
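Sometimes a known list of shipments needs refreshing rather than every in_transit shipment. In that case the single-shipment command can be looped over the IDs. The sketch below uses a hypothetical helper, sync_tracking_for_ids. It reads one shipment ID per line from standard input, skips blank lines and lines starting with #, and resolves one Running pod up front. The stdin of kubectl exec is redirected from /dev/null so it cannot consume the rest of the ID list.

```bash
#!/usr/bin/env bash
# sync_tracking_for_ids < ids.txt
# Hypothetical helper: runs sync-tracking.jar once per shipment ID read from stdin.
sync_tracking_for_ids() {
  local pod id
  pod=$(kubectl get pods -l app=shipping-service -n shipping \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}')
  if [ -z "$pod" ]; then
    echo "no Running shipping-service pod found" >&2
    return 1
  fi
  while IFS= read -r id; do
    id=${id%$'\r'}                            # tolerate files with CRLF line endings
    case "$id" in ''|'#'*) continue ;; esac   # skip blanks and comments
    # </dev/null keeps kubectl from reading the remaining IDs
    kubectl exec -n shipping "$pod" -- \
      java -jar /app/tools/sync-tracking.jar --shipment-id="$id" </dev/null \
      || echo "sync failed for $id" >&2
  done
}
```

For example: sync_tracking_for_ids < stuck-ids.txt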
Tracking Webhook Troubleshooting
If tracking webhooks from carriers are failing:
# Check recent webhook failures (with -l, kubectl logs returns only 10 lines per pod unless --tail is set)
kubectl logs -l app=shipping-webhook-service -n shipping --prefix --tail=1000 | grep "Webhook failure"
# Replay failed webhooks
kubectl exec -it $(kubectl get pods -l app=shipping-webhook-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/replay-webhooks.jar --hours=2
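To check whether webhook failures are concentrated on one replica, run the log search with kubectl logs --prefix and count the matches per pod. The --prefix flag tags each line as [pod/NAME/CONTAINER]. The sketch below uses a hypothetical helper, count_failures_by_pod. It assumes failure lines contain the text Webhook failure, as the grep above does.

```bash
#!/usr/bin/env bash
# count_failures_by_pod < prefixed-log-lines
# Hypothetical helper: prints "COUNT POD" for lines containing "Webhook failure",
# busiest pod first.
count_failures_by_pod() {
  awk '/Webhook failure/ {
         split($1, parts, "/")     # $1 looks like [pod/NAME/CONTAINER]
         count[parts[2]]++
       }
       END { for (pod in count) print count[pod], pod }' |
    sort -k1,1nr -k2,2
}
```

For example: kubectl logs -l app=shipping-webhook-service -n shipping --prefix --tail=1000 | count_failures_by_pod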
Recovery Procedures
Failed Shipment Recovery
If shipments are stuck or failed:
Identify stuck shipments:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/find-stuck-shipments.jar
Check shipment status with the carrier:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/check-carrier-shipment.jar --shipment-id=SHIP123456
Resolve shipments that were created at the carrier but failed in our system, supplying the carrier's tracking number:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/resolve-shipment.jar --shipment-id=SHIP123456 --tracking-number=1Z999AA10123456784 --status=label_created
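When several shipments need the same fix, a loop that defaults to a dry run reduces the risk of resolving the wrong shipment. The sketch below uses a hypothetical helper, resolve_from_csv. It reads shipment_id,tracking_number lines from standard input and always uses --status=label_created, as in the example above. It only calls kubectl when APPLY=1 is set; otherwise it prints the commands it would run.

```bash
#!/usr/bin/env bash
# resolve_from_csv < shipments.csv    (lines: shipment_id,tracking_number)
# Hypothetical helper. Prints the commands it would run unless APPLY=1.
resolve_from_csv() {
  local pod="" id tracking
  if [ "${APPLY:-0}" = 1 ]; then
    pod=$(kubectl get pods -l app=shipping-service -n shipping \
      --field-selector=status.phase=Running \
      -o jsonpath='{.items[0].metadata.name}')
    [ -n "$pod" ] || { echo "no Running shipping-service pod found" >&2; return 1; }
  fi
  while IFS=, read -r id tracking; do
    tracking=${tracking%$'\r'}                  # tolerate CRLF line endings
    case "$id" in ''|'#'*) continue ;; esac
    if [ -z "$tracking" ]; then
      echo "skipping $id: no tracking number" >&2
      continue
    fi
    set -- java -jar /app/tools/resolve-shipment.jar \
      --shipment-id="$id" --tracking-number="$tracking" --status=label_created
    if [ -n "$pod" ]; then
      kubectl exec -n shipping "$pod" -- "$@" </dev/null
    else
      echo "DRY RUN: $*"
    fi
  done
}
```

Review the dry-run output first. Then rerun the same input with APPLY=1 resolve_from_csv < shipments.csv.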
Carrier API Failure Recovery
If a carrier API is unavailable:
Enable automatic carrier fallback:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/enable-carrier-fallback
Monitor carrier API status for recovery:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/monitor-carrier-health.jar --carrier=fedex
Switch back to the primary carrier once its API has recovered:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/disable-carrier-fallback
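Switching back on the first successful check risks flapping if the carrier API is only intermittently healthy. The sketch below is a hypothetical helper, wait_for_stable, that requires several consecutive successes of a check command before returning. It assumes the check exits 0 when healthy and non-zero otherwise. This runbook does not confirm that behavior for the carrier health tools.

```bash
#!/usr/bin/env bash
# wait_for_stable REQUIRED MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: succeeds once COMMAND has exited 0 REQUIRED times in a row.
wait_for_stable() {
  local required="$1" max_attempts="$2" interval="$3" streak=0 attempt
  shift 3
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@" > /dev/null 2>&1; then
      streak=$((streak + 1))
      if [ "$streak" -ge "$required" ]; then
        echo "healthy for $streak consecutive checks after $attempt attempts"
        return 0
      fi
    else
      streak=0   # any failure resets the run of successes
    fi
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
  done
  echo "not stable after $max_attempts attempts" >&2
  return 1
}
```

For example, wrap the carrier-health-check.jar command from the Carrier API Issues section in wait_for_stable 3 20 60, dropping -it when output is not a terminal. Call disable-carrier-fallback only if it returns 0.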
Database Failure Recovery
If the MongoDB database becomes unavailable:
Verify the status of the MongoDB cluster:
kubectl get pods -l app=mongodb -n data
Check if automatic failover has occurred:
kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "rs.status()"
Once database availability is restored, validate ShippingService functionality:
curl -X GET https://api.internal.flowmart.com/shipping/health
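The rs.status() output is long. During a failover the key questions are whether exactly one member is PRIMARY and which members are unhealthy. In the sketch below, the mongo expression in the comment prints one name and stateStr per member, and a hypothetical summarize_rs function counts them. MONGO_POD is a placeholder variable. Any state other than PRIMARY, SECONDARY or ARBITER is treated as unhealthy.

```bash
#!/usr/bin/env bash
# summarize_rs < lines of "NAME STATE"
# Hypothetical helper. Feed it from the MongoDB pod, for example:
#   kubectl exec -n data "$MONGO_POD" -- mongo --quiet --eval \
#     'rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr) })' \
#     | summarize_rs
# Prints unhealthy members and a summary; exits 0 only if exactly one member is PRIMARY.
summarize_rs() {
  awk '
    NF == 0 { next }
    {
      total++
      if ($2 == "PRIMARY") primary++
      else if ($2 != "SECONDARY" && $2 != "ARBITER") { unhealthy++; print "unhealthy: " $0 }
    }
    END {
      printf "members=%d primary=%d unhealthy=%d\n", total, primary, unhealthy
      exit (primary == 1 ? 0 : 1)
    }'
}
```

A non-zero exit means the replica set has no primary or more than one, so running the health check against ShippingService would be premature.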
Disaster Recovery
Complete Service Failure
In case of a complete service failure:
Initiate incident response by notifying the on-call team through PagerDuty.
Deploy to the disaster recovery environment if necessary:
./scripts/dr-failover.sh shipping-service
Update DNS records to point to the DR environment:
aws route53 change-resource-record-sets --hosted-zone-id $HOSTED_ZONE_ID --change-batch file://dr-dns-change.json
Enable simplified shipping flow (if necessary):
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/enable-simplified-flow
Regularly check primary environment recovery status.
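The contents of dr-dns-change.json are not given in this runbook. The sketch below generates a Route 53 change batch with a single UPSERT of a CNAME record. The record name, the target and the 60-second TTL are hypothetical and must match the actual DR setup; an alias record, for instance, would need a different shape. The helper name dr_change_batch is also hypothetical.

```bash
#!/usr/bin/env bash
# dr_change_batch RECORD_NAME DR_TARGET
# Hypothetical helper: prints a Route 53 change batch that UPSERTs one CNAME record.
dr_change_batch() {
  local name="$1" target="$2" v
  for v in "$name" "$target"; do
    if [[ ! "$v" =~ ^[A-Za-z0-9.-]+$ ]]; then
      echo "invalid host name: '$v'" >&2
      return 1
    fi
  done
  printf '{"Comment": "Fail over shipping-service to DR", "Changes": [{"Action": "UPSERT", "ResourceRecordSet": {"Name": "%s", "Type": "CNAME", "TTL": 60, "ResourceRecords": [{"Value": "%s"}]}}]}\n' \
    "$name" "$target"
}
```

Write the output to dr-dns-change.json before running the route53 command. The change ID it returns can be passed to aws route53 wait resource-record-sets-changed --id, which blocks until the change is in sync.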
Maintenance Tasks
Deploying New Versions
kubectl set image deployment/shipping-service -n shipping shipping-service=ecr.aws/flowmart/shipping-service:$VERSION
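kubectl set image returns as soon as the Deployment object is updated, not when the rollout finishes. An empty $VERSION would also produce an invalid image reference. The sketch below is a hypothetical wrapper, deploy_shipping, that refuses an empty version and waits with kubectl rollout status. It runs kubectl rollout undo if the rollout does not complete within an assumed 10-minute timeout.

```bash
#!/usr/bin/env bash
# deploy_shipping VERSION
# Hypothetical wrapper around the deploy command above.
deploy_shipping() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "usage: deploy_shipping VERSION" >&2
    return 2
  fi
  kubectl set image deployment/shipping-service -n shipping \
    "shipping-service=ecr.aws/flowmart/shipping-service:${version}" || return 1
  # Block until the new pods are rolled out (assumed 10-minute budget)
  if ! kubectl rollout status deployment/shipping-service -n shipping --timeout=10m; then
    echo "rollout of $version did not complete; rolling back" >&2
    kubectl rollout undo deployment/shipping-service -n shipping
    return 1
  fi
  echo "shipping-service now running $version"
}
```

For example: deploy_shipping "$VERSION"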
Database Maintenance
MongoDB Index Maintenance
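Periodically verify and optimize MongoDB indexes. The mongo shell uses the test database unless a database name is passed, so add the name of the database that holds the shipments collection before --eval in these commands (mongo <database> --eval ...):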
# Check current indexes
kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.shipments.getIndexes()"
# Add new index (example)
kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.shipments.createIndex({carrier: 1, status: 1, createdAt: -1})"
Database Backups
Verify scheduled MongoDB backups:
# Check recent backups
aws s3 ls s3://flowmart-mongodb-backups/shipping/ --human-readable
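# Trigger manual backup if needed (re-applying a Job that already exists does not rerun it; delete the previous Job with the same name first)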
kubectl apply -f shipping-db-backup-job.yaml
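To confirm that scheduled backups are actually recent, the newest entry in the aws s3 ls listing can be turned into an age in hours. The sketch below uses a hypothetical helper, backup_age_hours. It takes the current time as an epoch argument, which keeps it testable, and skips PRE prefix lines. It uses GNU date -d, so it assumes a Linux shell rather than macOS. It also assumes date -d interprets the listed timestamps in the same time zone that aws s3 ls used to print them.

```bash
#!/usr/bin/env bash
# backup_age_hours NOW_EPOCH < aws-s3-ls-output
# Hypothetical helper: prints the age in whole hours of the newest listed object.
backup_age_hours() {
  local now="$1" newest ts
  # Object lines start with "YYYY-MM-DD HH:MM:SS"; PRE lines start with spaces
  newest=$(grep -E '^[0-9]{4}-[0-9]{2}-[0-9]{2} ' | sort | tail -n 1)
  if [ -z "$newest" ]; then
    echo "no backups found" >&2
    return 1
  fi
  ts=$(date -d "$(printf '%s' "$newest" | cut -d' ' -f1-2)" +%s) || return 1
  echo $(( (now - ts) / 3600 ))
}
```

For example: aws s3 ls s3://flowmart-mongodb-backups/shipping/ | backup_age_hours "$(date +%s)". An age well beyond the backup schedule interval suggests the scheduled job is not running.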
Carrier Integration Updates
When a carrier updates their API:
Test the API changes in the staging environment:
kubectl exec -it $(kubectl get pods -l app=shipping-service-staging -n shipping-staging -o jsonpath='{.items[0].metadata.name}') -n shipping-staging -- java -jar /app/tools/test-carrier-integration.jar --carrier=fedex --mode=new
Update integration configuration if needed:
kubectl apply -f updated-fedex-integration-config.yaml
Validate the updated integration:
kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/validate-carrier-integration.jar --carrier=fedex
Contact Information
Primary On-Call: Logistics Team (rotating schedule)
Secondary On-Call: Platform Team
Escalation Path: Logistics Team Lead > Engineering Manager > CTO
Slack Channels:
- #shipping-support (primary support channel)
- #shipping-alerts (automated alerts)
- #incident-response (for major incidents)
External Contacts:
- FedEx API Support: apisupport@fedex.com, 1-800-555-1234
- UPS Developer Support: developer@ups.com, 1-800-555-5678
- USPS Web Tools Support: uspstechsupport@usps.gov, 1-800-555-9012