ShippingService Runbook
Operational runbook for troubleshooting and maintaining the ShippingService
This runbook provides operational procedures for the ShippingService, which is responsible for managing shipping options, carrier integration, and delivery tracking in the FlowMart e-commerce platform.
Architecture
The ShippingService is responsible for:
- Calculating shipping costs and delivery estimates
- Managing shipping carriers and integration
- Generating shipping labels
- Tracking shipments
- Handling delivery exceptions and returns
Service Dependencies
Monitoring and Alerting
Key Metrics
| Metric | Description | Warning Threshold | Critical Threshold |
|---|---|---|---|
| shipping_rate_calculation_rate | Rate calculations per minute | < 10 | < 2 |
| shipping_label_generation_success | Label generation success % | < 98% | < 95% |
| carrier_api_response_time | Carrier API response time | > 2s | > 5s |
| carrier_api_error_rate | Carrier API errors % | > 2% | > 5% |
| tracking_update_processing_rate | Tracking updates processed per minute | < 50 | < 10 |
| shipment_tracking_lag | Delay in tracking information | > 15m | > 1h |
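For ad hoc triage, the thresholds in the table can be expressed as a small check. This is a minimal sketch, assuming the metric value has already been fetched as a plain number; the function name `classify_metric` and its `above`/`below` direction argument are illustrative and not part of the service tooling.

```sh
# classify_metric VALUE WARN CRIT DIRECTION
# DIRECTION is "below" when low values are bad (success %, throughput),
# "above" when high values are bad (error rate, latency, lag).
# Hypothetical helper; prints ok, warning, or critical.
classify_metric() {
  awk -v v="$1" -v w="$2" -v c="$3" -v d="$4" 'BEGIN {
    # Force numeric comparison regardless of how the values were passed.
    v += 0; w += 0; c += 0
    if (d == "below") {
      if (v < c) print "critical"
      else if (v < w) print "warning"
      else print "ok"
    } else {
      if (v > c) print "critical"
      else if (v > w) print "warning"
      else print "ok"
    }
  }'
}
```

For example, `classify_metric 96 98 95 below` evaluates a label generation success rate of 96% against the 98% and 95% thresholds.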
Dashboards
Common Alerts
| Alert | Description | Troubleshooting Steps |
|---|---|---|
| ShippingServiceHighErrorRate | Shipping API error rate above threshold | See High Error Rate |
| ShippingCarrierAPIDown | Carrier API connection issues | See Carrier API Issues |
| ShippingServiceHighLatency | Shipping service latency issues | See High Latency |
| ShippingServiceDatabaseIssues | Database connection issues | See Database Issues |
Troubleshooting Guides
High Error Rate
If the service is experiencing a high error rate:
- Check application logs for error patterns:
    kubectl logs -l app=shipping-service -n shipping --tail=100
- Check specific error types:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/error-analyzer.jar --last-hour
- Check for patterns in failed shipments:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/failed-shipments-analyzer.jar
- Check for recent deployments that might have introduced issues:
    kubectl rollout history deployment/shipping-service -n shipping
- Verify if the issue is specific to a carrier (FedEx, UPS, etc.):
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/carrier-success-rates.jar
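Most commands in this runbook repeat the same pod lookup followed by `kubectl exec`. The pattern could be wrapped once, as in the sketch below. The function names `shipping_pod` and `shipping_tool` are hypothetical; the label selector, namespace, and `/app/tools` path are taken from the commands above.

```sh
# Hypothetical helpers wrapping the lookup-then-exec pattern.
shipping_pod() {
  # Name of the first pod matching the shipping-service label.
  kubectl get pods -l app=shipping-service -n shipping \
    -o jsonpath='{.items[0].metadata.name}'
}

shipping_tool() {
  # Usage: shipping_tool error-analyzer.jar --last-hour
  jar="$1"; shift
  pod="$(shipping_pod)"
  if [ -z "$pod" ]; then
    # Fail early instead of passing an empty pod name to kubectl exec.
    echo "no shipping-service pod found" >&2
    return 1
  fi
  kubectl exec -it "$pod" -n shipping -- java -jar "/app/tools/$jar" "$@"
}
```

With these defined, the carrier check becomes `shipping_tool carrier-success-rates.jar`. The explicit empty-pod check gives a clearer message than the error `kubectl exec` produces when the command substitution returns nothing.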
Carrier API Issues
If there are issues with carrier APIs:
- Check carrier API connectivity:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/carrier-health-check.jar
- Check carrier API credentials and rotation status:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/check-carrier-credentials.jar
- Check carrier status pages for announced outages.
- Check carrier timeouts in application logs:
    kubectl logs -l app=shipping-service -n shipping | grep "carrier timeout"
- Enable fallback shipping carrier:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/shipping/enable-fallback-carrier -H "Content-Type: application/json" -d '{"primaryCarrier": "fedex", "fallbackCarrier": "ups", "reason": "FedEx API outage"}'
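The fallback request body names the primary carrier, the fallback carrier, and a reason. When the affected carrier is not FedEx, building the body from arguments avoids hand-editing JSON during an incident. This is a sketch only: `fallback_payload` is a hypothetical helper, and the field names are copied from the example request above.

```sh
# fallback_payload PRIMARY FALLBACK REASON
# Prints the JSON body for the enable-fallback-carrier call.
# Assumes the arguments contain no double quotes or backslashes,
# because no JSON escaping is performed.
fallback_payload() {
  printf '{"primaryCarrier": "%s", "fallbackCarrier": "%s", "reason": "%s"}\n' \
    "$1" "$2" "$3"
}
```

The output can be passed to curl as `-d "$(fallback_payload ups fedex 'UPS API outage')"`.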
High Latency
If the service is experiencing high latency:
- Check system metrics:
    kubectl top pods -n shipping
- Check JVM memory and GC metrics:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/jvm-metrics.jar
- Check MongoDB performance:
    kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.currentOp()"
- Check carrier API response times:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/carrier-response-times.jar
- Scale the service if needed:
    kubectl scale deployment shipping-service -n shipping --replicas=5
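A fixed `--replicas=5` reduces capacity if the deployment is already running more than five replicas. The sketch below reads the current replica count first and only scales upward. `scale_if_below` is a hypothetical helper; if a HorizontalPodAutoscaler manages this deployment (not stated in this runbook), it may override a manual change.

```sh
# scale_if_below TARGET
# Scales shipping-service up to TARGET replicas only when the current
# spec has fewer, so an earlier, larger scale-up is not reduced.
scale_if_below() {
  target="$1"
  current="$(kubectl get deployment shipping-service -n shipping \
    -o jsonpath='{.spec.replicas}')"
  case "$current" in
    ''|*[!0-9]*)
      echo "could not read current replica count" >&2
      return 1
      ;;
  esac
  if [ "$current" -lt "$target" ]; then
    kubectl scale deployment shipping-service -n shipping --replicas="$target"
  else
    echo "already at $current replicas; not scaling"
  fi
}
```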
Database Issues
If there are MongoDB issues:
- Check MongoDB status:
    kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "rs.status()"
- Check for slow queries (the `$` in `$gt` is escaped so the shell does not expand it inside double quotes):
    kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.currentOp({ 'active': true, 'secs_running': { '\$gt': 5 } })"
- Check database connection pool:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/db-pool-stats.jar
- Restart database connections if needed:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/refresh-db-connections
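MongoDB operators such as `$gt` are easy to lose in shell commands: inside a double-quoted string the shell treats an unescaped `$gt` as a variable and expands it, usually to nothing, before `mongo` sees it. One way around this is to build the query from a single-quoted template, as in the sketch below. `slow_ops_query` is a hypothetical helper, and the filter mirrors the slow-query check above.

```sh
# slow_ops_query SECONDS
# Prints a currentOp() call for operations running longer than SECONDS.
# The single-quoted printf format keeps "$gt" away from shell expansion.
slow_ops_query() {
  printf 'db.currentOp({ "active": true, "secs_running": { "$gt": %s } })\n' "$1"
}
```

The result can then be passed as `mongo --eval "$(slow_ops_query 10)"`, since the output of a command substitution is not expanded a second time.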
Common Operational Tasks
Managing Carrier API Credentials
Rotating Carrier API Keys
- Generate new API keys in the carrier portal:
  - FedEx Developer Portal: https://developer.fedex.com
  - UPS Developer Portal: https://developer.ups.com
  - USPS Web Tools: https://www.usps.com/business/web-tools-apis
- Store the new keys in AWS Secrets Manager:
    aws secretsmanager update-secret --secret-id flowmart/shipping/fedex-api-key --secret-string '{"api_key": "NEW_KEY", "password": "NEW_PASSWORD", "account_number": "ACCOUNT_NUMBER"}'
- Trigger key rotation in the service:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/reload-carrier-credentials
- Verify the new keys are working by testing label generation:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/test-label-generation.jar --carrier=fedex
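A secret uploaded with a missing field is typically only noticed when label generation fails afterwards. A crude pre-upload check is sketched below. `secret_has_fields` is a hypothetical helper, and the three key names are assumed to be the full required set because they are the ones shown in the `update-secret` example.

```sh
# secret_has_fields JSON_STRING
# Presence check for the three keys used in the update-secret example.
# It does not validate JSON syntax or the values themselves.
secret_has_fields() {
  for field in api_key password account_number; do
    case "$1" in
      *"\"$field\""*) ;;
      *)
        echo "missing field: $field" >&2
        return 1
        ;;
    esac
  done
}
```

A typical use is `secret_has_fields "$SECRET_JSON" && aws secretsmanager update-secret ...`, where `SECRET_JSON` is a variable holding the new secret string.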
Managing Shipping Rates
Updating Shipping Rate Tables
When carrier rates change:
- Prepare the new rate table in the required JSON format.
- Upload the rate table to S3:
    aws s3 cp new-fedex-rates.json s3://flowmart-configs/shipping/rates/
- Trigger rate table reload:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/shipping/reload-rate-tables -H "Content-Type: application/json" -d '{"carrier": "fedex"}'
- Verify rate calculations with test scenarios:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/test-rate-calculation.jar
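Uploading an empty or missing file would replace a working rate table with nothing useful. The sketch below adds a guard in front of the upload step. `upload_rates` is a hypothetical helper and the bucket path is the one used above. It checks only that the file exists and is non-empty, not that the JSON is well formed.

```sh
# upload_rates FILE
# Refuses to upload a missing or empty rate table, then copies it to the
# bucket path used in this runbook.
upload_rates() {
  if [ ! -s "$1" ]; then
    echo "rate table '$1' is missing or empty" >&2
    return 1
  fi
  aws s3 cp "$1" s3://flowmart-configs/shipping/rates/
}
```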
Shipping Label Generation Troubleshooting
Debugging Failed Label Generation
If labels are failing to generate:
    # Find recent failed label generation attempts
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/find-failed-labels.jar --hours=2
    # Get detailed error for a specific shipment
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/label-error-details.jar --shipment-id=SHIP123456
Manual Label Generation
For special cases requiring manual intervention:
    curl -X POST https://api.internal.flowmart.com/shipping/shipments/{shipmentId}/generate-label \
      -H "Authorization: Bearer $ADMIN_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"carrier": "fedex", "service": "PRIORITY_OVERNIGHT", "forceGeneration": true}'
Tracking Updates
Triggering Manual Tracking Updates
To manually trigger tracking updates:
    # For a single shipment
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/sync-tracking.jar --shipment-id=SHIP123456
    # For bulk tracking updates
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/sync-tracking.jar --status=in_transit --hours=24
Tracking Webhook Troubleshooting
If tracking webhooks from carriers are failing:
    # Check recent webhook failures
    kubectl logs -l app=shipping-webhook-service -n shipping | grep "Webhook failure"
    # Replay failed webhooks
    kubectl exec -it $(kubectl get pods -l app=shipping-webhook-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/replay-webhooks.jar --hours=2
Recovery Procedures
Failed Shipment Recovery
If shipments are stuck or failed:
- Identify stuck shipments:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/find-stuck-shipments.jar
- Check shipment status with the carrier:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/check-carrier-shipment.jar --shipment-id=SHIP123456
- Resolve shipments that completed at the carrier but failed in our system:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/resolve-shipment.jar --shipment-id=SHIP123456 --tracking-number=1Z999AA10123456784 --status=label_created
Carrier API Failure Recovery
If a carrier API is unavailable:
- Enable automatic carrier fallback:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/enable-carrier-fallback
- Monitor carrier API status for recovery:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/monitor-carrier-health.jar --carrier=fedex
- Switch back to the primary carrier once it is restored:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/disable-carrier-fallback
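Waiting for a carrier to recover is a polling task, and a generic retry loop keeps it from being done by hand. The sketch below runs any command until it succeeds or the attempt budget runs out. `wait_until` is a hypothetical helper. Whether the carrier health tools exit non-zero while a carrier is unhealthy is an assumption that should be confirmed before using them as the probe.

```sh
# wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  while :; do
    # Success ends the loop immediately.
    "$@" && return 0
    # Give up without sleeping after the final attempt.
    [ "$attempt" -ge "$max" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `wait_until 30 60 carrier_is_healthy fedex` would probe once a minute for up to half an hour, where `carrier_is_healthy` stands in for whichever check is used. The fallback can be disabled once the call returns 0.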
Database Failure Recovery
If the MongoDB database becomes unavailable:
- Verify the status of the MongoDB cluster:
    kubectl get pods -l app=mongodb -n data
- Check if automatic failover has occurred:
    kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "rs.status()"
- Once database availability is restored, validate ShippingService functionality:
    curl -X GET https://api.internal.flowmart.com/shipping/health
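A plain `curl` prints the response body but still exits 0 on an HTTP error status, which makes it awkward to script. The sketch below checks the status code instead. `service_healthy` is a hypothetical helper, and it assumes the health endpoint answers 200 when the service is healthy, which this runbook does not state.

```sh
# service_healthy URL
# Succeeds only when the endpoint answers with HTTP 200 within 10 seconds.
service_healthy() {
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")"
  [ "$code" = "200" ]
}
```

Usage: `service_healthy https://api.internal.flowmart.com/shipping/health && echo healthy`.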
Disaster Recovery
Complete Service Failure
In case of a complete service failure:
- Initiate incident response by notifying the on-call team through PagerDuty.
- Deploy to the disaster recovery environment if necessary:
    ./scripts/dr-failover.sh shipping-service
- Update DNS records to point to the DR environment:
    aws route53 change-resource-record-sets --hosted-zone-id $HOSTED_ZONE_ID --change-batch file://dr-dns-change.json
- Enable simplified shipping flow (if necessary):
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- curl -X POST localhost:8080/internal/api/system/enable-simplified-flow
- Regularly check primary environment recovery status.
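The DNS step depends on a `dr-dns-change.json` file whose contents are not shown here. The sketch below generates a Route 53 change batch in the standard `Changes`/`UPSERT` shape. `dr_change_batch` is a hypothetical helper, and the CNAME record type and 60-second TTL are assumptions, so the real file may differ.

```sh
# dr_change_batch RECORD_NAME DR_TARGET
# Prints an UPSERT change batch that points a CNAME at the DR endpoint.
# Record type and TTL are assumptions; adjust them to match the real zone.
dr_change_batch() {
  cat <<EOF
{
  "Comment": "Fail over shipping-service to DR",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$1",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "$2" }]
      }
    }
  ]
}
EOF
}
```

Redirecting the output, for example `dr_change_batch shipping.example.internal dr.example.internal > dr-dns-change.json`, produces a file in the format the `--change-batch file://` argument expects. Both hostnames are placeholders.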
Maintenance Tasks
Deploying New Versions
    kubectl set image deployment/shipping-service -n shipping shipping-service=ecr.aws/flowmart/shipping-service:$VERSION
Database Maintenance
MongoDB Index Maintenance
Periodically verify and optimize MongoDB indexes:
    # Check current indexes
    kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.shipments.getIndexes()"
    # Add new index (example)
    kubectl exec -it $(kubectl get pods -l app=mongodb -n data -o jsonpath='{.items[0].metadata.name}') -n data -- mongo --eval "db.shipments.createIndex({carrier: 1, status: 1, createdAt: -1})"
Database Backups
Verify scheduled MongoDB backups:
    # Check recent backups
    aws s3 ls s3://flowmart-mongodb-backups/shipping/ --human-readable
    # Trigger manual backup if needed
    kubectl apply -f shipping-db-backup-job.yaml
Carrier Integration Updates
When a carrier updates their API:
- Test the API changes in the staging environment:
    kubectl exec -it $(kubectl get pods -l app=shipping-service-staging -n shipping-staging -o jsonpath='{.items[0].metadata.name}') -n shipping-staging -- java -jar /app/tools/test-carrier-integration.jar --carrier=fedex --mode=new
- Update integration configuration if needed:
    kubectl apply -f updated-fedex-integration-config.yaml
- Validate the updated integration:
    kubectl exec -it $(kubectl get pods -l app=shipping-service -n shipping -o jsonpath='{.items[0].metadata.name}') -n shipping -- java -jar /app/tools/validate-carrier-integration.jar --carrier=fedex
Contact Information
Primary On-Call: Logistics Team (rotating schedule)
Secondary On-Call: Platform Team
Escalation Path: Logistics Team Lead > Engineering Manager > CTO
Slack Channels:
- #shipping-support (primary support channel)
- #shipping-alerts (automated alerts)
- #incident-response (for major incidents)
External Contacts:
- FedEx API Support: apisupport@fedex.com, 1-800-555-1234
- UPS Developer Support: developer@ups.com, 1-800-555-5678
- USPS Web Tools Support: uspstechsupport@usps.gov, 1-800-555-9012