Illustrative Sample
Grounded Work · Ground Truth
Technical Deep Dive
Northwind Software, Inc.
Engagement: GT-2026-051
Delivered: May 2026
Codebase: ~280K lines · 14 services · 4 repos
Confidential — Engineering Leadership Only

System Health Scorecard

Overall Health
5.6
Moderate, with Critical Items
Architecture
6.8
Moderate
Security
5.4
Elevated
Coupling
High
Tightly Coupled
Test Coverage
38%
Below Threshold
01

Architecture Map (Generated from Code)

Generated by static analysis of the deployed codebase as of engagement start. Service boundaries, call graphs, and external dependencies are derived from imports, route definitions, and infrastructure config, not from documentation.

Client Layer
Web App (Next.js 13, App Router)
iOS App (Swift, native)
Admin Console (React)
Edge / Gateway
API Gateway + custom auth Lambda
WebSocket Gateway behind ALB
Core Services
catalog-svc
checkout-svc (in-degree 14 / out-degree 9)
search-svc
user-svc
Service Tier
inventory-svc
notification-svc
recs-svc
analytics-svc
Utility Tier
admin-svc
report-svc
import-svc
webhook-svc · cron-svc · audit-svc
Data Layer
PostgreSQL (over-provisioned)
Elasticsearch (9-node)
Redis (single node, no HA)
S3 (14 buckets, no lifecycle)
External Dependencies
Stripe (sync, no retry)
SendGrid (sync, blocks checkout)
Twilio (async, healthy)
AWS S3 · SES · CloudFront
Two external integrations, Stripe and SendGrid, are called synchronously with no retry or circuit breaker, and Redis runs as a single node with no high availability. Checkout has an incoming dependency degree of 14, making it both the largest single point of failure and the highest-cost refactor target.
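Checkout's incoming degree of 14 and outgoing degree of 9 are simple counts over the generated call graph. A minimal sketch of how those counts fall out of an edge list, assuming the analysis emits (caller, callee) pairs and that repeated call sites between the same two nodes count once. The edges below are hypothetical and far smaller than the real graph.

```python
from collections import Counter

# Hypothetical (caller, callee) pairs, as a static-analysis pass might emit
# them from route definitions and HTTP client call sites.
EDGES = [
    ("web-app", "checkout-svc"),
    ("ios-app", "checkout-svc"),
    ("admin-svc", "checkout-svc"),
    ("checkout-svc", "inventory-svc"),
    ("checkout-svc", "notification-svc"),
    ("checkout-svc", "user-svc"),
    ("checkout-svc", "user-svc"),  # second call site, same dependency
]

def degrees(edges):
    """Return (in_degree, out_degree) counters keyed by node name."""
    in_deg, out_deg = Counter(), Counter()
    for caller, callee in set(edges):  # dedupe repeated call sites
        out_deg[caller] += 1
        in_deg[callee] += 1
    return in_deg, out_deg

def ranked_by_fan_in(edges):
    """Nodes ordered by incoming degree, then outgoing degree, then name."""
    in_deg, out_deg = degrees(edges)
    nodes = set(in_deg) | set(out_deg)
    return sorted(nodes, key=lambda n: (-in_deg[n], -out_deg[n], n))
```

Ranking by fan-in first surfaces the nodes whose contract changes ripple furthest, which is why checkout tops the refactor list.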
02

Critical Findings

Critical · Security · Compliance
22 live credentials committed to source code
  • Stripe live secret key in services/checkout-svc/src/config/stripe.config.ts:14
  • AWS IAM access key with s3:* permissions in infrastructure/scripts/migrate-data.sh:8
  • Production PostgreSQL password in services/user-svc/.env.production committed in a3f2c9b and never rotated
  • 19 additional API keys, OAuth secrets, and service account tokens across webhook-svc, admin-svc, and import-svc
  • Git history retains all credentials; rotation alone is insufficient.
Engineering Implication: Immediate rotation required. Stand up secrets management before further deployments and plan a four-hour repo-history cleanup window.
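A minimal sketch of the kind of pattern scan that surfaces findings like these, assuming only two well-known key shapes: Stripe live secret keys begin with `sk_live_`, and long-term AWS access key IDs begin with `AKIA` followed by 16 uppercase letters or digits. It covers the working tree only. Because history retains the credentials, a real sweep would also run over every commit, and purpose-built scanners such as gitleaks or trufflehog ship far more rules than this.

```python
import re
from pathlib import Path

# Two well-known credential shapes; dedicated scanners cover many more.
PATTERNS = {
    "stripe_live_secret": re.compile(r"\bsk_live_[0-9A-Za-z]{16,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan_text(path, text):
    """Yield (path, line_no, rule) for every pattern match in text."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in PATTERNS.items():
            if pattern.search(line):
                yield (path, line_no, rule)

def scan_tree(root):
    """Scan every regular file under root, skipping the .git directory."""
    findings = []
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            text = path.read_text(errors="ignore")
            findings.extend(scan_text(str(path), text))
    return findings
```

Run as a pre-commit or CI gate, a check of this shape stops new credentials from landing while rotation and history cleanup proceed.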
Critical · Architecture · Coupling
Checkout service: incoming degree 14, outgoing degree 9
  • Fourteen inbound callers depend directly on checkout's current contract.
  • Nine downstream calls happen synchronously inside the checkout request path with no retry logic, no circuit breaker, and no async queueing.
  • Documented incident: the April 18 partial outage was traced to a SendGrid latency spike; checkout was degraded for 47 minutes.
  • Refactor sequence: extract notifications to async → extract inventory decrement to event-driven → introduce circuit breakers around Stripe → reduce incoming dependencies via versioned API contract.
Engineering Implication: Six to eight weeks of focused work removes the highest cascade risk in the system. The work can be split across two engineers in parallel.
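Step three of the refactor sequence introduces circuit breakers around Stripe. A minimal sketch of a consecutive-failure breaker, assuming a single-threaded caller and an injectable clock for testing; the thresholds are illustrative, and production code would more likely use an established resilience library than this hand-rolled version.

```python
import time

class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""

class CircuitBreaker:
    """Consecutive-failure breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable; failing fast")
            # Timeout elapsed: half-open, so this one trial call goes through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the breaker
        return result
```

Usage would look like `breaker.call(charge_card, order)`, where `charge_card` is a hypothetical wrapper around the Stripe client. While the breaker is open, checkout fails fast instead of holding request threads on a degraded dependency, which is the cascade the April 18 incident exhibited. This sketch lets more than one trial call through if callers overlap in the half-open window; a concurrent service would need a lock around the state.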
High · Tech Debt · AI Readiness
38% of api/ is unreachable from any entry point
  • 47,200 lines in service api directories have no inbound route or call site.
  • 12 route handlers deprecated in Q3 2025 are still wired but unused.
  • 31 stale feature flags remain; 19 are permanently on or off.
  • Dead code pollutes both human onboarding context and AI agent context windows. In sampled AI completions, 28% suggested patterns drawn from dead code.
Engineering Implication: Start with deprecated routes, proceed to unreachable handlers, then clean feature flags. Bring coverage above 50% on the remaining code before agentic refactoring is enabled.
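"Unreachable from any entry point" is a graph-reachability question. A minimal sketch, assuming the call graph is available as a mapping from each handler or function to its callees and that routes are the entry points; node names are hypothetical. Dynamic dispatch and string-based route registration produce false positives, so the output is a candidate list for review, not a delete list.

```python
from collections import deque

def unreachable(call_graph, entry_points):
    """Return nodes in call_graph not reachable from any entry point (BFS)."""
    seen = set()
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(call_graph.get(node, ()))
    all_nodes = set(call_graph)
    for callees in call_graph.values():
        all_nodes.update(callees)
    return sorted(all_nodes - seen)

# Hypothetical slice of an api/ directory.
GRAPH = {
    "route:POST /checkout": ["createOrder"],
    "createOrder": ["reserveStock", "sendReceipt"],
    "legacyCheckoutV1": ["reserveStock", "applyOldCoupon"],  # no route points here
    "applyOldCoupon": [],
}
ENTRY = ["route:POST /checkout"]
```

Here `unreachable(GRAPH, ENTRY)` returns `["applyOldCoupon", "legacyCheckoutV1"]`; `reserveStock` survives because a live path still reaches it, even though one of its callers is dead.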
High · Cloud Cost · Operational
$242K/yr cloud waste with no functional change

Detailed in §03 below. Compute right-sizing, RDS downsizing, S3 lifecycle policies, and Elasticsearch reduction together create a recoverable savings pool that covers the engagement and the first phase of the refactor work above.

Engineering Implication: Sequence cost work in parallel with security and architecture remediation. Different teams, no contention.
Medium · Reliability · Single Points of Failure
Redis deployed without high availability
  • Single Redis node, no replication, no automated failover.
  • Used as session store, cache, and rate limiter on one instance.
  • Failure mode: loss of logged-in sessions, rate-limiting collapse, and cache stampede across dependent services.
Engineering Implication: Migrate to ElastiCache with Multi-AZ replication, and move sessions, cache, and rate limiting onto separate instances. The two-week effort removes an incident that is a question of when, not if.
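Separating concerns and enabling failover can be expressed as a layout check that deployment config must pass. A minimal sketch, assuming each concern is described by a hypothetical record with a primary endpoint, a replica count, and a Multi-AZ flag; the endpoints are placeholders, not real hostnames.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RedisTarget:
    concern: str    # what the instance serves: sessions, cache, or rate_limit
    endpoint: str   # hypothetical primary endpoint
    replicas: int   # replicas in other Availability Zones
    multi_az: bool  # automatic failover enabled

def check_layout(targets):
    """Return a list of problems; empty means every concern is isolated and HA."""
    problems = []
    owners = {}
    for t in targets:
        owner = owners.setdefault(t.endpoint, t.concern)
        if owner != t.concern:
            problems.append(f"{t.concern} shares {t.endpoint} with {owner}")
        if t.replicas < 1 or not t.multi_az:
            problems.append(f"{t.concern} has no automatic failover")
    return problems

# Today's layout as described in the finding: one node serving all three concerns.
CURRENT = [
    RedisTarget("sessions", "redis-main:6379", 0, False),
    RedisTarget("cache", "redis-main:6379", 0, False),
    RedisTarget("rate_limit", "redis-main:6379", 0, False),
]
```

`check_layout(CURRENT)` reports five problems: two shared endpoints and three concerns without failover. Keeping the concerns apart also means a cache eviction storm can no longer evict sessions or reset rate-limit counters.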
Medium · Compliance · Pre-Audit
SOC 2 Type II readiness gaps
  • Structured logs absent in 6 of 14 services; access-log retention below 90 days.
  • Service account permissions broader than required in 9 cases.
  • 23% of production deploys in the last 90 days lacked a PR review record.
  • Data in transit between internal services is unencrypted in 4 places.
Engineering Implication: None are blockers individually. Together they extend SOC 2 Type II prep by four to six weeks if attempted today.
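Structured logging means each line is a machine-parseable record with consistent fields. A minimal sketch using Python's standard `logging` module as a stand-in (the services themselves may use a different runtime and logger); the field names `ts`, `service`, and the `fields` extra key are assumptions, not an existing convention in this codebase.

```python
import json
import logging
import time

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line, timestamps in UTC."""

    converter = time.gmtime  # formatTime uses UTC instead of local time

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": self.service,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as logger.info(..., extra={"fields": {...}}) are merged in.
        payload.update(getattr(record, "fields", {}))
        if record.exc_info:
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload)

def make_logger(service, stream=None):
    """Logger that writes JSON lines to stream (stderr by default)."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `make_logger("checkout-svc").info("order %s placed", "o-1", extra={"fields": {"order_id": "o-1"}})` emits a single JSON line that a log pipeline can index and retain under one policy across all services.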
03

Cloud Cost Analysis

Current annual cloud spend: $560K. Identified recoverable spend: $242K (43%) with no functional changes.

Service | Current (annual) | Optimized (annual) | Savings (annual) | Assessment
EC2 Compute (8 instances) | $228,000 | $116,000 | $112,000 | 3 instances at <4% CPU. Right-size + Reserved Instances.
RDS (db.r6g.4xlarge) | $164,000 | $82,000 | $82,000 | Provisioned for ~3x actual workload. Downsize + read replica strategy.
S3 Storage (14 buckets) | $76,000 | $44,000 | $32,000 | No lifecycle policies. Move year-old raw data to Glacier.
Elasticsearch (9-node) | $52,000 | $40,000 | $12,000 | Sized for projected document volume that did not materialize.
CloudWatch / Other | $40,000 | $36,000 | $4,000 | Minor log retention and metric cleanup.
Total | $560,000/yr | $318,000/yr | $242,000/yr | 43% cloud cost reduction opportunity
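Re-totalling the rows is a cheap guard against summation errors in a cost table. A minimal check using only the Current and Optimized figures from the rows above; the row labels are shortened for brevity.

```python
# Row values from the cost table, in US dollars per year: (current, optimized).
ROWS = {
    "EC2 Compute": (228_000, 116_000),
    "RDS": (164_000, 82_000),
    "S3 Storage": (76_000, 44_000),
    "Elasticsearch": (52_000, 40_000),
    "CloudWatch / Other": (40_000, 36_000),
}

def summarize(rows):
    """Return (current, optimized, savings, savings_pct) across all rows."""
    current = sum(c for c, _ in rows.values())
    optimized = sum(o for _, o in rows.values())
    savings = current - optimized
    return current, optimized, savings, round(100 * savings / current)

current, optimized, savings, pct = summarize(ROWS)
print(f"${current:,} -> ${optimized:,}: save ${savings:,}/yr ({pct}%)")
# prints: $560,000 -> $318,000: save $242,000/yr (43%)
```

The per-row Savings column (112 + 82 + 32 + 12 + 4, in $K) sums to the same $242K, so the Current, Optimized, and Savings columns agree with one another.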
04

Refactor Backlog & Sequencing

Sequenced so that no two streams contend for the same files. Three engineers can run streams A, B, and C in parallel from week one; stream D starts once item 1 is complete, and stream E has no dependencies on the other streams.

# | Item | Stream | Effort | Depends On | Engineering Outcome
1 | Rotate 22 committed credentials, stand up Secrets Manager, rewrite git history | A (Security) | 1 week | None | Eliminates breach liability and unblocks compliance work.
2 | Right-size EC2 + apply Reserved Instances | B (Cost) | 2 days | None | $112K/yr recovered.
3 | RDS downsize + S3 lifecycle policies | B (Cost) | 1 week | (2) | $114K/yr recovered.
4 | Extract notifications and inventory from checkout to async | C (Architecture) | 3 weeks | None | Reduces checkout coupling and enables retry semantics.
5 | Add circuit breakers around Stripe, SendGrid, Redis | C (Architecture) | 1 week | (4) | Eliminates cascade failures from external dependencies.
6 | Redis HA migration (ElastiCache Multi-AZ) | C (Architecture) | 2 weeks | None | Removes the highest-probability future incident.
7 | Dead code deletion campaign (deprecated routes → unreachable handlers → feature flags) | D (Tech Debt) | 6 weeks | (1) | Recovers velocity. Cleans context for AI-assisted development.
8 | Test coverage floor for new code (60%); retroactive coverage on hot paths | D (Tech Debt) | Ongoing | (7) | Required before agentic refactoring is enabled.
9 | SOC 2 readiness gaps (logging, access review, change management) | E (Compliance) | 4 weeks | None | Pre-audit posture. Parallelizable with all other streams.
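The Depends On column defines a small dependency graph, and earliest start dates follow from a longest-path pass over it. A minimal sketch, assuming 1 week = 5 working days, omitting item 8 because it is ongoing, and ignoring staffing limits (only the listed dependencies constrain start dates). It does no cycle detection.

```python
from functools import lru_cache

# Effort in working days (assumption: 1 week = 5 days). Item 8 is ongoing and omitted.
EFFORT = {1: 5, 2: 2, 3: 5, 4: 15, 5: 5, 6: 10, 7: 30, 9: 20}
DEPENDS_ON = {3: [2], 5: [4], 7: [1]}

def earliest_schedule(effort, depends_on):
    """Earliest (start_day, finish_day) per item, honoring dependencies only."""

    @lru_cache(maxsize=None)
    def finish(item):
        return start(item) + effort[item]

    @lru_cache(maxsize=None)
    def start(item):
        # An item can start once every item it depends on has finished.
        return max((finish(dep) for dep in depends_on.get(item, [])), default=0)

    return {item: (start(item), finish(item)) for item in sorted(effort)}

schedule = earliest_schedule(EFFORT, DEPENDS_ON)
```

The longest chain is item 1 followed by item 7, finishing on day 35, after which item 8 continues. Even if stream C's three items run back to back (15 + 5 + 10 = 30 days), that chain remains the critical path.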
This is an illustrative sample. All company names, figures, findings, and data in this document are fictional and generated for demonstration purposes only.
Ground Truth by Grounded Work · Confidential · hello@grounded-work.com