Designing APIs That Don't Break: A Practical Guide
How reactive API design leads to chaos—and the patterns we used at VMware and hipages to build consistent, maintainable interfaces across hundreds of microservices.
It was 2019, and our SD-WAN orchestrator at VMware had just crossed the 50-microservice mark. Each team owned their domain—device management, network configuration, monitoring, billing—and they all built APIs the way they thought best. One team used /getDevicesByCustomerId, another used /customers/{id}/devices, and a third just went with /devices?customer_id={id}.
Fast forward to hipages in 2022, and I saw the same pattern repeating: job management endpoints mixing camelCase and snake_case, tradie profile APIs returning 47 fields when the frontend only needed 5, and mobile apps making 8 separate requests just to render a dashboard.
Both times, we had to stop and ask: How did we get here?
The Reactive API Trap
The Death by a Thousand Endpoints
When requirements come in as "We need to show X on the dashboard," the natural response is to create exactly what's needed, right now, in isolation. Here's how that plays out:
// Team A: Device management service
@Get('/getDevicesByCustomerId/:customerId')
async getDevicesByCustomerId(@Param() params: { customerId: string }) {
  return this.deviceService.findByCustomer(params.customerId);
}

// Team B: Network service
@Get('/customers/:customerId/network-devices')
async fetchNetworkDevices(@Param() params: { customerId: string }) {
  return this.networkService.getDevices(params.customerId);
}

// Team C: Monitoring service
@Get('/api/v1/monitoring/customer-devices')
async customerDevices(@Query() query: { customer_id: string }) {
  return this.monitoringService.listDevices(query.customer_id);
}
Three teams, three different conventions, same basic concept. When the frontend needed to show a customer's devices across all services, they had to remember which service used which pattern. Debugging became a nightmare—"Is it customerId or customer_id? Do I POST to /createDevice or /devices?"
The Over-Fetching Problem
Worse than inconsistent conventions was the data bloat. When you build APIs reactively, you tend to return everything "just in case":
// What we built - reactive approach
@Get('/jobs/:jobId')
async getJob(@Param('jobId') jobId: string) {
  return this.jobRepository.findOne({
    where: { id: jobId },
    relations: [
      'customer',
      'customer.address',
      'tradies',
      'tradies.profile',
      'tradies.profile.skills',
      'quotes',
      'quotes.tradie',
      'messages',
      'reviews',
      'auditLog',
      'metadata'
    ]
  });
}
The mobile app just needed the job title and status. Instead, it got 3MB of JSON including every tradie's skill history and audit logs from 2021. Response times suffered, mobile data plans suffered, and our CDN bills suffered.
The Gateway Realization
At VMware, we hit a breaking point when the enterprise sales team wanted a public API. Our internal services were a maze:
- Authentication happened in the user service
- Device data lived in the device service
- Network configs were in the network service
- Billing was split across three services
A customer wanting to "list my devices with their network status and billing details" would need to call 5 different services with different auth mechanisms, different rate limits, and different response formats.
We needed an API gateway—but we also needed consistent API design across all services first.
The Reset: Building API Standards
Step 1: URL Structure and Naming Conventions
We established a simple, consistent pattern:
/api/v{version}/{resource}/{id}/{sub-resource}
Rules:
- Plural nouns only: /devices, not /device or /getDevices
- Lowercase with hyphens: /network-policies, not /networkPolicies or /network_policies
- No verbs in URLs: use HTTP methods (GET /devices, not /getDevices)
- Consistent query parameters: ?status=active&limit=20, always snake_case for query params
// Before (chaos)
@Get('/getDevicesByCustomerId/:customerId')
@Get('/customers/:customerId/network-devices')
@Get('/api/v1/monitoring/customer-devices')
// After (consistent)
@Get('/api/v1/customers/:customerId/devices')
@Get('/api/v1/customers/:customerId/network-policies')
@Get('/api/v1/customers/:customerId/monitored-devices')
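Rules like these tend to drift unless something checks them. As a rough sketch (not something the original teams are known to have used), a small lint function could flag route paths that break the conventions. The name `lintRoutePath` and the verb list are assumptions for illustration, and the plural-noun rule is left out because it is hard to check mechanically.

```typescript
// Hypothetical lint helper; the verb list is an illustrative assumption.
const VERB_PREFIX = /^(get|create|update|delete|fetch|list)([A-Z_-]|$)/;
const LOWER_HYPHEN = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function lintRoutePath(path: string): string[] {
  const problems: string[] = [];
  if (!/^\/api\/v\d+\//.test(path)) {
    problems.push('missing /api/v{version}/ prefix');
  }
  // Drop the version prefix, empty segments, and path parameters such as ":customerId".
  const segments = path
    .replace(/^\/api\/v\d+/, '')
    .split('/')
    .filter(s => s !== '' && s.charAt(0) !== ':');
  for (const segment of segments) {
    if (!LOWER_HYPHEN.test(segment)) {
      problems.push(`"${segment}" is not lowercase-hyphenated`);
    }
    if (VERB_PREFIX.test(segment)) {
      problems.push(`"${segment}" starts with a verb`);
    }
  }
  return problems;
}
```

Run over every registered route in CI, a check like this turns the naming rules into a failing build rather than a review comment.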
Step 2: Response Structure
Every response followed a predictable envelope:
interface ApiResponse<T> {
  data: T;
  meta?: {
    total?: number;
    page?: number;
    limit?: number;
  };
  links?: {
    self: string;
    next?: string;
    prev?: string;
  };
}

interface ApiError {
  error: {
    code: string;
    message: string;
    details?: Record<string, string[]>;
  };
}
This meant clients could write one response handler that worked across all endpoints:
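// Consistent client-side handling
// ApiError is an interface, so it cannot be constructed with `new`;
// wrap the error payload in a real Error subclass instead.
class ApiRequestError extends Error {
  constructor(readonly body: ApiError['error']) {
    super(body.message);
  }
}

async function apiRequest<T>(url: string): Promise<ApiResponse<T>> {
  const response = await fetch(url);
  const json = await response.json();
  if (!response.ok) {
    throw new ApiRequestError(json.error);
  }
  return json; // Always has .data, optionally .meta and .links
}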
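On the server side, the same envelope can come from one shared helper so that `meta` and `links` are always filled in the same way. The sketch below is an assumption about how that might look: `buildPage`, the 1-based `page` numbering, and the `page`/`limit` query parameter names in the links are illustrative, not taken from the original services.

```typescript
interface PageMeta { total: number; page: number; limit: number; }
interface PageLinks { self: string; next?: string; prev?: string; }
interface PagedResponse<T> { data: T[]; meta: PageMeta; links: PageLinks; }

// Hypothetical helper: wraps one page of items in the { data, meta, links } envelope.
// Assumes pages are numbered from 1.
function buildPage<T>(
  items: T[],
  total: number,
  page: number,
  limit: number,
  basePath: string
): PagedResponse<T> {
  const lastPage = Math.max(1, Math.ceil(total / limit));
  const link = (p: number) => `${basePath}?page=${p}&limit=${limit}`;
  const links: PageLinks = { self: link(page) };
  if (page < lastPage) links.next = link(page + 1);
  if (page > 1) links.prev = link(page - 1);
  return { data: items, meta: { total, page, limit }, links };
}
```

Omitting `next` on the last page and `prev` on the first lets clients stop paginating by checking for a missing link instead of doing their own arithmetic.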
Step 3: Solving Over-Fetching with Field Selection
Instead of returning everything, we implemented sparse fieldsets:
@Get('/api/v1/jobs/:jobId')
async getJob(
  @Param('jobId') jobId: string,
  @Query('fields') fields?: string // e.g., "title,status,customer.name"
) {
  const job = await this.jobService.findById(jobId);
  if (fields) {
    return this.selectFields(job, fields.split(','));
  }
  return job; // Returns default minimal fields
}
Usage:
# Mobile app - minimal data
GET /api/v1/jobs/123?fields=title,status,budget
# Dashboard - more context
GET /api/v1/jobs/123?fields=title,status,customer.name,customer.phone,quotes.count
# Admin tool - everything
GET /api/v1/jobs/123?fields=title,status,customer,tradies,quotes,messages,auditLog
Result: Response sizes dropped by 60-80%, and page load times improved dramatically.
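The controller above calls a `selectFields` helper that the article does not show. A minimal sketch of one way to write it follows, assuming plain JSON-like objects and dot-separated paths. It deliberately does not handle arrays or computed fields such as `quotes.count`, which would need their own resolution logic.

```typescript
type Json = Record<string, unknown>;

const isObject = (v: unknown): v is Json =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

// Sketch of a sparse-fieldset picker; unknown paths are skipped silently.
function selectFields(source: Json, fields: string[]): Json {
  const result: Json = {};
  for (const field of fields) {
    const path = field.trim().split('.');
    // Walk the source first so unknown paths leave no empty stubs behind.
    let value: unknown = source;
    let found = true;
    for (const key of path) {
      if (!isObject(value) || !(key in value)) { found = false; break; }
      value = value[key];
    }
    if (!found) continue;
    // Recreate the parent objects in the result, then copy the leaf value.
    let target = result;
    for (const key of path.slice(0, -1)) {
      const existing = target[key];
      const next: Json = isObject(existing) ? { ...existing } : {};
      target[key] = next;
      target = next;
    }
    target[path[path.length - 1]] = value;
  }
  return result;
}
```

Filtering after the query only trims the payload. To cut database work as well, the requested fields would have to drive which relations are loaded in the first place.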
The OpenAPI Contract
Standards without enforcement are just suggestions. We made OpenAPI 3.0 specifications the source of truth:
openapi: 3.0.0
info:
  title: Hipages Job API
  version: 1.0.0
paths:
  /api/v1/jobs/{jobId}:
    get:
      summary: Get job by ID
      parameters:
        - name: jobId
          in: path
          required: true
          schema:
            type: string
            format: uuid
        - name: fields
          in: query
          schema:
            type: string
          description: Comma-separated list of fields to return
      responses:
        '200':
          description: Job found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobResponse'
        '404':
          description: Job not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    JobResponse:
      type: object
      required:
        - data
      properties:
        data:
          $ref: '#/components/schemas/Job'
        meta:
          $ref: '#/components/schemas/Meta'
    Job:
      type: object
      required:
        - id
        - title
        - status
      properties:
        id:
          type: string
          format: uuid
        title:
          type: string
          maxLength: 200
        status:
          type: string
          enum: [draft, posted, assigned, completed, cancelled]
        customer:
          $ref: '#/components/schemas/CustomerReference'
        tradies:
          type: array
          items:
            $ref: '#/components/schemas/TradieReference'
    CustomerReference:
      type: object
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
        phone:
          type: string
    ErrorResponse:
      type: object
      required:
        - error
      properties:
        error:
          type: object
          properties:
            code:
              type: string
            message:
              type: string
Generating TypeScript from OpenAPI
We used openapi-typescript to generate TypeScript types from our specs:
npx openapi-typescript ./api-spec.yaml --output ./types/api.ts
This ensured our frontend and backend stayed in sync:
import { paths } from './types/api';

// Type-safe API client
type GetJobResponse =
  paths['/api/v1/jobs/{jobId}']['get']['responses']['200']['content']['application/json'];

async function getJob(jobId: string): Promise<GetJobResponse> {
  const response = await fetch(`/api/v1/jobs/${jobId}`);
  // Returns fully typed data - no more `any`!
  return response.json();
}
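One caveat is that generated types exist only at compile time, so a response that drifts from the spec still passes through `response.json()` unchecked. A hand-written runtime guard is one hedge against that. The sketch below mirrors only the required `Job` fields from the spec above (`id`, `title`, `status`); the names `JobEnvelope` and `isJobEnvelope` are illustrative, and the UUID format is not validated.

```typescript
const JOB_STATUSES = ['draft', 'posted', 'assigned', 'completed', 'cancelled'];

interface JobEnvelope {
  data: { id: string; title: string; status: string };
}

// Hypothetical runtime check for the required parts of a JobResponse.
function isJobEnvelope(value: unknown): value is JobEnvelope {
  if (typeof value !== 'object' || value === null) return false;
  const data = (value as { data?: unknown }).data;
  if (typeof data !== 'object' || data === null) return false;
  const { id, title, status } = data as Record<string, unknown>;
  return (
    typeof id === 'string' &&
    typeof title === 'string' &&
    title.length <= 200 && // maxLength from the spec
    typeof status === 'string' &&
    JOB_STATUSES.indexOf(status) !== -1
  );
}
```

In a larger codebase a schema validator generated from the same OpenAPI file would likely replace a guard like this, so that the runtime check cannot fall out of step with the types.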
Versioning Strategy That Actually Works
URL Versioning with Deprecation Headers
We chose URL versioning (/api/v1/, /api/v2/) because it's explicit and cache-friendly:
// Current version
@Controller('/api/v1/jobs')
export class JobControllerV1 {
  @Get(':jobId')
  async getJob(@Param('jobId') jobId: string) { /* ... */ }
}

// New version with breaking changes
@Controller('/api/v2/jobs')
export class JobControllerV2 {
  @Get(':jobId')
  async getJob(@Param('jobId') jobId: string) {
    // Returns different structure
  }
}
For gradual deprecations, we used Sunset headers:
@Get('/api/v1/jobs/:jobId')
async getJob(
  @Param('jobId') jobId: string,
  // passthrough keeps Nest's normal return-value handling while exposing the response object
  @Res({ passthrough: true }) response: Response
) {
  response.set('Sunset', 'Mon, 01 Jun 2026 00:00:00 GMT');
  response.set('Deprecation', 'true');
  // ... return data
}
Clients received:
Sunset: Mon, 01 Jun 2026 00:00:00 GMT
Deprecation: true
This gave them 6 months to migrate, with clear deadlines.
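The headers only help if clients look at them. As a sketch of the consuming side, a client wrapper could turn the `Sunset` value into a countdown and log a warning. `daysUntilSunset` is a hypothetical helper, and the current time is passed in to keep it testable.

```typescript
// Returns whole days left until the Sunset date (negative once it has passed),
// or null when the header is absent or cannot be parsed.
function daysUntilSunset(sunsetHeader: string | null, now: Date): number | null {
  if (!sunsetHeader) return null;
  const sunsetMs = Date.parse(sunsetHeader);
  if (isNaN(sunsetMs)) return null;
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((sunsetMs - now.getTime()) / msPerDay);
}
```

A client might call this with `response.headers.get('Sunset')` after each request and emit a log line or metric when the result drops below some threshold.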
The API Gateway Layer
With consistent internal APIs, building a gateway became straightforward:
@Controller('/api/v1')
export class GatewayController {
  constructor(
    private readonly deviceClient: DeviceServiceClient,
    private readonly networkClient: NetworkServiceClient,
    private readonly billingClient: BillingServiceClient
  ) {}

  @Get('customers/:customerId/device-dashboard')
  async getDeviceDashboard(@Param('customerId') customerId: string) {
    // Parallel requests to internal services
    const [devices, networkStats, billing] = await Promise.all([
      this.deviceClient.getDevices(customerId),
      this.networkClient.getStats(customerId),
      this.billingClient.getUsage(customerId)
    ]);

    // Compose into a unified response
    return {
      data: {
        devices: devices.map(device => ({
          id: device.id,
          name: device.name,
          status: device.status,
          networkHealth: networkStats.find(s => s.deviceId === device.id)?.health,
          monthlyCost: billing.find(b => b.deviceId === device.id)?.amount
        })),
        summary: {
          totalDevices: devices.length,
          healthyDevices: networkStats.filter(s => s.health === 'good').length,
          totalMonthlyCost: billing.reduce((sum, b) => sum + b.amount, 0)
        }
      }
    };
  }
}
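The composition step is worth pulling out into a pure function so it can be unit-tested without the service clients. The variant below is a sketch under assumed record shapes (`Device`, `NetworkStat`, `BillingLine` are illustrative types). It indexes stats and billing once instead of calling `find` per device, and, unlike the controller above, it sums multiple billing lines for the same device rather than taking the first match.

```typescript
interface Device { id: string; name: string; status: string; }
interface NetworkStat { deviceId: string; health: string; }
interface BillingLine { deviceId: string; amount: number; }

function composeDashboard(devices: Device[], stats: NetworkStat[], billing: BillingLine[]) {
  // Index once so each device lookup is constant time instead of a linear scan.
  const healthById: Record<string, string | undefined> = Object.create(null);
  for (const s of stats) healthById[s.deviceId] = s.health;
  const costById: Record<string, number | undefined> = Object.create(null);
  for (const b of billing) costById[b.deviceId] = (costById[b.deviceId] || 0) + b.amount;

  const rows = devices.map(d => ({
    id: d.id,
    name: d.name,
    status: d.status,
    networkHealth: healthById[d.id], // undefined when no stats were reported
    monthlyCost: costById[d.id],     // undefined when nothing was billed
  }));

  return {
    data: {
      devices: rows,
      summary: {
        totalDevices: rows.length,
        healthyDevices: rows.filter(r => r.networkHealth === 'good').length,
        totalMonthlyCost: billing.reduce((sum, b) => sum + b.amount, 0),
      },
    },
  };
}
```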
The gateway handled:
- Authentication: JWT validation at the edge
- Rate limiting: Per-client token buckets
- Request composition: Combining multiple service calls
- Caching: Redis-backed response caching for read-heavy endpoints
- Transformation: Converting internal service formats to public API format
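To make the rate-limiting item concrete, here is a minimal in-memory token bucket. It is a sketch of the general technique rather than the gateway's actual implementation: the capacity and refill rate are made-up values, and a production gateway with several instances would need shared state (for example in Redis) instead of a local object.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full so new clients can burst
    this.lastRefillMs = nowMs;
  }

  // Refills based on elapsed time, then spends one token if available.
  tryConsume(nowMs: number): boolean {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per client id; 2 tokens of burst and 1 token per second are illustrative numbers.
const buckets: Record<string, TokenBucket | undefined> = Object.create(null);

function allowRequest(clientId: string, nowMs: number): boolean {
  let bucket = buckets[clientId];
  if (!bucket) {
    bucket = new TokenBucket(2, 1, nowMs);
    buckets[clientId] = bucket;
  }
  return bucket.tryConsume(nowMs);
}
```

A rejected request would typically be answered with HTTP 429 so that well-behaved clients know to back off.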
Briefly: Why Not GraphQL?
GraphQL solves many of these problems beautifully—clients request exactly what they need, and you get a single endpoint. We evaluated it at hipages but ultimately stuck with REST for a few reasons:
- Team expertise: The team knew REST well; GraphQL would be a learning curve
- Caching: HTTP caching is simpler and more mature than GraphQL caching
- Existing infrastructure: Our monitoring, rate limiting, and auth systems were built around REST
- Field selection solved 80%: Our sparse fieldset approach addressed the over-fetching problem without the complexity
That said, for a greenfield project with a complex data graph and mobile-heavy usage, GraphQL would be worth serious consideration.
Results
After implementing these patterns:
- Developer velocity: New developers could understand the API in hours, not days
- Error rates: 40% reduction in client-side integration bugs
- Performance: 60% reduction in average response payload size
- Time to market: New features shipped 2x faster because teams weren't fighting API inconsistencies
- Mobile app: Reduced from 8 requests per dashboard to 1 composed gateway request
Lessons Learned
1. Design APIs for the Consumer, Not the Database
Your API should reflect how clients use your system, not your internal data model. Just because you have a job_tradie_matching table doesn't mean you need a /job-tradie-matching endpoint.
2. Standards Are Worth the Upfront Investment
Yes, it takes longer to write an OpenAPI spec than to just code the endpoint. But that spec becomes:
- Documentation
- Type-safe client code
- Contract tests
- Mock servers for frontend development
3. Breaking Changes Are a Feature, Not a Bug
Don't be afraid to version. It's better to have /api/v2/ that's clean than /api/v1/ with 47 optional parameters handling 6 years of edge cases.
4. Gateways Buy You Time
When you inevitably have inconsistent internal services (and you will), a gateway can paper over the cracks while you fix the underlying issues. It's technical debt, but intentional and documented.
Best Practices Checklist
- Use plural nouns and lowercase-hyphenated URLs
- Return consistent envelope structures ({ data, meta, links })
- Implement field selection to prevent over-fetching
- Use OpenAPI specs as the source of truth
- Generate types from specs for type safety
- Version in the URL (/api/v1/)
- Add Sunset headers for deprecations
- Return appropriate HTTP status codes (don't use 200 for errors)
- Implement detailed error responses in development (with stack traces), sparse/generic in production
- Use an API gateway for composition, auth, and rate limiting
- Document breaking changes in a changelog
- Provide SDKs or generated clients for critical consumers
Conclusion
APIs are a product. Like any product, they need intentional design, consistent standards, and care for the user experience. The chaos we experienced at VMware and hipages wasn't because our engineers were bad—it was because we treated APIs as an implementation detail rather than a critical interface.
Take the time to design your APIs thoughtfully. Your future self—and every developer who integrates with your system—will thank you.