Design a Simple API Documentation Service

System Design
Easy
Salesforce

Design a system to automatically generate and host API documentation (like Swagger/OpenAPI) for thousands of internal microservices.

Why Interviewers Ask This

Interviewers at Salesforce ask this to evaluate your ability to design scalable, automated infrastructure for developer productivity. They specifically want to see how you handle the complexity of ingesting metadata from thousands of microservices while ensuring consistency, versioning, and rapid updates without manual intervention.

How to Answer This Question

1. Clarify Requirements: Immediately define scope, such as supporting Swagger/OpenAPI standards, handling internal vs. public APIs, and latency constraints for documentation generation.
2. Propose a High-Level Architecture: Sketch a pipeline where services push metadata to a central queue (e.g., Kafka), which triggers a generator service.
3. Detail the Generation Engine: Explain how you parse code or annotations to create YAML/JSON specs, emphasizing automation and error handling for malformed inputs.
4. Address Storage and Hosting: Discuss using object storage like S3 with CDN caching for global access, and database indexing for searchability across thousands of services.
5. Consider Scalability and Observability: Outline strategies for horizontal scaling during peak build times and monitoring metrics like generation failure rates or update latency.
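The validation step inside the generator service can be sketched in a few lines. This is a minimal structural check only (the field names follow OpenAPI 3.0, where `openapi`, `info`, and `paths` are required at the top level, and `info.version` is required); a real pipeline would use a full schema validator rather than this hypothetical helper:

```python
# Minimal structural validation of an OpenAPI 3.0 document.
# Illustrative sketch; a production pipeline would validate against the
# full OpenAPI JSON Schema instead of this hand-rolled check.

REQUIRED_TOP_LEVEL = ("openapi", "info", "paths")

def validate_spec(spec: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the spec passed."""
    errors = [f"missing required field: {field}"
              for field in REQUIRED_TOP_LEVEL if field not in spec]
    if "info" in spec and "version" not in spec["info"]:
        errors.append("info.version is required for doc versioning")
    return errors

good = {"openapi": "3.0.3",
        "info": {"title": "Billing", "version": "1.2.0"},
        "paths": {}}
bad = {"info": {"title": "Billing"}}  # malformed: missing openapi, paths, version

print(validate_spec(good))  # []
print(validate_spec(bad))
```

Returning a list of errors (rather than raising on the first one) lets the worker report every problem back to the owning team in a single alert.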

Key Points to Cover

  • Demonstrating understanding of decoupled architectures using message queues
  • Specific knowledge of OpenAPI/Swagger standards and parsing challenges
  • Strategies for global content delivery using CDNs and object storage
  • Handling versioning and rollback scenarios for thousands of services
  • Prioritizing developer experience through searchability and fast load times
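The versioning and storage points above can be sketched as a deterministic object-store key layout. The prefix scheme here is illustrative, not a prescribed convention; the idea is that every version stays addressable, so rollback becomes a pointer update rather than an overwrite:

```python
def doc_key(service_id: str, version: str, artifact: str = "index.html") -> str:
    """Build an object-store key (e.g., an S3 key) partitioned by service and version.

    Because each version lives under its own prefix, rolling back means
    repointing the 'latest' alias, never mutating published objects.
    """
    return f"docs/{service_id}/{version}/{artifact}"

def latest_alias(service_id: str) -> str:
    """Key the CDN serves by default; updated atomically after a successful build."""
    return f"docs/{service_id}/latest/index.html"

print(doc_key("billing-api", "1.2.0"))  # docs/billing-api/1.2.0/index.html
print(latest_alias("billing-api"))      # docs/billing-api/latest/index.html
```

Immutable version prefixes also make CDN caching simple: versioned objects can be cached forever, while only the `latest` alias needs a short TTL.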

Sample Answer

To design an API documentation service for thousands of microservices, I would start by defining the ingestion workflow. Each microservice should automatically publish its OpenAPI specification to a message broker like Kafka upon deployment. This decouples the generation process from the build pipeline, ensuring reliability.

Next, a distributed worker pool consumes these messages. These workers validate the spec against a schema, resolve references, and generate human-readable HTML or interactive UIs using libraries like Redoc. For storage, I would use an object store like S3 organized by service ID and version, served via a global CDN to ensure low latency for developers worldwide. A relational database would track metadata, such as service ownership and last-updated timestamps, enabling powerful search capabilities.

Scalability is critical here. Since Salesforce handles massive scale, the system must auto-scale based on queue depth. We'd implement circuit breakers to prevent one failing service from blocking the entire generation pipeline.

Finally, we need robust observability: if a service fails to generate docs, alerts should trigger immediately so engineering teams can fix the source code before it impacts downstream consumers. This approach ensures high availability and keeps documentation in sync with live code.
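The queue-depth auto-scaling rule mentioned above can be sketched as a simple proportional policy. The per-worker throughput and the min/max bounds here are assumed tuning values, not measured numbers:

```python
import math

def desired_workers(queue_depth: int, specs_per_worker: int = 50,
                    min_workers: int = 2, max_workers: int = 100) -> int:
    """Scale the generation worker pool with the backlog, clamped to safe bounds.

    - The floor keeps a warm pool so routine deploys are processed immediately.
    - The ceiling caps cost and protects downstream stores during deploy storms.
    """
    needed = math.ceil(queue_depth / specs_per_worker)
    return max(min_workers, min(max_workers, needed))

print(desired_workers(0))       # 2   (idle: floor keeps the pipeline warm)
print(desired_workers(500))     # 10  (proportional to backlog)
print(desired_workers(10_000))  # 100 (deploy storm: capped at the ceiling)
```

In practice the same metric (queue depth, or consumer lag in Kafka terms) would also feed the observability dashboards, so scaling decisions and alerting share one signal.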

Common Mistakes to Avoid

  • Focusing only on the UI rendering instead of the backend data pipeline
  • Ignoring the challenge of managing version conflicts across many services
  • Proposing a monolithic generator that cannot scale to thousands of requests
  • Overlooking error handling when a specific microservice pushes invalid metadata
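One common way to address the last mistake is a dead-letter pattern: rather than retrying a malformed spec indefinitely and stalling the queue, the worker parks the bad event with its error context and moves on. This sketch uses in-memory queues for illustration; in a real system these would be broker topics, and the event shape is an assumption:

```python
from collections import deque

# In-memory stand-ins for broker topics (illustrative only).
work_queue = deque([
    {"service": "billing", "spec": {"openapi": "3.0.3"}},
    {"service": "auth", "spec": None},  # malformed payload
])
dead_letter = deque()

def process(event: dict) -> None:
    """Render docs for one event; raises ValueError on invalid metadata."""
    if not isinstance(event["spec"], dict):
        raise ValueError("spec must be a JSON object")
    # ... validate, resolve references, and render HTML here ...

while work_queue:
    event = work_queue.popleft()
    try:
        process(event)
    except ValueError as exc:
        # Park the bad event so one failing service cannot block the pipeline,
        # keeping enough context to alert the owning team.
        dead_letter.append({"event": event, "error": str(exc)})

print(len(dead_letter))  # 1
```

The dead-letter queue doubles as an observability signal: its depth per service is exactly the "generation failure rate" metric worth alerting on.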
