Design a System for Identity Verification (KYC)

System Design
Hard
Stripe

Design a Know Your Customer (KYC) service to verify user identity. Focus on document processing, integration with third-party verification APIs, and data security.

Why Interviewers Ask This

Interviewers at Stripe ask this to evaluate whether you can balance high-scale system architecture against compliance and security requirements. They specifically test whether you can design a robust pipeline for document ingestion, OCR processing, and third-party API orchestration while prioritizing data privacy, auditability, and fault tolerance in a financial context.

How to Answer This Question

1. Clarify requirements immediately: define scope (real-time vs batch), latency SLAs, and specific regulations like GDPR or AML.
2. Define the high-level architecture: sketch a flow from user upload through a load balancer to an object store.
3. Detail the processing pipeline: explain how you handle image normalization, OCR extraction, and heuristic validation before calling external APIs.
4. Discuss integration strategy: describe circuit breakers and retry logic for third-party verification services to ensure resilience.
5. Address security deeply: mandate encryption at rest and in transit, PII masking, and strict IAM policies suitable for fintech standards.
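
The heuristic validation in step 3 and the PII masking in step 5 can be sketched roughly as follows. This is a minimal illustration, not a production rule set: the `ExtractedFields` shape, the document number pattern, and both function names are assumptions made for the example, and real document formats vary by issuer and country.

```python
from dataclasses import dataclass
from datetime import date
import re


@dataclass
class ExtractedFields:
    """Fields pulled out of a document by OCR (hypothetical shape)."""
    full_name: str
    date_of_birth: date
    document_number: str
    expiry_date: date


# Assumed format: 6 to 12 uppercase letters or digits. Real formats differ per issuer.
DOCUMENT_NUMBER_PATTERN = re.compile(r"[A-Z0-9]{6,12}")


def validate_fields(fields: ExtractedFields, today: date) -> list[str]:
    """Return a list of problems; an empty list means the document may proceed."""
    problems = []
    if not fields.full_name.strip():
        problems.append("missing name")
    if fields.date_of_birth >= today:
        problems.append("date of birth is not in the past")
    if fields.expiry_date < today:
        problems.append("document expired")
    if not DOCUMENT_NUMBER_PATTERN.fullmatch(fields.document_number):
        problems.append("malformed document number")
    return problems


def mask_document_number(number: str) -> str:
    """Replace all but the last four characters with '*' before logging."""
    return "*" * max(len(number) - 4, 0) + number[-4:]
```

Running cheap checks like these first means obviously bad uploads are rejected without spending a paid third-party API call, and the masked value is what would appear in application logs.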

Key Points to Cover

  • Explicitly mention handling third-party API failures with circuit breakers and retries
  • Tie the data retention strategy to specific regulations, such as GDPR and the applicable AML rules
  • Detail the separation of concerns between ingestion, processing, and storage layers
  • Propose encryption standards for both data at rest and in transit
  • Include an audit logging mechanism for compliance traceability
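
The first point, circuit breakers combined with retries, can be sketched as below. This is one simple way to implement the pattern under stated assumptions: `CircuitBreaker`, `CircuitOpenError`, and `verify_with_retry` are hypothetical names, the thresholds are arbitrary, and `provider_call` stands in for whatever client a real verification provider offers.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], dict]) -> dict:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("provider circuit is open")
            # Timeout elapsed: half-open, so one trial call is allowed through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def verify_with_retry(breaker: CircuitBreaker, provider_call: Callable[[], dict],
                      max_attempts: int = 3, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Retry with exponential backoff; None means 'requeue the request for later'."""
    for attempt in range(max_attempts):
        try:
            return breaker.call(provider_call)
        except CircuitOpenError:
            return None  # do not hammer a provider that is already failing
        except Exception:
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

Returning `None` instead of raising lets the caller put the request back on a queue, so the user sees a pending status rather than an error while the provider recovers.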

Sample Answer

To design a KYC service for a platform like Stripe, I would start by defining non-functional requirements: 99.99% availability, sub-second response for simple checks, and strict adherence to GDPR and AML regulations (plus PCI-DSS wherever card data is in scope).

The architecture begins with a client uploading documents directly to a private S3 bucket through pre-signed URLs, so the bucket never needs public access. Each completed upload emits an event that triggers an asynchronous workflow via SQS. The core processing engine is a microservice that normalizes images, runs local OCR to extract document fields, and then orchestrates calls to third-party identity providers like Jumio or Onfido. Crucially, we implement a circuit breaker pattern here: if the external API fails, we queue the request for a later retry rather than blocking the user.

For data security, all PII is encrypted with AES-256 at rest and protected by TLS 1.3 in transit. We also enforce a 'data minimization' principle: raw documents are purged after successful verification unless retention is legally required. Finally, every step writes immutable audit logs to a separate account, giving regulatory auditors full traceability of identity decisions.
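
One way to make audit logs tamper-evident is to chain each entry to the previous one with a hash. The sketch below is an in-memory illustration only: the `AuditLog` class and the event fields are hypothetical, and a real deployment would also need write-once storage in the separate account, which this code does not provide.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        """Add an event and link it to the entry before it."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        entry_hash = _entry_hash(prev_hash, event)
        self.entries.append(
            {"event": dict(event), "prev_hash": prev_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Because each hash depends on everything before it, changing an old decision record invalidates every later entry, which is the property an auditor would check. Events should carry identifiers and masked values rather than raw PII.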

Common Mistakes to Avoid

  • Focusing only on the OCR technology while ignoring the orchestration and error handling of the system
  • Neglecting to discuss data privacy and compliance regulations, which are central to KYC systems
  • Designing a synchronous flow that blocks users while waiting for slow third-party API responses
  • Overlooking the need for immutable audit logs required for financial regulatory compliance
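
To avoid the synchronous-flow mistake, the submit path can return a pending status immediately while a worker talks to the provider in the background. The sketch below assumes an in-memory `queue.Queue` as a stand-in for a durable queue such as SQS; `VerificationService`, `provider_check`, and the status strings are hypothetical names chosen for the example.

```python
import queue
import uuid

PENDING, VERIFIED, REJECTED = "pending", "verified", "rejected"


class VerificationService:
    def __init__(self, provider_check):
        # provider_check(document_key) -> bool stands in for the third-party call.
        self.provider_check = provider_check
        self.jobs = queue.Queue()
        self.status = {}

    def submit(self, document_key: str) -> str:
        """Called on the request path: enqueue and return at once."""
        verification_id = str(uuid.uuid4())
        self.status[verification_id] = PENDING
        self.jobs.put((verification_id, document_key))
        return verification_id

    def process_next(self) -> bool:
        """Called by a background worker; returns False when the queue is empty."""
        try:
            verification_id, document_key = self.jobs.get_nowait()
        except queue.Empty:
            return False
        try:
            approved = self.provider_check(document_key)
        except Exception:
            # Provider failure: leave the status pending and requeue the job.
            self.jobs.put((verification_id, document_key))
            return True
        self.status[verification_id] = VERIFIED if approved else REJECTED
        return True
```

The client polls the status, or receives a webhook, using the returned ID, so a slow or failing provider delays the decision without holding a user request open.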
