Design a System for E-commerce Search Filtering

System Design
Medium
Amazon

Design the technical architecture for faceted search and filtering on an e-commerce site. Focus on indexing techniques (Solr/Elasticsearch) and low-latency querying.

Why Interviewers Ask This

Interviewers ask this to evaluate your ability to architect scalable systems handling high-concurrency read traffic, a core requirement for Amazon's massive marketplace. They specifically assess your understanding of inverted indexes, sharding strategies, and how to balance consistency with low-latency response times under heavy load.

How to Answer This Question

  1. Clarify requirements by defining scale (e.g., millions of SKUs, thousands of queries per second) and latency targets (sub-200ms).
  2. Outline the data flow: ingestion pipeline from product database to search index using tools like Kafka.
  3. Design the indexing schema, explaining inverted indices for faceted attributes like brand or color.
  4. Discuss query processing, focusing on distributed search across shards and caching layers like Redis.
  5. Address scaling challenges, such as reindexing during peak sales events without downtime, referencing Amazon's event-driven architecture principles.
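To make step 3 concrete, here is a minimal sketch of an inverted index over facet attributes. It is a toy in-memory model, not how Solr or Elasticsearch store postings internally; the catalog data and the function names (build_index, filter_docs, facet_counts) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical catalog: doc ID -> facet attributes (illustrative data only).
PRODUCTS = {
    1: {"brand": "acme", "color": "red"},
    2: {"brand": "acme", "color": "blue"},
    3: {"brand": "zenith", "color": "red"},
}

def build_index(products):
    """Map each (field, value) pair to the set of doc IDs that carry it."""
    index = defaultdict(set)
    for doc_id, attrs in products.items():
        for field, value in attrs.items():
            index[(field, value)].add(doc_id)
    return index

def filter_docs(index, all_ids, filters):
    """AND together the posting sets of every selected facet value."""
    result = set(all_ids)
    for field, value in filters.items():
        result &= index.get((field, value), set())
    return result

def facet_counts(products, doc_ids, field):
    """Count how many of the matching docs carry each value of `field`."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        counts[products[doc_id][field]] += 1
    return dict(counts)

INDEX = build_index(PRODUCTS)
red = filter_docs(INDEX, PRODUCTS, {"color": "red"})
print(sorted(red), facet_counts(PRODUCTS, red, "brand"))
```

Filtering is a set intersection over posting lists, while facet counts are computed over the documents that survive the filter. In an interview, that distinction is a natural lead-in to why production engines keep a separate columnar structure for aggregations.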

Key Points to Cover

  • Explicitly mention inverted indices as the core mechanism for efficient text and facet searching
  • Demonstrate knowledge of horizontal sharding strategies to handle massive SKU volumes
  • Include a specific caching layer strategy like Redis to reduce load on the search cluster
  • Address the trade-off between consistency and latency in real-time inventory scenarios
  • Reference event-driven architectures common in Amazon's ecosystem for data ingestion
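One detail that makes the caching point more convincing is key normalization: two requests with the same filters in a different order should hit the same cache entry. The sketch below shows one way to build such a key. The function name cache_key, the "search:" prefix, and the key layout are assumptions for illustration; the resulting string could be used as a Redis key, but no Redis client is involved here.

```python
import hashlib
import json

def cache_key(query_text, filters, page=1):
    """Build a deterministic key so equivalent filter sets share one entry."""
    normalized = {
        "q": query_text.strip().lower(),
        # Sort fields and values so selection order does not change the key.
        "f": {field: sorted(values) for field, values in sorted(filters.items())},
        "p": page,
    }
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "search:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()

key_a = cache_key("Shoes ", {"brand": ["nike", "adidas"], "color": ["red"]})
key_b = cache_key("shoes", {"color": ["red"], "brand": ["adidas", "nike"]})
print(key_a == key_b)
```

Cached entries would normally carry a short time-to-live so that price and inventory changes are not served stale for long; the right value depends on how much staleness the product can tolerate.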

Sample Answer

To design an e-commerce search system at Amazon scale, I would start by defining non-functional requirements: sub-200ms latency at the 99th percentile and support for millions of concurrent users.

The architecture begins with an ingestion pipeline in which product updates are streamed via Kafka, keeping the search index eventually consistent with the transactional database. For storage, I would deploy Elasticsearch clusters sharded horizontally by a hash of the product ID to distribute load evenly. Each shard maintains an inverted index mapping terms to document IDs, enabling fast retrieval for text and faceted filters like 'Brand' or 'Price Range'.

To handle filtering efficiently, I'd use term queries on keyword (non-analyzed) fields for exact matches and range queries for numeric fields, with facet counts computed by aggregations over the filtered result set. Query routing would involve a gateway service that fans each request out as parallel sub-queries across shards; each shard returns its local top N, and the gateway merges them into the global top N.

Caching is critical; I'd implement a multi-tier cache with Redis storing frequent filter combinations and result sets to reduce index pressure. Finally, to maintain performance during flash sales, the system must support dynamic scaling of nodes and asynchronous reindexing so that index rebuilds do not block writes. This approach ensures high availability and low latency, aligning with Amazon's customer-centric focus on speed and reliability.
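The filtering and scatter-gather steps in this answer can be sketched as follows. The request body follows the general shape of the Elasticsearch query DSL (a bool query with filter clauses, plus a terms aggregation for facet counts), but it is only constructed here, not sent to a cluster. The field names title, brand, and price are hypothetical, and brand is assumed to be mapped as a keyword field. The merge step is a simplified stand-in for what a coordinating node does, using (score, doc_id) tuples as an assumed hit format.

```python
import heapq
from itertools import chain

def build_query(text, brand, price_min, price_max, size=20):
    """Build a search body: scored text match plus non-scoring filters."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [
                    {"term": {"brand": brand}},  # exact match on a keyword field
                    {"range": {"price": {"gte": price_min, "lte": price_max}}},
                ],
            }
        },
        # Facet counts for the brand sidebar, computed over matching docs.
        "aggs": {"brands": {"terms": {"field": "brand", "size": 10}}},
    }

def merge_top_n(shard_hits, n):
    """Merge per-shard (score, doc_id) lists into a global top-n by score."""
    return heapq.nlargest(n, chain.from_iterable(shard_hits), key=lambda hit: hit[0])

body = build_query("running shoes", "acme", 10, 50)
shard_hits = [
    [(0.9, "a"), (0.2, "b")],   # shard 0 local results
    [(0.7, "c")],               # shard 1 local results
    [(0.95, "d"), (0.1, "e")],  # shard 2 local results
]
top = merge_top_n(shard_hits, 3)
print(top)
```

Putting the brand and price conditions in the filter clause rather than in must means they do not contribute to relevance scoring, which also makes them easier for the engine to cache.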

Common Mistakes to Avoid

  • Focusing only on SQL databases without explaining why they fail for complex faceted search
  • Ignoring the need for horizontal scaling when discussing cluster architecture
  • Overlooking the importance of caching frequently accessed filter combinations
  • Failing to define clear metrics for success like P99 latency or throughput capacity
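When you state a P99 target, it helps to be able to say how it would be measured. Below is a minimal sketch using the nearest-rank percentile definition; the latency samples are made up for illustration, and production systems typically rely on histogram-based estimates from a metrics library rather than sorting raw samples.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical query latencies in milliseconds: mostly fast, two slow outliers.
latencies_ms = [120] * 98 + [180, 450]
p99 = percentile(latencies_ms, 99)
print(p99, p99 <= 200)
```

In this made-up sample the single 450 ms outlier does not breach a 200 ms P99 target, which illustrates why the chosen percentile matters: a P99.9 or maximum-latency target would be judged against a different sample.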
