CORNELL CS 5410 - Growing and Evolving a Large eCommerce Site

eBay’s Scaling Odyssey
Growing and Evolving a Large eCommerce Site
Randy Shoup and Franco Travostino
eBay Distinguished Architects
The 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS ’08)
IBM T.J. Watson Research Center, Yorktown, NY, USA
September 15-17, 2008
© 2008 eBay Inc.

Challenges at Internet Scale
• eBay manages …
  – Over 276,000,000 registered users
  – Over 2 Billion photos
  – eBay users trade $2040 in goods every second -- $60 billion per year
  – eBay averages over 2 billion page views per day
  – eBay has roughly 120 million items for sale in over 50,000 categories
  – eBay site stores over 2 Petabytes of data
  – eBay Data Warehouse processes 25 Petabytes of data per day
• In a dynamic environment
  – 300+ features per quarter
  – We roll 100,000+ lines of code every two weeks
• In 39 countries, in 8 languages, 24x7x365
>48 Billion SQL executions/day!

Scaling dimensions
• Data
• Transactions
• SLAs
• Operations
• Deployment
• Productivity
• Feature velocity
• Adaptability

Principles and Patterns that we live by

Principles for Scaling
1. Partition Everything
2. Asynchrony Everywhere
3. Automate Everything
4. Remember Everything Fails
5. Embrace Inconsistency

Principle 1: Partition Everything
Pattern: Functional Segmentation
– Segment processing into pools, services, and stages
– Segment data along usage boundaries
Pattern: Horizontal Split
– Load-balance processing
  • Within a pool, all servers are created equal
– Split (or “shard”) data along primary access path
  • Partition by range, modulo of a key, lookup, etc.
Corollary: No Session State
– User session flow moves through multiple application pools
– Absolutely no session state in application tier
[Diagram labels: Item, Transaction, Product, User, Account, Feedback; Search, View Item, Selling, Bidding, Checkout, Feedback]
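
The Horizontal Split pattern under Principle 1 names three ways to partition data: by range, by modulo of a key, and by lookup. A minimal sketch of each follows. The shard count, range bounds, lookup table, and function names are all hypothetical and chosen only for illustration; the slides do not say how eBay implements any of them.

```python
import bisect
import zlib

NUM_SHARDS = 4  # hypothetical shard count

# Hypothetical exclusive upper bounds for shards 0..2; shard 3 takes the rest.
RANGE_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

# Hypothetical lookup table that pins specific keys to specific shards.
LOOKUP_TABLE = {"power_seller_42": 3}


def shard_by_modulo(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Modulo of a numeric key: spreads sequential ids evenly."""
    return user_id % num_shards


def shard_by_hash(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Modulo of a hashed string key.

    crc32 is used because it is stable across processes, unlike the
    built-in hash() for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_shards


def shard_by_range(item_id: int) -> int:
    """Range partitioning: index of the first bound greater than item_id."""
    return bisect.bisect_right(RANGE_BOUNDS, item_id)


def shard_by_lookup(key: str) -> int:
    """Lookup partitioning: explicit table first, hashed fallback otherwise."""
    if key in LOOKUP_TABLE:
        return LOOKUP_TABLE[key]
    return shard_by_hash(key)
```

Because each function depends only on the key, any server in a pool can compute the target shard without shared session state, which is consistent with the No Session State corollary.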

Principle 2: Asynchrony Everywhere
Pattern: Event Queue
– Primary use-case produces event
  • Create event (ITEM.NEW, ITEM.SOLD) transactionally with primary insert/update
– Consumers subscribe to event
  • At least once delivery
  • No guaranteed order
  • Idempotency and readback
Pattern: Message Multicast
– Search Feeder publishes item updates
  • Reads item updates from primary database
  • Publishes sequenced updates via SRM-inspired protocol
– Nodes listen to assigned subset of messages
  • Update in-memory index in real time
  • Request recovery (NAK) when messages are missed
[Diagram labels: Item Host N; ITEM.NEW; Image Processing; Selling Summary Update; User Metrics]

Principle 3: Automate Everything
Pattern: Adaptive Configuration
– Define SLA for a given logical consumer
  • E.g., 99% of events processed in 15 seconds
– Dynamically adjust config to meet defined SLA
Pattern: Machine Learning
– Dynamically adapt search experience
  • Determine best inventory and assemble optimal page for that user and context
– Feedback loop enables system to learn and improve over time
  • Collect user behavior
  • Aggregate and analyze offline
  • Deploy updated metadata
  • Decide and serve appropriate experience
– Perturbation and dampening
[Diagram labels: Consumer A, Consumer B, Consumer C; SLA 15 seconds, SLA 30 seconds, SLA 5 minutes]
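
The Event Queue pattern under Principle 2 combines at-least-once delivery, no guaranteed order, and "idempotency and readback". One way a consumer can cope with that combination is sketched below. It is an illustration under stated assumptions, not eBay's implementation: the names `Event`, `SellingSummaryConsumer`, and `primary_db` are hypothetical, a plain dict stands in for the primary database, and "readback" is interpreted here as re-reading the current row from the primary store instead of trusting the event payload.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    event_id: str    # unique id, used to detect redelivery
    event_type: str  # e.g. "ITEM.NEW" or "ITEM.SOLD"
    item_id: int


@dataclass
class SellingSummaryConsumer:
    """Hypothetical consumer that keeps a per-item status summary."""

    primary_db: dict  # item_id -> row; stands in for the primary database
    seen: set = field(default_factory=set)
    summary: dict = field(default_factory=dict)

    def handle(self, event: Event) -> bool:
        """Process one delivery; return True if state was updated."""
        # Idempotency: a redelivered event is acknowledged but has no effect.
        if event.event_id in self.seen:
            return False
        # Readback: use the current primary row, not the event payload, so
        # events arriving out of order still converge on the latest state.
        row = self.primary_db.get(event.item_id)
        if row is None:
            # Row not visible yet; leave the event unmarked so a later
            # redelivery can retry it.
            return False
        self.summary[event.item_id] = row["status"]
        self.seen.add(event.event_id)
        return True
```

With this shape, delivering ITEM.SOLD before ITEM.NEW, or delivering either one twice, leaves the summary equal to whatever the primary row currently says.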

Principle 4: Everything Fails
Pattern: Failure Detection
– Servers log all requests
  • Log all application activity, database and service calls on multicast message bus
  • Over 2TB of log messages per day
– Listeners automate failure detection and notification
Pattern: Rollback
– Absolutely no changes to the site which cannot be undone (!)
– Every feature has on / off state driven by central configuration
  • Feature can be immediately turned off for operational or business reasons
  • Features can be deployed “wired-off” to unroll dependencies
Pattern: Graceful Degradation
– Application “marks down” an unavailable or distressed resource
– Non-critical functionality is removed or ignored
– Critical functionality is retried or deferred
[Diagram labels: Selling, Search, View Item; Message Bus; File Log; Data Cube; Alert Listener; Report Listener]

Principle 5: Embrace Inconsistency
Choose Appropriate Consistency Guarantees
– Brewer’s CAP Theorem
  • To guarantee availability and partition-tolerance, we trade off immediate consistency
– Consistency is a spectrum, not binary
– Prefer eventual consistency to immediate consistency
[Diagram: consistency spectrum]
  Immediate Consistency: Bids, Purchases
  Eventual Consistency: Search Engine, Billing System, etc.
  No Consistency: Preferences
Avoid Distributed Transactions
– eBay does absolutely no distributed transactions – no two-phase commit
– Minimize inconsistency through state machines and careful ordering of operations
– Eventual consistency through asynchronous event or reconciliation batch

The journey ahead

What sets the new course
[Diagram labels: Many-cores, Virtualization, SSD, Model-driven Management; Infrastructure Build-out; KPIs; CSFs; drive definition; generate empirical data; say/do match; drive definition]
Lower cost of operation
– power, power, power
– manage complexity
Business enablers
– Flexibility
– Feature velocity
[Diagram labels: TPS/Watt, TTM, Parallel Efficiency, Gap from model, Incident repeat rate]
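
Returning to the Graceful Degradation pattern under Principle 4: the three bullets (mark down a distressed resource, drop non-critical functionality, defer critical functionality) can be sketched as follows. Everything here is a hypothetical illustration; `ResourceRegistry`, the resource names "recommendations" and "billing_db", and the in-memory deferral queue are assumptions, not details from the slides.

```python
from collections import deque


class ResourceRegistry:
    """Tracks which resources the application has marked down."""

    def __init__(self):
        self._marked_down = set()

    def mark_down(self, name):
        self._marked_down.add(name)

    def mark_up(self, name):
        self._marked_down.discard(name)

    def is_up(self, name):
        return name not in self._marked_down


def render_view_item(item_id, registry):
    """Non-critical functionality is simply left out when its resource is down."""
    sections = ["title", "price"]
    if registry.is_up("recommendations"):
        sections.append("recommendations")
    return {"item_id": item_id, "sections": sections}


def record_sale(sale, registry, billing_store, deferred):
    """Critical functionality is deferred, not dropped, when its resource is down."""
    if registry.is_up("billing_db"):
        billing_store.append(sale)
        return "stored"
    deferred.append(sale)
    return "deferred"


def drain_deferred(registry, billing_store, deferred):
    """Replay deferred work in arrival order once the resource is back up."""
    replayed = 0
    while deferred and registry.is_up("billing_db"):
        billing_store.append(deferred.popleft())
        replayed += 1
    return replayed


# Usage: mark a resource down, keep serving, then catch up after recovery.
registry = ResourceRegistry()
billing_store, deferred = [], deque()
registry.mark_down("billing_db")
record_sale({"item_id": 7}, registry, billing_store, deferred)
registry.mark_up("billing_db")
drain_deferred(registry, billing_store, deferred)
```

Deferring the billing write rather than failing the request fits the slide's placement of the billing system under eventual consistency.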

Challenge: Meta-data unstable equilibrium
• We manage relationships rather than isolated thingies
• Syntax is easy, semantics are tough
• Relax requirements for 100% semantic consistency across the site
  – for business meta-data
  – for infrastructure meta-data
• Leap over islands of semantically consistent meta-data
• Specialize in ontology build-outs and code-gens

Challenge: From reactive to proactive-und-predictable
• Model-centric architecture
  – Model reflects relationships, behaviors, constraints
  – Applies to discrete components throughout the whole system of systems
• Failure ≡ gap from model
  – Failure management enacted upon crossing tolerance threshold T
• What’s the Erlang-B equivalent of failing a service request by a random user?
  – Once I know what that is, can I actively manage it?

Example: “Surprise Theory”
• Causality between what’s out there and what we see in our Data Centers
• We need a real-time, stable burst detection system
Nish Parikh, Neel Sundaresan, Scalable and Near Real-time Burst Detection from eCommerce Queries, KDD ’08, Aug 24-27, 2008, Las Vegas

Challenge: Emergent behaviors
• From a complicated system to a complex system
• Machine-learning pattern supported by selective feedback loops
  – Damp excess

