Grokking Streaming Systems: Real-Time Event Processing

Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you'll b...

Bibliographic Details
Other Authors: Fischer, Joshua, author; Wang, Ning, author
Format: E-book
Language: English
Published: Shelter Island, New York : Manning Publications [2021]
View at Biblioteca Universitat Ramon Llull: https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009657507106719
Table of Contents:
  • Intro
  • inside front cover
  • Grokking Streaming Systems
  • Copyright
  • brief contents
  • contents
  • front matter
  • preface
  • acknowledgments
  • about this book
  • about the authors
  • Part 1. Getting started with streaming
  • 1 Welcome to Grokking Streaming Systems
  • What is stream processing?
  • Streaming system examples
  • Streaming systems and real time
  • How a streaming system works
  • Applications
  • Backend services
  • Inside a backend service
  • Batch processing systems
  • Inside a batch processing system
  • Stream processing systems
  • Inside a stream processing system
  • The advantages of multi-stage architecture
  • The multi-stage architecture in batch and stream processing systems
  • Compare the systems
  • A model stream processing system
  • Summary
  • Exercise
  • 2 Hello, streaming systems!
  • The chief needs a fancy tollbooth
  • It started as HTTP requests, and it failed
  • AJ and Miranda take time to reflect
  • AJ ponders about streaming systems
  • Comparing backend service and streaming
  • How a streaming system could fit
  • Queues: A foundational concept
  • Data transfer via queues
  • Our streaming framework (the start of it)
  • The Streamwork framework overview
  • Zooming in on the Streamwork engine
  • Core streaming concepts
  • More details of the concepts
  • The streaming job execution flow
  • Your first streaming job
  • Executing the job
  • Inspecting the job execution
  • Look inside the engine
  • Keep events moving
  • The life of a data element
  • Reviewing streaming concepts
  • Summary
  • Exercises
  • 3 Parallelization and data grouping
  • The sensor is emitting more events
  • Even in streaming, real time is hard
  • New concepts: Parallelism is important
  • New concepts: Data parallelism
  • New concepts: Data execution independence
  • New concepts: Task parallelism
  • Data parallelism vs. task parallelism
  • Parallelism and concurrency
  • Parallelizing the job
  • Parallelizing components
  • Parallelizing sources
  • Viewing job output
  • Parallelizing operators
  • Viewing job output
  • Events and instances
  • Event ordering
  • Event grouping
  • Shuffle grouping
  • Shuffle grouping: Under the hood
  • Fields grouping
  • Fields grouping: Under the hood
  • Event grouping execution
  • Look inside the engine: Event dispatcher
  • Applying fields grouping in your job
  • Event ordering
  • Comparing grouping behaviors
  • Summary
  • Exercises
  • 4 Stream graph
  • A credit card fraud detection system
  • More about the credit card fraud detection system
  • The fraud detection business
  • Streaming isn't always a straight line
  • Zoom into the system
  • The fraud detection job in detail
  • New concepts
  • Upstream and downstream components
  • Stream fan-out and fan-in
  • Graph, directed graph, and DAG
  • DAG in stream processing systems
  • All new concepts in one page
  • Stream fan-out to the analyzers
  • Look inside the engine
  • There is a problem: Efficiency
  • Stream fan-out with different streams
  • Look inside the engine again
  • Communication between the components via channels
  • Multiple channels
  • Stream fan-in to the score aggregator
  • Stream fan-in in the engine
  • A brief introduction to another stream fan-in: Join
  • Look at the whole system
  • Graph and streaming jobs
  • The example systems
  • Summary
  • Exercises
  • 5 Delivery semantics
  • The latency requirement of the fraud detection system
  • Revisit the fraud detection job
  • About accuracy
  • Partial result
  • A new streaming job to monitor system usage
  • The new system usage job
  • The requirements of the new system usage job
  • New concepts: (The number of) times delivered and times processed
  • New concept: Delivery semantics
  • Choosing the right semantics
  • At-most-once
  • The fraud detection job
  • At-least-once
  • At-least-once with acknowledging
  • Track events
  • Handle event processing failures
  • Track early out events
  • Acknowledging code in components
  • New concept: Checkpointing
  • New concept: State
  • Checkpointing in the system usage job for the at-least-once semantic
  • Checkpointing and state manipulation functions
  • State handling code in the transaction source component
  • Exactly-once or effectively-once?
  • Bonus concept: Idempotent operation
  • Exactly-once, finally
  • State handling code in the system usage analyzer component
  • Comparing the delivery semantics again
  • Summary
  • Exercises
  • Up next ...
  • 6 Streaming systems review and a glimpse ahead
  • Streaming system pieces
  • Parallelization and event grouping
  • DAGs and streaming jobs
  • Delivery semantics (guarantees)
  • Delivery semantics used in the credit card fraud detection system
  • Which way to go from here
  • Windowed computations
  • Joining data in real time
  • Backpressure
  • Stateless and stateful computations
  • Part 2. Stepping up
  • 7 Windowed computations
  • Slicing up real-time data
  • Breaking down the problem in detail
  • Breaking down the problem in detail (continued)
  • Two different contexts
  • Windowing in the fraud detection job
  • What exactly are windows?
  • Looking closer into the window
  • New concept: Windowing strategy
  • Fixed windows
  • Fixed windows in the windowed proximity analyzer
  • Detecting fraud with a fixed time window
  • Fixed windows: Time vs. count
  • Sliding windows
  • Sliding windows: Windowed proximity analyzer
  • Detecting fraud with a sliding window
  • Session windows
  • Session windows (continued)
  • Detecting fraud with session windows
  • Summary of windowing strategies
  • Slicing an event stream into data sets
  • Windowing: Concept or implementation
  • Another look
  • Key-value store 101
  • Implement the windowed proximity analyzer
  • Event time and other times for events
  • Windowing watermark
  • Late events
  • Summary
  • Exercise
  • 8 Join operations
  • Joining emission data on the fly
  • The emissions job version 1
  • The emission resolver
  • Accuracy becomes an issue
  • The enhanced emissions job
  • Focusing on the join
  • What is a join again?
  • How the stream join works
  • Stream join is a different kind of fan-in
  • Vehicle events vs. temperature events
  • Table: A materialized view of streaming
  • Vehicle events are less efficient to be materialized
  • Data integrity quickly became an issue
  • What's the problem with this join operator?
  • Inner join
  • Outer join
  • The inner join vs. outer join
  • Different types of joins
  • Outer joins in streaming systems
  • A new issue: Weak connection
  • Windowed joins
  • Joining two tables instead of joining a stream and table
  • Revisiting the materialized view
  • Summary
  • 9 Backpressure
  • Reliability is critical
  • Review the system
  • Streamlining streaming jobs
  • New concepts: Capacity, utilization, and headroom
  • More about utilization and headroom
  • New concept: Backpressure
  • Measure capacity utilization
  • Backpressure in the Streamwork engine
  • Backpressure in the Streamwork engine: Propagation
  • Our streaming job during a backpressure
  • Backpressure in distributed systems
  • New concept: Backpressure watermarks
  • Another approach to handle lagging instances: Dropping events
  • Why do we want to drop events?
  • Backpressure could be a symptom when the underlying issue is permanent
  • Stopping and resuming may lead to thrashing if the issue is permanent
  • Handle thrashing
  • Summary
  • 10 Stateful computation
  • The migration of the streaming jobs
  • Stateful components in the system usage job
  • Revisit: State
  • The states in different components
  • State data vs. temporary data
  • Stateful vs. stateless components: The code
  • The stateful source and operator in the system usage job
  • States and checkpoints
  • Checkpoint creation: Timing is hard
  • Event-based timing
  • Creating checkpoints with checkpoint events
  • A checkpoint event is handled by instance executors
  • A checkpoint event flowing through a job
  • Creating checkpoints with checkpoint events at the instance level
  • Checkpoint event synchronization
  • Checkpoint loading and backward compatibility
  • Checkpoint storage
  • Stateful vs. stateless components
  • Manually managed instance states
  • Lambda architecture
  • Summary
  • Exercises
  • 11 Wrap-up: Advanced concepts in streaming systems
  • Is this really the end?
  • Windowed computations
  • The major window types
  • Joining data in real time
  • SQL vs. stream joins
  • Inner joins vs. outer joins
  • Unexpected things can happen in streaming systems
  • Backpressure: Slow down sources or upstream components
  • Another approach to handle lagging instances: Dropping events
  • Backpressure can be a symptom when the underlying issue is permanent
  • Stateful components with checkpoints
  • Event-based timing
  • Stateful vs. stateless components
  • You did it!
  • Appendix. Key concepts covered in this book
  • index