Grokking streaming systems: real-time event processing
Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you'll b...
Other authors: | , |
---|---|
Format: | E-book |
Language: | English |
Published: | Shelter Island, New York: Manning Publications, [2021] |
Subjects: | |
View at Biblioteca Universitat Ramon Llull: | https://discovery.url.edu/permalink/34CSUC_URL/1im36ta/alma991009657507106719 |
Table of Contents:
- Intro
- inside front cover
- Grokking Streaming Systems
- Copyright
- brief contents
- contents
- front matter
- preface
- acknowledgments
- about this book
- about the authors
- Part 1. Getting started with streaming
- 1 Welcome to Grokking Streaming Systems
- What is stream processing?
- Streaming system examples
- Streaming systems and real time
- How a streaming system works
- Applications
- Backend services
- Inside a backend service
- Batch processing systems
- Inside a batch processing system
- Stream processing systems
- Inside a stream processing system
- The advantages of multi-stage architecture
- The multi-stage architecture in batch and stream processing systems
- Compare the systems
- A model stream processing system
- Summary
- Exercise
- 2 Hello, streaming systems!
- The chief needs a fancy tollbooth
- It started as HTTP requests, and it failed
- AJ and Miranda take time to reflect
- AJ ponders about streaming systems
- Comparing backend service and streaming
- How a streaming system could fit
- Queues: A foundational concept
- Data transfer via queues
- Our streaming framework (the start of it)
- The Streamwork framework overview
- Zooming in on the Streamwork engine
- Core streaming concepts
- More details of the concepts
- The streaming job execution flow
- Your first streaming job
- Executing the job
- Inspecting the job execution
- Look inside the engine
- Keep events moving
- The life of a data element
- Reviewing streaming concepts
- Summary
- Exercises
- 3 Parallelization and data grouping
- The sensor is emitting more events
- Even in streaming, real time is hard
- New concepts: Parallelism is important
- New concepts: Data parallelism
- New concepts: Data execution independence
- New concepts: Task parallelism
- Data parallelism vs. task parallelism
- Parallelism and concurrency
- Parallelizing the job
- Parallelizing components
- Parallelizing sources
- Viewing job output
- Parallelizing operators
- Viewing job output
- Events and instances
- Event ordering
- Event grouping
- Shuffle grouping
- Shuffle grouping: Under the hood
- Fields grouping
- Fields grouping: Under the hood
- Event grouping execution
- Look inside the engine: Event dispatcher
- Applying fields grouping in your job
- Event ordering
- Comparing grouping behaviors
- Summary
- Exercises
- 4 Stream graph
- A credit card fraud detection system
- More about the credit card fraud detection system
- The fraud detection business
- Streaming isn't always a straight line
- Zoom into the system
- The fraud detection job in detail
- New concepts
- Upstream and downstream components
- Stream fan-out and fan-in
- Graph, directed graph, and DAG
- DAG in stream processing systems
- All new concepts in one page
- Stream fan-out to the analyzers
- Look inside the engine
- There is a problem: Efficiency
- Stream fan-out with different streams
- Look inside the engine again
- Communication between the components via channels
- Multiple channels
- Stream fan-in to the score aggregator
- Stream fan-in in the engine
- A brief introduction to another stream fan-in: Join
- Look at the whole system
- Graph and streaming jobs
- The example systems
- Summary
- Exercises
- 5 Delivery semantics
- The latency requirement of the fraud detection system
- Revisit the fraud detection job
- About accuracy
- Partial result
- A new streaming job to monitor system usage
- The new system usage job
- The requirements of the new system usage job
- New concepts: (The number of) times delivered and times processed
- New concept: Delivery semantics
- Choosing the right semantics
- At-most-once
- The fraud detection job
- At-least-once
- At-least-once with acknowledging
- Track events
- Handle event processing failures
- Track early out events
- Acknowledging code in components
- New concept: Checkpointing
- New concept: State
- Checkpointing in the system usage job for the at-least-once semantic
- Checkpointing and state manipulation functions
- State handling code in the transaction source component
- Exactly-once or effectively-once?
- Bonus concept: Idempotent operation
- Exactly-once, finally
- State handling code in the system usage analyzer component
- Comparing the delivery semantics again
- Summary
- Exercises
- Up next ...
- 6 Streaming systems review and a glimpse ahead
- Streaming system pieces
- Parallelization and event grouping
- DAGs and streaming jobs
- Delivery semantics (guarantees)
- Delivery semantics used in the credit card fraud detection system
- Which way to go from here
- Windowed computations
- Joining data in real time
- Backpressure
- Stateless and stateful computations
- Part 2. Stepping up
- 7 Windowed computations
- Slicing up real-time data
- Breaking down the problem in detail
- Breaking down the problem in detail (continued)
- Two different contexts
- Windowing in the fraud detection job
- What exactly are windows?
- Looking closer into the window
- New concept: Windowing strategy
- Fixed windows
- Fixed windows in the windowed proximity analyzer
- Detecting fraud with a fixed time window
- Fixed windows: Time vs. count
- Sliding windows
- Sliding windows: Windowed proximity analyzer
- Detecting fraud with a sliding window
- Session windows
- Session windows (continued)
- Detecting fraud with session windows
- Summary of windowing strategies
- Slicing an event stream into data sets
- Windowing: Concept or implementation
- Another look
- Key-value store 101
- Implement the windowed proximity analyzer
- Event time and other times for events
- Windowing watermark
- Late events
- Summary
- Exercise
- 8 Join operations
- Joining emission data on the fly
- The emissions job version 1
- The emission resolver
- Accuracy becomes an issue
- The enhanced emissions job
- Focusing on the join
- What is a join again?
- How the stream join works
- Stream join is a different kind of fan-in
- Vehicle events vs. temperature events
- Table: A materialized view of streaming
- Vehicle events are less efficient to be materialized
- Data integrity quickly became an issue
- What's the problem with this join operator?
- Inner join
- Outer join
- The inner join vs. outer join
- Different types of joins
- Outer joins in streaming systems
- A new issue: Weak connection
- Windowed joins
- Joining two tables instead of joining a stream and table
- Revisiting the materialized view
- Summary
- 9 Backpressure
- Reliability is critical
- Review the system
- Streamlining streaming jobs
- New concepts: Capacity, utilization, and headroom
- More about utilization and headroom
- New concept: Backpressure
- Measure capacity utilization
- Backpressure in the Streamwork engine
- Backpressure in the Streamwork engine: Propagation
- Our streaming job during a backpressure
- Backpressure in distributed systems
- New concept: Backpressure watermarks
- Another approach to handle lagging instances: Dropping events
- Why do we want to drop events?
- Backpressure could be a symptom when the underlying issue is permanent
- Stopping and resuming may lead to thrashing if the issue is permanent
- Handle thrashing
- Summary
- 10 Stateful computation
- The migration of the streaming jobs
- Stateful components in the system usage job
- Revisit: State
- The states in different components
- State data vs. temporary data
- Stateful vs. stateless components: The code
- The stateful source and operator in the system usage job
- States and checkpoints
- Checkpoint creation: Timing is hard
- Event-based timing
- Creating checkpoints with checkpoint events
- A checkpoint event is handled by instance executors
- A checkpoint event flowing through a job
- Creating checkpoints with checkpoint events at the instance level
- Checkpoint event synchronization
- Checkpoint loading and backward compatibility
- Checkpoint storage
- Stateful vs. stateless components
- Manually managed instance states
- Lambda architecture
- Summary
- Exercises
- 11 Wrap-up: Advanced concepts in streaming systems
- Is this really the end?
- Windowed computations
- The major window types
- Joining data in real time
- SQL vs. stream joins
- Inner joins vs. outer joins
- Unexpected things can happen in streaming systems
- Backpressure: Slow down sources or upstream components
- Another approach to handle lagging instances: Dropping events
- Backpressure can be a symptom when the underlying issue is permanent
- Stateful components with checkpoints
- Event-based timing
- Stateful vs. stateless components
- You did it!
- Appendix. Key concepts covered in this book
- index