Sumario: | Facing rapid growth and competition in its online business, Pinterest's tech stack demanded large-scale stateful online processing technology to unlock multiple top initiatives. Pinterest explored options ranging from lambda-function-style microservices to Kafka Streams and Spark Streaming, and decided to go with a single open source stream processing platform powered by Apache Flink. Starting from a legacy batch-only data stack and user teams with varying levels of expertise, Pinterest's stream processing platform has evolved in multiple stages into a reliable platform that enables data-driven products and timely decision making. In the first stage, facing the demand of running large stateful applications, we built connectors that worked with Pinterest infrastructure and streamlined user education with embedded training to fine-tune each use case. In the next stage, we onboarded ads real-time ingestion, which predicts user matches with 50%+ accuracy in near real time. We also onboarded the unified tier 1 advertiser spend calculation, which has a very stringent SLO, on a very tight timeline. As the platform grew, we had to build a more scalable way to support engineers without a stream processing or infrastructure background and unblock real-time machine learning use cases. Thus, reprocessing, a backfill source, live debugging automation, verification and deployment automation, as well as a dependency-injection-based job editor were built into the platform and became widely adopted, bringing a large number of jobs to production in a relatively short time. Eventually, features like unified shopping catalog indexing, content safety and trust, and deduplication of 4B+ images on the platform in near real time were built and launched. With additional investment in technology and ecosystem teams, Pinterest's stream processing platform is accelerating the transformation of machine learning use cases, online processing, and data warehousing into near real-time systems.
This case study reviews how Pinterest chose Apache Flink as the technology behind its stream processing platform, how the platform enabled critical use cases and a user base that scaled out and evolved along with product innovation, and the lessons learned in implementing and developing the platform.

What you will learn--and how to apply it

By the end of this case study the viewer will understand:
How and why Pinterest chose Apache Flink over other stream processing offerings
How Pinterest uses stream processing to unlock business value and what it built to enable those accomplishments
How the Pinterest stream processing team evolved and grew its offerings to scale out in alignment with business needs in different phases

And the viewer will be able to:
Kick-start building a stream processing platform
Evaluate whether a use case fits stream processing
Avoid pitfalls Pinterest experienced in growing its stream processing platform

This case study is for you if...
You're a data infrastructure team lead or manager who is interested in implementing stream processing offerings in your organization
You're a business executive or technology leader who is looking to add value and transform current ETL-based data warehousing to near real-time

Recommended follow-up:
Unified Flink Source at Pinterest: Streaming Data Processing (article)
Detecting Image Similarity in (Near) Real-time Using Apache Flink (article)
Pinterest Visual Signals Infrastructure: Evolution from Lambda to Kappa Architecture (article)
Real-time experiment analytics at Pinterest using Apache Flink (article)
|