Creating millions of user sessions using Complex Event Processing -- Prem Santosh & Udaya Shankar(Yelp)
Every day, Yelp connects millions of consumers with great local businesses through the website and mobile apps. We strive to provide our users with an ever-evolving, excellent experience by constantly running a plethora of experiments based on user activity.
每天,Yelp都通过网站和移动应用程序将数百万消费者与当地的大企业联系起来。我们通过不断地运行大量基于用户活动的实验,努力为用户提供不断发展、卓越的体验。
A user session encapsulates all of a single user’s activity until the user has been dormant for 30 minutes. Creating user sessions requires us to process hundreds of millions of log events occurring daily and applying filters on them. Due to the large volume of log events, creation of these sessions presents us with several application level challenges, including: handling of late events, filtering bot traffic, etc. Features like event time and exactly once processing that are provided by Flink made building such a large scale streaming application like ours possible.
用户会话将封装单个用户的所有活动,直到用户休眠30分钟。创建用户会话需要我们处理每天发生的数亿个日志事件,并对它们应用过滤器。由于大量的日志事件,创建这些会话会给我们带来几个应用程序级别的挑战,包括:处理延迟事件、过滤bot流量等。Flink提供的事件时间和一次性处理等功能使构建像我们这样的大规模流式应用程序成为可能。
Our main motivation to move towards streaming from batch processing stemmed from the fact that our feedback on analysis based on user sessions was always a day late and as an added bonus it also meant integrating with our state-of-the-art data-pipeline ecosystem.
我们从批处理转向流式处理的主要动机源于这样一个事实:我们对基于用户会话的分析的反馈总是晚了一天,作为额外的奖励,它还意味着要与我们最先进的数据管道生态系统集成。
In this talk we will not only discuss why Yelp moved from creating user sessions using batch jobs to generating them in near-real-time using Apache Flink but also highlight issues we encountered with continuous bot traffic that never closed the session window, adding custom triggers for long running sessions, duplicate events while allowing late events to be processed, auditing of the created sessions etc.
在本次讨论中,我们不仅将讨论Yelp为什么从使用批处理作业创建用户会话转移到使用Flink近实时生成用户会话,还将重点讨论我们在不关闭会话窗口的连续bot通信中遇到的问题,为长时间运行的会话添加自定义触发器,在允许延迟事件的同时复制事件。要处理的事件、已创建会话的审核等。