Flink Forward 2019 -- In Practice (9) -- Yelp Shares Its CEP Application

阿泽
Published 2019-07-11 17:55:20
This article is included in the column: Flink实战应用指南

Creating millions of user sessions using Complex Event Processing -- Prem Santosh & Udaya Shankar (Yelp)

Every day, Yelp connects millions of consumers with great local businesses through the website and mobile apps. We strive to provide our users with an ever-evolving, excellent experience by constantly running a plethora of experiments based on user activity.

A user session encapsulates all of a single user's activity until the user has been dormant for 30 minutes. Creating user sessions requires us to process hundreds of millions of log events occurring daily and apply filters to them. Because of this volume, creating these sessions presents several application-level challenges, including handling late events and filtering bot traffic. Flink features such as event time and exactly-once processing made it possible to build a streaming application at this scale.
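
The session definition above (activity grouped until 30 minutes of inactivity) can be illustrated with a minimal sketch. This is plain Python over an in-memory list, not Flink code and not Yelp's implementation; the function name `sessionize` and the `(user_id, event_time)` tuple layout are assumptions made for illustration. Sorting by event time rather than arrival order shows why event time matters for late events.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # 30 minutes of inactivity closes a session


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user_id, event_time) pairs into per-user sessions.

    Hypothetical helper: event_time is in seconds. Events are ordered
    by event time, not arrival order, so a late arrival still lands in
    the session it belongs to.
    Returns a list of (user_id, start, end, event_count) tuples.
    """
    by_user = defaultdict(list)
    for user_id, ts in events:
        by_user[user_id].append(ts)

    sessions = []
    for user_id, timestamps in by_user.items():
        timestamps.sort()
        start = last = timestamps[0]
        count = 1
        for ts in timestamps[1:]:
            if ts - last >= gap:
                # The user was dormant for the full gap: close the session.
                sessions.append((user_id, start, last, count))
                start, count = ts, 0
            last = ts
            count += 1
        sessions.append((user_id, start, last, count))
    return sessions
```

In a Flink job the equivalent grouping would typically come from event-time session windows keyed by user, with the runtime handling state and lateness; the sketch only shows the gap rule itself.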

Our main motivation for moving from batch processing to streaming was that our feedback on analyses based on user sessions was always a day late. As an added bonus, the move also meant integrating with our state-of-the-art data-pipeline ecosystem.

In this talk we will discuss not only why Yelp moved from creating user sessions with batch jobs to generating them in near real time with Apache Flink, but also the issues we encountered along the way, among them: continuous bot traffic that never closed the session window, adding custom triggers for long-running sessions, duplicate events while allowing late events to be processed, and auditing of the created sessions.
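
Two of the issues listed above, bot traffic that never lets a session close and duplicate events, can be sketched in plain Python. The abstract does not say how Yelp's custom trigger works, so everything here is an assumption for illustration: the 4-hour cap `MAX_SESSION_SECONDS`, the `event_id` field used for deduplication, and the function name `split_sessions`.

```python
SESSION_GAP_SECONDS = 30 * 60        # inactivity gap from the session definition
MAX_SESSION_SECONDS = 4 * 60 * 60    # hypothetical cap for long-running sessions


def split_sessions(events, gap=SESSION_GAP_SECONDS,
                   max_duration=MAX_SESSION_SECONDS):
    """Sessionize one user's (event_id, event_time) pairs.

    Illustrative only: duplicates are dropped by event_id, and a
    session is force-closed once it spans max_duration, so traffic
    that never pauses cannot keep a single session open forever.
    Returns a list of sessions, each a sorted list of event times.
    """
    seen_ids = set()
    timestamps = []
    for event_id, ts in events:
        if event_id in seen_ids:
            continue  # duplicate delivery of an already-seen event
        seen_ids.add(event_id)
        timestamps.append(ts)
    timestamps.sort()

    sessions = []
    current = []
    for ts in timestamps:
        gap_expired = bool(current) and ts - current[-1] >= gap
        too_long = bool(current) and ts - current[0] >= max_duration
        if gap_expired or too_long:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions
```

For example, a client that emits an event every 10 minutes for 5 hours never triggers the 30-minute gap, but under the assumed cap its activity is split into a 4-hour session followed by a shorter one.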

Originally published 2019-07-02 on the WeChat official account Flink实战应用指南.