Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Active Tasks
https://perceptual.actor/
Abstract
One of the ultimate promises of computer vision is to help robotic agents perform active tasks, like delivering packages or doing household chores. However, the conventional approach to solving “vision” is to define a set of offline recognition problems (e.g. object detection) and solve those first. This approach faces a challenge from the recent rise of Deep Reinforcement Learning frameworks that learn active tasks from scratch using images as input. This poses a set of fundamental questions: what is the role of computer vision if everything can be learned from scratch? Could intermediate vision tasks actually be useful for performing arbitrary downstream active tasks?
We show that proper use of mid-level perception confers significant advantages over training from scratch. We implement a perception module as a set of mid-level visual representations and demonstrate that learning active tasks with mid-level features is significantly more sample-efficient than learning from scratch and generalizes in situations where the from-scratch approach fails. However, we show that realizing these gains requires careful selection of the particular mid-level features for each downstream task. Finally, we put forth a simple and efficient perception module based on the results of our study, which can be adopted as a fairly generic perception module for active frameworks.
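To make this setup concrete, the sketch below (our illustration, not code released with the paper) shows the general pattern of training an active policy on top of frozen mid-level features rather than raw pixels. The MidLevelEncoder class is a hypothetical stand-in: in the actual study the features come from networks pretrained on mid-level vision tasks (e.g. depth or surface-normal estimation), which stay frozen while only the small policy head is trained with reinforcement learning.

```python
import torch
import torch.nn as nn

class MidLevelEncoder(nn.Module):
    """Stand-in for a pretrained mid-level vision network.
    In practice this would load frozen pretrained weights; here it is
    a small CNN so the sketch runs end to end."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, feat_dim)

    def forward(self, rgb):
        x = self.conv(rgb).flatten(1)
        return self.proj(x)

class PolicyHead(nn.Module):
    """Small policy network trained with RL on top of the frozen features."""
    def __init__(self, feat_dim=128, num_actions=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, feats):
        return self.mlp(feats)

encoder = MidLevelEncoder()
encoder.eval()
for p in encoder.parameters():       # freeze the perception module
    p.requires_grad_(False)

policy = PolicyHead()
obs = torch.rand(1, 3, 84, 84)       # dummy RGB observation
with torch.no_grad():
    feats = encoder(obs)              # mid-level representation
action_logits = policy(feats)         # only the policy receives gradients
```

The design point this illustrates is the one tested in the paper: the visual representation is not learned jointly with the task, so the sample complexity of the RL problem is reduced and the perception module can transfer across spaces.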
We test three core hypotheses:
I. whether mid-level vision provides an advantage in terms of sample efficiency when learning an active task (answer: yes)
II. whether mid-level vision provides an advantage in generalization to unseen spaces (answer: yes)
III. whether a single fixed mid-level vision feature could suffice, or whether a set of features is essential to support arbitrary active tasks (answer: a set is essential).
Hypothesis I: Does mid-level vision provide an advantage in terms of sample efficiency when learning an active task?
Hypothesis II: Do mid-level vision features generalize better to unseen spaces than features learned from scratch?
Hypothesis III: Can a single feature support arbitrary downstream tasks, or is a set of features required?