Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Active Tasks

Author: CreateAMind
Published: 2019-04-28 06:57:48

Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Active Tasks

https://perceptual.actor/

Abstract

One of the ultimate promises of computer vision is to help robotic agents perform active tasks, like delivering packages or doing household chores. However, the conventional approach to solving "vision" is to define a set of offline recognition problems (e.g. object detection) and solve those first. This approach faces a challenge from the recent rise of Deep Reinforcement Learning frameworks that learn active tasks from scratch using images as input. This poses a set of fundamental questions: what is the role of computer vision if everything can be learned from scratch? Could intermediate vision tasks actually be useful for performing arbitrary downstream active tasks?

We show that proper use of mid-level perception confers significant advantages over training from scratch. We implement a perception module as a set of mid-level visual representations and demonstrate that learning active tasks with mid-level features is significantly more sample-efficient than scratch and able to generalize in situations where the from-scratch approach fails. However, we show that realizing these gains requires careful selection of the particular mid-level features for each downstream task. Finally, we put forth a simple and efficient perception module based on the results of our study, which can be adopted as a rather generic perception module for active frameworks.
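The core architectural idea described above — a frozen, pretrained mid-level vision encoder whose outputs feed a small trainable policy — can be illustrated with a minimal sketch. This is a toy stand-in, not the paper's actual networks: the encoder here is a random fixed linear map followed by `tanh`, and the feature/action dimensions are invented for illustration. The only point it demonstrates is the division of labor: the encoder weights are never updated while the policy head learns on top of them.

```python
import numpy as np

# Hypothetical stand-in for a frozen mid-level vision network
# (e.g. a pretrained depth or surface-normal estimator). In the
# setup described above these weights are pretrained offline and
# NOT updated while the active task is being learned.
rng = np.random.default_rng(0)
W_frozen = rng.standard_normal((16, 64)) / 8.0  # fixed encoder weights

def midlevel_features(image_flat):
    """Map a flattened image to a compact mid-level representation."""
    return np.tanh(W_frozen @ image_flat)  # (16,) feature vector

class FeaturePolicy:
    """Tiny linear policy trained only on top of the frozen features."""
    def __init__(self, n_features=16, n_actions=4):
        self.W = np.zeros((n_actions, n_features))

    def act(self, features):
        # Greedy action under the current linear scores.
        return int(np.argmax(self.W @ features))

    def update(self, features, action, advantage, lr=0.1):
        # Simplified policy-gradient-style update on the head only;
        # the encoder weights W_frozen are never touched.
        self.W[action] += lr * advantage * features

policy = FeaturePolicy()
obs = rng.standard_normal(64)    # stand-in for a flattened image
f = midlevel_features(obs)       # frozen perception module
a = policy.act(f)                # policy acts on features, not pixels
policy.update(f, a, advantage=1.0)
```

Because the policy sees a compact 16-dimensional feature vector instead of raw pixels, it has far fewer parameters to fit, which is one intuition for the sample-efficiency gains reported above.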

We test three core hypotheses:

I. if mid-level vision provides an advantage in terms of sample efficiency of learning an active task (answer: yes)

II. if mid-level vision provides an advantage towards generalization to unseen spaces (answer: yes)

III. if a fixed mid-level vision feature could suffice or a set of features would be essential to support arbitrary active tasks (answer: a set is essential).

Hypothesis I: Does mid-level vision provide an advantage in terms of sample efficiency when learning an active task?

Hypothesis II: Can mid-level vision features generalize better to unseen spaces?

Hypothesis III: Can a single feature support all arbitrary downstream tasks? Or is a set of features required for that?
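Hypothesis III can be made concrete with a toy selection exercise. The feature names and reward scores below are invented purely for illustration (they are not the paper's measurements): each candidate mid-level feature is scored on several hypothetical downstream tasks, and the best feature per task is selected. When no single feature wins everywhere, supporting all tasks requires a set.

```python
# Toy illustration of Hypothesis III: hypothetical reward scores for
# three candidate mid-level features across three downstream tasks.
# All numbers and task names are invented for illustration only.
scores = {
    "depth":        {"navigation": 0.9, "exploration": 0.4, "local_planning": 0.6},
    "surface_norm": {"navigation": 0.5, "exploration": 0.8, "local_planning": 0.5},
    "segmentation": {"navigation": 0.4, "exploration": 0.5, "local_planning": 0.9},
}

def best_feature(task):
    """Pick the candidate feature with the highest score on one task."""
    return max(scores, key=lambda feat: scores[feat][task])

tasks = ["navigation", "exploration", "local_planning"]
per_task_best = {task: best_feature(task) for task in tasks}

# A different feature wins each task, so covering all tasks
# requires a *set* of features rather than any single one.
needed_set = set(per_task_best.values())
```

Under these (made-up) scores, `needed_set` contains all three features, mirroring the study's conclusion that a set of mid-level features, not one fixed feature, is essential for arbitrary active tasks.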

Originally published 2019-01-10 on the CreateAMind WeChat official account; shared via the Tencent Cloud self-media syndication program.
