Conference item

EPIC-fusion: audio-visual temporal binding for egocentric action recognition

Abstract:

We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal binding, i.e. the combination of modalities within a range of temporal offsets. We train the architecture with three modalities - RGB, Flow and Audio - and combine them with mid-level fusion alongside sparse temporal sampling of fused representations. In contrast with previous works, modalities are fused before temporal aggregation, with shared modality and fusion weights...
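The ordering described in the abstract (sample each modality within a window of temporal offsets, fuse the modalities, then aggregate over time) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature extractor, the window size, the concatenation plus linear layer with ReLU used for fusion, and the averaging used for aggregation are all assumptions chosen for illustration.

```python
import random

MODALITIES = ("rgb", "flow", "audio")


def sample_binding_windows(num_frames, num_segments, window, modalities, rng):
    """Sparse sampling: one binding window per segment.

    Each modality gets its own index within +/- `window` of the segment
    centre, so modalities may be temporally offset from one another.
    """
    seg_len = num_frames / num_segments
    samples = []
    for k in range(num_segments):
        centre = int((k + 0.5) * seg_len)
        lo = max(0, centre - window)
        hi = min(num_frames - 1, centre + window)
        samples.append({m: rng.randint(lo, hi) for m in modalities})
    return samples


def toy_feature(modality, t):
    """Hypothetical stand-in for a per-modality network's output at time t."""
    return [float(t + MODALITIES.index(modality)), float(t % 3)]


def fuse(features, weights, bias):
    """Mid-level fusion (assumed form): concatenate, then linear layer + ReLU."""
    x = [v for m in sorted(features) for v in features[m]]
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def aggregate(fused_per_segment):
    """Temporal aggregation (assumed form): average over segments."""
    n = len(fused_per_segment)
    return [sum(col) / n for col in zip(*fused_per_segment)]


rng = random.Random(0)
samples = sample_binding_windows(
    num_frames=90, num_segments=3, window=5, modalities=MODALITIES, rng=rng
)

# One set of fusion weights, reused for every segment (shared over time).
in_dim, fused_dim = 2 * len(MODALITIES), 4
weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(fused_dim)]
bias = [0.0] * fused_dim

# Fuse within each binding window first, then aggregate across windows.
fused = [
    fuse({m: toy_feature(m, t) for m, t in seg.items()}, weights, bias)
    for seg in samples
]
video_repr = aggregate(fused)
```

The point of the sketch is the order of operations: `fuse` runs once per binding window with the same weights, and only the fused vectors are passed to `aggregate`, in contrast with fusing per-modality predictions after each modality has been aggregated separately.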

Publication status:
Published
Peer review status:
Peer reviewed

Access Document


Files:
Publisher copy:
10.1109/ICCV.2019.00559

Authors


Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Oxford college:
Brasenose College
Role:
Author
ORCID:
0000-0002-8945-8573
Publisher:
Institute of Electrical and Electronics Engineers
Host title:
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Pages:
5491-5500
Publication date:
2020-02-27
Acceptance date:
2019-07-22
Event title:
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
Event location:
Seoul, South Korea
Event website:
http://iccv2019.thecvf.com/
Event start date:
2019-10-27
Event end date:
2019-11-02
DOI:
10.1109/ICCV.2019.00559
EISSN:
2380-7504
ISSN:
1550-5499
EISBN:
9781728148038
ISBN:
9781728148045
Language:
English
Keywords:
Pubs id:
pubs:1060199
UUID:
uuid:4c4ef89e-4342-46d2-924a-3819443dfff9
Local pid:
pubs:1060199
Source identifiers:
1060199
Deposit date:
2019-10-04
