first commit

admin
2026-05-20 15:05:35 +08:00
commit ac09b26253
2048 changed files with 189478 additions and 0 deletions


@@ -0,0 +1,46 @@
# EMANet
> [Expectation-Maximization Attention Networks for Semantic Segmentation](https://arxiv.org/abs/1907.13426)
## Introduction
<!-- [ALGORITHM] -->
<a href="https://xialipku.github.io/EMANet">Official Repo</a>
<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/ema_head.py#L80">Code Snippet</a>
## Abstract
<!-- [ABSTRACT] -->
Self-attention has been widely used for various tasks. It computes the representation of each position as a weighted sum of the features at all positions, and can therefore capture long-range relations in computer vision tasks. However, it is computationally expensive, because the attention maps are computed with respect to all other positions. In this paper, we formulate the attention mechanism in an expectation-maximization manner and iteratively estimate a much more compact set of bases upon which the attention maps are computed. By a weighted summation over these bases, the resulting representation is low-rank and suppresses noisy information from the input. The proposed Expectation-Maximization Attention (EMA) module is robust to the variance of the input and is also efficient in memory and computation. Moreover, we set up bases maintenance and normalization methods to stabilize its training procedure. We conduct extensive experiments on popular semantic segmentation benchmarks including PASCAL VOC, PASCAL Context and COCO Stuff, on which we set new records.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/142901186-7bfe15e2-805a-420e-81b0-74f214f20a36.png" width="80%"/>
</div>
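
The iteration described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `ema_head.py`: the function name `ema_attention`, the list-of-lists layout, the plain (untempered) softmax, and the L2 normalization of the bases are all assumptions made for readability.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def ema_attention(x, bases, num_iters=3):
    """Illustrative EM attention (hypothetical helper, num_iters >= 1).

    x:     N feature vectors, each of length C
    bases: K initial bases, each of length C
    """
    n, k, c = len(x), len(bases), len(x[0])
    for _ in range(num_iters):
        # E-step: responsibility of every base for every position (rows sum to 1).
        z = [softmax([sum(xi[d] * b[d] for d in range(c)) for b in bases])
             for xi in x]
        # M-step: each base becomes the responsibility-weighted mean of the features.
        new_bases = []
        for j in range(k):
            weight = sum(z[i][j] for i in range(n)) + 1e-6
            b = [sum(z[i][j] * x[i][d] for i in range(n)) / weight
                 for d in range(c)]
            # Assumed normalization step: rescale each base to unit L2 norm.
            norm = math.sqrt(sum(v * v for v in b)) + 1e-6
            new_bases.append([v / norm for v in b])
        bases = new_bases
    # Re-estimation: every position is a weighted sum of only K bases,
    # so the reconstructed feature map has rank at most K.
    recon = [[sum(z[i][j] * bases[j][d] for j in range(k)) for d in range(c)]
             for i in range(n)]
    return recon, z, bases


random.seed(0)
features = [[random.gauss(0, 1) for _ in range(4)] for _ in range(10)]
init_bases = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]
recon, z, mu = ema_attention(features, init_bases)
```

With `N` positions and `K` bases, each iteration touches `N x K` responsibilities instead of the `N x N` attention map of full self-attention, which is where the memory and compute savings come from.
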
## Results and models
### Cityscapes
| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------: | -------: | -------------- | ------ | ----: | ------------- | -------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| EMANet | R-50-D8 | 512x1024 | 80000 | 5.4 | 4.58 | V100 | 77.59 | 79.44 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/emanet/emanet_r50-d8_4xb2-80k_cityscapes-512x1024.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_512x1024_80k_cityscapes/emanet_r50-d8_512x1024_80k_cityscapes_20200901_100301-c43fcef1.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_512x1024_80k_cityscapes/emanet_r50-d8_512x1024_80k_cityscapes-20200901_100301.log.json) |
| EMANet | R-101-D8 | 512x1024 | 80000 | 6.2 | 2.87 | V100 | 79.10 | 81.21 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/emanet/emanet_r101-d8_4xb2-80k_cityscapes-512x1024.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_512x1024_80k_cityscapes/emanet_r101-d8_512x1024_80k_cityscapes_20200901_100301-2d970745.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_512x1024_80k_cityscapes/emanet_r101-d8_512x1024_80k_cityscapes-20200901_100301.log.json) |
| EMANet | R-50-D8 | 769x769 | 80000 | 8.9 | 1.97 | V100 | 79.33 | 80.49 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/emanet/emanet_r50-d8_4xb2-80k_cityscapes-769x769.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_769x769_80k_cityscapes/emanet_r50-d8_769x769_80k_cityscapes_20200901_100301-16f8de52.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_769x769_80k_cityscapes/emanet_r50-d8_769x769_80k_cityscapes-20200901_100301.log.json) |
| EMANet | R-101-D8 | 769x769 | 80000 | 10.1 | 1.22 | V100 | 79.62 | 81.00 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/emanet/emanet_r101-d8_4xb2-80k_cityscapes-769x769.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_769x769_80k_cityscapes/emanet_r101-d8_769x769_80k_cityscapes_20200901_100301-47a324ce.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_769x769_80k_cityscapes/emanet_r101-d8_769x769_80k_cityscapes-20200901_100301.log.json) |
## Citation
```bibtex
@inproceedings{li2019expectation,
title={Expectation-maximization attention networks for semantic segmentation},
author={Li, Xia and Zhong, Zhisheng and Wu, Jianlong and Yang, Yibo and Lin, Zhouchen and Liu, Hong},
booktitle={Proceedings of the IEEE International Conference on Computer Vision},
pages={9167--9176},
year={2019}
}
```


@@ -0,0 +1,2 @@
_base_ = './emanet_r50-d8_4xb2-80k_cityscapes-512x1024.py'
model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101))
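
The two-line config above relies on `_base_` inheritance: keys set in the child are merged recursively into the inherited dictionaries, so only `pretrained` and `backbone.depth` change. The sketch below imitates that merge in plain Python under simplifying assumptions. `merge_config` is a hypothetical helper, not the library's implementation, and the `base` dictionary is an abbreviated stand-in for the inherited R-50 config.

```python
import copy


def merge_config(base, child):
    """Recursively merge `child` into a copy of `base` (simplified sketch)."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)
            # `_delete_=True` asks for replacement instead of a recursive merge.
            delete = value.pop('_delete_', False)
            if not delete and isinstance(merged.get(key), dict):
                merged[key] = merge_config(merged[key], value)
                continue
        merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical, abbreviated stand-in for the inherited R-50 config.
base = dict(
    model=dict(
        pretrained='open-mmlab://resnet50_v1c',
        backbone=dict(type='ResNetV1c', depth=50)))

# The override from the R-101 config.
child = dict(
    model=dict(
        pretrained='open-mmlab://resnet101_v1c',
        backbone=dict(depth=101)))

merged = merge_config(base, child)
```

Keys that the child does not mention, such as `backbone.type` here, are inherited unchanged, which is why the R-101 variants stay so short.
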


@@ -0,0 +1,2 @@
_base_ = './emanet_r50-d8_4xb2-80k_cityscapes-769x769.py'
model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101))


@@ -0,0 +1,7 @@
_base_ = [
    '../_base_/models/emanet_r50-d8.py', '../_base_/datasets/cityscapes.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 1024)
data_preprocessor = dict(size=crop_size)
model = dict(data_preprocessor=data_preprocessor)


@@ -0,0 +1,12 @@
_base_ = [
    '../_base_/models/emanet_r50-d8.py',
    '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py'
]
crop_size = (769, 769)
data_preprocessor = dict(size=crop_size)
model = dict(
    data_preprocessor=data_preprocessor,
    decode_head=dict(align_corners=True),
    auxiliary_head=dict(align_corners=True),
    test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513)))
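
To get a feel for what `mode='slide'` with `crop_size=(769, 769)` and `stride=(513, 513)` implies, the sketch below enumerates the crop windows for one image. It is an illustration of a common tiling scheme, not a copy of the library's inference code. `sliding_windows` is a hypothetical helper, and the 1024x2048 input size is an assumption based on the native Cityscapes resolution.

```python
def sliding_windows(img_h, img_w, crop, stride):
    """Return (y1, x1, y2, x2) crop windows that cover an img_h x img_w image."""
    h_crop, w_crop = crop
    h_stride, w_stride = stride
    # Number of window positions needed along each axis.
    h_grids = max(img_h - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(img_w - w_crop + w_stride - 1, 0) // w_stride + 1
    windows = []
    for i in range(h_grids):
        for j in range(w_grids):
            # Clamp the last window to the image border and shift it back
            # so that every window keeps the full crop size.
            y2 = min(i * h_stride + h_crop, img_h)
            x2 = min(j * w_stride + w_crop, img_w)
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            windows.append((y1, x1, y2, x2))
    return windows


# Assumed input size: a full-resolution Cityscapes frame (1024 x 2048).
windows = sliding_windows(1024, 2048, crop=(769, 769), stride=(513, 513))
print(len(windows))  # 2 rows x 4 columns = 8 windows
```

Neighbouring windows overlap, so predictions in the overlap are typically averaged. A smaller stride gives more overlap and more forward passes per image.
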


@@ -0,0 +1,109 @@
Collections:
- Name: EMANet
  License: Apache License 2.0
  Metadata:
    Training Data:
    - Cityscapes
  Paper:
    Title: Expectation-Maximization Attention Networks for Semantic Segmentation
    URL: https://arxiv.org/abs/1907.13426
  README: configs/emanet/README.md
  Frameworks:
  - PyTorch
Models:
- Name: emanet_r50-d8_4xb2-80k_cityscapes-512x1024
  In Collection: EMANet
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 77.59
      mIoU(ms+flip): 79.44
  Config: configs/emanet/emanet_r50-d8_4xb2-80k_cityscapes-512x1024.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - R-50-D8
    - EMANet
    Training Resources: 4x V100 GPUS
    Memory (GB): 5.4
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_512x1024_80k_cityscapes/emanet_r50-d8_512x1024_80k_cityscapes_20200901_100301-c43fcef1.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_512x1024_80k_cityscapes/emanet_r50-d8_512x1024_80k_cityscapes-20200901_100301.log.json
  Paper:
    Title: Expectation-Maximization Attention Networks for Semantic Segmentation
    URL: https://arxiv.org/abs/1907.13426
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/ema_head.py#L80
  Framework: PyTorch
- Name: emanet_r101-d8_4xb2-80k_cityscapes-512x1024
  In Collection: EMANet
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 79.1
      mIoU(ms+flip): 81.21
  Config: configs/emanet/emanet_r101-d8_4xb2-80k_cityscapes-512x1024.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - R-101-D8
    - EMANet
    Training Resources: 4x V100 GPUS
    Memory (GB): 6.2
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_512x1024_80k_cityscapes/emanet_r101-d8_512x1024_80k_cityscapes_20200901_100301-2d970745.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_512x1024_80k_cityscapes/emanet_r101-d8_512x1024_80k_cityscapes-20200901_100301.log.json
  Paper:
    Title: Expectation-Maximization Attention Networks for Semantic Segmentation
    URL: https://arxiv.org/abs/1907.13426
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/ema_head.py#L80
  Framework: PyTorch
- Name: emanet_r50-d8_4xb2-80k_cityscapes-769x769
  In Collection: EMANet
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 79.33
      mIoU(ms+flip): 80.49
  Config: configs/emanet/emanet_r50-d8_4xb2-80k_cityscapes-769x769.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - R-50-D8
    - EMANet
    Training Resources: 4x V100 GPUS
    Memory (GB): 8.9
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_769x769_80k_cityscapes/emanet_r50-d8_769x769_80k_cityscapes_20200901_100301-16f8de52.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_769x769_80k_cityscapes/emanet_r50-d8_769x769_80k_cityscapes-20200901_100301.log.json
  Paper:
    Title: Expectation-Maximization Attention Networks for Semantic Segmentation
    URL: https://arxiv.org/abs/1907.13426
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/ema_head.py#L80
  Framework: PyTorch
- Name: emanet_r101-d8_4xb2-80k_cityscapes-769x769
  In Collection: EMANet
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 79.62
      mIoU(ms+flip): 81.0
  Config: configs/emanet/emanet_r101-d8_4xb2-80k_cityscapes-769x769.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - R-101-D8
    - EMANet
    Training Resources: 4x V100 GPUS
    Memory (GB): 10.1
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_769x769_80k_cityscapes/emanet_r101-d8_769x769_80k_cityscapes_20200901_100301-47a324ce.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_769x769_80k_cityscapes/emanet_r101-d8_769x769_80k_cityscapes-20200901_100301.log.json
  Paper:
    Title: Expectation-Maximization Attention Networks for Semantic Segmentation
    URL: https://arxiv.org/abs/1907.13426
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/ema_head.py#L80
  Framework: PyTorch


@@ -0,0 +1,105 @@
_base_ = [
    '../_base_/models/emanet_r50-d8.py',
    '../_base_/datasets/my_dataset_model.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_40k_check_4000.py',
]
# NOTE: this top-level value is not referenced below, so on its own it does
# not change the norm layers configured by the base model file.
norm_cfg = dict(
    type='BN',
)
crop_size = (512, 1024)
data_preprocessor = dict(
    size=(512, 1024),
    mean=[
        94.94709810464303,
        61.72942233949928,
        75.93763705236906,
    ],
    std=[
        44.005506081132594,
        42.69595666984776,
        44.99354156225523,
    ],
    bgr_to_rgb=False,
)
model = dict(
    pretrained='./My_Local_Model/open_mmlab/resnet101_v1c.pth',
    backbone=dict(
        depth=101,
    ),
    data_preprocessor=dict(
        size=(512, 1024),
        mean=[
            94.94709810464303,
            61.72942233949928,
            75.93763705236906,
        ],
        std=[
            44.005506081132594,
            42.69595666984776,
            44.99354156225523,
        ],
        bgr_to_rgb=False,
    ),
    decode_head=dict(
        num_classes=36,
        loss_decode=dict(
            type='DiceLoss',
            use_sigmoid=False,
            loss_weight=1.0,
        ),
        align_corners=True,
    ),
    auxiliary_head=dict(
        num_classes=36,
        loss_decode=dict(
            type='DiceLoss',
            use_sigmoid=False,
            loss_weight=0.4,
        ),
        align_corners=True,
    ),
    # Inference settings are read from `model.test_cfg`; a top-level
    # `test_cfg` would instead be treated as the runner's test-loop config.
    test_cfg=dict(
        mode='slide',
        crop_size=(512, 1024),
        stride=(341, 682),
    ),
)
optim_wrapper = dict(
    type='OptimWrapper',
    _delete_=True,
    optimizer=dict(
        type='AdamW',
        lr=0.0001,
        weight_decay=0.0005,
    ),
    clip_grad=dict(
        max_norm=1,
        norm_type=2,
    ),
)
param_scheduler = [
    # Linear warm-up over the first 1500 iterations.
    dict(
        type='LinearLR',
        start_factor=1e-06,
        by_epoch=False,
        begin=0,
        end=1500,
    ),
    # Polynomial decay from iteration 1500 to 40000, floored at eta_min.
    dict(
        type='PolyLR',
        power=0.9,
        begin=1500,
        end=40000,
        eta_min=1e-05,
        by_epoch=False,
    ),
]
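
The `param_scheduler` above chains a linear warm-up with a polynomial decay. The sketch below evaluates a closed-form approximation of that curve using the values from this config (`lr=0.0001`, warm-up until iteration 1500, `power=0.9`, `eta_min=1e-05`, 40000 iterations). `lr_at` is a hypothetical helper. The library's schedulers update the rate step by step and may differ from this formula by an off-by-one iteration, so treat the numbers as indicative only.

```python
def lr_at(it, base_lr=1e-4, warmup_end=1500, max_iters=40000,
          start_factor=1e-6, power=0.9, eta_min=1e-5):
    """Approximate learning rate at iteration `it` (closed-form sketch)."""
    if it < warmup_end:
        # Linear warm-up: the scale factor grows from start_factor to 1.0.
        factor = start_factor + (1.0 - start_factor) * it / warmup_end
        return base_lr * factor
    # Polynomial decay from base_lr down to eta_min.
    progress = (it - warmup_end) / (max_iters - warmup_end)
    return (base_lr - eta_min) * (1.0 - progress) ** power + eta_min


for it in (0, 750, 1500, 20000, 40000):
    print(f'iter {it:>5}: lr = {lr_at(it):.3e}')
```

The warm-up keeps early AdamW steps small while the newly initialized heads settle, and `eta_min` prevents the rate from decaying all the way to zero at the end of training.
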


@@ -0,0 +1,103 @@
_base_ = [
    '../_base_/models/emanet_r50-d8.py',
    '../_base_/datasets/my_dataset_model.py',
    '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_40k_check_4000.py',
]
# NOTE: this top-level value is not referenced below, so on its own it does
# not change the norm layers configured by the base model file.
norm_cfg = dict(
    type='BN',
)
crop_size = (512, 512)
data_preprocessor = dict(
    size=(512, 512),
    mean=[
        94.94709810464303,
        61.72942233949928,
        75.93763705236906,
    ],
    std=[
        44.005506081132594,
        42.69595666984776,
        44.99354156225523,
    ],
    bgr_to_rgb=False,
)
model = dict(
    pretrained='./My_Local_Model/open_mmlab/resnet101_v1c.pth',
    backbone=dict(
        depth=101,
    ),
    data_preprocessor=dict(
        size=(512, 512),
        mean=[
            94.94709810464303,
            61.72942233949928,
            75.93763705236906,
        ],
        std=[
            44.005506081132594,
            42.69595666984776,
            44.99354156225523,
        ],
        bgr_to_rgb=False,
    ),
    decode_head=dict(
        num_classes=36,
        loss_decode=dict(
            type='DiceLoss',
            use_sigmoid=False,
            loss_weight=1.0,
        ),
        align_corners=False,
    ),
    auxiliary_head=dict(
        num_classes=36,
        loss_decode=dict(
            type='DiceLoss',
            use_sigmoid=False,
            loss_weight=0.4,
        ),
        align_corners=False,
    ),
    # Inference settings are read from `model.test_cfg`; a top-level
    # `test_cfg` would instead be treated as the runner's test-loop config.
    test_cfg=dict(
        crop_size=(512, 512),
    ),
)
optim_wrapper = dict(
    type='OptimWrapper',
    _delete_=True,
    optimizer=dict(
        type='AdamW',
        lr=0.0001,
        weight_decay=0.0005,
    ),
    clip_grad=dict(
        max_norm=1,
        norm_type=2,
    ),
)
param_scheduler = [
    # Linear warm-up over the first 1500 iterations.
    dict(
        type='LinearLR',
        start_factor=1e-06,
        by_epoch=False,
        begin=0,
        end=1500,
    ),
    # Polynomial decay from iteration 1500 to 40000, floored at eta_min.
    dict(
        type='PolyLR',
        power=0.9,
        begin=1500,
        end=40000,
        eta_min=1e-05,
        by_epoch=False,
    ),
]
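
Both heads in this config are trained with `type='DiceLoss'` and `use_sigmoid=False`, that is, a Dice loss over per-class probabilities. The sketch below shows a generic multi-class soft Dice loss in plain Python to make the quantity concrete. `soft_dice_loss` is a hypothetical helper. The library's loss may use a different denominator or smoothing term, so this is not a reproduction of its exact values.

```python
def soft_dice_loss(probs, target, num_classes, eps=1e-6):
    """Generic multi-class soft Dice loss (illustrative sketch).

    probs:  one list of class probabilities per pixel
    target: one ground-truth class index per pixel
    """
    total = 0.0
    for c in range(num_classes):
        # Soft intersection: predicted probability mass on pixels of class c.
        inter = sum(p[c] for p, t in zip(probs, target) if t == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for t in target if t == c)
        dice = (2.0 * inter + eps) / (pred_sum + true_sum + eps)
        total += 1.0 - dice
    # Average over classes so rare classes count as much as frequent ones.
    return total / num_classes


# Two pixels, two classes: a confident correct and a confident wrong prediction.
perfect = soft_dice_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
wrong = soft_dice_loss([[0.0, 1.0], [1.0, 0.0]], [0, 1], num_classes=2)
```

Because each class contributes one overlap ratio regardless of how many pixels it covers, a Dice-style loss is a common choice when the 36 classes are strongly imbalanced. The `loss_weight` values (1.0 and 0.4) then scale the contributions of the decode and auxiliary heads.
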