first commit

This commit is contained in:
admin
2026-05-20 15:05:35 +08:00
commit ac09b26253
2048 changed files with 189478 additions and 0 deletions


@@ -0,0 +1,74 @@
# SETR
> [Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers](https://arxiv.org/abs/2012.15840)
## Introduction
<!-- [ALGORITHM] -->
<a href="https://github.com/fudan-zvg/SETR">Official Repo</a>
<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11">Code Snippet</a>
## Abstract
<!-- [ABSTRACT] -->
Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (ie, without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission.
<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/142902777-ee2d34b7-a631-4fa7-ad68-118ff5716afe.png" width="80%"/>
</div>
```None
This head has two versions.
```
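As a rough sanity check on how such an upsampling head restores resolution, the sketch below computes its total scale factor. It assumes each of the `num_convs` stages is followed by exactly one upsampling step of factor `up_scale`, and that the backbone uses 16x16 patches (suggested by the `p16` in the checkpoint name). The helper name `total_upsampling` is made up for illustration; the values `num_convs=2` and `up_scale=4` are those of the auxiliary heads in the Cityscapes PUP config.

```python
def total_upsampling(num_convs: int, up_scale: int) -> int:
    """Overall scale factor, assuming one `up_scale` upsample per conv stage."""
    return up_scale ** num_convs


patch_size = 16  # assumption, read off the "p16" checkpoint name
crop = 768       # Cityscapes crop size used in this README
tokens_per_side = crop // patch_size

# Auxiliary-head style setting: 2 stages, each upsampling by 4.
factor = total_upsampling(num_convs=2, up_scale=4)
restored = tokens_per_side * factor
print(tokens_per_side, factor, restored)
```

Under these assumptions the factor equals the patch size, so the head output would return to the crop resolution.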
## Usage
You can download the pretrained weights from [here](https://github.com/rwightman/pytorch-image-models/releases/download/v0.1-vitjx/jx_vit_large_p16_384-b3be5167.pth). Then convert the checkpoint keys with the script `vit2mmseg.py` in the tools directory.
```shell
python tools/model_converters/vit2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH}
```
E.g.
```shell
python tools/model_converters/vit2mmseg.py \
jx_vit_large_p16_384-b3be5167.pth pretrain/vit_large_p16.pth
```
This script converts the model at `PRETRAIN_PATH` and stores the converted model at `STORE_PATH`.
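To give an idea of what "converting keys" means, here is a minimal sketch of renaming state-dict keys by substring rules. It is not the logic of `vit2mmseg.py`: the function `rename_keys` and the entries in `EXAMPLE_RULES` are hypothetical and only illustrate the idea on a toy dictionary. The real mapping lives in the script itself.

```python
from collections import OrderedDict


def rename_keys(state_dict, rules):
    """Return a new dict with each key rewritten by the first matching rule."""
    converted = OrderedDict()
    for key, value in state_dict.items():
        new_key = key
        for old, new in rules:
            if old in new_key:
                new_key = new_key.replace(old, new)
                break  # apply at most one rule per key
        converted[new_key] = value
    return converted


# Hypothetical rename rules, for illustration only.
EXAMPLE_RULES = [
    ("patch_embed.proj", "patch_embed.projection"),
    ("blocks.", "layers."),
]

# Toy stand-in for a checkpoint; real values would be tensors.
toy = OrderedDict([
    ("patch_embed.proj.weight", 0),
    ("blocks.0.norm1.weight", 1),
    ("cls_token", 2),
])
converted = rename_keys(toy, EXAMPLE_RULES)
print(list(converted))
```

Keys that match no rule (here `cls_token`) pass through unchanged, and the values are never touched.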
## Results and models
### ADE20K
| Method | Backbone | Crop Size | Batch Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ---------- | -------- | --------- | ---------- | ------- | -------- | -------------- | ------ | ----- | ------------: | -------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| SETR Naive | ViT-L | 512x512 | 16 | 160000 | 18.40 | 4.72 | V100 | 48.28 | 49.56 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l_naive_8xb2-160k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_512x512_160k_b16_ade20k/setr_naive_512x512_160k_b16_ade20k_20210619_191258-061f24f5.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_512x512_160k_b16_ade20k/setr_naive_512x512_160k_b16_ade20k_20210619_191258.log.json) |
| SETR PUP | ViT-L | 512x512 | 16 | 160000 | 19.54 | 4.50 | V100 | 48.24 | 49.99 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l_pup_8xb2-160k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_512x512_160k_b16_ade20k/setr_pup_512x512_160k_b16_ade20k_20210619_191343-7e0ce826.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_512x512_160k_b16_ade20k/setr_pup_512x512_160k_b16_ade20k_20210619_191343.log.json) |
| SETR MLA | ViT-L | 512x512 | 8 | 160000 | 10.96 | - | V100 | 47.34 | 49.05 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l-mla_8xb1-160k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b8_ade20k/setr_mla_512x512_160k_b8_ade20k_20210619_191118-c6d21df0.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b8_ade20k/setr_mla_512x512_160k_b8_ade20k_20210619_191118.log.json) |
| SETR MLA | ViT-L | 512x512 | 16 | 160000 | 17.30 | 5.25 | V100 | 47.39 | 49.37 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l_mla_8xb2-160k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b16_ade20k/setr_mla_512x512_160k_b16_ade20k_20210619_191057-f9741de7.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b16_ade20k/setr_mla_512x512_160k_b16_ade20k_20210619_191057.log.json) |
### Cityscapes
| Method | Backbone | Crop Size | Batch Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ---------- | -------- | --------- | ---------- | ------- | -------- | -------------- | ------ | ----- | ------------: | ----------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| SETR Naive | ViT-L | 768x768 | 8 | 80000 | 24.06 | 0.39 | V100 | 78.10 | 80.22 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l_naive_8xb1-80k_cityscapes-768x768.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_vit-large_8x1_768x768_80k_cityscapes/setr_naive_vit-large_8x1_768x768_80k_cityscapes_20211123_000505-20728e80.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_vit-large_8x1_768x768_80k_cityscapes/setr_naive_vit-large_8x1_768x768_80k_cityscapes_20211123_000505.log.json) |
| SETR PUP | ViT-L | 768x768 | 8 | 80000 | 27.96 | 0.37 | V100 | 79.21 | 81.02 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l_pup_8xb1-80k_cityscapes-768x768.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_vit-large_8x1_768x768_80k_cityscapes/setr_pup_vit-large_8x1_768x768_80k_cityscapes_20211122_155115-f6f37b8f.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_vit-large_8x1_768x768_80k_cityscapes/setr_pup_vit-large_8x1_768x768_80k_cityscapes_20211122_155115.log.json) |
| SETR MLA | ViT-L | 768x768 | 8 | 80000 | 24.10 | 0.41 | V100 | 77.00 | 79.59 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/setr/setr_vit-l_mla_8xb1-80k_cityscapes-768x768.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_vit-large_8x1_768x768_80k_cityscapes/setr_mla_vit-large_8x1_768x768_80k_cityscapes_20211119_101003-7f8dccbe.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_vit-large_8x1_768x768_80k_cityscapes/setr_mla_vit-large_8x1_768x768_80k_cityscapes_20211119_101003.log.json) |
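The config names in the tables encode the training setup. The sketch below parses them, assuming `8xb2` means 8 GPUs with a per-GPU batch size of 2 (consistent with the `# num_gpus: 8 -> batch_size: 16` comments in the configs) and `160k` means 160000 iterations. The helper `parse_config_name` is hypothetical and not part of MMSegmentation.

```python
import re

# Matches e.g. "_8xb2-160k_ade20k-512x512" at the end of a config name.
_PATTERN = re.compile(r"_(\d+)xb(\d+)-(\d+)k_.*-(\d+)x(\d+)$")


def parse_config_name(name: str) -> dict:
    """Extract GPU count, batch sizes, iterations and crop from a config name."""
    match = _PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised config name: {name}")
    gpus, per_gpu, k_iters, a, b = map(int, match.groups())
    return {
        "gpus": gpus,
        "per_gpu_batch": per_gpu,
        "total_batch": gpus * per_gpu,
        "iters": k_iters * 1000,
        "crop": (a, b),
    }


info = parse_config_name("setr_vit-l_naive_8xb2-160k_ade20k-512x512")
print(info["total_batch"], info["iters"], info["crop"])
```

For the name above this gives a total batch size of 16 and 160000 iterations, matching the "Batch Size" and "Lr schd" columns of the first ADE20K row.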
## Citation
```bibtex
@article{zheng2020rethinking,
title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers},
author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip HS and others},
journal={arXiv preprint arXiv:2012.15840},
year={2020}
}
```


@@ -0,0 +1,197 @@
Collections:
- Name: SETR
  License: Apache License 2.0
  Metadata:
    Training Data:
    - ADE20K
    - Cityscapes
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  README: configs/setr/README.md
  Frameworks:
  - PyTorch
Models:
- Name: setr_vit-l_naive_8xb2-160k_ade20k-512x512
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 48.28
      mIoU(ms+flip): 49.56
  Config: configs/setr/setr_vit-l_naive_8xb2-160k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - ViT-L
    - SETR
    - Naive
    Training Resources: 8x V100 GPUS
    Memory (GB): 18.4
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_512x512_160k_b16_ade20k/setr_naive_512x512_160k_b16_ade20k_20210619_191258-061f24f5.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_512x512_160k_b16_ade20k/setr_naive_512x512_160k_b16_ade20k_20210619_191258.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch
- Name: setr_vit-l_pup_8xb2-160k_ade20k-512x512
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 48.24
      mIoU(ms+flip): 49.99
  Config: configs/setr/setr_vit-l_pup_8xb2-160k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - ViT-L
    - SETR
    - PUP
    Training Resources: 8x V100 GPUS
    Memory (GB): 19.54
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_512x512_160k_b16_ade20k/setr_pup_512x512_160k_b16_ade20k_20210619_191343-7e0ce826.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_512x512_160k_b16_ade20k/setr_pup_512x512_160k_b16_ade20k_20210619_191343.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch
- Name: setr_vit-l-mla_8xb1-160k_ade20k-512x512
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 47.34
      mIoU(ms+flip): 49.05
  Config: configs/setr/setr_vit-l-mla_8xb1-160k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 8
    Architecture:
    - ViT-L
    - SETR
    - MLA
    Training Resources: 8x V100 GPUS
    Memory (GB): 10.96
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b8_ade20k/setr_mla_512x512_160k_b8_ade20k_20210619_191118-c6d21df0.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b8_ade20k/setr_mla_512x512_160k_b8_ade20k_20210619_191118.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch
- Name: setr_vit-l_mla_8xb2-160k_ade20k-512x512
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 47.39
      mIoU(ms+flip): 49.37
  Config: configs/setr/setr_vit-l_mla_8xb2-160k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - ViT-L
    - SETR
    - MLA
    Training Resources: 8x V100 GPUS
    Memory (GB): 17.3
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b16_ade20k/setr_mla_512x512_160k_b16_ade20k_20210619_191057-f9741de7.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_512x512_160k_b16_ade20k/setr_mla_512x512_160k_b16_ade20k_20210619_191057.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch
- Name: setr_vit-l_naive_8xb1-80k_cityscapes-768x768
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 78.1
      mIoU(ms+flip): 80.22
  Config: configs/setr/setr_vit-l_naive_8xb1-80k_cityscapes-768x768.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - ViT-L
    - SETR
    - Naive
    Training Resources: 8x V100 GPUS
    Memory (GB): 24.06
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_vit-large_8x1_768x768_80k_cityscapes/setr_naive_vit-large_8x1_768x768_80k_cityscapes_20211123_000505-20728e80.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_naive_vit-large_8x1_768x768_80k_cityscapes/setr_naive_vit-large_8x1_768x768_80k_cityscapes_20211123_000505.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch
- Name: setr_vit-l_pup_8xb1-80k_cityscapes-768x768
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 79.21
      mIoU(ms+flip): 81.02
  Config: configs/setr/setr_vit-l_pup_8xb1-80k_cityscapes-768x768.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - ViT-L
    - SETR
    - PUP
    Training Resources: 8x V100 GPUS
    Memory (GB): 27.96
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_vit-large_8x1_768x768_80k_cityscapes/setr_pup_vit-large_8x1_768x768_80k_cityscapes_20211122_155115-f6f37b8f.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_pup_vit-large_8x1_768x768_80k_cityscapes/setr_pup_vit-large_8x1_768x768_80k_cityscapes_20211122_155115.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch
- Name: setr_vit-l_mla_8xb1-80k_cityscapes-768x768
  In Collection: SETR
  Results:
    Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 77.0
      mIoU(ms+flip): 79.59
  Config: configs/setr/setr_vit-l_mla_8xb1-80k_cityscapes-768x768.py
  Metadata:
    Training Data: Cityscapes
    Batch Size: 8
    Architecture:
    - ViT-L
    - SETR
    - MLA
    Training Resources: 8x V100 GPUS
    Memory (GB): 24.1
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_vit-large_8x1_768x768_80k_cityscapes/setr_mla_vit-large_8x1_768x768_80k_cityscapes_20211119_101003-7f8dccbe.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/setr/setr_mla_vit-large_8x1_768x768_80k_cityscapes/setr_mla_vit-large_8x1_768x768_80k_cityscapes_20211119_101003.log.json
  Paper:
    Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective
      with Transformers
    URL: https://arxiv.org/abs/2012.15840
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.17.0/mmseg/models/decode_heads/setr_up_head.py#L11
  Framework: PyTorch


@@ -0,0 +1,90 @@
_base_ = [
'../_base_/models/setr_mla.py', '../_base_/datasets/ade20k.py',
'../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(size=crop_size)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
data_preprocessor=data_preprocessor,
pretrained=None,
backbone=dict(
img_size=(512, 512),
drop_rate=0.,
init_cfg=dict(
type='Pretrained', checkpoint='pretrain/vit_large_p16.pth')),
decode_head=dict(num_classes=150),
auxiliary_head=[
dict(
type='FCNHead',
in_channels=256,
channels=256,
in_index=0,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=0,
kernel_size=1,
concat_input=False,
num_classes=150,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='FCNHead',
in_channels=256,
channels=256,
in_index=1,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=0,
kernel_size=1,
concat_input=False,
num_classes=150,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='FCNHead',
in_channels=256,
channels=256,
in_index=2,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=0,
kernel_size=1,
concat_input=False,
num_classes=150,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='FCNHead',
in_channels=256,
channels=256,
in_index=3,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=0,
kernel_size=1,
concat_input=False,
num_classes=150,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
],
test_cfg=dict(mode='slide', crop_size=(512, 512), stride=(341, 341)),
)
optimizer = dict(lr=0.001, weight_decay=0.0)
optim_wrapper = dict(
type='OptimWrapper',
optimizer=optimizer,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))
# num_gpus: 8 -> batch_size: 8
train_dataloader = dict(batch_size=1)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader


@@ -0,0 +1,23 @@
_base_ = [
'../_base_/models/setr_mla.py', '../_base_/datasets/cityscapes_768x768.py',
'../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
crop_size = (768, 768)
data_preprocessor = dict(size=crop_size)
model = dict(
data_preprocessor=data_preprocessor,
pretrained=None,
backbone=dict(
drop_rate=0,
init_cfg=dict(
type='Pretrained', checkpoint='pretrain/vit_large_p16.pth')),
test_cfg=dict(mode='slide', crop_size=(768, 768), stride=(512, 512)))
optimizer = dict(lr=0.002, weight_decay=0.0)
optim_wrapper = dict(
type='OptimWrapper',
optimizer=optimizer,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))
train_dataloader = dict(batch_size=1)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader


@@ -0,0 +1,6 @@
_base_ = ['./setr_vit-l-mla_8xb1-160k_ade20k-512x512.py']
# num_gpus: 8 -> batch_size: 16
train_dataloader = dict(batch_size=2)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader


@@ -0,0 +1,24 @@
_base_ = [
'../_base_/models/setr_naive.py',
'../_base_/datasets/cityscapes_768x768.py', '../_base_/default_runtime.py',
'../_base_/schedules/schedule_80k.py'
]
crop_size = (768, 768)
data_preprocessor = dict(size=crop_size)
model = dict(
data_preprocessor=data_preprocessor,
pretrained=None,
backbone=dict(
drop_rate=0.,
init_cfg=dict(
type='Pretrained', checkpoint='pretrain/vit_large_p16.pth')),
test_cfg=dict(mode='slide', crop_size=(768, 768), stride=(512, 512)))
optimizer = dict(weight_decay=0.0)
optim_wrapper = dict(
type='OptimWrapper',
optimizer=optimizer,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))
train_dataloader = dict(batch_size=1)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader


@@ -0,0 +1,72 @@
_base_ = [
'../_base_/models/setr_naive.py', '../_base_/datasets/ade20k.py',
'../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(size=crop_size)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
data_preprocessor=data_preprocessor,
pretrained=None,
backbone=dict(
img_size=(512, 512),
drop_rate=0.,
init_cfg=dict(
type='Pretrained', checkpoint='pretrain/vit_large_p16.pth')),
decode_head=dict(num_classes=150),
auxiliary_head=[
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=0,
num_classes=150,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=2,
kernel_size=1,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=1,
num_classes=150,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=2,
kernel_size=1,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=2,
num_classes=150,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=2,
kernel_size=1,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))
],
test_cfg=dict(mode='slide', crop_size=(512, 512), stride=(341, 341)),
)
optimizer = dict(lr=0.01, weight_decay=0.0)
optim_wrapper = dict(
type='OptimWrapper',
optimizer=optimizer,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))
# num_gpus: 8 -> batch_size: 16
train_dataloader = dict(batch_size=2)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader


@@ -0,0 +1,70 @@
_base_ = [
'../_base_/models/setr_pup.py', '../_base_/datasets/cityscapes_768x768.py',
'../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
crop_size = (768, 768)
data_preprocessor = dict(size=crop_size)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
data_preprocessor=data_preprocessor,
pretrained=None,
backbone=dict(
drop_rate=0.,
init_cfg=dict(
type='Pretrained', checkpoint='pretrain/vit_large_p16.pth')),
auxiliary_head=[
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=0,
num_classes=19,
dropout_ratio=0,
norm_cfg=norm_cfg,
num_convs=2,
up_scale=4,
kernel_size=3,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=1,
num_classes=19,
dropout_ratio=0,
norm_cfg=norm_cfg,
num_convs=2,
up_scale=4,
kernel_size=3,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=2,
num_classes=19,
dropout_ratio=0,
norm_cfg=norm_cfg,
num_convs=2,
up_scale=4,
kernel_size=3,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4))
],
test_cfg=dict(mode='slide', crop_size=crop_size, stride=(512, 512)))
optimizer = dict(weight_decay=0.0)
optim_wrapper = dict(
type='OptimWrapper',
optimizer=optimizer,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))
train_dataloader = dict(batch_size=1)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader


@@ -0,0 +1,72 @@
_base_ = [
'../_base_/models/setr_pup.py', '../_base_/datasets/ade20k.py',
'../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(size=crop_size)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
data_preprocessor=data_preprocessor,
pretrained=None,
backbone=dict(
img_size=(512, 512),
drop_rate=0.,
init_cfg=dict(
type='Pretrained', checkpoint='pretrain/vit_large_p16.pth')),
decode_head=dict(num_classes=150),
auxiliary_head=[
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=0,
num_classes=150,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=2,
kernel_size=3,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=1,
num_classes=150,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=2,
kernel_size=3,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
dict(
type='SETRUPHead',
in_channels=1024,
channels=256,
in_index=2,
num_classes=150,
dropout_ratio=0,
norm_cfg=norm_cfg,
act_cfg=dict(type='ReLU'),
num_convs=2,
kernel_size=3,
align_corners=False,
loss_decode=dict(
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
],
test_cfg=dict(mode='slide', crop_size=(512, 512), stride=(341, 341)),
)
optimizer = dict(lr=0.001, weight_decay=0.0)
optim_wrapper = dict(
type='OptimWrapper',
optimizer=optimizer,
paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}))
# num_gpus: 8 -> batch_size: 16
train_dataloader = dict(batch_size=2)
val_dataloader = dict(batch_size=1)
test_dataloader = val_dataloader