first commit
Seg_All_In_One_MMSeg/configs/knet/README.md
@@ -0,0 +1,52 @@
# K-Net

> [K-Net: Towards Unified Image Segmentation](https://arxiv.org/abs/2106.14855)

## Introduction

<!-- [ALGORITHM] -->

<a href="https://github.com/ZwwWayne/K-Net/">Official Repo</a>

<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392">Code Snippet</a>

## Abstract

<!-- [ABSTRACT] -->

Semantic, instance, and panoptic segmentations have been addressed using different and specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently by a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulties of distinguishing various instances, we propose a kernel update strategy that enables each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained in an end-to-end manner with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previous published state-of-the-art single-model results of panoptic segmentation on MS COCO test-dev split and semantic segmentation on ADE20K val split with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO with 60%-90% faster inference speeds. Code and models will be released at [this https URL](https://github.com/ZwwWayne/K-Net/).
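
The kernel-to-mask idea in the abstract can be illustrated with a toy example. The sketch below is not the MMSegmentation implementation: it assumes 1x1 kernels (so applying a kernel is a per-pixel dot product over channels), leaves out the kernel update stages entirely, and uses hypothetical helper names (`predict_masks`, `semantic_map`) with hand-picked numbers.

```python
def predict_masks(kernels, features):
    """Apply N kernels of length C to a C x H x W feature map.

    With 1x1 kernels, each mask logit is the dot product between the kernel
    and the channel vector at that pixel. Returns an N x H x W nested list.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                sum(kernel[c] * features[c][y][x] for c in range(channels))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for kernel in kernels
    ]


def semantic_map(mask_logits):
    """Assign each pixel to the kernel with the highest mask logit."""
    num_kernels = len(mask_logits)
    height, width = len(mask_logits[0]), len(mask_logits[0][0])
    return [
        [
            max(range(num_kernels), key=lambda n: mask_logits[n][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy input: 2 channels, 2x2 pixels, 2 kernels (one per class).
features = [
    [[1.0, 0.0], [0.0, 1.0]],  # channel 0
    [[0.0, 1.0], [1.0, 0.0]],  # channel 1
]
kernels = [[1.0, -1.0], [-1.0, 1.0]]

logits = predict_masks(kernels, features)
labels = semantic_map(logits)
```

In this toy setting every pixel is claimed by the kernel whose weights match its dominant channel. In the configs of this commit, the kernels are additionally refined by three `KernelUpdateHead` stages, which this sketch does not model.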

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/157008300-9f40905c-b8e8-4a2a-9593-c1177fa35b2c.png" width="90%"/>
</div>

## Results and models

### ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ---------------- | -------- | --------- | ------- | -------- | -------------- | ------ | ----- | ------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| KNet + FCN | R-50-D8 | 512x512 | 80000 | 7.01 | 19.24 | V100 | 43.60 | 45.12 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_r50-d8_fcn_8xb2-adamw-80k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_043751-abcab920.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_043751.log.json) |
| KNet + PSPNet | R-50-D8 | 512x512 | 80000 | 6.98 | 20.04 | V100 | 44.18 | 45.58 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_r50-d8_pspnet_8xb2-adamw-80k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_054634-d2c72240.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_054634.log.json) |
| KNet + DeepLabV3 | R-50-D8 | 512x512 | 80000 | 7.42 | 12.10 | V100 | 45.06 | 46.11 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_r50-d8_deeplabv3_8xb2-adamw-80k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_041642-00c8fbeb.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_041642.log.json) |
| KNet + UperNet | R-50-D8 | 512x512 | 80000 | 7.34 | 17.11 | V100 | 43.45 | 44.07 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_r50-d8_upernet_8xb2-adamw-80k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220304_125657-215753b0.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220304_125657.log.json) |
| KNet + UperNet | Swin-T | 512x512 | 80000 | 7.57 | 15.56 | V100 | 45.84 | 46.27 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_swin-t_upernet_8xb2-adamw-80k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k_20220303_133059-7545e1dc.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k_20220303_133059.log.json) |
| KNet + UperNet | Swin-L | 512x512 | 80000 | 13.5 | 8.29 | V100 | 52.05 | 53.24 | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-512x512.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k_20220303_154559-d8da9a90.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k_20220303_154559.log.json) |
| KNet + UperNet   | Swin-L   | 640x640   | 80000   | 13.54    | 8.29           | V100   | 52.21 | 53.34         | [config](https://github.com/open-mmlab/mmsegmentation/blob/main/configs/knet/knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-640x640.py)   | [model](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k_20220301_220747-8787fc71.pth) \| [log](https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k_20220301_220747.log.json)             |

Note:

- All K-Net experiments were run on 8 V100 (32G) GPUs with 2 samples per GPU.

## Citation

```bibtex
@inproceedings{zhang2021knet,
    title={{K-Net: Towards} Unified Image Segmentation},
    author={Wenwei Zhang and Jiangmiao Pang and Kai Chen and Chen Change Loy},
    year={2021},
    booktitle={NeurIPS},
}
```
@@ -0,0 +1,111 @@
_base_ = [
    '../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    size=crop_size,
    seg_pad_val=255)
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
num_stages = 3
conv_kernel_size = 1
model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='IterativeDecodeHead',
        num_stages=num_stages,
        kernel_update_head=[
            dict(
                type='KernelUpdateHead',
                num_classes=150,
                num_ffn_fcs=2,
                num_heads=8,
                num_mask_fcs=1,
                feedforward_channels=2048,
                in_channels=512,
                out_channels=512,
                dropout=0.0,
                conv_kernel_size=conv_kernel_size,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                with_ffn=True,
                feat_transform_cfg=dict(
                    conv_cfg=dict(type='Conv2d'), act_cfg=None),
                kernel_updator_cfg=dict(
                    type='KernelUpdator',
                    in_channels=256,
                    feat_channels=256,
                    out_channels=256,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN'))) for _ in range(num_stages)
        ],
        kernel_generate_head=dict(
            type='ASPPHead',
            in_channels=2048,
            in_index=3,
            channels=512,
            dilations=(1, 12, 24, 36),
            dropout_ratio=0.1,
            num_classes=150,
            norm_cfg=norm_cfg,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))

# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0005),
    clip_grad=dict(max_norm=1, norm_type=2))
# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
        end=1000),
    dict(
        type='MultiStepLR',
        begin=1000,
        end=80000,
        milestones=[60000, 72000],
        by_epoch=False,
    )
]
# In K-Net implementation we use batch size 2 per GPU as default
train_dataloader = dict(batch_size=2, num_workers=2)
val_dataloader = dict(batch_size=1, num_workers=4)
test_dataloader = val_dataloader
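
The `param_scheduler` in this config combines a linear warmup with step decay. The sketch below approximates the resulting learning rate per iteration. It rests on assumptions that the config does not state: a decay factor `gamma=0.1` (the usual default for multi-step schedulers), milestones read as global iteration numbers (the framework may instead count them from `begin`), and no attention to off-by-one details at the warmup boundary. The function name `approx_lr` is hypothetical.

```python
import math


def approx_lr(iteration, base_lr=0.0001, warmup_iters=1000,
              start_factor=0.001, milestones=(60000, 72000), gamma=0.1):
    """Approximate learning rate for LinearLR warmup followed by MultiStepLR."""
    if iteration < warmup_iters:
        # Warmup: the multiplier ramps linearly from start_factor to 1.0.
        progress = iteration / warmup_iters
        return base_lr * (start_factor + (1.0 - start_factor) * progress)
    # Step decay: multiply by gamma once for every milestone already reached.
    steps_taken = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** steps_taken


# Print the approximate schedule at a few representative iterations.
for it in (0, 500, 1000, 60000, 72000):
    print(it, approx_lr(it))
```

Under these assumptions the rate starts at one thousandth of `lr`, reaches the full `lr=0.0001` after 1000 iterations, and is reduced tenfold at each of the two milestones.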
@@ -0,0 +1,112 @@
_base_ = [
    '../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    size=crop_size,
    seg_pad_val=255)
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
num_stages = 3
conv_kernel_size = 1
model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='IterativeDecodeHead',
        num_stages=num_stages,
        kernel_update_head=[
            dict(
                type='KernelUpdateHead',
                num_classes=150,
                num_ffn_fcs=2,
                num_heads=8,
                num_mask_fcs=1,
                feedforward_channels=2048,
                in_channels=512,
                out_channels=512,
                dropout=0.0,
                conv_kernel_size=conv_kernel_size,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                with_ffn=True,
                feat_transform_cfg=dict(
                    conv_cfg=dict(type='Conv2d'), act_cfg=None),
                kernel_updator_cfg=dict(
                    type='KernelUpdator',
                    in_channels=256,
                    feat_channels=256,
                    out_channels=256,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN'))) for _ in range(num_stages)
        ],
        kernel_generate_head=dict(
            type='FCNHead',
            in_channels=2048,
            in_index=3,
            channels=512,
            num_convs=2,
            concat_input=True,
            dropout_ratio=0.1,
            num_classes=150,
            norm_cfg=norm_cfg,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0005),
    clip_grad=dict(max_norm=1, norm_type=2))

# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
        end=1000),
    dict(
        type='MultiStepLR',
        begin=1000,
        end=80000,
        milestones=[60000, 72000],
        by_epoch=False,
    )
]
# In K-Net implementation we use batch size 2 per GPU as default
train_dataloader = dict(batch_size=2, num_workers=2)
val_dataloader = dict(batch_size=1, num_workers=4)
test_dataloader = val_dataloader
@@ -0,0 +1,110 @@
_base_ = [
    '../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    size=crop_size,
    seg_pad_val=255)
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
num_stages = 3
conv_kernel_size = 1
model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 2, 4),
        strides=(1, 2, 1, 1),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='IterativeDecodeHead',
        num_stages=num_stages,
        kernel_update_head=[
            dict(
                type='KernelUpdateHead',
                num_classes=150,
                num_ffn_fcs=2,
                num_heads=8,
                num_mask_fcs=1,
                feedforward_channels=2048,
                in_channels=512,
                out_channels=512,
                dropout=0.0,
                conv_kernel_size=conv_kernel_size,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                with_ffn=True,
                feat_transform_cfg=dict(
                    conv_cfg=dict(type='Conv2d'), act_cfg=None),
                kernel_updator_cfg=dict(
                    type='KernelUpdator',
                    in_channels=256,
                    feat_channels=256,
                    out_channels=256,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN'))) for _ in range(num_stages)
        ],
        kernel_generate_head=dict(
            type='PSPHead',
            in_channels=2048,
            in_index=3,
            channels=512,
            pool_scales=(1, 2, 3, 6),
            dropout_ratio=0.1,
            num_classes=150,
            norm_cfg=norm_cfg,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0005),
    clip_grad=dict(max_norm=1, norm_type=2))
# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
        end=1000),
    dict(
        type='MultiStepLR',
        begin=1000,
        end=80000,
        milestones=[60000, 72000],
        by_epoch=False,
    )
]
# In K-Net implementation we use batch size 2 per GPU as default
train_dataloader = dict(batch_size=2, num_workers=2)
val_dataloader = dict(batch_size=1, num_workers=4)
test_dataloader = val_dataloader
@@ -0,0 +1,111 @@
_base_ = [
    '../_base_/datasets/ade20k.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py'
]
crop_size = (512, 512)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    size=crop_size,
    seg_pad_val=255)
# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
num_stages = 3
conv_kernel_size = 1

model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        dilations=(1, 1, 1, 1),
        strides=(1, 2, 2, 2),
        norm_cfg=norm_cfg,
        norm_eval=False,
        style='pytorch',
        contract_dilation=True),
    decode_head=dict(
        type='IterativeDecodeHead',
        num_stages=num_stages,
        kernel_update_head=[
            dict(
                type='KernelUpdateHead',
                num_classes=150,
                num_ffn_fcs=2,
                num_heads=8,
                num_mask_fcs=1,
                feedforward_channels=2048,
                in_channels=512,
                out_channels=512,
                dropout=0.0,
                conv_kernel_size=conv_kernel_size,
                ffn_act_cfg=dict(type='ReLU', inplace=True),
                with_ffn=True,
                feat_transform_cfg=dict(
                    conv_cfg=dict(type='Conv2d'), act_cfg=None),
                kernel_updator_cfg=dict(
                    type='KernelUpdator',
                    in_channels=256,
                    feat_channels=256,
                    out_channels=256,
                    act_cfg=dict(type='ReLU', inplace=True),
                    norm_cfg=dict(type='LN'))) for _ in range(num_stages)
        ],
        kernel_generate_head=dict(
            type='UPerHead',
            in_channels=[256, 512, 1024, 2048],
            in_index=[0, 1, 2, 3],
            pool_scales=(1, 2, 3, 6),
            channels=512,
            dropout_ratio=0.1,
            num_classes=150,
            norm_cfg=norm_cfg,
            align_corners=False,
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))),
    auxiliary_head=dict(
        type='FCNHead',
        in_channels=1024,
        in_index=2,
        channels=256,
        num_convs=1,
        concat_input=False,
        dropout_ratio=0.1,
        num_classes=150,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)),
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
# optimizer
optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.0005),
    clip_grad=dict(max_norm=1, norm_type=2))
# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
        end=1000),
    dict(
        type='MultiStepLR',
        begin=1000,
        end=80000,
        milestones=[60000, 72000],
        by_epoch=False,
    )
]
# In K-Net implementation we use batch size 2 per GPU as default
train_dataloader = dict(batch_size=2, num_workers=2)
val_dataloader = dict(batch_size=1, num_workers=4)
test_dataloader = val_dataloader
@@ -0,0 +1,21 @@
_base_ = 'knet-s3_swin-t_upernet_8xb2-adamw-80k_ade20k-512x512.py'

checkpoint_file = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_large_patch4_window7_224_22k_20220308-d5bdebaf.pth'  # noqa
# model settings
model = dict(
    pretrained=checkpoint_file,
    backbone=dict(
        embed_dims=192,
        depths=[2, 2, 18, 2],
        num_heads=[6, 12, 24, 48],
        window_size=7,
        use_abs_pos_embed=False,
        drop_path_rate=0.3,
        patch_norm=True),
    decode_head=dict(
        kernel_generate_head=dict(in_channels=[192, 384, 768, 1536])),
    auxiliary_head=dict(in_channels=768))
# In K-Net implementation we use batch size 2 per GPU as default
train_dataloader = dict(batch_size=2, num_workers=2)
val_dataloader = dict(batch_size=1, num_workers=4)
test_dataloader = val_dataloader
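
The `in_channels` overrides in this config follow from the backbone width: a Swin Transformer doubles its channel count at every stage, starting from `embed_dims`. The short sketch below reproduces the numbers used here and in the Swin-T config; the helper name `swin_stage_channels` is hypothetical, and the auxiliary head is assumed to keep reading stage index 2 as in the R-50 base config.

```python
def swin_stage_channels(embed_dims, num_stages=4):
    """Output channels of each Swin stage: embed_dims doubled per stage."""
    return [embed_dims * 2 ** stage for stage in range(num_stages)]


swin_l_channels = swin_stage_channels(192)  # Swin-L, as in this config
swin_t_channels = swin_stage_channels(96)   # Swin-T, as in the base config

# The auxiliary head reads the third stage (in_index=2 in the R-50 config).
swin_l_aux_channels = swin_l_channels[2]
swin_t_aux_channels = swin_t_channels[2]
```

This is why switching from Swin-T to Swin-L requires updating both `kernel_generate_head.in_channels` and `auxiliary_head.in_channels` together with `embed_dims`.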
@@ -0,0 +1,57 @@
_base_ = 'knet-s3_swin-t_upernet_8xb2-adamw-80k_ade20k-512x512.py'

checkpoint_file = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_large_patch4_window7_224_22k_20220308-d5bdebaf.pth'  # noqa
# model settings
crop_size = (640, 640)
data_preprocessor = dict(
    type='SegDataPreProcessor',
    mean=[123.675, 116.28, 103.53],
    std=[58.395, 57.12, 57.375],
    bgr_to_rgb=True,
    pad_val=0,
    size=crop_size,
    seg_pad_val=255)
model = dict(
    data_preprocessor=data_preprocessor,
    pretrained=checkpoint_file,
    backbone=dict(
        embed_dims=192,
        depths=[2, 2, 18, 2],
        num_heads=[6, 12, 24, 48],
        window_size=7,
        use_abs_pos_embed=False,
        drop_path_rate=0.4,
        patch_norm=True),
    decode_head=dict(
        kernel_generate_head=dict(in_channels=[192, 384, 768, 1536])),
    auxiliary_head=dict(in_channels=768))

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(
        type='RandomResize',
        scale=(2048, 640),
        ratio_range=(0.5, 2.0),
        keep_ratio=True),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(type='PackSegInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 640), keep_ratio=True),
    # add loading annotation after ``Resize`` because ground truth
    # does not need to do resize data transform
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='PackSegInputs')
]
# In K-Net implementation we use batch size 2 per GPU as default.
# Each dataloader is assigned once: a second top-level assignment would
# rebind the name and drop the 640x640 pipeline override.
train_dataloader = dict(
    batch_size=2, num_workers=2, dataset=dict(pipeline=train_pipeline))
val_dataloader = dict(
    batch_size=1, num_workers=4, dataset=dict(pipeline=test_pipeline))
test_dataloader = val_dataloader
@@ -0,0 +1,63 @@
_base_ = 'knet-s3_r50-d8_upernet_8xb2-adamw-80k_ade20k-512x512.py'

checkpoint_file = 'https://download.openmmlab.com/mmsegmentation/v0.5/pretrain/swin/swin_tiny_patch4_window7_224_20220308-f41b89d3.pth'  # noqa

# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
num_stages = 3
conv_kernel_size = 1

model = dict(
    type='EncoderDecoder',
    pretrained=checkpoint_file,
    backbone=dict(
        _delete_=True,
        type='SwinTransformer',
        embed_dims=96,
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        mlp_ratio=4,
        qkv_bias=True,
        qk_scale=None,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.3,
        use_abs_pos_embed=False,
        patch_norm=True,
        out_indices=(0, 1, 2, 3)),
    decode_head=dict(
        kernel_generate_head=dict(in_channels=[96, 192, 384, 768])),
    auxiliary_head=dict(in_channels=384))

optim_wrapper = dict(
    _delete_=True,
    type='OptimWrapper',
    # modify learning rate following the official implementation of Swin Transformer # noqa
    optimizer=dict(
        type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.0005),
    paramwise_cfg=dict(
        custom_keys={
            'absolute_pos_embed': dict(decay_mult=0.),
            'relative_position_bias_table': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }),
    clip_grad=dict(max_norm=1, norm_type=2))

# learning policy
param_scheduler = [
    dict(
        type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
        end=1000),
    dict(
        type='MultiStepLR',
        begin=1000,
        end=80000,
        milestones=[60000, 72000],
        by_epoch=False,
    )
]
# In K-Net implementation we use batch size 2 per GPU as default
train_dataloader = dict(batch_size=2, num_workers=2)
val_dataloader = dict(batch_size=1, num_workers=4)
test_dataloader = val_dataloader
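
This config inherits from the R-50 UperNet config and uses `_delete_=True` to swap the backbone, while `auxiliary_head` only overrides one key. The sketch below is a simplified model of that merge behaviour, written for illustration: it is not MMEngine's code, the function name `merge_cfg` is hypothetical, and the dictionaries are trimmed to a few keys.

```python
import copy


def merge_cfg(base, child):
    """Recursively merge `child` into `base`.

    A dict carrying `_delete_=True` replaces the inherited dict instead of
    being merged into it; all other dicts are merged key by key.
    """
    child = dict(child)
    if child.pop('_delete_', False):
        base = {}  # discard everything inherited at this level
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            inherited = merged.get(key)
            merged[key] = merge_cfg(
                inherited if isinstance(inherited, dict) else {}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed stand-ins for the base (R-50) and child (Swin-T) model dicts.
base_model = dict(
    backbone=dict(type='ResNetV1c', depth=50, strides=(1, 2, 2, 2)),
    auxiliary_head=dict(type='FCNHead', in_channels=1024, in_index=2))
child_model = dict(
    backbone=dict(_delete_=True, type='SwinTransformer', embed_dims=96),
    auxiliary_head=dict(in_channels=384))

merged_model = merge_cfg(base_model, child_model)
```

Without `_delete_=True`, ResNet-only keys such as `depth` and `strides` would be carried into the `SwinTransformer` settings; with it, only the keys written in the child survive, while `auxiliary_head` keeps its inherited `type` and `in_index`.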
Seg_All_In_One_MMSeg/configs/knet/metafile.yaml
@@ -0,0 +1,188 @@
Collections:
- Name: KNet
  License: Apache License 2.0
  Metadata:
    Training Data:
    - ADE20K
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  README: configs/knet/README.md
  Frameworks:
  - PyTorch
Models:
- Name: knet-s3_r50-d8_fcn_8xb2-adamw-80k_ade20k-512x512
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 43.6
      mIoU(ms+flip): 45.12
  Config: configs/knet/knet-s3_r50-d8_fcn_8xb2-adamw-80k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - R-50-D8
    - KNet
    - FCN
    Training Resources: 8x V100 GPUS
    Memory (GB): 7.01
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_043751-abcab920.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_fcn_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_043751.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
- Name: knet-s3_r50-d8_pspnet_8xb2-adamw-80k_ade20k-512x512
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 44.18
      mIoU(ms+flip): 45.58
  Config: configs/knet/knet-s3_r50-d8_pspnet_8xb2-adamw-80k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - R-50-D8
    - KNet
    - PSPNet
    Training Resources: 8x V100 GPUS
    Memory (GB): 6.98
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_054634-d2c72240.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_pspnet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_054634.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
- Name: knet-s3_r50-d8_deeplabv3_8xb2-adamw-80k_ade20k-512x512
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 45.06
      mIoU(ms+flip): 46.11
  Config: configs/knet/knet-s3_r50-d8_deeplabv3_8xb2-adamw-80k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - R-50-D8
    - KNet
    - DeepLabV3
    Training Resources: 8x V100 GPUS
    Memory (GB): 7.42
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_041642-00c8fbeb.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_deeplabv3_r50-d8_8x2_512x512_adamw_80k_ade20k_20220228_041642.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
- Name: knet-s3_r50-d8_upernet_8xb2-adamw-80k_ade20k-512x512
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 43.45
      mIoU(ms+flip): 44.07
  Config: configs/knet/knet-s3_r50-d8_upernet_8xb2-adamw-80k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - R-50-D8
    - KNet
    - UperNet
    Training Resources: 8x V100 GPUS
    Memory (GB): 7.34
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220304_125657-215753b0.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_r50-d8_8x2_512x512_adamw_80k_ade20k_20220304_125657.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
- Name: knet-s3_swin-t_upernet_8xb2-adamw-80k_ade20k-512x512
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 45.84
      mIoU(ms+flip): 46.27
  Config: configs/knet/knet-s3_swin-t_upernet_8xb2-adamw-80k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - Swin-T
    - KNet
    - UperNet
    Training Resources: 8x V100 GPUS
    Memory (GB): 7.57
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k_20220303_133059-7545e1dc.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-t_8x2_512x512_adamw_80k_ade20k_20220303_133059.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
- Name: knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-512x512
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 52.05
      mIoU(ms+flip): 53.24
  Config: configs/knet/knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-512x512.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - Swin-L
    - KNet
    - UperNet
    Training Resources: 8x V100 GPUS
    Memory (GB): 13.5
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k_20220303_154559-d8da9a90.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_512x512_adamw_80k_ade20k_20220303_154559.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
- Name: knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-640x640
  In Collection: KNet
  Results:
    Task: Semantic Segmentation
    Dataset: ADE20K
    Metrics:
      mIoU: 52.21
      mIoU(ms+flip): 53.34
  Config: configs/knet/knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-640x640.py
  Metadata:
    Training Data: ADE20K
    Batch Size: 16
    Architecture:
    - Swin-L
    - KNet
    - UperNet
    Training Resources: 8x V100 GPUS
    Memory (GB): 13.54
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k_20220301_220747-8787fc71.pth
  Training log: https://download.openmmlab.com/mmsegmentation/v0.5/knet/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k/knet_s3_upernet_swin-l_8x2_640x640_adamw_80k_ade20k_20220301_220747.log.json
  Paper:
    Title: 'K-Net: Towards Unified Image Segmentation'
    URL: https://arxiv.org/abs/2106.14855
  Code: https://github.com/open-mmlab/mmsegmentation/blob/v0.23.0/mmseg/models/decode_heads/knet_head.py#L392
  Framework: PyTorch
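
Every model entry in this metafile lists `Batch Size: 16`, which matches the `8xb2` token in the config names if that token is read as 8 GPUs with 2 samples per GPU (the setup stated in the README note). The sketch below derives the total from a config name under that reading; the function name `total_batch_size` and the regular expression are illustrative assumptions, not part of MMSegmentation.

```python
import re


def total_batch_size(config_name):
    """Total batch size from a '<gpus>xb<samples_per_gpu>' token in the name."""
    match = re.search(r'_(\d+)xb(\d+)-', config_name)
    if match is None:
        raise ValueError(f'no "<gpus>xb<batch>" token in {config_name!r}')
    num_gpus, samples_per_gpu = int(match.group(1)), int(match.group(2))
    return num_gpus * samples_per_gpu


name = 'knet-s3_swin-l_upernet_8xb2-adamw-80k_ade20k-640x640'
batch_size = total_batch_size(name)
```

Training on a different number of GPUs changes this product, so the learning rate in the configs may need adjusting in that case.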