Keras attention models, including botnet, CoaT, CoAtNet, CMT, cotnet, halonet, resnest, resnext, resnetd, volo, mlp-mixer, resmlp, gmlp, levit

Overview

Keras_cv_attention_models


Usage

Basic Usage

  • Currently working on: CMT and CoAtNet training.
  • Install as pip package:
    pip install -U keras-cv-attention-models
    # Or
    pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
    Refer to each sub-directory for detailed usage.
  • Basic model prediction
    from keras_cv_attention_models import volo
    mm = volo.VOLO_d1(pretrained="imagenet")
    
    """ Run predict """
    import tensorflow as tf
    from tensorflow import keras
    from skimage.data import chelsea
    img = chelsea() # Chelsea the cat
    imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
    pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
    pred = tf.nn.softmax(pred).numpy()  # If classifier activation is not softmax
    print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
    # [('n02124075', 'Egyptian_cat', 0.9692954),
    #  ('n02123045', 'tabby', 0.020203391),
    #  ('n02123159', 'tiger_cat', 0.006867502),
    #  ('n02127052', 'lynx', 0.00017674894),
    #  ('n02123597', 'Siamese_cat', 4.9493494e-05)]
  • Exclude model top layers by setting num_classes=0
    from keras_cv_attention_models import resnest
    mm = resnest.ResNest50(num_classes=0)
    print(mm.output_shape)
    # (None, 7, 7, 2048)
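    The headless model then works as a feature extractor. Below is a minimal sketch of attaching a new classification head, using only the standard Keras functional API; the 10-class head is purely illustrative:
    from tensorflow import keras
    inputs = keras.layers.Input([224, 224, 3])
    features = mm(inputs)  # (None, 7, 7, 2048) feature map from the headless ResNest50
    nn = keras.layers.GlobalAveragePooling2D()(features)
    outputs = keras.layers.Dense(10, activation="softmax")(nn)  # hypothetical 10-class head
    new_model = keras.models.Model(inputs, outputs)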

Layers

  • attention_layers is an __init__.py only; it imports the core layers defined in the model architectures, like RelativePositionalEmbedding from botnet or outlook_attention from volo.
import tensorflow as tf
from keras_cv_attention_models import attention_layers
aa = attention_layers.RelativePositionalEmbedding()
print(f"{aa(tf.ones([1, 4, 14, 16, 256])).shape = }")
# aa(tf.ones([1, 4, 14, 16, 256])).shape = TensorShape([1, 4, 14, 16, 14, 16])
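Since attention_layers is only an __init__.py collecting these classes, the same layer can also be imported from the module that defines it. A small sketch, assuming botnet re-exports RelativePositionalEmbedding at module level:
import tensorflow as tf
from keras_cv_attention_models import botnet
bb = botnet.RelativePositionalEmbedding()
print(f"{bb(tf.ones([1, 4, 14, 16, 256])).shape = }")
# Expected to match the attention_layers result above: TensorShape([1, 4, 14, 16, 14, 16])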

Model surgery

  • model_surgery includes functions used to change model parameters after the model is built.
from tensorflow import keras
from keras_cv_attention_models import model_surgery
# Replace all ReLU with PReLU
mm = model_surgery.replace_ReLU(keras.applications.ResNet50(), target_activation='PReLU')
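Continuing the example above, a quick sanity check of the result, using only standard Keras introspection and assuming the replacement inserts keras.layers.PReLU instances (an assumption, not documented here):
print(sum(isinstance(ll, keras.layers.PReLU) for ll in mm.layers))  # count of PReLU activations after the surgery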

AotNet

  • Keras AotNet is just a ResNet / ResNetV2-like framework that exposes parameters like attn_types, se_ratio and others, which are used to apply different types of attention layers.
    # Mixing se, outlook, halo, mhsa and cot_attention, 21M parameters
    # 50 is just a picked number that is larger than the corresponding `num_block`
    from keras_cv_attention_models import aotnet
    attn_types = [None, "outlook", ["mhsa", "halo"] * 50, "cot"]
    se_ratio = [0.25, 0, 0, 0]
    mm = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, deep_stem=True, strides=1)
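    A model mixed this way has no pretrained weights, so it is typically compiled and trained from scratch; a minimal sketch with hypothetical datasets:
    mm.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["acc"])
    print(f"{mm.count_params() = }")  # roughly the 21M parameters noted above
    # mm.fit(train_dataset, validation_data=val_dataset, epochs=10)  # train_dataset / val_dataset are hypothetical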

ResNetD

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| ResNet50D | 25.58M | 224 | 80.530 | resnet50d.h5 |
| ResNet101D | 44.57M | 224 | 83.022 | resnet101d.h5 |
| ResNet152D | 60.21M | 224 | 83.680 | resnet152d.h5 |
| ResNet200D | 64.69M | 224 | 83.962 | resnet200d.h5 |

ResNeXt

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| ResNeXt50 (32x4d) | 25M | 224 | 79.768 | resnext50_imagenet.h5 |
| - SWSL | 25M | 224 | 82.182 | resnext50_swsl.h5 |
| ResNeXt50D (32x4d + deep) | 25M | 224 | 79.676 | resnext50d_imagenet.h5 |
| ResNeXt101 (32x4d) | 42M | 224 | 80.334 | resnext101_imagenet.h5 |
| - SWSL | 42M | 224 | 83.230 | resnext101_swsl.h5 |
| ResNeXt101W (32x8d) | 89M | 224 | 79.308 | resnext101_imagenet.h5 |
| - SWSL | 89M | 224 | 84.284 | resnext101w_swsl.h5 |

ResNetQ

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| ResNet51Q | 35.7M | 224 | 82.36 | resnet51q.h5 |

BotNet

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| botnet50 | 21M | 224 | 77.604 | botnet50_imagenet.h5 |
| botnet101 | 41M | 224 | | |
| botnet152 | 56M | 224 | | |

VOLO

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| volo_d1 | 27M | 224 | 84.2 | volo_d1_224.h5 |
| volo_d1 ↑384 | 27M | 384 | 85.2 | volo_d1_384.h5 |
| volo_d2 | 59M | 224 | 85.2 | volo_d2_224.h5 |
| volo_d2 ↑384 | 59M | 384 | 86.0 | volo_d2_384.h5 |
| volo_d3 | 86M | 224 | 85.4 | volo_d3_224.h5 |
| volo_d3 ↑448 | 86M | 448 | 86.3 | volo_d3_448.h5 |
| volo_d4 | 193M | 224 | 85.7 | volo_d4_224.h5 |
| volo_d4 ↑448 | 193M | 448 | 86.8 | volo_d4_448.h5 |
| volo_d5 | 296M | 224 | 86.1 | volo_d5_224.h5 |
| volo_d5 ↑448 | 296M | 448 | 87.0 | volo_d5_448.h5 |
| volo_d5 ↑512 | 296M | 512 | 87.1 | volo_d5_512.h5 |
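The ↑384 / ↑448 / ↑512 rows correspond to the same architectures at larger input resolutions. A hedged sketch of loading one of them, assuming the constructors accept an input_shape argument the same way other model constructors on this page do:
from keras_cv_attention_models import volo
mm = volo.VOLO_d2(input_shape=(384, 384, 3), pretrained="imagenet")
print(mm.input_shape)  # (None, 384, 384, 3)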

ResNeSt

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| resnest50 | 28M | 224 | 81.03 | resnest50.h5 |
| resnest101 | 49M | 256 | 82.83 | resnest101.h5 |
| resnest200 | 71M | 320 | 83.84 | resnest200.h5 |
| resnest269 | 111M | 416 | 84.54 | resnest269.h5 |

HaloNet

| Model | Params | Image resolution | Top1 Acc |
| --- | --- | --- | --- |
| HaloNetH0 | 6.6M | 256 | 77.9 |
| HaloNetH1 | 9.1M | 256 | 79.9 |
| HaloNetH2 | 10.3M | 256 | 80.4 |
| HaloNetH3 | 12.5M | 320 | 81.9 |
| HaloNetH4 | 19.5M | 384 | 83.3 |
| - 21k | 19.5M | 384 | 85.5 |
| HaloNetH5 | 31.6M | 448 | 84.0 |
| HaloNetH6 | 44.3M | 512 | 84.4 |
| HaloNetH7 | 67.9M | 600 | 84.9 |

CoTNet

| Model | Params | Image resolution | FLOPs | Top1 Acc | Download |
| --- | --- | --- | --- | --- | --- |
| CoTNet-50 | 22.2M | 224 | 3.3 | 81.3 | cotnet50_224.h5 |
| CoTNeXt-50 | 30.1M | 224 | 4.3 | 82.1 | |
| SE-CoTNetD-50 | 23.1M | 224 | 4.1 | 81.6 | se_cotnetd50_224.h5 |
| CoTNet-101 | 38.3M | 224 | 6.1 | 82.8 | cotnet101_224.h5 |
| CoTNeXt-101 | 53.4M | 224 | 8.2 | 83.2 | |
| SE-CoTNetD-101 | 40.9M | 224 | 8.5 | 83.2 | se_cotnetd101_224.h5 |
| SE-CoTNetD-152 | 55.8M | 224 | 17.0 | 84.0 | se_cotnetd152_224.h5 |
| SE-CoTNetD-152 | 55.8M | 320 | 26.5 | 84.6 | se_cotnetd152_320.h5 |

CoAtNet

| Model | Params | Image resolution | Top1 Acc |
| --- | --- | --- | --- |
| CoAtNet-0 | 25M | 224 | 81.6 |
| CoAtNet-1 | 42M | 224 | 83.3 |
| CoAtNet-2 | 75M | 224 | 84.1 |
| CoAtNet-2, ImageNet-21k pretrain | 75M | 224 | 87.1 |
| CoAtNet-3 | 168M | 224 | 84.5 |
| CoAtNet-3, ImageNet-21k pretrain | 168M | 224 | 87.6 |
| CoAtNet-3, ImageNet-21k pretrain | 168M | 512 | 87.9 |
| CoAtNet-4, ImageNet-21k pretrain | 275M | 512 | 88.1 |
| CoAtNet-4, ImageNet-21K + PT-RA-E150 | 275M | 512 | 88.56 |

CMT

| Model | Params | Image resolution | Top1 Acc |
| --- | --- | --- | --- |
| CMTTiny | 9.5M | 160 | 79.2 |
| CMTXS | 15.2M | 192 | 81.8 |
| CMTSmall | 25.1M | 224 | 83.5 |
| CMTBig | 45.7M | 256 | 84.5 |

CoaT

| Model | Params | Image resolution | Top1 Acc | Download |
| --- | --- | --- | --- | --- |
| CoaTLiteTiny | 5.7M | 224 | 77.5 | coat_lite_tiny_imagenet.h5 |
| CoaTLiteMini | 11M | 224 | 79.1 | coat_lite_mini_imagenet.h5 |
| CoaTLiteSmall | 20M | 224 | 81.9 | coat_lite_small_imagenet.h5 |
| CoaTTiny | 5.5M | 224 | 78.3 | coat_tiny_imagenet.h5 |
| CoaTMini | 10M | 224 | 81.0 | coat_mini_imagenet.h5 |

MLP mixer

| Model | Params | Top1 Acc | ImageNet | Imagenet21k | ImageNet SAM |
| --- | --- | --- | --- | --- | --- |
| MLPMixerS32 | 19.1M | 68.70 | | | |
| MLPMixerS16 | 18.5M | 73.83 | | | |
| MLPMixerB32 | 60.3M | 75.53 | | | b32_imagenet_sam.h5 |
| MLPMixerB16 | 59.9M | 80.00 | b16_imagenet.h5 | b16_imagenet21k.h5 | b16_imagenet_sam.h5 |
| MLPMixerL32 | 206.9M | 80.67 | | | |
| MLPMixerL16 | 208.2M | 84.82 | l16_imagenet.h5 | l16_imagenet21k.h5 | |
| - input 448 | 208.2M | 86.78 | | | |
| MLPMixerH14 | 432.3M | 86.32 | | | |
| - input 448 | 432.3M | 87.94 | | | |

ResMLP

| Model | Params | Image resolution | Top1 Acc | ImageNet |
| --- | --- | --- | --- | --- |
| ResMLP12 | 15M | 224 | 77.8 | resmlp12_imagenet.h5 |
| ResMLP24 | 30M | 224 | 80.8 | resmlp24_imagenet.h5 |
| ResMLP36 | 116M | 224 | 81.1 | resmlp36_imagenet.h5 |
| ResMLP_B24 | 129M | 224 | 83.6 | resmlp_b24_imagenet.h5 |
| - imagenet22k | 129M | 224 | 84.4 | resmlp_b24_imagenet22k.h5 |

GMLP

| Model | Params | Image resolution | Top1 Acc | ImageNet |
| --- | --- | --- | --- | --- |
| GMLPTiny16 | 6M | 224 | 72.3 | |
| GMLPS16 | 20M | 224 | 79.6 | gmlp_s16_imagenet.h5 |
| GMLPB16 | 73M | 224 | 81.6 | |

LeViT

| Model | Params | Image resolution | Top1 Acc | ImageNet |
| --- | --- | --- | --- | --- |
| LeViT128S | 7.8M | 224 | 76.6 | levit128s_imagenet.h5 |
| LeViT128 | 9.2M | 224 | 78.6 | levit128_imagenet.h5 |
| LeViT192 | 11M | 224 | 80.0 | levit192_imagenet.h5 |
| LeViT256 | 19M | 224 | 81.6 | levit256_imagenet.h5 |
| LeViT384 | 39M | 224 | 82.6 | levit384_imagenet.h5 |

Other implemented keras models


Comments
  • TPU support for VOLO


    While trying VOLO with TPU I'm getting this error; any idea how to resolve it?

    InvalidArgumentError: 9 root error(s) found.
      (0) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_127]]
      (1) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_103]]
      (2) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929 ... [truncated]
    
    enhancement 
    opened by awsaf49 14
  • Use YoloR with swin transformer as backbone.


    @leondgarse I am trying to run inference using yolor with a swin backbone but am getting the following results. What could be the issue?

    from keras_cv_attention_models import yolor
    from keras_cv_attention_models import swin_transformer_v2

    bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), num_classes=1000)
    model = yolor.YOLOR(backbone=bb)

    from keras_cv_attention_models import test_images
    imm = test_images.dog_cat()
    preds = model(model.preprocess_input(imm))
    bboxs, labels, confidences = model.decode_predictions(preds)[0]

    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences)
    

    (resulting output image attached)

    opened by farazBhatti 10
  • MobileViT


    Tried to run the MobileViT_S model with input shape (256, 256, 3) and got the following error:

    UnimplementedError Traceback (most recent call last) in () 2 3 history = model.fit(get_training_dataset_with_oversample(repeat_dataset=True, oversample=True), steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS, ----> 4 validation_data=get_validation_dataset(), validation_steps=VALIDATION_STEPS) 5

    1 frames /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self) 1189 return self._numpy_internal() 1190 except core._NotOkStatusException as e: # pylint: disable=protected-access -> 1191 raise core._status_to_exception(e) from None # pylint: disable=protected-access 1192 1193 @property

    UnimplementedError: 9 root error(s) found. (0) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] (1) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_35/_445]] (2) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_23/_381]] (3) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Pad_8/_407]] (4) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Maximum_2/y/_341]] (5) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f3 ... [truncated]

    bug good first issue 
    opened by KyloRen1 10
  • [General Questions] Rough estimates for training time for pre-training CoAtNet?


    Hi, 👋 Thanks for such an amazing library and for taking the time to implement so many parts of the CoAtNet paper!

    In your CoAtNet README, you mentioned you use TPU accelerators. Could you provide a ballpark for the amount of time it took for you to train the biggest models and the corresponding accelerators? I have a task for which I wish to use scaled-up models, but I'd have to pre-train on Imagenet first because of low data amount (<5-10M) and squeeze out maximum accuracy from fine-tuning.

    I assume there might've been a few bottlenecks also, perhaps data? 🤔 If you could describe your setup, it would be very helpful to my experiments!

    Sorry for bothering you with minor questions, and again thank you for all your work!

    opened by neel04 9
  • Visualize saliency map with the attention models


    It would be great if some functional code could be included for plotting attention maps using the attention models. Such functionality has been provided for the vision transformer models at https://github.com/faustomorales/vit-keras. Thanks, and looking forward to it.

    enhancement good first issue 
    opened by sivaramakrishnan-rajaraman 9
  • How to save models ?


    @leondgarse I want to save the models in saved_model format. How do I do that? When I attempt it, it shows me the error

    WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
    

    What can be the solution for this?

    Code:

    import os
    from keras_cv_attention_models import mobilevit
    pretrained = '/content/mobilevit_xxs_imagenet.h5'
    model = mobilevit.MobileViT_XXS(pretrained=pretrained)
    model.save('mobilevit_xxs_imagenet1k')
    
    opened by sayannath 7
  • The order of height and width seems wrong in `tf.meshgrid(range(height), range(width))`


    In line 44 of beit.py, you use tf.meshgrid(range(height), range(width)), but it should be tf.meshgrid(range(width), range(height)), shouldn't it?

    When I ran the code from line 44 to line 52 with height=3 and width=4, it gave this output:

    [[17 16 15 10  9  8  3  2  1 -4 -5 -6]
     [18 17 16 11 10  9  4  3  2 -3 -4 -5]
     [19 18 17 12 11 10  5  4  3 -2 -3 -4]
     [24 23 22 17 16 15 10  9  8  3  2  1]
     [25 24 23 18 17 16 11 10  9  4  3  2]
     [26 25 24 19 18 17 12 11 10  5  4  3]
     [31 30 29 24 23 22 17 16 15 10  9  8]
     [32 31 30 25 24 23 18 17 16 11 10  9]
     [33 32 31 26 25 24 19 18 17 12 11 10]
     [38 37 36 31 30 29 24 23 22 17 16 15]
     [39 38 37 32 31 30 25 24 23 18 17 16]
     [40 39 38 33 32 31 26 25 24 19 18 17]], shape=(12, 12), dtype=int32)
    

    which seems incorrect.

    Of course, this is not a problem if you assume height == width, but I think tf.meshgrid(range(width), range(height)) is more readable and can potentially prevent bugs if height != width is supported in the future.

    bug enhancement 
    opened by xskxzr 6
  • Training of YoloXS Model on Coco dataset


    Hi, I am currently reproducing the COCO training on the YoloXS model with the line below:

    python leondgarse/coco_train_script.py --det_header yolox.YOLOXS --data_name coco/2014 --batch_size 16

    After training for 30 epochs, I am getting poor results, as shown below:

    # Show result
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences, num_classes=80)
    

    (attached result image)

    Have I configured anything wrongly? Or is there any suggestion on what I could change? Thanks!

    opened by ThePaperFish 5
  • Update for EdgeNeXt


    I reproduced EdgeNeXt based on torch and your project. Is there any mistake in this code? Why can't it show all layer details? It looks like some layers are missing in `summary()`.

    import tensorflow as tf
    from tensorflow import keras
    from keras_cv_attention_models.common_layers import (
        layer_norm, activation_by_name
    )
    from tensorflow.keras import initializers
    from keras_cv_attention_models.attention_layers import (
        conv2d_no_bias,
        drop_block,
    )
    import math
    
    BATCH_NORM_DECAY = 0.9
    BATCH_NORM_EPSILON = 1e-5
    TF_BATCH_NORM_EPSILON = 0.001
    LAYER_NORM_EPSILON = 1e-5
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class PositionalEncodingFourier(keras.layers.Layer):
        def __init__(self, hidden_dim=32, dim=768, temperature=10000):
            super(PositionalEncodingFourier, self).__init__()
            self.token_projection = tf.keras.layers.Conv2D(dim, kernel_size=1)
            self.scale = 2 * math.pi
            self.temperature = temperature
            self.hidden_dim = hidden_dim
            self.dim = dim
            self.eps = 1e-6
    
        def __call__(self, B, H, W, *args, **kwargs):
            mask_tf = tf.zeros([B, H, W])
            not_mask_tf = 1 - mask_tf
            y_embed_tf = tf.cumsum(not_mask_tf, axis=1)
            x_embed_tf = tf.cumsum(not_mask_tf, axis=2)
            y_embed_tf = y_embed_tf / (y_embed_tf[:, -1:, :] + self.eps) * self.scale  # 2 * math.pi
            x_embed_tf = x_embed_tf / (x_embed_tf[:, :, -1:] + self.eps) * self.scale  # 2 * math.pi
            dim_t_tf = tf.range(self.hidden_dim, dtype=tf.float32)
            dim_t_tf = self.temperature ** (2 * (dim_t_tf // 2) / self.hidden_dim)
            pos_x_tf = x_embed_tf[:, :, :, None] / dim_t_tf
            pos_y_tf = y_embed_tf[:, :, :, None] / dim_t_tf
            pos_x_tf = tf.reshape(tf.stack([tf.math.sin(pos_x_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_x_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_y_tf = tf.reshape(tf.stack([tf.math.sin(pos_y_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_y_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_tf = tf.concat([pos_y_tf, pos_x_tf], axis=-1)
            pos_tf = self.token_projection(pos_tf)
    
            return pos_tf
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"token_projection": self.token_projection, "scale": self.scale,
                                "temperature": self.temperature, "hidden_dim": self.hidden_dim,
                                "dim": self.dim, "eps": self.eps})
            return base_config
    
    
    def EdgeNeXt(input_shape=(256, 256, 3), depths=[3, 3, 9, 3], dims=[24, 48, 88, 168],
                 global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'],
                 drop_path_rate=1, layer_scale_init_value=1e-6, head_init_scale=1., expan_ratio=4,
                 kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False],
                 use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], epsilon=1e-6, model_name='EdgeNeXt'):
        inputs = keras.layers.Input(input_shape, batch_size=2)
    
        nn = conv2d_no_bias(inputs, dims[0], kernel_size=4, strides=4, padding="valid", name="stem_")
        nn = layer_norm(nn, epsilon=epsilon, name='stem_')
    
        drop_connect_rates = tf.linspace(0, stop=drop_path_rate, num=int(
            sum(depths)))  # drop_connect_rates_split(num_blocks, start=0.0, end=drop_connect_rate)
        cur = 0
        for i in range(4):
            for j in range(depths[i]):
                if j > depths[i] - global_block[i] - 1:
                    if global_block_type[i] == 'SDTA':
                        SDTA_encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                     expan_ratio=expan_ratio, scales=d2_scales[i],
                                     use_pos_emb=use_pos_embd_xca[i], num_heads=heads[i], name='stage_'+str(i)+'_SDTA_encoder_'+str(j))(nn)
                    else:
                        raise NotImplementedError
                else:
                    if i != 0 and j == 0:
                        nn = layer_norm(nn, epsilon=epsilon, name='stage_' + str(i) + '_')
                        nn = conv2d_no_bias(nn, dims[i], kernel_size=2, strides=2, padding="valid",
                                            name='stage_' + str(i) + '_')
    
                    Conv_Encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                 layer_scale_init_value=layer_scale_init_value,
                                 expan_ratio=expan_ratio, kernel_size=kernel_sizes[i], name='stage_'+str(i)+'_Conv_Encoder_'+str(j) + '_')(nn)  # drop_connect_rates[cur + j]
    
        model = keras.models.Model(inputs, nn, name=model_name)
        return model
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class Conv_Encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4, kernel_size=7, epsilon=1e-6,
                     name=''):
    
            super(Conv_Encoder, self).__init__()
            self.encoder_name = name
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_path = drop_path
            self.dim = dim
            self.expan_ratio = expan_ratio
            self.kernel_size = kernel_size
            self.epsilon = epsilon
    
        def __call__(self, x, *args, **kwargs):
            inputs = x
            x = keras.layers.Conv2D(self.dim, kernel_size=self.kernel_size, padding="SAME", name=self.encoder_name +'Conv2D')(x)
            x = layer_norm(x, epsilon=self.epsilon, name=self.encoder_name)
            x = keras.layers.Dense(self.expan_ratio * self.dim)(x)
            x = activation_by_name(x, activation="gelu")
            x = keras.layers.Dense(self.dim)(x)
            if self.gamma is not None:
                x = self.gamma * x
    
            x = inputs + drop_block(x, drop_rate=0.)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"gamma": self.gamma, "drop_path": self.drop_path,
                                "dim": self.dim, "expan_ratio": self.expan_ratio,
                                "kernel_size": self.kernel_size})
            return base_config
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class SDTA_encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4,
                     use_pos_emb=True, num_heads=8, qkv_bias=True, attn_drop=0., drop=0., scales=1, zero_gamma=False,
                     activation='gelu', use_bias=False, name='sdf'):
            super(SDTA_encoder, self).__init__()
            self.expan_ratio = expan_ratio
            self.width = max(int(math.ceil(dim / scales)), int(math.floor(dim // scales)))
            self.width_list = [self.width] * (scales - 1)
            self.width_list.append(dim - self.width * (scales - 1))
            self.dim = dim
            self.scales = scales
            if scales == 1:
                self.nums = 1
            else:
                self.nums = scales - 1
            self.pos_embd = None
            if use_pos_emb:
                self.pos_embd = PositionalEncodingFourier(dim=dim)
            self.xca = XCA(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
            self.gamma_xca = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                         name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_rate = drop_path
            self.drop_path = keras.layers.Dropout(drop_path)
            gamma_initializer = tf.zeros_initializer() if zero_gamma else tf.ones_initializer()
            self.norm = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                        name=name and name + "ln")
            self.norm_xca = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                            name=name and name + "norm_xca")
            self.activation = activation
            self.use_bias = use_bias
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"width": self.width, "dim": self.dim,
                                "nums": self.nums, "pos_embd": self.pos_embd,
                                "xca": self.xca, "gamma_xca": self.gamma_xca,
                                "gamma": self.gamma, "norm": self.norm,
                                "activation": self.activation, "use_bias": self.use_bias,
                                })
            return base_config
    
        def __call__(self, inputs, *args, **kwargs):
            x = inputs
            spx = tf.split(inputs, self.width_list, axis=-1)
            for i in range(self.nums):
                if i == 0:
                    sp = spx[i]
                else:
                    sp = sp + spx[i]
                sp = keras.layers.Conv2D(self.width, kernel_size=3, padding='SAME')(sp)  # , groups=self.width
                if i == 0:
                    out = sp
                else:
                    out = tf.concat([out, sp], -1)
            inputs = tf.concat([out, spx[self.nums]], -1)
    
            # XCA
            B, H, W, C = inputs.shape
            inputs = tf.reshape(inputs, (-1, H * W, C))  # tf.transpose(), perm=[0, 2, 1])
    
            if self.pos_embd:
                pos_encoding = tf.reshape(self.pos_embd(B, H, W), (-1, H * W, C))
                inputs += pos_encoding
    
            if self.gamma_xca is not None:
                inputs = self.gamma_xca * inputs
            input_xca = self.gamma_xca * self.xca(self.norm_xca(inputs))
            inputs = inputs + drop_block(input_xca, drop_rate=self.drop_rate, name="SDTA_encoder_")
            inputs = tf.reshape(inputs, (-1, H, W, C))
    
            # Inverted Bottleneck
            inputs = self.norm(inputs)
            inputs = keras.layers.Conv2D(self.expan_ratio * self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            inputs = activation_by_name(inputs, activation=self.activation)
            inputs = keras.layers.Conv2D(self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            if self.gamma is not None:
                inputs = self.gamma * inputs
    
            x = x + self.drop_path(inputs)
            return x
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class XCA(keras.layers.Layer):
        def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0., name=""):
            super(XCA, self).__init__()
            self.num_heads = num_heads
            self.temperature = tf.Variable(tf.ones(num_heads, 1, 1), trainable=True, name=name + 'gamma')
    
            self.qkv = keras.layers.Dense(dim * 3, use_bias=qkv_bias)
            self.attn_drop = keras.layers.Dropout(attn_drop)
            self.k_ini = initializers.GlorotUniform()
            self.b_ini = initializers.Zeros()
            self.proj = keras.layers.Dense(dim, name="out",
                                           kernel_initializer=self.k_ini, bias_initializer=self.b_ini)
            self.proj_drop = keras.layers.Dropout(proj_drop)
    
        def __call__(self, inputs, training=None, *args, **kwargs):
            input_shape = inputs.shape
            qkv = self.qkv(inputs)
            qkv = tf.reshape(qkv, (input_shape[0], input_shape[1], 3,
                                   self.num_heads,
                                   input_shape[2] // self.num_heads))  # [batch, hh * ww, 3, num_heads, dims_per_head]
            qkv = tf.transpose(qkv, perm=[2, 0, 3, 4, 1])  # [3, batch, num_heads, dims_per_head, hh * ww]
            query, key, value = tf.split(qkv, 3, axis=0)  # [batch, num_heads, dims_per_head, hh * ww]
    
            norm_query, norm_key = tf.nn.l2_normalize(tf.squeeze(query), axis=-1, epsilon=1e-6), \
                                   tf.nn.l2_normalize(tf.squeeze(key), axis=-1, epsilon=1e-6)
            attn = tf.matmul(norm_query, norm_key, transpose_b=True)
            attn = tf.transpose(tf.transpose(attn, perm=[0, 2, 3, 1]) * self.temperature, perm=[0, 3, 2, 1])
    
            attn = tf.nn.softmax(attn, axis=-1)
            attn = self.attn_drop(attn, training=training)  # [batch, num_heads, hh * ww, hh * ww]
    
            x = tf.matmul(attn, value)  # [batch, num_heads, hh * ww, dims_per_head]
            x = tf.reshape(x, [input_shape[0], input_shape[1], input_shape[2]])
    
            x = self.proj(x)
            x = self.proj_drop(x)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"num_heads": self.num_heads, "temperature": self.temperature,
                                "qkv": self.qkv, "attn_drop": self.attn_drop,
                                "proj": self.proj, "proj_drop": self.proj_drop})
            return base_config
    
    
    def edgenext_xx_small(pretrained=False, **kwargs):
        # 1.33M & 260.58M @ 256 resolution
        # 71.23% Top-1 accuracy
        # No AA, Color Jitter=0.4, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=51.66 versus 47.67 for MobileViT_XXS
        # For A100: FPS @ BS=1: 212.13 & @ BS=256: 7042.06 versus FPS @ BS=1: 96.68 & @ BS=256: 4624.71 for MobileViT_XXS
        model = EdgeNeXt(depths=[2, 2, 6, 2], dims=[24, 48, 88, 168], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_x_small(pretrained=False, **kwargs):
        # 2.34M & 538.0M @ 256 resolution
        # 75.00% Top-1 accuracy
        # No AA, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=31.61 versus 28.49 for MobileViT_XS
        # For A100: FPS @ BS=1: 179.55 & @ BS=256: 4404.95 versus FPS @ BS=1: 94.55 & @ BS=256: 2361.53 for MobileViT_XS
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[32, 64, 100, 192], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_small(pretrained=False, **kwargs):
        # 5.59M & 1260.59M @ 256 resolution
        # 79.43% Top-1 accuracy
        # AA=True, No Mixup & Cutmix, DropPath=0.1, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=20.47 versus 18.86 for MobileViT_S
        # For A100: FPS @ BS=1: 172.33 & @ BS=256: 3010.25 versus FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    if __name__ == '__main__':
        model = edgenext_small()
        model.summary()
        # from download_and_load import keras_reload_from_torch_model
        # keras_reload_from_torch_model(
        #     'D:\GitHub\EdgeNeXt\edgenext_small.pth',
        #     keras_model=model,
        #     # tail_align_dict=tail_align_dict,
        #     # full_name_align_dict=full_name_align_dict,
        #     # additional_transfer=additional_transfer,
        #     input_shape=(256, 256),
        #     do_convert=True,
        #     save_name="adaface_ir101_webface4m.h5",
        # )
    
    
    
    opened by whalefa1I 5
  • custom layer issue at tflite conversion


    Hi, thanks for the good references.

    I have implemented MobileViT with your package and tried to convert the trained model into tflite format. There, I met an error saying:

    Unknown layer: Addons>GroupNormalization. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details
    

    I tried to add the custom layer name as a parameter when loading the model, but I am still facing the issue.

    model = tf.keras.models.load_model('./checkpoints/model_best.h5', custom_objects={'AttentionLayer': AttentionLayer})
    

    Is there any way to solve this?

    Thanks,

    bug good first issue 
    opened by mhyeonsoo 4
  • coat.CoaTMini(input_shape=(200, 240, 1) ...) error: Dimensions must be equal, but are 730 and 677 for ...


    Hi,

    I tried to train a model with:

        model = coat.CoaTMini(input_shape=(200, 240, 1), num_classes=240, pretrained=None)
    

    but the model cannot be built; it errors out with:

    ValueError: Exception encountered when calling layer "tf.__operators__.add" (type TFOpLambda).
    
    Dimensions must be equal, but are 730 and 677 for '{{node tf.__operators__.add/AddV2}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [?,730,216], [?,677,216].
    
    Call arguments received:
      • x=tf.Tensor(shape=(None, 730, 216), dtype=float32)
      • y=tf.Tensor(shape=(None, 677, 216), dtype=float32)
      • name=None
    

    I just wonder, is something wrong with coat?

    Thanks.

    bug enhancement 
    opened by mw66 4
  • Can you provide the code for converting pytorch weights to tf?


    Hi. Can you provide the code for converting pytorch weights to tf, such as for beit? I wanted to try the effect of beitv2's pre-training weights. Thanks!

    opened by 131404060321 1
  • tflite conversion - GPU/XNNPACK fails


    Hi! Thanks for the great repo! I have converted the EfficientFormer model to tflite. However, applying either the XNNPACK or the GPU delegate fails.

    GPU delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite delegate for GPU.
    Failed to apply GPU delegate. Benchmarking failed.

    XNNPACK delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    Failed to apply XNNPACK delegate. Benchmarking failed.

    Do you know what could be the issue? I'm using the latest tensorflow version for the conversion.

    opened by macsmy 3