Chengjian Li 李成建
He received the M.S. degree in computer science and technology from Xi'an University of Technology, Xi'an, China, in 2018. He is currently pursuing the Ph.D. degree in computer science and technology at Nanjing University of Science and Technology, Nanjing, China. His research interests include computer vision and image processing.

Education
  • NJUST
    Department of Computer Science
    Ph.D. Student
    Sep. 2024 - present
Selected Publications
OSAMamba: An Adaptive Bidirectional Selective State Space Model For OSA Detection

Chengjian Li, Zhenghao Shi, Na Li, Yitong Zhang, Xiaoyong Ren, Xinhong Hei, Haiqin Liu

IEEE Transactions on Instrumentation and Measurement 2025

Convolutional neural networks (CNNs) and Transformers, the two most widely used network families, have been extensively applied to obstructive sleep apnea (OSA) detection in recent years. However, the receptive field of a traditional CNN is tied to its fixed convolutional kernel size, which limits its ability to extract global feature information and constrains further performance gains. The Transformer, in turn, suffers from its self-attention mechanism, whose computational complexity grows quadratically with context length; the resulting overhead hinders deployment on devices with limited computing resources. To address these problems, this article proposes an adaptive bidirectional selective state-space model (ABSM)-based method for OSA detection, termed OSAMamba. Its novelty lies in two components: a lightweight multiscale efficient aggregation (LMSEA) module and the ABSM itself. The LMSEA module combines a partial convolution (PConv)-based multiscale strategy with the convolutional block attention module (CBAM) to enlarge the model's receptive field and capture effective temporal features with very few parameters. The ABSM module reduces computational cost and improves deployability by using a frequency-domain enhancement strategy to fuse the time-domain features extracted by an adaptive bidirectional Mamba (ABi-Mamba) with linear complexity with the frequency-domain features extracted by a frequency-domain enhancement module (FEM). Extensive experiments on the Apnea-ECG (electrocardiogram) dataset show that the proposed method achieves the best per-segment accuracy of all compared methods, 91.91%, surpassing the state-of-the-art (SOTA) TFFormer by 0.31%. It also achieves a remarkable accuracy of 100% with the lowest mean absolute error (MAE) of 2.43 in per-record detection.
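
A minimal sketch may help make the frequency-domain enhancement strategy concrete: an FEM-style block that filters a 1-D feature sequence with a learnable spectral mask and is fused additively with the time-domain branch. The layer names, tensor shapes, and the simple additive fusion below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a frequency-domain enhancement block (assumed design, not the paper's code).
import torch
import torch.nn as nn


class FrequencyEnhancement(nn.Module):
    """Filters a 1-D feature sequence in the frequency domain via rFFT."""

    def __init__(self, channels: int, seq_len: int):
        super().__init__()
        # One learnable complex filter weight per rFFT bin and channel.
        n_bins = seq_len // 2 + 1
        self.filter = nn.Parameter(
            torch.randn(channels, n_bins, dtype=torch.cfloat) * 0.02
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len), e.g. features of an ECG segment
        spec = torch.fft.rfft(x, dim=-1)            # to the frequency domain
        spec = spec * self.filter                   # learnable spectral filtering
        return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)  # back to time domain


# Fusing time-domain features (e.g., from a bidirectional Mamba branch)
# with the frequency-enhanced features by simple addition:
x = torch.randn(4, 64, 180)                         # (batch, channels, seq_len)
fem = FrequencyEnhancement(channels=64, seq_len=180)
fused = x + fem(x)                                  # additive time-frequency fusion
print(fused.shape)                                  # torch.Size([4, 64, 180])
```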

FTMoMamba: Motion Generation with Frequency and Text State Space Models

Chengjian Li, Xiangbo Shu, Qiongjie Cui, Yazhou Yao, Jinhui Tang

Under review, 2024

Diffusion models achieve impressive performance in human motion generation. However, current approaches typically ignore the significance of frequency-domain information for capturing fine-grained motions in the latent space (e.g., low frequencies correlate with static poses, while high frequencies align with fine-grained motions). In addition, a semantic discrepancy between text and motion leads to inconsistency between generated motions and their text descriptions. In this work, we propose FTMoMamba, a novel diffusion-based framework equipped with a Frequency State Space Model (FreqSSM) and a Text State Space Model (TextSSM). Specifically, to learn fine-grained representations, FreqSSM decomposes sequences into low-frequency and high-frequency components, guiding the generation of static poses (e.g., sitting, lying) and fine-grained motions (e.g., transitions, stumbles), respectively. To ensure consistency between text and motion, TextSSM encodes text features at the sentence level, aligning textual semantics with sequential features. Extensive experiments show that FTMoMamba achieves superior performance on the text-to-motion generation task, notably obtaining the lowest FID of 0.181 on the HumanML3D dataset, well below MLD's 0.421.
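
To make the decomposition idea concrete, the sketch below splits a latent motion sequence into the low-frequency (static-pose) and high-frequency (fine-grained) components the abstract describes, using an rFFT low-pass mask. The cutoff and tensor shapes are assumptions for illustration, not FreqSSM itself.

```python
# Assumed illustration of low/high-frequency decomposition over the temporal axis.
import torch


def freq_decompose(x: torch.Tensor, cutoff: int = 4):
    """x: (batch, seq_len, dim) latent motion sequence."""
    spec = torch.fft.rfft(x, dim=1)             # FFT over the frame axis
    low_spec = spec.clone()
    low_spec[:, cutoff:] = 0                    # keep only the lowest bins
    low = torch.fft.irfft(low_spec, n=x.shape[1], dim=1)  # static-pose part
    high = x - low                              # residual = fine-grained motions
    return low, high


x = torch.randn(2, 196, 256)                    # (batch, frames, latent dim)
low, high = freq_decompose(x)
print(low.shape, high.shape)                    # both torch.Size([2, 196, 256])
```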

DRLFormer: A Data Rebalancing Loss Constrained Light Transformer For OSA Detection

Chengjian Li, Zhenghao Shi, ZhenZhen You, Na Li, Liang Zhou, Zhijun Zhang, Lulu Ye, Yitong Zhang, Xiaoyong Ren, Xinhong Hei, Haiqin Liu

IEEE Transactions on Instrumentation and Measurement 2024

Although Transformer-based methods have achieved significant success in obstructive sleep apnea (OSA) detection, they suffer from high computational costs, weak feature-capturing ability in the frequency domain, and class imbalance at the data level. To address these problems, this article proposes a data rebalancing loss (DRLoss)-constrained light Transformer for OSA detection, called DRLFormer. The core of the method lies in two components: an additive temporal-frequency fusion attention (ATFA) module and the DRLoss function. The ATFA module fuses the frequency-domain features extracted by a frequency-domain enhancement module (FEM) with the salient time-domain features extracted by learnable time-domain attention (LTA), using a novel summation strategy that keeps the computational cost very low. DRLoss addresses the imbalance of samples within batches, which is usually ignored by the commonly used cross-entropy loss (CELoss): it introduces weight memory units (WMUs) and allocates different weights to correctly and incorrectly predicted samples, achieving prediction-level balance through a classification penalty item (CPI). Extensive experiments are conducted on the Apnea-ECG dataset against the state-of-the-art (SOTA) method. Our method improves accuracy by 0.44% (from 91.68% to 92.12%) while reducing model size by 74% and FLOPs by 53%. Furthermore, it performs strongly on the University College Dublin Sleep Apnea Database (UCD), PhysioNet, and the clinical XJ300 dataset, thoroughly verifying the effectiveness and clinical application potential of DRLFormer.
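
As a rough illustration of prediction-level rebalancing within a batch, the sketch below up-weights currently misclassified samples in a weighted cross-entropy. The 2x weight for wrong predictions is an arbitrary assumption, and the paper's weight memory units (WMUs) and classification penalty item (CPI) are not reproduced here.

```python
# Assumed sketch of batch-level rebalancing; not the paper's DRLoss.
import torch
import torch.nn.functional as F


def rebalanced_ce(logits: torch.Tensor, targets: torch.Tensor,
                  wrong_weight: float = 2.0) -> torch.Tensor:
    """logits: (batch, classes), targets: (batch,) class indices."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    with torch.no_grad():
        wrong = logits.argmax(dim=-1) != targets            # misclassified mask
        weights = torch.where(wrong,
                              torch.full_like(per_sample, wrong_weight),
                              torch.ones_like(per_sample))  # penalize errors more
        weights = weights / weights.mean()                  # keep loss scale stable
    return (weights * per_sample).mean()


logits = torch.randn(8, 2, requires_grad=True)              # apnea vs. normal
targets = torch.randint(0, 2, (8,))
loss = rebalanced_ce(logits, targets)
loss.backward()
```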

TFFormer: A time–frequency information fusion-based CNN-Transformer model for OSA detection with single-lead ECG

Chengjian Li, Zhenghao Shi, Liang Zhou, Zhijun Zhang, Chenwei Wu, Xiaoyong Ren, Xinhong Hei, Minghua Zhao, Yitong Zhang, Haiqin Liu, ZhenZhen You, Lifeng He

IEEE Transactions on Instrumentation and Measurement 2023

Accurate detection of obstructive sleep apnea (OSA) from a single-lead electrocardiogram (ECG) signal is highly desirable for the timely treatment of OSA patients. However, because apnea events vary in appearance and extent within ECG signals, accurate OSA detection remains very challenging. To address this problem, this article presents a time–frequency information fusion-based CNN-Transformer model (TFFormer) for OSA detection with single-lead ECG, in which a module consisting of a deep residual shrinkage module, a multiscale convolutional attention (MSCA) module, and a multilayer convolution module extracts time–frequency features. This design extracts rich features from short ECG signal sequences at a low computational cost. For time–frequency information fusion, a gated self-attention-based adaptive pruning fusion attention module removes redundant tokens to further reduce computation. Built on this module, TFFormer supports data-parallel processing and long-distance modeling. Compared with the best competing model, the proposed method improves per-segment accuracy by 0.18% and reduces the per-record mean absolute error by 0.25, demonstrating that TFFormer offers better OSA detection performance and could provide a convenient and accurate solution for clinical OSA detection.
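
The token-pruning step can be sketched in a few lines: a small gating head scores tokens and only the top-k proceed to attention, shrinking the sequence that self-attention must process. The keep ratio, gating head, and shapes below are illustrative assumptions rather than TFFormer's exact module.

```python
# Assumed sketch of gated top-k token pruning before attention.
import torch
import torch.nn as nn


class TokenPruner(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.5):
        super().__init__()
        self.gate = nn.Linear(dim, 1)       # per-token importance score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_tokens, dim)
        scores = self.gate(tokens).squeeze(-1)              # (batch, n_tokens)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                 # kept-token indices
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        return tokens.gather(1, idx)                        # pruned sequence


tokens = torch.randn(4, 120, 128)           # (batch, ECG tokens, embed dim)
pruned = TokenPruner(dim=128)(tokens)
print(pruned.shape)                         # torch.Size([4, 60, 128])
```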

Advances in Transformer-Driven Image Classification (Transformer 驱动的图像分类研究进展)

Zhenghao Shi, Chengjian Li, Liang Zhou, Zhijun Zhang, Chenwei Wu, Zhenzhen You, Wenqi Ren

中国图象图形学报 (Journal of Image and Graphics) 2023

Image classification is fundamental to image understanding and plays an important role in practical computer vision applications. However, because of the diversity of target shapes and types and the complexity of imaging environments, many image classification methods still yield unsatisfactory results in practice, such as low classification accuracy and high false-positive rates, which severely limits their use in downstream image and computer vision tasks. Improving the precision and accuracy of image classification through better algorithms is therefore an important research topic that has attracted increasing attention. With the rapid development of deep learning and its wide, successful application to image processing, research on deep learning-based image classification has made great progress. To survey existing methods comprehensively and track the latest advances, this paper systematically reviews and summarizes Transformer-driven deep learning methods and models for image classification. Unlike existing surveys on similar topics, it focuses on methods and models driven by Transformer variants, including Transformer image classification methods based on scalable position encoding, Transformer methods with low complexity and low computational cost, Transformer methods fusing local and global information, and image classification methods based on deep ViT (vision Transformer) models, analyzing existing work in depth in terms of design ideas, structural characteristics, and open problems. To compare the different methods, their classification performance is experimentally evaluated on public image classification datasets such as ImageNet, CIFAR-10 (Canadian Institute for Advanced Research), and CIFAR-100, using metrics including accuracy, parameter count, floating point operations (FLOPs), overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (κ). Finally, future research directions are discussed.

Advances in Deep Learning-Based Object Detection for Aerial Remote Sensing Images (航空遥感图像深度学习目标检测技术研究进展)

Zhenghao Shi, Chenwei Wu, Chengjian Li, Zhenzhen You, Quan Wang, Chengcheng Ma

中国图象图形学报 (Journal of Image and Graphics) 2023

Object detection in aerial remote sensing images is a key technology for the intelligent interpretation of remote sensing imagery, with wide applications in intelligence reconnaissance, disaster relief, and resource exploration. However, because targets in such images are small and dense, appear at varied orientations, and are easily occluded, the detection task still faces many challenges. In recent years, methods based on deep convolutional neural networks have drawn attention for their high accuracy and efficiency. To advance the field, this paper systematically reviews mainstream detection methods from 2020 to 2022, traces the development of deep learning-based object detection, analyzes representative CNN- and Transformer-based algorithms, summarizes improvement strategies for different application scenarios, introduces public datasets and typical experimental results, and discusses open problems and future research directions.
