陆奕衡 Yiheng Lu

CUHKSZ UG'27 | ECE/MUS

About Me

Hi, I’m Yiheng Lu: a music lover, gym bro, and aspiring AI researcher.

I’m a third-year undergraduate student at The Chinese University of Hong Kong, Shenzhen, pursuing a B.Eng. in Electrical Engineering with a minor in Music.

I’m currently working on AI-based approaches to symbolic music generation and smart grids, and I plan to pursue robotics as my future research direction.

Publications

I contributed to advancing filler-word recognition and prosody-aware speech processing through feature engineering, dataset evaluation, and data preparation for synthesis.

I proposed a timestamp-based filler and prosody feature extraction method leveraging WhisperX, FunAudio, and Whisper-timestamped to improve recognition accuracy.
I further conducted cross-dataset evaluation on Emilia and EARS MM_TTS, validating robustness across English long-form audio and daily conversations.
Additionally, I generated over 100 text slices per filler label to support TTS synthesis and evaluation, and designed the NVSpeech project homepage to present the system architecture, demos, and datasets.

Projects

I led a team that built an AI-powered system to help novice academic researchers find, summarize, and retrieve papers.

We critically analyzed the limitations of current academic search engines and conventional RAG systems.
I proposed an improved architecture and coordinated the division of tasks among team members.
My role focused on backend development, including model fine-tuning and the integration of MCP, large language models (LLMs), and database systems.

I led a team that developed a hybrid method for segmenting lung lesions in COVID-19 CT scans. It combines classical image processing with deep learning to detect three common lesion types.

HyCoSeg is a hybrid pipeline that integrates classical image processing with deep learning for the segmentation of COVID-19 lung lesions in CT scans. The lung regions are first delineated with a 2D U-Net (LungMask) to constrain subsequent detection. Ground-glass opacities are segmented through Hounsfield Unit thresholding combined with morphological post-processing, while consolidation is identified using a coarse-to-fine deep model enhanced by Gabor texture filtering. Pleural effusion is extracted by boundary-aware density thresholding followed by connected-component analysis.
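The thresholding-plus-connected-component idea described above can be illustrated with a minimal sketch. This is not the HyCoSeg implementation: the HU window (-750 to -300), the minimum component size, and all function names are assumptions chosen for illustration, and the small-component removal is a simple stand-in rather than the morphological post-processing the pipeline uses. It operates on a toy 2D slice given as nested lists.

```python
from collections import deque

# Illustrative values only; the HU window and size cutoff are assumptions.
HU_LOW, HU_HIGH = -750, -300
MIN_COMPONENT_SIZE = 3


def threshold_in_lung(hu_slice, lung_mask, low=HU_LOW, high=HU_HIGH):
    """Mark pixels inside the lung mask whose HU value lies in [low, high]."""
    return [
        [bool(lung) and low <= hu <= high for hu, lung in zip(hu_row, mask_row)]
        for hu_row, mask_row in zip(hu_slice, lung_mask)
    ]


def connected_components(binary):
    """Return 4-connected components as lists of (row, col) pixels."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def segment_candidates(hu_slice, lung_mask, min_size=MIN_COMPONENT_SIZE):
    """Threshold, then keep only components with at least min_size pixels."""
    binary = threshold_in_lung(hu_slice, lung_mask)
    kept = [c for c in connected_components(binary) if len(c) >= min_size]
    result = [[0] * len(row) for row in hu_slice]
    for comp in kept:
        for y, x in comp:
            result[y][x] = 1
    return result


# Toy 4x5 slice: a 4-pixel blob, one isolated in-range pixel,
# and one in-range pixel that lies outside the lung mask.
hu_slice = [
    [-900, -500, -500, -900, -900],
    [-900, -500, -500, -900, -600],
    [-900, -900, -900, -900, -900],
    [-500, -900, -900, -900, -900],
]
lung_mask = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1],
]
segmentation = segment_candidates(hu_slice, lung_mask)
```

In this toy example only the 4-pixel blob survives: the isolated pixel is dropped by the size filter, and the pixel outside the lung mask never passes the threshold step, which mirrors how the lung delineation constrains subsequent detection.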

Experience

Amphion

Research Assistant

October 2024 - Present

Amphion GitHub Page

Amphion is the lab led by Prof. Zhizheng Wu, focusing on research in AI audio, music, and speech generation.

As a research assistant in the Amphion Lab, I contributed to NVSpeech and explored state-of-the-art methods in Text-to-Speech and Text-to-Audio generation. This experience deepened my understanding of speech synthesis pipelines and the latest advancements in generative audio technologies.

NCEL

Research Assistant

May 2025 - Present

NCEL website

The Network Communications and Economics Lab (NCEL) is led by Prof. Jianwei Huang and focuses on research in network economics, wireless communications, swarm intelligence, and smart grids.

As a research assistant at NCEL, I contributed to a smart grid project conducted in collaboration with Guangdong Power Grid. The project aims to develop a smart grid system that predicts electricity prices by analyzing historical data and real-time information.

Tech Stack

  • Languages & Frameworks: Python, PyTorch, HTML, Bash, VHDL
  • Tools: Linux, Git & GitHub, Docker, Hugging Face, Slurm, Vivado, Multisim
  • Experience: Deep Learning, Signal Processing & Systems, Digital Circuit Design