Yiheng Lu 陆奕衡

CUHKSZ UG'27
ECE & Music

Email: 123090391@link.cuhk.edu.cn

About Me

I am an undergraduate student in Electronic and Computer Engineering (Class of 2027) at The Chinese University of Hong Kong, Shenzhen, with a minor in Music.

At the Amphion Lab, I focus on AI audio research. I contributed to NVSpeech, the first large-scale paralinguistic audio dataset in academia, which has been accepted by ICASSP 2026. I am also independently developing SchenkerHnet, the first pretraining-scale symbolic music generation model grounded in structured music theory.

At the Network Communications and Economics Lab (NCEL), I participated in the E-Price project in collaboration with Guangdong Power Grid, where I developed the first long-term electricity price forecasting system driven by news events using large language models.

I have also led the development of several user-oriented interactive AI systems, including HyCoSeg for COVID-19 lung CT lesion segmentation, FingerSense for intelligent piano hand-motion analysis, and PaperHelper for assisting novice researchers in literature retrieval and synthesis.

My research aims to enhance the interpretability of generative AI in audio and to build practical human-centered AI systems that address real user needs.

Research Experience

Amphion

Research Assistant

October 2024 - Present

Amphion GitHub Page

Amphion is a lab led by Prof. Zhizheng Wu that focuses on AI audio, music, and speech generation.

I independently led the SchenkerHnet project and contributed to the NVSpeech project (accepted by ICASSP 2026).

In SchenkerHnet, I constructed the first million-scale symbolic music dataset grounded in music-theoretic structure and developed a symbolic music generation model based on H-Net integrated with Schenkerian theory. The model generates music with strong theoretical interpretability, captures long-range structural dependencies, and scales to full movement-length compositions.

In NVSpeech, I built the complete pipeline for extracting timestamped paralinguistic features and validated it across multiple large-scale datasets. I also generated a substantial number of speech–text aligned segments for each paralinguistic category and was responsible for designing the NVSpeech project website.
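The core of such a pipeline is merging timestamped paralinguistic events back into a word-aligned transcript. The sketch below illustrates this step under simplifying assumptions: word timings are taken as given (in practice they would come from a forced aligner such as WhisperX), and the function name and inline-tag format are illustrative, not NVSpeech's actual schema.

```python
# Sketch: inserting timestamped paralinguistic labels into a word-aligned
# transcript. Word timings are assumed to come from a forced aligner
# (e.g. WhisperX); the inline [label] format is an illustrative choice.

def tag_transcript(words, events):
    """words: [(word, start, end)], events: [(label, start, end)].
    Returns the transcript with each event inserted as an inline tag
    at the position matching its start time."""
    items = [(start, 1, word) for word, start, end in words]
    items += [(start, 0, f"[{label}]") for label, start, end in events]
    # Sort by start time; an event at the same instant precedes the word.
    items.sort(key=lambda x: (x[0], x[1]))
    return " ".join(text for _, _, text in items)

words = [("I", 0.0, 0.2), ("really", 0.3, 0.6), ("mean", 0.7, 0.9), ("it", 0.9, 1.0)]
events = [("laughter", 0.3, 0.55), ("breath", 0.9, 1.05)]
print(tag_transcript(words, events))
# I [laughter] really mean [breath] it
```

Sorting on (time, kind) rather than time alone makes tie-breaking deterministic when an event and a word share a start time.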

NCEL

Research Assistant

May 2025 - Present

NCEL website

The Network Communications and Economics Lab (NCEL) is led by Prof. Jianwei Huang and focuses on network economics, wireless communications, swarm intelligence, and smart grid research.

At NCEL, I participated in the E-Price smart grid project in collaboration with Guangdong Power Grid, where I was responsible for constructing electricity price–relevant datasets and designing and fine-tuning forecasting models. Specifically, I implemented a time- and keyword-constrained news retrieval pipeline based on GDELT and EIA data, and developed an LLM-driven system that integrates historical data with real-time news to accurately forecast future electricity prices.
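A time- and keyword-constrained retrieval step like the one described above can be sketched as a query-URL builder for the public GDELT DOC 2.0 API. The endpoint and parameter names below follow GDELT's published documentation; the keyword set and record limit are illustrative, not the project's actual configuration.

```python
# Sketch: building a time- and keyword-constrained query URL for the
# GDELT DOC 2.0 API. Parameter names follow the public GDELT docs;
# the keywords below are illustrative.
from urllib.parse import urlencode

GDELT_DOC = "https://api.gdeltproject.org/api/v2/doc/doc"

def build_query(keywords, start, end, max_records=75):
    """keywords: search terms (OR-combined);
    start/end: YYYYMMDDHHMMSS timestamps as strings."""
    query = "(" + " OR ".join(f'"{k}"' for k in keywords) + ")"
    params = {
        "query": query,
        "startdatetime": start,
        "enddatetime": end,
        "maxrecords": max_records,
        "format": "json",
    }
    return GDELT_DOC + "?" + urlencode(params)

url = build_query(["electricity price", "power outage"],
                  "20240101000000", "20240131235959")
print(url)
```

Keeping the time window in the query itself (rather than filtering after download) is what makes long-horizon backtesting reproducible: the same window always yields the same candidate articles.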

Publications

NVSpeech is an integrated pipeline for modeling human-like speech with paralinguistic vocalizations, bridging dataset construction, paralinguistic-aware ASR, and controllable TTS in a unified framework. In this project, I contributed to the recognition and synthesis side by developing timestamp-based paralinguistic and prosody feature extraction with WhisperX, FunAudio, and Whisper-timestamped, evaluating robustness across Emilia and EARS MM_TTS, generating over 100 text slices for each paralinguistic label to support synthesis and evaluation, and designing the project homepage to present the system architecture, demos, and datasets.

SchenkerHnet is my independent research project on structure-aware symbolic music generation, which aims to bring Schenkerian hierarchy into AI models so that generated music reflects long-range tonal organization rather than only local token patterns. To build this system, I reconstructed large-scale classical piano corpora into theory-guided training data with harmonic, tonal, note-density, and structural annotations, redesigned the representation from coarse beat-level tokens to a cleaner tick-level JSONL format, and developed a hierarchical encoder–router–downsampler–main network–upsampler pipeline with Schenker-aware routing and long-memory modeling inspired by the dynamic chunking network (H-Net) and Mamba. The project is currently being scaled to larger corpora for interpretable and structurally coherent long-form music generation.
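The beat-level to tick-level redesign can be sketched as a small converter that emits one JSON line per note event. The field names and the 480-ticks-per-quarter resolution below are illustrative assumptions, not SchenkerHnet's actual schema.

```python
# Sketch: converting coarse beat-level note tokens to tick-level JSONL,
# one note event per line. The field names and 480-tick resolution are
# assumed for illustration.
import json

TICKS_PER_BEAT = 480  # assumed resolution: quarter note = 480 ticks

def notes_to_jsonl(notes):
    """notes: [(midi_pitch, beat_onset, beat_duration)] with fractional
    beats. Returns one JSON line per note with integer tick values."""
    lines = []
    for pitch, onset, dur in notes:
        rec = {
            "pitch": pitch,
            "onset_tick": round(onset * TICKS_PER_BEAT),
            "dur_tick": round(dur * TICKS_PER_BEAT),
        }
        lines.append(json.dumps(rec, sort_keys=True))
    return "\n".join(lines)

# C major triad: root on beat 0, third on the off-beat, fifth held two beats.
out = notes_to_jsonl([(60, 0, 1.0), (64, 0.5, 0.5), (67, 1.0, 2.0)])
print(out)
```

Integer ticks avoid the floating-point onset collisions that make beat-level tokens ambiguous for dense passages, which is the motivation the paragraph above gives for the redesign.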

This project studies long-term electricity price forecasting through a hybrid framework that combines structured trend modeling with event-aware reasoning over news, aiming to capture both stable market fundamentals and high-impact external shocks. I contributed to the event-data and forecasting pipeline by building time- and keyword-conditioned news retrieval workflows for GDELT, automating the conversion from LLM-generated search plans to executable queries, collecting official yearly price-driver data and forecasts from EIA-oriented sources, improving retrieval robustness against malformed outputs, rate limits, and overlong queries, and supporting later benchmark experiments that connect event-aware news signals with long-horizon forecasting evaluation.
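Two of the robustness fixes mentioned above, overlong queries and rate limits, can be sketched as follows. The length cap and retry policy here are illustrative stand-ins; the real limits come from the GDELT API, and the exception type is hypothetical.

```python
# Sketch: hardening LLM-generated queries before execution -- truncating
# overlong OR-queries and retrying rate-limited calls with exponential
# backoff. MAX_QUERY_LEN and the RateLimited exception are assumptions.
import time

MAX_QUERY_LEN = 250  # assumed API limit on query string length

class RateLimited(Exception):
    """Raised by the (hypothetical) fetch function on HTTP 429."""

def sanitize_query(terms):
    """Drop trailing terms until the OR-joined query fits the limit."""
    while terms:
        query = " OR ".join(terms)
        if len(query) <= MAX_QUERY_LEN:
            return query
        terms = terms[:-1]
    return ""

def fetch_with_retry(fetch, query, retries=3, base_delay=0.01):
    """Call fetch(query); on RateLimited, back off exponentially."""
    for attempt in range(retries):
        try:
            return fetch(query)
        except RateLimited:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit: retries exhausted")
```

Dropping trailing terms (rather than rejecting the whole plan) keeps the LLM-generated search plan usable, since planners tend to put the most important keywords first.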

Recent Awards

  • Dean's List, School of Science and Engineering (2024–2025): in recognition of overall academic excellence in undergraduate study for the 2024–25 academic year.
  • 28th, 29th, and 30th Undergraduate Research Awards: funding for undergraduates conducting independent, faculty-guided research projects.

Selected Projects

HyCoSeg is a hybrid pipeline that integrates classical image processing with deep learning for COVID-19 lung lesion segmentation in CT scans. The system first employs a 2D U-Net (LungMask) to delineate lung regions, providing anatomical constraints for subsequent detection. Ground-glass opacities are segmented using Hounsfield Unit thresholding combined with morphological post-processing, while consolidation lesions are identified through a coarse-to-fine deep model enhanced with Gabor texture filtering. Pleural effusion is extracted via boundary-aware density thresholding followed by connected-component analysis.

I designed the overall technical framework, led data collection and preprocessing, fine-tuned the 2D U-Net model, and built the unified pipeline for three-type lesion recognition.
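The ground-glass opacity stage described above reduces to Hounsfield Unit windowing inside the lung mask. The sketch below shows that logic on a toy 2D slice; the HU window (-750 to -300) is a commonly cited GGO range standing in for the pipeline's tuned thresholds, and real slices are arrays, not nested lists.

```python
# Sketch: ground-glass opacity candidate mask by HU thresholding inside
# a lung mask, on a toy 2D slice. The HU window is an assumed typical
# GGO range, not the pipeline's tuned values; morphological
# post-processing is omitted.

GGO_LO, GGO_HI = -750, -300  # assumed HU window for ground-glass opacity

def ggo_mask(hu_slice, lung_mask):
    """hu_slice: 2D list of HU values; lung_mask: 2D list of 0/1.
    Returns a 0/1 mask of in-lung voxels inside the GGO window."""
    return [
        [1 if m and GGO_LO <= hu <= GGO_HI else 0
         for hu, m in zip(hu_row, mask_row)]
        for hu_row, mask_row in zip(hu_slice, lung_mask)
    ]

hu = [[-900, -600, -100],
      [-850, -500,  -50]]
lung = [[1, 1, 1],
        [0, 1, 1]]
print(ggo_mask(hu, lung))
# well-aerated lung (-900) and soft tissue (-100, -50) are excluded;
# only the -600 and -500 voxels fall in the GGO window
```

Constraining the threshold to the U-Net lung mask is what prevents extra-pulmonary tissue in the same HU range from being flagged as lesion.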

FingerSense is an AI system I independently developed for real-time analysis of piano hand movements. It takes an overhead piano-practice video as input and splits it into temporally aligned audio and video streams. The system leverages PianoMotion10M to generate a reference hand-motion sequence in MANO space from the audio, while reconstructing the observed hand-motion sequence from video using HaMeR. It then aligns the two motion streams, computes frame-level losses, and feeds the structured evidence into an LLM to produce professional feedback with second-level precision for performance correction.
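The frame-level loss and localization step can be sketched as below: a per-frame deviation between the reference and observed sequences, aggregated per second to flag the worst moment. Frames are flat keypoint vectors here, and the 30 fps rate and plain L2 metric are illustrative stand-ins for the actual MANO-space loss.

```python
# Sketch: per-frame deviation between a reference and an observed
# hand-motion sequence, aggregated per second. The 30 fps rate and
# L2 metric are illustrative assumptions.
import math

FPS = 30  # assumed video frame rate

def frame_losses(ref, obs):
    """ref, obs: equal-length lists of keypoint vectors, one per frame.
    Returns the L2 distance between corresponding frames."""
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(r, o)))
            for r, o in zip(ref, obs)]

def worst_second(losses):
    """Return (second_index, mean_loss) for the worst one-second window."""
    seconds = [losses[i:i + FPS] for i in range(0, len(losses), FPS)]
    means = [sum(s) / len(s) for s in seconds]
    idx = max(range(len(means)), key=means.__getitem__)
    return idx, means[idx]
```

Aggregating to one-second windows is what lets the LLM stage phrase its corrections as "at second N", rather than citing individual frames a pianist cannot act on.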

We developed PaperHelper, a system designed to assist novice researchers in academic paper retrieval and management, addressing the rigidity and ambiguity of traditional RAG-based search systems. The system leverages a fine-tuned BART model to classify user queries and employs an MCP-based retrieval mechanism to traverse and interpret the entire MongoDB-backed corpus, enabling precise identification and aggregation of relevant literature.

As the team lead, I proposed the overall system architecture and coordinated task allocation. My primary contribution focused on backend development, including constructing the ICLR 2025 paper dataset, fine-tuning the BART model, designing and implementing MCP tools for structured retrieval, and integrating the frontend, large language models (LLMs), and database infrastructure into a unified, end-to-end system.
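The classify-then-retrieve flow can be sketched as intent routing: a classifier maps the user's query to an intent label, and a dispatcher hands the query to the tool registered for that intent. The keyword classifier below is a toy stand-in for the fine-tuned BART model, and the intent labels and tool names are illustrative, not PaperHelper's actual MCP tool set.

```python
# Sketch: routing a classified query intent to a retrieval tool. The
# keyword classifier stands in for the fine-tuned BART model; intent
# labels and tool names are illustrative.

def classify(query):
    """Toy stand-in for the BART classifier: keyword-based intents."""
    q = query.lower()
    if "compare" in q or "versus" in q:
        return "compare_papers"
    if "summarize" in q or "summary" in q:
        return "summarize_paper"
    return "search_papers"

def route(query, tools):
    """Dispatch the query to the tool registered for its intent."""
    intent = classify(query)
    return intent, tools[intent](query)

tools = {
    "search_papers": lambda q: f"search: {q}",
    "summarize_paper": lambda q: f"summarize: {q}",
    "compare_papers": lambda q: f"compare: {q}",
}
print(route("summarize the RAG survey", tools))
```

Classifying first is what addresses the rigidity of plain RAG noted above: different intents get different retrieval strategies instead of one embedding lookup for everything.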

Tech Stack

  • Programming Languages & Frameworks: Python, PyTorch, HTML, Bash, VHDL
  • Development Tools: Linux, Git & GitHub, Docker, Hugging Face, Slurm, Vivado, Multisim
  • Research Interests: Deep Learning, Signals & Systems, Digital Circuit Design