Welcome to the IEEE HotICN Chinese community. IEEE HotICN international academic conference websites: https://hoticn.com, https://hoticn.cn.

Articles by cz

Self-evolving LLM Agents with In-distribution Optimization

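1. Abstract: This paper studies the training of long-horizon interactive LLM Agents, with a core focus on credit assignment under sparse, delayed rewards. As large language models move from static text generation toward environment interaction, LLM Agents must make sequential decisions in complex environments such as web shopping, virtual experiments, and household tasks. However, such tasks typically only...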

1 month ago (06-11)

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

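1. Abstract: This paper studies reinforcement learning training for LLM Agents in long-horizon interactive tasks. When completing complex environment tasks, existing LLM agents typically feed the complete interaction history into the context, including task instructions, past observations, actions, and intermediate states. Although this helps the model understand its current situation, it also brings...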

1 month ago (06-11)

Rethinking Evaluation for LLM Hallucination Detection: A Desiderata, A New RAG-based Benchmark, New Insights

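1. Abstract: This paper studies the evaluation of LLM hallucination detection benchmarks. As large models are deployed in real-world settings such as e-commerce, healthcare, and law, hallucination is no longer merely a question of model quality; it bears directly on the safe use of generative AI. Although...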

2 months ago (05-18)

A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network

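1. Abstract: This paper studies the communication bottleneck in large-model inference, focusing specifically on accelerating All-Reduce in Tensor Parallelism (TP) inference. As LLM parameter counts keep growing, a single GPU can hardly deliver low-latency inference on its own, and multi-accelerator parallelism has become the norm. But in ...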

2 months ago (05-08)

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

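1. Abstract: This paper focuses on the training of LLM-based Search Agents. Existing methods face one core difficulty in reinforcement learning training: credit assignment. On the one hand, outcome supervision based on the final answer, although stable to train, ...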

3 months ago (04-22)