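Session | Speaker | Time |
Sign-in and admission | | 14:00-14:30 |
Introduction to the talk topics | | 14:30-14:40 |
Papers: "Dialogue Response Selection with Hierarchical Curriculum Learning"; "BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data" | Yan Wang (王琰), Senior Researcher at Tencent AI Lab; has published more than twenty papers at top conferences including ACL, EMNLP, NAACL, and AAAI; led the development of the intelligent chit-chat service on the Tencent AI Open Platform. | 14:40-15:20 |
Papers: "Fast and Accurate Neural Machine Translation with Translation Memory"; "Neural Machine Translation with Monolingual Translation Memory" | Lemao Liu (刘乐茂), Senior Researcher at Tencent AI Lab; has published about forty papers at top conferences including ACL, EMNLP, NAACL, and AAAI; served as a Senior Program Committee member of IJCAI 2021 and as publication co-chair of Findings of EMNLP 2020. | 15:20-16:00 |
Papers: "GWLAN: General Word-Level AutocompletioN for Computer-Aided Translation"; "Data Augmentation for Text Generation Without Any Augmented Data" | Huayang Li (李华阳), Researcher at Tencent AI Lab; has published more than ten papers at natural language processing conferences and in journals, including ACL, EMNLP, and AAAI. | 16:00-16:40 |
Roundtable discussion | | 16:40-17:00 |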
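Yan Wang (王琰) holds a PhD from City University of Hong Kong and is a Senior Researcher at Tencent AI Lab, where he is mainly responsible for research and algorithm work on intelligent chit-chat and text generation. He led the development of the intelligent chit-chat service on the Tencent AI Open Platform, which provides personalized chit-chat capabilities for dozens of Tencent smart speaker, customer service, and intelligent NPC products. He has published more than twenty papers at top natural language processing and machine learning conferences, including ACL, EMNLP, NAACL, and AAAI.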
1. Paper title: "Dialogue Response Selection with Hierarchical Curriculum Learning"
2. Abstract: We study the learning of a matching model for dialogue response selection. Motivated by the recent finding that models trained with random negative samples are not ideal in real-world scenarios, we propose a hierarchical curriculum learning framework that trains the matching model in an “easy-to-difficult” scheme. Our learning framework consists of two complementary curricula: (1) corpus-level curriculum (CC); and (2) instance-level curriculum (IC). In CC, the model gradually increases its ability in finding the matching clues between the dialogue context and a response candidate. As for IC, it progressively strengthens the model’s ability in identifying the mismatching information between the dialogue context and a response candidate. Empirical studies on three benchmark datasets with three state-of-the-art matching models demonstrate that the proposed learning framework significantly improves the model performance across various evaluation metrics.
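As a rough illustration of the "easy-to-difficult" idea in the abstract, the sketch below restricts training to the easiest examples at first and widens the pool over time. It is not the paper's implementation and does not distinguish the corpus-level and instance-level curricula: the linear pacing function, the `start` fraction, the difficulty score, and the names `pacing` and `curriculum_pool` are all assumptions made for illustration.

```python
import random


def pacing(step, total_steps, start=0.3):
    """Fraction of the difficulty-sorted data available at `step`.

    Hypothetical linear schedule: begins at `start` and reaches 1.0 at the end.
    """
    if total_steps <= 0:
        return 1.0
    frac = start + (1.0 - start) * step / total_steps
    return min(1.0, max(start, frac))


def curriculum_pool(examples, difficulty, step, total_steps, start=0.3):
    """Return the easiest examples that are allowed at the current step."""
    ranked = sorted(examples, key=difficulty)  # easy first, hard last
    size = max(1, int(round(pacing(step, total_steps, start) * len(ranked))))
    return ranked[:size]


def sample_batch(examples, difficulty, step, total_steps, batch_size, rng):
    """Sample a training batch (with replacement) from the current pool."""
    pool = curriculum_pool(examples, difficulty, step, total_steps)
    return [rng.choice(pool) for _ in range(batch_size)]


if __name__ == "__main__":
    data = list(range(10))  # toy examples; the value doubles as the difficulty
    rng = random.Random(0)
    for step in (0, 5, 10):
        print(step, sample_batch(data, lambda x: x, step, 10, 4, rng))
```

In a real system the difficulty score would come from a model or heuristic rather than from the example itself; the toy data here only makes the widening pool easy to see.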
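Lemao Liu (刘乐茂) is a Senior Researcher at Tencent AI Lab, working mainly on research and development in natural language processing and machine translation. He received his PhD from Harbin Institute of Technology in 2013 and subsequently carried out machine translation research at the City University of New York and at Japan's National Institute of Information and Communications Technology (NICT). He has published about forty papers at natural language processing conferences and in journals, including ACL, EMNLP, NAACL, ICLR, and AAAI, and has served as a Senior Program Committee member of IJCAI 2021 and as publication co-chair of Findings of EMNLP 2020.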
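1. Paper title: "Fast and Accurate Neural Machine Translation with Translation Memory"
2. Abstract: It is generally believed that a translation memory (TM) should be beneficial for machine translation tasks. Unfortunately, existing wisdom demonstrates the superiority of TM-based neural machine translation (NMT) only on the TM-specialized translation tasks rather than general tasks, with a non-negligible computational overhead. In this paper, we propose a fast and accurate approach to TM-based NMT within the Transformer framework: the model architecture is simple and employs a single bilingual sentence as its TM, leading to efficient training and inference; and its parameters are effectively optimized through a novel training criterion. Extensive experiments on six TM-specialized tasks show that the proposed approach substantially surpasses several strong baselines that use multiple TMs, in terms of BLEU and running time. In particular, the proposed approach also advances the strong baselines on two general tasks (WMT news Zh -> En and En -> De).
3. Summary: Translation memory can help improve machine translation quality. Unfortunately, existing TM-based translation models are ineffective on general translation tasks; they work only on tasks designed specifically for translation memory research, and they carry a non-negligible computational overhead. This paper therefore proposes an efficient and accurate neural translation model that incorporates translation memory. The model architecture is simple and uses only a single bilingual sentence as its translation memory, so training and inference are quite efficient; more importantly, a new training method is proposed to optimize the model parameters. Experiments on six TM-specialized tasks show that, compared with several strong baseline systems that use multiple bilingual sentences as translation memory, the model has a large advantage in both translation quality and running time. In particular, the model also improves translation quality on two general tasks (WMT Zh -> En and En -> De).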
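The abstract states that the model uses a single bilingual sentence as its translation memory but does not say how that sentence is chosen. The sketch below shows one plausible way to pick it, by token-level fuzzy matching against the source side of a small memory. The similarity measure (`difflib.SequenceMatcher`), the function name `retrieve_tm`, and the toy sentence pairs are assumptions for illustration, not the method from the paper.

```python
from difflib import SequenceMatcher


def retrieve_tm(source, memory):
    """Return the (source, target) pair most similar to `source`, or None.

    `memory` is an iterable of (source_sentence, target_sentence) pairs.
    Similarity is a token-level SequenceMatcher ratio (an assumed heuristic).
    """
    query = source.split()
    best_pair, best_score = None, -1.0
    for tm_source, tm_target in memory:
        score = SequenceMatcher(None, query, tm_source.split()).ratio()
        if score > best_score:
            best_pair, best_score = (tm_source, tm_target), score
    return best_pair


if __name__ == "__main__":
    # Hypothetical toy memory of English-German sentence pairs.
    memory = [
        ("the cat sat on the mat", "die Katze saß auf der Matte"),
        ("stock prices fell sharply", "die Aktienkurse fielen stark"),
    ]
    print(retrieve_tm("the cat sat on a mat", memory))
```

The retrieved pair would then be given to the translation model together with the input sentence; how the model consumes it is outside the scope of this sketch.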
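Huayang Li (李华阳) is a Researcher at Tencent AI Lab, working mainly on research and deployment in natural language processing and (interactive) machine translation. He led the development of TranSmart, Tencent's interactive machine translation system, which now provides assisted-translation services to more than ten large organizations and companies, including the United Nations. He has published more than ten papers at natural language processing conferences and in journals, including ACL, EMNLP, and AAAI.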
1. Paper title (one of the papers presented): "GWLAN: General Word-Level AutocompletioN for Computer-Aided Translation"
2. Abstract: Computer-aided translation (CAT), the use of software to assist a human translator in the translation process, has been proven to be useful in enhancing the productivity of human translators. Autocompletion, which suggests translation results according to the text pieces provided by human translators, is a core function of CAT. There are two limitations in previous research in this line. First, most research works on this topic focus on sentence-level autocompletion (i.e., generating the whole translation as a sentence based on human input), but word-level autocompletion is under-explored so far. Second, almost no public benchmarks are available for the autocompletion task of CAT. This might be among the reasons why research progress in CAT is much slower compared to automatic MT. In this paper, we propose the task of general word-level autocompletion (GWLAN) from a real-world CAT scenario, and construct the first public benchmark to facilitate research in this topic. In addition, we propose an effective method for GWLAN and compare it with several strong baselines. Experiments demonstrate that our proposed method can give significantly more accurate predictions than the baseline methods on our benchmark datasets.
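To make the word-level autocompletion setting concrete, the sketch below suggests target words that begin with the characters a translator has typed so far, ranked by a score. It is only a toy baseline under stated assumptions: the candidate scores are hypothetical stand-ins for whatever a translation model would assign, and the function name `complete_word` is invented for illustration. It is not the method proposed in the paper.

```python
def complete_word(typed, candidate_scores, top_k=3):
    """Suggest up to `top_k` words that start with the typed characters.

    `candidate_scores` maps candidate target words to scores (assumed to come
    from some model); ties are broken alphabetically for determinism.
    """
    matches = [
        (word, score)
        for word, score in candidate_scores.items()
        if word.startswith(typed)
    ]
    matches.sort(key=lambda pair: (-pair[1], pair[0]))  # best score first
    return [word for word, _ in matches[:top_k]]


if __name__ == "__main__":
    # Hypothetical scores for the next target word.
    scores = {"translation": 0.5, "translator": 0.3, "transfer": 0.1, "memory": 0.1}
    print(complete_word("transl", scores))
```

A practical system would also condition the scores on the source sentence and the partial translation around the cursor; the prefix filter shown here is only the final step.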

Contact phone: Wang Honggang (王宏刚), 17898468114