Damon Mayaffre (达蒙·马亚福尔)
Laurent Vanni (洛朗·万尼)
Talk title
Artificial Intelligence of Texts. Statistical Model and Interpretation
Abstract
In France, as elsewhere, the performance of artificial intelligence applied to texts is no longer disputed (Mayaffre and Vanni 2021). What raises questions within the scientific community is the interpretability of the computational models used and the explainability of the linguistic results obtained. The tradition of "French-style" computational linguistics (TAL or ADT), beginning with the work of Benzécri (1973) and continued by Lebart and Salem (1994, 1997), helps to partially illuminate the black box and to make sense of machine outputs that are primarily probabilistic. To understand the linguistic phenomena at play, we argue that convolutional models (CNNs) can account for the syntagmatic axis: they bring out salient combinations along the text chain. Transformer models (self-attention mechanism), meanwhile, add the ability to address the paradigmatic axis by identifying the selections or "associative relations" (Saussure: 170-175) within the texts of the corpus. In both cases, and in a strictly complementary manner, an effort of co(n)textualization must be undertaken, relating the word to its immediate co(n)text and to its paradigmatic counterparts in memory or in the corpus. The aim is a semantics that is not merely formal but interpretive, or corpus-driven.
About the speakers
Damon Mayaffre is a research professor at the French National Centre for Scientific Research (CNRS) and teaches at the Université Côte d'Azur, where he supervises PhD students. He is a permanent member of the "Bases, Corpus, Langage" research lab (BCL, UMR 7320) in Nice and heads its "Logometry: Corpora, Processing, and Models" team. Since completing his PhD in political discourse analysis in 1998, he has published five books and more than 80 articles on discourse analysis, natural language processing, and political text studies. As of 2021, he had supervised six PhD students to completion. In the field of digital humanities, Mayaffre is a key advocate of Logometry (Logométrie) and serves on the scientific committee of the International Conference on Statistical Analysis of Textual Data (JADT). He is a peer reviewer for the academic journals Mots: Les Langages du politique and Corpus. He also works with the publisher Honoré Champion, where he oversees the Lettres numériques series.
Laurent Vanni is a research engineer at the French CNRS and holds a PhD in Computer Science from the Université Côte d'Azur. His work focuses on computational linguistics, in particular the application of deep learning methods to text mining. Since 2013, he has worked at the "Bases, Corpus, Langage" laboratory (BCL) in Nice, France, where he combines standard statistical methods with deep neural networks to detect and study new linguistic objects in texts. Since 2015, he has also been developing the Hyperbase web research platform (https://hyperbase.unice.fr), which implements a set of statistical tools for analyzing textual data from multilingual digital corpora on a variety of topics, such as political, media, or literary discourse. He also teaches courses at the Université Côte d'Azur on the statistical analysis of textual data, combining theory and practice through examples of algorithm implementations and software tools.