Paper title

Expansion and evolution of the R programming language

Author

Staples, Timothy L

Abstract

Change in language use is driven by cultural forces; it is unclear whether that extends to programming languages. They are designed to be used by humans, but interaction with computer hardware rather than a human audience may limit opportunities for evolution of the lexicon of used terms. I tested this in R, an open source, mature and commonly used programming language for statistical computing. From a corpus of 360,321 GitHub repositories published between 2014 and 2021, I extracted 168,857,044 function calls to act as n-grams of the R language. Over the eight-year period, R rapidly diversified and underwent substantial lexical change, driven by increasing popularity of the tidyverse collection of community packages. My results provide evidence that users can influence the evolution of programming languages, with patterns that match those observed in natural languages and reflect genetic evolution. R's evolution may have been driven by increased analytic complexity, driving new users to R, creating both selective pressure for an alternate lexicon and accompanying advective change. The speed and magnitude of this change may have flow-on consequences for the readability and continuity of analytic and scientific inquiries codified in R and similar languages.
