T5QL: Taming language models for SQL generation

Paper title

T5QL: Taming language models for SQL generation

Authors

Samuel Arcadinho, David Aparício, Hugo Veiga, António Alegria

Abstract

Automatic SQL generation has been an active research area that aims to streamline access to databases by letting users state their intent in natural language instead of writing SQL. Current SOTA methods for semantic parsing depend on LLMs to achieve high predictive accuracy on benchmark datasets. This reduces their applicability, since LLMs require expensive GPUs. Furthermore, SOTA methods are ungrounded and thus not guaranteed to always generate valid SQL. Here we propose T5QL, a new SQL generation method that, when using smaller LMs (namely T5-Base), improves performance on benchmark datasets by 13 percentage points (pp) compared against SOTA methods. Additionally, T5QL is guaranteed to always output valid SQL, because it uses a context-free grammar to constrain SQL generation. Finally, we show that dividing semantic parsing into two tasks, candidate SQL generation and candidate re-ranking, is a promising research avenue that can reduce the need for large LMs.
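
The idea of constraining generation with a context-free grammar can be illustrated with a minimal Python sketch. This is not the T5QL implementation: the toy grammar, the `toy_score` function (a hypothetical stand-in for language-model scores), and all helper names are assumptions made for illustration only. At each step the decoder considers only the tokens the grammar allows next, so even a scorer that prefers an invalid token cannot produce an invalid query.

```python
# Toy grammar (assumption): nonterminals are the dict keys, everything else is a terminal.
GRAMMAR = {
    "query": [["SELECT", "cols", "FROM", "table"]],
    "cols": [["col"], ["col", ",", "cols"]],
    "col": [["name"], ["age"]],
    "table": [["users"]],
}
EOS = "<eos>"


def expand(stacks):
    """Rewrite leading nonterminals until every stack is empty or starts with a terminal."""
    done, todo = set(), list(stacks)
    while todo:
        stack = todo.pop()
        if stack and stack[0] in GRAMMAR:
            for production in GRAMMAR[stack[0]]:
                todo.append(tuple(production) + stack[1:])
        else:
            done.add(stack)
    return done


def allowed_tokens(stacks):
    """Terminals that may come next; an empty stack means the query may end."""
    return {stack[0] if stack else EOS for stack in stacks}


def advance(stacks, token):
    """Consume one terminal and keep only the parses that are still consistent."""
    return expand({stack[1:] for stack in stacks if stack and stack[0] == token})


def toy_score(prefix, token):
    # Hypothetical scores standing in for a language model's logits.
    # Note that "FROM" is always the favourite, even where it is invalid.
    preferences = {"FROM": 3.0, "name": 2.5, "users": 2.0, "SELECT": 1.0,
                   EOS: 0.5, ",": 0.1, "age": 0.0}
    return preferences.get(token, -1.0)


def constrained_decode(score_fn, max_len=20):
    """Greedy decoding restricted to grammar-allowed tokens at every step."""
    stacks = expand({("query",)})
    output = []
    for _ in range(max_len):
        allowed = allowed_tokens(stacks)
        # sorted() only makes tie-breaking deterministic.
        token = max(sorted(allowed), key=lambda t: score_fn(output, t))
        if token == EOS:
            break
        output.append(token)
        stacks = advance(stacks, token)
    return output


print(" ".join(constrained_decode(toy_score)))
```

In this sketch the grammar has no left recursion, so `expand` terminates. A decoder that hits `max_len` before the grammar permits the end token would return an incomplete query, so a real system needs a length budget large enough for the target queries.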
