The Linear Arrangement Library

The Linear Arrangement Library (LAL) is a tool for research in the growing field of Quantitative Dependency Syntax (Quasy 2019, Quasy 2021). Quantitative Dependency Syntax results from the combination of Quantitative Linguistics and Dependency Syntax. As syntactic dependency structures are two-fold and typically defined as a rooted tree and a linear ordering (=arrangement) of its vertices, LAL is also a tool for related areas such as Computer Science, Graph Theory or the Theory of Spatial Networks.

LAL has been designed to ease the statistical analysis of individual syntactic dependency structures, treebanks (collections of syntactic dependency structures) or collections of treebanks. Common analyses in Quantitative Dependency Syntax can be performed executing a few lines of code which requires practically no programming experience. Moreover, learning how to use the library can serve as an introduction to programming. LAL is available in Python and C++ and provides state-of-the art algorithms both in terms of speed and numerical precision to calculate a myriad of syntax scores of two sorts: scores that do not depend on the linear ordering of the sentence (e.g., the minimum sum of dependency distances, mean hierarchical distance), and those that do depend on such ordering (e.g., the sum of dependency distances, the proportion of head initial dependencies).

A major goal of LAL as a tool for quantitative syntax research is to enable one to test hypotheses (e.g., dependency distance minimization) via calculation of baselines of two types: random baselines (e.g. the expected sum of dependency distances in random shufflings of the words of a sentence) and minimum baselines (the minimum sum of dependency distances over all possible shufflings of the words of a sentence). LAL comes with specific algorithms to calculate these baselines quickly and accurately. It also provides the user with methods that allow one to calculate baselines and test hypotheses for virtually any possible score: random and exhaustive generation of linear arrangements for scores that depend on the linear ordering and random an exhaustive generation of trees for scores that do not depend on the linear structure.

In addition to be designed to be fast (computing all the scores on the whole UD collection takes a few minutes on laptop), LAL has been designed to be reliable. Although researchers are gradually getting used to share code and make it publicly available (including some algorithms in LAL), the effort made to ensure correctness is often unknown. LAL, on the contrary, is the result of transferring the knowledge and the technology that has allowed us to find errors in classic algorithms, e.g. Shiloach’s or Hochberg & Stallmann’s, and provide well-tested implementations (Esteban and Ferrer-i-Cancho 2017, Alemany-Puig et al 2021). Ensuring the correctness of the code is central to the LAL team and is indeed an independent branch of the project.

LAL was also designed to ease interaction between researchers from different parts of the globe. To the best of our knowledge, LAL integrates, for the first time in a single tool, scores and views from different traditions of quantitative syntax (e.g., formal constraints on syntactic dependency structures). Examples are the sum of dependency distances, preferred in the US tradition, dependency flux from the French tradition and the mean hierarchical distance and many other scores from the Chinese tradition. In attempt to be useful to Computer Scientists and Graph Theorists, the library offers (to the best of our knowledge) the only public implementations of classic algorithms, such as Shiloach’s or Chung’s algorithms to solve the Minimum Linear Arrangement problem, random and exhaustive generation methods to support analytical work or the development of new algorithms and a C++ core to satisfy advanced computing needs.

References

Visit the publications page for further information.