Our idea

Our project, carried out by two units, one from the University of Milan and one from the University of Insubria, studies the English metalanguage that was created to analyse and compare, appraise and classify, teach and learn the vernacular languages of Europe between 1500 and 1700, before the development of comparative philology and the institutionalisation of linguistics as an academic discipline. 

MetaLing Corpus: Creating a corpus of English linguistics metalanguage from the 16th to the 18th century 

Our project studies the English metalanguage that was created to analyse and compare, appraise and classify, teach and learn the vernacular languages of Europe between 1500 and 1700, i.e. before the development of comparative philology and the institutionalisation of linguistics as an academic discipline. To this end, we will build a corpus of texts dedicated to or including observations on vernacular languages, which, in the period under review, are to be found in works with a large variety of aims and fields (Van Hal 2019). Through extensive archival research and corpus compilation, the project in the field of history of English for Specific Purposes (ESP) aims to assess the genres and text-types involved in the circulation of linguistic knowledge, and thus throw light onto unconventional texts and voices besides the major works and figures on which scholarship has naturally concentrated. The core part of our study will involve the analysis of the terminology, discursive strategies and descriptive metaphors used to discuss language in these texts, in diachronic perspective.

Our method for corpus collection combines human and computational tools (Moretti 2000) to analyse available sources and make an inventory of authors and works representative of early modern linguistic metalanguage in English. For the purposes of this project, we intend to collect a meaningful corpus (Sangiacomo et al. 2022), in the sense that it both corroborates existing scholarly knowledge about some major aspects of the evolution of linguistics and its metalanguage in English and provides new insights into facets of this evolution that have not been observed previously. For these reasons, we aim at an open corpus, the composition of which may change over time and benefit from future external contributions. In terms of actual workflow, we will proceed as follows: the large amount of scraped information will be cleaned, simplified, and tokenised via NLTK Python libraries; subsequently, the keywords and collocations will be further consolidated, analysed and processed through lexicon extraction techniques (Anglin 2019; Lahti et al. 2019). The corpus will be published open access to be freely queried by other researchers.
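
The tokenisation and collocation steps of this workflow can be illustrated with a minimal sketch. It is not the project's actual pipeline: it uses only the Python standard library as a stand-in for the NLTK tokenisers and collocation finders mentioned above, and the sample text, the function names `tokenise` and `bigram_pmi`, and the `min_count` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter
from math import log2

# Hypothetical sample text, invented for illustration (not from the corpus).
SAMPLE = (
    "The vulgar tongue hath many dialects. "
    "The vulgar tongue is spoken in the countrey. "
    "The learned tongue is taught in schooles."
)


def tokenise(text):
    """Lowercase the text and keep only runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI is approximated as log2(count(w1, w2) * N / (count(w1) * count(w2))),
    where N is the number of tokens. Pairs seen fewer than `min_count`
    times are dropped, since PMI overrates rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {
        pair: log2(count * n / (unigrams[pair[0]] * unigrams[pair[1]]))
        for pair, count in bigrams.items()
        if count >= min_count
    }
    # Highest-scoring candidate collocations first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = tokenise(SAMPLE)
collocations = bigram_pmi(tokens)

for (first, second), score in collocations:
    print(f"{first} {second}\t{score:.2f}")
```

On real early modern texts, spelling variation (for example "tongue" and "tong") would scatter the counts across variant forms, so some normalisation step would probably have to come before any scoring of this kind.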

The use of such a corpus will be multifold. This tool will help raise awareness of the significance of linguistics and philology in multilingual Europe, as a way to enhance the importance of these studies for the advancement of our knowledge of a long tradition of contact, exchange and even conflict between the linguistic and cultural identities of Europe. It will be a scholarly and didactic tool, and the terminology extracted from it will provide data of interest for open source dictionaries and lexical repertoires. This study is timely and relevant as a contribution to the existing debate on the development of the discourse of the humanities as an inherently interdisciplinary field.  

Anglin K. L. 2019. “Gather-Narrow-Extract: A Framework for Studying Local Policy Variation Using Web-Scraping and Natural Language Processing”, Journal of Research on Educational Effectiveness, 12(4), 685-706.

Lahti L., Marjanen J., Roivainen H., Tolonen M. 2019. “Bibliographic Data Science and the History of the Book (c. 1500–1800)”, Cataloging & Classification Quarterly, 57(1), 5-23.

Moretti F. 2000. “Conjectures on World Literature”, New Left Review, 1, 54-68.

Sangiacomo A., Tanasescu R., Donker S., Hogenbirk H. 2022. “Mapping the evolution of early modern natural philosophy: corpus collection and authority acknowledgement”, Annals of Science, 79(1), 1-39.

Van Hal T. 2019. “Early Modern Views on Language and Languages (ca. 1450-1800)”, in Oxford Research Encyclopedia of Linguistics, Oxford UP.

  • Angela Andreani (Principal Investigator), University of Milan
  • Daniel Russo (Associated Investigator), University of Insubria
  • Martin Petkov Ruskov, University of Milan
  • Simona Turbanti, University of Milan
  • Vahid Asadi, University of Milan
  • Asadi V., Russo D., Andreani A., “Early Modern linguistic terminology in the MetaLing Corpus: Computational approaches and methodological challenges”, Seminar 55 “Human mediation and linguistic knowledge across centuries: Multilingualism, metalanguage, and multimodal teaching practices” chaired by Montini D. and Russo D., 32nd AIA Conference, 13 September 2025, University of Turin, Turin.
  • Andreani A., Russo D., Asadi V., “Teaching material in the MetaLing Corpus”, International Inter-Association Conference on the History of Language Learning and Teaching (ICHoLLT 2025), 5 June 2025, University of Insubria, Como.
  • Asadi, V., Andreani, A., Russo, D., “Exploring the use of English metalanguage in early modern sources: Focus on orthography and variation”, Les Journées Internationales de Linguistique de Corpus (JLC), 21-23 October 2025, Lyon.
  • Cuscito M., Ferrara A., Ruskov M., “Shakespeare Did Not Know Our Vocabulary: Measuring the Historical Adequacy of LLMs.” Paper presented at IEEE International Conference on Cyber Humanities. 2025. 
  • Andreani A., "Intra-writer socio-pragmatic variation in the sermons of a 17th-century Gloucestershire preacher", Historical Sociolinguistics Network Conference 2025, University of Bristol, 22-24 May 2025.
  • Andreani A., "Lexical change through the English Reformation: Religious and 'linguistic' terminology", 19th SLIN Conference Socio-political Instability and Language Change (1300–1900), Università degli Studi di Catania 8-10 May 2025.
  • Andreani A., "Labelling Language Variety and Diversity in the English Renaissance", 71st Renaissance Society of America (RSA) Annual Meeting, 20 March 2025, Boston (MA), USA.
  • Cuscito M., Ferrara A., Ruskov M., “How BERT Speaks Shakespearean English? Evaluating Historical Bias in Masked Language Models”, IAI4CH @AIxIA 3rd International Workshop on Artificial Intelligence for Cultural Heritage, 28 November 2024, Bolzano.
  • Russo D., “Corpus-Driven Metalinguistic Explorations: Analyzing Language Discussions in Early Modern English Sources”, 71st Renaissance Society of America (RSA) Annual Meeting, 22 March 2025, Boston (MA), USA.
  • Russo D., Andreani A., “Varieties of metalinguistic awareness in English, 1500-1700”, XVII Convegno Internazionale CIRSIL, 18 October 2024, University of Trento.
  • Russo D., Andreani A., “Mapping the history of language-related terminology in English (1500-1700): A corpus-based collocate approach”, Henry Sweet Society Colloquium 2023, 4 September 2023, University of Trás-os-Montes and Alto Douro, Vila Real, Portugal.
  • Russo D., “Navigating the Digital Frontier: A Study of Concurrent Translation (CT) Practices among Italian Professionals”, How can AI translate?, 22 April 2024, University of Naples Federico II, Naples.
  • Andreani A., “Metalinguistic labelling in Florio’s A Worlde of Wordes (1598)”, International conference John and/or Giovanni, Tradition and Innovation in Florian Studies, 13-14 June 2024, Sapienza Università di Roma.
  • Russo D., Andreani A., “A review of the metalinguistic labelling in R.C. Alston’s collection of texts regarding Cant and Dialects”, International Conference on the History of the Language Sciences (ICHoLS), August 26-30, 2024, Tbilisi State University, Tbilisi (Georgia). 
  • (Forthcoming)
  • Andreani, A., Russo, D. (2026). “Work Data 1 - Metaling Corpus.” UNIMI Dataverse. https://doi.org/10.13130/RD_UNIMI/KS2PUX.
  • Andreani, A., Russo, D. (2025). “Language for Specific Purposes: Historical Perspectives”. In: International Encyclopedia of Language and Linguistics. Elsevier, pp. 1-9, doi: 10.1016/b978-0-323-95504-1.00710-9.
  • Andreani, A. (2025). “Metalinguistic Labelling in John Florio’s: A Worlde of Wordes (1598)”. In: New Essays on John Florio: Linguistic and Cultural Perspectives, edited by D. Montini. Sapienza Università Editrice, pp. 67-88.
  • Cuscito, M., Ferrara, A., & Ruskov, M. (2025). “Shakespeare did not know our vocabulary: Measuring the Historical Adequacy of LLMs”. 2025 IEEE International Conference on Cyber Humanities (IEEE-CH), Florence, Italy, pp. 1-7, doi: 10.1109/IEEE-CH65308.2025.11279542.
  • Cuscito, M., Ferrara, A., & Ruskov, M. (2024). "How BERT Speaks Shakespearean English? Evaluating Historical Bias in Masked Language Models" (short paper). IAI4CH@AI*IA.
  • Andreani, A., Russo, D. (2023). “Building a Corpus of the Metalanguage of English Linguistics 1500-1700: Methodological Issues”. In: LINGUISTICA E FILOLOGIA 43, doi: 10.13122/LeF_43_p151.
Seminar Series

We organised the seminar series “MetaLing Corpus Project: Texts and Methods”, 12 March 2024 – 4 November 2025 Università degli Studi di Milano: https://dllcm.unimi.it/it/metaling-corpus-project. The seminar series involved an invited list of guest speakers from international universities and was dedicated to presenting research progress, discussing innovative methodologies in historical linguistics, and sharing best practices in corpus construction and analysis.

Symposium
[Image: Symposium poster]

The symposium "Addressing Technical Challenges in Corpus Linguistics Research" took place on 11 April 2025 at the Aula Magna of the University of Insubria, Sant'Abbondio, Como. This event brought together distinguished scholars in the field, including Alicia Rodríguez Álvarez (University of Las Palmas de Gran Canaria), Isabel Sofía Moskowich-Spiegel Fandiño and Luís Miguel Puente Castelo (University of A Coruña), Walter Giordano (University of Naples Federico II), Silvia Bernardini and Adriano Ferraresi (University of Bologna). Together, they explored current technical issues and advancements shaping corpus linguistic research, offering valuable insights for both academics and practitioners.

The MetaLing Project contributed to the organisation of the International Inter-Association Conference on the History of Language Learning and Teaching 2025 (ICHoLLT 2025), hosted by the University of Insubria. As part of the conference, the project convened a panel entitled “Exploring metalinguistic awareness in early modern English educational (con)texts: historical perspectives and methodological challenges”, chaired by Angela Andreani (University of Milan) and Daniel Russo (University of Insubria), with the support of discussant Leidamaria Monaco (Universidade da Coruña). The panel featured contributions from Cristiano Ragni (Università di Verona), Marco Bagli (Università per Stranieri di Perugia), and Vahid Asadi (Università di Milano).

[Image: Researchers' Night 2025, Milan]
  • Organisation of the engagement activity for secondary school students: “Storie e curiosità della lingua inglese. Viaggio alla scoperta della storia della lingua inglese e delle sue varietà” as part of the Notte Europea delle Ricercatrici e dei Ricercatori, 27 September 2024, Università dell’Insubria, Varese. The activity was designed as an outreach effort to engage secondary school students in exploring the history and varieties of the English language. 
  • Organisation of the engagement activity for high school students: “Viaggio nella lingua inglese: tra enigmi, mappe, e intelligenza artificiale” as part of the Notte Europea delle Ricercatrici e dei Ricercatori, 27 September 2025, Università degli Studi di Milano. https://www.meetmetonight.it/2025/eventi/viaggio-nella-lingua-inglese-tra-enigmi-mappe-e-intelligenza-artificiale/. The workshop was addressed to upper secondary school students, designed as an outreach activity to explore the history and varieties of the English language while challenging some of the most common myths about it.
  • Dataverse: https://dataverse.unimi.it/dataset.xhtml?persistentId=doi:10.13130/RD_UNIMI/KS2PUX The MetaLing corpus is archived and published on the University of Milan Dataverse repository, which ensures long-term preservation, persistent identification (DOI), and compliance with FAIR principles for research data. This platform provides open access to the corpus dataset for scholarly reuse.
  • Hugging Face: https://huggingface.co/datasets/MetalingProject/MetaLingCorpus The Hugging Face repository distributes the corpus in formats suitable for computational analysis and Natural Language Processing research, facilitating its reuse in digital humanities and machine-learning environments.
  • GitHub: https://github.com/MetalingProject The GitHub repository provides access to the technical infrastructure of the project, including scripts, data-processing workflows, and documentation related to corpus preparation and analysis, supporting transparency and reproducibility.
  • Omeka: https://metaling.omeka.net/ The Omeka platform hosts the structured metadata and curated presentation of the corpus materials, allowing users to explore texts, authors, and subcorpora through a searchable digital collection based on Dublin Core standards.
  • Varieng Corpus Resource Database (CoRD): The MetaLing corpus has been listed in the database maintained by the Varieng research unit at the University of Helsinki, one of the major international directories of historical English corpora. Its inclusion enhances the visibility and integration of the resource within the global corpus linguistics research infrastructure.
     
Sponsors

The project is carried out with the support of the Italian Ministry of University and Research, PRIN 2022 call – “MetaLing Corpus: Creating a corpus of English linguistics metalanguage from the 16th to the 18th century”, ref.: 202233C93X, funded by the European Union – NextGenerationEU, PNRR Mission 4 - Component 2 - Investment 1.1.

For information

Angela Andreani, University of Milan (Principal Investigator) angela.andreani@unimi.it

Daniel Russo, University of Insubria (Associated Investigator) daniel.russo@uninsubria.it