This paper deals with corpus linguistics. It gives an overview of the field because, although corpus linguistics has already gained popularity, I think not everyone is familiar with it.
First, I explain what a corpus actually is and what it is useful for. I describe the different types of corpora as well as the potential risks of depending too much on computer-processable corpora. The focus then shifts to the fields of application of corpus linguistics, including its use in syntax and morphology. I also illustrate the opportunities a corpus provides with a worked example.
The main aim of this paper is to give an overview of corpus linguistics and its fields of application, with particular attention to syntax and morphology.
Table of Contents
1. Introduction
2. What is a corpus?
3. What is the use of a corpus?
4. What are potential risks?
5. What are the fields of application of corpus linguistics?
6. What is the use of corpora in syntax and morphology?
7. Conclusion
Objectives and Key Themes
The primary objective of this paper is to provide a comprehensive introduction to the field of corpus linguistics. It aims to clarify what constitutes a corpus, explore its utility in linguistic research, and address the potential risks associated with relying on computer-processable data, while specifically examining applications within syntax and morphology.
- Definition and fundamental characteristics of a corpus
- Advantages of using large-scale corpora over linguistic introspection
- Critical analysis of potential risks and limitations in corpus-based research
- Practical applications of corpora in syntax, morphology, and language teaching
- Demonstration of corpus analysis techniques using the Corpus of Contemporary American English (COCA)
Excerpt from the Book
What is the use of a corpus?
As said above, the samples in a corpus are selected according to particular criteria that depend on the purpose of the corpus. The selection depends less on the content of the text samples than on “external” features such as “the situation of their production or reception” (Aston and Burnard 1998, 5).
The biggest advantage of a corpus for linguists is its size. The history of corpus linguistics dates back only to the 1960s, and the history of computer-processable corpora is shorter still. Before they could search large corpora for evidence of linguistic rules, linguists relied heavily on introspection. No individual commands the whole “repertoire” of a language, because a language spans many different fields, each with its own terminology. Corpus data can be seen as an extension of “linguistic intuition” that gives a more objective impression of a language: “A corpus can enable grammarians, lexicographers, and other interested parties to provide better descriptions of a language by embodying a view of it which is beyond any one individual’s experience” (Aston and Burnard 1998, 5). For grammarians, the size of a corpus yields information about the frequency of certain word combinations and about sentence structure; lexicographers use it to determine the frequency of words, which is useful, for example, in compiling dictionaries. A corpus also provides information about the different uses of words, about register, about diachronic varieties, and about different uses of language in general.
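The frequency information described above can be illustrated with a minimal sketch in Python. It is not taken from the paper: the three sample sentences, the simple regular-expression tokenizer, and the function names are assumptions made for illustration, and a real corpus would be far larger and more carefully sampled.

```python
from collections import Counter
import re

# Toy samples invented for illustration; a real corpus holds millions of words.
SAMPLES = [
    "The committee was skeptical of the new proposal.",
    "Many voters remain skeptical about the plan.",
    "The new proposal was popular with voters.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(samples):
    """Count how often each word form occurs across all samples."""
    counts = Counter()
    for sample in samples:
        counts.update(tokenize(sample))
    return counts

def bigram_frequencies(samples):
    """Count adjacent word pairs without pairing words across samples."""
    counts = Counter()
    for sample in samples:
        tokens = tokenize(sample)
        counts.update(zip(tokens, tokens[1:]))
    return counts

words = word_frequencies(SAMPLES)
bigrams = bigram_frequencies(SAMPLES)
print(words.most_common(3))           # most frequent word forms
print(bigrams[("new", "proposal")])   # frequency of one word combination
```

Word counts of this kind correspond to what a lexicographer would look up, while the pair counts are a very rough stand-in for the word combinations a grammarian might examine.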
Summary of Chapters
1. Introduction: This chapter outlines the scope of the paper, aiming to demystify corpus linguistics and provide a roadmap for the subsequent discussion on its utility and applications.
2. What is a corpus?: This chapter defines a corpus by drawing on scholarly definitions, emphasizing that it is a collection of language samples selected for systematic analysis rather than an arbitrary collection of texts.
3. What is the use of a corpus?: This section discusses the advantages of corpora, such as their large size and objectivity, which allow linguists to move beyond personal introspection to analyze language in actual use.
4. What are potential risks?: This chapter critically addresses the limitations and potential pitfalls of corpus-based research, including the dangers of over-reliance on statistical data at the expense of careful linguistic analysis.
5. What are the fields of application of corpus linguistics?: This chapter outlines various research areas benefiting from corpus data, including collocation analysis, contrastive linguistics, translation studies, and natural language processing.
6. What is the use of corpora in syntax and morphology?: This chapter explains how corpora facilitate the verification of grammatical rules and the analysis of word forms, providing a practical demonstration using COCA.
7. Conclusion: The final chapter summarizes the importance of corpora as an essential tool in modern linguistics that complements, rather than replaces, human linguistic intuition.
Keywords
Corpus Linguistics, Syntax, Morphology, Collocation, Language Variety, Introspection, Empirical Observation, Corpus of Contemporary American English, COCA, Natural Language Processing, Statistical Linguistics, Linguistic Research, Language Teaching, Frequency, Register.
Frequently Asked Questions
What is the core focus of this research paper?
The paper focuses on providing an accessible introduction to corpus linguistics, explaining its definition, practical utility, and its role as a supplementary tool for linguistic analysis.
What are the primary themes discussed in the work?
The work covers definitions of a corpus, the shift from introspection to objective data analysis, the advantages and risks of using corpora, and specific applications in syntax and morphology.
What is the main research objective?
The main objective is to provide an overview of the field of corpus linguistics and demonstrate how it functions as a powerful, data-driven alternative or extension to traditional methods.
Which scientific methodology is employed?
The paper uses a descriptive methodology, drawing upon established linguistic literature and providing a practical, empirical demonstration through the use of the Corpus of Contemporary American English (COCA).
What is covered in the main body of the paper?
The main body systematically explores definitions of a corpus, the advantages of size and objectivity, the critical limitations pointed out by scholars, and specific use-cases like collocation and syntax analysis.
Which keywords best characterize this work?
Key terms include Corpus Linguistics, Syntax, Morphology, Collocation, Empirical Observation, and Natural Language Processing.
How does the author define a corpus in this paper?
The author references John Sinclair, defining a corpus as a collection of language pieces selected and ordered according to explicit linguistic criteria to serve as a sample of the language.
What specific risks does the author identify regarding corpus usage?
The author notes the risk of prioritizing size over adequacy and the danger of paying less attention to detail, as well as the inherent limitation that a corpus cannot define what is impossible in a language.
What did the author demonstrate using the COCA database?
The author demonstrated how to look up a specific word ("skeptical"), compare its collocates across different sections like "fiction" and "newspaper," and utilize KWIC results for detailed analysis.
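The kind of lookup described in this answer can be sketched in Python. This is only an illustration under stated assumptions: it does not query COCA, the sentences are invented, the section labels merely mirror the "fiction" and "newspaper" sections named above, and the helpers kwic and right_collocates are hypothetical names, not part of any corpus tool.

```python
from collections import Counter
import re

# Hypothetical mini-corpus; the sentences are invented and are not COCA data.
CORPUS = {
    "fiction": [
        "She gave him a skeptical look across the table.",
        "His skeptical look told her everything.",
    ],
    "newspaper": [
        "Analysts remain skeptical about the merger.",
        "Voters are skeptical about the promised tax cuts.",
    ],
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(sentences, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines

def right_collocates(sentences, keyword):
    """Count the word immediately following each occurrence of the keyword."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens[:-1]):
            if token == keyword:
                counts[tokens[i + 1]] += 1
    return counts

for section, sentences in CORPUS.items():
    print(section, right_collocates(sentences, "skeptical").most_common(1))
    for left, key, right in kwic(sentences, "skeptical"):
        print(f"{left:>25} [{key}] {right}")
```

Aligning the keyword in a fixed column, as the print loop does, is what makes KWIC (keyword in context) output easy to scan for recurring patterns on either side of the search word.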
- Cite this work
- Theresa Rass (Author), 2010, Corpus Linguistics - An Introduction to the Field and its Use in Linguistics, München, GRIN Verlag, https://www.hausarbeiten.de/document/181516