This paper deals with the International Corpus of English (ICE), an international project initiated by Professor Sidney Greenbaum in 1988. His book “Comparing English Worldwide” has been a very helpful resource for writing this paper.
Many countries and researchers have been, and still are, involved in the research and compilation of corpora. The International Corpus of English is made up of the corpora of eighteen countries from around the world. The paper describes the design and structure of the ICE, both in general and in detail, focusing on the division of texts into written and spoken categories. It concentrates on what the individual ICE corpora have in common rather than on the differences among them.
Furthermore, the paper explains the theoretical basis of the text categories and the two kinds of annotation applied to this corpus: textual markup and biographical/bibliographical markup.
Finally, the paper briefly introduces the ICE Corpus Utility Program (ICECUP), which was developed specifically to simplify access to and analysis of the ICE.
Table of Contents
1. Introduction
2. Information on the Background
3. The ICE Project
3.1. Participating Countries
3.2. Professors and Helpers
4. The Structure of the ICE
4.1. General Facts
4.2. Information in Detail
4.2.1. Written Texts
4.2.2. Spoken Texts
5. Texts
5.1. Subtexts
5.2. Text Units
5.3. Extra-Corpus Text
6. Corpus Annotation
6.1. Textual Markup
6.2. Biographical/Bibliographical Markup
7. ICECUP
8. Closing
Objectives and Themes
The primary objective of this paper is to provide a comprehensive overview of the International Corpus of English (ICE), outlining its design, structure, and the collaborative international efforts behind its compilation. The research focuses on the classification of textual data into written and spoken categories and examines the methodologies used for annotation and digital accessibility.
- Historical context and the role of Professor Sidney Greenbaum in initiating the ICE project.
- Global participation and the selection criteria for contributing countries.
- Structural design, including the distinction between written and spoken text categories.
- Annotation techniques, specifically textual and biographical/bibliographical markup.
- Technical tools such as the ICE Corpus Utility Program (ICECUP) for data analysis.
Excerpt from the Paper
4.2.1. Written Texts
The written part of the ICE covers a broad spectrum of writing, including fiction, press reportage, editorials, and popular and learned writing. In addition, it includes three types of writing not usually found in corpora: personal letters, business correspondence, and student essays and exams.
Legal English is notably absent from the written part of the corpus. This highly specialized variety was excluded on the grounds that it is intended mainly for a narrow, specialist audience.
Printed and non-printed texts differ in their intended readership and in their mode of composition. Printed material is written for a large, unrestricted audience that the writer does not know; in some cases, such as newspapers, popular writing, and fiction, this audience is the general public. The intended readership of non-printed material is much smaller. A social letter is usually addressed to one individual whom the writer knows personally, whereas the writer of a business letter does not necessarily know the addressee.
The major difference between printed and non-printed texts is that non-printed texts are a direct product of the individual writer and have usually not been edited by anyone else. In contrast, writers of printed works are usually required to follow the house style of the publisher or newspaper for which they are writing.
Summary of Chapters
1. Introduction: This chapter introduces the International Corpus of English (ICE) project and outlines the paper's scope, focusing on corpus design and structure.
2. Information on the Background: This section details the origins of the project and the contributions of Professor Sidney Greenbaum in establishing the ICE.
3. The ICE Project: This chapter identifies the participating nations and recognizes the researchers and scholars involved in the development and maintenance of the corpus.
4. The Structure of the ICE: This section provides an in-depth analysis of the corpus design, distinguishing between written and spoken categories and setting standards for data inclusion.
5. Texts: This chapter categorizes the corpus into subtexts, text units, and extra-corpus materials to clarify how data is structured for analysis.
6. Corpus Annotation: This chapter explains the two primary markup methods: textual markup for typographical features and biographical/bibliographical markup for metadata.
7. ICECUP: This chapter describes the International Corpus of English Corpus Utility Program as a tool to facilitate research and grammatical analysis of the corpus.
8. Closing: The concluding chapter summarizes the project's significance and the ongoing collaborative efforts of the participating countries.
Keywords
International Corpus of English, ICE, Corpus Linguistics, Sidney Greenbaum, Text Categories, Spoken English, Written English, Corpus Annotation, Textual Markup, Bibliographical Markup, ICECUP, Linguistic Research, Language Varieties, Data Compilation, Grammar Analysis
Frequently Asked Questions
What is the core focus of this research paper?
This paper provides an overview of the International Corpus of English (ICE), covering its background, structural organization, and the methodologies used to annotate and analyze its data.
What are the primary themes discussed in the work?
The paper explores the global cooperation required to compile the ICE, the technical classification of text types, the importance of annotation, and the utility of specific software for linguistic research.
What is the main goal or research question?
The goal is to inform the reader about the design and structure of the ICE, specifically regarding how researchers manage the vast amount of written and spoken language data collected from eighteen different countries.
Which scientific methods are applied in the compilation of the ICE?
The project follows a systematic design in which each national corpus contains one million words, divided into 500 texts of approximately 2,000 words each, and uses standardized textual and biographical/bibliographical annotation schemes.
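The design figures in this answer can be checked with simple arithmetic: 500 texts of about 2,000 words each give one million words per corpus. The following Python sketch encodes only those stated targets and reports how a set of per-text word counts compares with them. The helper name `check_design`, the 10% tolerance, and the sample counts are hypothetical illustrations, not part of the ICE guidelines.

```python
# Design targets as stated in this paper: 500 texts of roughly 2,000 words each.
TEXTS_PER_CORPUS = 500
WORDS_PER_TEXT = 2_000
TARGET_WORDS = TEXTS_PER_CORPUS * WORDS_PER_TEXT  # 1,000,000 words


def check_design(word_counts, tolerance=0.1):
    """Hypothetical helper: compare per-text word counts with the ICE targets.

    `tolerance` is an assumed allowed relative deviation per text (10% by
    default); it is illustrative only and not taken from the ICE guidelines.
    """
    total = sum(word_counts)
    # Indices of texts whose length falls outside the assumed tolerance band.
    off_target = [
        i for i, n in enumerate(word_counts)
        if abs(n - WORDS_PER_TEXT) > tolerance * WORDS_PER_TEXT
    ]
    return {
        "texts": len(word_counts),
        "total_words": total,
        "texts_ok": len(word_counts) == TEXTS_PER_CORPUS,
        "off_target_texts": off_target,
    }


if __name__ == "__main__":
    # Made-up counts for illustration: 499 texts of 2,000 words and one short text.
    counts = [2_000] * 499 + [1_500]
    report = check_design(counts)
    print(TARGET_WORDS)
    print(report["total_words"], report["texts_ok"], report["off_target_texts"])
```

With these made-up counts the corpus has the right number of texts (500) but totals 999,500 words, and only the last text falls outside the assumed tolerance.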
What topics are covered in the main section of the paper?
The main section details the historical background, the selection of participating countries, the distinction between written and spoken texts, the use of subtexts and text units, and the implementation of markup for digital analysis.
Which keywords best describe the paper?
Key terms include International Corpus of English (ICE), corpus linguistics, textual markup, spoken/written categories, and linguistic annotation.
Why was "legal English" excluded from the written part of the corpus?
Legal English was excluded because it is a highly specialized variety of the language intended for a very specific, limited audience, making it less representative of general linguistic usage.
What is the function of the ICECUP software?
ICECUP is designed to exploit the grammatical annotations of the ICE, allowing users to browse texts, filter by specific markup, and conduct complex searches based on criteria like speaker demographics or text type.
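ICECUP itself is an interactive program, and its interface is not described in this paper. The general idea of narrowing a corpus by metadata such as text type or speaker characteristics can nonetheless be sketched in a few lines of Python. The record fields (`text_id`, `mode`, `category`, `gender`), the function `filter_texts`, and the sample data are hypothetical and do not reproduce ICECUP's data model or the ICE markup format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical metadata record for one corpus text (not ICECUP's format)."""
    text_id: str
    mode: str      # "spoken" or "written"
    category: str  # e.g. "personal letters", "press reportage"
    gender: str    # illustrative demographic field for the speaker or writer


def filter_texts(records, **criteria):
    """Return the records whose fields match every given keyword criterion.

    Criteria are combined with AND; an unknown field name raises AttributeError.
    """
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    corpus = [
        TextRecord("T001", "written", "personal letters", "female"),
        TextRecord("T002", "written", "press reportage", "male"),
        TextRecord("T003", "spoken", "conversation", "female"),
    ]
    hits = filter_texts(corpus, mode="written", gender="female")
    print([r.text_id for r in hits])
```

Each keyword argument names a metadata field, so adding a criterion narrows the selection further; calling `filter_texts` with no criteria returns every record.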
- Cite this work
- M.A. Frauke Scheben (Author), 2002, The International Corpus of English (ICE), München, GRIN Verlag, https://www.hausarbeiten.de/document/55689