Definition
Text generation is a subfield of natural language processing. It draws on computational linguistics and artificial intelligence to automatically generate natural language text that satisfies specified communicative requirements.
Historical Background
Research in the text generation field first appeared in the 1970s, with Goldman's work on natural language generation from a deep conceptual base [2]. In the 1980s, more significant work was contributed to the field: McDonald cast text generation as a decision-making problem [6], Appelt worked on language planning (1981), and McKeown studied discourse strategies and focus constraints [8]. In the 1990s, a generic architecture for text generation was discussed by Reiter [10] and Hovy [3]. Even today, variations on the generic architecture remain a widely discussed question (Mellish et al. [9]).
Foundations
Text generation, or natural language generation (NLG), is usually compared with another subfield of natural language processing, natural language understanding (NLU), which is generally considered the inverse process of the former: at a highly abstract level, an NLG task synthesizes machine representations of information into natural language text, while an NLU task parses and maps natural language text into machine representations. Upon inspection at a more concrete level, however, they can hardly be seen as "opposite," because they differ greatly in their problem sets and internal representations.
Text Generation System Architecture
Input and Output
The input of a text generation system is information represented in a non-linguistic format, such as numerical, symbolic, or graphical data. The output is understandable natural language in text format, such as messages, documents, or reports.
Architectures
The Generic Architecture
Despite differences in application backgrounds and realization details, many current text generation systems follow a general architecture, known as the Pipelined Architecture or Consensus Architecture, usually described as in Fig. 1 ([11]; Edward Hovy also gave a similar representation of this architecture).
As seen in Fig. 1, the "Pipelined Architecture" describes a general strategy for tackling the text generation problem from macro to micro, from inner structure organization to outer surface realization. Thus, language components such as paragraphs, sentences, and words are coherently arranged together to meet certain communicative requirements.
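As a rough illustration, the pipeline can be sketched as three successive transformations composed end to end. This is a minimal sketch under simplifying assumptions: the function names and data structures are hypothetical, not taken from any particular system.

```python
# Hypothetical sketch of the three-stage pipelined NLG architecture.
# All names and data structures here are illustrative assumptions.

def document_planner(data, goal):
    """Stage 1: select content relevant to the goal and impose a structure."""
    selected = [fact for fact in data if fact["topic"] == goal]
    return {"relation": "sequence", "messages": selected}

def micro_planner(doc_plan):
    """Stage 2: turn the document plan into sentence specifications."""
    return [{"subject": m["subject"], "predicate": m["predicate"]}
            for m in doc_plan["messages"]]

def surface_realizer(sentence_specs):
    """Stage 3: produce the final text string."""
    sentences = [f"{s['subject']} {s['predicate']}." for s in sentence_specs]
    return " ".join(sentences)

def generate(data, goal):
    """Run the three stages in sequence."""
    return surface_realizer(micro_planner(document_planner(data, goal)))

facts = [
    {"topic": "weather", "subject": "Tomorrow", "predicate": "will be cold"},
    {"topic": "weather", "subject": "Tomorrow", "predicate": "will be windy"},
    {"topic": "sports", "subject": "The match", "predicate": "was cancelled"},
]
print(generate(facts, "weather"))
# Tomorrow will be cold. Tomorrow will be windy.
```

The sketch shows only the data flow between stages; the sections below describe what each stage actually involves.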
The following are the detailed descriptions of the above stages:
Stage 1: Document Planning
Also known as Text Planning, Discourse Planning, or Macro Planning. This includes:
- Content determination: Also known as content selection and organization; this is to discover and determine the major topics the text should cover, given a set of communicative goals and representations of information or knowledge.
- Document structuring: Determining the overall structure of the text/document. This structure categorizes and organizes sentence-level language components into clusters. The relationship between different components inside a cluster can be explanatory, descriptive, comparative, causal, sequential, etc.
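A document plan of this kind is commonly represented as a tree whose internal nodes name the discourse relation holding between their children. The following is an illustrative sketch; the relation labels and message format are assumptions for the example, not a standard representation.

```python
# Hypothetical sketch: a document plan as a tree of discourse relations.
# Leaves are messages (content units); internal nodes name the relation
# (e.g., sequential, causal) holding between their children.

def doc_plan(relation, *children):
    return {"relation": relation, "children": list(children)}

def message(text):
    return {"message": text}

# A causal cluster nested inside an overall sequence.
plan = doc_plan(
    "sequence",
    message("A cold front arrives tonight."),
    doc_plan(
        "cause",
        message("Temperatures will drop sharply."),
        message("Frost is expected by morning."),
    ),
)

def leaves(node):
    """Collect the messages in document order."""
    if "message" in node:
        return [node["message"]]
    result = []
    for child in node["children"]:
        result.extend(leaves(child))
    return result

print(leaves(plan))
```

Later stages walk such a tree to decide sentence boundaries and connectives (e.g., a "cause" node may surface as "because" or "so").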
Stage 2: Micro Planning
Also known as Sentence Planning. This is to convert a document plan into a sequence of sentence or phrase specifications, including:
- Aggregation: To combine several linguistic structures (e.g., sentences, paragraphs) into a single, coherent structure. An example: "Tomorrow will be cold. Tomorrow will be windy." → "Tomorrow will be cold and windy."
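The aggregation step can be sketched as a rule that merges adjacent sentence specifications sharing a subject and verb by conjoining their complements. This is a deliberately simplified illustration; real aggregation rules are considerably richer.

```python
# Illustrative aggregation rule: merge adjacent sentence specifications
# that share the same subject and verb by conjoining their complements.

def aggregate(specs):
    merged = []
    for spec in specs:
        if (merged and merged[-1]["subject"] == spec["subject"]
                and merged[-1]["verb"] == spec["verb"]):
            merged[-1]["complements"].extend(spec["complements"])
        else:
            merged.append({"subject": spec["subject"],
                           "verb": spec["verb"],
                           "complements": list(spec["complements"])})
    return merged

def realize(spec):
    """Naive realization of one aggregated specification."""
    return f"{spec['subject']} {spec['verb']} " \
           f"{' and '.join(spec['complements'])}."

specs = [
    {"subject": "Tomorrow", "verb": "will be", "complements": ["cold"]},
    {"subject": "Tomorrow", "verb": "will be", "complements": ["windy"]},
]
print(" ".join(realize(s) for s in aggregate(specs)))
# Tomorrow will be cold and windy.
```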
- Lexicalization: To choose appropriate words from possible lexicalizations based on the communicative background. Examples: (i) buy, purchase, take, etc.; (ii) a lot of, large amounts of, etc.
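Lexical choice can be sketched as a lookup from a concept and a communicative register to a word. The lexicon and register labels below are hypothetical examples, not drawn from any real system.

```python
# Illustrative lexicalization: choose among synonymous realizations of a
# concept depending on the communicative register. The lexicon entries
# and register labels are hypothetical.

LEXICON = {
    ("ACQUIRE", "formal"): "purchase",
    ("ACQUIRE", "informal"): "buy",
    ("MANY", "formal"): "large amounts of",
    ("MANY", "informal"): "a lot of",
}

def lexicalize(concept, register):
    """Map an abstract concept to a word suited to the register."""
    return LEXICON[(concept, register)]

print(lexicalize("ACQUIRE", "formal"))   # purchase
print(lexicalize("MANY", "informal"))    # a lot of
```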
- Referring expression generation: To choose or introduce different means of reference in sentences, such as pronouns (pronominalization). There is usually more than one way to identify a specific object; for example, "Shakespeare," "the poet and playwright," "the Englishman," and "he/him" can all point to the same object. Example: "Andrew wanted to sing at the birthday party." → "He wanted to sing at the birthday party." or "The boy wanted to sing at the birthday party."
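Pronominalization can be sketched as a rule that replaces a repeated mention of the most recently mentioned entity with a pronoun. This is a deliberately naive rule for illustration; real referring expression generation must also handle ambiguity and descriptions such as "the boy."

```python
# Naive referring-expression rule: if a sentence's subject is the same
# entity as the previous sentence's subject, use a pronoun instead of
# repeating the name.

PRONOUNS = {"masculine": "He", "feminine": "She", "neuter": "It"}

def choose_references(sentences):
    """Each sentence is a tuple (entity, gender, rest_of_sentence)."""
    output, previous = [], None
    for entity, gender, rest in sentences:
        if entity == previous:
            output.append(f"{PRONOUNS[gender]} {rest}")
        else:
            output.append(f"{entity} {rest}")
        previous = entity
    return output

sents = [
    ("Andrew", "masculine", "wanted to sing at the birthday party."),
    ("Andrew", "masculine", "practised all week."),
]
print(choose_references(sents))
# ['Andrew wanted to sing at the birthday party.', 'He practised all week.']
```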
Stage 3: Surface Realization
Also known as Realization. This is to finally synthesize the text according to the text specifications made in the previous stages.
- Structure realization: To mark up the text's surface structure, such as an empty line or the boundaries between paragraphs.
- Linguistic realization: To smooth the text by inserting function words, reordering word sequences, and selecting appropriate inflections and tenses of words.
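Linguistic realization can be sketched as filling a grammatical template: inserting function words (here, the article "the") and choosing an inflection that agrees with the subject and tense. The toy rule below covers only regular English verbs and is an illustrative assumption, not a real realizer.

```python
# Toy linguistic realization: insert function words and inflect a
# regular English verb for subject agreement and tense. Real realizers
# handle irregular morphology, word order, and much more.

def inflect(verb, person, tense):
    """Inflect a regular verb (e.g., "print" -> "prints"/"printed")."""
    if tense == "past":
        return verb + "ed"
    if tense == "present" and person == "third-singular":
        return verb + "s"
    return verb

def realize_clause(subject, verb, obj,
                   person="third-singular", tense="present"):
    """Fill a simple subject-verb-object template with function words."""
    return f"The {subject} {inflect(verb, person, tense)} the {obj}."

print(realize_clause("system", "print", "report"))
# The system prints the report.
print(realize_clause("system", "print", "report", tense="past"))
# The system printed the report.
```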
Other Architectures
Although the Pipelined Architecture provides a considerably articulate routine for text generation, it also imposes predetermined restrictions on each stage of the process. Thus, the flexibility it can provide is limited; this is especially true for the sub-tasks in the micro planning and surface realization stages. For example, the need for lexical selection can arise at any stage of the process. Thus, variations of the generic architecture and other methodologies have been discussed by many researchers (for a recent discussion, see Mellish et al. [9]).
Key Applications
1. Routine documentation or information generation: examples of information are weather forecast descriptions, transportation schedules, accounting spreadsheets, expert system knowledge bases, etc. Examples of documentation are technical reports and manuals, business letters, medical records, doctor prescriptions, etc.
2. Literary writing: such as stories, poems, lyrics, couplets, etc. (Chinese couplet writer: generating a couplet sentence according to a given one; http://duilian.msra.cn).
Recommended Reading
Dale R. Introduction to the special issue on natural language generation. Comput. Linguistics, 24(3):346–353, 1998.
Goldman N.M. Computer Generation of Natural Language from a Deep Conceptual Base. Ph.D. thesis, Stanford University, CA, 1974.
Hovy E.H. Language generation, Chapter 4. In Varile G.B., Zampolli A. (eds.) Survey of the State of the Art in Human Language Technology, Cambridge University Press, Cambridge, 1997, pp. 139–163.
Hovy E.H. Natural language generation. Entry for MIT Encyclopedia of Computer Science. MIT Press, Cambridge, MA, 1998, pp. 585–588.
Hovy E.H. Language generation. Entry for Encyclopedia of Cognitive Science, article 86. McMillan, London, 2000.
McDonald D.D. Natural Language Production as a Process of Decision Making Under Constraint. Ph.D. thesis, MIT Artificial Intelligence Laboratory, Cambridge, MA, 1980.
McDonald D.D. Natural language generation, Chapter 7. In Dale R., Moisl H., Somers H. (eds.) Handbook of Natural Language Processing, Marcel Dekker, New York, NY, 2000, pp. 147–180.
McKeown K.R. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press, Cambridge, 1985.
Mellish C, et al. A reference architecture for natural language generation systems. Nat. Lang. Eng., 12(1):1–34, 2006.
Reiter E. Has a consensus NL generation architecture appeared and is it psycholinguistically plausible? In Proc. 7th Int. Conf. on Natural Language Generation, 1994, pp. 163–170.
Reiter E. and Dale R. Building Natural Language Generation Systems. Cambridge University Press, Cambridge, 2000.
© 2009 Springer Science+Business Media, LLC
Zhang, L., Sun, JT. (2009). Text Generation. In: LIU, L., ÖZSU, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_416