The Design of the UCLA Phonological Segment Inventory Database


10.1  Introduction

The discovery of generalizations concerning the content and structure of phonological inventories has been a significant objective of recent work in linguistics.

A.     Stanford Phonological Archive (SPA)

1.      Stanford’s team found that they had to limit the size of their language sample as their work progressed; a principal reason for excluding a language was the scarcity of adequately detailed phonological descriptions.

2.      For retrieval of certain kinds of information, the effective sample size is smaller than 196 languages; with each reduction, the likelihood increases that the sample is no longer representative and properly balanced.

3.      The format is inherently inflexible because it is text-oriented: each segment is entered as an alphabetic character or string of characters.

B.     UCLA Phonological Segment Inventory Database (UPSID)

1.      UPSID aims to provide uniform data from a properly balanced sample of enough languages for statistically reliable conclusions to be reached.

2.      UPSID was designed to be narrower in the scope of information entered about each language, but more comprehensive in the number of languages included.

3.      UPSID is designed to maximize the ease and flexibility with which numerical data can be manipulated.
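The contrast between a text-oriented format and numerically coded variables can be illustrated with a small Python sketch. It assumes, for illustration only, that each segment is coded with a handful of 0/1 variables; the variable names (`plosive`, `velar`, `voiced`, and so on), the helper `seg`, and the three toy languages are hypothetical and are not UPSID's actual variable set. The symbol-based query works only if every relevant symbol is listed by hand, whereas the coded query is a condition on variables.

```python
# Hypothetical toy data. Text-oriented storage: each segment is only a symbol string.
TEXT_INVENTORIES = {
    "Lang1": ["p", "b", "k", "g", "m", "ŋ"],
    "Lang2": ["p", "kʰ", "m", "n"],
    "Lang3": ["b", "d", "m", "n"],
}

# In text form, every symbol variant that counts as a voiceless velar plosive
# must be listed by hand; any variant missing here is silently missed.
VOICELESS_VELAR_SYMBOLS = {"k", "kʰ", "kʼ"}


def query_text(inventories):
    """Languages with a voiceless velar plosive, found by symbol matching."""
    return sorted(lang for lang, segs in inventories.items()
                  if any(s in VOICELESS_VELAR_SYMBOLS for s in segs))


def seg(symbol, plosive=0, nasal=0, bilabial=0, alveolar=0, velar=0,
        voiced=0, aspirated=0):
    """Build a feature-coded segment record (illustrative variable names only)."""
    return {"symbol": symbol, "plosive": plosive, "nasal": nasal,
            "bilabial": bilabial, "alveolar": alveolar, "velar": velar,
            "voiced": voiced, "aspirated": aspirated}


# Feature-coded storage: each segment carries numeric 0/1 variables.
CODED_INVENTORIES = {
    "Lang1": [seg("p", plosive=1, bilabial=1),
              seg("b", plosive=1, bilabial=1, voiced=1),
              seg("k", plosive=1, velar=1),
              seg("g", plosive=1, velar=1, voiced=1),
              seg("m", nasal=1, bilabial=1, voiced=1),
              seg("ŋ", nasal=1, velar=1, voiced=1)],
    "Lang2": [seg("p", plosive=1, bilabial=1),
              seg("kʰ", plosive=1, velar=1, aspirated=1),
              seg("m", nasal=1, bilabial=1, voiced=1),
              seg("n", nasal=1, alveolar=1, voiced=1)],
    "Lang3": [seg("b", plosive=1, bilabial=1, voiced=1),
              seg("d", plosive=1, alveolar=1, voiced=1),
              seg("m", nasal=1, bilabial=1, voiced=1),
              seg("n", nasal=1, alveolar=1, voiced=1)],
}


def query_coded(inventories):
    """Same query as a condition on variables, independent of symbol spelling."""
    return sorted(lang for lang, segs in inventories.items()
                  if any(s["plosive"] and s["velar"] and not s["voiced"]
                         for s in segs))


print(query_text(TEXT_INVENTORIES))    # ['Lang1', 'Lang2']
print(query_coded(CODED_INVENTORIES))  # ['Lang1', 'Lang2']
```

Both queries agree on this toy data, but adding a segment such as `kʷ` to a text inventory would also require updating `VOICELESS_VELAR_SYMBOLS`, while the coded query finds it as long as its variables are set.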

10.2  Selection of Languages for UPSID

A.     The ideal sample for purposes of statistical evaluation is a random sample.

B.     Sampling procedure:

1.      The principle on which UPSID is based is to select one and only one language from each moderately distant genetic grouping.

2.      The advantages:

a.       It prevents the selection of data that represent what is arguably the same language in several varieties.

b.      It directs a principled search for the data needed to fill the quotas of the design.

c.       It avoids undue reliance on descriptions that happen to be at hand.

3.      The procedure has been to assemble the most comprehensive and accurate genetic classifications available.

4.      By synthesizing several classifications, the procedure has been to produce an overall classification for each of 11 major groupings of languages plus several smaller groups.
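The one-language-per-grouping quota can be sketched in Python. The groupings, language names, and the numeric description-quality score are hypothetical stand-ins: the score represents the judgment about which available description is adequate, which in practice was not a mechanical calculation. Groupings with no usable description are reported separately, reflecting how the quota design directs a search for further data.

```python
from collections import defaultdict

# Hypothetical toy data: (language, genetic grouping, quality score of the best
# available phonological description; 0 means no usable description).
CANDIDATES = [
    ("Lang1", "Group A", 3),
    ("Lang2", "Group A", 5),   # a second member of Group A
    ("Lang3", "Group B", 4),
    ("Lang4", "Group C", 0),   # no adequate description yet
]


def quota_sample(candidates):
    """Select one and only one language per grouping.

    Returns (selected, unfilled): `selected` maps grouping -> language, and
    `unfilled` lists groupings with no adequately described member, i.e.
    where a further search for data is needed.
    """
    by_group = defaultdict(list)
    for lang, group, quality in candidates:
        by_group[group].append((quality, lang))

    selected, unfilled = {}, []
    for group in sorted(by_group):
        usable = [(q, lang) for q, lang in by_group[group] if q > 0]
        if usable:
            # Highest score first; ties go to the alphabetically first name.
            best = sorted(usable, key=lambda t: (-t[0], t[1]))[0]
            selected[group] = best[1]
        else:
            unfilled.append(group)
    return selected, unfilled


selected, unfilled = quota_sample(CANDIDATES)
print(selected)  # {'Group A': 'Lang2', 'Group B': 'Lang3'}
print(unfilled)  # ['Group C']
```

Because each grouping contributes at most one language, two varieties of the same language can never both enter the sample.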

10.3  Determining the Inventories

A.       Determining the phonological inventory for each language involves two principal aspects:

1.      How many contrastive units are there?

“Contrasts” are sound differences capable of distinguishing lexemes or morphemes in the language involved.

2.      What phonetic characteristics should be attributed to each one?
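As a simplified illustration of the definition of contrast, the Python sketch below finds minimal pairs in a toy lexicon: words of the same length with different meanings that differ in exactly one segment. The lexicon and glosses are invented, and minimal pairs are only one kind of evidence; a real analysis also weighs distribution, near-minimal pairs, and morphological alternations.

```python
from itertools import combinations

# Hypothetical toy lexicon: each word is a tuple of segments plus a gloss.
LEXICON = [
    (("p", "a", "t"), "stick"),
    (("b", "a", "t"), "bird"),
    (("p", "i", "t"), "hole"),
    (("m", "a", "t"), "eye"),
]


def minimal_pairs(lexicon):
    """Return (differing segments, word1, word2) for each minimal pair."""
    found = []
    for (w1, g1), (w2, g2) in combinations(lexicon, 2):
        # Only words of equal length with different meanings can be minimal pairs.
        if len(w1) != len(w2) or g1 == g2:
            continue
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1:
            found.append((diffs[0], "".join(w1), "".join(w2)))
    return found


def contrasting_pairs(lexicon):
    """Unordered segment pairs attested as contrastive by some minimal pair."""
    return sorted({tuple(sorted(d)) for d, _, _ in minimal_pairs(lexicon)})


print(contrasting_pairs(LEXICON))  # [('a', 'i'), ('b', 'm'), ('b', 'p'), ('m', 'p')]
```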

B.        Analysis of phonetic diphthongs

C.       Each segment that is judged to deserve inclusion in the inventory is represented by a phonetic specification.

D.       The set of variables is designed so that interpreting their values requires a minimum of appeal to redundancy, and so that it can accommodate some of the major indeterminacies found in the phonological sources.

10.4  Indices and Variables


10.5  Using UPSID

A.     UPSID consists of 9,957 records forming a file for SAS (Statistical Analysis System), a powerful and flexible system for data manipulation and statistical analysis.

B.     Example: using the data to check the hypothesis that “No nasal consonant appears in a language unless a stop occurs at the same place of articulation.”

1.        Define more precisely what is intended by the terms “stop” and “nasal”.

2.        Select the data according to the specifications.

3.        Restate the hypothesis.

4.        Sum the totals for each variable on a language-by-language basis.

5.        Add the summed value for each original place variable and the corresponding nasal place variable for each language.

6.        Produce an index value of 11 (1 nasal, 1 affricate), with no exceptions to the hypothesis.
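The steps above can be sketched in Python under simplifying assumptions. The records, the field names (`language`, `manner`, `place`), and the `affricates_count_as_stops` flag are hypothetical; the flag stands in for the kind of definitional choice made in steps 1 and 3, not for the restatement actually used with UPSID. Instead of a numeric index, this version tallies stops and nasals per language and place and lists each place where a nasal occurs without a matching stop.

```python
from collections import defaultdict

# Hypothetical toy records, one per segment, loosely mimicking a flat
# segment-per-record file. Field names and values are illustrative only.
RECORDS = [
    {"language": "Lang1", "manner": "plosive",   "place": "bilabial"},
    {"language": "Lang1", "manner": "nasal",     "place": "bilabial"},
    {"language": "Lang1", "manner": "plosive",   "place": "velar"},
    {"language": "Lang1", "manner": "nasal",     "place": "velar"},
    {"language": "Lang2", "manner": "plosive",   "place": "bilabial"},
    {"language": "Lang2", "manner": "nasal",     "place": "bilabial"},
    {"language": "Lang2", "manner": "affricate", "place": "palatal"},
    {"language": "Lang2", "manner": "nasal",     "place": "palatal"},
]


def find_exceptions(records, affricates_count_as_stops=False):
    """Return sorted (language, place) pairs with a nasal but no stop there."""
    # Steps 1 and 3: define which manners count as "stop" for this hypothesis.
    stop_manners = {"plosive", "affricate"} if affricates_count_as_stops else {"plosive"}

    # Steps 2 and 4: select the relevant records and tally them per language and place.
    stops = defaultdict(int)
    nasals = defaultdict(int)
    for r in records:
        key = (r["language"], r["place"])
        if r["manner"] in stop_manners:
            stops[key] += 1
        elif r["manner"] == "nasal":
            nasals[key] += 1

    # Step 5: compare the stop and nasal tallies at each place; a nasal with
    # no matching stop is an exception to the hypothesis.
    return sorted(key for key in nasals if stops[key] == 0)


print(find_exceptions(RECORDS))                                  # [('Lang2', 'palatal')]
print(find_exceptions(RECORDS, affricates_count_as_stops=True))  # []
```

On this toy data the narrow definition flags Lang2's palatal nasal, while counting affricates as stops leaves no exceptions.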