SAL Noun Code Taxonomy

General Explanation of the SAL Noun Code Taxonomy

Nouns are organized in a taxonomy of Supersets, sets, and subsets.

Nouns have twelve Supersets:

  • concrete

  • mass

  • animate

  • place

  • information

  • abstract

  • process (intr)

  • process (tr)

  • measure

  • time

  • aspective

  • unknown

All noun Supersets have sets, but only some sets have subsets.

In the following, maroon denotes noun Superset, red denotes noun set, and blue denotes noun subset.

Mnemonics for each SAL element are provided for coders and rulewriters. Internal to the system, however, SAL codes are represented numerically. For nouns, the numeric range of a code indicates its level in the taxonomy, as follows:

1-12 = Supersets

17-99 = sets

100-998 = subsets
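
The ranges above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `sal_noun_level` is hypothetical and not part of the Logos System, and treating values outside the three listed ranges (for example 13-16) as errors is an assumption.

```python
def sal_noun_level(code):
    """Return the taxonomy level implied by a numeric SAL noun code."""
    if 1 <= code <= 12:
        return "superset"
    if 17 <= code <= 99:
        return "set"
    if 100 <= code <= 998:
        return "subset"
    # Assumption: values in the gaps between ranges are not valid noun codes.
    raise ValueError(f"{code} is outside the documented SAL noun ranges")


# The triple 6 41 748 (the purpose subset of ABSTRACT in the hierarchy table)
# reads as superset, set, subset.
levels = [sal_noun_level(n) for n in (6, 41, 748)]
```

Here `levels` holds one level name per number, which gives a quick sanity check on a hand-entered triple.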

Guidelines to SAL Coders

Nouns with Multiple Meanings

Many nouns fall into more than one SAL category. For example, passage can be both a conduit under Concrete, and a path under Place. It can also be a piece of writing or musical composition under Information.

Selection among the multiple meanings of a given noun can often be effected by the use of Subject Matter Codes (SMC) when entering the term in TermBuilder.

In some cases, however, Subject Matter Codes are not helpful. The user must then make an arbitrary choice among SAL codes at the time the word is entered. (Planned development includes giving the system the ability to choose among the multiple meanings of a common noun on the basis of extra-sentential context. This capability does not presently exist in the Logos System.)

When making coding decisions, users should observe the coding priorities listed below.

Noun Coding Priorities

Coding choices for nouns are governed by a critical set of priorities that must be observed if translation degradation is to be avoided. The coding hierarchy, in order of importance, is as follows:

  • Verb-biased Nouns (See verbal abstracts set under Abstract Superset). Nouns coded for verb bias tell the system to expect a verb complement.

    Verb-biased codes are critical for parsing. For example:

    1. ways of cooking lentils

    2. types of cooking utensils.

    The verbal abstracts code given to ways in (1) biases the parser to expect a verb and therefore allows the parser to resolve cooking correctly to a verb. In (2) cooking is an adjective.

  • Nouns taking prepositional complementation. (See strong verbals under Abstract Superset.) For example:

    • attitude towards

    • interest in

    • anxiety about

    • phone connection to

    • attention to

    Prep governance codes are critical for parsing decisions regarding prepositional attachment.

  • Mass Nouns. Unlike count nouns, mass nouns can occur in the singular without an article or quantifier; e.g., Gold is expensive.

    Mass codes are critical to parsing. For example:

    1. Test gold for …

    2. … test tube for …

    In (1), gold as a Mass noun helps the parser to see test as a verb. (Unlike count nouns, singular mass nouns without an article can be the object of a verb.) In (2), test must be a noun because tube is a singular count noun.

    Mass-like codes occur in various places in the SAL noun taxonomy. These include:

    • Mass Superset, which is mass by definition

    • trees/wood subset (e.g. oak) under Concrete Superset

    • edibles/color subset (e.g. orange) under Concrete Superset

    • mammals/food/fur subset (e.g. fox) under Animate Superset

    • fowl/food subset (e.g. duck) under Animate Superset

    • remote mass subset

  • Nouns denoting agents. Agentive type nouns occur in various places in the SAL noun taxonomy. These include:

    • Animate Superset, which is agentive by definition

    • agentive set under Concrete Superset

    • functional location (agentive) subset under Place Superset

    • geographical entities (agentive) subset under Place Superset

    • remote agentive subset (an optional subset code under any set or superset)
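
The mass-noun reasoning in the priorities above (Test gold versus test tube) can be sketched as a toy rule. This is not how the Logos parser is implemented. The two word sets and the function name `reading_of_test` are hypothetical, introduced only to illustrate why a mass code on the following noun matters.

```python
# Toy lexicon for illustration only; these are not actual SAL codes.
MASS_NOUNS = {"gold"}
COUNT_NOUNS = {"tube"}


def reading_of_test(next_word):
    """Guess the reading of 'test' when a bare singular noun follows it."""
    if next_word in MASS_NOUNS:
        # A singular mass noun needs no article, so it can be the object of a verb.
        return "verb"
    if next_word in COUNT_NOUNS:
        # A bare singular count noun cannot be an object, so 'test' must be a noun.
        return "noun"
    # Without a mass or count code the sketch cannot decide.
    return "undecided"


gold_reading = reading_of_test("gold")
tube_reading = reading_of_test("tube")
```

With this lexicon, `gold_reading` is `"verb"` and `tube_reading` is `"noun"`, matching examples (1) and (2) under Mass Nouns.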

SAL Noun Code Hierarchy

For nouns and noun phrases that can take more than one code, assign the code that is highest in the following hierarchy.

Note that Process Nouns (WC 4 and 7) are not included here. Process Noun codes are derived automatically from their verbs. (Process Noun codes are preemptive.)

| Characteristic | Applicable SAL Type | Mnemonic | Numeric (SS Set Subset) |
|---|---|---|---|
| Takes Verbal Complementation | purpose subset of ABSTRACT | ABpur | 6 41 748 |
| | method/process/procedure subset of ABSTRACT | ABmeth | 6 41 733 |
| | cause/potential/disposition subset of ABSTRACT | ABcause | 6 41 602 |
| Mass (non-count) Noun | entire MASS noun Superset | MASS | 11 |
| | trees/wood subset of CONCRETE Superset | COtrwd | 3 32 855 |
| | edibles/color subset of CONCRETE Superset | COedcol | 3 18 855 |
| | remote MASS (floating subset) | (variable) | 855 |
| Takes Prepositional Complementation | strong verbals subset of ABSTRACT (code is specific for each prep governance) | ABxxx | 6 nn 749 |
| | recorded data subset of INFORMATION | INdata | 12 76 |
| Denotes Agent | entire ANIMATE Superset | AN | 5 |
| | entire agentive set of CONCRETE Superset | COagen | 3 35 |
| | agentive geographical entity set of PLACE Superset | PLaggeo | 9 94 |
| | instructional data set of INFORMATION | | 12 74 |
| | agentive functional location of PLACE Superset | PLagfunc | 9 26 228 |
| | remote agentive (floating subset) | (variable) | 228 |

All other SAL noun codes are more or less of equal weight.
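
Choosing among several candidate codes by this hierarchy can be sketched as follows. This is a minimal illustration, not the Logos System's own procedure. It assumes that the row order of the table is the ranking, and it leaves out the rows for which the table gives no fixed mnemonic (the two floating subsets and the instructional data set). The names `HIERARCHY`, `RANK`, and `pick_code` are hypothetical.

```python
# Mnemonics in the row order of the hierarchy table, highest priority first.
HIERARCHY = [
    "ABpur", "ABmeth", "ABcause",            # takes verbal complementation
    "MASS", "COtrwd", "COedcol",             # mass (non-count) noun
    "ABxxx", "INdata",                       # takes prepositional complementation
    "AN", "COagen", "PLaggeo", "PLagfunc",   # denotes agent
]
RANK = {mnemonic: position for position, mnemonic in enumerate(HIERARCHY)}


def pick_code(candidates):
    """Return the candidate mnemonic that stands highest in the hierarchy."""
    if not candidates:
        raise ValueError("at least one candidate code is required")
    # Codes outside the hierarchy share the lowest rank ("equal weight");
    # min() keeps the first of any tied candidates.
    return min(candidates, key=lambda m: RANK.get(m, len(HIERARCHY)))


choice = pick_code(["COagen", "MASS"])
```

Here `choice` is `"MASS"`, because the mass rows precede the agent rows in the table. When no candidate appears in the hierarchy, the first candidate is returned, which mirrors the arbitrary choice left to the coder.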

A Caveat to SAL Coders

The organization of nouns into a small number of sub-classifications is inevitably somewhat arbitrary and may at times even seem unprincipled.

For example, the Logos System codes table as a supporting surface under the Concrete Superset but platform as a Place noun, on the grounds that the latter has human scale. Yet words like wall and fence are coded Concrete rather than Place despite their human scale.

There is no real defense of this except to repeat that any taxonomy that reduces 100,000 nouns to 100 categories is bound to incur these inconsistencies.

As one becomes familiar with SAL, idiosyncrasies such as these become less troublesome. It is only fair to say that natural language itself is riddled with unprincipled inconsistencies.