
=Syntactic annotation=

The ENHG corpus has endeavored to follow the guidelines for the Penn Parsed Corpora of English as closely as possible. However, some changes have been made. The most general points are discussed below; discussions on the treatment of individual words and constructions can be found here.

Contents:

 * **Clause structure**
 * **Comparatives**
 * **Noun phrases and case**
 * **Verbs that take two accusatives**
 * **Impersonal experiencers**
 * **A list of verbal particles**

 =Clause structure=

Matrix clauses with inversion, including apparent verb-second (V2) clauses, are treated as IP-MAT and not CP, just as in the Penn Parsed Corpora of English and in the IcePaHC corpus. Similarly, subordinate clauses with apparent verb-second (or verb-first) orders are treated as normal IP-SUB complements of a CP.

This is done not only to keep the corpora parallel and to ease parsing in general; it is also needed to facilitate parsing of ambiguous structures. For example, many subordinate clauses may appear to exhibit embedded verb-second structures because of phrasal extraposition. Similarly, a strict verb-second annotation of matrix clauses might force an annotator to analyze any phrase violating V2 order as left-dislocated, obscuring possible verb-third orders in ENHG main clauses.



=Comparatives=

The ENHG corpus only has clausal comparatives. Because apparent NP complements of the subordinating prepositions **WIE**, **ALS** and **DENN** are nominative, it is assumed that the complement is in fact a full clause with most material elided. Elided material is not added in. These three are the most common prepositions that introduce comparatives.

Although comparatives are usually embedded under an ADVP, ADJP, etc., clause-level PPs headed by ALS may also introduce a CP-CMP. In this case, ALS has a meaning like "in the way that," and the comparative clause has an adverbial gap.

 =Noun phrases and case=


**Case is marked only on the phrase level, and never on the word level. Only NPs are marked for case.** In general, NPs with superficially ambiguous case are tagged according to principles of case assignment in Modern German. General guidelines for marking NPs with ambiguous case will be noted below, while any exceptions will be noted in the guidelines for individual words.

Rather than function dash tags (e.g. NP-SBJ, -OB1, -OB2, etc.), **all NPs are tagged for case**. The dash tags for case are as follows: -NOM, -ACC, -DAT and -GEN. Note that these are genuine case tags, and may not directly map onto a grammatical function; for example, not all NPs tagged -GEN are possessives. Case tags are appended not only to NPs at the clause level, but also to NP complements of PP, NP, and so on.

Additional dash tags may be included to clarify the functions of exceptional NPs:
 * **Adverbial, temporal and directional NPs** are tagged e.g. NP-ACC-ADV, NP-ACC-TMP, and NP-ACC-DIR. When case is ambiguous, the default is to tag these NPs as accusative. However, note that certain temporal NPs (for example **DES ABENDS** or **EINES TAGES**) are genitive.
 * **Left-dislocated NPs** are tagged e.g. NP-NOM-LFD. Left-dislocated free relatives, which have no overt material to indicate case, are by default tagged as nominative (this is also the default for other left-dislocated NPs with ambiguous case).
 * **Measure phrases** are tagged e.g. NP-ACC-MSR. When case is ambiguous, the default is to tag measure phrases as accusative.
 * **Nominative predicates** are tagged NP-NOM-PRD.
 * **Non-nominative subjects** are tagged e.g. NP-ACC-SBJ.
 * **Parenthetical NPs** include the case tag (e.g. NP-ACC-PRN).
 * **Reflexive NPs** are tagged e.g. NP-ACC-RFL.
 * **Vocative NPs** are tagged e.g. NP-NOM-VOC. Vocatives with ambiguous case are tagged as nominative by default.

Other distinctions in the functions of NPs are not currently annotated; for example, possessive genitives are not distinguished from other genitive NPs (as is done, for example, in the IcePaHC corpus).

When NPs are **conjoined**, all of the conjoined NPs include a dash tag indicating case; however, the **NX** tag used in some conjunctions does not bear any case or function tags. Any additional function dash tags should appear only on the **highest** NP of the structure.

At the moment, there is one **exception** to this rule: **WNPs** never bear case or function tags. This decision hinges on the assumption that WNPs are coindexed with a trace that bears any case or function tags they lack; however, it leaves certain WNPs (for example, complements of WPPs) unmarked for case. This convention will be adjusted in future revisions.
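
The labelling rules above lend themselves to a mechanical check. The following is only a minimal sketch in Python, not a tool that ships with the corpus: the function name `check_label`, the handling of a trailing numeric index, and the error messages are all assumptions made for illustration. It tests three points stated above: every NP carries exactly one case tag, NX and WNP carry neither case nor function tags, and no other category is marked for case.

```python
import re

# Case and function dash tags named in the guidelines above.
CASE_TAGS = {"NOM", "ACC", "DAT", "GEN"}
FUNCTION_TAGS = {"ADV", "TMP", "DIR", "LFD", "MSR", "PRD", "SBJ", "PRN", "RFL", "VOC"}


def check_label(label):
    """Return a list of problems found in a phrase label (empty if none)."""
    # Assumption: a coindexation number, if present, is a trailing "-<digits>".
    parts = re.sub(r"-\d+$", "", label).split("-")
    category, tags = parts[0], parts[1:]
    cases = [t for t in tags if t in CASE_TAGS]
    functions = [t for t in tags if t in FUNCTION_TAGS]

    problems = []
    if category == "NP":
        if len(cases) != 1:
            problems.append("NP needs exactly one case tag")
    elif category in {"NX", "WNP"}:
        if cases or functions:
            problems.append(f"{category} bears no case or function tags")
    elif cases:
        problems.append("only NPs are marked for case")
    return problems


if __name__ == "__main__":
    for label in ["NP-ACC-TMP", "NP-SBJ", "WNP-NOM", "NX", "NP-NOM-LFD-1"]:
        print(label, check_label(label))
```

Dash tags that are not mentioned on this page are simply ignored by the sketch, since the corpus may use further tags inherited from the Penn guidelines.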


**Possessive pronouns** - In the ENHG corpus, possessive pronouns do not project an NP (as they do in the IcePaHC). Instead, as in the Penn Parsed Corpora of English, they are tagged PRO$. However, occasionally a possessive pronoun may conjoin with a genitive NP. In this case, the possessive pronoun may project an NP-GEN to facilitate conjunction.



=Verbs that take two accusatives=

In Modern German, a few verbs take two accusative objects. Duden lists the following:
 * LEHREN
 * KOSTEN
 * ABFRAGEN
 * ABHÖREN

Additionally, the ENHG corpus contains examples in which the following verbs take two accusatives:
 * FRAGEN
 * BITTEN

The verbs listed above are the only exceptions to the generalization that every clause should have only one accusative object (accusative measure phrases, subjects of small clauses, and so on are not included in this generalization).
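
The one-accusative generalization can be turned into a simple consistency check. The Python sketch below is illustrative only and rests on several assumptions: a clause is represented by a verb lemma plus the labels of its daughters, an "accusative object" is taken to be a daughter labelled exactly NP-ACC (so NP-ACC-MSR, NP-ACC-TMP and the like are not counted), and the helper names are hypothetical. Matching ENHG spellings to the modern lemmas in the list would require lemmatization, which is not shown.

```python
import re

# Verbs listed above as taking two accusative objects (modern spellings).
DOUBLE_ACC_VERBS = {"LEHREN", "KOSTEN", "ABFRAGEN", "ABHÖREN", "FRAGEN", "BITTEN"}


def count_bare_accusatives(daughter_labels):
    """Count clause daughters labelled NP-ACC with no extra function tag."""
    count = 0
    for label in daughter_labels:
        # Assumption: a coindexation number is a trailing "-<digits>".
        if re.sub(r"-\d+$", "", label) == "NP-ACC":
            count += 1
    return count


def violates_one_accusative(verb_lemma, daughter_labels):
    """True if a clause has several bare NP-ACC daughters and a non-exempt verb."""
    if verb_lemma.upper() in DOUBLE_ACC_VERBS:
        return False
    return count_bare_accusatives(daughter_labels) > 1


if __name__ == "__main__":
    # SEHEN is a hypothetical example lemma, not taken from the corpus.
    print(violates_one_accusative("SEHEN", ["NP-NOM", "NP-ACC", "NP-ACC"]))
    print(violates_one_accusative("LEHREN", ["NP-NOM", "NP-ACC", "NP-ACC"]))
```

A clause flagged by such a check is not necessarily an error; it is a candidate for review, for instance because one of the accusatives should carry a function tag.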



=Impersonal experiencers=

In sentences with impersonal experiencers, as in MICH HUNGERT, the NP is NOT tagged NP-ACC-SBJ, but merely as NP-ACC, due to evidence that impersonal experiencers in German do not have subjectlike properties. As a result, these sentences are essentially treated as subjectless in the corpus.

 =A list of verbal particles=

The following appear as verbal particles in the corpus. Note that many of these words may also have other grammatical functions in different contexts.
 * AB/ABE
 * AN
 * AUFF
 * AUẞ/AUS
 * DAHER
 * DAHYN
 * DAR
 * DURCH
 * EMPOR
 * ENTGEGEN
 * ENTZWEY
 * ERAB
 * ERAUFF
 * ERAUẞ/ERAUS
 * ERBEY
 * ERFUR
 * ERNYDDER
 * EYN
 * FORT/FURT
 * FUR
 * FURVBER
 * HEIM/HEYM
 * HER
 * HERAB
 * HERVBER
 * HYN/HYNN
 * HYNAB/HYNABE
 * HYNAUFF
 * HYNAUẞ/HYNAUS/HYNNAUS
 * HYNDURCH
 * HYNEYN/HYNNEYN
 * HYNVBER
 * HYNVNTER
 * HYNWEG
 * KUND
 * LIEB
 * LOẞ/LOS
 * MIT
 * NACH
 * NYDER/NYDDER
 * VBIR
 * VM/VMB
 * VMBHER
 * WEG
 * WIDER
 * WUNDER
 * ZU/TZU
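
Because several particles appear in more than one spelling, a lookup table can be handy when searching the corpus. The Python sketch below builds one from the list above. Treating the first spelling in each entry as the canonical form is purely an assumption of this example, as are the names `build_variant_map` and `normalize_particle`. A match only shows that a word form can be a particle; as noted above, the same form may have other grammatical functions.

```python
# Entries copied from the list above; "/" separates spelling variants.
PARTICLE_ENTRIES = """
AB/ABE AN AUFF AUẞ/AUS DAHER DAHYN DAR DURCH EMPOR ENTGEGEN ENTZWEY ERAB
ERAUFF ERAUẞ/ERAUS ERBEY ERFUR ERNYDDER EYN FORT/FURT FUR FURVBER HEIM/HEYM
HER HERAB HERVBER HYN/HYNN HYNAB/HYNABE HYNAUFF HYNAUẞ/HYNAUS/HYNNAUS
HYNDURCH HYNEYN/HYNNEYN HYNVBER HYNVNTER HYNWEG KUND LIEB LOẞ/LOS MIT NACH
NYDER/NYDDER VBIR VM/VMB VMBHER WEG WIDER WUNDER ZU/TZU
""".split()


def build_variant_map(entries):
    """Map every spelling variant to the first variant of its entry."""
    variant_map = {}
    for entry in entries:
        variants = entry.split("/")
        for variant in variants:
            variant_map[variant] = variants[0]
    return variant_map


PARTICLES = build_variant_map(PARTICLE_ENTRIES)


def normalize_particle(token):
    """Return the canonical spelling of a particle, or None if not listed."""
    # str.upper() turns "ß" into "SS", so map it to capital "ẞ" first.
    return PARTICLES.get(token.replace("ß", "ẞ").upper())


if __name__ == "__main__":
    for token in ["tzu", "hynnaus", "auß", "haus"]:
        print(token, normalize_particle(token))
```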