
3.3 Mapped Semantic Role Labels (MSRL)

3.3.1 Semantic Role Labeling

The task. Semantic role labeling (SRL) is a well-established text processing task in which the goal is to mark up text with a predefined set of frames and frame elements, also called roles. A frame is defined [63] as any system of concepts (roles) related in such a way that to understand any one concept it is necessary to understand the entire system.

Examples of frames are Addiction, Annoyance, Attack, Drinking etc. The latter, for instance, consists of roles Drinker, Fluid, Quantity, Container and perhaps others. There are also some roles that can be included in any frame, e.g. Location, Time, Frequency, Purpose and Manner. Not every occurrence of a frame in natural text needs to fill all the roles; for example, the sentence "[DRINKER Paul] took a [TARGET sip] of [FLUID red wine] from [CONTAINER the tall glass] and nodded approvingly." omits the Quantity role as well as all target-nonspecific roles. Note that this and other examples represent the ideal, human-produced labels, which can be very hard for algorithms to reproduce because of rich grammar or metaphors ("sip of wine").

The previous sentence also illustrates the standard bracket notation for marking up frames in natural text: everything contained in square brackets is a role filler, i.e. a text fragment filling a specific frame role, which in turn is given in subscript in all caps. The special "[TARGET ...]" role is filled by the word that evokes/triggers the frame.

The target role of a frame is not necessarily filled by a verb; take for example the following BiologicalUrge frame: "[EXPERIENCER He] gave me a [TARGET tired] [EXPRESSOR shrug]."

The three stages of SRL. The process of automatic SRL decomposes naturally into three stages: frame identification ("which frame is evoked by the sentence?"), boundary detection ("which sentence fragments are role fillers?") and role identification ("what roles do the role fillers fill?"). Although these problems can be solved jointly, it is easier and computationally much more efficient to approach them separately. This does not affect performance: it is intuitively clear that syntactic context should suffice for frame identification, and, somewhat surprisingly, performing boundary detection and role identification jointly does not bring significant gains either [25, 94].

Our method thus performs each of the three stages separately as well.

Stage 1. For the frame identification task, we use a simple recall-oriented approach. First, we make the standard assumption that frames do not extend over more than one sentence. We then consider the lemmatized version of every word w in a sentence s. If, for any frame f, the lemma w occurs in f's list of trigger words, we consider s to contain f. Some of these decisions are revoked at the later stages if no convincing role fillers are identified for f in s.
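A minimal sketch of this lookup is given below; the trigger_lemmas mapping (frame name to its set of trigger-word lemmas) and the lemmatize helper are hypothetical stand-ins for FrameNet's trigger lists and the lemmatizer used in our pipeline.

```python
# Sketch of Stage 1: recall-oriented frame identification by trigger-lemma lookup.
# `trigger_lemmas` (frame name -> set of trigger lemmas) and `lemmatize` are
# hypothetical stand-ins for FrameNet's trigger lists and our lemmatizer.

def identify_frames(sentence_tokens, trigger_lemmas, lemmatize):
    """Return the set of frames potentially evoked by a single sentence."""
    lemmas = {lemmatize(w) for w in sentence_tokens}
    return {frame for frame, triggers in trigger_lemmas.items() if lemmas & triggers}

# Example: for "Paul took a sip of red wine ...", the lemma "sip" matches a
# trigger of the Drinking frame, so Drinking is among the returned candidates;
# the decision may later be revoked if no convincing role fillers are found.
```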

Stages 2 and 3. For role boundary detection, we first perform full constituency parsing of sentences using Charniak’s parser [95]. We then treat both remaining stages of SRL as classification tasks over the nodes of the parse tree.

Based on recommendations in existing work, we derive the following features for every node (a sketch of the feature extraction follows the list):

• Lemma of the target word
• Phrase type (= Penn Treebank tag of the node)
• Governing category (= parent node's tag; helps distinguish subjects from objects)
• Path from target to node
• Position relative to target (left/right)
• Passive/active voice of the sentence. A sentence is considered passive if its tree contains a path of the form AUX^VP_VP_VBN.
• Lemma of the node's lexical head word. The head word is derived using the widely adopted rules developed by Collins [96].
• POS tag of the node's head word.
• Verb subcategorization, i.e. the ordered list of children of the VP immediately containing the node.
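The extraction of these features can be sketched as follows; the ParseNode interface (.label, .parent, .children, .span, .lemma, .pos) and the path_to and head_word helpers are hypothetical, standing in for the Charniak parse trees, the tree-path feature and the Collins head rules described above.

```python
# Sketch of per-node feature extraction over a constituency parse.
# ParseNode (with .label, .parent, .children, .span = (start, end) token
# offsets, .lemma, .pos) and the helpers path_to() and head_word() are
# hypothetical stand-ins for the parse representation, the tree-path feature
# and the Collins head rules; sentence_is_passive is the precomputed voice test.

def node_features(node, target, sentence_is_passive, path_to, head_word):
    # closest VP ancestor, used for the verb-subcategorization feature
    vp = node.parent
    while vp is not None and vp.label != "VP":
        vp = vp.parent
    head = head_word(node)
    return {
        "target_lemma": target.lemma,                       # lemma of the target word
        "phrase_type": node.label,                          # Penn Treebank tag of the node
        "governing_category": node.parent.label if node.parent else "ROOT",
        "path": path_to(target, node),                      # path from target to node
        "position": "left" if node.span[1] <= target.span[0] else "right",
        "voice": "passive" if sentence_is_passive else "active",
        "head_lemma": head.lemma,                           # lexical head (Collins rules)
        "head_pos": head.pos,
        "subcat": tuple(c.label for c in vp.children) if vp else (),
    }
```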

It has been shown that the choice of the classifier is not of critical importance; however, support vector machines (SVMs) are one of the most appropriate choices [97, 30]. We use a linear SVM with C = avg(‖x‖²)⁻¹, as implemented in the svmlight toolset (http://svmlight.joachims.org); the parameters are the defaults recommended by svmlight.
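For concreteness, this default can be reproduced as below; the snippet uses scikit-learn's LinearSVC purely as a stand-in for svmlight, with X assumed to be a NumPy feature matrix.

```python
import numpy as np
from sklearn.svm import LinearSVC

# svmlight's default C is the reciprocal of the average squared norm of the
# training vectors, i.e. C = avg(||x||^2)^-1; LinearSVC stands in for svmlight.

def default_C(X):
    return 1.0 / np.mean(np.sum(np.asarray(X) ** 2, axis=1))

def train_node_classifier(X, y):
    # X: (n_nodes, n_features) feature matrix, y: "role" / "none" labels
    return LinearSVC(C=default_C(X)).fit(X, y)
```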

For stage 2 (role boundary detection), we use the above features and train a classifier on FrameNet's annotated data to classify parse tree nodes as either role or none. We then discard all nodes which are classified as none with high confidence. The threshold was set so that, on a held-out set, the discarding process was estimated to retain 95% of the true role nodes. This significantly speeds up the role identification step and, equally importantly, greatly reduces class imbalance in the remaining data.
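The calibration of this threshold can be sketched as follows; none_scores (the classifier's confidences for the none class on held-out nodes) and is_role (which of those nodes are true role fillers) are hypothetical names.

```python
import numpy as np

# Sketch of the Stage-2 pruning threshold: choose the cutoff on the "none"
# confidence so that roughly 95% of the true role nodes on a held-out set
# survive the pruning.

def pruning_threshold(none_scores, is_role, target_recall=0.95):
    role_scores = np.asarray(none_scores)[np.asarray(is_role, dtype=bool)]
    # a role node is kept iff its "none" confidence does not exceed the threshold
    return np.quantile(role_scores, target_recall)

def prune(nodes, none_scores, threshold):
    return [n for n, s in zip(nodes, none_scores) if s <= threshold]
```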

In stage 3 (role identification), we classify all the nodes remaining after the boundary detection stage into one of multiple classes: all the roles belonging to the frame, plus noRole. There is no clear consensus in the community on the best way to perform multi-class classification in this case, so we follow the recommendation by Hacioglu [98] and use one-vs-all rather than pairwise classifiers or a multi-class SVM.
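A sketch of the per-frame role classifiers, again with scikit-learn's one-vs-all wrapper around a linear SVM standing in for the svmlight setup; decision_function supplies the per-role confidences that serve as votes below.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Sketch of Stage 3 for one frame: one-vs-all linear SVMs over the nodes that
# survived boundary detection.  y_train contains role names plus "noRole".

def train_role_classifiers(X_train, y_train, C):
    return OneVsRestClassifier(LinearSVC(C=C)).fit(X_train, y_train)

def collect_votes(clf, nodes, X_nodes):
    scores = clf.decision_function(X_nodes)          # shape: (n_nodes, n_roles)
    return [(node, role, scores[i, j])
            for i, node in enumerate(nodes)
            for j, role in enumerate(clf.classes_)]
```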

When combining the votes, we operate under two classes of constraints: the soft, local, per-node constraints suggest that each node should be assigned the class voted for with the highest confidence. Global constraints require that a role appear only once in a frame and that role fillers be strictly disjoint. We therefore employ a constrained greedy algorithm to assign roles. Votes for all nodes and all classes are sorted in descending order of confidence. They are then greedily assigned one by one; if an assignment would violate either of the two aforementioned global constraints, we discard the vote.
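The constrained greedy assignment can be sketched as follows, with votes as (node, role, confidence) triples and node.span giving the token range of the parse-tree node (a hypothetical representation).

```python
# Sketch of the constrained greedy role assignment.  Votes are
# (node, role, confidence) triples; node.span = (start, end) token offsets.

def spans_overlap(a, b):
    return a[0] < b[1] and b[0] < a[1]

def assign_roles(votes):
    decided = {}                                   # node -> role, or None (= noRole)
    used_roles, used_spans = set(), []
    for node, role, conf in sorted(votes, key=lambda v: v[2], reverse=True):
        if node in decided:                        # a stronger vote already decided this node
            continue
        if role == "noRole":
            decided[node] = None
            continue
        # global constraints: each role at most once, role fillers strictly disjoint
        if role in used_roles or any(spans_overlap(node.span, s) for s in used_spans):
            continue                               # discard the violating vote
        decided[node] = role
        used_roles.add(role)
        used_spans.append(node.span)
    return {n: r for n, r in decided.items() if r is not None}
```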

Additionally, based on an observed algorithm bias towards selecting nodes further from the root of the tree, we adjust the votes somewhat before sorting. Let us denote by f(v, r) the confidence of the vote for role r on node v. If f(v, r) > f(v, noRole) and, for some child node v′ of v, it holds that f(v′, r) > f(v, r), then we set f(v, r) := f(v′, r).
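A sketch of this adjustment; f is a hypothetical dictionary mapping (node, role) pairs to confidences, and nodes are assumed to expose their children.

```python
# Sketch of the pre-sorting vote adjustment: when a node v looks like a role
# filler (f(v, r) > f(v, noRole)) but some child v' is scored higher for the
# same role, the child's confidence is copied up to v.

def adjust_votes(f, nodes, roles):
    for v in nodes:
        for r in roles:
            if f[(v, r)] > f[(v, "noRole")]:
                best_child = max((f[(c, r)] for c in v.children), default=f[(v, r)])
                if best_child > f[(v, r)]:
                    f[(v, r)] = best_child
```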

Minor issues. To prepare training data, we map FrameNet's annotations (based on word-level boundaries) onto parse tree nodes. In the great majority of cases, a perfect correspondence can be found; if, due to errors in parsing or a convoluted sentence structure, a perfect match does not exist, we map the role-filler annotation to the leftmost highest node in the tree that is completely contained in the annotation. Informal inspection shows that in English, this tends to preserve the semantic head of the role filler. In line with most of the existing work, we build a separate set of classifiers for every frame. This could be improved by taking into account that some roles (e.g. Place, Time) are shared across frames.
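The fallback mapping can be sketched as below; annotation spans and node spans are (start, end) token offsets, and the ParseNode interface is the same hypothetical one as in the earlier sketches.

```python
# Sketch of the fallback mapping of a word-level annotation onto a parse tree:
# among all nodes whose span lies completely inside the annotation, pick the
# highest one (closest to the root), breaking ties by taking the leftmost.

def map_annotation(root, ann_span):
    def contained(node):
        return ann_span[0] <= node.span[0] and node.span[1] <= ann_span[1]

    best, best_key = None, None
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if contained(node):
            key = (depth, node.span[0])        # smaller depth first, then leftmost
            if best_key is None or key < best_key:
                best, best_key = node, key
            continue                           # descendants are contained too, but deeper
        stack.extend((child, depth + 1) for child in node.children)
    return best                                # None if no node fits inside the annotation
```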

In this work, we limit ourselves to frames that describe actions, e.g. Drinking but not BiologicalState. There are several reasons for this: action frames are more informative, map to Cyc more cleanly and have better annotation coverage in training data. Action frames were identified by having at least one verb trigger word and not more than 10 times as many non-verb trigger words (a sketch of this filter follows the list below). Of those, we discard frames with no annotated sentences. By manual inspection, we discarded a further 20 frames deemed too generic or irrelevant (e.g. Undergoing, with the definition "An Entity is affected by an Event."). We are left with approximately 550 frames. In particular, the following frames were discarded:

Being active: An [Agent] is described as pursuing an [Activity], expending some effort.
Being operational: An [Artifact], either a machine or a network of operations, is in a state ready to perform its intended function.
Change posture: A [Protagonist] changes the overall position and posture of the body.
Change resistance: An [Agent] changes a [Patient]'s ability to resist literal or figurative attack.
Difficulty: An [Experiencer] has an easy or difficult time carrying out an [Activity].
Event: An [Event] takes place at a [Place] and [Time].
Eventive cognizer affecting: An [Event] causes the [Cognizer] to accept some [Content].
Existence: An Entity is declared to exist, generally irrespective of its position or even the possibility of its position being specified.
Experiencer obj: Some phenomenon (the [Stimulus]) provokes a particular emotion in an [Experiencer].
Familiarity: An [Entity] is presented as having been seen or experienced by a (typically generic and backgrounded) [Cognizer] on a certain number of occasions, causing the [Entity] to have a certain degree of recognizability for the [Cognizer].
Have associated: A [Topical entity] has properties which are affected by the existence and association of an [Entity].
Likelihood: This frame is concerned with the likelihood of a [Hypothetical event] occurring.
Locative relation: A [Figure] is located relative to a [Ground] location.
Means: An [Agent] makes use of a [Means] (either an action or a (system of) entities standing in for the action) in order to achieve a [Purpose].
Mental stimulus stimulus focus: A [Stimulus] serves to bring about an emotion of mental stimulation in an [Experiencer].
Obviousness: A [Phenomenon] is portrayed with respect to the [Degree] of likelihood that it will be perceived and known, given the (usually implicit) [Evidence], [Perceiver], and the [Circumstances] in which it is considered.
Predicament: An [Experiencer] is in an undesirable [Situation], whose [Cause] may also be expressed.
Taking time: An [Activity] takes some [Time length] to complete.
Turning out: A [State of affairs] turns out to be true in someone's knowledge of the world.
Undergoing: An [Entity] is affected by an [Event].
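For completeness, the automatic part of the filter described before the list (at least one verb trigger word, at most ten times as many non-verb trigger words, and at least one annotated sentence) can be sketched as below; the frame objects with .trigger_pos and .n_annotated are hypothetical, and the subsequent manual pruning of 20 frames is not captured here.

```python
# Sketch of the automatic action-frame filter.  Each frame is assumed to expose
# .trigger_pos (coarse POS of each trigger word, "V" for verbs) and
# .n_annotated (number of annotated sentences); both names are hypothetical.

def is_action_frame(frame):
    verbs = sum(1 for pos in frame.trigger_pos if pos == "V")
    non_verbs = len(frame.trigger_pos) - verbs
    return verbs >= 1 and non_verbs <= 10 * verbs and frame.n_annotated > 0
```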