
The described explanation methods create unnatural input perturbations (not from the true data distribution), which we propose to replace with more natural perturbations (closer to the data distribution), created with a text generator. In this section, we provide background on text generation: in Section 3.2.1, we describe language modeling, and in Section 3.2.2, we describe BERT, the specific language model we use.

3.2.1 Language modeling

Language modeling is a task in which the goal is to estimate the probability of observing a text sequence (e.g., a word or a sentence), or the probability of observing a text sequence given a preceding sequence. It is an important component in many applications, such as machine translation and paraphrase generation. Formally, given a sequence of words $s = w_1, w_2, \ldots, w_n$, the task is to compute

$$P(s) = P(w_1, \ldots, w_n) = P(w_1) \cdot P(w_2 \mid w_1) \cdot \ldots \cdot P(w_n \mid w_1 \ldots w_{n-1}). \qquad (3.6)$$

The prediction for each word is conditioned on its preceding words. In order to learn a good estimate of the probabilities, language models are trained on large corpora containing diverse texts. A trained language model can generate new sequences by sampling from the estimated probability distributions [62].
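As a minimal illustration of Equation 3.6, the sketch below scores a sequence with the chain rule; `next_word_distribution` is a hypothetical callable standing in for a trained language model, not a component of this work.

```python
import math

def sequence_log_probability(words, next_word_distribution):
    """Score a sequence via the chain rule in Equation 3.6.

    `next_word_distribution(prefix)` is a hypothetical callable returning
    a dict that maps candidate words to P(word | prefix).
    """
    log_prob = 0.0
    for i, word in enumerate(words):
        probs = next_word_distribution(words[:i])
        # Work in log space to avoid underflow on long sequences.
        log_prob += math.log(probs.get(word, 1e-12))
    return log_prob  # log P(s); math.exp(log_prob) recovers P(s)
```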

Traditional language modeling is limited to approaches conditioning only on the left context. A modification of language modeling that uses parts of both the left and right context is called masked language modeling (MLM) [30].² Given a partially hidden sequence, the goal of MLM is to estimate the probability of the hidden words given the visible words. Unlike a traditional language model, a trained masked language model cannot generate arbitrary new sequences out of the box, but it can reformulate an existing sequence.

² While the task was named “masked language modeling” by Devlin et al., it is also known as “gap fill” or the “Cloze task” [63].
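For illustration, this gap-filling behaviour can be observed with an off-the-shelf masked language model, e.g. through the HuggingFace `transformers` library; the checkpoint and prompt below are our own illustrative choices, not necessarily those used in this work.

```python
from transformers import pipeline

# A masked language model predicts the hidden ([MASK]) position from
# the visible words on both sides, rather than only the left context.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("This show is very [MASK]."):
    print(f"{prediction['token_str']:>12}  P = {prediction['score']:.3f}")
```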

To have finer control over the modeled distribution, the language model can be conditioned on other signals in addition to the preceding and succeeding words. This enables the model to perform controlled text generation, e.g., the generation of examples that have a specific sentiment. While this provides more control over the generated text, the training process is no longer self-supervised, but supervised, as the label of the text needs to be provided.

In our work, we use a masked language model called BERT to generate more natural text perturbations, both in a masked language modeling setting and in a controlled masked language modeling (CMLM) setting.
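One common way to realize such conditioning is to inject the label as a control token in the input; the sketch below assumes that scheme, and every name in it is a hypothetical of ours rather than the mechanism used in this work.

```python
def build_cmlm_input(tokens, label, mask_positions, mask_token="[MASK]"):
    """Hypothetical control-token scheme for conditioning a masked LM.

    Prepending a label token (e.g. "<positive>") lets the model learn
    label-specific in-filling, such as sentiment-preserving replacements.
    """
    conditioned = [f"<{label}>"] + list(tokens)
    for pos in mask_positions:
        conditioned[pos + 1] = mask_token  # +1 offsets the control token
    return conditioned

# build_cmlm_input(["this", "show", "is", "good"], "positive", [3])
# -> ['<positive>', 'this', 'show', 'is', '[MASK]']
```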

3.2.2 BERT

BERT (Bidirectional Encoder Representations from Transformers) [30] is a bidirectional transformer-based [64] masked language model.

It is built from multiple transformer encoder layers, where each encoder layer is composed of a multi-head self-attention mechanism and a fully connected network. It operates on subword inputs, encoded using a WordPiece vocabulary [65].

As a result of a customized training objective, it is able to capture both the left and right context of a word. Instead of being optimized on the language modeling task, it is optimized on two tasks jointly:

• Masked language modeling. Given a partially masked sequence with a portion of words hidden, the model is trained to fill in the gaps correctly.

• Next sentence prediction. Given two sentences, the model is trained to predict whether the sentences are adjacent.

The training of BERT models is divided into two stages: pre-training and fine-tuning. During pre-training, the masked language modeling and next sentence prediction tasks are optimized on a large corpus of text in an unsupervised manner. During fine-tuning, a new layer (typically fully connected) is added on top of the pre-trained model and trained for a downstream task, such as hate-speech classification.
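As a sketch of this two-stage setup using the `transformers` library (the checkpoint name and label count are placeholder choices, not those used in this work):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Fine-tuning: a classification head (a fully connected layer) is placed
# on top of the pre-trained encoder and trained for the downstream task.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # e.g., hate speech vs. not

batch = tokenizer(["an example sentence"], return_tensors="pt")
logits = model(**batch).logits  # one score per downstream class
print(logits.shape)  # torch.Size([1, 2])
```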

Chapter 4

Methods

In this chapter, we describe the proposed modifications of the explanation methods. In Section 4.1, we describe the inclusion of language models as text generators into the explanation methods, and in Section 4.2, we describe the calculation of explanations for text units longer than words, using dependency trees.

4.1 Explanation methods with text generators

In this section, we describe the modified explanation methods that use language models to generate natural text perturbations: first, we describe the modifications to LIME, and then those to IME.

4.1.1 LIME

To include language models into LIME, we first modify the definition of its interpretable representation. As described in Section 3.1.1, textual LIME uses a binary vector as an interpretable representation of the explained instance, where 1 indicates the original word is present in the sequence, and 0 indicates it is absent (replaced with a dummy word). The modified version of LIME keeps this binary representation, but simulates the absence of a word by replacing it with a word generated by a language model. When re-generating the word, the language model may generate the original word, which violates the definition of the binary interpretable representation. To prevent this, a uniqueness constraint is imposed on the output of the language model: the language model cannot replace the original word in the sequence with the same word. In practice, this is achieved by setting the probability of generating the original word to 0 and re-normalizing the output distribution using the softmax function. The modified process to obtain input perturbations is shown in comparison to the process used in LIME in Figure 4.1. Apart from the modified process to obtain the perturbations, the explanation method remains unchanged from the original LIME method.
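A minimal PyTorch sketch of this constraint (the function name is ours): setting the original token's logit to $-\infty$ before the softmax gives it probability 0 and re-normalizes the remaining words in one step.

```python
import torch

def replacement_distribution(logits, original_token_id):
    """Uniqueness constraint sketch: forbid re-generating the original word.

    `logits` holds the language model's scores over the vocabulary for one
    masked position. A logit of -inf becomes probability 0 under softmax,
    and the softmax re-normalizes the remaining probabilities.
    """
    constrained = logits.clone()
    constrained[original_token_id] = float("-inf")
    return torch.softmax(constrained, dim=-1)
```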

To fully specify the modified process of creating perturbations, the user of the method must choose a generation strategy. The strategy determines how to construct LIME's non-interpretable representation (i.e., the actual text) from a binary interpretable representation, which indicates which words are fixed and which need to be re-generated. Stated differently, it defines the local neighbourhood of the explained instance, in which the behaviour of the model is approximated with a surrogate model. In our work, we re-generate the words one by one in left-to-right order using greedy decoding with dropout [66]. We “hide” (mask) only one word at a time, allowing the generator to use the most context available; for example, if we need to re-generate two words, we first mask and re-generate the first word, then mask and re-generate the second. Greedy decoding selects the replacement word as the one with the highest assigned probability, while dropout randomly disables (i.e., sets to 0) a portion of the language model weights, introducing variance into the distribution of replacement words with the goal of making the generated text more diverse [67].
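A condensed sketch of this strategy, assuming a `bert-base-uncased` generator from the `transformers` library (all function and variable names are ours):

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.train()  # keep dropout active so repeated calls yield diverse words

def regenerate(token_ids, positions_to_replace):
    """Re-generate the chosen positions one at a time, left to right,
    so the generator always sees the maximum available context."""
    token_ids = token_ids.clone()
    for pos in sorted(positions_to_replace):
        original = token_ids[0, pos].item()
        token_ids[0, pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(token_ids).logits[0, pos]
        logits[original] = float("-inf")      # uniqueness constraint
        token_ids[0, pos] = logits.argmax()   # greedy decoding
    return token_ids
```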

The settings of this generation strategy were selected during development, and we use them as reasonable defaults. We leave the detailed exploration of alternative strategies and their effect on the produced explanations for future work.

Figure 4.1: A comparison between the creation of perturbations in LIME and its modified version when computing an explanation for the instance “This show is very good”. In both cases, the interpretable representation of LIME is first determined randomly. Then, instead of replacing the absent words (on positions marked with 0 in the interpretable representation) with a dummy word, the modified version replaces them with words generated by a language model.

4.1.2 IME

As in the modified version of LIME, the only difference in the modified version of IME is in the process of creating perturbations.

In IME, the perturbations are pairs of examples that are used to estimate the difference in the model's prediction when we know the value of the $i$-th feature and when we do not, across all possible feature subsets. For a specific feature subset $Q$, the first example in the pair is created by fixing the words $Q \cup \{i\}$, and the second example is created by fixing the words $Q$, while replacing the remaining words with words from a random example in the sampling dataset. Because IME assumes feature independence, the words that are not fixed are replaced randomly, without taking the fixed words into account. We propose two modified versions of IME. The first strongly relaxes the assumption of feature independence. The second keeps the assumption, but generates a sampling dataset that reduces the effect of the assumption on the created perturbations. Both versions of the modified process for creating perturbations are shown in contrast to the original process in Figure 4.2 and described next.

Figure 4.2: A comparison between the creation of perturbations in IME and its modified versions when computing the importance of the word “good” in the instance “This show is very good”. Each sample is made of two examples: one where the words $Q_i$ are fixed (locked) and one where the words $Q_i \cup \{\text{“good”}\}$ are fixed. The original IME (top row) replaces the words that are not fixed with those in a random example from the sampling dataset. IME with an internal LM (middle row) generates the words that are not fixed with a language model. IME with an external LM (bottom row) generates a new sampling dataset with variations of the explained instance, which is then used in the same way as in the original IME.

The first version replaces the words that are not fixed according to the output distribution of a language model conditioned on the remaining context. This means that instead of selecting the words blindly, they are selected to fit the context. As defined so far, the method removes the feature independence assumption and generates replacement words twice per sample: once by assuming the words $Q \cup \{i\}$ are fixed and once by assuming the words $Q$ are fixed. However, as the contexts differ by only one word, we generate the replacement words only once per sample, by assuming the words $Q$ are fixed. We make this assumption in order to reduce the computational cost of the modified method: with it, the method makes only half of the generation calls it would otherwise make. The assumption can be removed given a lightweight language model or an efficient generation strategy. Because this method uses a language model inside the explanation method, we refer to it as IME with an internal LM (IME + iLM).
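A minimal sketch of how one IME + iLM sample could be assembled, reusing a `regenerate`-style generator like the LIME sketch above (all names are ours):

```python
def make_perturbation_pair(instance, Q, i, regenerate_fn):
    """Build the two examples of one IME + iLM sample.

    The free words are generated once, assuming only the words in Q are
    fixed; the pair then differs only in the i-th word, halving the
    number of generation calls.
    """
    free_positions = [p for p in range(len(instance)) if p not in Q]
    shared = regenerate_fn(instance, free_positions)  # LM fills free words
    with_i = list(shared)
    with_i[i] = instance[i]   # first example: words Q ∪ {i} are fixed
    without_i = list(shared)  # second example: only words Q are fixed
    return with_i, without_i
```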

The second version uses a language model to generate a dataset of examples that are similar to the explained instance, which is then used as the sampling dataset in the original version of IME. Stated differently, the language model is used to generate the set $X$ in Equation 3.5. To do this, we re-generate the words in the explained instance $|X|$ times, where $|X|$ is the size of the generated dataset, specified by the user. We re-generate all the words, instead of randomly choosing which words to fix, because that selection is already performed by IME itself when it uses the generated dataset. The motivation for this method is to further reduce the computational cost of using a generator while still creating more natural perturbations than the original IME. The reduction in computational cost can come either from generating a dataset $X$ whose size is smaller than the maximum number of samples used in IME, or from generating the dataset in advance and benefiting from batched generation. In contrast to the first modified version, this method uses a language model outside the explanation method, so we refer to it as IME with an external LM (IME + eLM).

Both modified versions require the user to specify a generation strategy. For IME with an internal LM, we use the same generation strategy as in the modified version of LIME. For IME with an external LM, we shuffle the generation order of the words instead of generating them in left-to-right order, with the goal of obtaining more variations of naturally occurring text in the generated samples.
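A sketch of the IME + eLM dataset generation under these strategies (all names are ours; `regenerate_fn` is assumed to re-generate positions in the order given):

```python
import random

def generate_sampling_dataset(instance, size, regenerate_fn, seed=0):
    """Create the sampling dataset X for IME + eLM.

    Every word of the explained instance is re-generated `size` times;
    the generation order is shuffled per sample to obtain more varied,
    naturally occurring text.
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(size):
        order = list(range(len(instance)))
        rng.shuffle(order)  # shuffled generation order (IME + eLM strategy)
        dataset.append(regenerate_fn(instance, order))
    return dataset
```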