crf#

class LinearChainCrf(num_labels, crf_lr=0.1, with_start_stop_tag=True)[source]#

LinearChainCrf is a linear-chain Conditional Random Field (CRF) layer that models sequential dependencies between predictions. It can therefore take context into account, whereas a plain classifier predicts the label of each sample independently of its “neighboring” samples. See https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers for reference.

Parameters:

num_labels (int) – The number of labels.
crf_lr (float, optional) – The learning rate of the CRF layer. Defaults to 0.1.
with_start_stop_tag (bool, optional) – If True, the start and stop tags are taken into account and the transition parameters form a tensor of shape [num_labels+2, num_labels+2]. Otherwise, the transition parameters form a tensor of shape [num_labels, num_labels]. Defaults to True.

forward(inputs, lengths)[source]#

Computes the log-normalizer (the log partition function) of a linear-chain CRF. See http://www.cs.columbia.edu/~mcollins/fb.pdf for reference.

\[ \begin{aligned}F & = \log Z(x) = \log\sum_y \exp(score(x,y))\\ score(x,y) & = \sum_i \left(Emit(x_i,y_i) + Trans(y_{i-1}, y_i)\right)\\ p(y_i) & = Emit(x_i,y_i), \quad T(y_{i-1}, y_i) = Trans(y_{i-1}, y_i)\end{aligned} \]

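Then, writing F_n(y_n) for the log-sum of the scores of all partial paths that end in tag y_n at position n, we get:

\[F_1(y_1) = p(y_1) + T([START], y_1)\]

\[\begin{split}F_2(y_2) & = \log\sum_{y_1} \exp(p(y_1) + T([START], y_1) + p(y_2) + T(y_1,y_2)) \\ & = \log\sum_{y_1} \exp(F_1(y_1) + p(y_2) + T(y_1,y_2))\end{split}\]

In general, F_n(y_n) follows recursively from F_{n-1}(y_{n-1}), and the normalizer is obtained by summing over the final tag:

\[F = \log\sum_{y_n} \exp(F_n(y_n))\]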

Parameters:

inputs (Tensor) – The input predicted tensor. Its dtype is float32 and has a shape of [batch_size, sequence_length, num_tags].
lengths (Tensor) – The real (unpadded) length of each sequence. Its dtype is int64 and has a shape of [batch_size].

Returns:

Returns the normalizers tensor norm_score. Its dtype is float32 and has a shape of [batch_size].

Return type:

Tensor
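The recursion described for forward can be checked on a tiny example. The following is a minimal pure-Python sketch for a single sequence, not the library's implementation: the helper names (`logsumexp`, `log_partition`, `log_partition_brute_force`) and the separate `start` vector standing in for Trans([START], ·) are assumptions made for illustration, and the stop-tag transition is omitted.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, transitions, start):
    """Forward recursion for one sequence.

    emissions[i][t]   : Emit(x_i, t)
    transitions[s][t] : Trans(s, t)
    start[t]          : Trans([START], t), a hypothetical separate vector
    """
    num_tags = len(start)
    # alpha[t] is the log-sum of scores of all partial paths ending in tag t.
    alpha = [start[t] + emissions[0][t] for t in range(num_tags)]
    for emit in emissions[1:]:
        alpha = [
            emit[t] + logsumexp([alpha[s] + transitions[s][t] for s in range(num_tags)])
            for t in range(num_tags)
        ]
    return logsumexp(alpha)


def log_partition_brute_force(emissions, transitions, start):
    """Reference value: enumerate every possible tag sequence."""
    num_tags = len(start)
    scores = []
    for path in itertools.product(range(num_tags), repeat=len(emissions)):
        s = start[path[0]] + emissions[0][path[0]]
        for i in range(1, len(path)):
            s += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
        scores.append(s)
    return logsumexp(scores)


emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]
start = [0.0, 0.2]
fast = log_partition(emissions, transitions, start)
slow = log_partition_brute_force(emissions, transitions, start)
```

The recursion costs O(sequence_length × num_tags²) operations, whereas the enumeration grows exponentially with the sequence length; the two should agree up to floating-point error.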

gold_score(inputs, labels, lengths)[source]#

Computes the unnormalized score for a tag sequence.

\[score(x,y) = \sum_i \left(Emit(x_i,y_i) + Trans(y_{i-1}, y_i)\right)\]

Parameters:

inputs (Tensor) – The input predicted tensor. Its dtype is float32 and has a shape of [batch_size, sequence_length, num_tags].
labels (Tensor) – The input label tensor. Its dtype is int64 and has a shape of [batch_size, sequence_length].
lengths (Tensor) – The real (unpadded) length of each sequence. Its dtype is int64 and has a shape of [batch_size].

Returns:

Returns the unnormalized sequence scores tensor unnorm_score. Its dtype is float32 and has a shape of [batch_size].

Return type:

Tensor
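As an illustration of the score that gold_score computes, here is a minimal pure-Python sketch for one sequence. The function name `sequence_score`, the separate `start` vector for the start-tag transition, and the way `length` masks out padding are assumptions for illustration and do not reflect the library's internals; the stop-tag transition is omitted.

```python
def sequence_score(emissions, transitions, start, labels, length):
    """score(x, y) for one sequence; positions >= length are padding and ignored.

    Assumes length >= 1. start[t] is a hypothetical stand-in for Trans([START], t).
    """
    score = start[labels[0]] + emissions[0][labels[0]]
    for i in range(1, length):
        score += transitions[labels[i - 1]][labels[i]] + emissions[i][labels[i]]
    return score


emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]
start = [0.0, 0.2]
labels = [1, 0, 0]

# Score of the whole sequence, and of the same sequence treated as length 2.
full = sequence_score(emissions, transitions, start, labels, length=3)
truncated = sequence_score(emissions, transitions, start, labels, length=2)
```

Here `full` adds up 0.2 + 1.0 for the first step, 0.4 + 0.2 for the second, and 0.1 + 1.5 for the third, while `truncated` stops after the second step.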

class LinearChainCrfLoss(crf)[source]#

The negative log-likelihood for linear chain Conditional Random Field (CRF).

Parameters:

crf (LinearChainCrf) – The LinearChainCrf network object. Its parameters are used to calculate the loss.

forward(inputs, lengths, labels, old_version_labels=None)[source]#

Calculates the CRF loss. Let

\[Z(x) = \sum_{y'} \exp(score(x,y'))\]

be the sum of the exponentiated scores of all paths. Then

\[loss = -\log p(y|x) = -\log\left(\frac{\exp(score(x,y))}{Z(x)}\right) = -score(x,y) + \log Z(x)\]

Parameters:

inputs (Tensor) – The input predicted tensor. Its dtype is float32 and has a shape of [batch_size, sequence_length, num_tags].
lengths (Tensor) – The real (unpadded) length of each sequence. Its dtype is int64 and has a shape of [batch_size].
labels (Tensor) – The input label tensor. Its dtype is int64 and has a shape of [batch_size, sequence_length].
old_version_labels (Tensor, optional) – Unnecessary parameter for compatibility with older versions. Defaults to None.

Returns:

The crf loss. Its dtype is float32 and has a shape of [batch_size].

Return type:

Tensor
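The relation between the loss, the path score and the normalizer can be verified numerically. The sketch below is a pure-Python illustration for one short sequence, with Z(x) computed by brute-force enumeration; the names `path_score` and `crf_nll` are hypothetical, and start/stop transitions are left out to keep it small. It is not how the layer itself computes the loss.

```python
import itertools
import math


def path_score(emissions, transitions, path):
    """score(x, y): emissions plus transitions along one tag path (no START/STOP terms)."""
    s = emissions[0][path[0]]
    for i in range(1, len(path)):
        s += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
    return s


def crf_nll(emissions, transitions, labels):
    """-log p(y|x) = log Z(x) - score(x, y), with Z(x) computed by enumeration."""
    num_tags = len(transitions)
    all_scores = [
        path_score(emissions, transitions, path)
        for path in itertools.product(range(num_tags), repeat=len(emissions))
    ]
    m = max(all_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in all_scores))
    return log_z - path_score(emissions, transitions, labels)


emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]
loss = crf_nll(emissions, transitions, [1, 0, 0])

# exp(-loss) is p(y|x), so summing it over all 2**3 label sequences should give 1.
total_prob = sum(
    math.exp(-crf_nll(emissions, transitions, list(p)))
    for p in itertools.product(range(2), repeat=3)
)
```

Because p(y|x) is a proper probability, the loss is always non-negative, and it approaches zero only when the gold path dominates every other path.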

class ViterbiDecoder(transitions, with_start_stop_tag=True)[source]#

ViterbiDecoder decodes the highest-scoring sequence of tags. It should only be used at test time.

Parameters:

transitions (Tensor) – The transition matrix. Its dtype is float32 and has a shape of [num_tags, num_tags].
with_start_stop_tag (bool, optional) – If True, the last row and the last column of transitions are treated as the start tag, and the penultimate row and the penultimate column are treated as the stop tag. Otherwise, all rows and columns are treated as real tags. Defaults to True.

forward(inputs, lengths)[source]#

Decode the highest scoring sequence of tags.

Parameters:

inputs (Tensor) – The unary emission tensor. Its dtype is float32 and has a shape of [batch_size, sequence_length, num_tags].
lengths (Tensor) – The input length tensor, storing the real (unpadded) length of each sequence so that padding is not decoded. Its dtype is int64 and has a shape of [batch_size].

Returns:

Returns a tuple (scores, paths). scores contains the score of the Viterbi sequence for each input; its dtype is float32 and its shape is [batch_size]. paths contains the highest-scoring tag indices; its dtype is int64 and its shape is [batch_size, sequence_length].

Return type:

tuple
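For intuition about what the decoder returns, the following is a minimal pure-Python sketch of Viterbi decoding for a single sequence. The function name `viterbi` is hypothetical, start/stop transitions and batching are omitted, and the code is only an illustration of the technique, not the library's implementation.

```python
def viterbi(emissions, transitions):
    """Return (best_score, best_path) for one sequence (no START/STOP terms).

    emissions[i][t]   : emission score of tag t at position i
    transitions[s][t] : transition score from tag s to tag t
    """
    num_tags = len(transitions)
    # score[t] is the best score of any partial path ending in tag t.
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for t in range(num_tags):
            best_prev = max(range(num_tags), key=lambda s: score[s] + transitions[s][t])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][t] + emit[t])
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path.
    last = max(range(num_tags), key=lambda t: score[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]
best_score, best_path = viterbi(emissions, transitions)
# best_path is [1, 0, 0] for these inputs
```

The structure mirrors the forward recursion for the normalizer, with the log-sum-exp over previous tags replaced by a maximum and a backpointer recording which previous tag achieved it.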
