crf

class LinearChainCrf(num_labels, crf_lr=0.1, with_start_stop_tag=True)
LinearChainCrf is a linear-chain Conditional Random Field (CRF) layer that models sequential dependencies between predictions. It can therefore take context into account, whereas a plain classifier predicts the label of each sample without considering “neighboring” samples. See https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers for reference.
 Parameters
num_labels (int) – The number of labels.
crf_lr (float, optional) – The learning rate of the CRF layer. Defaults to 0.1.
with_start_stop_tag (bool, optional) – If set to True, the start tag and stop tag are taken into account, and the transition parameters are a tensor of shape [num_labels+2, num_labels+2]. Otherwise, the transition parameters are a tensor of shape [num_labels, num_labels].

forward(inputs, lengths)
Computes the normalization term in a linear-chain CRF. See http://www.cs.columbia.edu/~mcollins/fb.pdf for reference.
\[ \begin{align}\begin{aligned}F & = \log Z(x) = \log\sum_y \exp(score(x,y))\\score(x,y) & = \sum_i \left( Emit(x_i,y_i) + Trans(y_{i-1}, y_i) \right)\\p(y_i) & = Emit(x_i,y_i), \quad T(y_{i-1}, y_i) = Trans(y_{i-1}, y_i)\end{aligned}\end{align} \]
Then we get:
\[F(1) = \log\sum_{y_1} \exp(p(y_1) + T([START], y_1))\]
\[\begin{split}F(2) & = \log\sum_{y_1}\sum_{y_2} \exp(p(y_1) + T([START], y_1) + p(y_2) + T(y_1,y_2)) \\ & = \log\sum_{y_2} \exp(F(1) + p(y_2) + T(y_1,y_2))\end{split}\]
More generally, F(n) can be computed recursively from F(n-1).
 Parameters
inputs (Tensor) – The input tensor of predicted emission scores. Its dtype is float32 and its shape is [batch_size, sequence_length, num_tags].
lengths (Tensor) – The real length of each input sequence. Its dtype is int64 and its shape is [batch_size].
 Returns
The normalizer tensor norm_score. Its dtype is float32 and its shape is [batch_size].
 Return type
Tensor
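
The recursion above can be illustrated with a minimal pure-Python sketch for a single sequence. It is not the layer's implementation: the function names, the nested-list inputs, and the separate `start` vector standing in for T([START], y) are all assumptions made for the example. The sketch keeps one running log-sum per tag, so that each step only needs the values from the previous step, and it checks the result against explicit enumeration of every tag sequence.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emissions, transitions, start):
    """Forward recursion for log Z(x) on one sequence (hypothetical helper).

    emissions[i][t]   : Emit(x_i, t), shape [sequence_length][num_tags]
    transitions[s][t] : Trans(s, t), score of moving from tag s to tag t
    start[t]          : T([START], t), assumed to be given as a plain list
    """
    num_tags = len(start)
    # alpha[t] is the log-sum of the scores of all partial paths ending in tag t.
    alpha = [start[t] + emissions[0][t] for t in range(num_tags)]
    for emit in emissions[1:]:
        alpha = [
            logsumexp([alpha[s] + transitions[s][t] for s in range(num_tags)]) + emit[t]
            for t in range(num_tags)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emissions, transitions, start):
    """Reference value: enumerate every tag sequence explicitly."""
    num_tags = len(start)
    scores = []
    for path in itertools.product(range(num_tags), repeat=len(emissions)):
        score = start[path[0]] + emissions[0][path[0]]
        for i in range(1, len(path)):
            score += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
        scores.append(score)
    return logsumexp(scores)


# Toy inputs: 3 time steps, 2 tags.
emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]
start = [0.0, 0.2]

fast = log_partition(emissions, transitions, start)
slow = brute_force_log_partition(emissions, transitions, start)
print(abs(fast - slow) < 1e-9)
```

The recursion costs on the order of sequence_length × num_tags² operations, whereas the enumeration grows exponentially with the sequence length, which is why the recursive form is the one worth using in practice.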

gold_score(inputs, labels, lengths)
Computes the unnormalized score for a tag sequence. $$ score(x,y) = \sum_i \left( Emit(x_i,y_i) + Trans(y_{i-1}, y_i) \right) $$
 Parameters
inputs (Tensor) – The input tensor of predicted emission scores. Its dtype is float32 and its shape is [batch_size, sequence_length, num_tags].
labels (Tensor) – The input label tensor. Its dtype is int64 and its shape is [batch_size, sequence_length].
lengths (Tensor) – The real length of each input sequence. Its dtype is int64 and its shape is [batch_size].
 Returns
The unnormalized sequence score tensor unnorm_score. Its dtype is float32 and its shape is [batch_size].
 Return type
Tensor
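
As a rough illustration of the score formula, the following pure-Python sketch sums emission and transition scores along one labelled sequence. The function name and list-based inputs are assumptions for the example, start and stop tags are left out for brevity, and the `length` argument shows how positions past the real sequence length (padding) can be ignored.

```python
def sequence_score(emissions, transitions, labels, length):
    """Unnormalized score of one tag sequence (hypothetical helper).

    emissions[i][t]   : Emit(x_i, t)
    transitions[s][t] : Trans(s, t)
    labels[i]         : gold tag at position i
    length            : number of real (unpadded) positions
    """
    score = 0.0
    for i in range(length):
        tag = labels[i]
        score += emissions[i][tag]
        if i > 0:
            # Transition from the previous gold tag to the current one.
            score += transitions[labels[i - 1]][tag]
    return score


# Toy inputs: 3 time steps, 2 tags.
emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]
labels = [1, 0, 0]

full = sequence_score(emissions, transitions, labels, length=3)
# Treat the last position as padding: only the first two steps contribute.
truncated = sequence_score(emissions, transitions, labels, length=2)
print(full, truncated)
```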

class LinearChainCrfLoss(crf)
The negative log-likelihood loss for a linear-chain Conditional Random Field (CRF).
 Parameters
crf (LinearChainCrf) – The LinearChainCrf network object. Its parameters are used to calculate the loss.

forward(inputs, lengths, labels, old_version_labels=None)
Calculates the CRF loss. Let $$ Z(x) = \sum_{y'} \exp(score(x,y')) $$ be the sum of the exponentiated scores of all paths. Then $$ loss = -\log p(y|x) = -\log(\exp(score(x,y))/Z(x)) = -score(x,y) + \log Z(x) $$
 Parameters
inputs (Tensor) – The input tensor of predicted emission scores. Its dtype is float32 and its shape is [batch_size, sequence_length, num_tags].
lengths (Tensor) – The real length of each input sequence. Its dtype is int64 and its shape is [batch_size].
labels (Tensor) – The input label tensor. Its dtype is int64 and its shape is [batch_size, sequence_length].
old_version_labels (Tensor, optional) – Unused parameter kept for compatibility with older versions. Defaults to None.
 Returns
The CRF loss. Its dtype is float32 and its shape is [batch_size].
 Return type
Tensor
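
The relation between the loss, the path score, and log Z(x) can be checked numerically on a tiny example. The sketch below is only an illustration under simplifying assumptions: it computes Z(x) by enumerating every path (feasible only for toy sizes), it omits start and stop tags, and its function names are hypothetical. Because the loss is a negative log-probability, exponentiating its negation over all possible label sequences should sum to 1.

```python
import itertools
import math


def path_score(emissions, transitions, path):
    """score(x, y): emission plus transition scores along one path."""
    score = emissions[0][path[0]]
    for i in range(1, len(path)):
        score += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
    return score


def crf_negative_log_likelihood(emissions, transitions, labels):
    """loss = log Z(x) - score(x, y), with Z(x) computed by enumeration."""
    num_tags = len(transitions)
    all_scores = [
        path_score(emissions, transitions, path)
        for path in itertools.product(range(num_tags), repeat=len(emissions))
    ]
    # Stable log-sum-exp over every path score gives log Z(x).
    m = max(all_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in all_scores))
    return log_z - path_score(emissions, transitions, labels)


# Toy inputs: 3 time steps, 2 tags.
emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]

loss = crf_negative_log_likelihood(emissions, transitions, [1, 0, 0])

# exp(-loss) is p(y|x); over all label sequences the probabilities sum to 1.
total = sum(
    math.exp(-crf_negative_log_likelihood(emissions, transitions, list(path)))
    for path in itertools.product(range(2), repeat=3)
)
print(loss > 0, abs(total - 1.0) < 1e-9)
```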

class ViterbiDecoder(transitions, with_start_stop_tag=True)
ViterbiDecoder decodes the highest-scoring sequence of tags. It should only be used at test time.
 Parameters
transitions (Tensor) – The transition matrix. Its dtype is float32 and its shape is [num_tags, num_tags].
with_start_stop_tag (bool, optional) – If set to True, the last row and the last column of transitions are treated as the start tag, and the penultimate row and the penultimate column of transitions are treated as the stop tag. Otherwise, all rows and columns are treated as real tags. Defaults to True.

forward(inputs, lengths)
Decodes the highest-scoring sequence of tags.
 Parameters
inputs (Tensor) – The unary emission tensor. Its dtype is float32 and its shape is [batch_size, sequence_length, num_tags].
lengths (Tensor) – The input length tensor, which stores the real length of each sequence so that padding is handled correctly. Its dtype is int64 and its shape is [batch_size].
 Returns
A tuple (scores, paths). The scores tensor contains the score of the Viterbi sequence. Its dtype is float32 and its shape is [batch_size]. The paths tensor contains the highest-scoring tag indices. Its dtype is int64 and its shape is [batch_size, sequence_length].
 Return type
tuple
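
For intuition about what the decoder returns, here is a minimal pure-Python sketch of Viterbi decoding for a single sequence. It is not the class's implementation: the function names and list-based inputs are assumptions, start and stop tags are omitted, and there is no batching or length masking. It replaces the log-sum in the forward recursion with a max, records backpointers, and then walks them backwards to recover the best path. A brute-force search is included only to check the result.

```python
import itertools


def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sequence (hypothetical helper)."""
    num_tags = len(transitions)
    # score[t]: best score of any partial path that ends in tag t.
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for t in range(num_tags):
            best_prev = max(range(num_tags), key=lambda s: score[s] + transitions[s][t])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][t] + emit[t])
        score = new_score
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to the first position.
    best_last = max(range(num_tags), key=lambda t: score[t])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[best_last], path


def brute_force_decode(emissions, transitions):
    """Reference: score every tag sequence and keep the best one."""
    best = None
    for path in itertools.product(range(len(transitions)), repeat=len(emissions)):
        s = emissions[0][path[0]]
        for i in range(1, len(path)):
            s += transitions[path[i - 1]][path[i]] + emissions[i][path[i]]
        if best is None or s > best[0]:
            best = (s, list(path))
    return best


# Toy inputs: 3 time steps, 2 tags.
emissions = [[0.5, 1.0], [0.2, -0.3], [1.5, 0.1]]
transitions = [[0.1, -0.2], [0.4, 0.3]]

best_score, best_path = viterbi_decode(emissions, transitions)
ref_score, ref_path = brute_force_decode(emissions, transitions)
print(best_path == ref_path, abs(best_score - ref_score) < 1e-9)
```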