SentencePieceProcessorLike

class penzai.toolshed.token_visualization.SentencePieceProcessorLike

Bases: Protocol

Protocol defining the methods we need from a tokenizer.

SentencePiece tokenizers conform to this interface, but any other object that implements these methods will also work.

Methods

GetPieceSize()
    Returns the number of tokens in the vocabulary.

IdToPiece(id)
    Decodes a token ID to a string.

IsControl(id)
    Identifies whether a token is a control token.

__init__(*args, **kwargs)

GetPieceSize() → int

Returns the number of tokens in the vocabulary.

IdToPiece(id: int) → str

Decodes a token ID to a string.

IsControl(id: int) → bool

Identifies whether a token is a control token.
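Because this is a structural Protocol, any object exposing these three methods can stand in for a SentencePiece processor. The sketch below is a minimal illustration (not part of penzai; the class name SimpleVocabProcessor and its plain-list vocabulary are hypothetical):

from typing import Sequence

class SimpleVocabProcessor:
  """Toy tokenizer over a fixed vocabulary list.

  Structurally satisfies SentencePieceProcessorLike by providing the
  three required methods.
  """

  def __init__(self, pieces: Sequence[str], control_ids: Sequence[int] = ()):
    self._pieces = list(pieces)
    self._control_ids = set(control_ids)

  def GetPieceSize(self) -> int:
    # Number of tokens in the vocabulary.
    return len(self._pieces)

  def IdToPiece(self, id: int) -> str:
    # Decode a token ID to its string piece.
    return self._pieces[id]

  def IsControl(self, id: int) -> bool:
    # Whether this ID is a control token (e.g. BOS/EOS/PAD).
    return id in self._control_ids

# Example usage:
vocab = SimpleVocabProcessor(["<pad>", "<bos>", "hello", "world"], control_ids=[0, 1])
assert vocab.GetPieceSize() == 4
assert vocab.IdToPiece(2) == "hello"
assert vocab.IsControl(1)

An instance like this can then be passed anywhere the token-visualization utilities expect a SentencePiece-style processor.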