Imdb

class paddle.text.Imdb(data_file: str | None = None, mode: _ImdbDataSetMode = 'train', cutoff: int = 150, download: bool = True)

Implementation of the IMDB dataset.

Parameters
  • data_file (str|None) – path to the dataset tar file. Can be None if download is True. Default None.

  • mode (str) – which split to load, either ‘train’ or ‘test’. Default ‘train’.

  • cutoff (int) – frequency cutoff used when building the word dictionary. Default 150.

  • download (bool) – whether to download the dataset automatically if data_file is not set. Default True.
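
The role of cutoff can be pictured with a small sketch. The helper build_word_dict below is hypothetical and not part of Paddle. It assumes that only words occurring more than cutoff times get their own id, ordered by descending frequency, and that a single '<unk>' id stands in for everything else. The library's actual dictionary construction may differ in its details.

```python
from collections import Counter


def build_word_dict(docs, cutoff):
    """Hypothetical helper: map frequent words to integer ids."""
    # Count how often each token appears across all documents.
    freq = Counter(word for doc in docs for word in doc)
    # Assumption: keep only words seen strictly more than `cutoff` times.
    kept = [(w, c) for w, c in freq.items() if c > cutoff]
    # Most frequent first; ties broken alphabetically for determinism.
    kept.sort(key=lambda item: (-item[1], item[0]))
    word_idx = {w: i for i, (w, _) in enumerate(kept)}
    # Every word at or below the cutoff shares one out-of-vocabulary id.
    word_idx["<unk>"] = len(word_idx)
    return word_idx


docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vocab = build_word_dict(docs, cutoff=1)
# Encode the third document; "plot" is rare, so it maps to '<unk>'.
encoded = [vocab.get(w, vocab["<unk>"]) for w in docs[2]]
```

With a higher cutoff the dictionary shrinks and more tokens fall back to the '<unk>' id, which is the trade-off to keep in mind when changing the default of 150.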

Returns

An instance of the IMDB dataset.

Return type

Dataset

Examples

>>> import paddle
>>> from paddle.text.datasets import Imdb

>>> class SimpleNet(paddle.nn.Layer):
...     def __init__(self):
...         super().__init__()
...
...     def forward(self, doc, label):
...         return paddle.sum(doc), label


>>> imdb = Imdb(mode='train')

>>> for i in range(10):
...     doc, label = imdb[i]
...     doc = paddle.to_tensor(doc)
...     label = paddle.to_tensor(label)
...
...     model = SimpleNet()
...     doc_sum, label = model(doc, label)
...     print(doc.shape, label.shape)
paddle.Size([121]) paddle.Size([1])
paddle.Size([115]) paddle.Size([1])
paddle.Size([386]) paddle.Size([1])
paddle.Size([471]) paddle.Size([1])
paddle.Size([585]) paddle.Size([1])
paddle.Size([206]) paddle.Size([1])
paddle.Size([221]) paddle.Size([1])
paddle.Size([324]) paddle.Size([1])
paddle.Size([166]) paddle.Size([1])
paddle.Size([598]) paddle.Size([1])
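
The shapes above show that every review has a different length, so several samples cannot be stacked into one batch as they are. One common approach is to right-pad each sequence to the longest one in the batch. The sketch below is a minimal illustration in plain Python. The helper pad_batch and the choice of pad_id are assumptions made for this example and do not come from Paddle.

```python
def pad_batch(docs, pad_id):
    """Hypothetical helper: right-pad token-id lists to a common length."""
    # The longest document determines the width of the batch.
    max_len = max(len(doc) for doc in docs)
    # Append pad_id until every row has max_len entries.
    padded = [list(doc) + [pad_id] * (max_len - len(doc)) for doc in docs]
    # Keep the true lengths so padding can be masked out later.
    lengths = [len(doc) for doc in docs]
    return padded, lengths


# Toy token ids standing in for three reviews of different lengths.
batch, lengths = pad_batch([[5, 2, 9], [7], [4, 1]], pad_id=-1)
```

The padded rows all have the same length, so they can be converted with paddle.to_tensor in a single call. In practice pad_id should be a value that does not collide with a real word id.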