load_dataset

load_dataset(dataset='small_table', tbl_type='polars')

Load one of the datasets bundled with the library as the specified DataFrame type.

Parameters

dataset : Literal['small_table', 'game_revenue'] = 'small_table'

The name of the dataset to load. Current options are "small_table" and "game_revenue".

tbl_type : Literal['polars', 'pandas', 'duckdb'] = 'polars'

The type of DataFrame to generate from the dataset. The named options are "polars", "pandas", and "duckdb".

Returns

: FrameT | Any

The loaded dataset, ready to be passed to a Validate object. Depending on tbl_type=, this is a Polars DataFrame, a Pandas DataFrame, or an Ibis table backed by DuckDB.

Included Datasets

There are two included datasets that can be loaded using the load_dataset() function:

  • small_table: A small dataset with 13 rows and 8 columns. This dataset is useful for testing and demonstration purposes.
  • game_revenue: A dataset with 2000 rows and 11 columns that provides revenue data for a game development company. For one particular game, it records player sessions, the items purchased, the ads viewed, and the revenue generated.

Supported DataFrame Types

The tbl_type= parameter can be set to one of the following:

  • "polars": A Polars DataFrame.
  • "pandas": A Pandas DataFrame.
  • "duckdb": An Ibis table for a DuckDB database.

Examples

Load the small_table dataset as a Polars DataFrame by calling load_dataset() with its defaults:

import pointblank as pb

small_table = pb.load_dataset()

pb.preview(small_table)

Polars  Rows: 13  Columns: 8

    date_time            date        a      b          c      d        e        f
    Datetime             Date        Int64  String     Int64  Float64  Boolean  String
 1  2016-01-04 11:00:00  2016-01-04  2      1-bcd-345  3      3423.29  True     high
 2  2016-01-04 00:32:00  2016-01-04  3      5-egh-163  8      9999.99  True     low
 3  2016-01-05 13:32:00  2016-01-05  6      8-kdg-938  3      2343.23  True     high
 4  2016-01-06 17:23:00  2016-01-06  2      5-jdo-903  None   3892.4   False    mid
 5  2016-01-09 12:36:00  2016-01-09  8      3-ldm-038  7      283.94   True     low
 9  2016-01-20 04:30:00  2016-01-20  3      5-bce-642  9      837.93   False    high
10  2016-01-20 04:30:00  2016-01-20  3      5-bce-642  9      837.93   False    high
11  2016-01-26 20:07:00  2016-01-26  4      2-dmx-010  7      833.98   True     low
12  2016-01-28 02:51:00  2016-01-28  2      7-dmx-010  8      108.34   False    low
13  2016-01-30 11:23:00  2016-01-30  1      3-dka-303  None   2230.09  True     high

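The small_table dataset is a plain Polars DataFrame; calling preview() on it renders the table in any environment that can display HTML. By default, preview() shows only the first five and last five rows, which is why rows 6 to 8 do not appear in the output.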

The game_revenue dataset can be loaded as a Pandas DataFrame by specifying the dataset name and setting tbl_type="pandas":

import pointblank as pb

game_revenue = pb.load_dataset(dataset="game_revenue", tbl_type="pandas")

pb.preview(game_revenue)

Pandas  Rows: 2000  Columns: 11

      player_id        session_id                session_start              time                       item_type  item_name  item_revenue  session_duration  start_day            acquisition     country
      object           object                    datetime64[ns, UTC]        datetime64[ns, UTC]        object     object     float64       float64           datetime64[ns]       object          object
   1  ECPANOIXLZHF896  ECPANOIXLZHF896-eol2j8bs  2015-01-01 01:31:03+00:00  2015-01-01 01:31:27+00:00  iap        offer2     8.99          16.3              2015-01-01 00:00:00  google          Germany
   2  ECPANOIXLZHF896  ECPANOIXLZHF896-eol2j8bs  2015-01-01 01:31:03+00:00  2015-01-01 01:36:57+00:00  iap        gems3      22.49         16.3              2015-01-01 00:00:00  google          Germany
   3  ECPANOIXLZHF896  ECPANOIXLZHF896-eol2j8bs  2015-01-01 01:31:03+00:00  2015-01-01 01:37:45+00:00  iap        gold7      107.99        16.3              2015-01-01 00:00:00  google          Germany
   4  ECPANOIXLZHF896  ECPANOIXLZHF896-eol2j8bs  2015-01-01 01:31:03+00:00  2015-01-01 01:42:33+00:00  ad         ad_20sec   0.76          16.3              2015-01-01 00:00:00  google          Germany
   5  ECPANOIXLZHF896  ECPANOIXLZHF896-hdu9jkls  2015-01-01 11:50:02+00:00  2015-01-01 11:55:20+00:00  ad         ad_5sec    0.03          35.2              2015-01-01 00:00:00  google          Germany
1996  NAOJRDMCSEBI281  NAOJRDMCSEBI281-j2vs9ilp  2015-01-21 01:57:50+00:00  2015-01-21 02:02:50+00:00  ad         ad_survey  1.332         25.8              2015-01-11 00:00:00  organic         Norway
1997  NAOJRDMCSEBI281  NAOJRDMCSEBI281-j2vs9ilp  2015-01-21 01:57:50+00:00  2015-01-21 02:22:14+00:00  ad         ad_survey  1.35          25.8              2015-01-11 00:00:00  organic         Norway
1998  RMOSWHJGELCI675  RMOSWHJGELCI675-vbhcsmtr  2015-01-21 02:39:48+00:00  2015-01-21 02:40:00+00:00  ad         ad_5sec    0.03          8.4               2015-01-10 00:00:00  other_campaign  France
1999  RMOSWHJGELCI675  RMOSWHJGELCI675-vbhcsmtr  2015-01-21 02:39:48+00:00  2015-01-21 02:47:12+00:00  iap        offer5     26.09         8.4               2015-01-10 00:00:00  other_campaign  France
2000  GJCXNTWEBIPQ369  GJCXNTWEBIPQ369-9elq67md  2015-01-21 03:59:23+00:00  2015-01-21 04:06:29+00:00  ad         ad_5sec    0.12          18.5              2015-01-14 00:00:00  organic         United States

The game_revenue dataset is more realistic, with a mix of data types, and at 2000 rows and 11 columns it is significantly larger than the small_table dataset.