import pointblank as pb
game_revenue_polars = pb.load_dataset("game_revenue")
pb.get_row_count(game_revenue_polars)
2000
function
Get the number of rows in a table.
USAGE
The get_row_count() function returns the number of rows in a table. It works with any table type supported by the pointblank library, including Pandas, Polars, and Ibis backend tables (e.g., DuckDB, MySQL, PostgreSQL, SQLite, and Parquet). It also accepts CSV file paths, Parquet file paths, and database connection strings directly.
data : FrameT | Any
The table for which to get the row count, which could be a DataFrame object, an Ibis table object, a CSV file path, a Parquet file path, or a database connection string. Read the Supported Input Table Types section for details on the supported table types.
int
The number of rows in the table.
Supported Input Table Types
The data= parameter can be given any of the following table types:
Polars DataFrame ("polars")
Pandas DataFrame ("pandas")
PySpark table ("pyspark")
DuckDB table ("duckdb")*
MySQL table ("mysql")*
PostgreSQL table ("postgresql")*
SQLite table ("sqlite")*
Microsoft SQL Server table ("mssql")*
Snowflake table ("snowflake")*
Databricks table ("databricks")*
BigQuery table ("bigquery")*
Parquet table ("parquet")*
CSV files (string path or pathlib.Path object with .csv extension)
Parquet files (string path, pathlib.Path object, glob pattern, directory with .parquet extension, or partitioned dataset)
The table types marked with an asterisk need to be prepared as Ibis tables (with type of ibis.expr.types.relations.Table). Furthermore, using get_row_count() with these types of tables requires the Ibis library (v9.5.0 or above) to be installed. If the input table is a Polars or Pandas DataFrame, Ibis is not required.
To use a CSV file, ensure that a string or pathlib.Path object with a .csv extension is provided. The file will be automatically detected and loaded using the best available DataFrame library. The loading preference is Polars first, then Pandas as a fallback.
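The loading preference described above (Polars first, then Pandas) can be illustrated with the standard library alone. This is only a sketch of the idea, not pointblank's actual implementation; the helper name pick_dataframe_library and its is_installed parameter are hypothetical.

```python
from importlib.util import find_spec


def pick_dataframe_library(preference=("polars", "pandas"), is_installed=None):
    """Return the first installed library in `preference`, or None.

    Hypothetical helper: it only mirrors the documented order
    (Polars first, then Pandas) and is not part of pointblank.
    """
    if is_installed is None:
        # Default check: a library counts as installed if it can be located.
        def is_installed(name):
            return find_spec(name) is not None

    for name in preference:
        if is_installed(name):
            return name
    return None


# Simulate an environment where only Pandas is available.
only_pandas = pick_dataframe_library(is_installed=lambda name: name == "pandas")
print(only_pandas)
```

Passing is_installed explicitly makes the fallback order easy to check without installing or uninstalling either library.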
GitHub URLs pointing to CSV or Parquet files are automatically detected and converted to raw content URLs for downloading. The URL format should be: https://github.com/user/repo/blob/branch/path/file.csv or https://github.com/user/repo/blob/branch/path/file.parquet
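As a rough illustration of what such a conversion involves, the sketch below rewrites a GitHub blob URL into its raw.githubusercontent.com equivalent. The helper to_raw_github_url is hypothetical and is not pointblank's internal code; it assumes the branch name contains no slashes.

```python
from urllib.parse import urlsplit


def to_raw_github_url(url):
    """Convert a GitHub 'blob' URL to a raw.githubusercontent.com URL.

    Hypothetical helper, assuming the branch name contains no slashes.
    URLs that do not match the expected shape are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: user/repo/blob/branch/path/to/file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    user, repo, _, branch, *file_path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *file_path])


raw_url = to_raw_github_url("https://github.com/user/repo/blob/branch/path/file.csv")
print(raw_url)
```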
Connection strings follow database URL formats and must also specify a table using the ::table_name suffix. Examples include:
"duckdb:///path/to/database.ddb::table_name"
"sqlite:///path/to/database.db::table_name"
"postgresql://user:password@localhost:5432/database::table_name"
"mysql://user:password@localhost:3306/database::table_name"
"bigquery://project/dataset::table_name"
"snowflake://user:password@account/database/schema::table_name"
When using connection strings, the Ibis library with the appropriate backend driver is required.
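To make the ::table_name convention concrete, here is a small sketch that separates the database URL from the table name. The helper split_connection_string is hypothetical and shown only for illustration; how pointblank parses these strings internally is not covered here.

```python
def split_connection_string(connection_string):
    """Split 'backend://...::table_name' into (database_url, table_name).

    Hypothetical helper for illustration; not part of pointblank.
    """
    # Split on the last '::' so the table name is always the final piece.
    database_url, separator, table_name = connection_string.rpartition("::")
    if not separator or not database_url or not table_name:
        raise ValueError("expected a '::table_name' suffix")
    return database_url, table_name


url, table = split_connection_string("duckdb:///path/to/database.ddb::table_name")
print(url, table)
```

A string without the suffix raises a ValueError in this sketch, which mirrors the requirement that a table must be specified.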
Getting the number of rows in a table is easily done by using the get_row_count() function. Here’s an example using the game_revenue dataset (itself loaded using the load_dataset() function):
2000
This table is a Polars DataFrame, but the get_row_count() function works with any table supported by pointblank, including Pandas DataFrames and Ibis backend tables. Here’s an example using a DuckDB table handled by Ibis:
2000
The get_row_count() function can directly accept CSV file paths:
The function supports various Parquet input formats:
336776
You can also use glob patterns and directories:
The function supports database connection strings for direct access to database tables:
2000
The function always returns the number of rows in the table as an integer value, which is 2000 for the game_revenue dataset.