API Reference
orbitalml
OrbitalML, translate scikit-learn pipelines into SQL queries
OrbitalML is a library for translating scikit-learn pipelines into SQL queries and Ibis expressions.
It lets you run machine learning models directly on a database, without the need for a Python runtime environment.
orbitalml.ResultsProjection
Projection of the results of the pipeline.
This class selects the columns returned by the pipeline, so that only specific columns are included in the final result set.
Use the omit method to skip the projection step entirely.
Source code in orbitalml/translate.py
__init__
omit
classmethod
omit() -> ResultsProjection
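As a minimal sketch of skipping the projection step, the helper below passes ResultsProjection.omit() to export_sql through its projection parameter. The helper name export_without_projection is hypothetical, and parsed is assumed to be a ParsedPipeline obtained from parse_pipeline. The import sits inside the function so the helper can be defined even where orbitalml is not installed.

```python
from typing import Any


def export_without_projection(parsed: Any, table_name: str) -> str:
    """Export SQL for a parsed pipeline while skipping the projection step."""
    # Imported lazily so this helper can be defined without orbitalml installed.
    import orbitalml

    # ResultsProjection.omit() is the documented way to skip the projection.
    projection = orbitalml.ResultsProjection.omit()
    return orbitalml.export_sql(table_name, parsed, projection=projection)
```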
orbitalml.parse_pipeline
parse_pipeline(
pipeline: Pipeline, features: FeaturesTypes
) -> ParsedPipeline
Parse a scikit-learn pipeline into an intermediate representation.
features should be a mapping from the names of the pipeline's input columns to their types from the orbitalml.types module:
{
"column_name": types.DoubleColumnType(),
"another_column": types.Int64ColumnType()
}
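A mapping like the one above might be passed to parse_pipeline as in the following sketch. The helper name parse_two_column_pipeline and the column names are placeholders, and pipeline is assumed to be an already fitted scikit-learn Pipeline whose inputs match those columns. The imports sit inside the function so the helper can be defined even where orbitalml is not installed.

```python
from typing import Any


def parse_two_column_pipeline(pipeline: Any) -> Any:
    """Parse a fitted scikit-learn pipeline that takes two input columns."""
    # Imported lazily so this helper can be defined without orbitalml installed.
    import orbitalml
    from orbitalml import types

    # Placeholder column names; they must match the pipeline's real inputs.
    features = {
        "column_name": types.DoubleColumnType(),
        "another_column": types.Int64ColumnType(),
    }
    return orbitalml.parse_pipeline(pipeline, features=features)
```

The returned ParsedPipeline can then be handed to export_sql.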
Source code in orbitalml/ast.py
orbitalml.export_sql
export_sql(
table_name: str,
pipeline: ParsedPipeline,
dialect: str = "duckdb",
projection: ResultsProjection = ResultsProjection(),
optimize: bool = True,
) -> str
Export SQL for a given pipeline.
Given an orbitalml pipeline, this function generates a SQL query that can be used to execute the pipeline on a database. The generated SQL is compatible with the specified SQL dialect.
dialect can be any of the SQL dialects supported by sqlglot; see sqlglot.dialects.DIALECTS for a complete list of supported dialects.
If optimize is set to True, the SQL query will be optimized using sqlglot's optimizer. This can improve performance, but may fail if the query is complex.
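Because optimization may fail on complex queries, one possible pattern is to try the optimized export first and fall back to optimize=False. The sketch below is an assumption-laden illustration: the helper name export_with_fallback is hypothetical, and since this reference does not say which exception the optimizer raises, the sketch catches Exception broadly.

```python
from typing import Any


def export_with_fallback(parsed: Any, table_name: str, dialect: str = "duckdb") -> str:
    """Try the optimized export first, then fall back to the unoptimized query."""
    # Imported lazily so this helper can be defined without orbitalml installed.
    import orbitalml

    try:
        return orbitalml.export_sql(
            table_name, parsed, dialect=dialect, optimize=True
        )
    except Exception:
        # Assumption: the failure type is undocumented here, so catch broadly.
        # Narrow this once the actual exception type is known.
        return orbitalml.export_sql(
            table_name, parsed, dialect=dialect, optimize=False
        )
```

The unoptimized query is expected to produce the same results and may only differ in performance.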
Source code in orbitalml/sql.py
orbitalml.ast
Translate scikit-learn models to an intermediate representation.
The IR is what will be processed to generate the SQL queries.
orbitalml.ast.ParsedPipeline
An intermediate representation of a scikit-learn pipeline.
This object can be converted to a SQL query and run on a database. It can also be saved and loaded back in binary format for the sake of model distribution, even though distributing the SQL query is usually more convenient.
Source code in orbitalml/ast.py
__init__
ParsedPipeline objects can only be created by the parse_pipeline function.
dump
dump(filename: str) -> None
Dump the parsed pipeline to a file.
Source code in orbitalml/ast.py
load
classmethod
load(filename: str) -> ParsedPipeline
Load a parsed pipeline from a file.
Source code in orbitalml/ast.py
orbitalml.ast.UnsupportedFormatVersion
Bases: Exception
Format of loaded pipeline is not supported.
This usually happens when trying to load a newer format version with an older version of the framework.
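A caller that loads pipelines written by other versions of the library might guard against this exception, as in the following sketch. The helper name load_pipeline_or_none and the choice to return None on failure are illustrative assumptions, not part of the library.

```python
from typing import Any, Optional


def load_pipeline_or_none(filename: str) -> Optional[Any]:
    """Load a dumped pipeline, returning None if its format is unsupported."""
    # Imported lazily so this helper can be defined without orbitalml installed.
    from orbitalml.ast import ParsedPipeline, UnsupportedFormatVersion

    try:
        return ParsedPipeline.load(filename)
    except UnsupportedFormatVersion:
        # Most likely the file was written by a newer version of the library.
        return None
```

The file passed here would be one written earlier with the dump method of a ParsedPipeline.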
orbitalml.types
Data types of the features processed by models.
orbitalml.types.ColumnType
Bases: ABC
A base class representing the type of a column of data.
Source code in orbitalml/types.py
orbitalml.types.FloatColumnType
orbitalml.types.Float16ColumnType
orbitalml.types.DoubleColumnType
orbitalml.types.StringColumnType
orbitalml.types.Int64ColumnType
Bases: ColumnType
Mark a column as containing signed 64-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.UInt64ColumnType
Bases: ColumnType
Mark a column as containing unsigned 64-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.Int32ColumnType
Bases: ColumnType
Mark a column as containing signed 32-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.UInt32ColumnType
Bases: ColumnType
Mark a column as containing unsigned 32-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.Int16ColumnType
Bases: ColumnType
Mark a column as containing signed 16-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.UInt16ColumnType
Bases: ColumnType
Mark a column as containing unsigned 16-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.Int8ColumnType
Bases: ColumnType
Mark a column as containing signed 8-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.UInt8ColumnType
Bases: ColumnType
Mark a column as containing unsigned 8-bit integer values.
Source code in orbitalml/types.py
orbitalml.types.BooleanColumnType
orbitalml.types.guess_datatypes
guess_datatypes(dataframe: Any) -> FeaturesTypes
Given a DataFrame, try to guess the types of each feature in it.
This produces a FeaturesTypes dictionary that can be used by parse_pipeline to generate the SQL queries from the sklearn pipeline.
In most cases this shouldn't be necessary, as the user should know what data the pipeline was trained on, but it can be convenient when experimenting or writing tests.
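For experiments or tests, the guessed types might be fed straight into parse_pipeline, as in this sketch. The helper name parse_with_guessed_types is hypothetical, and dataframe is assumed to be the DataFrame the pipeline was trained on.

```python
from typing import Any


def parse_with_guessed_types(pipeline: Any, dataframe: Any) -> Any:
    """Parse a fitted pipeline using feature types guessed from a DataFrame."""
    # Imported lazily so this helper can be defined without orbitalml installed.
    import orbitalml
    from orbitalml import types

    # Guess a FeaturesTypes mapping rather than writing it out by hand.
    features = types.guess_datatypes(dataframe)
    return orbitalml.parse_pipeline(pipeline, features=features)
```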