float_field() function

Create a floating-point column specification for use in a schema.

Usage

float_field(
    min_val=None,
    max_val=None,
    allowed=None,
    nullable=False,
    null_probability=0.0,
    unique=False,
    generator=None,
    dtype='Float64',
)

The float_field() function defines the constraints and behavior for a floating-point column when generating synthetic data with generate_dataset(). You can control the range of values with min_val= and max_val=, restrict values to a specific set with allowed=, enforce uniqueness with unique=True, and introduce null values with nullable=True and null_probability=. The dtype= parameter lets you choose between "Float32" and "Float64" precision.

When both min_val= and max_val= are provided, values are drawn from a uniform distribution across that range. If neither is specified, values are drawn uniformly from a large default range. If allowed= is provided, values are sampled from that specific list.
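The selection rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not pointblank's implementation: the helper `sample_float()` and the `DEFAULT_RANGE` bounds are hypothetical, and the handling of a single missing bound is an assumption.

```python
import random

# Hypothetical stand-in for the "large default range"; the real bounds
# are not stated in this reference.
DEFAULT_RANGE = (-1e6, 1e6)


def sample_float(rng, min_val=None, max_val=None, allowed=None):
    """Draw one value following the documented rules (illustrative only)."""
    if allowed is not None:
        # Categorical case: pick one of the listed values.
        return rng.choice(allowed)
    # Assumption: a missing bound falls back to the default range's bound.
    lo = DEFAULT_RANGE[0] if min_val is None else min_val
    hi = DEFAULT_RANGE[1] if max_val is None else max_val
    # Uniform draw across the (inclusive) range.
    return rng.uniform(lo, hi)


rng = random.Random(23)
in_range = [sample_float(rng, min_val=0.0, max_val=1.0) for _ in range(5)]
from_list = [sample_float(rng, allowed=[0.05, 0.10, 0.15]) for _ in range(5)]
unbounded = sample_float(rng)
```

Here `in_range` holds values between 0.0 and 1.0, `from_list` holds only values taken from the given list, and `unbounded` falls somewhere in the assumed default range.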

Parameters

min_val : float | None = None: Minimum value (inclusive). Default is None (no minimum).
max_val : float | None = None: Maximum value (inclusive). Default is None (no maximum).
allowed : list[float] | None = None: List of allowed values (categorical constraint). When provided, values are sampled from this list. Cannot be combined with min_val=/max_val=.
nullable : bool = False: Whether the column can contain null values. Default is False.
null_probability : float = 0.0: Probability of generating a null value for each row when nullable=True. Must be between 0.0 and 1.0. Default is 0.0.
unique : bool = False: Whether all values must be unique. Default is False. When True, the generator retries until it has produced n distinct values, one for each requested row.
generator : Callable[[], Any] | None = None: Custom callable that generates values. When provided, this overrides all other constraints. The callable should take no arguments and return a single float value.
dtype : str = 'Float64': Float dtype. Default is "Float64". Options: "Float32", "Float64".
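How unique=True and null_probability= might interact with value generation can be sketched as follows. This is a rough model under stated assumptions, not the library's code: `generate_column()` and its `max_attempts` guard are hypothetical, and applying nulls after the uniqueness pass is an assumption.

```python
import random


def generate_column(rng, n, draw, null_probability=0.0, unique=False,
                    max_attempts=10_000):
    """Hypothetical sketch: build n values, optionally distinct, then add nulls."""
    values = []
    seen = set()
    attempts = 0
    while len(values) < n:
        attempts += 1
        if attempts > max_attempts:
            # Guard against constraints that cannot yield n distinct values.
            raise RuntimeError("could not produce enough distinct values")
        value = draw()
        if unique:
            if value in seen:
                continue  # duplicate: retry with a fresh draw
            seen.add(value)
        values.append(value)
    # Each row independently becomes null with probability null_probability.
    return [None if rng.random() < null_probability else v for v in values]


rng = random.Random(7)
column = generate_column(
    rng, 30, lambda: rng.uniform(0.0, 500.0),
    null_probability=0.2, unique=True,
)
non_null = [v for v in column if v is not None]
```

With null_probability=0.0 no row is ever nulled, and with 1.0 every row is, which matches the documented bounds for that parameter.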

Returns

FloatField: A float field specification that can be passed to Schema().

Raises

ValueError: If min_val is greater than max_val, if allowed is an empty list, if null_probability is not between 0.0 and 1.0, or if dtype is not a valid float type.
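The four error conditions listed above can be expressed as a small standalone validator. This is a sketch for illustration: `check_float_field_args()` and `is_rejected()` are hypothetical helpers, and the error messages are invented rather than taken from pointblank.

```python
VALID_DTYPES = {"Float32", "Float64"}


def check_float_field_args(min_val=None, max_val=None, allowed=None,
                           null_probability=0.0, dtype="Float64"):
    """Hypothetical validator mirroring the documented ValueError conditions."""
    if min_val is not None and max_val is not None and min_val > max_val:
        raise ValueError("min_val must not be greater than max_val")
    if allowed is not None and len(allowed) == 0:
        raise ValueError("allowed must not be an empty list")
    if not 0.0 <= null_probability <= 1.0:
        raise ValueError("null_probability must be between 0.0 and 1.0")
    if dtype not in VALID_DTYPES:
        raise ValueError("dtype must be 'Float32' or 'Float64'")


def is_rejected(**kwargs):
    """Return True if the arguments trigger a ValueError."""
    try:
        check_float_field_args(**kwargs)
    except ValueError:
        return True
    return False


bad_range = is_rejected(min_val=2.0, max_val=1.0)
empty_allowed = is_rejected(allowed=[])
bad_probability = is_rejected(null_probability=1.5)
bad_dtype = is_rejected(dtype="Int64")
valid = is_rejected(min_val=0.0, max_val=1.0)
```

Each of the first four calls hits one documented condition, while the last call passes validation.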

Examples

The min_val= and max_val= parameters set the range of generated values:

import pointblank as pb

schema = pb.Schema(
    price=pb.float_field(min_val=0.01, max_val=9999.99),
    probability=pb.float_field(min_val=0.0, max_val=1.0),
    temperature=pb.float_field(min_val=-40.0, max_val=50.0),
)

pb.preview(pb.generate_dataset(schema, n=100, seed=23))

Polars | Rows: 100 | Columns: 3
	price Float64	probability Float64	temperature Float64
1	9248.64401895442	0.9248652516259452	43.23787264633508
2	9486.04880781621	0.9486057779931771	45.37452001938594
3	8924.325591818912	0.8924333440485793	40.31900096437214
4	835.5150972932996	0.08355067683068362	-32.48043908523847
5	5920.270428312815	0.5920272268857353	13.282450419716177
96	4446.926385790886	0.4446925279641446	0.022327516773010814
97	3427.7653590611476	0.3427762214585577	-9.150140068729808
98	8923.280842563525	0.8923288689140904	40.309598202268134
99	8137.5531808932155	0.8137559456012128	33.238035104109144
100	8951.80870117522	0.8951816604808429	40.56634944327587

It’s also possible to restrict values to a discrete set with allowed=, which is useful for fixed pricing tiers or measurement levels:

schema = pb.Schema(
    discount=pb.float_field(allowed=[0.05, 0.10, 0.15, 0.20, 0.25]),
    weight_kg=pb.float_field(min_val=0.5, max_val=100.0),
)

pb.preview(pb.generate_dataset(schema, n=50, seed=23))

Polars | Rows: 50 | Columns: 2
	discount Float64	weight_kg Float64
1	0.15	92.52409253678155
2	0.05	94.88627491032112
3	0.05	89.29711773283364
4	0.25	8.813292344653021
5	0.15	59.406709075130664
46	0.25	27.918663919265157
47	0.2	57.49577854139957
48	0.1	82.15598649681618
49	0.1	33.41508237533323
50	0.1	37.28056623460687

We can simulate missing measurements by introducing null values:

schema = pb.Schema(
    reading=pb.float_field(
        min_val=0.0, max_val=500.0,
        nullable=True, null_probability=0.2,
    ),
    calibration=pb.float_field(min_val=0.9, max_val=1.1),
)

pb.preview(pb.generate_dataset(schema, n=30, seed=7))

Polars | Rows: 30 | Columns: 2
	reading Float64	calibration Float64
1	161.91638241658117	0.9647665529666325
2	75.42458696225096	0.9301698347849005
3	None	1.0301868946079709
4	36.21814333377138	0.9144872573335086
5	None	1.007176400861338
26	58.89611903918418	0.9235584476156737
27	154.24091205096718	0.9616963648203869
28	408.0631795600157	1.0632252718240063
29	90.36318996196874	0.9361452759847875
30	290.8000818312331	1.0163200327324933

Setting dtype="Float32" gives reduced precision, and a custom generator= provides full control over value generation:

import math
import random

# Dedicated random number generator used by the custom generator below
rng = random.Random(0)

schema = pb.Schema(
    sensor_value=pb.float_field(min_val=-10.0, max_val=10.0, dtype="Float32"),
    log_value=pb.float_field(generator=lambda: math.log(rng.uniform(1, 1000))),
)

pb.preview(pb.generate_dataset(schema, n=20, seed=99))

Polars | Rows: 20 | Columns: 2
	sensor_value Float64	log_value Float64
1	-1.9204385011266734	6.738836419047254
2	-5.998491108501092	6.630942519000257
3	-6.4239535882677545	6.042991461173114
4	-5.031373629980624	5.559364739458459
5	5.197548730161559	6.237862500009073
16	-2.520404335509925	5.526471683068103
17	-2.237762978450628	6.813264923713322
18	3.647376330582926	6.890408378292458
19	-6.95446931399654	6.697536613756536
20	3.2113579182328227	6.804906921310479
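To make "reduced precision" concrete, the following standalone sketch rounds a 64-bit Python float to 32 bits using only the standard library. It is independent of pointblank, and the helper `to_float32()` is a hypothetical name introduced for illustration.

```python
import struct


def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


value = 0.1
as_f32 = to_float32(value)       # nearest 32-bit representation of 0.1
error = abs(as_f32 - value)      # rounding error introduced by the narrower type
exact = to_float32(0.5)          # 0.5 is exactly representable in both widths
```

Values such as 0.1 pick up a small rounding error when stored as 32-bit floats, while values that are exact binary fractions, such as 0.5, are unchanged.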