Create a floating-point column specification for use in a schema.
float_field(
    min_val=None,
    max_val=None,
    allowed=None,
    nullable=False,
    null_probability=0.0,
    unique=False,
    generator=None,
    dtype="Float64",
)
The float_field() function defines the constraints and behavior for a floating-point column when generating synthetic data with generate_dataset(). You can control the range of values with min_val= and max_val=, restrict values to a specific set with allowed=, enforce uniqueness with unique=True, and introduce null values with nullable=True and null_probability=. The dtype= parameter lets you choose between "Float32" and "Float64" precision.
When both min_val= and max_val= are provided, values are drawn from a uniform distribution across that range. If neither is specified, values are drawn uniformly from a large default range. If allowed= is provided, values are sampled from that specific list.
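The sampling rules above can be sketched in plain Python. This is only an illustration of the documented behavior, not pointblank's implementation: `sample_float` and its `default_range` argument are hypothetical names, the actual default range is not stated here, and treating a single missing bound as falling back to that range is an assumption.

```python
import random


def sample_float(rng, min_val=None, max_val=None, allowed=None,
                 default_range=(-1e6, 1e6)):
    """Draw one value following the documented rules (illustrative only)."""
    if allowed is not None:
        # A discrete set was given: pick one of the listed values.
        return rng.choice(allowed)
    # Assumption: a missing bound falls back to a wide default range.
    low = default_range[0] if min_val is None else min_val
    high = default_range[1] if max_val is None else max_val
    # Uniform draw across the range.
    return rng.uniform(low, high)


rng = random.Random(23)
in_range = [sample_float(rng, min_val=0.0, max_val=1.0) for _ in range(5)]
from_set = [sample_float(rng, allowed=[0.05, 0.10, 0.15]) for _ in range(5)]
```

Every value in `in_range` falls between 0.0 and 1.0, and every value in `from_set` is one of the three listed options.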
Parameters
min_val: float | None = None
    Minimum value (inclusive). Default is None (no minimum).
max_val: float | None = None
    Maximum value (inclusive). Default is None (no maximum).
allowed: list[float] | None = None
    List of allowed values (categorical constraint). When provided, values are sampled from this list. Cannot be combined with min_val=/max_val=.
nullable: bool = False
    Whether the column can contain null values. Default is False.
null_probability: float = 0.0
    Probability of generating a null value for each row when nullable=True. Must be between 0.0 and 1.0. Default is 0.0.
unique: bool = False
    Whether all values must be unique. Default is False. When True, the generator will retry until it produces n distinct values.
generator: Callable[[], Any] | None = None
    Custom callable that generates values. When provided, this overrides all other constraints. The callable should take no arguments and return a single float value.
dtype: str = "Float64"
    Float dtype. Default is "Float64". Options: "Float32", "Float64".
Returns
FloatField
    A float field specification that can be passed to Schema().
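The retry behavior described for unique=True can be pictured with a small loop. This is a minimal sketch, not the library's code: `draw_unique` is a hypothetical helper, and the `max_attempts` cap is an assumption added so the loop cannot run forever when the constraints allow fewer than n distinct values.

```python
import random


def draw_unique(draw, n, max_attempts=10_000):
    """Call draw() repeatedly until n distinct values are collected."""
    seen = set()
    values = []
    attempts = 0
    while len(values) < n:
        if attempts >= max_attempts:
            # Assumed safeguard; the documented behavior only says "retry".
            raise ValueError("could not produce enough distinct values")
        attempts += 1
        value = draw()
        if value not in seen:
            seen.add(value)
            values.append(value)
    return values


rng = random.Random(0)
tiers = draw_unique(lambda: rng.choice([0.05, 0.10, 0.15, 0.20]), n=4)
```

Asking for more distinct values than the constraints permit (for example, n=5 from a four-item list) would exhaust the attempts in this sketch and raise an error.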
Raises
ValueError
    If min_val is greater than max_val, if allowed is an empty list, if null_probability is not between 0.0 and 1.0, or if dtype is not a valid float type.
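The conditions listed above amount to a handful of argument checks. The following sketch mirrors them in plain Python; `check_float_field_args` and `is_rejected` are hypothetical names, the error messages are invented, and treating the 0.0 and 1.0 endpoints of null_probability as valid is an assumption.

```python
def check_float_field_args(min_val=None, max_val=None, allowed=None,
                           null_probability=0.0, dtype="Float64"):
    """Raise ValueError for the documented invalid combinations (sketch)."""
    if min_val is not None and max_val is not None and min_val > max_val:
        raise ValueError("min_val must not be greater than max_val")
    if allowed is not None and len(allowed) == 0:
        raise ValueError("allowed must not be an empty list")
    # Assumption: the endpoints 0.0 and 1.0 are accepted.
    if not 0.0 <= null_probability <= 1.0:
        raise ValueError("null_probability must be between 0.0 and 1.0")
    if dtype not in ("Float32", "Float64"):
        raise ValueError("dtype must be 'Float32' or 'Float64'")


def is_rejected(**kwargs):
    """Return True when the arguments would raise ValueError."""
    try:
        check_float_field_args(**kwargs)
    except ValueError:
        return True
    return False
```

For instance, `is_rejected(min_val=2.0, max_val=1.0)` and `is_rejected(dtype="Float16")` are both true in this sketch, while `is_rejected(min_val=0.0, max_val=1.0)` is false.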
Examples
The min_val= and max_val= parameters define the generated value ranges:
import pointblank as pb

schema = pb.Schema(
    price=pb.float_field(min_val=0.01, max_val=9999.99),
    probability=pb.float_field(min_val=0.0, max_val=1.0),
    temperature=pb.float_field(min_val=-40.0, max_val=50.0),
)

pb.preview(pb.generate_dataset(schema, n=100, seed=23))
|     | price | probability | temperature |
|-----|-------|-------------|-------------|
| 1 | 9248.64401895442 | 0.9248652516259452 | 43.23787264633508 |
| 2 | 9486.04880781621 | 0.9486057779931771 | 45.37452001938594 |
| 3 | 8924.325591818912 | 0.8924333440485793 | 40.31900096437214 |
| 4 | 835.5150972932996 | 0.08355067683068362 | -32.48043908523847 |
| 5 | 5920.270428312815 | 0.5920272268857353 | 13.282450419716177 |
| 96 | 4446.926385790886 | 0.4446925279641446 | 0.022327516773010814 |
| 97 | 3427.7653590611476 | 0.3427762214585577 | -9.150140068729808 |
| 98 | 8923.280842563525 | 0.8923288689140904 | 40.309598202268134 |
| 99 | 8137.5531808932155 | 0.8137559456012128 | 33.238035104109144 |
| 100 | 8951.80870117522 | 0.8951816604808429 | 40.56634944327587 |
It’s also possible to restrict values to a discrete set with allowed=, which is useful for fixed pricing tiers or measurement levels:
schema = pb.Schema(
    discount=pb.float_field(allowed=[0.05, 0.10, 0.15, 0.20, 0.25]),
    weight_kg=pb.float_field(min_val=0.5, max_val=100.0),
)

pb.preview(pb.generate_dataset(schema, n=50, seed=23))
|     | discount | weight_kg |
|-----|----------|-----------|
| 1 | 0.15 | 92.52409253678155 |
| 2 | 0.05 | 94.88627491032112 |
| 3 | 0.05 | 89.29711773283364 |
| 4 | 0.25 | 8.813292344653021 |
| 5 | 0.15 | 59.406709075130664 |
| 46 | 0.25 | 27.918663919265157 |
| 47 | 0.2 | 57.49577854139957 |
| 48 | 0.1 | 82.15598649681618 |
| 49 | 0.1 | 33.41508237533323 |
| 50 | 0.1 | 37.28056623460687 |
We can simulate missing measurements by introducing null values:
schema = pb.Schema(
    reading=pb.float_field(
        min_val=0.0, max_val=500.0,
        nullable=True, null_probability=0.2,
    ),
    calibration=pb.float_field(min_val=0.9, max_val=1.1),
)

pb.preview(pb.generate_dataset(schema, n=30, seed=7))
|     | reading | calibration |
|-----|---------|-------------|
| 1 | 161.91638241658117 | 0.9647665529666325 |
| 2 | 75.42458696225096 | 0.9301698347849005 |
| 3 | None | 1.0301868946079709 |
| 4 | 36.21814333377138 | 0.9144872573335086 |
| 5 | None | 1.007176400861338 |
| 26 | 58.89611903918418 | 0.9235584476156737 |
| 27 | 154.24091205096718 | 0.9616963648203869 |
| 28 | 408.0631795600157 | 1.0632252718240063 |
| 29 | 90.36318996196874 | 0.9361452759847875 |
| 30 | 290.8000818312331 | 1.0163200327324933 |
Setting dtype="Float32" gives reduced precision, and a custom generator= provides full control over value generation:
import math
import random

# Seeded RNG used only by the custom generator below
rng = random.Random(0)

schema = pb.Schema(
    sensor_value=pb.float_field(min_val=-10.0, max_val=10.0, dtype="Float32"),
    # Log of a uniform draw from [1, 1000]
    log_value=pb.float_field(generator=lambda: math.log(rng.uniform(1, 1000))),
)

pb.preview(pb.generate_dataset(schema, n=20, seed=99))
|     | sensor_value | log_value |
|-----|--------------|-----------|
| 1 | -1.9204385011266734 | 6.738836419047254 |
| 2 | -5.998491108501092 | 6.630942519000257 |
| 3 | -6.4239535882677545 | 6.042991461173114 |
| 4 | -5.031373629980624 | 5.559364739458459 |
| 5 | 5.197548730161559 | 6.237862500009073 |
| 16 | -2.520404335509925 | 5.526471683068103 |
| 17 | -2.237762978450628 | 6.813264923713322 |
| 18 | 3.647376330582926 | 6.890408378292458 |
| 19 | -6.95446931399654 | 6.697536613756536 |
| 20 | 3.2113579182328227 | 6.804906921310479 |