This is the third of four short posts about matten. The previous post covered the numeric core. This one covers a separate feature: ingesting messy, real-world data.
The problem
The numeric Tensor in the previous post is clean by construction: every cell is an f64. That is fine when your data is already clean. It is less fine when it arrives from a JSON API or a CSV file that has missing cells, integer values alongside floats, or the occasional boolean flag.
The dynamic feature adds an ingestion-and-cleanup layer for that case. It is not a second compute engine — you cannot do arithmetic on a dynamic tensor directly. The idea is simpler: ingest heterogeneous data, inspect and clean it, then convert explicitly to a numeric tensor when you are confident the data is ready.
Enable it:
matten = { version = "0.28", features = ["dynamic"] }
Ingesting mixed data
from_json_dynamic and from_csv_dynamic accept data with mixed types. Each cell lands in an Element variant: Float, Int, Bool, Text, or None (for JSON null or an empty CSV field).
use matten::Tensor;
// A JSON table with mixed numeric kinds and a missing cell
let json = "[[1, 2.5, null], [4.0, 5, 6]]";
let t = Tensor::from_json_dynamic(json)?;
assert!(t.is_dynamic());
assert_eq!(t.shape(), &[2, 3]);
assert_eq!(t.count_none(), 1);
The same on-ramp works for CSV:
use matten::Tensor;
let csv = "10.0,20.0,30.0\n40.0,,60.0\n70.0,80.0,\n"; // two empty cells
let t = Tensor::from_csv_dynamic(csv)?;
assert_eq!(t.count_none(), 2);
The format differs; the workflow does not.
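To make the idea of "each cell lands in a variant" concrete, here is a minimal standalone sketch of how one CSV field might be classified. It does not use matten: the `Cell` enum, `classify_field`, and `classify_row` are hypothetical names, and the precedence order (empty, bool, int, float, text) is an assumption of this sketch rather than documented matten behaviour.

```rust
// Hypothetical stand-in for a dynamic cell; not matten's actual `Element` type.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Float(f64),
    Int(i64),
    Bool(bool),
    Text(String),
    None,
}

// Classify one raw CSV field. The precedence used here
// (empty, bool, int, float, text) is an assumption made for this sketch.
fn classify_field(raw: &str) -> Cell {
    let s = raw.trim();
    if s.is_empty() {
        return Cell::None;
    }
    if let Ok(b) = s.parse::<bool>() {
        return Cell::Bool(b);
    }
    if let Ok(i) = s.parse::<i64>() {
        return Cell::Int(i);
    }
    if let Ok(f) = s.parse::<f64>() {
        return Cell::Float(f);
    }
    Cell::Text(s.to_string())
}

// Classify every field of one CSV row (quoting and escaping are not handled).
fn classify_row(line: &str) -> Vec<Cell> {
    line.split(',').map(classify_field).collect()
}
```

In this sketch an empty field becomes `Cell::None`, so a row such as `40.0,,60.0` yields one missing cell between two floats.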
Inspecting missing values
Before cleaning, you can see where the gaps are. Using the JSON tensor from above:
// none_mask: a numeric tensor of 0.0 / 1.0, one per cell
let mask = t.none_mask();
assert_eq!(mask.get(&[0, 2]), Some(1.0)); // null at [0,2]
assert_eq!(mask.get(&[0, 0]), Some(0.0)); // present
// schema_summary gives a readable type breakdown
println!("{}", t.schema_summary());
// e.g. "Float: 2, Int: 3, None: 1"
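The mask idea is easy to picture on a flat row. The following is a standalone sketch, not matten's implementation: `mask_missing` and `count_missing` are hypothetical helper names, and missing cells are modelled with plain `Option<f64>`.

```rust
// Build a 0.0 / 1.0 mask: 1.0 where the cell is missing, 0.0 where it is present.
fn mask_missing(cells: &[Option<f64>]) -> Vec<f64> {
    cells
        .iter()
        .map(|c| if c.is_none() { 1.0 } else { 0.0 })
        .collect()
}

// Count missing cells as the number of 1.0 entries in the mask.
fn count_missing(cells: &[Option<f64>]) -> usize {
    mask_missing(cells).iter().filter(|&&m| m == 1.0).count()
}
```

Because the mask is itself numeric, it can be summed, multiplied, or compared like any other numeric data.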
Converting to a numeric tensor
The conversion step is explicit by design. try_numeric() is strict and refuses if any None, Bool, or Text values are present:
// This fails — there is a null in the data
assert!(t.try_numeric().is_err());
try_numeric_with(policy) lets you state exactly what to do with each variant:
use matten::NumericPolicy;
// Treat None as 0.0; Int and Float both become f64
let clean = t.try_numeric_with(NumericPolicy::default().none_as(0.0))?;
assert!(!clean.is_dynamic());
assert_eq!(clean.as_slice(), &[1.0, 2.5, 0.0, 4.0, 5.0, 6.0]);
Other policy options:
// none_as_nan: missing → f64::NAN instead of a chosen sentinel
let p = NumericPolicy::default().none_as_nan();
// allow_bool: true → 1.0, false → 0.0
let p = NumericPolicy::default().allow_bool();
// allow_text_parse: try to parse text cells as f64
let p = NumericPolicy::default().allow_text_parse();
// Chain options together
let p = NumericPolicy::default().none_as(0.0).allow_bool();
// Accept all variants permissively
let p = NumericPolicy::permissive();
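For intuition about what a policy-driven conversion does, here is a small standalone model. It is a sketch under stated assumptions, not matten's code: `Cell`, `Policy`, `to_f64`, and `convert` are hypothetical names, and the real `NumericPolicy` may differ in fields and error types.

```rust
// Hypothetical stand-ins; these are not matten's `Element` or `NumericPolicy`.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Float(f64),
    Int(i64),
    Bool(bool),
    Text(String),
    None,
}

#[derive(Debug, Clone, Copy, Default)]
struct Policy {
    none_as: Option<f64>,   // replacement for missing cells; None means "reject"
    allow_bool: bool,       // true -> 1.0, false -> 0.0
    allow_text_parse: bool, // try to parse text cells as f64
}

// Convert one cell, or explain why the policy rejects it.
fn to_f64(cell: &Cell, policy: &Policy) -> Result<f64, String> {
    match cell {
        Cell::Float(f) => Ok(*f),
        Cell::Int(i) => Ok(*i as f64),
        Cell::None => policy.none_as.ok_or_else(|| "missing value".to_string()),
        Cell::Bool(b) if policy.allow_bool => Ok(if *b { 1.0 } else { 0.0 }),
        Cell::Text(s) if policy.allow_text_parse => s
            .trim()
            .parse::<f64>()
            .map_err(|_| format!("cannot parse {:?} as f64", s)),
        other => Err(format!("variant not allowed by policy: {:?}", other)),
    }
}

// Convert every cell; the first rejected cell aborts the whole conversion.
fn convert(cells: &[Cell], policy: &Policy) -> Result<Vec<f64>, String> {
    cells.iter().map(|c| to_f64(c, policy)).collect()
}
```

The default policy in this model is strict: `Policy::default()` rejects `None`, `Bool`, and `Text`, and each option opts in to exactly one extra variant. All-or-nothing conversion keeps the resulting numeric data free of silently skipped cells.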
Cleaning before converting
You can also fill missing values before the conversion step:
// Replace every None with 0.0; the result is bound to a new variable
let filled = t.fill_none(0.0);
assert!(filled.is_numeric_convertible());
// Then convert strictly
let numeric = filled.try_numeric()?;
forward_fill_none is also available for time-series-style forward propagation.
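Forward propagation can be sketched on a one-dimensional series as follows. This is an illustration of the general technique, not matten's `forward_fill_none`: the `forward_fill` helper is hypothetical, and leaving leading gaps unfilled is an assumption of this sketch (the library's axis handling and edge behaviour are not described here).

```rust
// Forward-fill a 1-D series: each missing value takes the most recent
// present value. Leading gaps have nothing to copy from and stay `None`
// (an assumption of this sketch).
fn forward_fill(series: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut last = None;
    series
        .iter()
        .map(|&v| {
            if v.is_some() {
                last = v; // remember the latest present value
            }
            last
        })
        .collect()
}
```

A series that starts with a gap still contains a missing value after this step, so a strict conversion would still need a fallback for it.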
What dynamic is not
A few things that are intentionally absent from the dynamic feature:
- No arithmetic on dynamic tensors. Call try_numeric() first.
- No dynamic reshape, slice, or reduction.
- No serde for dynamic tensors.
The point is a clean handoff: messy input → inspect and clean → numeric Tensor → ordinary numeric work. That boundary is deliberate.
Links: crates.io · docs.rs · mdBook · repository