Add native type extraction support to BaseDataLoader and ARFF #655
Merged
Introduces SUPPORTS_NATIVE_TYPES, NATIVE_TYPE_MAPPING, and extract_native_types() on the dataloader base so self-describing formats can expose column types directly instead of going through statistical inference. Default returns None, preserving existing behavior for all current loaders. get_metadata() now surfaces the capability flag for the frontend.
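A minimal sketch of the base-class contract described above. The class below is a stand-in, not the real DashAI BaseDataLoader: the file_path parameter, the return annotation, and the CSVLikeLoader subclass are assumptions made for illustration, and the real get_metadata dict carries other keys besides the new flag.

```python
from typing import Any, Dict, Optional


class BaseDataLoader:
    """Stand-in for the dataloader base class (not the real DashAI class)."""

    # Capability flag: loaders for self-describing formats set this to True.
    SUPPORTS_NATIVE_TYPES: bool = False
    # Maps format-specific type names to column type names (empty by default).
    NATIVE_TYPE_MAPPING: Dict[str, str] = {}

    def extract_native_types(self, file_path: str) -> Optional[Dict[str, Any]]:
        """Return the column types declared by the file, or None if unsupported."""
        # Default no-op: callers fall back to statistical inference.
        return None

    @classmethod
    def get_metadata(cls) -> Dict[str, Any]:
        # Only the new field is shown here (assumed key name from the PR text).
        return {"supports_native_types": cls.SUPPORTS_NATIVE_TYPES}


class CSVLikeLoader(BaseDataLoader):
    """Hypothetical loader without an override: keeps the default behavior."""


print(CSVLikeLoader().extract_native_types("example.csv"))
print(CSVLikeLoader.get_metadata())
```

Keeping the default as None rather than an empty dict lets callers distinguish "this loader has no native types" from "the file declares no columns".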
ARFF files declare each attribute as NUMERIC, INTEGER, REAL, NOMINAL, STRING, or DATE in the header. extract_native_types now reads the scipy meta object (previously discarded) and builds the DashAI column type dict directly: numeric kinds map to Float/Integer, nominal maps to Categorical with categories taken verbatim from the header. Refactors _read_arff_file to share the raw scipy call.
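To illustrate the header-to-schema mapping, here is a rough sketch that parses @attribute lines with the standard library only. This is a swapped-in technique: the PR reads the scipy meta object instead. The mapping values and the {"type": ..., "categories": ...} dict shape are assumptions, not the actual DashAI column type format, and the sketch ignores quoted attribute names and skips STRING and DATE because the text above does not say how they are mapped.

```python
import re
from typing import Any, Dict, List

# Hypothetical mapping from ARFF type keywords to column type names.
NATIVE_TYPE_MAPPING = {
    "numeric": "Float",
    "real": "Float",
    "integer": "Integer",
}

_ATTRIBUTE_RE = re.compile(r"^@attribute\s+(\S+)\s+(.+)$", re.IGNORECASE)


def extract_native_types(header_lines: List[str]) -> Dict[str, Dict[str, Any]]:
    """Build a column -> type dict from ARFF @attribute declarations."""
    types: Dict[str, Dict[str, Any]] = {}
    for line in header_lines:
        line = line.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section starts
        match = _ATTRIBUTE_RE.match(line)
        if match is None:
            continue  # comments, @relation, and blank lines
        name, declared = match.group(1), match.group(2).strip()
        if declared.startswith("{") and declared.endswith("}"):
            # Nominal attribute: categories are copied verbatim from the header.
            categories = [c.strip() for c in declared[1:-1].split(",")]
            types[name] = {"type": "Categorical", "categories": categories}
        else:
            mapped = NATIVE_TYPE_MAPPING.get(declared.lower())
            if mapped is not None:  # unmapped kinds (STRING, DATE) are skipped
                types[name] = {"type": mapped}
    return types


header = [
    "@relation demo",
    "@attribute age INTEGER",
    "@attribute height REAL",
    "@attribute color {red, green, blue}",
    "@data",
    "31,1.75,red",
]
schema = extract_native_types(header)
```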
When the request sets use_native_types and the chosen dataloader declares SUPPORTS_NATIVE_TYPES, call extract_native_types on the prepared file and short-circuit the DashAIPtype/Dummy inference loop. The returned dict reuses the inferred_types response field, so existing frontend and dataset_job paths consume it unchanged.
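The short-circuit can be sketched roughly as follows. The function name, its parameters, and the fallback to inference when extract_native_types returns None are assumptions for illustration; only the use_native_types flag, SUPPORTS_NATIVE_TYPES, and the inferred_types response field come from the description above.

```python
from typing import Any, Callable, Dict


def build_preview_types(
    loader: Any,
    file_path: str,
    use_native_types: bool,
    infer_types: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the types portion of a preview response (hypothetical helper)."""
    if use_native_types and getattr(loader, "SUPPORTS_NATIVE_TYPES", False):
        native = loader.extract_native_types(file_path)
        if native is not None:
            # Short-circuit: skip statistical inference entirely and reuse
            # the same response field that inference would have filled.
            return {"inferred_types": native}
    # Existing path: statistical inference over the preview rows.
    return {"inferred_types": infer_types(file_path)}
```

Because both branches fill the same field, consumers of the response do not need to know which path produced the types.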
DataloaderConfigBar fetches the selected dataloader's metadata and, when supports_native_types is true, renders a Switch above the Inference Rows input. The toggle defaults to on so the first preview already runs in native mode, propagates use_native_types in the params payload, and swaps the row-count input label to Preview Rows when active. New i18n keys land in en, es, and pt locales.
Adds tests for SUPPORTS_NATIVE_TYPES metadata exposure, the full schema returned by extract_native_types over a mixed-attribute ARFF (NUMERIC/REAL/INTEGER/NOMINAL with categories), shape parity with DashAIPtype.infer_types, and the negative path where loaders without an override return None.
BaseDataLoader.get_metadata now exposes the supports_native_types flag for the frontend toggle. Update test_components_api fixtures to match the new metadata dict shape (defaults to False for stock test loaders).
cristian-tamblay
approved these changes
May 28, 2026
Summary
This pull request adds support for native column types in self-describing dataset formats across the DashAI backend and frontend. Dataloaders can now expose native schema information, allowing users to choose between metadata-defined types and statistical inference during dataset preview. The ARFF dataloader implements this functionality, and the API and UI were updated to surface and configure the new behavior.
Type of Change
Check all that apply like this [x]:
Changes (by file)
Backend
- BaseDataLoader: Added SUPPORTS_NATIVE_TYPES, NATIVE_TYPE_MAPPING, and the overridable extract_native_types method with default no-op behavior.
- ARFFDataLoader: Implemented native type extraction from ARFF headers and mapped ARFF attribute types to DashAI column types.
- datasets.py / preview endpoint: Added the use_native_types parameter to preview_with_types and updated preview logic to use metadata-defined types when available.
- BaseDataLoader.get_metadata: Added the supports_native_types field so the frontend can dynamically detect support.

Frontend
- DataloaderConfigBar.jsx: Added UI controls for enabling/disabling native types when supported by the selected dataloader.
- Preview params payload: Propagates the use_native_types flag.
- Locales (en, es, pt): Added localization entries for the new "Use Native Types" option and related preview labels/descriptions.

Testing (optional)
Reviewers should verify:
Notes (optional)
This implementation establishes a generic interface for native schema/type support in dataloaders, making it easier to extend the feature to additional self-describing formats in the future.