
Add native type extraction support to BaseDataLoader and ARFF #655

Merged: cristian-tamblay merged 6 commits into develop from feat/dataloader-types on May 28, 2026

@Irozuku (Collaborator) commented May 28, 2026

Summary

This pull request adds support for native column types in self-describing dataset formats across the DashAI backend and frontend. Dataloaders can now expose native schema information, so during dataset preview users can choose between the types defined in the file's own metadata and statistical inference. The ARFF dataloader implements this functionality, and the API and UI were updated to surface and configure the new behavior.

Type of Change

Check all that apply like this [x]:

  • Backend change
  • Frontend change
  • CI / Workflow change
  • Build / Packaging change
  • Bug fix
  • Documentation

Changes (by file)

Backend

  • BaseDataLoader: Added SUPPORTS_NATIVE_TYPES, NATIVE_TYPE_MAPPING, and the overridable extract_native_types method, whose default implementation returns None so existing loaders keep using statistical inference.
  • ARFFDataLoader: Implemented native type extraction from ARFF headers and mapped ARFF attribute types to DashAI column types.
  • datasets.py / preview endpoint: Added the use_native_types parameter to preview_with_types and updated the preview logic to use metadata-defined types when available.
  • Component metadata endpoints: Added the supports_native_types field so the frontend can dynamically detect support.
  • Backend tests: Updated and extended tests to validate native type extraction and API behavior.

Frontend

  • DataloaderConfigBar.jsx: Added UI controls for enabling/disabling native types when supported by the selected dataloader.
  • Dataset preview requests: Updated payloads to include the use_native_types flag.
  • Preview table labels: Adjusted row labels dynamically depending on whether native types or inferred types are being displayed.
  • Translation files (en, es, pt): Added localization entries for the new "Use Native Types" option and related preview labels/descriptions.

Testing (optional)

Reviewers should verify:

  • ARFF datasets correctly extract and display native column types.
  • Disabling "Use Native Types" falls back to statistical inference.
  • Dataloaders without native type support do not display the toggle in the UI.
  • Preview endpoint behavior remains unchanged for unsupported formats.

Notes (optional)

This implementation establishes a generic interface for native schema/type support in dataloaders, making it easier to extend the feature to additional self-describing formats in the future.
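To illustrate what "extending to additional formats" could look like, here is a hedged sketch of a second loader plugging into the same hooks. The format is imagined (a JSON document listing columns with a `kind` or a `categories` list), and `BaseDataLoaderSketch`, `JsonSchemaLoader`, and the DashAI type names used here are hypothetical stand-ins, not DashAI code. For self-containment the override receives the schema text rather than a file path.

```python
import json
from typing import Any, Dict, Optional


class BaseDataLoaderSketch:
    """Hypothetical stand-in for BaseDataLoader's native-type hooks."""

    SUPPORTS_NATIVE_TYPES = False
    NATIVE_TYPE_MAPPING: Dict[str, str] = {}

    def extract_native_types(self, schema_text: str) -> Optional[Dict[str, Any]]:
        # Default: no native schema, so callers fall back to inference.
        return None


class JsonSchemaLoader(BaseDataLoaderSketch):
    """Hypothetical loader for an imagined format that ships a JSON column schema."""

    SUPPORTS_NATIVE_TYPES = True
    # Format-specific kind -> DashAI type name (assumed names).
    NATIVE_TYPE_MAPPING = {"int": "Integer", "float": "Float"}

    def extract_native_types(self, schema_text: str) -> Dict[str, Any]:
        schema = json.loads(schema_text)
        types: Dict[str, Any] = {}
        for column in schema["columns"]:
            if "categories" in column:
                # Declared categories are copied verbatim, as ARFF nominals are.
                types[column["name"]] = {
                    "type": "Categorical",
                    "categories": list(column["categories"]),
                }
            else:
                types[column["name"]] = {"type": self.NATIVE_TYPE_MAPPING[column["kind"]]}
        return types
```

Only the two class attributes and the one override change; the preview endpoint and frontend toggle would pick the new loader up through SUPPORTS_NATIVE_TYPES without further changes.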

Irozuku added 6 commits May 28, 2026 12:48

Introduces SUPPORTS_NATIVE_TYPES, NATIVE_TYPE_MAPPING, and
extract_native_types() on the dataloader base so self-describing
formats can expose column types directly instead of going through
statistical inference. Default returns None, preserving existing
behavior for all current loaders. get_metadata() now surfaces the
capability flag for the frontend.

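A minimal sketch of the base-class contract described in this commit. The attribute and method names come from the commit; `BaseDataLoaderSketch`, the subclasses, the `file_path` parameter, and `get_metadata` being a classmethod returning only the new flag are assumptions made to keep the example runnable.

```python
from typing import Any, Dict, Optional


class BaseDataLoaderSketch:
    """Hypothetical stand-in for BaseDataLoader; only the native-type hooks are modeled."""

    # Loaders that can read column types from the file itself set this to True.
    SUPPORTS_NATIVE_TYPES: bool = False
    # Format-specific type name -> DashAI column type name (empty by default).
    NATIVE_TYPE_MAPPING: Dict[str, str] = {}

    def extract_native_types(self, file_path: str) -> Optional[Dict[str, Any]]:
        # Default no-op: None tells the caller to fall back to statistical inference.
        return None

    @classmethod
    def get_metadata(cls) -> Dict[str, Any]:
        # Only the new capability flag is shown; the real metadata dict has more keys.
        return {"supports_native_types": cls.SUPPORTS_NATIVE_TYPES}


class PlainLoader(BaseDataLoaderSketch):
    """A loader that overrides nothing keeps the pre-existing behavior."""


class SelfDescribingLoader(BaseDataLoaderSketch):
    SUPPORTS_NATIVE_TYPES = True
    NATIVE_TYPE_MAPPING = {"INTEGER": "Integer"}

    def extract_native_types(self, file_path: str) -> Dict[str, Any]:
        # A real loader would parse file_path; a fixed schema keeps the sketch runnable.
        return {"age": {"type": self.NATIVE_TYPE_MAPPING["INTEGER"]}}
```

Because the default returns None rather than raising, callers can treat "no native schema" uniformly, and every existing loader stays on the inference path without modification.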
ARFF files declare each attribute as NUMERIC, INTEGER, REAL, NOMINAL,
STRING, or DATE in the header. extract_native_types now reads the
scipy meta object (previously discarded) and builds the DashAI column
type dict directly: numeric kinds map to Float/Integer, nominal maps
to Categorical with categories taken verbatim from the header.
Refactors _read_arff_file to share the raw scipy call.

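The header-to-type mapping can be sketched as follows. This is not the PR's implementation: it parses the `@attribute` lines itself so it runs without scipy, and the function name `extract_arff_types`, the exact keyword-to-type table (INTEGER to Integer, NUMERIC/REAL to Float), the `{"type": ...}` dict shape, and leaving STRING/DATE as None are all assumptions.

```python
import re
from typing import Any, Dict, Optional

# Assumed mapping: the commit only says numeric kinds map to Float/Integer.
ARFF_TO_DASHAI = {"NUMERIC": "Float", "REAL": "Float", "INTEGER": "Integer"}

# Attribute name may be quoted ('...' or "...") or a bare token.
_ATTR_RE = re.compile(r"@attribute\s+('[^']*'|\"[^\"]*\"|\S+)\s+(.+)", re.IGNORECASE)


def extract_arff_types(header: str) -> Dict[str, Optional[Dict[str, Any]]]:
    types: Dict[str, Optional[Dict[str, Any]]] = {}
    for raw_line in header.splitlines():
        line = raw_line.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section begins
        match = _ATTR_RE.match(line)
        if not match:
            continue  # @relation, comments, blank lines
        name = match.group(1).strip("'\"")
        spec = match.group(2).strip()
        if spec.startswith("{"):
            # NOMINAL: categories taken verbatim from {a,b,c}.
            # Naive split: quoted values containing commas are not handled.
            categories = [c.strip().strip("'\"") for c in spec.strip("{}").split(",")]
            types[name] = {"type": "Categorical", "categories": categories}
        else:
            dashai_type = ARFF_TO_DASHAI.get(spec.split()[0].upper())
            # STRING and DATE stay None: the commit does not say how they map.
            types[name] = {"type": dashai_type} if dashai_type else None
    return types


SAMPLE_HEADER = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature REAL
@attribute humidity NUMERIC
@attribute windy_count INTEGER
@attribute note STRING
@attribute 'play golf' {yes,no}
@data
sunny,85,85,0,'x',yes
"""

print(extract_arff_types(SAMPLE_HEADER)["outlook"])
```

The print call shows `{'type': 'Categorical', 'categories': ['sunny', 'overcast', 'rainy']}`: the nominal categories come straight from the header, with no inference over the data rows.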
When the request sets use_native_types and the chosen dataloader
declares SUPPORTS_NATIVE_TYPES, call extract_native_types on the
prepared file and short-circuit the DashAIPtype/Dummy inference
loop. The returned dict reuses the inferred_types response field,
so existing frontend and dataset_job paths consume it unchanged.

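The preview-endpoint branch reduces to a small dispatch. In this sketch, `resolve_preview_types`, `build_preview_response`, the stub loaders, and `fake_inference` (standing in for the DashAIPtype inference loop) are hypothetical; falling back to inference when extract_native_types returns None is an assumption, not something the commit states.

```python
from typing import Any, Callable, Dict

TypeDict = Dict[str, Dict[str, Any]]


def resolve_preview_types(
    loader: Any,
    file_path: str,
    use_native_types: bool,
    infer_types: Callable[[str], TypeDict],
) -> TypeDict:
    # Native path only when the request asks for it AND the loader declares support.
    if use_native_types and getattr(loader, "SUPPORTS_NATIVE_TYPES", False):
        native = loader.extract_native_types(file_path)
        if native is not None:  # assumed fallback when no schema is found
            return native  # short-circuit: statistical inference is skipped
    return infer_types(file_path)


def build_preview_response(types: TypeDict) -> Dict[str, TypeDict]:
    # Same field name either way, so downstream consumers need no changes.
    return {"inferred_types": types}


class NativeLoader:
    SUPPORTS_NATIVE_TYPES = True

    def extract_native_types(self, file_path: str) -> TypeDict:
        return {"age": {"type": "Integer"}}


class LegacyLoader:
    SUPPORTS_NATIVE_TYPES = False


def fake_inference(file_path: str) -> TypeDict:
    return {"age": {"type": "Float"}}
```

Reusing the inferred_types key is the design choice that keeps the change local: the response shape is identical whichever branch produced it.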
DataloaderConfigBar fetches the selected dataloader's metadata and,
when supports_native_types is true, renders a Switch above the
Inference Rows input. The toggle defaults to on so the first preview
already runs in native mode, propagates use_native_types in the
params payload, and swaps the row-count input label to Preview Rows
when active. New i18n keys land in en, es, and pt locales.

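The toggle's rules can be modeled as a pure function. This is a Python model of the behavior the commit describes, not the JSX component: `config_bar_state`, the `n_rows` payload key, the default of 100 rows, and sending `use_native_types: False` for unsupported loaders are hypothetical.

```python
from typing import Any, Dict, Tuple


def config_bar_state(
    metadata: Dict[str, Any], toggle_on: bool = True, rows: int = 100
) -> Tuple[bool, str, Dict[str, Any]]:
    """Return (show_switch, row_input_label, params_payload)."""
    # The Switch is rendered only for loaders that advertise support.
    show_switch = bool(metadata.get("supports_native_types", False))
    # toggle_on defaults to True, mirroring "the toggle defaults to on".
    native_active = show_switch and toggle_on
    label = "Preview Rows" if native_active else "Inference Rows"
    params = {"use_native_types": native_active, "n_rows": rows}
    return show_switch, label, params
```

For an ARFF loader, `config_bar_state({"supports_native_types": True})` yields a visible switch, the "Preview Rows" label, and a payload with `use_native_types` set to True; a loader without the flag keeps the original "Inference Rows" UI.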
Adds tests for SUPPORTS_NATIVE_TYPES metadata exposure, the full
schema returned by extract_native_types over a mixed-attribute
ARFF (NUMERIC/REAL/INTEGER/NOMINAL with categories), shape parity
with DashAIPtype.infer_types, and the negative path where loaders
without an override return None.

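One way to phrase the "shape parity" check in pytest style: both dicts cover the same columns, and each column's spec has the same keys, while the concrete types may differ. `same_shape`, the test name, and the per-column dict layout are assumptions, not the PR's actual test code.

```python
from typing import Any, Dict

TypeDict = Dict[str, Dict[str, Any]]


def same_shape(native: TypeDict, inferred: TypeDict) -> bool:
    """True when both dicts cover the same columns with the same per-column keys."""
    if native.keys() != inferred.keys():
        return False
    # Values (e.g. Float vs Integer) may differ; only the structure must match.
    return all(set(native[col]) == set(inferred[col]) for col in native)


def test_native_types_match_inferred_shape() -> None:
    native = {
        "outlook": {"type": "Categorical", "categories": ["sunny", "rainy"]},
        "temperature": {"type": "Float"},
    }
    inferred = {
        "outlook": {"type": "Categorical", "categories": ["rainy", "sunny"]},
        "temperature": {"type": "Integer"},
    }
    assert same_shape(native, inferred)
    # A missing column breaks parity.
    assert not same_shape(native, {"temperature": {"type": "Float"}})
```

Checking structure rather than values reflects why parity matters here: downstream code reads the dict the same way regardless of which path produced it.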
BaseDataLoader.get_metadata now exposes the supports_native_types
flag for the frontend toggle. Update test_components_api fixtures
to match the new metadata dict shape (defaults to False for stock
test loaders).

@Irozuku added the enhancement (New feature or request), front (Frontend work), and back (Backend work) labels May 28, 2026
@cristian-tamblay merged commit d16eb0b into develop May 28, 2026
19 checks passed
@cristian-tamblay deleted the feat/dataloader-types branch May 28, 2026 21:30

2 participants