Migrate a Contentful space export into Drupal using the core Migrate API, with custom process plugins for the transforms YAML can't express: Rich Text AST → HTML with embedded-entry/asset resolution, asset → Media (self-contained staging + SHA-256 dedupe), and Contentful reference → Drupal internal-link rewriting.
This module is a runtime, not a fixed content model. Every Contentful space
has a different model, so the module executes per-space migration YAML rather
than shipping one schema. The worked migrations in
migrations/examples/ are templates you adapt; the same
YAML can also be produced by companion content-model analysis tooling.
Early development (1.0.x-dev). The migration runtime and the
contentful:export Drush command are built and covered by unit and kernel
tests — including a real two-pass Migrate run that resolves embeds and stages
assets to Media. All three presentation modes are documented with worked
templates, plus a shipped embed-rendering recipe (see
Presentation modes) and opt-in
author attribution. This module is not yet
covered by Drupal's security advisory policy — use at your own risk.
- Requirements
- How it works
- Embedded entry/asset resolution
- Installation
- Configuration
- Presentation modes
- Repeatable / delta imports
- Author attribution (opt-in)
- What's included
- Upgrading
- Roadmap
- Not in scope
- Maintainers
- Drupal ^11
- PHP 8.3+
- Core Migrate, the only hard dependency
- contentful/rich-text ^4.0 (Composer): the PHP Rich Text AST renderer, verified to install conflict-free on Drupal 11.3 / PHP 8.x
Situational dependencies — enable only when your mapping uses them:
| When the source space has… | Enable |
|---|---|
| Component types mapped to Paragraphs | paragraphs, entity_reference_revisions |
| Assets mapped to Media | core media, image, file |
| Config-entity migrations + Drush | migrate_plus, migrate_tools |
| Location fields | geofield |
The source plugin extends core SourcePluginBase and migrations are discovered
from a migrations/ directory (core, not migrate_plus), so a node-only
migration needs none of the above. See composer.json suggest and the
comments in contentful_migration.info.yml.
- Extract. drush contentful:export --space-id=… wraps the contentful-export CLI (assets included) and stages the JSON dump + asset binaries at a Drupal stream-wrapper location (default private://contentful) that the migration reads. The management token is read from the CONTENTFUL_MANAGEMENT_TOKEN environment variable, never from the process argv. (You can also run contentful-export by hand.) Add --include-users to also stage the space's members for author attribution.
- Ingest + track. The contentful_export Migrate source plugin reads that JSON, filters by content type, and flattens per-locale fields (no-fallback). Migrate's map tables give you idempotent re-runs and drush migrate:rollback.
- Transform. Custom process plugins handle what YAML can't:
  - contentful_rich_text: Rich Text AST → HTML, resolving embedded entries/assets to <drupal-entity-embed> / <drupal-media> tokens via the migrate map, and inline entry-hyperlink nodes to anchors on the target entity's canonical path. Two-pass: entities first, bodies second.
  - contentful_asset_to_media + contentful_media_bundle: stage an asset into a Media entity (MIME → bundle), deduplicating identical bytes by SHA-256. Self-contained: it does not use migrate_file_to_media (a Drupal-to-Drupal tool that does not fit external ingestion).
  - contentful_internal_link: rewrite a Contentful entry/asset reference to an entity:{type}/{id} link URI, resolved through the migrate map and alias-safe (Drupal resolves the URI at render, so links survive later path changes).
  - Multi-value Paragraph references use the idiomatic core #2890844 workaround (sub_process + no-stub migration_lookup + skip_on_empty row guard + extract): no custom plugin is needed, and unresolved refs never become entity_reference_revisions NULL writes.
The migration targets standard Drupal entities (nodes, Paragraphs, Media, menus). How you present them — decoupled, recoupled, or semi-decoupled — is a separate layer (see Presentation modes); the content migration is identical across all three.
A naive Rich Text migration loses content silently. contentful/rich-text's
default embed renderers emit Contentful-id placeholders (<div>Entry#ID</div>) —
visible garbage in Drupal, not empty. And an unrecognised node type would make
the Parser throw at parse time (InvalidArgumentException) — so the plugin
pre-sanitizes the AST first, dropping (and logging) only the unknown node and
keeping the rest of the body.
This module owns the full renderer list: custom NodeRenderers resolve sys.id
→ Drupal entity via the migrate map and emit clean embed tokens, ending in a
logging catch-all that never silently empties, with graceful degradation on
parse failure. This path — the project's biggest risk — is proven by
tests/src/Unit/RichText/ContentfulRichTextTest.php and the end-to-end
tests/src/Kernel/ContentfulMigrationTest.php.
Migrated bodies carry <drupal-media> and <drupal-entity-embed> tokens that
a text format must render: <drupal-media> renders via core media's
media_embed filter; <drupal-entity-embed> requires the contrib
Entity Embed filter. Without a text format that runs these filters, bodies
show raw tokens, both on Drupal pages and in JSON:API's
body.processed. The bundled recipe ships that format:

    drush recipe modules/contrib/contentful_migration/recipes/contentful_embed

It creates the contentful_embed text format: media_embed plus a
restrictive filter_html allow-list covering exactly the markup the migration
emits (narrower than full_html, a security improvement). The example
migrations point body/format at it, and a kernel test renders a freshly
migrated body through it. Spaces with no embedded-entry blocks (most of the
profiled corpus) need nothing more: embedded assets render with core media alone.
Embedded entries need contrib entity_embed added to the format — the tag
is pre-allowed so tokens survive until then. See
recipes/contentful_embed/README.md.
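
For orientation, a text format of this shape might look roughly like the config fragment below. This is a sketch, not the shipped recipe: the label, the weights, and the exact allowed_html list are assumptions made for illustration, and recipes/contentful_embed/ remains authoritative.

```yaml
# Hypothetical filter.format.contentful_embed.yml (illustrative only).
langcode: en
status: true
dependencies:
  module:
    - media
name: 'Contentful embed'          # label is an assumption
format: contentful_embed
weight: 0
filters:
  filter_html:
    id: filter_html
    provider: filter
    status: true
    weight: -10                   # runs before media_embed
    settings:
      # Assumed allow-list; the real recipe covers exactly what the migration emits.
      allowed_html: '<p> <br> <strong> <em> <a href data-entity-type data-entity-uuid> <ul> <ol> <li> <blockquote> <h2> <h3> <drupal-media data-entity-type data-entity-uuid alt> <drupal-entity-embed data-entity-type data-entity-uuid>'
      filter_html_help: false
      filter_html_nofollow: false
  media_embed:
    id: media_embed
    provider: media
    status: true
    weight: 100                   # must run after filter_html
    settings:
      default_view_mode: default
      allowed_view_modes: {  }
      allowed_media_types: {  }
```

The ordering matters: filter_html has to let the <drupal-media> tag through before media_embed can replace it with rendered media.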
Inline entry-hyperlink nodes (a link, inside body text, to another entry)
resolve through the same embed_migrations map to an anchor on the target
entity's canonical path, carrying data-entity-type / data-entity-uuid. The
bare path always resolves on its own; the data attributes let the contrib
Linkit filter rewrite the href
alias-safely if it is enabled on the destination text format (and that
format permits those attributes). This is deliberately weaker than the
link-field case (contentful_internal_link), which Drupal core resolves
alias-safely with no contrib module — a raw body href has no equivalent core
mechanism. A target with no canonical URL (e.g. a Paragraph) or an unresolved
target degrades to its plain link text: the words are kept, the dead link
dropped, and the loss logged.
Inline asset-hyperlink nodes (a link, inside body text, to a file: a PDF,
a download) resolve to the migrated file's URL when the media and file
modules are installed — the author-intended target, not the Media entity's
canonical page (a Drupal-ism that is often access-restricted). Resolution rides
the same embed_migrations.Asset candidates as embedded assets, then hops
media → source field → file → URL through the media-source API, so
non-standard source field names work; with a private files scheme the URL is
/system/files/…, which keeps file_download access control. Without
media/file — or for an asset that cannot resolve to a local file (not
migrated, since deleted, oEmbed remote video) — the node degrades exactly as
above: link text kept, dead link dropped, loss logged.
Install with Composer (this pulls in contentful/rich-text):

    composer require 'drupal/contentful_migration:^1.0@dev'
    drush en contentful_migration

Drop the @dev once a tagged release is available. Then enable the situational
modules your mapping needs (see Requirements).
The module is driven by Migrate YAML — one migration per Contentful content
type — placed in a module's migrations/ directory (core discovery) or imported
as migrate_plus config entities. Start from
migrations/examples/, which demonstrates every pattern
the toolkit emits:
- single and multi-value Paragraph references (two-pass, #2890844)
- asset → Media with MIME → bundle mapping
- a two-pass Rich Text body with embed resolution
- a translation pass
- a self-referential type resolved to a Drupal menu with a parent-attachment pass
Rich Text embed resolution and internal-link rewriting are configured per
migration via an embed_migrations / link_migrations map — each keys a
Contentful linkType to an ordered list of candidate migrations to resolve
against. See contentful_blog_post_body.yml (Pass B) for the embed map.
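
As a rough illustration of those maps, a process section might be shaped like the fragment below. The migration ids other than contentful_blog_post, the source field ids, the destination link field, and the placement of the maps on the plugin configuration are all assumptions; the shipped examples show the real layout.

```yaml
process:
  body/value:
    plugin: contentful_rich_text
    source: body                      # hypothetical Rich Text field id
    embed_migrations:
      Entry:                          # candidates are tried in order
        - contentful_blog_post
        - contentful_author           # hypothetical migration id
      Asset:
        - contentful_asset            # hypothetical migration id
  body/format:
    plugin: default_value
    default_value: contentful_embed   # the recipe-shipped text format
  field_related_link/uri:             # hypothetical link field
    plugin: contentful_internal_link
    source: related                   # hypothetical reference field id
    link_migrations:
      Entry:
        - contentful_blog_post
      Asset:
        - contentful_asset
```

Listing several candidates under one linkType lets a reference resolve against whichever migration actually imported the target entry.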
Reference lookups in the examples disable Migrate stubs (no_stub: true).
That is intentional. A Contentful reference that did not migrate should be
empty, skipped, or logged by the mapping, not materialized as a half-empty
Drupal entity. Multi-value Paragraph references add a skip_on_empty row
guard inside sub_process, so a single unresolved child is dropped before
extract reads target_id / target_revision_id.
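
A minimal sketch of that multi-value pattern, built only from core process plugins, might look like this. The destination field, source field id, per-item id key, and migration id are hypothetical, and the shipped examples are the reference for the exact shape.

```yaml
process:
  field_components:                   # hypothetical Paragraph reference field
    plugin: sub_process
    source: components                # hypothetical multi-value reference field
    process:
      # Resolve the child through the migrate map; never create a stub.
      lookup:
        plugin: migration_lookup
        migration: contentful_component_paragraph   # hypothetical migration id
        source: id                    # assumed per-item key holding the sys id
        no_stub: true
      # Row guard: stop on an unresolved child before extract reads the ids.
      target_id:
        - plugin: skip_on_empty
          method: row
          source: '@lookup'
        - plugin: extract
          index: [0]
      target_revision_id:
        - plugin: skip_on_empty
          method: row
          source: '@lookup'
        - plugin: extract
          index: [1]
```

The lookup is stored once under a temporary key so both destination properties read the same resolved pair of entity id and revision id.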
Self-referential structures use the same rule. The navigation example creates
menu links in Pass A and attaches parents in Pass B with core
menu_link_parent; it does not rely on same-migration row order. That keeps
recursive source graphs out of Drupal default-content _meta.depends cycle
territory and avoids the entity_reference_revisions NULL fatal class exposed
by upstream issue-queue testing.
Map sys_id to a plain text field (the worked example uses
field_contentful_id) on every destination bundle whose entries a consumer
may need to look up by their original Contentful id — see the PATTERN comment
in contentful_blog_post.yml. The migrate map already records sys.id →
entity, but map tables are migration infrastructure: JSON:API never exposes
them and a migrate:reset erases them. A field makes the identity durable
content, queryable by any decoupled consumer
(/jsonapi/node/blog_post?filter[field_contentful_id]=<sys.id>) — see
Converting a Contentful front end.
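
In its simplest form the mapping is a single process line. The fragment below is a sketch that assumes the destination bundle already has a plain text field named field_contentful_id, as in the worked example.

```yaml
process:
  # Copy the original Contentful id into durable, queryable content.
  field_contentful_id: sys_id
```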
The content migration is identical across all three modes — presentation is a
layer on top, and the module ships exactly its generic slice: the
contentful_embed recipe (the one piece of
standing config, because migrated bodies depend on it — kernel-tested against
a freshly migrated body) and modes/examples/ (worked
reference templates, the same adapt-these contract as migrations/examples/).
- Decoupled: Drupal as a headless content store. One core enable step (drush en jsonapi, read-only by default), the embed-token contract for SPA consumers, CORS-as-environment-config notes, a front-end conversion guide, and contrib pointers: modes/examples/decoupled/.
- Recoupled: Drupal renders, against a theme you provide. The editorInterfaces → widget map derives field/widget/formatter choices from data that 211 of 218 profiled exports already carry, with four worked display templates: modes/examples/recoupled/.
- Semi-decoupled: Drupal shell + JS-hydrated islands. An architecture pattern, honestly documented as a pointer (the union of the other two plus a frontend choice): modes/examples/semi-decoupled/.
Per-space display config (real bundle/field names) is generated, not
shipped: derive it from your export's editorInterfaces + the widget map —
by hand from the templates, or via companion display-planning tooling, the
same division of labor as migration YAML. The boundary is written down in
modes/examples/README.md: that directory stays
reference patterns forever, and the module never ships applied per-space
config.
drush contentful:jsonapi-map emits a JSON manifest of every
contentful_export migration: Contentful type → JSON:API resource, field id →
JSON:API field name (authoritative when jsonapi is installed;
convention-derived otherwise). Hand it to whoever — or whatever — is rewriting
front-end queries; it is the machine-readable companion to the
front-end conversion guide.
Pipelines the extractor cannot classify are marked "kind": "manual".
Migrate's id-map makes re-imports idempotent: re-run contentful:export and
drush migrate:import, and changed entries re-import onto the same Drupal
entities. The stock track_changes source option controls re-run cost —
verified on this source plugin by a real kernel migrate run:

    source:
      plugin: contentful_export
      # …
      track_changes: true  # re-import only rows whose content changed

Without track_changes, already-imported rows are skipped even if their
content changed, so set it for any space you intend to re-export.
(high_water_property is deliberately not documented here: this source yields
rows in export-file order, not date order, and the high-water interaction with
an unordered iterator is unverified. track_changes alone covers the
re-import case; high-water support may follow once it has a test.)
Deletions do not propagate. A full export is a snapshot with no deletion
tombstones, and migrate:import never deletes destination content — an entry
deleted in Contentful lingers in Drupal until you reconcile it (diff the
migration's id-map source ids against the new export's sys.id set, then
drush migrate:rollback the missing ids, or remove them by hand). This module
is deliberately not a sync engine.
Entry timestamps migrate out of the box (sys.createdAt/updatedAt →
created/changed, two core process plugins). Authors need one more step:
the export JSON carries only opaque author ids — bare
{sys:{linkType:User,id}} links, present in 204 of 218 profiled exports, with
zero user objects in any of them. Two supported paths, smallest first:
- static_map (no fetch). Map the handful of author ids you care about to existing Drupal accounts on the sys_created_by/sys/id nested source key. Zero network, zero new users: right for spaces with few authors, or when editors already have real Drupal accounts.
- --include-users (CMA fetch). drush contentful:export --space-id=… --include-users makes a separate paginated Content Management API call (contentful-export itself never includes users) and stages each member entry-shaped in users.json. The contentful_user example then imports them as blocked stubs (status: 0, no roles, no password): attribution targets, never logins. Entry migrations resolve uid via migration_lookup (worked snippet in contentful_blog_post.yml).
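
The two paths could be sketched as the alternative process fragments below. The Contentful user ids and Drupal uids are placeholders. The trailing default_value step is an assumption of this sketch: it sends every unresolved author to anonymous, which may be coarser than the worked snippet in contentful_blog_post.yml.

```yaml
# Path 1: static_map, no fetch.
process:
  uid:
    plugin: static_map
    source: sys_created_by/sys/id
    map:
      'hypothetical-contentful-user-id-1': 12   # placeholder id → existing Drupal uid
      'hypothetical-contentful-user-id-2': 34
    default_value: 0                  # unmapped authors fall back to anonymous
---
# Path 2: resolve against the stubs imported by the contentful_user example.
process:
  uid:
    - plugin: migration_lookup
      migration: contentful_user
      source: sys_created_by/sys/id
      no_stub: true
    - plugin: default_value
      default_value: 0                # assumed explicit anonymous fallback
```

Use one fragment or the other for a given migration, not both.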
Handled for you, deterministically, at the export step: duplicate display
names are disambiguated with the member's sys id, and names are clipped to
Drupal's 60-character limit — both hard per-row failures otherwise (user
names are a database-level unique key). Because the staged names depend only
on the CMA data, a regenerated users.json is stable and track_changes
re-imports never rename members. Stated edges: member email is an
admin-token-only CMA attribute (with a non-admin token, stubs import
mail-less — users.json holds names and emails, which is why it stays in the
private-by-default export dir); a member whose name collides with an
existing site user fails that row loudly (rename the account or
static_map that member). Entries whose author left the space (id
absent from users.json) fall back to anonymous explicitly; entries with no
author link at all (uncommon — 14 of 218 profiled exports lack createdBy)
leave uid unset, which Drupal fills with the importing user: anonymous
under a standard drush migrate:import, but a logged-in admin running
imports through a UI (e.g. migrate_tools) would own them. Neither path is
ever silently attributed to uid 1. The staged-file → stubs → attribution
chain, including both degrades, is kernel-tested.

    src/Plugin/migrate/source/ContentfulExport.php   JSON source: locale flatten, content_type filter
    src/Plugin/migrate/process/
      ContentfulRichText.php                         AST → HTML, embed resolution
      ContentfulAssetToMedia.php                     asset → Media, SHA-256 dedupe
      ContentfulMediaBundle.php                      MIME → media bundle
      ContentfulInternalLink.php                     reference → entity: link URI
    src/RichText/                                    NodeRenderer impls + sys.id resolver
    src/Source/ContentfulEntryFlattener.php          pure locale/field flattener
    src/Export/                                      pure export config/summary/users-fetch helpers
    src/JsonApiMap/                                  migration → JSON:API manifest extractor (pure)
    src/Drush/Commands/                              drush contentful:export, contentful:jsonapi-map
    migrations/examples/                             9 worked migration YAMLs + README
    recipes/contentful_embed/                        text format rendering the embed tokens
    modes/examples/                                  presentation-mode templates + WIDGET-MAP
    tests/                                           unit + kernel coverage

Behavior changes between releases are cataloged here (and in each release's notes on drupal.org); none break an API. Per-space migration YAML you authored is yours — upgrades never rewrite it.
- 1.0.0-beta1: inline asset-hyperlink nodes emit a real <a href> to the migrated file when the media + file modules are enabled (previously: always plain text; see Inline hyperlinks). A body re-imported after upgrading gains the file links its earlier import dropped; the plain-text degrade remains the modules-absent behavior, so nothing silently loses content.
- 1.0.0-beta1: the example migrations' body/format default is now contentful_embed (was full_html), pointing fresh migrations at the recipe-shipped format that actually renders their tokens. New migrations only: per-space YAML you authored on alpha releases keeps full_html until you adopt the recipe and edit your Pass-B format (a documented two-step in recipes/contentful_embed/README.md).
- 1.0.0-beta2: contentful:export gains --include-users (author attribution). Off by default: without the flag, nothing changes (no extra network call, no users.json, no new users).
- 1.0.0-beta3: additive only. Adds the read-only contentful:jsonapi-map command (mapping manifest) and the field_contentful_id identity-preservation pattern in the worked examples (preserving Contentful identity). No existing migration, command, or rendering behavior changes.
The roadmap is empty because every item has shipped: presentation modes and asset hyperlinks shipped in beta1,
opt-in author → user mapping in beta2, the decoupled conversion toolkit
(identity pattern, conversion guide, contentful:jsonapi-map) in beta3.
Nothing further is planned before 1.0.0 — candidates beyond it (e.g.
high_water_property support once it has a kernel test of its own) are
tracked in the issue queue, not promised here.
Import and rollback aren't wrapped by design: once a space's migrations exist,
drush migrate:import --execute-dependencies and drush migrate:rollback are
already the right tools.
Each exclusion below is a deliberate decision, not an omission — with the evidence it rests on (218 real space exports profiled):
- Live/bidirectional sync. One-way migration only. Repeatable / delta imports cover the re-export → re-import case; a live two-way bridge is a different product.
- Full edit/revision history. Exports carry only version counters: none of the 218 profiled exports contains revision snapshots (recovering history needs per-entry Management-API calls). Authorship metadata is migratable: sys.createdAt/updatedAt map to created/changed with two core process plugins (see migrations/examples/contentful_blog_post.yml), and author attribution is supported as an opt-in export step.
- Roles/permissions. Role definitions appear in most real exports (155/218), so the exclusion is not about data availability. Contentful's policy rules do not map onto Drupal's permission model, and auto-generating roles risks granting more than intended. Model roles deliberately in Drupal.
- Webhooks, UI extensions, SSO. Contentful platform config with no safe Drupal equivalent: webhooks target Contentful's event model, UI extensions are app-framework artifacts, SSO is organization-level configuration.
- Theme/design-system generation. Recoupled presentation consumes a provided theme; this module ships content, not design.
- Contentful CDA emulation. Evaluated and gated, not forgotten: the envelope (sys + fields + includes) is reproducible, but the contract (RichText AST, Images API parametric transforms) is not. At least 14% of profiled spaces carry embedded-entry RichText that would break unchanged front ends. Decision record and reopen-gates: spike/cda-emulation/DECISION.md. Conversion is the supported path (see Presentation modes).
- Alex Urevick-Ackelsberg (alex ua)
The current maintainer list is on the project page.