WIP: New drag and drop API #4571

Draft
eira-fransham wants to merge 43 commits into rust-windowing:master from slint-ui:drag-n-drop

@eira-fransham commented May 19, 2026

This PR implements a new API for drag and drop, with a DataTransfer type which abstracts over the various clipboard/drag and drop APIs across different platforms. I built this on top of #2429 in order to ensure @SludgePhD gets credit if it gets merged, although admittedly I ended up removing pretty much all of their work while I was reworking the design.

This is being built in order to help support drag-and-drop work in Slint's winit backend. As part of that work, I did extensive research on how drag-and-drop and clipboard APIs are implemented across different platforms, and wrote a (still WIP) research document that can be found here.

Some platforms (Wayland, X11) always transfer bytes tagged with a MIME type; other platforms have a set of standardised transferable types. However, all types that are supported cross-platform (images, RTF, HTML, plaintext, URIs/URI lists) are cleanly expressible using MIME types on all supported platforms. As part of this PR, I've written up a quick-and-dirty summary of which types are supported on different platforms.
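As a rough illustration of that claim, the cross-platform set could be modelled as an enum whose variants map to and from MIME types. This is only a sketch: the variant names and the exact MIME strings (in particular PNG for images) are assumptions, not the PR's actual `TypeHint` definition.

```rust
/// Hypothetical cross-platform type hints; the PR's real `TypeHint` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeHint {
    PlainText,
    Html,
    Rtf,
    Image,
    UriList,
}

impl TypeHint {
    /// MIME type this hint would be advertised as on Wayland/X11.
    pub fn mime(self) -> &'static str {
        match self {
            TypeHint::PlainText => "text/plain;charset=utf-8",
            TypeHint::Html => "text/html",
            TypeHint::Rtf => "text/rtf",
            // Assumption: images are exchanged as PNG.
            TypeHint::Image => "image/png",
            TypeHint::UriList => "text/uri-list",
        }
    }

    /// Reverse lookup from an advertised MIME type. Parameters such as
    /// `;charset=...` are ignored and the comparison is case-insensitive.
    pub fn from_mime(mime: &str) -> Option<TypeHint> {
        let essence = mime.split(';').next().unwrap_or("").trim().to_ascii_lowercase();
        match essence.as_str() {
            "text/plain" => Some(TypeHint::PlainText),
            "text/html" => Some(TypeHint::Html),
            "text/rtf" => Some(TypeHint::Rtf),
            "image/png" => Some(TypeHint::Image),
            "text/uri-list" => Some(TypeHint::UriList),
            _ => None,
        }
    }
}

fn main() {
    for hint in [TypeHint::PlainText, TypeHint::UriList] {
        println!("{:?} -> {}", hint, hint.mime());
    }
}
```

On platforms with fixed clipboard formats, a backend would map each variant to the native format instead of a MIME string.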

Design

The new API is inspired by the browser's DataTransfer API. The main complexity comes from supporting the common set of capabilities (the types and traits in winit-core/src/data_transfer.rs) while still allowing a consumer to use the platform-specific APIs.

The design may look somewhat complex, and I am open to suggestions for simplifying it, but OS drag-and-drop/clipboard APIs are fundamentally complex, so a lot of that complexity is unavoidable whatever the implementation. This article by a Wayland maintainer describes it as "arguably one of the most complicated parts of the core Wayland protocol", and from researching the design for other platforms, it seems that Wayland actually has the simplest API.

Current state

X11 and macOS are working, with the supported types in the example (string, HTML, file URIs) tested on my two machines:

  • X11: Tested under both KDE and Cosmic, both with XWayland, on the latest version of CachyOS. A colleague has tested it under native X11, although I don't know which distro or DE.
  • macOS: Tested under the latest version of macOS (macOS Tahoe 26.4.1).

Windows and Wayland are yet to be completed, but I have machines running both and so can test them.

@kchibisov (Member)

I haven't read very much into impl. details, but from what I've seen:

  1. I think operations for fetching should be on ActiveEventLoop. On Wayland you cannot fetch the clipboard from a window without the event loop at all.
  2. The transfer/MIME machinery should also work for the clipboard, because drag and drop and the system clipboard are the same mechanism. So once winit supports the clipboard, it should use a similar system.
  3. We should keep in mind that on Wayland, if you get 2-3 MIME types, the data provided for each will be different, so picking what to load should be up to the user. Also, loading data from DnD/clipboard is an asynchronous operation, so the data should come back to the user as an event. So something like ActiveEventLoop::request_data_transfer_data(DataTransferId) -> Result<()>, plus a callback on ApplicationTrait carrying data: Vec<u8>, mime: Mime, which fires once we're done reading the data back. Unfortunately, no blocking operation will work here.
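The request/callback shape suggested in point 3 could look roughly like the following std-only mock. `request_data_transfer_data` comes from the comment, but here it also takes a MIME type (an assumption, since the user picks the type); `MockEventLoop`, `simulate_offer`, `dispatch` and the `data_transfer_data` callback are hypothetical, and a real event loop would read the data asynchronously rather than from a prepared map.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DataTransferId(u64);

/// Hypothetical callback trait: data arrives later, tagged with its MIME type.
pub trait ApplicationHandler {
    fn data_transfer_data(&mut self, id: DataTransferId, mime: String, data: Vec<u8>);
}

/// Stand-in for the event loop; `offers` simulates the other client's data.
pub struct MockEventLoop {
    offers: HashMap<(DataTransferId, String), Vec<u8>>,
    pending: VecDeque<(DataTransferId, String)>,
}

impl MockEventLoop {
    pub fn new() -> Self {
        Self { offers: HashMap::new(), pending: VecDeque::new() }
    }

    pub fn simulate_offer(&mut self, id: DataTransferId, mime: &str, data: &[u8]) {
        self.offers.insert((id, mime.to_string()), data.to_vec());
    }

    /// Non-blocking: only records the request and returns immediately.
    pub fn request_data_transfer_data(&mut self, id: DataTransferId, mime: &str) -> Result<(), String> {
        if !self.offers.contains_key(&(id, mime.to_string())) {
            return Err(format!("{mime} was not offered"));
        }
        self.pending.push_back((id, mime.to_string()));
        Ok(())
    }

    /// One loop iteration: finished reads are delivered as callbacks.
    pub fn dispatch(&mut self, app: &mut impl ApplicationHandler) {
        while let Some((id, mime)) = self.pending.pop_front() {
            let data = self.offers[&(id, mime.clone())].clone();
            app.data_transfer_data(id, mime, data);
        }
    }
}

struct App {
    received: Vec<(String, Vec<u8>)>,
}

impl ApplicationHandler for App {
    fn data_transfer_data(&mut self, _id: DataTransferId, mime: String, data: Vec<u8>) {
        self.received.push((mime, data));
    }
}

fn main() {
    let mut event_loop = MockEventLoop::new();
    let id = DataTransferId(1);
    event_loop.simulate_offer(id, "text/plain", b"hello");
    let mut app = App { received: Vec::new() };
    event_loop.request_data_transfer_data(id, "text/plain").unwrap();
    assert!(app.received.is_empty()); // nothing yet: the request is asynchronous
    event_loop.dispatch(&mut app);
    println!("{:?}", app.received);
}
```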

In the current API the URI list is preloaded. I think the way to solve this is to only load data through an async-ish API once the user asks for it, if you want to follow Wayland.

If the API is not lazy, I'm not sure how it should be done, so a tl;dr would help (I haven't found clear wording on this in your research).

> This article by a Wayland maintainer describes it as "arguably one of the most complicated parts of the core Wayland protocol"

Well, that's because it's a lot of work: you need to negotiate what you want to read (image/text/HTML/whatever), then both ends must do non-blocking writes/reads on an FD, which implies plugging that FD into an epoll-based event loop or similar, and making sure the two ends don't hang each other.

And the compositor actually does very little here: it just lets you exchange FDs with the other end, while all of the negotiation and the writing of the right MIME type is done by the client.

So, for example, when something is dragged or pasted, you get a bunch of MIME types you can use to query the data, e.g. image/text/audio; you then ask for audio, and the other end should provide audio.

Then say you want to initiate drag and drop: you start dragging something that can be either text or an image, and depending on what the other end picks (e.g. you drag from winit to Firefox, and Firefox picks one of the two MIME types you offered it), winit must reply with the right data. Note that you don't create buffers for both the image and the text beforehand; you only create one once you get an event telling you which type of data the other end wants. The other end can also ask for both, or ask for something from time to time, for as long as you have advertised it.
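The source side described above (advertise types up front, produce bytes only when the other end asks for a specific one) can be sketched with one lazily invoked producer per MIME type. `DragSource`, `offer` and `send` are hypothetical names; on Wayland the bytes would be written to the FD received in a `wl_data_source.send` event rather than returned.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical drag source: MIME types are advertised eagerly,
/// data is produced lazily.
pub struct DragSource {
    offers: Vec<(String, Box<dyn Fn() -> Vec<u8>>)>,
}

impl DragSource {
    pub fn new() -> Self {
        Self { offers: Vec::new() }
    }

    /// Advertise a MIME type together with a closure that can produce it.
    pub fn offer(&mut self, mime: &str, produce: impl Fn() -> Vec<u8> + 'static) {
        self.offers.push((mime.to_string(), Box::new(produce)));
    }

    /// The list sent to the other end before any data exists.
    pub fn mime_types(&self) -> Vec<&str> {
        self.offers.iter().map(|(m, _)| m.as_str()).collect()
    }

    /// Called when the target picks a type; may be called several times,
    /// for different types, for as long as the offer is alive.
    pub fn send(&self, mime: &str) -> Option<Vec<u8>> {
        self.offers.iter().find(|(m, _)| m == mime).map(|(_, produce)| produce())
    }
}

fn main() {
    let image_encodes = Rc::new(Cell::new(0));
    let counter = Rc::clone(&image_encodes);

    let mut source = DragSource::new();
    source.offer("text/plain;charset=utf-8", || b"a red square".to_vec());
    source.offer("image/png", move || {
        counter.set(counter.get() + 1); // the expensive encode happens only here
        vec![0x89, b'P', b'N', b'G']
    });

    println!("advertised: {:?}", source.mime_types());
    // The target picks text, so no image is ever encoded.
    let text = source.send("text/plain;charset=utf-8").unwrap();
    println!("sent {} bytes, image encodes: {}", text.len(), image_encodes.get());
}
```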

@eira-fransham (Author) commented May 21, 2026

@kchibisov Thanks for your response. I appreciate you clarifying the way that data transfer (i.e. clipboard and drag-and-drop) works, but it might be worth reading through the PR since I have an extensive doc comment explaining the exact concerns that you bring up and how the API addresses them. In particular:

  • The API already handles type multiplexing. That is not Wayland-specific; it's done on all platforms.
  • The API is already lazy and asynchronous (in the general sense, not in the async-keyword sense). See the docs for fetch_data_transfer + the DataTransferResult event

The implementation in this PR only addresses drag-and-drop, but it is specifically designed to support clipboard operations in the future. Clipboard operations are planned in a follow-up PR, and only left unimplemented for now for two reasons:

  1. I wanted to focus on the new API rather than adding too much new implementation (see also the note about removing the Wayland implementation in the PR description). This is already going to be a reasonably large PR, and I didn't want to overload the winit team.
  2. The most pressing concern from Slint's side is drag-and-drop. Implementing the clipboard via a side-channel is a lot simpler and more self-contained than trying to do so for drag-and-drop.

The type hierarchy is like so:

  • DataTransfer - the offer of multiple typed views of some data (clipboard or drag-and-drop). None of the data is guaranteed to be resolved at this time. Corresponds to wl_data_offer, NSPasteboard, Windows.ApplicationModel.DataTransfer.DataPackage, etc.
  • TransferType + TypeHint - the platform-specific and cross-platform representation of a type, respectively. In most cases, TypeHint is enough. It covers the set of advertisable types which have some equivalent in the data transfer mechanism on every platform.
  • TypedData (could potentially do with a better name) - the data resulting from the resolved request for some specific type of a DataTransfer. Like with DataTransfer and TransferType, this can be interacted with generically or downcast to a platform-specific type.
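A minimal std-only sketch of how one level of that hierarchy could support both generic use and downcasting to a platform type, assuming an `as_any` escape hatch on the trait. The `bytes` method, `X11TypedData` and its `selection_atom` field are illustrative assumptions, not the PR's signatures; the same pattern would apply to `DataTransfer` and `TransferType`.

```rust
use std::any::Any;

/// Cross-platform view of resolved data (hypothetical signature).
pub trait TypedData {
    fn bytes(&self) -> &[u8];
    /// Escape hatch to the platform-specific type.
    fn as_any(&self) -> &dyn Any;
}

/// Illustrative platform type carrying extra X11-only information.
pub struct X11TypedData {
    pub selection_atom: u32,
    pub data: Vec<u8>,
}

impl TypedData for X11TypedData {
    fn bytes(&self) -> &[u8] {
        &self.data
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Uses the generic path, plus platform details only if the downcast succeeds.
pub fn describe(data: &dyn TypedData) -> String {
    match data.as_any().downcast_ref::<X11TypedData>() {
        Some(x11) => format!("{} bytes via atom {}", x11.data.len(), x11.selection_atom),
        None => format!("{} bytes", data.bytes().len()),
    }
}

fn main() {
    let data: Box<dyn TypedData> = Box::new(X11TypedData { selection_atom: 42, data: b"hi".to_vec() });
    println!("{}", describe(data.as_ref()));
}
```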

The API flow as-implemented by this PR is like so:

  • Instead of receiving a parsed value (as in the design on master), the user just gets an ID.
  • The user can use that ID to request the types of the data transfer synchronously, and/or request a certain type of the data of the transfer asynchronously.
  • The user can accept or reject a drag operation by using that same DataTransferId. Even though a DataTransferId could also refer to clipboard data, I figured it was better in this case to just return an error if given the ID of a data transfer that isn't part of a drag operation. Alternatively, a separate DragOperationId could be introduced to avoid the overloading, but I don't see a reason to do so.
  • When the fetch of some type in a data transfer has completed, a dyn TypedData is passed using the DataTransferResult event. The user may read this TypedData using the specified data format, with helpers for plaintext and URI lists since those are special-cased on some platforms and have platform-specific encoding (UTF-8 vs UTF-16) that users should not have to handle manually.
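The ID-based flow above could be modelled roughly as follows. Apart from the `DataTransferId` name, everything here (`Transfers`, `register`, `accept_drag`, `TransferError`) is a hypothetical stand-in; the point is only that one ID serves both for querying types synchronously and for accepting a drag, and that accepting a non-drag transfer is an error.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DataTransferId(u64);

#[derive(Debug, PartialEq)]
pub enum TransferError {
    UnknownId,
    NotADrag,
}

struct Transfer {
    mime_types: Vec<String>,
    is_drag: bool,
    accepted: Option<bool>,
}

#[derive(Default)]
pub struct Transfers {
    map: HashMap<DataTransferId, Transfer>,
}

impl Transfers {
    /// The backend registers an offer; the user only ever sees the ID.
    pub fn register(&mut self, id: DataTransferId, mime_types: &[&str], is_drag: bool) {
        let mime_types = mime_types.iter().map(|m| m.to_string()).collect();
        self.map.insert(id, Transfer { mime_types, is_drag, accepted: None });
    }

    /// Synchronous: listing the offered types never stalls.
    pub fn types(&self, id: DataTransferId) -> Result<&[String], TransferError> {
        self.map.get(&id).map(|t| t.mime_types.as_slice()).ok_or(TransferError::UnknownId)
    }

    /// Accept or reject a drag; clipboard IDs are rejected with an error.
    pub fn accept_drag(&mut self, id: DataTransferId, accept: bool) -> Result<(), TransferError> {
        let t = self.map.get_mut(&id).ok_or(TransferError::UnknownId)?;
        if !t.is_drag {
            return Err(TransferError::NotADrag);
        }
        t.accepted = Some(accept);
        Ok(())
    }

    pub fn accepted(&self, id: DataTransferId) -> Option<bool> {
        self.map.get(&id).and_then(|t| t.accepted)
    }
}

fn main() {
    let mut transfers = Transfers::default();
    let drag = DataTransferId(1);
    let clipboard = DataTransferId(2);
    transfers.register(drag, &["text/uri-list", "text/plain"], true);
    transfers.register(clipboard, &["text/plain"], false);

    let accept = transfers.types(drag).unwrap().iter().any(|m| m == "text/uri-list");
    transfers.accept_drag(drag, accept).unwrap();
    println!("{:?}", transfers.accept_drag(clipboard, true));
}
```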

Note that we cannot just return a potentially-blocking io::Read impl when fetching a specific type, as at least on X11 the data is transferred as part of the event loop. TypedData must have the invariant that directly reading it on the event loop may stall the application but can never cause a deadlock. That's also why types can be fetched synchronously - all platforms support reading the types offered by another application without stalling the event loop.

Regarding moving fetch_data_transfer (+ the other data transfer related stuff) to the event loop: the event loop feels like the most-natural place for it, but X11 handles selection transfer on a per-window basis rather than per-event-loop, so putting it on the window was the lowest common denominator w.r.t. cross-platform use. I believe that X11 is the only platform that does it this way though, so I wouldn't be against putting it on the event loop instead.

@kchibisov (Member)

> Regarding moving fetch_data_transfer (+ the other data transfer related stuff) to the event loop: the event loop feels like the most-natural place for it, but X11 handles selection transfer on a per-window basis rather than per-event-loop, so putting it on the window was the lowest common denominator w.r.t. cross-platform use. I believe that X11 is the only platform that does it this way though, so I wouldn't be against putting it on the event loop instead.

But we can pass a WindowId to say which window the callback should be delivered to. The clipboard is the same: it cannot work without a window on Wayland, but the transfer itself is still event-loop based. The window could be an implementation detail of a transfer. On X11, people have tended to use a hidden, empty utility window (certainly for the clipboard, though it's a bit slow).

Also, X11 is nearly dead, so there's no point in designing around it.

@eira-fransham (Author)

Yeah, as of the latest few commits I’ve moved the whole drag-and-drop API to the event loop. Unfortunately, fetching still needs to be done asynchronously to support X11; there’s no real way around that without running the risk of deadlocking the event loop. The user can always try to read the data immediately, without waiting for the DataTransferResult event, and that’ll work on most platforms.

I’ve also updated the dnd example to show how to use the new API.

@kchibisov (Member) left a comment

Generally, using an event will be fine, I guess. The API should certainly be async; sync won't work on Wayland either.

Comment thread on winit/examples/dnd.rs (outdated)

self.last_dnd_fetch = None;
},
WindowEvent::DataTransferResult { serial, .. } => {
@kchibisov (Member) commented May 22, 2026

But do we really need it as an event, rather than a series of callbacks on ApplicationTrait? That way you could make this async API safe to use, and also avoid cloning data around in some potential cases.

@kchibisov (Member) commented May 22, 2026

For example, on Wayland you'll get a transfer ID plus a bunch of MIME types; the user then requests e.g. UTF-8, and the event loop slowly reads it and calls back to the user with the type it requested, or signals an error if reading the data failed (the other end died during the transaction). Or maybe it just calls back with e.g. a Vec<u8> plus the hint and the transfer the user used, so that all conversions are moved to the user of the API.

That way we don't need to enforce UTF-8, I'd prefer winit to be non-opinionated here and just pass data around as it goes.

@eira-fransham (Author) commented May 22, 2026

So w.r.t. the UTF-8 side of things: that is already how it's implemented in winit on master. The cross-platform interface converts everything to UTF-8, since Rust strings are UTF-8. With the design in this PR, the user can downcast to the platform-specific types if they want to access something that isn't UTF-8 strings. That being said, I'm not happy with how the UTF-16 detection works inside the X11 implementation, that's still a work in progress.

It's not really possible to just unconditionally return a vector of bytes, or at least it's not very helpful to do so. On Wayland and X11, all clipboard/dnd data is ultimately just a binary blob, but that's simply not true for Windows or macOS. In particular, file URIs on both Windows and macOS are an array of individual items, each representing a single path. Even if all types could be exposed as a binary blob cross-platform, the encoding of strings differs between platforms. The user can't even just use OsStr or something like that, since on X11 it's not guaranteed that strings will be UTF-8; Firefox transfers HTML as UTF-16, for example. Images can be exposed as raw bytes in a known format on every platform, and I believe the same is true for audio, although I'm not totally sure about macOS. Of the types that are supported cross-platform (the ones in TypeHint), AFAIK the only ones requiring special handling for cross-platform support are strings and file URIs, which is why they have special-cased helper methods in TypedData.
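To illustrate the kind of special-casing the string and file-URI helpers have to do, here is a std-only sketch that decodes text which may be UTF-8 or BOM-prefixed UTF-16, and splits a `text/uri-list` payload into URIs (per RFC 2483, lines are CRLF-separated and lines starting with `#` are comments). Relying on a byte-order mark is a deliberately simplistic assumption; real-world UTF-16 payloads may not carry one.

```rust
/// Decode UTF-16 code units in the given byte order.
fn decode_utf16(body: &[u8], little_endian: bool) -> Option<String> {
    if body.len() % 2 != 0 {
        return None; // truncated code unit
    }
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|c| {
            let pair = [c[0], c[1]];
            if little_endian { u16::from_le_bytes(pair) } else { u16::from_be_bytes(pair) }
        })
        .collect();
    String::from_utf16(&units).ok()
}

/// Decode text that may be UTF-8 or UTF-16 with a byte-order mark.
/// Assumption: UTF-16 payloads start with a BOM (0xFF 0xFE or 0xFE 0xFF,
/// neither of which is valid UTF-8, so the check is unambiguous).
pub fn decode_text(bytes: &[u8]) -> Option<String> {
    match bytes {
        [0xFF, 0xFE, rest @ ..] => decode_utf16(rest, true),
        [0xFE, 0xFF, rest @ ..] => decode_utf16(rest, false),
        [0xEF, 0xBB, 0xBF, rest @ ..] => String::from_utf8(rest.to_vec()).ok(),
        _ => String::from_utf8(bytes.to_vec()).ok(),
    }
}

/// Split a `text/uri-list` payload into URIs, skipping comments and blanks.
pub fn parse_uri_list(text: &str) -> Vec<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .collect()
}

fn main() {
    // "hi" as UTF-16LE with a BOM.
    let utf16 = [0xFF, 0xFE, b'h', 0, b'i', 0];
    println!("{:?}", decode_text(&utf16));
    let list = "# dragged files\r\nfile:///tmp/a.txt\r\nfile:///tmp/b.txt\r\n";
    println!("{:?}", parse_uri_list(list));
}
```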

I've been talking with my colleagues, and I believe that it's not necessary to have this DataTransferResult event. It's only necessary on X11 to prevent deadlocking the event thread, but the way that Qt handles it is simply to drive the event loop inside their equivalent of fetch_data_transfer. It feels kinda gross and I wish it wasn't necessary, but the X11 spec mandates that the XConvertSelection message is handled synchronously, followed by sending SelectionNotify synchronously, so we can peek the upcoming messages inside fetch_data_transfer and if the source application sends some other non-SelectionNotify message instead then we can just return an error and return to the regular event loop.
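The Qt-style workaround can be sketched as a fetch that peeks at the pending X11 events and only consumes the matching `SelectionNotify`, returning an error (and leaving the event queued for the regular loop) if anything else arrives first. The `XEvent` enum and the in-memory queue are hypothetical stand-ins for a real X connection; a real implementation would also wait with a timeout instead of giving up when the queue is empty.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the X events relevant here.
#[derive(Debug, PartialEq)]
pub enum XEvent {
    SelectionNotify { requestor: u32, data: Vec<u8> },
    Other(&'static str),
}

#[derive(Debug, PartialEq)]
pub enum FetchError {
    /// Some other event arrived first; fall back to the regular event loop.
    Interrupted,
    /// Nothing pending (a real implementation would wait with a timeout).
    NoReply,
}

/// Peek at the next event; consume it only if it is our `SelectionNotify`.
pub fn fetch_selection(queue: &mut VecDeque<XEvent>, window: u32) -> Result<Vec<u8>, FetchError> {
    let is_reply = match queue.front() {
        Some(XEvent::SelectionNotify { requestor, .. }) => *requestor == window,
        Some(XEvent::Other(_)) => false,
        None => return Err(FetchError::NoReply),
    };
    if !is_reply {
        // Leave the event in place so the normal loop still dispatches it.
        return Err(FetchError::Interrupted);
    }
    match queue.pop_front() {
        Some(XEvent::SelectionNotify { data, .. }) => Ok(data),
        _ => unreachable!("front() was checked above"),
    }
}

fn main() {
    let mut queue = VecDeque::from(vec![XEvent::SelectionNotify { requestor: 1, data: b"text".to_vec() }]);
    println!("{:?}", fetch_selection(&mut queue, 1));
    queue.push_back(XEvent::Other("MotionNotify"));
    println!("{:?}", fetch_selection(&mut queue, 1)); // interrupted; event stays queued
}
```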

For every other platform, the event is unnecessary. While all other platforms have some way of sending data asynchronously, none of them require actually driving the event loop to do so, so the application can read the TypedData directly in the event loop without deadlocking. It'll still potentially stall the event loop, but that's fine. If they don't want to stall, they can transfer the TypedData to another thread and handle it there.

@kchibisov (Member)

But Wayland mandates it? You need to read the pipe asynchronously, the pipe itself is negotiated asynchronously as well, and the other end can even die in the middle of the negotiation. You also need async when you send data to yourself, because you'll overflow the channel and need to release the reader/writer (hence non-blocking IO is mandatory, and so is driving the IO reads/writes from the event loop).
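The "sending data to yourself" hazard can be shown on any Unix system, using a std `UnixStream` pair as a stand-in for the Wayland pipe (an assumption; Wayland passes a pipe FD). With both ends non-blocking, a single thread has to alternate writing and reading, the way an event loop would on readiness; a blocking write of a payload larger than the kernel buffer would never return, because the only reader is the same thread.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;

/// Transfer `payload` from one end to the other on a single thread,
/// interleaving non-blocking writes and reads.
pub fn self_transfer(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let (writer, mut reader) = UnixStream::pair()?;
    writer.set_nonblocking(true)?;
    reader.set_nonblocking(true)?;

    let mut writer = Some(writer);
    let mut written = 0;
    let mut received = Vec::with_capacity(payload.len());
    let mut buf = [0u8; 4096];

    while received.len() < payload.len() {
        // Write as much as the socket buffer accepts, then move on.
        if let Some(w) = writer.as_mut() {
            match w.write(&payload[written..]) {
                Ok(n) => written += n,
                Err(e) if e.kind() == ErrorKind::WouldBlock => {}
                Err(e) => return Err(e),
            }
        }
        if written == payload.len() {
            writer = None; // dropping closes the socket, so the reader sees EOF
        }
        // Drain whatever is readable so the writer can make progress.
        match reader.read(&mut buf) {
            Ok(0) => break, // EOF
            Ok(n) => received.extend_from_slice(&buf[..n]),
            Err(e) if e.kind() == ErrorKind::WouldBlock => {}
            Err(e) => return Err(e),
        }
    }
    Ok(received)
}

fn main() -> std::io::Result<()> {
    // 1 MiB is larger than a typical socket buffer, so a single
    // blocking write on this thread would never complete.
    let payload = vec![0xABu8; 1 << 20];
    let received = self_transfer(&payload)?;
    assert_eq!(received, payload);
    println!("transferred {} bytes without deadlocking", received.len());
    Ok(())
}
```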

Like I'm not sure how you'd do anything sync on Wayland.

@eira-fransham (Author)

I mean, if you block on sending data to yourself then the application or UI framework developer has complete control over all the code in question and I’d argue it’s not winit's job to prevent that bug. The DataTransferResult event doesn’t even prevent that bug on Wayland since it only advertises that the data is ready to start reading, not that all data has been received. We should probably mention the chance of deadlocking in the docs, though.

Wayland has an async interface (as do all platforms, including macOS and Windows) but, unlike X11, it doesn’t require driving the event loop to receive data. You send a file descriptor to the source application and the source application writes to it. The fact that X11 requires driving the event loop is the only reason that the event is necessary, so I think that it should be up to the user whether the data transfer is read synchronously or asynchronously.

@kchibisov (Member)

But why use a sync interface in the first place if it's error-prone, when we can do async, which is safer to use? 🤔 I guess so that users don't have to wait for the transaction to complete?

@eira-fransham (Author) commented May 25, 2026

The event that this thread is in reference to doesn't solve the issues mentioned above. The only issue it solves is specifically needing to drive the event loop between requesting a type from a data transfer and being able to access it, which is only necessary on X11 and has a workaround (that Qt uses, so clearly it's at least a functional solution). I'm completely in favour of having async IO methods on TypedData, e.g. try_async_read(&self) -> Option<Box<dyn AsyncRead>>, which would solve the issues you mention, but I think those methods should coexist with sync equivalents because of function colouring.

@eira-fransham (Author) commented May 26, 2026

So as of 98106b8, I've changed X11's weird behaviour to just result in runtime errors instead of leaking them into the API. Now the API is non-blocking, meaning error-returning methods can return io::ErrorKind::WouldBlock instead of waiting. If the user explicitly wants to block, they can call wait_for_data, which will return io::ErrorKind::Deadlock if waiting would deadlock on that platform (i.e. on X11).
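From the caller's side, the non-blocking contract described above might look like the following. `PendingData` and `poll_read` are hypothetical stand-ins for a platform `TypedData` reader whose bytes arrive over time; the caller treats `io::ErrorKind::WouldBlock` as "try again on a later event-loop iteration" rather than waiting.

```rust
use std::io::{self, Read};

/// Hypothetical reader: the backend pushes bytes in as they arrive.
#[derive(Default)]
pub struct PendingData {
    buffered: Vec<u8>,
    finished: bool,
}

impl PendingData {
    pub fn push(&mut self, chunk: &[u8], last: bool) {
        self.buffered.extend_from_slice(chunk);
        self.finished |= last;
    }
}

impl Read for PendingData {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.buffered.is_empty() {
            return if self.finished {
                Ok(0) // end of data
            } else {
                Err(io::ErrorKind::WouldBlock.into()) // not fatal: come back later
            };
        }
        let n = buf.len().min(self.buffered.len());
        buf[..n].copy_from_slice(&self.buffered[..n]);
        self.buffered.drain(..n);
        Ok(n)
    }
}

/// Called once per event-loop iteration; returns `true` once all data is read.
pub fn poll_read(reader: &mut impl Read, out: &mut Vec<u8>) -> io::Result<bool> {
    let mut buf = [0u8; 1024];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(true),
            Ok(n) => out.extend_from_slice(&buf[..n]),
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => return Ok(false),
            Err(e) => return Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    let mut data = PendingData::default();
    let mut out = Vec::new();
    assert!(!poll_read(&mut data, &mut out)?); // nothing yet: WouldBlock
    data.push(b"file:///tmp/a.txt", true); // the backend delivers the rest
    assert!(poll_read(&mut data, &mut out)?);
    println!("{}", String::from_utf8_lossy(&out));
    Ok(())
}
```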

This means that, so long as the user correctly handles errors, they have a path to support all backends, without needing to add new events to the API that only make sense on X11. This design has all the benefits of the previous event-driven design, but avoids leaking X11's weirdness into the API.

An async interface is still desirable, but since the rest of winit is still sync, I think that's best left for a follow-up PR (or for an external crate, since the per-platform APIs are still accessible with this design). I think writing a Future/Stream/AsyncRead-based API for each platform is out of scope for this PR; custom Future implementations are extremely hard to get correct. As far as I can tell, the only platform that could easily return an AsyncRead for clipboard/drag-and-drop data is Wayland, since it just uses file handles. On every other platform it would have to be custom-implemented.

@eira-fransham (Author) commented May 26, 2026

A colleague pointed out that it's probably better to make wait_for_data an internal implementation detail of the X11 SelectionReader and remove it from the cross-platform trait. The user doesn't really have a reason to handle the WouldBlock (from the reader) and Deadlock (from wait_for_data) error cases differently and it's only a major problem on X11. On Wayland we choose how the file descriptor is implemented, so if there's a risk of deadlocks that's something that winit can solve internally.

This commit also adds deadlock detection to X11, since on that platform the event loop needs to be polled in order for the selection to be received.

3 participants