|  | .. SPDX-License-Identifier: GPL-2.0 | 
|  | .. _iomap_porting: | 
|  |  | 
|  | .. | 
|  | Dumb style notes to maintain the author's sanity: | 
|  | Please try to start sentences on separate lines so that | 
|  | sentence changes don't bleed colors in diff. | 
|  | Heading decorations are documented in sphinx.rst. | 
|  |  | 
|  | ======================= | 
|  | Porting Your Filesystem | 
|  | ======================= | 
|  |  | 
|  | .. contents:: Table of Contents | 
|  | :local: | 
|  |  | 
|  | Why Convert? | 
|  | ============ | 
|  |  | 
|  | There are several reasons to convert a filesystem to iomap: | 
|  |  | 
|  | 1. The classic Linux I/O path is not terribly efficient. | 
|  | Pagecache operations lock a single base page at a time and then call | 
|  | into the filesystem to return a mapping for only that page. | 
|  | Direct I/O operations build I/O requests a single file block at a | 
|  | time. | 
|  | This worked well enough for direct/indirect-mapped filesystems such | 
|  | as ext2, but is very inefficient for extent-based filesystems such | 
|  | as XFS. | 
|  |  | 
|  | 2. Large folios are only supported via iomap; there are no plans to | 
|  | convert the old buffer_head path to use them. | 
|  |  | 
|  | 3. Direct access to storage on memory-like devices (fsdax) is only | 
|  | supported via iomap. | 
|  |  | 
|  | 4. Lower maintenance overhead for individual filesystem maintainers. | 
|  | iomap handles common pagecache related operations itself, such as | 
|  | allocating, instantiating, locking, and unlocking of folios. | 
|  | No ->write_begin(), ->write_end() or direct_IO | 
|  | address_space_operations are required to be implemented by | 
|  | filesystem using iomap. | 
|  |  | 
|  | How Do I Convert a Filesystem? | 
|  | ============================== | 
|  |  | 
|  | First, add ``#include <linux/iomap.h>`` from your source code and add | 
|  | ``select FS_IOMAP`` to your filesystem's Kconfig option. | 
|  | Build the kernel, run fstests with the ``-g all`` option across a wide | 
|  | variety of your filesystem's supported configurations to build a | 
|  | baseline of which tests pass and which ones fail. | 
|  |  | 
|  | The recommended approach is first to implement ``->iomap_begin`` (and | 
|  | ``->iomap_end`` if necessary) to allow iomap to obtain a read-only | 
|  | mapping of a file range. | 
|  | In most cases, this is a relatively trivial conversion of the existing | 
|  | ``get_block()`` function for read-only mappings. | 
|  | ``FS_IOC_FIEMAP`` is a good first target because it is trivial to | 
|  | implement support for it and then to determine that the extent map | 
|  | iteration is correct from userspace. | 
|  | If FIEMAP is returning the correct information, it's a good sign that | 
|  | other read-only mapping operations will do the right thing. | 
|  |  | 
|  | Next, modify the filesystem's ``get_block(create = false)`` | 
|  | implementation to use the new ``->iomap_begin`` implementation to map | 
|  | file space for selected read operations. | 
|  | Hide behind a debugging knob the ability to switch on the iomap mapping | 
|  | functions for selected call paths. | 
|  | It is necessary to write some code to fill out the bufferhead-based | 
|  | mapping information from the ``iomap`` structure, but the new functions | 
|  | can be tested without needing to implement any iomap APIs. | 
|  |  | 
|  | Once the read-only functions are working like this, convert each high | 
|  | level file operation one by one to use iomap native APIs instead of | 
|  | going through ``get_block()``. | 
|  | Done one at a time, regressions should be self evident. | 
|  | You *do* have a regression test baseline for fstests, right? | 
|  | It is suggested to convert swap file activation, ``SEEK_DATA``, and | 
|  | ``SEEK_HOLE`` before tackling the I/O paths. | 
|  | A likely complexity at this point will be converting the buffered read | 
|  | I/O path because of bufferheads. | 
|  | The buffered read I/O paths doesn't need to be converted yet, though the | 
|  | direct I/O read path should be converted in this phase. | 
|  |  | 
|  | At this point, you should look over your ``->iomap_begin`` function. | 
|  | If it switches between large blocks of code based on dispatching of the | 
|  | ``flags`` argument, you should consider breaking it up into | 
|  | per-operation iomap ops with smaller, more cohesive functions. | 
|  | XFS is a good example of this. | 
|  |  | 
|  | The next thing to do is implement ``get_blocks(create == true)`` | 
|  | functionality in the ``->iomap_begin``/``->iomap_end`` methods. | 
|  | It is strongly recommended to create separate mapping functions and | 
|  | iomap ops for write operations. | 
|  | Then convert the direct I/O write path to iomap, and start running fsx | 
|  | w/ DIO enabled in earnest on filesystem. | 
|  | This will flush out lots of data integrity corner case bugs that the new | 
|  | write mapping implementation introduces. | 
|  |  | 
|  | Now, convert any remaining file operations to call the iomap functions. | 
|  | This will get the entire filesystem using the new mapping functions, and | 
|  | they should largely be debugged and working correctly after this step. | 
|  |  | 
|  | Most likely at this point, the buffered read and write paths will still | 
|  | need to be converted. | 
|  | The mapping functions should all work correctly, so all that needs to be | 
|  | done is rewriting all the code that interfaces with bufferheads to | 
|  | interface with iomap and folios. | 
|  | It is much easier first to get regular file I/O (without any fancy | 
|  | features like fscrypt, fsverity, compression, or data=journaling) | 
|  | converted to use iomap. | 
|  | Some of those fancy features (fscrypt and compression) aren't | 
|  | implemented yet in iomap. | 
|  | For unjournalled filesystems that use the pagecache for symbolic links | 
|  | and directories, you might also try converting their handling to iomap. | 
|  |  | 
|  | The rest is left as an exercise for the reader, as it will be different | 
|  | for every filesystem. | 
|  | If you encounter problems, email the people and lists in | 
|  | ``get_maintainers.pl`` for help. |