| Idle/background work classes design doc: |
| |
| Right now, our behaviour at idle isn't ideal, it was designed for servers that |
| would be under sustained load, to keep pending work at a "medium" level, to |
| let work build up so we can process it in more efficient batches, while also |
| giving headroom for bursts in load. |
| |
| But for desktops or mobile - scenarios where work is less sustained and power |
| usage is more important - we want to operate differently, with a "rush to |
| idle" so the system can go to sleep. We don't want to be dribbling out |
| background work while the system should be idle. |
| |
| The complicating factor is that there are a number of background tasks, which |
| form a heirarchy (or a digraph, depending on how you divide it up) - one |
| background task may generate work for another. |
| |
| Thus proper idle detection needs to model this heirarchy. |
| |
| - Foreground writes |
| - Page cache writeback |
| - Copygc, rebalance |
| - Journal reclaim |
| |
| When we implement idle detection and rush to idle, we need to be careful not |
| to disturb too much the existing behaviour that works reasonably well when the |
| system is under sustained load (or perhaps improve it in the case of |
| rebalance, which currently does not actively attempt to let work batch up). |
| |
| SUSTAINED LOAD REGIME |
| --------------------- |
| |
| When the system is under continuous load, we want these jobs to run |
| continuously - this is perhaps best modelled with a P/D controller, where |
| they'll be trying to keep a target value (i.e. fragmented disk space, |
| available journal space) roughly in the middle of some range. |
| |
| The goal under sustained load is to balance our ability to handle load spikes |
| without running out of x resource (free disk space, free space in the |
| journal), while also letting some work accumululate to be batched (or become |
| unnecessary). |
| |
| For example, we don't want to run copygc too aggressively, because then it |
| will be evacuating buckets that would have become empty (been overwritten or |
| deleted) anyways, and we don't want to wait until we're almost out of free |
| space because then the system will behave unpredicably - suddenly we're doing |
| a lot more work to service each write and the system becomes much slower. |
| |
| IDLE REGIME |
| ----------- |
| |
| When the system becomes idle, we should start flushing our pending work |
| quicker so the system can go to sleep. |
| |
| Note that the definition of "idle" depends on where in the heirarchy a task |
| is - a task should start flushing work more quickly when the task above it has |
| stopped generating new work. |
| |
| e.g. rebalance should start flushing more quickly when page cache writeback is |
| idle, and journal reclaim should only start flushing more quickly when both |
| copygc and rebalance are idle. |
| |
| It's important to let work accumulate when more work is still incoming and we |
| still have room, because flushing is always more efficient if we let it batch |
| up. New writes may overwrite data before rebalance moves it, and tasks may be |
| generating more updates for the btree nodes that journal reclaim needs to flush. |
| |
| On idle, how much work we do at each interval should be proportional to the |
| length of time we have been idle for. If we're idle only for a short duration, |
| we shouldn't flush everything right away; the system might wake up and start |
| generating new work soon, and flushing immediately might end up doing a lot of |
| work that would have been unnecessary if we'd allowed things to batch more. |
| |
| To summarize, we will need: |
| |
| - A list of classes for background tasks that generate work, which will |
| include one "foreground" class. |
| - Tracking for each class - "Am I doing work, or have I gone to sleep?" |
| - And each class should check the class above it when deciding how much work to issue. |