• mina86.com

  • Categories
  • Code
  • Contact
  • Rust’s worst feature*

    * available in Rust nightly.

    There are several aspects of Rust that I’m not particularly fond of but the one that takes the cake is core::io::BorrowedBuf which I despise with passion. It’s a nightly feature which puts in question my extreme emotions about it. On the other hand it means there’s time to stop it from getting stabilised and figure out something better.

    In this article I’ll describe the problem the feature addresses, the issues I have with the solution and describe some alternatives. As it turns out, things aren’t as easy as they seem on the first look.

    The problem

    Consider the slow_copy routine below which copies data between two I/O streams. On each iteration of the loop, it zero-initialises the buffer which wastes time considering that read will override the data. The compiler doesn’t know that and has no choice but to fill the array with zeros each time. Even an obvious optimisation of moving the buffer declaration outside of the loop isn’t available to the compiler.

    fn slow_copy(
      mut rd: impl std::io::Read,
      mut wr: impl std::io::Write,
    ) -> std::io::Result<()> {
      loop {
        let mut buf = [0; 4096];
        let read = rd.read(&mut buf)?;
        if read == 0 {
          break Ok(());
        }
        wr.write_all(&buf[..read])?;
      }
    }

    An attempt at a solution is to use MaybeUninit which makes it possible to declare a region of uninitialised memory. Some explicit pointer casting is necessary, but otherwise code using it is straightforward.

    use core::mem::MaybeUninit;
    
    pub fn unsound_copy(
        mut rd: impl std::io::Read
        mut wr: impl std::io::Write,
    ) -> std::io::Result<()> {
      loop {
        let mut buf = [MaybeUninit::<u8>::uninit(); 4096];
        // SAFETY: This is unsound.
        // For demonstration purposes only.
        let buf = unsafe {
          &mut *(&mut buf as *mut [_] as *mut [u8])
        };
        let read = rd.read(buf)?;
        if read == 0 {
          break Ok(());
        }
        wr.write_all(&buf[..read])?;
      }
    }

    While replacing the array of zeros by an array of uninitialised values may work in specific circumstances, the code is unsound. Change to the compiler, its options, modification of unrelated parts of the code or using the function for a different Read trait implementation may break the program in unpredictable ways.

    The BorrowedBuf

    The solution in nightly Rust is the BorrowedBuf struct. It’s a bytes slice which remembers how much of it has been initialised. It doesn’t own the memory and operates on a borrowed buffer (hence the name). It can point at an array on the stack or a slice living on the heap (such as Vec’s spare capacity). A naïve use of the feature is the following:

    #![feature(core_io_borrowed_buf, read_buf)]
    
    use core::io::BorrowedBuf;
    use core::mem::MaybeUninit;
    
    fn almost_good_copy(
        mut rd: impl std::io::Read,
        mut wr: impl std::io::Write,
    ) -> std::io::Result<()> {
      loop {
        let mut buf = [MaybeUninit::uninit(); 4096];
        let mut buf = BorrowedBuf::from(&mut buf[..]);
        rd.read_buf(buf.unfilled())?;
        if buf.len() == 0 {
          break Ok(());
        }
        wr.write_all(buf.filled())?;
      }
    }

    Issues with the BorrowedBuf

    While almost_good_copy appears to work as expected, BorrowedBuf isn’t without its share of problems. I describe them below.

    Optionality

    The BorrowedBuf does not seamlessly integrate with existing Rust code. In fact quite the opposite. APIs need to support it explicitly. For example, many third-party Read implementations do not provide read_buf method. In its absence, the default version initialises the memory and calls read negating any potential benefits of BorrowedBuf.

    Similarly, functions which take output slice as an argument — such as rand’s RngCore::fill_bytes — could benefit from being able to write to uninitialised memory. However, to offer that benefit, they need to be changed to support BorrowedBuf. A motivated programmer can try adding necessary support to actively maintained packages, like rand, but what if one is stuck at an older version of the crate or deals with apparently abandoned crates like hex or base64. To support those cases, forking would be necessary leading the programmer towards deeper circles of dependency hell.

    Ergonomics

    Then again, should functions such as fill_bytes integrate with BorrowedBuf in the first place instead of taking an &mut [MaybeUninit<u8>] argument? The issue with the latter is that there’s no safe way to convert &mut [T] into &mut [MaybeUninit<T>].1 As such, users who so far happily used such functions with regular initialised buffers would need convoluted incantation to make their previously straightforward code to compile. Meanwhile, creating BorrowedBuf is somewhat convenient and can be done from initialised and uninitialised buffers alike.

    Lack of generality

    In addition to RngCore::fill_bytes, the rand crate offers Rng::fill method which fills a generic slice of integers with random data. It could easily work with BorrowedBuf except that the struct works on u8 slices only. As a result, a crate which deals with different types cannot consistently use BorrowedBuf.

    I don’t know the reasons why BorrowedBuf is not generic. It’s possible that its design focused on addressing the the Read trait use case only. Complications around dealing with Drop types could have been a contributing factor. However, even then the type could be generic on Copy types.

    Easy of misuse

    Read::read_buf being optional brings another problem. Without full understanding of the behaviour and interactions of the BorrowedBuf type, it’s easy to misuse it such as in almost_good_copy. One can be excused from assuming that the function eliminates unnecessary initialisation. It declares an uninitialised region, wraps it in BorrowedBuf and reads data into it. Even inspection of the assembly output shows lack of the memset call.

    Alas, while almost_good_copy avoids memory initialisation when reading data from a File, it wastes time zeroing the buffer when, for example, decompressing data with help of flate2 crate (which does not offer custom read_buf method) effectively becoming a slow_copy.

    Unless the underlying type is known, the programmer must assume that read_buf may resort to filling the memory. The proper use of BorrowedBuf is to construct it only once so that it can remember that the memory has been initialised.

    #![feature(core_io_borrowed_buf, read_buf)]
    
    use core::io::BorrowedBuf;
    use core::mem::MaybeUninit;
    
    fn copy(
      mut rd: impl std::io::Read,
      mut wr: impl std::io::Write,
    ) -> std::io::Result<()> {
      let mut buf = [MaybeUninit::uninit(); 4096];
      let mut buf = BorrowedBuf::from(&mut buf[..]);
      loop {
        buf.clear();
        rd.read_buf(buf.unfilled())?;
        if buf.len() == 0 {
          break Ok(());
        }
        wr.write_all(buf.filled())?;
      }
    }

    Complexity

    With BorrowedBuf’s complexity it’s not hard to imagine why people might use it in inefficient way. The struct is harder to understand than the unsound casting in unsound_copy. This may lead people to use the more straightforward option even if it’s not correct. An analogy to a Vec<u8> with its contents and spare capacity partially helps — a BorrowedBuf has analogous filled and unfilled parts — but is an oversimplified view. A BorrowedBuf is also split into initialised and uninitialised parts. The documentation visualises it as follows:

    Capacity
    FilledUnfilled
    InitialisedUninitialised

    There are reasons for this madness. Consider loop in the copy function above. If BorrowedBuf only knew how much of it was filled, each call to buf.clear() would lose the information about memory being initialised. In the default implementation of read_buf it would need to unnecessarily zero the whole buffer. Separately storing information about how much of the buffer has been filled and initialised, let the type avoid double-initialisation of memory.

    Alternative model

    As an aside, I find modelling BorrowedBuf as divided into filled and spare capacity with spare capacity further divided into initialised and uninitialised as more intuitive. Leaning into the analogy of Vec is in my opinion more natural and it helps by reinforcing terminology used in existing parts of the language rather than introducing new models.

    Capacity
    FilledSpare capacity
    InitialisedUninitialised

    What people want

    Having looked at issues with BorrowedBuf, let’s consider what people actually want.2 The easiest mental model is that uninitialised memory stores arbitrary data, unknown unless accessed. To achieve such semantics, the uninitialised memory would need to be frozen. A frozen region becomes safe to read and can be accessed through regular Rust references. With freezing operation available, the buffer definition in the copying routine could be turned into something like:

      let mut buf = [MaybeUninit::uninit(); 4096];
      // SAFETY: u8 has no invalid bit patterns.
      let buf = unsafe {
        MaybeUninit::slice_freeze_mut(&mut buf)
      };

    Or alternatively:

      let buf = MaybeUninit::frozen();
      // SAFETY: u8 has no invalid bit patterns.
      let mut buf: [u8; 4096] = unsafe { buf.assume_init() };

    Unsafe blocks are required to account for invalid bit patterns. With a trait like bytemuck::AnyBitPattern, a safe versions could exist. Either of those alternatives would require no new methods on the Read trait and would work without any modifications on methods such as rand’s fill_bytes.

    Why can’t we have what we want?

    Reading uninitialised memory is hardly an issue when analysing things on hardware level. So long as a memory address is mapped with proper permissions, accessing data from it will always produce some value. There’s no undefined behaviour there.3 In fact, in typical Linux environment all newly allocated anonymous pages are zero-initialised.4

    tautology:
      cmp  BYTE PTR [rdi], 0
      je   tautology_ok
      cmp  BYTE PTR [rdi], 0
      jne  tautology_ok
      mov  al, 0
      ret
    tautology_ok:
      mov  al, 1
      ret
    An x86 assembly function which checks whether value in memory is zero or non-zero. This seemingly tautological test can fail when operating on a memory page marked with MADV_FREE and the kernel changes the mapping in between the two memory reads.

    Unfortunately, even when looking from the point of view of machine code, this analysis isn’t complete…

    Giving advice about use of memory

    MADV_FREE flag of the madvise system call allows user space to advise the kernel that (until next write) it no longer cares about contents of specified anonymous pages. This optimisation enables the kernel to discard those pages without swapping them to disk. While the advice is in effect, the user space can access the memory, but has no guarantee whether it’ll read the old values or zeros. Even code written directly in assembly language, like the tautolagy function on the right can result in unexpected behaviour.

    This isn’t a theoretical concern either. jemalloc, a somewhat popular memory allocator, uses MADV_FREE when memory is freed. As a result, new allocations returned from the allocator may point to region of memory where the MADV_FREE advice is in effect. Nicholas Ormrod, in his talk about C++ std::string at Facebook, describes how interaction between jemalloc, MADV_FREE and reading uninitialised memory resulted in outages.

    Page touching

    To prevent this issue, the proposed slice_freeze_mut function would need to write into each page of the slice to make sure the kernel notices that the program cares about contents of the page again. This could be a simple loop stepping 4 KiB at a time and look something like the following:

    pub unsafe fn slice_freeze_mut<T>(
      slice: &mut [MaybeUninit<T>]
    ) -> &mut [T] {
      const PAGE_SIZE: usize = 4096;
      let ptr = slice.as_mut_ptr() as *mut _;
      let len = slice.len() * size_of::<T>();
      // SAFETY: It’s always safe to split MU object into MU bytes.
      let bytes: &mut [MaybeUninit<u8>] = unsafe {
        core::slice::from_raw_parts(ptr, len);
      };
      for el in bytes.iter_mut().step_by(PAGE_SIZE) {
        let p = el.as_mut_ptr();
        // SAFETY: Unsafe without language semantics change
        // since we’re reading uninitialised byte.
        unsafe { p.write_volatile(p.read()) };
      }
      // SAFETY: Caller promises that T has no invalid bit patterns,
      // but this is still unsafe without language semantics change
      // since we haven’t initialised all the bytes.
      unsafe { &mut *(slice as *mut _ as *mut [T]) }
    }

    Unfortunately, this would hardly be the no-operation that people expect from writing into uninitialised memory. It would be an improvement over a full initialisation and would address some issues with BorrowedBuf but would do that at the cost of unavoidable page touching.

    It may seem that the second form — the MaybeUninit::frozen().assume_init() variant — which creates frozen buffer directly on stack could be easier to optimise. The compiler controls the stack and unless it issues madvise, no stack pages will be marked MADV_FREE. Unfortunately it’s not clear that always hold true. For example, with async programming the stack lives God-knows-where and there may be other corner cases that would need to be considered.

    Conclusion

    I started this article with a promise of some alternatives to BorrowedBuf and yet, as I conclude it, no working alternative is presented. Indeed, this is perhaps what frustrates me the most about the BorrowedBuf. On the face of it, writing data into uninitialised memory is a feature with an obvious solution, but it doesn’t take long before all the obvious solutions clash with Rust’s safety requirements.

    So what’s a lowly programmer to do? Donald Knuth is often quoted as stating that ‘premature optimisation is the root of all evil’. True to that adage, in most cases it’s safe to pay the price of the memory initialisation. I/O operations usually take orders of magnitude more time so the time saved not initialising the memory is often negligible.

    But there is more to Knuth’s quote:

    We should forget about small efficiencies, say about 97% of the time: premature optimisation is the root of all evil.

    Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.5

    For the remaining 3%, the options now are somewhat bleak and depend on the particular code base. They may require switching to nightly compiler, patching third-party crates, going straight to doing unsafe syscalls (e.g. read) or isolating critical code paths and writing them in C.

    And while we deal with the lack of ideal solution for writing to uninitialised memory, maybe someone will figure out some alternative fast and ergonomic approach.

    1 The reference conversion itself is safe since all possible values of type T are valid values of type MaybeUninit<T> and both those types have the same layout. However, the latter allows writing arbitrary data into the object which may result in invalid representation of T (see playground demonstration). With interior mutability, even converting shared references may lead to issues.

    2 I am aware that I presumptuously speak for everyone. However, I do believe that alternatives presented here, if they existed, would be favoured by everyone and that includes contributors to the BorrowedBuf struct. As I discuss later, the type is the way it is not because that’s what anyone finds appealing but due to other constraints.

    3 Semantics that reading uninitialised memory has arbitrary but consistent value can be useful in practice. Briggs and Torczon describe in An efficient representation for sparse sets an algorithm which is built on such semantics.

    4 The atypical environment is µClinux which runs on platforms without memory management unit (MMU). It supports MAP_UNINITIALIZED option which skips zeroing of the memory region. However, even with that flag, allocated pages maintain consistent state.

    5 Donald E. Knuth. 1974. Structured Programming with go to Statements. ACM Computing Surveys. Vol. 6, Issue 4 (Dec. 1974), 261–301. doi:10.1145/356635.356640.