rust - How to create uninitialized Vec - Stack Overflow

I've written some code that reads a binary file into a Vec. I did it using Vec::with_capacity() an

I've written some code that reads a binary file into a Vec. I did it using Vec::with_capacity() and unsafe { ...set_len(); }. It works, but clippy complains and the official docs claim that reading into an unintialized vec is undefined behavior. Since the compiler lets it happen, I suspect it's not really that undefined, and I can just shut up clippy with an #[allow(...)], but I'd like to do it "right" if I can. There's lots of examples of using MaybeUninit<> with variables and arrays, but not with Vec, and many of the examples jump through ridiculous hoops for what is a really simple thing. This is a big Vec, so initializing it is a totally unacceptable cost. So what's the "no cost abstraction" way of doing this?

lazy_static! {
/// Map from perfect hash to equivalence class
pub static ref OJP_HIGH_MP5_TABLE_1: Vec<u16> = {
    let mut path: PathBuf = cargo_home().unwrap();
    path.push("onejoker");
    path.push("ojp_high_mp5_table_1.bin.gz");

    let file = fs::File::open(&path).unwrap();
    let mut reader = BufReader::new(file);
    let mut decoder = GzDecoder::new(&mut reader);

    let mut bytes: Vec<u8> = Vec::with_capacity((MP5_SIZE + 1) * 2);
    #[allow(clippy::uninit_vec)]
    unsafe { bytes.set_len((MP5_SIZE + 1) * 2) };
    let mut buf: [MaybeUninit<u8>; IO_BUF_SIZE] =
    unsafe { MaybeUninit::uninit().assume_init() };
    let mut start: usize = 0;

    loop {
        let n = decoder.read(
            unsafe { &mut *(buf.as_mut_ptr() as *mut [u8; IO_BUF_SIZE]) }
        ).unwrap();
        if n == 0 {
            break;
        }
        bytes[start..start + n].copy_from_slice(
            unsafe { &*(buf.as_ptr() as *const [u8; IO_BUF_SIZE]) }[..n].as_ref()
        );
        start += n;
    }
    assert_eq!(start, (MP5_SIZE + 1) * 2);

    let len = bytes.len() / 2;
    let capacity = bytes.capacity() / 2;
    let ptr = bytes.as_mut_ptr() as *mut u16;
    std::mem::fet(bytes);    // prevents deallocation of the vec

    unsafe { Vec::from_raw_parts(ptr, len, capacity) }
};

}

I've written some code that reads a binary file into a Vec. I did it using Vec::with_capacity() and unsafe { ...set_len(); }. It works, but clippy complains and the official docs claim that reading into an unintialized vec is undefined behavior. Since the compiler lets it happen, I suspect it's not really that undefined, and I can just shut up clippy with an #[allow(...)], but I'd like to do it "right" if I can. There's lots of examples of using MaybeUninit<> with variables and arrays, but not with Vec, and many of the examples jump through ridiculous hoops for what is a really simple thing. This is a big Vec, so initializing it is a totally unacceptable cost. So what's the "no cost abstraction" way of doing this?

lazy_static! {
/// Map from perfect hash to equivalence class
pub static ref OJP_HIGH_MP5_TABLE_1: Vec<u16> = {
    let mut path: PathBuf = cargo_home().unwrap();
    path.push("onejoker");
    path.push("ojp_high_mp5_table_1.bin.gz");

    let file = fs::File::open(&path).unwrap();
    let mut reader = BufReader::new(file);
    let mut decoder = GzDecoder::new(&mut reader);

    let mut bytes: Vec<u8> = Vec::with_capacity((MP5_SIZE + 1) * 2);
    #[allow(clippy::uninit_vec)]
    unsafe { bytes.set_len((MP5_SIZE + 1) * 2) };
    let mut buf: [MaybeUninit<u8>; IO_BUF_SIZE] =
    unsafe { MaybeUninit::uninit().assume_init() };
    let mut start: usize = 0;

    loop {
        let n = decoder.read(
            unsafe { &mut *(buf.as_mut_ptr() as *mut [u8; IO_BUF_SIZE]) }
        ).unwrap();
        if n == 0 {
            break;
        }
        bytes[start..start + n].copy_from_slice(
            unsafe { &*(buf.as_ptr() as *const [u8; IO_BUF_SIZE]) }[..n].as_ref()
        );
        start += n;
    }
    assert_eq!(start, (MP5_SIZE + 1) * 2);

    let len = bytes.len() / 2;
    let capacity = bytes.capacity() / 2;
    let ptr = bytes.as_mut_ptr() as *mut u16;
    std::mem::fet(bytes);    // prevents deallocation of the vec

    unsafe { Vec::from_raw_parts(ptr, len, capacity) }
};

}

Share edited Nov 22, 2024 at 5:58 Lee Daniel Crocker asked Nov 22, 2024 at 5:09 Lee Daniel CrockerLee Daniel Crocker 13.2k3 gold badges31 silver badges56 bronze badges 30
  • 2 Any reason why std::fs::read is not acceptable? – kmdreko Commented Nov 22, 2024 at 5:29
  • 3 “Since the compiler lets it happen, I suspect it's not really that undefined” You used an unsafe block, explicitly opting out of compiler protection from undefined behavior. (And when people talk about undefined behavior, it’s effectively always in reference to things that a compiler lets happen.) – Ry- Commented Nov 22, 2024 at 5:38
  • 2 @LeeDanielCrocker It's not exactly clear why pushing bytes onto a Vec doesn't work for this use case. This seems like a case of over-complicating what is actually a simple thing. – cdhowie Commented Nov 22, 2024 at 6:05
  • 2 @LeeDanielCrocker What makes you think pushing bytes onto a Vec will double-initialize it? – cdhowie Commented Nov 22, 2024 at 6:12
  • 2 Read::read_to_end seems to be exactly what you want to do here‽ No need for an uninitialized Vec. – cafce25 Commented Nov 22, 2024 at 6:28
 |  Show 25 more comments

1 Answer 1

Reset to default 1

As mentioned in the comments by cdhowie and cafce25, you’re mainly doing a Read::read_to_end. Your loop gets the same result as this safe Rust, but the safe Rust actually copies less (there’s no intermediate buf):

let file = fs::File::open(&path).unwrap();
let mut reader = BufReader::new(file);
let mut decoder = GzDecoder::new(&mut reader);

let mut bytes: Vec<u8> = Vec::with_capacity((MP5_SIZE + 1) * 2);
decoder.read_to_end(&mut bytes).unwrap();
assert_eq!(bytes.len(), (MP5_SIZE + 1) * 2);

Exposing this Vec<u8> as u16s is a little more involved, but you can still do it in safe Rust with a crate like bytemuck. A Vec<u8>’s storage only has 1-byte alignment, so as you brought up in a comment, you would have to allocate an extra byte of space to guarantee correctness:

// note: with a suitable error type,
// this can return `Result` and use `?` instead of `unwrap`
fn read_table<T, const N: usize>(mut source: impl Read) -> &'static [T; N]
where
    T: bytemuck::AnyBitPattern,
{
    use std::mem;

    let align = mem::align_of::<[T; N]>();
    let mut bytes: Vec<u8> = Vec::with_capacity(
        mem::size_of::<[T; N]>()
        + (align - 1)
    );

    // ensure actual data is aligned
    let align_offset = bytes.as_ptr().align_offset(align);
    bytes.resize(align_offset, 0);

    source.read_to_end(&mut bytes).unwrap();

    bytemuck::cast_slice(&bytes.leak()[align_offset..]).try_into().unwrap()
}

lazy_static! {

pub static ref OJP_HIGH_MP5_TABLE_1: &'static [u16; MP5_SIZE + 1] = {
    let mut path: PathBuf = cargo_home().unwrap();
    path.push("onejoker");
    path.push("ojp_high_mp5_table_1.bin.gz");

    let file = fs::File::open(&path).unwrap();
    let mut reader = BufReader::new(file);
    let decoder = GzDecoder::new(&mut reader);

    read_table(decoder)
};

}

playground

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1742296394a4417206.html

相关推荐

  • rust - How to create uninitialized Vec - Stack Overflow

    I've written some code that reads a binary file into a Vec. I did it using Vec::with_capacity() an

    23小时前
    20

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信