Given the untyped_sequence and int_sequence below:
typedef struct {
    void* data; // first item
    size_t size; // number of items
    size_t item_size; // item byte size
} untyped_sequence;
typedef struct {
    int* data; // first int
    size_t size; // number of ints
    size_t item_size; // int byte size
} int_sequence;
QUESTION: Is it UB to put them as two union members, initialize an instance of that union using the int_sequence member, then mutate the int data using the untyped_sequence member?
- If yes - why?
- If no - why?
GCC, Clang and MSVC give no warnings about this, but that doesn't necessarily mean anything.
Minimal runnable example (https://godbolt./z/PT6ahh4qq):
#include <string.h>
#include <stdio.h>
typedef struct {
    void* data; // first item
    size_t size; // number of items
    size_t item_size; // item byte size
} untyped_sequence;
typedef struct {
    int* data; // first int
    size_t size; // number of ints
    size_t item_size; // int byte size
} int_sequence;
typedef union {
    int_sequence typed;
    untyped_sequence untyped;
} sequence;
void untyped_zero_first(untyped_sequence untyped) {
    memset(untyped.data, 0, untyped.size * untyped.item_size);
}
int main(void) {
    int ints[4] = {1, 2, 3, 4};
    sequence s = {
        .typed.data = ints,
        .typed.size = 4,
        .typed.item_size = sizeof(int)
    };
    untyped_zero_first(s.untyped);
    // prints "0, 0, 0, 0" for GCC, Clang, MSVC - but is it UB?
    printf("%d, %d, %d, %d\n", ints[0], ints[1], ints[2], ints[3]);
}
asked Nov 17, 2024 at 14:34
Johann Gerell
2 Answers
Is this union pointer member type punning UB in C?
Yes, in that the language spec does not define the behavior (as opposed to explicitly declaring it undefined).
Unlike C++, C does not have a sense of an "active" member of a union. Accessing a different member than was initialized or last stored does not, in and of itself, produce undefined behavior. Since C17, the behavior is not even implementation-defined. You can just do it, which involves (as a note in the spec clarifies) reinterpreting the appropriate part of the stored value according to the type of the accessed member.
But in your particular case, that's not enough. C does not require that the size and representation of type void * be the same as the size and representation of type int *. As far as the spec is concerned, there is no telling, at the point where your example code calls untyped_zero_first(s.untyped), what s.untyped.data points to. It might even be a trap representation if your implementation's void * representation affords those.
In practice, you're unlikely to run into a modern platform in which different object pointer types in fact do have different size or representation, so your code is likely to work as intended, but C does not guarantee that.
- Punning the pointer and the other fields through the union depends on the implementation.
Union members and pointers (C11, Section 6.7.2.1, Paragraph 16):
"A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa."
Union type punning (C11, Section 6.5.2.3, footnote 95):
"If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation."
The standard does not require void * and int * to share a representation, so what the reinterpreted pointer holds is up to the implementation.
- Using the pointers (it may invoke UB)
Effective Type Rule, also known as the Strict Aliasing Rule (C11, Section 6.5, Paragraph 7):
"An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type."
Answering in a few words:
- the result of union type punning depends on the implementation
- using the pointers depends on the referenced objects and pointer types. It may invoke undefined behaviour (UB)
Example that does and does not invoke UB, assuming the implementation handles the punning itself correctly:
#include <string.h>
#include <stdio.h>
typedef struct {
    void* data; // first item
    size_t size; // number of items
    size_t item_size; // item byte size
} untyped_sequence;
typedef struct {
    int* data; // first int
    size_t size; // number of ints
    size_t item_size; // int byte size
} int_sequence;
typedef struct {
    float* data; // first float
    size_t size; // number of floats
    size_t item_size; // float byte size
} float_sequence;
typedef union {
    int_sequence typed;
    untyped_sequence untyped;
    float_sequence floatseq;
} sequence;
void untyped_zero_first(untyped_sequence untyped) {
    memset(untyped.data, 0, untyped.size * untyped.item_size);
}
int main(void) {
    int ints[4] = {1, 2, 3, 4};
    // no UB here: the ints are read back through an int*
    sequence s =
    {
        .typed.data = ints,
        .typed.size = 4,
        .typed.item_size = sizeof(int)
    };
    untyped_zero_first(s.untyped);
    printf("%d, %d, %d, %d\n", s.typed.data[0], s.typed.data[1], s.typed.data[2], s.typed.data[3]);
    // UB: int objects are read through a float* (effective type violation)
    printf("%f, %f, %f, %f\n", s.floatseq.data[0], s.floatseq.data[1], s.floatseq.data[2], s.floatseq.data[3]);
}
void * can be just converted to an int * easily. – KamilCuk Commented Nov 17, 2024 at 15:25
void * and int * may differ in size, code risks UB. Consider void * not fully well defined when int * is smaller. – chux Commented Nov 17, 2024 at 15:26
untyped_sequence and int_sequence could differ in size. – chux Commented Nov 17, 2024 at 15:39