I was looking into GNU tr in bash on Debian Linux. Its set syntax appears to have [:lower:] and [:upper:] character classes, which match "lowercase" and "uppercase" letters. The definition of these is not trivial: is Ñ an uppercase letter? (Examples here.)
It seems to map to an "islower" function, which is defined by the C language somehow.
https://en.cppreference/w/c/string/byte/islower
http://web.archive./web/20120308171350/https://cplusplus/reference/clibrary/cctype/islower/
Notice that what is considered a letter may depend on the locale being used. In the default C locale, a lowercase letter is any of: a b c d e f g h i j k l m n o p q r s t u v w x y z.
For a detailed chart on what the different ctype functions return for each character of the standard ASCII character set, see the reference for the <cctype> header.
https://github/coreutils/coreutils/blob/1f0bf8d7c4b7131c6a8762de02ea01affef4db65/src/tr.c#L392
I can't find where islower is defined; perhaps it lives within a specific C implementation (e.g. gcc).
It also appears to depend on the "locale". Is that decided at compile time, or at run time? https://docs.oracle/cd/E19253-01/817-2521/overview-1002/index.html
asked Nov 16, 2024 at 15:53 by Atomic Tripod
1 Answer
Which characters count as lower case letters in a given locale is commonly fixed before compile time.
localeconv() [formatting of numeric quantities] reports some locale attributes that can change dynamically, but not the determination of lower case.
The locale may change with char *setlocale(int category, const char *locale);
At program startup, the equivalent of setlocale(LC_ALL, "C"); is executed.
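As a minimal sketch of what the startup "C" locale means in practice, the helper below counts the bytes of a string that islower() accepts in whichever locale is active at the time of the call. The name count_lower_bytes is hypothetical and used only for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes of s that islower() accepts in
   the locale that is active when the function is called. */
size_t count_lower_bytes(const char *s)
{
    assert(s != NULL);

    size_t n = 0;
    for (; *s != '\0'; ++s) {
        /* islower() needs a value representable as unsigned char (or EOF);
           the cast keeps bytes >= 0x80 from being passed as negative ints
           on platforms where plain char is signed. */
        if (islower((unsigned char)*s))
            ++n;
    }
    return n;
}
```

If it is called before any setlocale() call, only a to z should be counted, so a byte such as 0xF1 (ñ in ISO-8859-1) would not be classified as lower case.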
At least 2 locales are defined:
"C": A minimal C environment. The spec defines 'a' - 'z', and nothing else, as lower case letters in it.
"": Implementation's native environment.
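A short sketch of selecting one of these locales at run time follows. The helper name select_ctype_locale is an assumption made for this example; only setlocale() and LC_CTYPE come from the standard library.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: try to switch character classification (LC_CTYPE)
   to the named locale. Returns 1 on success, 0 if the implementation
   does not accept the name. On failure setlocale() returns NULL and
   leaves the current locale unchanged. */
int select_ctype_locale(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_CTYPE, name) != NULL;
}
```

Passing "C" should always succeed. Passing "" asks for the native environment, which is usually taken from environment variables such as LANG or LC_ALL, so whether it succeeds and what it selects depends on the system.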
Some implementations allow for dozens of different locales. Some only have the minimal 2 - which might use the same determination of lower case letters - so no functional difference.
Thus the behavior of islower() can change during a program's run.
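To make the run-time dependence concrete, here is a sketch that counts how many single-byte values islower() accepts under a named locale and then restores the previous one. The helper name count_lower_in and the 256-byte buffer size are assumptions for illustration, not part of any library.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: number of values in 0..UCHAR_MAX for which
   islower() is true under the named LC_CTYPE locale, or -1 if the
   locale cannot be selected. The previous locale is restored. */
int count_lower_in(const char *name)
{
    assert(name != NULL);

    /* The string returned by setlocale() may be overwritten by later
       calls, so copy it before switching. */
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;

    int count = 0;
    for (int c = 0; c <= UCHAR_MAX; ++c) {
        if (islower(c))
            ++count;
    }

    setlocale(LC_CTYPE, saved);
    return count;
}
```

count_lower_in("C") should give 26. With a single-byte locale such as ISO-8859-1 installed, the count would be expected to be higher, since accented letters like ñ are then classified as lower case. In UTF-8 locales the byte-oriented islower() typically still reports only a to z, and the wide-character function iswlower() is the one that sees non-ASCII letters.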
Soapbox: C's locale is an initial attempt to localize code to various country/culture standards. Yet it is cumbersome, inadequate, and incurs troubles with multi-threading. Proceed with caution.
tr has some definition of islower for any character I give it. How does it determine it? Is there an example implementation I could look at? – Atomic Tripod Commented Nov 16, 2024 at 15:59
[...] isupper and islower, then the source for those are available as well. – Some programmer dude Commented Nov 16, 2024 at 16:12