c - What is an input item in scanf? - Stack Overflow

I'm learning C and trying to understand how scanf works. I can't understand some terms: "input item", "initial subsequence", and "matching sequence". I am reading this: https://pubs.opengroup./onlinepubs/9699919799/functions/scanf.html. It says:

An input item shall be defined as the longest sequence of input bytes (up to any specified maximum field width, which may be measured in characters or bytes dependent on the conversion specifier) which is an initial subsequence of a matching sequence.

Assume that in the format I have %d and in stdin 4.5. What is the initial subsequence and what is the matching sequence? And what is the input item?

I thought that the input item is what corresponds to this specifier, i.e. for %d the corresponding symbols are digits (maybe with a + or - sign at the beginning), but then it says:

Except in the case of a % conversion specifier, the input item (or, in the case of a %n conversion specification, the count of input bytes) shall be converted to a type appropriate to the conversion character. If the input item is not a matching sequence, the execution of the conversion specification fails;

That is, the input item may not be a match, which means it does not have to consist only of symbols that correspond to the specifier.

So, can you explain these terms to me? And is this the right place to look for documentation for C functions? What sites are the best places to read documentation for C functions?

Asked Mar 23 at 3:33 by Akramat

4 Answers

A key problem this text is dealing with is that sometimes whether a sequence of characters matches the pattern for a conversion depends on characters not yet read. For example, the input text “0x3” matches the pattern for %x, but the input text “0xy” does not. At the point where we have read “0x”, “0x” by itself does not match the pattern, and we do not know whether the next character will form a matching sequence. We must read the next character to find out.

So the rule for when scanf continues reading characters cannot be “As long as the characters match the goal pattern, keep reading.” If that were the rule, we would read “0”, see that matches a possible valid input for %x, then read “x”, see that “0x” is not a valid input for %x, and stop. That will not work, because it fails to read “0x3”, which is a valid input for %x.

The rule must be that scanf continues reading as long as the input could still become a matching sequence if the coming characters complete a match. The technical way to say this is that the characters read so far are an initial subsequence of a matching sequence.

Assume that in format I have %d and in stdin 4.5. What is initial subsequence and what is matching sequence? And what is input item?

A matching sequence for %d is an optional sign (“+” or “-”) followed by one or more decimal digits. Consider the matching sequences “123” and “-123”. Initial subsequences of “123” are the empty string, “1”, “12”, and “123”. Initial subsequences of “-123” are the empty string, “-”, “-1”, “-12”, and “-123”. So an initial subsequence of a matching sequence for %d is an optional sign followed by zero or more decimal digits. Note that “-” is an initial subsequence but not a matching sequence. For the input “4.5”, scanf reads “4” and then rejects “.”, so the input item is “4”, which is a matching sequence and is converted to the int value 4.

It is more interesting to consider %e, where “3.4e-5” is a matching sequence but “3.4,” or “3.4ex” is not. Now scanf has to read two characters where it is unknown whether there will be a match. When scanf has read “3.4”, that is a matching sequence, but it has to continue as long as the characters form an initial subsequence. Next we have “3.4e”, which is no longer a matching sequence but is still an initial subsequence. Then “3.4e-” is also an initial subsequence. With “3.4e-5” we once again have a matching sequence as well as an initial subsequence. scanf must continue reading. If the next character is a space, “3.4e-5 ” is not an initial subsequence, so the space is rejected (and is “put back” into the input stream as if it had never been read), and “3.4e-5” is the input item. This input item is a matching sequence, so it is converted to a float.

Now consider reading “3.4ex”. As above, at “3.4e”, scanf must continue reading. It gets the “x” and sees that “3.4ex” is not an initial subsequence. It rejects the “x” and puts it back into the input stream. Now it is done reading, and “3.4e” is the input item. This is not a matching sequence, so it is a matching failure. scanf does not perform a conversion for this, and it returns.

Note that with “3.4ex”, if we could put back two characters instead of just one, we could revert to “3.4”, match that for %e, and convert it to float. However, the C standard does not require I/O streams to support more than one character of put-back, and scanf is specified to work with only one character of put-back. This is why scanf is specified to read until a match is impossible, then put back the non-matching character, then see if what it did read is a match. If we had more levels of put-back, scanf could read until a match is impossible, then put back all the characters needed to reduce the input to a matching sequence, and then convert that.

(Note that, as discussed in the comments, some scanf implementations do not conform to this specification of the C standard and may put back multiple characters.)

What sites are the best places to read documentation for C functions?

There is no single site that is the best place, nor any small set of sites. The C standard is the most authoritative place to read about functions in the standard C library, but understanding them is informed by computer science theory and by history of the development of the C language. With this issue in scanf, theory about parsing, formal languages, and finite-state machines is informative about how a computer has to go about reading characters to interpret them. Those are parts of study of a computer science education, not something you easily get from narrowly focused web sites.

It's a little confusing.

Essentially, scanf reads bytes for a conversion specifier until the next byte couldn't possibly be part of what it's supposed to read for that conversion specifier. Then it looks at what it's read, and that's the input item for that conversion specifier. (Under the hood, it probably "unreads" the next byte with ungetc.)


"Matching sequence" means any sequence of characters that would match the conversion specification. It is not restricted to sequences of characters that actually appear in the input. For example, 0 and 0xff are both matching sequences for the conversion specification %x.

An "initial subsequence of a matching sequence" means any sequence of characters that could be the start of a matching sequence. For example, 0, 0x, and 0xff are all initial subsequences of the matching sequence 0xff for the conversion specification %x. They are also initial subsequences of many other matching sequences, like 0xff00 or 0xff1.

An input item is the longest sequence of input characters that's an initial subsequence of a matching sequence. For example, if the input is 0xgoose and scanf is processing a conversion specifier of %x, then the input item is 0x. It's not 0, even though 0 would be a matching sequence, and it's not anything longer than 0x, because nothing longer could be the start of a matching sequence.

matching sequence

is any sequence of characters that is a valid representation for the conversion specifier used. It does not refer to the actual input. It refers to all and any valid (aka matching) sequence.

For %d, examples of matching sequences are 123456, 987654, and many, many more.

initial subsequence

is a part (or all) of another sequence starting from the left most character, i.e. initial character.

For instance, for the sequence 1234, the initial subsequences are 1, 12, 123, and 1234.

input item

is the longest run of characters at the start of the input that is an initial subsequence of a matching sequence. When it is itself a matching sequence, these are the characters that will be converted and then stored in the supplied variable.

For instance, for the input 1234Z and the conversion specifier %d, the longest initial subsequence of a matching sequence is 1234.

A test harness for comparing the answers against the behavior of real code:

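#include <assert.h>
#include <stdio.h>

int main(void) {
  const char *fname = "c:/tmp/junk1.txt";  // Scratch file; adjust for your system
  const char *s = "3.4ex";                 // Input under test
  float x = 42.0f;  // Sentinel value, so an unchanged x is visible

  // Write the test input to the scratch file.
  FILE *f = fopen(fname, "w");
  assert(f);
  fprintf(f, "%s", s);
  fclose(f);

  // Reopen the file as stdin so that scanf reads the test input.
  f = freopen(fname, "r", stdin);
  assert(f);
  int cnt = scanf("%f", &x);
  printf("Test:\"%s\", Conversion count:%d, value:%f", s, cnt, x);
  fflush(stdout);

  // Report the first character that scanf left unread.
  int ch = fgetc(f);
  printf(", Next character:%d %c\n", ch, ch);
  return 0;
}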

Sample output (note that this output does not necessarily comply with the C specification):

Test:"3.4ex", Conversion count:1, value:3.400000, Next character:101 e
