I have a file structured like this:

```
IID SuperPops
FGXXXXXXXX R6
FGXXXXXXXX R12
```

It contains 524123 lines.
I need to read the first column as an array:

```python
>>> import numpy as np
>>> len(np.loadtxt('foo.txt', usecols=0, dtype=str))
524123
```
But if I also skip the header, suddenly 10 lines go missing:

```python
>>> len(np.loadtxt('foo.txt', usecols=0, dtype=str, skiprows=1))
524112
```
This does not happen if I change the dtype, though:

```python
>>> len(np.loadtxt('foo.txt', usecols=0, dtype='<U20', skiprows=1))
524122
>>> len(np.loadtxt('foo.txt', usecols=0, dtype='<U10', skiprows=1))
524122
```
The 10 missing entries are all of length 10:

```python
>>> with_u20 = set(np.loadtxt('foo.txt', usecols=0, dtype='<U20', skiprows=1))
>>> with_str = set(np.loadtxt('foo.txt', usecols=0, dtype=str, skiprows=1))
>>> [len(elem) for elem in with_u20 - with_str]
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```
What could be the reason, and why is it triggered specifically by `skiprows`?
Asked Mar 15 at 15:29 by user2834012.
This appears to be a bug in `np.loadtxt`: it loads the file in blocks of 50000 lines and applies `skiprows` at the start of every block instead of only at the beginning of the file.
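With 524123 lines the file is read in 11 blocks, so 11 lines are skipped instead of just the header: the header itself plus 10 data lines, which matches the drop from 524122 to 524112 rows. As far as I can tell from the NumPy source, this block-wise path is only taken for unsized string dtypes such as `dtype=str`; with a sized dtype like `'<U20'` the file is read in a single pass, which is why those calls return the expected 524122 rows.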
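To make the arithmetic concrete, here is a pure-Python model of that behaviour. It is not NumPy's code: `read_chunked`, `CHUNK_SIZE` and `skip_every_chunk` are hypothetical names, and the model assumes each block first skips `skip_rows` lines and then reads up to 50000 rows, stopping after a short block.

```python
from typing import Iterable, List

CHUNK_SIZE = 50_000  # block size described in the answer


def read_chunked(lines: Iterable[str], skip_rows: int,
                 chunk_size: int = CHUNK_SIZE,
                 skip_every_chunk: bool = True) -> List[str]:
    """Hypothetical model of a block-wise reader; returns the lines kept.

    skip_every_chunk=True mimics the bug (skip before every block);
    skip_every_chunk=False mimics the fixed behaviour (skip only once).
    """
    it = iter(lines)
    kept: List[str] = []
    first_block = True
    while True:
        if first_block or skip_every_chunk:
            for _ in range(skip_rows):
                if next(it, None) is None:  # input exhausted while skipping
                    return kept
        first_block = False
        # zip stops on range first, so no extra line is consumed from `it`
        block = [line for _, line in zip(range(chunk_size), it)]
        kept.extend(block)
        if len(block) < chunk_size:  # short block: end of input reached
            return kept


# A header plus 524122 data rows, i.e. 524123 lines as in the question.
lines = ["IID SuperPops"] + [f"FG{i:08d} R6" for i in range(524_122)]
print(len(read_chunked(lines, skip_rows=1)))                          # 524112
print(len(read_chunked(lines, skip_rows=1, skip_every_chunk=False)))  # 524122
```

With `skip_rows=0` both modes keep all 524123 lines, which is why the problem only shows up once `skiprows` is set.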
A fix for this bug was merged a month ago, about a week after NumPy 2.2.3 was released, so I would expect NumPy 2.2.4 to no longer have this bug.
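Until you can upgrade, one way to sidestep the per-block skip is to consume the header yourself and leave `skiprows` at its default of 0, since skipping zero lines per block is harmless. This is a sketch under that assumption; `load_first_column` is a hypothetical helper, and `ndmin=1` is only there so a one-row file still yields a 1-D array. Passing a sized dtype such as `'<U10'`, as in the question, should also avoid it.

```python
import numpy as np


def load_first_column(path):
    """Hypothetical helper: read column 0 as strings, skipping the header.

    The header is consumed manually so np.loadtxt is called with
    skiprows=0; skipping 0 lines per block loses nothing even on
    versions affected by the bug.
    """
    with open(path, encoding="utf-8") as fh:
        next(fh)  # drop the "IID SuperPops" header line
        return np.loadtxt(fh, usecols=0, dtype=str, ndmin=1)
```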
Refs:
- issue on GitHub: https://github.com/numpy/numpy/issues/28315
- merged PR fixing it: https://github.com/numpy/numpy/pull/28319
I figured this out by noticing that line 50002 got skipped whenever the file had more than 50001 lines; Googling "loadtxt 50000" then brought me to the issue linked above.
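That observation can be turned into a small reproduction: write a header plus `n_rows` data rows to a temporary file, load it with `dtype=str, skiprows=1`, and report which file lines did not come back. `missing_file_lines` is a hypothetical name, and the row format only imitates the question's data. On an affected NumPy, `missing_file_lines(50_001)` should return `[50002]`; on a fixed one it should return `[]`.

```python
import os
import tempfile

import numpy as np


def missing_file_lines(n_rows):
    """Return the 1-based file line numbers that np.loadtxt dropped."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "repro.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("IID SuperPops\n")  # header on file line 1
            for i in range(n_rows):
                fh.write(f"FG{i:08d} R6\n")
        loaded = np.loadtxt(path, usecols=0, dtype=str, skiprows=1, ndmin=1)
    # Data row i sits on file line i + 2 (line 1 is the header).
    expected = {f"FG{i:08d}": i + 2 for i in range(n_rows)}
    lost = set(expected) - set(loaded.tolist())
    return sorted(expected[key] for key in lost)


print(missing_file_lines(100))     # small files fit in one block
print(missing_file_lines(50_001))  # crosses the 50000-row block boundary
```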