python - np.loadtxt skips unexpected lines when reading strings - Stack Overflow

I have a file structured as such:

IID SuperPops
FGXXXXXXXX  R6
FGXXXXXXXX  R12

It contains 524123 lines.

I need to read the first column as an array:

len(np.loadtxt('foo.txt',usecols=0,dtype=str))
524123

But if I also try to skip the header, all of a sudden 10 extra lines go missing:

len(np.loadtxt('foo.txt',usecols=0,dtype=str,skiprows=1))
524112

This does not happen if I change the dtype, though:

len(np.loadtxt('foo.txt',usecols=0,dtype='<U20',skiprows=1))
524122
len(np.loadtxt('foo.txt',usecols=0,dtype='<U10',skiprows=1))
524122

The 10 missing entries are all of length 10:

[len(elem) for elem in set(np.loadtxt('foo.txt',usecols=0,dtype='<U20',skiprows=1)) - set(np.loadtxt('foo.txt',usecols=0,dtype=str,skiprows=1))]
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

What could be the reason and why is it specifically triggered by skiprows?

Asked Mar 15 at 15:29 by user2834012

1 Answer

This appears to be a bug with np.loadtxt: it loads the file in blocks of 50000 lines, and applies skiprows at each block instead of only at the beginning.

At 524K lines, you're getting 11 blocks, so 11 lines are skipped instead of just the first one.
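The arithmetic can be checked with a toy model. The sketch below is not NumPy's code: `read_in_blocks` and its parameters are hypothetical names that only mimic the behavior described above (the block size of 50000 is the one quoted in this answer), and line numbers stand in for the file's contents.

```python
from itertools import islice


def read_in_blocks(lines, skiprows, block_size=50_000, skip_every_block=False):
    """Toy reader: consume `lines` in blocks of `block_size` rows.

    With skip_every_block=True, `skiprows` lines are dropped before every
    block (the behavior described above); otherwise only before the first.
    """
    it = iter(lines)
    rows = []
    first_block = True
    while True:
        if first_block or skip_every_block:
            # Discard `skiprows` lines from the stream.
            for _ in islice(it, skiprows):
                pass
            first_block = False
        block = list(islice(it, block_size))
        if not block:
            break
        rows.extend(block)
    return rows


total_lines = 524_123                     # header + data rows, as in the question
line_numbers = range(1, total_lines + 1)  # stand-in for the file's lines

buggy = read_in_blocks(line_numbers, skiprows=1, skip_every_block=True)
expected = read_in_blocks(line_numbers, skiprows=1)

print(len(buggy))     # 524112
print(len(expected))  # 524122
print(sorted(set(expected) - set(buggy))[:3])  # [50002, 100003, 150004]
```

Under this model the header plus one line at the start of each later block are dropped, which reproduces both counts from the question and the skipped line 50002.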

A fix for this bug was merged a month ago, a week after NumPy 2.2.3 was published, so I would expect NumPy 2.2.4 to stop displaying this bug.

Refs:

  • issue on GitHub: https://github.com/numpy/numpy/issues/28315
  • merged PR fixing it: https://github.com/numpy/numpy/pull/28319

I figured this out by noticing that line 50002 got skipped if the file had more than 50001 lines, and then Googling "loadtxt 50000" brought me to the issue quoted above.
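Until a fixed release is installed, one possible workaround is to consume the header yourself and call np.loadtxt without skiprows, so there is nothing to re-apply at each block. This is a minimal sketch under that assumption; `load_first_column` is a hypothetical helper name, and 'foo.txt' is the file from the question. The question also shows that a sized dtype such as '<U10' avoids the problem.

```python
import numpy as np


def load_first_column(path):
    """Read the first column as strings, skipping the header line manually."""
    with open(path) as fh:
        next(fh)  # drop the header here instead of passing skiprows=1
        # np.loadtxt accepts an open file object and reads from its
        # current position, so skiprows is left at its default of 0.
        return np.loadtxt(fh, usecols=0, dtype=str)
```

Usage would be `ids = load_first_column('foo.txt')`, after which `len(ids)` should equal the number of data rows.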
