node.js - tree-sitter size limitation (fails if code is >32kb) - Stack Overflow

I am trying to understand how tree-sitter works, but I ran into an issue when parsing "large" files. A file slightly over 32KB consistently causes tree-sitter to fail.

What I tested:

I wrote a minimal example that demonstrates the issue. It generates fake Java code of increasing size until parsing fails. Here is that code:

const Parser = require('tree-sitter');
const Java = require('tree-sitter-java');

const parser = new Parser();
parser.setLanguage(Java);

for (let i = 1925; i < 1930; i += 1) {
    const largeSource = 'class A { ' + 'void method() {} '.repeat(i) + '}';
    try {
        const tree = parser.parse(largeSource);
        console.log(`Success: Source size: ${largeSource.length} bytes`);
    } catch (err) {
        console.error(`Error: Source size: ${largeSource.length} bytes`);
    }
}

Output:

Success: Source size: 32736 bytes
Success: Source size: 32753 bytes
Error: Source size: 32770 bytes
Error: Source size: 32787 bytes
Error: Source size: 32804 bytes

The same issue occurs with tree-sitter-javascript, failing at 32KB and above.
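The sizes in the output follow directly from the generator: 10 characters for `class A { `, 17 per repeated method, and 1 for the closing brace. A minimal sketch that recomputes them without tree-sitter suggests the failures begin at the first size above 32 × 1024 = 32768. The `LIMIT` constant is an assumption read off the output, not a documented value, and `sourceLength` is a hypothetical helper.

```javascript
// Assumed threshold: 32 KiB, inferred from the observed output (not from documentation).
const LIMIT = 32 * 1024; // 32768

// Hypothetical helper: length of the generated source for a given repeat count.
function sourceLength(i) {
    const source = 'class A { ' + 'void method() {} '.repeat(i) + '}';
    // .length counts UTF-16 code units; it equals the byte count here because the text is ASCII.
    return source.length;
}

const rows = [];
for (let i = 1925; i < 1930; i += 1) {
    const length = sourceLength(i);
    rows.push({ i, length, overLimit: length > LIMIT });
}
console.table(rows);
```

With these inputs, `overLimit` flips between i = 1926 (32753) and i = 1927 (32770), which matches the Success/Error split in the output.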

My Questions:

  • Is that expected behaviour? Am I missing something?
  • Is this size limitation a documented behavior? Can it be configured?

My Use Case

My goal is to split code into chunks and embed them in a vector database, so I can easily find things in a large codebase.

I could split the files into chunks before parsing, but I'd like to avoid that unless necessary.

Any insights would be appreciated!

asked Mar 13 at 16:55 by CodeBreaker

1 Answer

This seems to work for me:

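const Parser = require('tree-sitter');
const Java = require('tree-sitter-java');

const parser = new Parser();
parser.setLanguage(Java);

function parseLargeSource(source) {
    const chunkSize = 30000; // maximum length to return in one call

    // Pass a callback instead of a string, so the parser pulls the input piece by piece.
    return parser.parse((offset) => {
        if (offset < source.length) {
            return source.slice(offset, offset + chunkSize);
        }
        return null; // no more input
    });
}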

It was generated by DeepSeek R1, but I tested it and it does exactly what I expect. I know that AI answers are not allowed here, but since nobody answered the question within a week and this solution works, I feel it adds value for anyone who hits this issue in the future. It was not straightforward to get to this point: other AI models (Claude 3.7 and o3-mini-high) did not find a working solution, and it took many prompts to get there. However, if the mods feel it makes sense to remove it, feel free to do so.
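To see why the callback form still covers the whole input, here is a small sketch with no tree-sitter dependency. `makeChunkReader` wraps the same slicing logic as the answer, and `drain` is a hypothetical stand-in for a consumer that keeps asking for text at advancing offsets until it gets `null`. How the real parser schedules its reads is an assumption here, not something verified against the library.

```javascript
// Same slicing logic as the callback in the answer, wrapped for reuse.
function makeChunkReader(source, chunkSize = 30000) {
    return (offset) => {
        if (offset < source.length) {
            return source.slice(offset, offset + chunkSize);
        }
        return null; // signals end of input
    };
}

// Hypothetical consumer: pulls chunks at advancing offsets until the reader returns null.
function drain(read) {
    const chunks = [];
    let offset = 0;
    for (let chunk = read(offset); chunk !== null; chunk = read(offset)) {
        chunks.push(chunk);
        offset += chunk.length;
    }
    return chunks;
}

// Input well above the size at which the question's example starts failing.
const source = 'class A { ' + 'void method() {} '.repeat(4000) + '}';
const chunks = drain(makeChunkReader(source));

console.log(source.length);                // total characters in the input
console.log(chunks.map((c) => c.length));  // length of each chunk handed out
console.log(chunks.join('') === source);   // whether the chunks reassemble the input
```

Each chunk stays at or below `chunkSize`, and joining the chunks gives back the original string, so no part of the source is skipped or repeated as long as the consumer advances by the length of what it received.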
