c# - WeCantSpell.Hunspell reports individual characters as correctly spelled words - Stack Overflow

I need to spell check a very large number of sentences and return those without errors. I have tried both Hunspell and WeCantSpell.Hunspell. With both, I find that individual characters ("b", "M", "Q", etc.) appearing alone in a sentence, surrounded by whitespace, pass spell check as valid words. I'm using the out-of-the-box English dictionary files en_us.dic and en_us.aff. The dictionary file contains, I can't imagine why, an individual entry for every letter of the alphabet. I expected to be able to "fix" this by simply removing those entries, but that doesn't seem to have any effect. I started by removing the entry "B/MNT" from the .dic file, but both "b" and "B" continue to pass spell check. I have tried working with and without the en_us.aff file. (I have looked at the Hunspell man page to get my head around how the flags work, but I find the examples a bit hard to follow.)

My code and output follow:

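using System.Collections.Generic;
using WeCantSpell.Hunspell;

internal static class Simulation
{
    // Returns the sentences in which every space-separated word passes spell check.
    internal static HashSet<string> TestMutations(HashSet<string> mutations)
    {
        HashSet<string> survivors = new HashSet<string>();

        // Load the word list from the (modified) dictionary file.
        var dictionary = WordList.CreateFromFiles("en_us.dic");

        foreach (string mutation in mutations)
        {
            bool allCorrect = true;
            string[] words = mutation.Split(' ');

            foreach (string word in words)
            {
                // Stop at the first word the dictionary rejects.
                if (!dictionary.Check(word))
                {
                    allCorrect = false;
                    break;
                }
            }

            if (allCorrect) survivors.Add(mutation);
        }

        return survivors;
    }
}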

And a simple test with its output:

Console.WriteLine("Testing Spell Check with Can't Spell:");

HashSet<string> sentences = new HashSet<string>();
sentences.Add("This is all spelled correctly.");
sentences.Add("Here arre some missspellings.");
sentences.Add("To be or not to b.");
sentences.Add("This B should not survive.");
sentences.Add("This sentence is also good.");


Console.WriteLine("The following sentences are being tested by spell check:");
foreach(string sentence in sentences)
{
    Console.WriteLine(sentence);
}

HashSet<string> survivors = Simulation.TestMutations(sentences);
Console.WriteLine();
Console.WriteLine("These are the survivors of the spell check:");

foreach(string survivor in survivors)
{
    Console.WriteLine(survivor);
}


Testing Spell Check with Can't Spell:
The following sentences are being tested by spell check:
This is all spelled correctly.
Here arre some missspellings.
To be or not to b.
This B should not survive.
This sentence is also good.

These are the survivors of the spell check:
This is all spelled correctly.
To be or not to b.
This B should not survive.
This sentence is also good.

Here is an excerpt from my modified dictionary file, showing the missing entry for B:

Azov/M
Aztec/SM
Aztecan/M
Aztlan/M
BA/M
BASIC/SM
BB/M
BBB/M

Thanks for any suggestions, including, perhaps especially, docs or tutorials about how these dictionary files work, as I expect to need to tweak other spell check behaviors for this app.

Asked Mar 21 at 18:17 by Steve Pence

1 Answer

Editor here: All letters ARE considered words, have real-world dictionary entries, and of course are correctly spelled.

While a letter by itself may sometimes be a MISTAKE, it is not a misspelling.

e.g.

The letter A is the first letter of the alphabet.

In this problem, solve for x.

If you want to get a warning about letters that you think would likely be mistakes if found alone in your context, maybe just write your own test that looks for those letters by themselves and presents them as warnings about likely typos.
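
A minimal sketch of such a test follows, assuming that in your context only "a", "A" and "I" are acceptable as stand-alone letters. The class name LoneLetterCheck, the method FindLoneLetters and the allow-list are illustrative names chosen for this example, not part of Hunspell or WeCantSpell.Hunspell, and the punctuation trimming is deliberately simple.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoneLetterCheck
{
    // Assumption: the only letters allowed to stand alone. Adjust for your context.
    private static readonly HashSet<string> AllowedSingles =
        new HashSet<string>(StringComparer.Ordinal) { "a", "A", "I" };

    // Returns every single-letter token in the sentence that is not on the allow-list.
    public static List<string> FindLoneLetters(string sentence)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };

        return sentence
            .Split(separators, StringSplitOptions.RemoveEmptyEntries)
            // Strip common punctuation so that "b." is treated as the token "b".
            .Select(token => token.Trim('.', ',', ';', ':', '!', '?', '"', '\'', '(', ')'))
            .Where(token => token.Length == 1 && char.IsLetter(token[0]))
            .Where(token => !AllowedSingles.Contains(token))
            .ToList();
    }
}
```

In the asker's loop, a sentence could then be kept only when every word passes dictionary.Check and FindLoneLetters returns an empty list. Note that this would also flag a legitimate sentence such as "In this problem, solve for x.", which is why the result is better treated as a warning about a likely typo than as a spelling error.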
