I need to spell check a very large number of sentences and return those without errors. I have tried both Hunspell and WeCantSpell.Hunspell. In both, I find that any individual character ("b", "M", "Q", etc.) appearing alone in a sentence, surrounded by whitespace, is allowed and passes spell check as a valid word. I'm using the out-of-the-box English dictionary files en_us.dic and en_us.aff. The dictionary file contains, I can't imagine why, individual entries for every character in the alphabet. I expected to be able to "fix" this by simply removing those entries, but that doesn't seem to have any effect. I started by removing the entry "B/MNT" from the .dic file, but both "b" and "B" continue to pass spell check. I have tried working with and without the en_us.aff file. (I have looked at a Hunspell man page to get my head around how the flags work, but I find the examples a bit hard to follow.)
My code and output follow:
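    using System.Collections.Generic;
    using WeCantSpell.Hunspell;

    internal static class Simulation
    {
        // Returns the sentences in which every space-separated word passes spell check.
        internal static HashSet<string> TestMutations(HashSet<string> mutations)
        {
            HashSet<string> survivors = new HashSet<string>();
            // Load the word list from the dictionary file.
            var dictionary = WordList.CreateFromFiles("en_us.dic");
            foreach (string mutation in mutations)
            {
                bool allCorrect = true;
                string[] words = mutation.Split(' ');
                foreach (string word in words)
                {
                    // Stop at the first word the dictionary rejects.
                    if (!dictionary.Check(word))
                    {
                        allCorrect = false;
                        break;
                    }
                }
                if (allCorrect) survivors.Add(mutation);
            }
            return survivors;
        }
    }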
And a simple test with its output:
    Console.WriteLine("Testing Spell Check with Can't Spell:");
    HashSet<string> sentences = new HashSet<string>();
    sentences.Add("This is all spelled correctly.");
    sentences.Add("Here arre some missspellings.");
    sentences.Add("To be or not to b.");
    sentences.Add("This B should not survive.");
    sentences.Add("This sentence is also good.");
    Console.WriteLine("The following sentences are being tested by spell check:");
    foreach (string sentence in sentences)
    {
        Console.WriteLine(sentence);
    }
    HashSet<string> survivors = Simulation.TestMutations(sentences);
    Console.WriteLine();
    Console.WriteLine("These are the survivors of the spell check:");
    foreach (string survivor in survivors)
    {
        Console.WriteLine(survivor);
    }
Testing Spell Check with Can't Spell:
The following sentences are being tested by spell check:
This is all spelled correctly.
Here arre some missspellings.
To be or not to b.
This B should not survive.
This sentence is also good.
These are the survivors of the spell check:
This is all spelled correctly.
To be or not to b.
This B should not survive.
This sentence is also good.
Here is an excerpt from my modified dictionary file, showing the missing entry for B:
Azov/M
Aztec/SM
Aztecan/M
Aztlan/M
BA/M
BASIC/SM
BB/M
BBB/M
Thanks for any suggestions, including, perhaps especially, docs or tutorials about how these dictionary files work, as I expect to need to tweak other spell check behaviors for this app.
Asked Mar 21 at 18:17 by Steve Pence
1 Answer
Editor here: all letters ARE considered words, have real-world dictionary entries, and of course are correctly spelled.
While a letter by itself may sometimes be a MISTAKE, it is not a misspelling.
For example:
The letter A is the first letter of the alphabet.
In this problem, solve for x.
If you want a warning about letters that would likely be mistakes when found alone in your context, you could write your own test that looks for those letters by themselves and reports them as likely typos.
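A minimal sketch of such a test might look like the following. It runs independently of Hunspell, so it could sit in front of or behind the dictionary check. The class name `LoneLetterCheck`, the method `FindLoneLetters`, the allow-list of "a", "A" and "I", and the set of punctuation that gets trimmed are all assumptions made for illustration; adjust them to your own context.

```csharp
using System;
using System.Collections.Generic;

internal static class LoneLetterCheck
{
    // Hypothetical allow-list: single letters treated as ordinary words here.
    private static readonly HashSet<string> AllowedSingles =
        new HashSet<string>(StringComparer.Ordinal) { "a", "A", "I" };

    // Punctuation stripped from both ends of a token before it is examined (an assumption).
    private static readonly char[] Punctuation =
        { '.', ',', ';', ':', '!', '?', '"', '\'', '(', ')' };

    // Returns every token that is exactly one letter after trimming
    // punctuation and that is not on the allow-list.
    internal static List<string> FindLoneLetters(string sentence)
    {
        var suspects = new List<string>();
        string[] tokens = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            string word = token.Trim(Punctuation);
            if (word.Length == 1 && char.IsLetter(word[0]) && !AllowedSingles.Contains(word))
            {
                suspects.Add(word);
            }
        }
        return suspects;
    }
}
```

A sentence would then survive only if it passes spell check and `FindLoneLetters` returns an empty list for it. Note that this sketch would also flag the "x" in "In this problem, solve for x.", which is why the result is probably better treated as a warning than as a spelling error.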