c# - Check string for two consecutive letters followed by a lower case character - Stack Overflow

I have a collection of strings. I added a "." at the end of each in a foreach loop and concatenated them into a single string.

However, not all strings after concatenation need a "."

So now I have a long string where I want to remove unnecessary "." So I need to check for example "awefeaefe. efewgwe waggrgrgae. Weefwafewf ewfefw. Ewfewfgewgr. ewgfewg"

Where ". " is followed by a lower case character, for example "e", I want to delete the "."

Where ". " is followed by an upper case character, for example "E", do nothing.

I have tried creating a foreach (char c in parsedPara) loop to get every 3 characters and check each one, but it's missing 2 out of every 3 combinations of letters as it's running on characters consecutively and I also don't know how to get the index for the correct "." character in my original string from the loop if I find a combination anyway.

I have also tried creating a badString = ". " + char.toLower() but I don't have a character to put into .toLower. I know the 3rd character is going to be lowercase, but I don't know what character it will be.

Example as requested:

    public class AnalyzeImage
    {
        public async Task analyzeImage(string imageUri)
        {
            string endpoint = Environment.GetEnvironmentVariable("VISION_ENDPOINT");
            string key = Environment.GetEnvironmentVariable("VISION_KEY");

            ImageAnalysisClient client = new ImageAnalysisClient(new Uri(endpoint), new AzureKeyCredential(key));

            ImageAnalysisResult result = client.Analyze(new Uri(imageUri), VisualFeatures.Read, new ImageAnalysisOptions { GenderNeutralCaption = true });

            string cleanLine;
            string parsedPara = string.Empty;
            string miniString = string.Empty;

            int i = 0;
            foreach (DetectedTextBlock block in result.Read.Blocks)
            {
                foreach (DetectedTextLine line in block.Lines)
                {
                    cleanLine = line.Text.Replace("'", "");

                    if (!cleanLine.EndsWith(".") && !cleanLine.EndsWith(",") && !cleanLine.EndsWith("!") && !cleanLine.EndsWith("?") && !cleanLine.EndsWith("-"))
                    {
                        cleanLine += ". "; //Add period character to the end of strings missing a closing character.
                    }
                    if (cleanLine.EndsWith(".."))
                    {
                        cleanLine = cleanLine.Remove(cleanLine.Length - 1); //Strings are immutable, so the result of Remove must be assigned back.
                    }

                    parsedPara += cleanLine; //Concatenate strings into a single string.

                    foreach (char c in parsedPara) //This is where I start trying to check every combination of 3 characters to identify any misplaced period characters mid sentence.
                    {
                        miniString = miniString + c;

                        if (miniString.Length == 3) //Check each complete group of 3 characters, then start a new group.
                        {
                            char firstChar = miniString[0];
                            char secondChar = miniString[1];
                            char thirdChar = miniString[2];

                            if (firstChar == '.' && secondChar == ' ' && char.IsLower(thirdChar))
                            {
                                Debug.WriteLine("HIT!");
                            }

                            miniString = string.Empty;
                        }
                        i++;
                    }
                }
            }
            Debug.WriteLine("parsedPara: " + parsedPara);
        }
    }

Asked Feb 12 at 20:46 by DMur, edited Feb 12 at 21:23.

Comments:
  • Why not just add "." to the current string only if the following one starts with an uppercase letter? Also, based on the description, it would be better to use StringBuilder. And please provide some code: a minimal reproducible example. – Guru Stron Commented Feb 12 at 21:04
  • Because the strings are being pulled from an image using OCR, and when I loop through them (to add the ".") I don't know what the following strings are going to be yet. – DMur Commented Feb 12 at 21:05
  • Then pull the "following" string before adding the "current" one. Or use StringBuilder and append "." if the current string starts with an uppercase letter. – Guru Stron Commented Feb 12 at 21:07
  • I would read all the text into a string and use a regular expression to do the replacements. – Heretic Monkey Commented Feb 12 at 21:21
  • It seems like you are focused on the period delimiter being APPENDED to each string in your collection when you are concatenating. But whether you need the delimiter is determined by some property of the string following the delimiter. So if you can shift to consider the delimiter as something PREPENDED to certain strings, you might find a simple solution. – Jeff Zola Commented Feb 12 at 21:24
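
The "prepend instead of append" idea from the comments could look roughly like the sketch below. It is only an illustration under some assumptions: `LineJoiner` and `JoinLines` are hypothetical names, the set of closing characters is taken from the question's code, and the last line is assumed to end a sentence.

    using System;
    using System.Collections.Generic;
    using System.Text;

    public static class LineJoiner
    {
        // Closing characters taken from the question's EndsWith checks.
        private const string Closers = ".,!?-";

        // Hypothetical helper: joins lines with a space, and puts a "." before a line
        // only when that line starts with an uppercase letter and the text so far
        // does not already end with a closing character.
        public static string JoinLines(IEnumerable<string> lines)
        {
            var sb = new StringBuilder();
            foreach (string line in lines)
            {
                if (string.IsNullOrWhiteSpace(line)) continue;
                string trimmed = line.Trim();

                if (sb.Length > 0)
                {
                    bool alreadyClosed = Closers.IndexOf(sb[sb.Length - 1]) >= 0;
                    bool startsNewSentence = char.IsUpper(trimmed[0]);
                    if (startsNewSentence && !alreadyClosed) sb.Append('.');
                    sb.Append(' ');
                }
                sb.Append(trimmed);
            }

            // Assumption: the final line ends a sentence, so close it if needed.
            if (sb.Length > 0 && Closers.IndexOf(sb[sb.Length - 1]) < 0) sb.Append('.');
            return sb.ToString();
        }
    }

With the lines "Hello world", "this is" and "A test", this returns "Hello world this is. A test.", so no period is ever added in front of a lowercase continuation and nothing has to be removed afterwards.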

1 Answer

Try using regex by matching:

    \.(?=\s*[a-z])

and replacing with an empty string. See: regex101


Explanation

MATCH:

  • \.: match a literal dot
  • (?= ... ): only if it is followed by
    • \s*: any amount of whitespace characters (this part can be narrowed if you only ever have a single space)
    • [a-z]: and a lowercase letter.
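
In C#, the replacement described above can be done with Regex.Replace. The following is a minimal sketch rather than a definitive implementation: `PeriodCleaner` and `RemoveMisplacedPeriods` are hypothetical names, and `[a-z]` only covers ASCII lowercase letters, so OCR text with accented characters would need a wider character class.

    using System;
    using System.Text.RegularExpressions;

    public static class PeriodCleaner
    {
        // A dot followed by optional whitespace and then a lowercase ASCII letter.
        // The lookahead (?= ... ) only inspects what follows, so just the dot is matched.
        private static readonly Regex MisplacedPeriod = new Regex(@"\.(?=\s*[a-z])");

        // Hypothetical helper: returns the text with every matched dot removed.
        public static string RemoveMisplacedPeriods(string text)
        {
            return MisplacedPeriod.Replace(text, string.Empty);
        }
    }

Applied to the sample string from the question, `PeriodCleaner.RemoveMisplacedPeriods("awefeaefe. efewgwe waggrgrgae. Weefwafewf ewfefw. Ewfewfgewgr. ewgfewg")` returns "awefeaefe efewgwe waggrgrgae. Weefwafewf ewfefw. Ewfewfgewgr ewgfewg". Running it once on the fully concatenated string, after the loops finish, avoids rescanning the text for every OCR line.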
