javascript - How to improve tesseract.js accuracy? - Stack Overflow

I'm using this piece of code from the website, but it's not accurate enough:

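  // Runs inside an async function (await is used below).
  const { createWorker, createScheduler } = require("tesseract.js");

  // The scheduler distributes jobs across the workers added to it.
  const scheduler = createScheduler();
  const worker1 = createWorker();
  const worker2 = createWorker();

  await worker1.load();
  await worker2.load();
  await worker1.loadLanguage("eng");
  await worker2.loadLanguage("eng");
  await worker1.initialize("eng");
  await worker2.initialize("eng");

  scheduler.addWorker(worker1);
  scheduler.addWorker(worker2);

  /** Queue one recognition job; the scheduler hands it to an idle worker */
  const {
    data: { text }
  } = await scheduler.addJob("recognize", image);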

This is the type of image whose text I'm trying to read:

Though it seems simple and easy, Tesseract sometimes fails to read it. Are there any better alternatives to tesseract.js, or any way to improve the accuracy?

asked Dec 1, 2019 at 13:51 by PayamB.

Have you tried filtering the input images, for example to enhance the contrast or enlarge them? I think one way to get better accuracy is to modify the input images. – Kostas Minaidis Commented Dec 1, 2019 at 13:53
You can start with this post: docparser.com/blog/improve-ocr-accuracy Increasing contrast, sharpening the image, and removing noise are some basic image enhancements that might help you get more accurate results. – Kostas Minaidis Commented Dec 1, 2019 at 14:12
Additionally, you might want to look at threshold filtering. See this code for example: github.com/laurenzcodes/Canvas-Threshold-Effect – Kostas Minaidis Commented Dec 1, 2019 at 14:14
You can also dive deeper into edge detection algorithms, like the Sobel or Canny algorithm. – Kostas Minaidis Commented Dec 1, 2019 at 14:20
I used a negative version of your image and it works fine. Additional gamma correction also looks promising. – Aikon Mogwai Commented Dec 1, 2019 at 18:11
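
Two of the ideas from these comments (the negative image and gamma correction) could be sketched roughly as below. This is only an illustration, not code from any commenter: `invert` and `gammaCorrect` are hypothetical helper names, and the input is assumed to be the RGBA byte array you get from a canvas `getImageData(...).data` call.

```javascript
// Hypothetical helpers (not part of tesseract.js). Both expect RGBA bytes,
// four per pixel, as found in ImageData.data.

// Photographic negative: flip each colour channel, leave alpha untouched.
function invert(pixels) {
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    out[i] = 255 - pixels[i];
    out[i + 1] = 255 - pixels[i + 1];
    out[i + 2] = 255 - pixels[i + 2];
    out[i + 3] = pixels[i + 3];
  }
  return out;
}

// Gamma correction: out = 255 * (in / 255) ^ (1 / gamma).
// gamma > 1 brightens mid-tones, gamma < 1 darkens them.
function gammaCorrect(pixels, gamma) {
  // Precompute the mapping once for all 256 possible channel values.
  const lut = new Uint8ClampedArray(256);
  for (let v = 0; v < 256; v++) {
    lut[v] = Math.round(255 * Math.pow(v / 255, 1 / gamma));
  }
  const out = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    out[i] = lut[pixels[i]];
    out[i + 1] = lut[pixels[i + 1]];
    out[i + 2] = lut[pixels[i + 2]];
    out[i + 3] = pixels[i + 3];
  }
  return out;
}
```

In a browser you would typically draw the image onto a canvas, read the pixels with `getImageData`, write the processed bytes back with `putImageData`, and hand the canvas to the recognizer. Which gamma value helps, if any, depends on the image, so treat it as something to experiment with.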

2 Answers

When applying OCR with Tesseract, it is important to preprocess the image so that the text to detect is black on a white background. To do this, you can apply a simple threshold to obtain a binary image. Here's the image after preprocessing:

Result from Tesseract

I implemented this approach in Python with OpenCV, but you can adapt a similar strategy to JavaScript!

import cv2
import pytesseract

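# Path to the Tesseract binary (Windows install location; adjust for your system)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale, then apply an inverted Otsu threshold
# so the text ends up black on a white background
image = cv2.imread('1.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]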

# Perform OCR
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.waitKey()
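
For anyone who wants to try the same thresholding step in JavaScript, here is a minimal sketch of inverted Otsu thresholding in plain JS. It is not a port of OpenCV and makes no claim to match its output exactly; `toGray`, `otsuThreshold` and `binarizeInverted` are hypothetical helper names, and the input is assumed to be RGBA bytes from a canvas `getImageData(...).data` call.

```javascript
// Hypothetical helpers (not from OpenCV or tesseract.js).

// RGBA bytes -> one grayscale byte per pixel, using common luma weights.
function toGray(pixels) {
  const gray = new Uint8Array(pixels.length / 4);
  for (let i = 0, j = 0; i < pixels.length; i += 4, j++) {
    gray[j] = Math.round(
      0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2]
    );
  }
  return gray;
}

// Otsu's method: choose the threshold that maximises the
// between-class variance of the two resulting pixel groups.
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;

  const total = gray.length;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0;
  let weightBg = 0;
  let bestVariance = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue;       // no pixels at or below t yet
    const weightFg = total - weightBg;
    if (weightFg === 0) break;          // every pixel is at or below t
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const variance = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (variance > bestVariance) {
      bestVariance = variance;
      best = t;
    }
  }
  return best;
}

// Inverted binarisation: pixels brighter than the threshold become
// black (0), everything else becomes white (255).
function binarizeInverted(gray, threshold) {
  return Uint8Array.from(gray, (v) => (v > threshold ? 0 : 255));
}
```

To use it in a browser, you would expand the binary values back into RGBA, write them to a canvas with `putImageData`, and pass that canvas to tesseract.js. Whether the inverted or non-inverted variant is right depends on whether the source text is lighter or darker than its background.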

<!DOCTYPE html>
<html>
    
<head>
    <title>
        Auto captcha verification
    </title>
<script src='https://cdn.rawgit.com/naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
    
</head>

<body>
    <img id="img" src="https://i.sstatic.net/lqJ12.png" />
    
    <div id = "GFG"></div>
    
    <!-- script for auto captcha verification -->
    <script>
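        // Element that displays progress updates while recognition runs
        const progress = document.querySelector('#GFG');
        const img = document.querySelector('#img').src;

        Tesseract.recognize(img)
            // Append each progress report as a line of JSON
            .progress(function (p) {
                progress.innerHTML += JSON.stringify(p) + "<br>";
            })
            // Keep only letters, digits and spaces from the recognized text
            .then(function (result) {
                const captcha = result.text.replace(/[^a-zA-Z0-9 ]/g, "");
                alert(captcha);
            });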
    </script>
</body>

</html>
