javascript - How to improve tesseract.js accuracy? - Stack Overflow

I'm using this piece of code from the website, but it's not accurate enough:

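  // Assumed setup, not shown in the original snippet (tesseract.js v2-style API).
  // The awaits below must run inside an async function.
  const { createWorker, createScheduler } = require("tesseract.js");

  const scheduler = createScheduler();
  const worker1 = createWorker();
  const worker2 = createWorker();

  await worker1.load();
  await worker2.load();
  await worker1.loadLanguage("eng");
  await worker2.loadLanguage("eng");
  await worker1.initialize("eng");
  await worker2.initialize("eng");

  scheduler.addWorker(worker1);
  scheduler.addWorker(worker2);

  /** Add one recognition job; the scheduler runs it on a free worker */
  const {
    data: { text }
  } = await scheduler.addJob("recognize", image);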

This is the type of image whose text I'm trying to read:

Though it seems simple and easy, Tesseract sometimes fails to read it. Are there any better alternatives to tesseract.js, or is there any way to improve the accuracy?

asked Dec 1, 2019 at 13:51 by PayamB.
  • Have you tried filtering the input images, for example to enhance the contrast or enlarge them? I think one way to get better accuracy is to modify the input images. – Kostas Minaidis Commented Dec 1, 2019 at 13:53
  • 1 You can start with this post: docparser./blog/improve-ocr-accuracy Increasing contrast, sharpening the image, and removing noise are some basic image enhancements that might help you get more accurate results. – Kostas Minaidis Commented Dec 1, 2019 at 14:12
  • 1 Additionally, you might want to look at threshold filtering. See this code, for example: github./laurenzcodes/Canvas-Threshold-Effect – Kostas Minaidis Commented Dec 1, 2019 at 14:14
  • 1 You can also dive deeper into edge detection algorithms, like the Sobel or Canny algorithms. – Kostas Minaidis Commented Dec 1, 2019 at 14:20
  • 1 I used a negative version of your image and it works fine. Additional gamma correction also looks promising. – Aikon Mogwai Commented Dec 1, 2019 at 18:11
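
The negative and gamma-correction suggestions from the comments can be sketched in plain JavaScript as follows. This is only an illustration: `invertAndGamma` is a hypothetical helper, not part of tesseract.js, and it assumes the pixels arrive as a flat RGBA byte array (the layout of `ImageData.data` from a canvas). A suitable gamma value has to be found by experiment.

```javascript
// Hypothetical helper: invert an RGBA buffer and apply gamma correction.
// gamma > 1 brightens midtones of the inverted image, gamma < 1 darkens them.
function invertAndGamma(rgba, gamma = 1.0) {
  // Precompute a lookup table: invert first, then apply the gamma curve.
  const lut = new Uint8ClampedArray(256);
  for (let v = 0; v < 256; v++) {
    const inverted = 255 - v;
    lut[v] = Math.round(255 * Math.pow(inverted / 255, 1 / gamma));
  }

  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < rgba.length; i += 4) {
    out[i] = lut[rgba[i]];         // red
    out[i + 1] = lut[rgba[i + 1]]; // green
    out[i + 2] = lut[rgba[i + 2]]; // blue
    out[i + 3] = rgba[i + 3];      // alpha is left unchanged
  }
  return out;
}
```

In a browser, the input would typically come from `ctx.getImageData(...)` and the result would be written back with `ctx.putImageData(...)` before the canvas is passed to the OCR step.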

2 Answers

When applying OCR with Tesseract, it is important to preprocess the image so that the text to detect is black on a white background. To do this, you can apply a simple threshold to obtain a binary image. Here's the image after preprocessing:

Result from Tesseract

52024

I implemented this approach in Python with OpenCV, but you can adapt a similar strategy to JavaScript!

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image and Otsu's Threshold to get a binary image
image = cv2.imread('1.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Perform OCR
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.waitKey()
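
As a rough JavaScript counterpart to the Python snippet above, here is a minimal sketch of an inverted Otsu threshold on raw pixel data. It is not part of tesseract.js or OpenCV: `toGrayscale`, `otsuThreshold`, and `binarize` are hypothetical helper names, and the input is assumed to be a flat RGBA byte array such as `ImageData.data` from a canvas. With `invert = true` it mirrors `THRESH_BINARY_INV`: bright pixels become black and dark pixels become white.

```javascript
// Convert RGBA bytes to one luminance value (0-255) per pixel.
function toGrayscale(rgba) {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[i * 4], g = rgba[i * 4 + 1], b = rgba[i * 4 + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

// Otsu's method: choose the threshold that maximizes between-class variance.
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;

  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let sumBg = 0, weightBg = 0, bestVar = -1, best = 0;
  for (let t = 0; t < 256; t++) {
    weightBg += hist[t];
    if (weightBg === 0) continue;          // no background pixels yet
    const weightFg = gray.length - weightBg;
    if (weightFg === 0) break;             // no foreground pixels left
    sumBg += t * hist[t];
    const meanBg = sumBg / weightBg;
    const meanFg = (sumAll - sumBg) / weightFg;
    const between = weightBg * weightFg * (meanBg - meanFg) ** 2;
    if (between > bestVar) { bestVar = between; best = t; }
  }
  return best;
}

// Produce a black/white RGBA buffer; invert=true maps bright pixels to black.
function binarize(rgba, invert = true) {
  const gray = toGrayscale(rgba);
  const t = otsuThreshold(gray);
  const out = new Uint8ClampedArray(rgba.length);
  for (let i = 0; i < gray.length; i++) {
    const bright = gray[i] > t;
    const v = bright !== invert ? 255 : 0;
    out[i * 4] = out[i * 4 + 1] = out[i * 4 + 2] = v;
    out[i * 4 + 3] = 255; // fully opaque
  }
  return out;
}
```

In a browser, one way to use this would be to draw the image onto a canvas, read the pixels with `getImageData`, write the result of `binarize` back with `putImageData`, and then hand the canvas to the recognition call. Whether `invert` should be on depends on whether the source text is lighter or darker than its background.
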

<!DOCTYPE html>
<html>
    
<head>
    <title>
        Auto captcha verification
    </title>
<script src='https://cdn.rawgit./naptha/tesseract.js/1.0.10/dist/tesseract.js'></script>
    
</head>

<body>
    <img id = "img" src="https://i.sstatic/lqJ12.png"
 />
    
    <div id = "GFG"></div>
    
    <!-- script for auto captcha verification -->
    <script>
        let progress = document.querySelector('#GFG');
        let img = document.querySelector('#img').src
        Tesseract.recognize(img)
        
        .progress(function(p) {
            progress.innerHTML += JSON.stringify(p) + "<br>"
        })
        
        .then(function(result) {
            var captcha = result.text.replace(/[^a-zA-Z0-9 ]/g, "");
            alert(captcha)
        })
    </script>
</body>

</html>              
