Tesseract OCR

bLink11

Member

Posts: 8,491

Joined: Sep 27 2021

Gold: 1,233.01

Dec 14 2023 03:51pm

can't get it to read the text in this box

tried having the program convert it to greyscale first to make it more readable, but still nothing

i've never used tesseract before. am i doing something wrong or does it just suck? pls advise

Candyzcanes

Member

Posts: 22,458

Joined: Dec 6 2008

Gold: 14.00

Trader: Trusted

Dec 14 2023 04:57pm

i've tried using tesseract ocr on many things that are very legible and i struggled greatly.
if you are using python this is what i got to sort of work

Code

import cv2
import pytesseract

from difflib import SequenceMatcher

def similar(a, b):
    # Similarity ratio between two strings, from 0.0 to 1.0
    return SequenceMatcher(None, a, b).ratio()

# Point pytesseract at the Tesseract executable before calling it
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
img = cv2.imread('resources/ammy.png')
text = pytesseract.image_to_string(img)
spl_text = text.split("\n")

# str.join returns a new string, so the result has to be accumulated
spl_text_two = ''
for sentence in spl_text:
    spl_text_two += ''.join(sentence.split())

print('spltwo' + spl_text_two)
# Print each OCR line with its index
for i, line in enumerate(spl_text):
    print(str(i) + " " + line)

this converted
https://i.imgur.com/gS5aS3C.png

to

DEATH ToRC
AMULET
ReguirReD LeveL: 52
+2 To LIGHTNING SKILLs (S@RcERESS ONLY)
*19% FASTER CAST RATE
+17 Te MANA
REGENERATE MANA 10%
CoLpD Resist +6%
LIGHTNING RESIST 9 6%
Fire Resist + 6%
P@IseN Resist: + 35%

...almost..lol
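
The `similar` helper in the code above is defined but never called. One way it could be put to use, as a rough sketch: score each OCR line against the text you expect to see. The expected strings below are guesses read off the output above (the real item text isn't quoted anywhere in the thread), and `score_lines` is a made-up name.

```python
from difflib import SequenceMatcher


def similar(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def score_lines(ocr_lines, expected_lines):
    """Pair up lines by position and score each pair, ignoring case."""
    return [
        (ocr, expected, similar(ocr.upper(), expected.upper()))
        for ocr, expected in zip(ocr_lines, expected_lines)
    ]


# OCR output copied from the post above
ocr_lines = ["DEATH ToRC", "ReguirReD LeveL: 52", "+17 Te MANA"]
# Assumed ground truth, guessed from context
expected_lines = ["DEATH TORC", "REQUIRED LEVEL: 52", "+17 TO MANA"]

for ocr, expected, ratio in score_lines(ocr_lines, expected_lines):
    print(f"{ratio:.2f}  {ocr!r} vs {expected!r}")
```

A per-line score like this makes it easier to tell whether a preprocessing change actually helped or just moved the errors around.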

This post was edited by Candyzcanes on Dec 14 2023 05:00pm

bLink11

Dec 14 2023 06:09pm

Quote (Candyzcanes @ Dec 14 2023 05:57pm)

i've tried using tesseract ocr on many things that are very legible and i struggled greatly.
if you are using python this is what i got to sort of work

Code

this converted
https://i.imgur.com/gS5aS3C.png

to

DEATH ToRC
AMULET
ReguirReD LeveL: 52
+2 To LIGHTNING SKILLs (S@RcERESS ONLY)
*19% FASTER CAST RATE
+17 Te MANA
REGENERATE MANA 10%
CoLpD Resist +6%
LIGHTNING RESIST 9 6%
Fire Resist + 6%
P@IseN Resist: + 35%

...almost..lol

yeah i can't get a better output than that. i tried preprocessing the image and the result was basically the same when i ran your ammy
i've never used OCR before and it seems to really struggle with some fonts. training it on the font data is way outside the scope of my project. honestly can't believe it won't read mine. the other OCRs i know of are cloud-based and i think you have to pay
i'm kinda stumped

Code

import cv2
import pytesseract

from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

# Set the Tesseract OCR executable path
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

# Read the image using OpenCV
img = cv2.imread('LINKTOIMAGE')

# Convert the image to grayscale
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Use pytesseract to extract text from the image
text = pytesseract.image_to_string(gray_img)

# Split the text into lines
spl_text = text.split("\n")

# Initialize an empty string to store the modified text
spl_text_two = ''

# Concatenate words without spaces to remove line breaks
for sentence in spl_text:
    spl_text_two += ''.join(sentence.split())

# Convert the text to title case
spl_text_two = spl_text_two.title()

# Print the modified text
print('spl_text_two: ' + spl_text_two)

# Print each line and its corresponding line number
for i, line in enumerate(spl_text):
    print(f'{i + 1}: {line.title()}')

The Output

Deathtorcamuletreguirredlevel:52+2T®Lightningskills(S@Rceressonly}*19%Fastercastrate+17Temanaregeneratemana10%Coltdresist-+6%Lightningaresistyp6%Fireresist+6%Poisonoresist:+35%
1: Death Torc
2: Amulet
3: Reguirred Level: 52
4: +2 T® Lightning Skills (S@Rceress Only}
5: *19% Faster Cast Rate
6: +17 Te Mana
7: Regenerate Mana 10%
8: Coltd Resist-+6%
9: Lightningaresist Yp 6%
10: Fire Resist + 6%
11: Poisono Resist: + 35%
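
For anyone landing here later, a rough sketch of the preprocessing steps that usually get tried beyond plain grayscale: upscale the image, binarize it with Otsu's threshold, and tell Tesseract to treat it as a single block of text with `--psm 6`. This assumes light text on a dark background and has not been run against the amulet screenshot, so treat it as a starting point only. `build_config` and `ocr_preprocessed` are made-up names, and as far as I know the whitelist option is ignored by the LSTM engine in Tesseract 4.0.

```python
def build_config(psm=6, whitelist=None):
    """Build the config string passed to pytesseract."""
    parts = [f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to these characters (no spaces in the list)
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def ocr_preprocessed(path, scale=3, psm=6):
    # Imported here so build_config works without OpenCV installed.
    # Set pytesseract.pytesseract.tesseract_cmd first if Tesseract is not on PATH.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Small game text tends to OCR better when enlarged
    gray = cv2.resize(gray, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    # Otsu picks the threshold; the INV flag turns light text on a dark
    # background into dark text on a light background
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    return pytesseract.image_to_string(binary, config=build_config(psm))


print(build_config(6))
```

Usage would be something like `print(ocr_preprocessed('resources/ammy.png'))`, trying a few values of `scale` and `psm` to see what the font tolerates.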

This post was edited by bLink11 on Dec 14 2023 06:12pm

bLink11

Dec 14 2023 06:43pm

If anyone has advice I would greatly appreciate it. Til then I'm keeping at it

EDIT:

i'm getting closer.....
it's now outputting MER

This post was edited by bLink11 on Dec 14 2023 06:59pm

cugock14

Member

Posts: 2,485

Joined: Jun 14 2017

Gold: 328.00

Jan 19 2024 11:15am

seems like the custom diablo font is tricking the OCR into reading some letters as different characters. my recommendation is to store the letters that aren't being transcribed correctly in their own library as individual images.

then, when the OCR comes across such letters or images, you can programmatically swap the erroneous letters for preconfigured string values.

for example on line 6 of the output, the in-game "o" is being read as an "e", so why not just swap those instances in a post-processing step?
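
A rough sketch of the swap idea. Replacing every "e" with "o" across the board would also wreck words that were read correctly (LEVEL, RESIST), so this version fixes whole words instead: short words go through an explicit table of known misreads, and longer words are matched against a list of words expected on an item. `KNOWN_MISREADS`, `KNOWN_WORDS`, the minimum length of 4 and the 0.6 cutoff are all assumptions for illustration; a real list would need to cover the game's item text.

```python
import re
from difflib import get_close_matches

# Hypothetical table of short misreads, too short for fuzzy matching to be safe
KNOWN_MISREADS = {"TE": "TO"}

# Hypothetical vocabulary; extend with the words that appear on items
KNOWN_WORDS = [
    "REQUIRED", "LEVEL", "LIGHTNING", "SKILLS", "SORCERESS", "ONLY",
    "FASTER", "CAST", "RATE", "MANA", "REGENERATE", "COLD", "FIRE",
    "POISON", "RESIST", "AMULET",
]


def fix_word(word, cutoff=0.6):
    """Return the corrected word, or the word unchanged if nothing fits."""
    upper = word.upper()
    if upper in KNOWN_MISREADS:
        return KNOWN_MISREADS[upper]
    if len(upper) < 4:
        # Fuzzy matching on very short words picks up unrelated words
        return word
    matches = get_close_matches(upper, KNOWN_WORDS, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def fix_line(line):
    # Only touch runs of letters (plus '@', which the OCR produced in place of 'O');
    # digits and symbols such as '+', '%' and ':' pass through untouched
    return re.sub(r"[A-Za-z@]+", lambda m: fix_word(m.group()), line)


print(fix_line("ReguirReD LeveL: 52"))
print(fix_line("+17 Te MANA"))
print(fix_line("P@IseN Resist: + 35%"))
```

The explicit table exists because a two-letter misread like "Te" is closer to "RATE" than to "TO" under a plain similarity ratio, so fuzzy matching alone would pick the wrong word.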

flow6

Member

Posts: 3,267

Joined: Jun 8 2023

Gold: 12,627.25

Mar 11 2024 02:37pm

Quote (Candyzcanes @ Dec 14 2023 05:57pm)

i've tried using tesseract ocr on many things that are very legible and i struggled greatly.
if you are using python this is what i got to sort of work

Code

dat font tho lol

Helic

Member

Posts: 4

Joined: Jan 23 2022

Gold: 4,000.00

Mar 12 2024 07:16pm

Quote (bLink11 @ Dec 14 2023 08:43pm)

Sorry to post again back to back but it won't let me edit OP

I'm having it do lots of preprocessing to get a better result because that stippling effect on the button is really throwing off the OCR i think. there is another 'PLAY' button that looks almost identical without the stippling and i got the OCR to read that just fine
This is where i'm at in the automated preprocessing and it's struggling to read it still

https://i.imgur.com/FYm0Yp5.png

If anyone has advice I would greatly appreciate it. Til then I'm keeping at it

EDIT:

i'm getting closer.....
it's now outputting MER

https://i.imgur.com/GCg4ylw.png

Why don't you try rotating the image so the text isn't skewed?
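
A rough sketch of that idea, assuming you can pick two points along the baseline of the text (by eye, or from a detected edge of the button). `baseline_angle` and `rotate_level` are made-up names, and the coordinates in the last line are invented for illustration.

```python
import math


def baseline_angle(p1, p2):
    """Angle in degrees of the line p1 -> p2 in image coordinates (y grows downward)."""
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_level(img, angle_deg):
    """Rotate the image about its center so a baseline at angle_deg becomes horizontal."""
    import cv2  # imported here so baseline_angle works without OpenCV installed

    h, w = img.shape[:2]
    # A positive angle rotates counter-clockwise on screen, which levels a
    # baseline that slopes downward to the right
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(img, matrix, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)


# Invented example: a baseline that drops 10 px over 100 px of width
print(round(baseline_angle((0, 0), (100, 10)), 2))
```

With a real screenshot it would be `level = rotate_level(img, baseline_angle(p1, p2))`, followed by the usual `pytesseract.image_to_string(level)`. This only handles rotation; if the button is drawn in perspective, a plain rotation will not fully straighten it.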
