Home Captcha Bypass -> OCR
Post
Cancel

Captcha Bypass -> OCR

Captcha Bypass -> OCR

CAPTCHA Is a type of challenge–response test used in computing to determine whether the user is human or a bot. If the text in the captcha is not scattered enough, we can use ocr (Optical Character Recognition) tool to recognise it.

Exploitation

cap

As you can see the text in the image is not scattered enough making it possible for ocr tool to recognise the text.

This is the python script that we are going to use to recognise the text of the captcha.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# We import the necessary packages
#import the needed packages
import cv2
import os,argparse
import pytesseract
from PIL import Image

#We then Construct an Argument Parser
ap=argparse.ArgumentParser()
ap.add_argument("-i","--image",
				required=True,
				help="Path to the image folder")
ap.add_argument("-p","--pre_processor",
				default="thresh",
				help="the preprocessor usage")
args=vars(ap.parse_args())

#We then read the image with text
images=cv2.imread(args["image"])

#convert to grayscale image
gray=cv2.cvtColor(images, cv2.COLOR_BGR2GRAY)

#checking whether thresh or blur
if args["pre_processor"]=="thresh":
	cv2.threshold(gray, 0,255,cv2.THRESH_BINARY| cv2.THRESH_OTSU)[1]
if args["pre_processor"]=="blur":
	cv2.medianBlur(gray, 3)
	
#memory usage with image i.e. adding image to memory
filename = "{}.jpg".format(os.getpid())
cv2.imwrite(filename, gray)
text = pytesseract.image_to_string(Image.open(filename))
os.remove(filename)
print(text)

Let’s download the image and use our python script to recognise the text

cap

it was able to recognised the text as edward, which was correct.

Let’s test one more.

cap

cap

it was able to recognised the text as genesis, which was correct.

References

This post is licensed under CC BY 4.0 by the author.