Image recognition in Python 3 with Pillow, tesseract-ocr, and pytesseract

    pip install Pillow

tesseract-ocr on GitHub: https://github.com/tesseract-ocr/tesseract

Or a local download mirror: https://www.jb51.net/softs/538925.html

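Windows:

The latest installers can be downloaded here: tesseract-ocr-setup-3.05.01.exe and tesseract-ocr-setup-4.00.00dev.exe (experimental).

Ubuntu:

    sudo apt-get install tesseract-ocr

traineddata file path: /usr/share/tesseract-ocr/tessdata/

    pip install pytesseract

If pip cannot install it directly, find the module's package file and install from that.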
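A quick way to confirm that all three pieces are in place is to look for them from Python. The following is a minimal sketch using only the standard library. The function name `check_ocr_setup` is made up for illustration, and it only reports what it finds; it does not install anything.

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report which OCR prerequisites are visible to this interpreter.

    Hypothetical helper, not part of Pillow or pytesseract.
    """
    return {
        # Pillow is imported under the module name "PIL".
        "Pillow": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # The tesseract executable must be reachable through PATH.
        "tesseract": shutil.which("tesseract") is not None,
    }


if __name__ == "__main__":
    for name, found in check_ocr_setup().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A "missing" entry for tesseract while the program is installed usually points at the PATH problem covered in the troubleshooting notes.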
　　
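Problems encountered and solutions:

1. FileNotFoundError: [WinError 2] The system cannot find the file specified

Solution:

Method 1 (recommended): add the directory that contains tesseract.exe to the PATH environment variable.

For example: D:\Tesseract-OCR. The default install path is C:\Program Files (x86)\Tesseract-OCR.

Note: for the environment variable to take effect, close the cmd window, or close and restart PyCharm or whichever IDE you are using.

Method 2: edit pytesseract.py and set the install path of tesseract.exe

    # CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
    tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'

Method 3: set the path in your own code at run time

    pytesseract.pytesseract.tesseract_cmd = 'D:\\Tesseract-OCR\\tesseract.exe'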
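Methods 1 and 3 can be combined so a script works whether or not PATH has been set up. The sketch below is one possible way to do that, not something pytesseract provides: `resolve_tesseract_cmd` and `DEFAULT_CANDIDATES` are hypothetical names, and the candidate paths are simply the two example locations mentioned above.

```python
import os
import shutil

# Candidate install locations taken from the examples above; adjust as needed.
DEFAULT_CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"D:\Tesseract-OCR\tesseract.exe",
]


def resolve_tesseract_cmd(candidates=DEFAULT_CANDIDATES):
    """Return a path to the tesseract executable, or None if none is found.

    Hypothetical helper: PATH is tried first (method 1), then each
    explicit candidate path in order (method 3).
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    cmd = resolve_tesseract_cmd()
    if cmd is None:
        print("tesseract executable not found")
    else:
        import pytesseract

        # Same assignment as method 3, with the path discovered at run time.
        pytesseract.pytesseract.tesseract_cmd = cmd
        print("using", cmd)
```

Checking PATH first means a correctly configured machine never touches the hard-coded fallbacks.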
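2. pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Tesseract-OCR\\tessdata/eng.traineddata')

Solution:

Method 1 (recommended):

Add the parent directory of the tessdata directory (by default, the tesseract-ocr install directory) to the TESSDATA_PREFIX environment variable.

For example: C:\Program Files (x86)\Tesseract-OCR

Tesseract's own message says the same thing: "Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory."

Method 2: specify tessdata-dir in the config passed from your .py file

    tessdata_dir_config = '--tessdata-dir "D:\\Tesseract-OCR\\tessdata"'
    # tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
    pytesseract.image_to_string(image, config=tessdata_dir_config)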
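The quoting inside that config string is easy to get wrong, especially when the path contains spaces. As a small sketch, the string can be built by a helper; `tessdata_dir_config_for` is a hypothetical name, and the only thing it does is wrap the directory in double quotes after the `--tessdata-dir` option shown above.

```python
def tessdata_dir_config_for(tessdata_dir):
    """Build the string passed as pytesseract's `config` argument.

    Hypothetical helper. The directory is wrapped in double quotes so a
    path containing spaces stays together as one argument.
    """
    return '--tessdata-dir "{}"'.format(tessdata_dir)


if __name__ == "__main__":
    # A raw string avoids having to double every backslash.
    print(tessdata_dir_config_for(r"C:\Program Files (x86)\Tesseract-OCR\tessdata"))
```

The result would then be passed as `config=` to `pytesseract.image_to_string`, exactly as in method 2.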
traineddata downloads: the latest files are available from github.com
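Before downloading anything, it can help to see which language files are already present. The sketch below lists the `.traineddata` files in a tessdata directory using only the standard library. `installed_languages` is a hypothetical helper, and the script assumes TESSDATA_PREFIX points at the parent of `tessdata`, as described in method 1.

```python
import os


def installed_languages(tessdata_dir):
    """Return the language codes that have a .traineddata file in tessdata_dir."""
    suffix = ".traineddata"
    if not os.path.isdir(tessdata_dir):
        return []
    return sorted(
        name[: -len(suffix)]
        for name in os.listdir(tessdata_dir)
        if name.endswith(suffix)
    )


if __name__ == "__main__":
    # Assumption: TESSDATA_PREFIX is the parent directory of "tessdata".
    prefix = os.environ.get("TESSDATA_PREFIX", "")
    tessdata = os.path.join(prefix, "tessdata")
    print(tessdata, installed_languages(tessdata))
```

If `eng` is not in the printed list, the "Error opening data file" message for eng.traineddata is expected.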
　　
　　
Example:
　　
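    # -*- coding: utf-8 -*-
    from PIL import Image
    import pytesseract
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    url = 'http://192.168.24.189/system/code?0.6824490785056669'
    driver = webdriver.Firefox()
    driver.maximize_window()  # maximize the browser window
    driver.get(url)
    imgelement = driver.find_element(By.ID, 'codeImg')  # locate the captcha image
    location = imgelement.location  # x, y coordinates of the captcha
    size = imgelement.size  # width and height of the captcha
    # crop box for the region we need: (left, upper, right, lower)
    rangle = (int(location['x']), int(location['y']),
              int(location['x'] + size['width']),
              int(location['y'] + size['height']))
    name = "code.png"  # PNG, the format save_screenshot writes
    driver.find_element(By.ID, "codeImg").click()
    driver.save_screenshot(name)  # screenshot the current page, which contains the captcha
    aa = Image.open(name)  # open the screenshot
    frame4 = aa.crop(rangle)  # use Image.crop to cut the captcha region out of the screenshot
    frame4.save(name)
    im = Image.open(name)
    # convert to grayscale
    imgry = im.convert('L')
    # save the grayscale image
    imgry.save("g" + name)
    # binarize by thresholding; threshold is the cut-off point
    threshold = 140
    table = []
    for j in range(256):
        if j < threshold:
            table.append(0)
        else:
            table.append(1)
    out = imgry.point(table, '1')
    out.save("b" + name)
    # recognize
    text = pytesseract.image_to_string(out)
    # normalize the recognized text
    text = text.strip()
    text = text.upper()
    print(text)

    # recognize an image file directly, naming the language explicitly
    text = pytesseract.image_to_string(Image.open('code.png'), lang="eng")
    print(text)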
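The two calculations in the example, the crop box and the binarization table, can be pulled out into small pure-Python functions so they can be checked without a browser or an image. This is a sketch only: `crop_box` and `threshold_table` are hypothetical names, and the sample dictionaries stand in for what Selenium's `location` and `size` attributes return.

```python
def crop_box(location, size):
    """Turn an element's location and size dicts into a crop box.

    The result has the (left, upper, right, lower) order that
    PIL's Image.crop expects.
    """
    left = int(location["x"])
    upper = int(location["y"])
    right = int(location["x"] + size["width"])
    lower = int(location["y"] + size["height"])
    return (left, upper, right, lower)


def threshold_table(threshold):
    """Build a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


if __name__ == "__main__":
    # Made-up sample values for illustration.
    print(crop_box({"x": 10, "y": 20}, {"width": 100, "height": 40}))
    table = threshold_table(140)
    print(table[139], table[140])
```

The table is what the example passes to `imgry.point(table, '1')`; raising or lowering the threshold of 140 changes which gray levels are pushed to black, which may be worth tuning per captcha style.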