文末含有案例地址哦!!!
Tesseract是一个开源的光学字符识别(OCR)引擎,它可以将图像中的文字转换为计算机可读的文本。支持多种语言和书面语言,并且可以在命令行中执行。它是一个流行的开源OCR工具,可以在许多不同的操作系统上运行。
Tess4J是一个基于Tesseract OCR引擎的Java接口,可以用来识别图像中的文本,说白了,就是封装了它的API,让Java可以直接调用。
net.sourceforge.tess4j tess4j 4.5.4
把训练数据的目录路径配置在yml里,后续可以扩展到配置中心。
server: port: 8888 # 训练数据文件夹的路径 tess4j: datapath: E:/gitee/OCR/tess4j_data
Tesseract OCR库通过训练数据来学习不同语言和字体的特征,以便更好地识别图片中的文字。
在安装Tesseract OCR库时,通常会生成一个包含多个子文件夹的训练数据文件夹,其中每个子文件夹都包含了特定语言或字体的训练数据。
比如我这里是下载后放到了D盘的tessdata目录下,如图所示,其实就是一个.traineddata为后缀的文件,大小约2M多。
训练数据,官方下载地址:https://digi.bib.uni-mannheim.de/tesseract/
GitHub官网地址:https://github.com/tesseract-o
@Configuration public class TesseractOcrConfig { @Value("${tess4j.dataPath}") private String dataPath; @Bean public Tesseract tesseract() { Tesseract tesseract = new Tesseract(); // 设置训练数据文件夹路径 tesseract.setDatapath(dataPath); // 设置为中文简体 tesseract.setLanguage("chi_sim"); return tesseract; } }
@Service public class OcrServiceImpl implements OcrService { private final Tesseract tesseract; public OcrServiceImpl(Tesseract tesseract) { this.tesseract = tesseract; } /** * * @param imageFile 要识别的图片 * @return */ @Override public String recognizeText(MultipartFile imageFile) throws IOException, TesseractException { // 转换 InputStream sbs = new ByteArrayInputStream(imageFile.getBytes()); BufferedImage bufferedImage = ImageIO.read(sbs); // 对图片进行文字识别 return tesseract.doOCR(bufferedImage); } }
@RestController @RequestMapping("/api") @Slf4j public class OcrController { private final OcrService ocrService; public OcrController(OcrService ocrService) { this.ocrService = ocrService; } @PostMapping(value = "/recognize", consumes = MediaType.MULTIPART_FORM_DATA_VALUE) public String recognizeImage(@RequestParam("file") MultipartFile file) throws TesseractException, IOException { log.info(ocrService.recognizeText(file)); // 调用OcrService中的方法进行文字识别 return ocrService.recognizeText(file); } }
此方法的识别精准度差,不建议使用。
案例完整代码:https://github.com/rookiesnewbie/OCR
官网文档:https://ai.baidu.com/ai-doc/OCR/dk3iqnq51
com.baidu.aip java-sdk 4.16.10
server: port: 8090 baidu: APP_ID: #应用的appid API_KEY: #你的app_key SECRET_KEY: #你的secret_key
@Setter @Configuration @ConfigurationProperties(prefix = "baidu") public class baiduOcrConfig { private String APP_ID; private String API_KEY; private String SECRET_KEY; @Bean public AipOcr aipOcr() { return new AipOcr(APP_ID, API_KEY, SECRET_KEY); } }
@RestController @Slf4j @RequestMapping("/api") public class BaiduOcrController { @Autowired private AipOcr aipOcr; @PostMapping(value = "/process",consumes = MediaType.MULTIPART_FORM_DATA_VALUE) public String processImage(@RequestParam("file") MultipartFile file) { HashMapoptions = new HashMap (); options.put("detect_direction", "true"); options.put("probability", "true"); try { byte[] imageBytes = file.getBytes(); JSONObject result = aipOcr.basicGeneral(imageBytes, options); // 解析百度OCR的识别结果 StringBuilder sb = new StringBuilder(); result.getJSONArray("words_result").forEach(item -> { JSONObject word = (JSONObject) item; sb.append(word.getString("words")).append("\n"); }); return sb.toString(); } catch (IOException e) { e.printStackTrace(); return "Error processing image"; } } }
精准度很高,推荐使用,而且还有很多的精准度选择
官网文档:https://ai.baidu.com/ai-doc/OCR/Nkibizxlf#%E9%80%9A%E7%94%A8%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB
//image为图片的二进制字节数组,options为传入可选参数调用接口 basicGeneral(byte[] image, HashMapoptions)
//image为图片的存放路径,options为传入可选参数调用接口 basicGeneral(String image, HashMapoptions)
//image为图片的存放在服务器的路径,options为传入可选参数调用接口 basicGeneralUrl(String url, HashMapoptions)
官网文档:https://ai.baidu.com/ai-doc/OCR/Nkibizxlf#%E9%80%9A%E7%94%A8%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB%EF%BC%88%E9%AB%98%E7%B2%BE%E5%BA%A6%E7%89%88%EF%BC%89
//image为图片的二进制字节数组,options为传入可选参数调用接口 basicAccurateGeneral(byte[] image, HashMapoptions)
//image为本地图片的存放路径,options为传入可选参数调用接口 basicAccurateGeneral(String image, HashMapoptions)
官网文档:https://ai.baidu.com/ai-doc/OCR/Nkibizxlf#%E9%80%9A%E7%94%A8%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB%EF%BC%88%E5%90%AB%E4%BD%8D%E7%BD%AE%E4%BF%A1%E6%81%AF%E7%89%88%EF%BC%89
//image为图片的二进制字节数组,options为传入可选参数调用接口 general(byte[] image, HashMapoptions)
//image为图片的存放路径,options为传入可选参数调用接口 general(String image, HashMapoptions)
//image为图片的存放在服务器的路径,options为传入可选参数调用接口 generalUrl(String url, HashMapoptions)
官网文档:https://ai.baidu.com/ai-doc/OCR/Nkibizxlf#%E9%80%9A%E7%94%A8%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB%EF%BC%88%E5%90%AB%E4%BD%8D%E7%BD%AE%E9%AB%98%E7%B2%BE%E5%BA%A6%E7%89%88%EF%BC%89
//image为图片的二进制字节数组,options为传入可选参数调用接口 accurateGeneral(byte[] image, HashMapoptions)
//image为图片的存放路径,options为传入可选参数调用接口 accurateGeneral(String image, HashMapoptions)
官网文档:https://ai.baidu.com/ai-doc/OCR/Nkibizxlf#%E9%80%9A%E7%94%A8%E6%96%87%E5%AD%97%E8%AF%86%E5%88%AB%EF%BC%88%E5%90%AB%E7%94%9F%E5%83%BB%E5%AD%97%E7%89%88%EF%BC%89
enhancedGeneral(byte[] image, HashMapoptions)
enhancedGeneral(String image, HashMapoptions)
enhancedGeneralUrl(String url, HashMapoptions)
官方文档:https://next.api.aliyun.com/api-tools/sdk/ocr-api?spm=a2c4g.442247.0.0.40933e8eQ3GZIX&version=2021-07-07&language=java-tea&tab=primer-doc
https://next.api.aliyun.com/api/ocr-api/2021-07-07/RecognizeAdvanced?spm=api-workbench.SDK%20Document.0.0.2e7652e95jnhKE&tab=DOC&sdkStyle=dara&lang=JAVA
com.aliyun ocr_api20210707 1.2.0
server: port: 8099 aliyun: KeyId: #你的KeyId KeySecret: #你的KeySecret endpoint: ocr-api.cn-hangzhou.aliyuncs.com
@Setter @Component @ConfigurationProperties(prefix = "aliyun") public class AliyunOcrConfig { private String KeyId; private String KeySecret; private String endpoint; @Bean public Client ocrClient(){ try { Client client = new Client(new Config().setAccessKeyId(KeyId).setAccessKeySecret(KeySecret).setEndpoint(endpoint)); return client; } catch (Exception e) { throw new RuntimeException(e); } } }
@RestController @RequestMapping("/api") @Slf4j public class AliyunOcrController { @Resource private Client client; @PostMapping(value = "ocr",consumes = MediaType.MULTIPART_FORM_DATA_VALUE) public String OcrTest(@RequestParam("file")MultipartFile file) throws IOException { RecognizeGeneralRequest request = new RecognizeGeneralRequest(); request.setBody(file.getInputStream()); try { RecognizeGeneralResponse response = client.recognizeGeneral(request); String json = new Gson().toJson(response.getBody()); String[] split = json.split(","); return split[1].split(":")[1].replace("\\",""); // return json; } catch (TeaException error) { Common.assertAsString(error.message); } catch (Exception e) { TeaException error = new TeaException(e.getMessage(), e); // 如有需要,请打印 error Common.assertAsString(error.message); } return null; } }
官网文档:https://cloud.tencent.com/document/product/866
官网API:https://cloud.tencent.com/document/api/866/33515
官网参数说明与在线调试:https://console.cloud.tencent.com/api/explorer?Product=ocr&Version=2018-11-19&Action=GeneralHandwritingOCR
https://console.cloud.tencent.com/cam/capi
com.tencentcloudapi tencentcloud-sdk-java 3.1.909
server: port: 9090 tencent: SecretId: #你的SecretId SecretKey: #你的SecretKey
@Setter @Configuration @ConfigurationProperties(prefix = "tencent") public class TencentOcrConfig { private String SecretId; private String SecretKey; @Bean public Credential credential(){ return new Credential(SecretId,SecretKey); } }
由于腾讯云OCR的API接收的参数如下图,所以要编写将文件转码成Base64的工具类:
public class ByteToBase64Converter { public static String encodeBytesToBase64(byte[] bytes) { return Base64.getEncoder().encodeToString(bytes); } }
@RestController @RequestMapping("/api") @Slf4j public class TencentOcrController { @Resource private Credential credential; @PostMapping(value = "/tencent",consumes = MediaType.MULTIPART_FORM_DATA_VALUE) public String TencentOCR(@RequestParam("file")MultipartFile file) throws TencentCloudSDKException, IOException { // 实例化一个http选项,可选的,没有特殊需求可以跳过 HttpProfile httpProfile = new HttpProfile(); httpProfile.setEndpoint("ocr.tencentcloudapi.com"); // 实例化一个client选项,可选的,没有特殊需求可以跳过 ClientProfile clientProfile = new ClientProfile(); clientProfile.setHttpProfile(httpProfile); // 实例化要请求产品的client对象,clientProfile是可选的 OcrClient client = new OcrClient(credential, "ap-guangzhou", clientProfile); //将上传的文件转换成base64编码 String encodeBytesToBase64 = ByteToBase64Converter.encodeBytesToBase64(file.getBytes()); // 实例化一个请求对象,每个接口都会对应一个request对象 GeneralBasicOCRRequest req = new GeneralBasicOCRRequest(); req.setImageBase64(encodeBytesToBase64); // req.setImageUrl("https://ltmyblog.oss-cn-shenzhen.aliyuncs.com/myBlog/article/image-20230924213423855.png"); // 返回的resp是一个GeneralBasicOCRResponse的实例,与请求对象对应 GeneralBasicOCRResponse resp = client.GeneralBasicOCR(req); GeneralBasicOCRResponse.toJsonString(resp); // 输出json格式的字符串回包 System.out.println(GeneralBasicOCRResponse.toJsonString(resp)); return GeneralBasicOCRResponse.toJsonString(resp); } }
本文的全部案例:https://github.com/rookiesnewbie/OCR