NoteOnMe Blog Platform Setup


Multilingual Text Recognition

Introduction

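Text localization uses a CTPN network, which builds on Faster R-CNN; the original network is modified to support multilingual recognition. The network predicts the offsets between text lines and anchors. VGG16 extracts the features, a sliding window over the feature map predicts the offset from each anchor, and the result is fed into a bidirectional LSTM to obtain sequence features. Because text-line lengths vary widely, the model predicts only the anchor height, and the resulting small-scale text boxes are then linked iteratively into full text lines.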
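The height-only regression and the final box-linking step can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names are hypothetical, the vertical parameterization follows the usual CTPN formulation, and the greedy left-to-right chaining with `max_gap=50` and `min_overlap=0.7` is a simplified stand-in for the linking step with assumed thresholds.

```python
import math

def encode_vertical(box_cy, box_h, anchor_cy, anchor_h):
    """Regression targets for one fixed-width anchor (CTPN-style, assumed)."""
    v_c = (box_cy - anchor_cy) / anchor_h   # relative shift of the center
    v_h = math.log(box_h / anchor_h)        # log ratio of the heights
    return v_c, v_h

def decode_vertical(v_c, v_h, anchor_cy, anchor_h):
    """Invert encode_vertical: recover center y and height from the offsets."""
    return v_c * anchor_h + anchor_cy, math.exp(v_h) * anchor_h

def vertical_overlap(a, b):
    """Shared y-range of two (x1, y1, x2, y2) boxes over the smaller height."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return inter / min(a[3] - a[1], b[3] - b[1])

def link_boxes(boxes, max_gap=50, min_overlap=0.7):
    """Greedily chain narrow boxes, left to right, into text-line boxes."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for line in lines:
            last = line[-1]
            close = box[0] - last[2] <= max_gap
            if close and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            lines.append([box])  # no compatible line found: start a new one
    # Each text line is the bounding box of its chained pieces.
    return [(min(b[0] for b in l), min(b[1] for b in l),
             max(b[2] for b in l), max(b[3] for b in l)) for l in lines]

if __name__ == "__main__":
    pieces = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 29),
              (200, 100, 216, 120)]
    print(link_boxes(pieces))
```

Keeping the anchor width fixed means the network only has to regress two vertical quantities per anchor, and the line length is recovered afterwards by chaining neighbours.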

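The text recognition network is a seq2seq model with an attention mechanism. The encoder is a CNN, chosen for its parallel speed, and uses positional embeddings to represent position information; the decoder is an LSTM. Attention is introduced because recognizing mathematical formulas involves long-range dependencies. The training data combines photographed images with images produced by the research group that mix LaTeX formulas and text. Image augmentation is applied during training to improve generalization.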
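To make the roles of the positional embedding and the attention step concrete, here is a small pure-Python sketch. It assumes sinusoidal position vectors and plain dot-product attention; the README does not say which variants the model uses, so both choices, and the names `positional_encoding` and `attend`, are illustrative assumptions.

```python
import math

def positional_encoding(length, dim):
    """Sinusoidal position vectors: sin on even indices, cos on odd ones."""
    table = []
    for pos in range(length):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Dot-product attention: return (weights, context vector)."""
    scores = [sum(q * k for q, k in zip(query, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

if __name__ == "__main__":
    # Toy CNN features for three image positions, plus position information.
    features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    pe = positional_encoding(len(features), 4)
    states = [[f + p for f, p in zip(fr, pr)] for fr, pr in zip(features, pe)]
    # The query stands in for the decoder LSTM's hidden state at one step.
    weights, context = attend([0.0, 5.0, 0.0, 0.0], states)
    print(weights, context)
```

At every decoding step the decoder can weight any encoder position directly, which is what helps with symbols that depend on distant parts of a formula.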

Structure

Text line localization

main folder, net folder. Weights: checkpoint_mlt

Text recognition

model folder. Weights: results/full

Data generation scripts: generate_data folder

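train: trains the text recognition network

evaluate_txt: evaluates the text recognition network

Main/train: trains the text line localization network

predict: single-line prediction

demo2: demonstrates the two networks working together; takes a multi-line image as input and predicts its text

Classification: per-image language classification (not used in the final model)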

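Data sources:
1. Self-generated line-level LaTeX and line-level English data, used to train the text recognition network.

   LaTeX text comes from arXiv papers: http://www.cs.cornell.edu/projects/kddcup/datasets.html

   English text comes from the Corpus of Contemporary American English (COCA).

   The processed text data used in the experiments is in data2 and data3.

2. Self-generated image-level dataset, used to train the text box detection network.

   The English and LaTeX text sources are the same as above.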

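Text recognition results

| Model | BLEU-4 | Inverse Edit | Perplexity | Exact Match |
| --- | --- | --- | --- | --- |
| Seq2seq (mixed) | 86.36 | 88.69 | -1.44 | 36.20 |
| Seq2seq-Latex | 90.10 | 84.12 | -1.32 | 37.21 |
| Seq2seq-English | 97.2 | 97.22 | -1.05 | 88.54 |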
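
For readers unfamiliar with the Exact Match and Inverse Edit columns, the sketch below shows one common way to compute them over tokenized predictions. The definitions are assumptions: the README does not state the formulas, and the actual evaluate_txt script may differ. Here Inverse Edit is taken as one minus the total Levenshtein distance divided by the total of the longer sequence lengths, expressed as a percentage.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def exact_match(refs, hyps):
    """Percentage of predictions identical to their reference."""
    return 100.0 * sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def inverse_edit(refs, hyps):
    """100 * (1 - total edit distance / total max length), assumed definition."""
    total_dist = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_len = sum(max(len(r), len(h)) for r, h in zip(refs, hyps))
    return 100.0 * (1.0 - total_dist / total_len)

if __name__ == "__main__":
    refs = [["x", "^", "2"], ["a", "+", "b"]]
    hyps = [["x", "^", "2"], ["a", "-", "b"]]
    print(exact_match(refs, hyps), inverse_edit(refs, hyps))
```

Under these definitions a single wrong token makes a line fail Exact Match while barely moving Inverse Edit, which is consistent with the large gap between the two columns for the LaTeX rows.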