[4] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine
translation by jointly learning to align and translate. CoRR,
abs/1409.0473, 2014.
[5] A. Bissacco, M. Cummins, Y. Netzer, and H. Neven. PhotoOCR:
Reading text in uncontrolled conditions. In ICCV,
2013.
[6] F. L. Bookstein. Principal warps: Thin-plate splines and the
decomposition of deformations. IEEE Trans. Pattern Anal.
Mach. Intell., 11(6):567–585, 1989.
[7] K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio.
On the properties of neural machine translation: Encoder-
decoder approaches. CoRR, abs/1409.1259, 2014.
[8] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Ben-
gio. Attention-based models for speech recognition. CoRR,
abs/1506.07503, 2015.
[9] R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A
Matlab-like environment for machine learning. In BigLearn,
NIPS Workshop, 2011.
[10] N. Dalal and B. Triggs. Histograms of oriented gradients for
human detection. In CVPR, 2005.
[11] V. Goel, A. Mishra, K. Alahari, and C. V. Jawahar. Whole is
greater than sum of parts: Recognizing scene text words. In
ICDAR, 2013.
[12] A. Gordo. Supervised mid-level features for word image rep-
resentation. In CVPR, 2015.
[13] A. Graves, A. Mohamed, and G. E. Hinton. Speech recogni-
tion with deep recurrent neural networks. In ICASSP, 2013.
[14] S. Hochreiter and J. Schmidhuber. Long short-term memory.
Neural Computation, 9(8):1735–1780, 1997.
[15] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman.
Synthetic data and artificial neural networks for natural scene
text recognition. NIPS Deep Learning Workshop, 2014.
[16] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman.
Deep structured output learning for unconstrained text recog-
nition. In ICLR, 2015.
[17] M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman.
Reading text in the wild with convolutional neural networks.
Int. J. Comput. Vision, 2015.
[18] M. Jaderberg, K. Simonyan, A. Zisserman, and
K. Kavukcuoglu. Spatial transformer networks. CoRR,
abs/1506.02025, 2015.
[19] M. Jaderberg, A. Vedaldi, and A. Zisserman. Deep features
for text spotting. In ECCV, 2014.
[20] D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda,
S. R. Mestre, J. Mas, D. F. Mota, J. Almazán, and
L. de las Heras. ICDAR 2013 robust reading competition.
In ICDAR, 2013.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet
classification with deep convolutional neural networks. In
NIPS, 2012.
[22] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-
based learning applied to document recognition. Proceed-
ings of the IEEE, 86(11):2278–2324, 1998.
[23] D. G. Lowe. Distinctive image features from scale-invariant
keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
[24] S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong,
R. Young, K. Ashida, H. Nagai, M. Okamoto, H. Yamamoto,
H. Miyao, J. Zhu, W. Ou, C. Wolf, J. Jolion, L. Todoran,
M. Worring, and X. Lin. ICDAR 2003 robust reading
competitions: entries, results, and future directions. IJDAR,
7(2-3):105–122, 2005.
[25] A. Mishra, K. Alahari, and C. V. Jawahar. Scene text recog-
nition using higher order language priors. In BMVC, 2012.
[26] G. Nagy. Twenty years of document image analysis in PAMI.
IEEE Trans. Pattern Anal. Mach. Intell., 22(1):38–62, 2000.
[27] V. Nair and G. E. Hinton. Rectified linear units improve
restricted Boltzmann machines. In ICML, 2010.
[28] L. Neumann and J. Matas. Real-time scene text localization
and recognition. In CVPR, 2012.
[29] T. Q. Phan, P. Shivakumara, S. Tian, and C. L. Tan. Recog-
nizing text with perspective distortion in natural scenes. In
ICCV, 2013.
[30] A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan.
A robust arbitrary text detection system for natural scene im-
ages. Expert Syst. Appl., 41(18):8027–8048, 2014.
[31] J. A. Rodríguez-Serrano, A. Gordo, and F. Perronnin. Label
embedding: A frugal baseline for text recognition. Int. J.
Comput. Vision, 113(3):193–207, 2015.
[32] B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural
network for image-based sequence recognition and its ap-
plication to scene text recognition. CoRR, abs/1507.05717,
2015.
[33] K. Simonyan and A. Zisserman. Very deep convolu-
tional networks for large-scale image recognition. CoRR,
abs/1409.1556, 2014.
[34] B. Su and S. Lu. Accurate scene text recognition based on
recurrent neural network. In ACCV, 2014.
[35] K. Wang, B. Babenko, and S. Belongie. End-to-end scene
text recognition. In ICCV, 2011.
[36] K. Wang and S. Belongie. Word spotting in the wild. In
ECCV, 2010.
[37] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng. End-to-end text
recognition with convolutional neural networks. In ICPR,
2012.
[38] C. Yao, X. Bai, W. Liu, Y. Ma, and Z. Tu. Detecting texts of
arbitrary orientations in natural images. In CVPR, 2012.
[39] C. Yao, X. Bai, B. Shi, and W. Liu. Strokelets: A learned
multi-scale representation for scene text recognition. In
CVPR, 2014.
[40] Q. Ye and D. S. Doermann. Text detection and recognition in
imagery: A survey. IEEE Trans. Pattern Anal. Mach. Intell.,
37(7):1480–1500, 2015.
[41] M. D. Zeiler. ADADELTA: an adaptive learning rate method.
CoRR, abs/1212.5701, 2012.
[42] Z. Zhang, A. Ganesh, X. Liang, and Y. Ma. TILT: transform
invariant low-rank textures. Int. J. Comput. Vision, 99(1):1–
24, 2012.
[43] Z. Zhang, C. Zhang, W. Shen, C. Yao, W. Liu, and X. Bai.
Multi-oriented text detection with fully convolutional net-
works. In CVPR, 2016.
[44] Y. Zhu, C. Yao, and X. Bai. Scene text detection and recog-
nition: recent advances and future trends. Frontiers of Com-
puter Science, 10(1):19–36, 2016.