Image dataset to train a deep learning model to decode Leetspeak obfuscated characters
- De Mendizabal, Iñaki Velez 1
- Vidriales, Xabier 1
- Fernandes, Vitor Basto 2
- Ezpeleta, Enaitz 1
- Méndez, José Ramón 3
- Zurutuza, Urko 1
- 1 Universidad de Mondragón/Mondragon Unibertsitatea
- 2 Instituto Universitário de Lisboa (ISCTE-IUL)
- 3 Universidade de Vigo
Publisher: Zenodo
Year of publication: 2022
Type: Dataset
Abstract
The dataset contains an image database (18,981 images) that can be used to train a deep learning model to accurately detect characters. We have successfully used it to create a model that identifies characters encoded using LeetSpeak. The original dataset can be found in the Mondragon Unibertsitatea Repository: https://gitlab.danz.eus/datasharing/ski4spam
The training dataset consists of:
- Alphabetic letters (a-z) written using different fonts and styles (regular, cursive, bold, cursive+bold)
- Handwritten letters: English handwriting from the Chars74k dataset [2], which is available at http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/.
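The abstract does not describe how the images are organised on disk. As a minimal sketch, assuming a hypothetical layout with one sub-folder per lowercase letter (for example `a/` and `b/`), the Python below indexes such a folder into (image path, class index) pairs and counts the images per letter class, a common first step before training a character classifier. The folder layout, the accepted file extensions and the helper names `index_dataset` and `class_counts` are illustrative assumptions, not part of the published dataset.

```python
from pathlib import Path
import string

# Hypothetical assumption: one sub-directory per lowercase letter (a-z),
# each holding that letter's images. The real repository layout may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}
CLASSES = list(string.ascii_lowercase)  # 26 letter classes
CLASS_TO_INDEX = {c: i for i, c in enumerate(CLASSES)}


def index_dataset(root):
    """Return a sorted list of (image_path, class_index) pairs under root."""
    samples = []
    for class_dir in sorted(Path(root).iterdir()):
        label = class_dir.name.lower()
        # Skip stray files and folders whose name is not a single letter a-z.
        if not class_dir.is_dir() or label not in CLASS_TO_INDEX:
            continue
        for image_path in sorted(class_dir.iterdir()):
            # Keep only files with a recognised image extension.
            if image_path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((image_path, CLASS_TO_INDEX[label]))
    return samples


def class_counts(samples):
    """Count how many images each letter class contributes."""
    counts = {c: 0 for c in CLASSES}
    for _, index in samples:
        counts[CLASSES[index]] += 1
    return counts
```

For instance, calling `class_counts(index_dataset(root))` on a local copy would show whether the 26 letter classes are balanced before the images are split into training and validation sets.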