Arabic End-to-End Structured OCR for Textbooks
This is the official demo for the Arabic Nougat models. These are end-to-end Markdown extraction models: they read text from images or PDFs and write it out as Markdown.
The following models are available:
- arabic-small-nougat: A small model that is faster but less accurate (a finetune from facebook/nougat-small).
- arabic-base-nougat: A base model that is more accurate but slower (a finetune from facebook/nougat-base).
- arabic-large-nougat: The largest of the three Arabic Nougat models (built from scratch using the riotu-lab/Aranizer-PBE-86k tokenizer and a larger transformer decoder).
- mobser-small-v0.1: A finetune built by the TawasulAI team to push the boundary of what's possible further.
Disclaimer: The models are not perfect and can hallucinate text. Please double-check the output if accuracy matters to you.
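For readers who want to try a model outside the demo, here is a minimal sketch of how one page image might be turned into Markdown. It assumes the checkpoints load with the standard Nougat classes from the Hugging Face transformers library, which holds for the upstream facebook/nougat-* models but is not confirmed here for every model in the list. The Hub namespace is not stated on this page, so it is left as a caller-supplied argument; the helper names `resolve_model_id` and `image_to_markdown` are hypothetical.

```python
# Model names as listed in this demo. The Hub namespace is NOT given in the
# text above, so the caller must supply it (assumption).
MODEL_NOTES = {
    "arabic-small-nougat": "faster, less accurate",
    "arabic-base-nougat": "more accurate, slower",
    "arabic-large-nougat": "largest of the three Arabic Nougat models",
    "mobser-small-v0.1": "finetune by the TawasulAI team",
}


def resolve_model_id(namespace: str, name: str) -> str:
    """Build a Hub-style repo id of the form '<namespace>/<name>'."""
    if name not in MODEL_NOTES:
        raise ValueError(f"unknown model: {name!r}")
    return f"{namespace}/{name}"


def image_to_markdown(image_path: str, model_id: str, max_new_tokens: int = 2048) -> str:
    """Run one page image through a Nougat-style checkpoint (hypothetical helper)."""
    # Heavy imports stay local so resolve_model_id works without them installed.
    from PIL import Image
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    processor = NougatProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # Convert the page to RGB and let the processor resize and normalize it.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Autoregressively decode Markdown tokens, then map them back to text.
    output_ids = model.generate(pixel_values, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```

A call would look like `image_to_markdown("page.png", resolve_model_id("<namespace>", "arabic-small-nougat"))`, with `<namespace>` replaced by the account that hosts the models. PDFs would first need to be rendered to page images, a step this sketch leaves out.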