Arabic Nougat

This is the official demo for the Arabic Nougat models. They are end-to-end Markdown extraction models that extract text from images or PDFs and write it out as Markdown.

There are four models available:

arabic-small-nougat: A small model that is faster but less accurate (fine-tuned from facebook/nougat-small).
arabic-base-nougat: A base model that is more accurate but slower (fine-tuned from facebook/nougat-base).
arabic-large-nougat: The largest model (built from scratch with the riotu-lab/Aranizer-PBE-86k tokenizer and a larger Transformer decoder).
mobser-small-v0.1: A fine-tuned model built by the TawasulAI team to push the boundaries of what is possible.
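Below is a minimal sketch of running one of these models on a single page image outside the demo. It assumes that the checkpoints load with the standard Hugging Face transformers `NougatProcessor` and `VisionEncoderDecoderModel` classes, as the upstream facebook/nougat-* models do, and that the Hub repo ID is `MohamedRashad/arabic-small-nougat`. Both are assumptions, so check the model card before relying on them. The function name `extract_markdown` and its parameters are hypothetical.

```python
from typing import Optional

# Assumption: Hub repo ID for arabic-small-nougat; verify it on the model card.
DEFAULT_MODEL_ID = "MohamedRashad/arabic-small-nougat"


def extract_markdown(image, model_id: str = DEFAULT_MODEL_ID,
                     max_new_tokens: int = 2048,
                     device: Optional[str] = None) -> str:
    """Hypothetical helper: run a Nougat-style model on one page (a PIL image)
    and return the generated Markdown.

    Sketch only: assumes the checkpoint works with NougatProcessor and
    VisionEncoderDecoderModel, like the facebook/nougat-* models.
    """
    # Heavy imports are deferred so defining the function needs no extra deps.
    import torch
    from transformers import NougatProcessor, VisionEncoderDecoderModel

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    processor = NougatProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id).to(device)
    model.eval()

    # Resize and normalize the page into the pixel tensor the encoder expects.
    pixel_values = processor(image.convert("RGB"), return_tensors="pt").pixel_values
    pixel_values = pixel_values.to(device)

    gen_kwargs = {"max_new_tokens": max_new_tokens}
    unk_id = processor.tokenizer.unk_token_id
    if unk_id is not None:
        # Discourage the decoder from emitting the unknown token.
        gen_kwargs["bad_words_ids"] = [[unk_id]]

    with torch.no_grad():
        outputs = model.generate(pixel_values, **gen_kwargs)

    # Decode token IDs to text, then apply Nougat's Markdown post-processing.
    text = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return processor.post_process_generation(text, fix_markdown=False)
```

Usage would look like `extract_markdown(Image.open("page.png"))` after `from PIL import Image`. For a PDF, one option is to rasterize each page first (for example with pdf2image's `convert_from_path`) and call the function per page. When processing many pages, load the processor and model once and reuse them instead of reloading on every call as this sketch does.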

Disclaimer: The models can hallucinate text and are not perfect. Please double-check the output, especially when accuracy matters.

Arabic End-to-End Structured OCR for textbooks