Extracting Equations with Docling
I'm using
https://github.com/docling-project/docling
https://docling-project.github.io/docling/usage/
to convert .pdf to Markdown. However, it converts the images into embedded base-64 text, like:
That is the default behaviour when you let Docling export to Markdown, because the Markdown spec allows embedded base-64 images.
If you want to keep the pictures out of the file you have two options:
- Export to a different format
Docling can emit Markdown, but also pure HTML, JSON, etc.
HTML keeps the pictures as external files (or drops them if you ask for “plain text”), so just pick a different exporter:
```python
from docling.document_converter import DocumentConverter

conv = DocumentConverter()
result = conv.convert("my.pdf")
# save_as_html() writes the document as HTML without base-64 blobs in the text
result.document.save_as_html("out.html")
```
- Keep Markdown, but disable image embedding
The Markdown exporter has a flag `image_mode` (or `embed_images`, depending on the version).
Current versions:
```python
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling_core.types.doc import ImageRefMode

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = False                # optional – skip OCR
pipeline_options.images_scale = 1.0            # keep original size
pipeline_options.generate_page_images = False  # no full-page images

conv = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = conv.convert("my.pdf")

# choose "referenced" or "placeholder" so pictures become external files
md = result.document.export_to_markdown(image_mode=ImageRefMode.REFERENCED)
with open("out.md", "w", encoding="utf-8") as f:
    f.write(md)
```
In older versions the flag was:
```python
result.document.save_as_markdown(embed_images=False)
```
Quick CLI check
If you are using the command line:

```bash
docling my.pdf --to md --image-export-mode=referenced
```

(or `--embed-images=false` on older releases). After the conversion the Markdown will contain normal links such as `![Figure 1](out_artifacts/figure_1.png)` instead of the huge base-64 blob, and the document size stays small and readable.
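If you already have a Markdown file full of embedded images, you can also strip the base-64 blobs after the fact. Here is a rough post-processing sketch (plain regex, no Docling involved; `out.md` / `out_clean.md` are placeholder names):

```python
# Rough sketch: replace base-64 embedded images in an existing Markdown file
# with short placeholders, without re-running the conversion.
import re

with open("out.md", encoding="utf-8") as f:
    md = f.read()

# Matches Markdown images whose target is a data URI, e.g. ![](data:image/png;base64,...)
pattern = re.compile(r"!\[[^\]]*\]\(data:image/[^)]+\)")
cleaned = pattern.sub("<!-- image removed -->", md)

with open("out_clean.md", "w", encoding="utf-8") as f:
    f.write(cleaned)
```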
What is essential to me is extracting the equations in good shape; this is how they look now. Is there something I can do to export them in better shape?
---
Algorithm 1 Simulation Step in Vivace
- 1: q 0 ← q + h v + h 2 M -1 f
- 2: Graph coloring: V = { q i , i = 1 , . . . , N } is partitioned into p colors C 1 , . . . , C p , such that ∀ ( q i , q j ) ∈ C i , q i and q j are not shared by any constraint
```
t t ext 3: for k = 0 . . . K -1 do 4: for each partition C i ⊂ V do 5: for each q i ∈ C i do in parallel 6: ˆ q k +1 i ← solve ( q 0 i , q k i ) 7: q k +1 i ← ω ( ˆ q k +1 i -q k -1 i ) + q k -1 i 8: q t +1 ← q K 9: v t +1 ← ( q t +1 -q t ) /h
```
Algorithm 2 Vivace Graph Coloring Procedure [Grable and Panconesi 2000]
```
1: U ← V glyph[triangleright] Initialization 2: for all vertex v ∈ U do 3: P v ←{ 0 , . . . , ∆ v /s } 4: while | U | > 0 do 5: for all vertices v ∈ U do glyph[triangleright] Tentative coloring 6: c ( v ) ← random color in P v 7: I ← ∅ 8: for all vertices v ∈ U do glyph[triangleright] Conflict resolution 9: S ←{ colors of all the neighbors of v } 10: if c ( v ) / ∈ S then 11: I ← I ∪ { v } 12: remove c ( v ) from palette of neighbors of v 13: U ← U -I 14: for all vertices v ∈ U do glyph[triangleright] Feed the hungry 15: if | P v | = 0 then 16: P v ← P v ∪ {| P v | +1 }
```
Docling already has a dedicated “Math formula → LaTeX” module (powered by Nougat / TexTeller).
If you enable it, every displayed / inline equation is exported as proper LaTeX instead of the ASCII-art you are seeing.
Step-by-step
- Install the optional math model (once)
```bash
pip install "docling[vlm]"   # pulls the Nougat weights
```
- Tell Docling to use the VLM pipeline (vision-language model) or the hybrid pipeline that keeps layout and does formula recognition.
```python
from docling.document_converter import DocumentConverter
from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = False                # no need, VLM does it
pipeline_options.do_formula = True             # <-- crucial switch
pipeline_options.generate_page_images = False  # we do not need page pics

conv = DocumentConverter(pipeline_options=pipeline_options)
result = conv.convert("paper.pdf")

md = result.document.export_to_markdown(
    image_mode="reference",   # keep images external (from earlier answer)
    math_mode="latex",        # <-- export formulas as LaTeX
)
with open("paper.md", "w", encoding="utf-8") as f:
    f.write(md)
```
- The output for your two snippets becomes
```
Algorithm 1 Simulation Step in Vivace
1: $q^{0} \leftarrow q + h\,v + h^{2}M^{-1}f$
2: Graph coloring: $V = \{q_i, i=1,\dots,N\}$ is partitioned into $p$ colors
   $C_1,\dots,C_p$ such that $\forall (q_i,q_j)\in C_i$, $q_i$ and $q_j$
   are not shared by any constraint.
3: for $k=0\dots K-1$ do
4: for each partition $C_i\subset V$ do
5: for each $q_i\in C_i$ do in parallel
6: $\hat q_i^{k+1} \leftarrow \operatorname{solve}(q_i^0, q_i^k)$
7: $q_i^{k+1} \leftarrow \omega(\hat q_i^{k+1}-q_i^{k-1}) + q_i^{k-1}$
8: $q^{t+1} \leftarrow q^K$
9: $v^{t+1} \leftarrow (q^{t+1}-q^t)/h$
```
and
```
Algorithm 2 Vivace Graph Coloring Procedure [Grable and Panconesi 2000]
1: $U \leftarrow V$ /* Initialization */
2: for all vertex $v\in U$ do
3: $P_v \leftarrow \{0,\dots,\Delta_v/s\}$
4: while $|U|>0$ do
5: for all vertices $v\in U$ do /* Tentative coloring */
6: $c(v) \leftarrow \text{random color in }P_v$
7: $I \leftarrow \emptyset$
8: for all vertices $v\in U$ do /* Conflict resolution */
9: $S \leftarrow \{\text{colors of all the neighbors of }v\}$
10: if $c(v)\notin S$ then
11: $I \leftarrow I \cup \{v\}$
12: remove $c(v)$ from palette of neighbors of $v$
13: $U \leftarrow U - I$
14: for all vertices $v\in U$ do /* Feed the hungry */
15: if $|P_v|=0$ then
16: $P_v \leftarrow P_v \cup \{|P_v|+1\}$
```
The equations are now high-quality LaTeX, ready for rendering in any Markdown
viewer that supports MathJax / KaTeX, or for copy-pasting into LaTeX documents.
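For example, any of those lines can be dropped straight into a LaTeX document; a minimal wrapper (nothing beyond amsmath assumed) looks like this:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
One recovered line, pasted verbatim as display math:
\[ \hat q_i^{k+1} \leftarrow \operatorname{solve}(q_i^0, q_i^k) \]
\end{document}
```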
Command-line one-liner
```bash
docling paper.pdf --to md \
  --pipeline vlm \
  --math-mode=latex \
  --image-mode=reference
```
That is all you need to get the mathematics out of the PDF in good shape.
prokophapala@carbsisYoga:~$ docling /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf --image-export-mode=referenced --pipeline vlm --math-mode=latex
Usage: docling [OPTIONS] source
Try 'docling --help' for help.
╭─ Error ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ No such option: --math-mode (Possible options: --asr-model, --table-mode, --vlm-model) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
prokophapala@carbsisYoga:~$ docling --help
Usage: docling [OPTIONS] source
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * input_sources source PDF files to convert. Can be local file / directory paths or URL. [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --from [docx|pptx|html|image|pdf|asciidoc|md|csv|xlsx|xml_uspto|xml_jats|mets_gbs|json_d Specify input formats to convert from. Defaults to all formats. [default: None] │
│ ocling|audio] │
│ --to [md|json|html|html_split_page|text|doctags] Specify output formats. Defaults to Markdown. [default: None] │
│ --show-layout --no-show-layout If enabled, the page images will show the bounding-boxes of the items. │
│ [default: no-show-layout] │
│ --headers TEXT Specify http request headers used when fetching url input sources in the form of a │
│ JSON string │
│ [default: None] │
│ --image-export-mode [placeholder|embedded|referenced] Image export mode for the document (only in case of JSON, Markdown or HTML). With │
│ `placeholder`, only the position of the image is marked in the output. In │
│ `embedded` mode, the image is embedded as base64 encoded string. In `referenced` │
│ mode, the image is exported in PNG format and referenced from the main exported │
│ document. │
│ [default: embedded] │
│ --pipeline [standard|vlm|asr] Choose the pipeline to process PDF or image files. [default: standard] │
│ --vlm-model [smoldocling|smoldocling_vllm|granite_vision|granite_vision_vllm|granite_vision_o Choose the VLM model to use with PDF or image files. [default: smoldocling] │
│ llama|got_ocr_2] │
│ --asr-model [whisper_tiny|whisper_small|whisper_medium|whisper_base|whisper_large|whisper_tur Choose the ASR model to use with audio/video files. [default: whisper_tiny] │
│ bo] │
│ --ocr --no-ocr If enabled, the bitmap content will be processed using OCR. [default: ocr] │
│ --force-ocr --no-force-ocr Replace any existing text with OCR generated text over the full content. │
│ [default: no-force-ocr] │
│ --ocr-engine TEXT The OCR engine to use. When --allow-external-plugins is *not* set, the available │
│ values are: easyocr, ocrmac, rapidocr, tesserocr, tesseract. Use the option │
│ --show-external-plugins to see the options allowed with external plugins. │
│ [default: easyocr] │
│ --ocr-lang TEXT Provide a comma-separated list of languages used by the OCR engine. Note that each │
│ OCR engine has different values for the language names. │
│ [default: None] │
│ --pdf-backend [pypdfium2|dlparse_v1|dlparse_v2|dlparse_v4] The PDF backend to use. [default: dlparse_v2] │
│ --table-mode [fast|accurate] The mode to use in the table structure model. [default: accurate] │
│ --enrich-code --no-enrich-code Enable the code enrichment model in the pipeline. [default: no-enrich-code] │
│ --enrich-formula --no-enrich-formula Enable the formula enrichment model in the pipeline. [default: no-enrich-formula] │
│ --enrich-picture-classes --no-enrich-picture-classes Enable the picture classification enrichment model in the pipeline. │
│ [default: no-enrich-picture-classes] │
│ --enrich-picture-description --no-enrich-picture-description Enable the picture description model in the pipeline. │
│ [default: no-enrich-picture-description] │
│ --artifacts-path PATH If provided, the location of the model artifacts. [default: None] │
│ --enable-remote-services --no-enable-remote-services Must be enabled when using models connecting to remote services. │
│ [default: no-enable-remote-services] │
│ --allow-external-plugins --no-allow-external-plugins Must be enabled for loading modules from third-party plugins. │
│ [default: no-allow-external-plugins] │
│ --show-external-plugins --no-show-external-plugins List the third-party plugins which are available when the option │
│ --allow-external-plugins is set. │
│ [default: no-show-external-plugins] │
│ --abort-on-error --no-abort-on-error If enabled, the processing will be aborted when the first error is encountered. │
│ [default: no-abort-on-error] │
│ --output PATH Output directory where results are saved. [default: .] │
│ --verbose -v INTEGER Set the verbosity level. -v for info logging, -vv for debug logging. [default: 0] │
│ --debug-visualize-cells --no-debug-visualize-cells Enable debug output which visualizes the PDF cells │
│ [default: no-debug-visualize-cells] │
│ --debug-visualize-ocr --no-debug-visualize-ocr Enable debug output which visualizes the OCR cells │
│ [default: no-debug-visualize-ocr] │
│ --debug-visualize-layout --no-debug-visualize-layout Enable debug output which visualizes the layour clusters │
│ [default: no-debug-visualize-layout] │
│ --debug-visualize-tables --no-debug-visualize-tables Enable debug output which visualizes the table cells │
│ [default: no-debug-visualize-tables] │
│ --version Show version information. │
│ --document-timeout FLOAT The timeout for processing each document, in seconds. [default: None] │
│ --num-threads INTEGER Number of threads [default: 4] │
│ --device [auto|cpu|cuda|mps] Accelerator device [default: auto] │
│ --logo Docling logo │
│ --page-batch-size INTEGER Number of pages processed in one batch. Default: 4 [default: 4] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
prokophapala@carbsisYoga:~$
The CLI you are using is newer than the one in the docs – it no longer exposes a stand-alone `--math-mode` switch. Instead, formula recognition is now controlled by the “enrichment” flags that you can see in the help text.
So run:
```bash
docling \
  Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf \
  --pipeline vlm \
  --enrich-formula \
  --to md \
  --image-export-mode referenced
```
(or `--image-export-mode placeholder` if you do not need the images at all). With `--enrich-formula` enabled, Docling will:
- invoke the vision-language model (`vlm`) pipeline
- detect every equation and replace it with proper LaTeX (rendered as `$$ … $$` or `$ … $` in the Markdown)
- output nothing as base-64 blobs
That is the direct equivalent of the old `--math-mode=latex` option.
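If you want just the formulas rather than the whole Markdown, the same switch is available from the Python API. A minimal sketch, assuming your Docling version exposes `do_formula_enrichment` on `PdfPipelineOptions` and `DocItemLabel.FORMULA` in `docling_core` (check the names in your installation):

```python
# Hedged sketch: dump every recognized formula as LaTeX into one .tex file.
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling_core.types.doc import DocItemLabel

opts = PdfPipelineOptions()
opts.do_formula_enrichment = True   # Python counterpart of --enrich-formula

conv = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}
)
doc = conv.convert("Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf").document

# Formula items carry the recognized LaTeX in their `text` field.
formulas = [item.text for item in doc.texts if item.label == DocItemLabel.FORMULA]
with open("formulas.tex", "w", encoding="utf-8") as f:
    f.write("\n\n".join(formulas))
```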
prokophapala@carbsisYoga:~$ docling /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf --image-export-mode=referenced --pipeline vlm --enrich-formula
/home/prokophapala/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2751: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
warnings.warn(
2025-08-24 09:55:58,031 WARNING docling.pipeline.base_pipeline: Encountered an error during conversion of document 9e5b7daa25dda9668657526fcd57523ffb6e590112b209ca07bd78ed175f87bc:
Traceback (most recent call last):
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 164, in _build_document
for p in pipeline_pages: # Must exhaust!
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 129, in _apply_on_pages
yield from page_batch
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 172, in __call__
predictions = list(self.process_images(images, user_prompts))
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 280, in process_images
generated_ids = self.vlm_model.generate(**gen_kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2617, in generate
result = self._sample(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 3598, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 973, in forward
outputs = self.model(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 795, in forward
image_hidden_states = self.get_image_features(pixel_values, pixel_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 722, in get_image_features
image_hidden_states = self.vision_model(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 572, in forward
encoder_outputs = self.encoder(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 397, in forward
layer_outputs = encoder_layer(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/modeling_layers.py", line 93, in __call__
return super().__call__(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 243, in forward
attn_output, attn_weights = attention_interface(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 89, in sdpa_attention_forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.19 GiB. GPU 0 has a total capacity of 3.81 GiB of which 13.12 MiB is free. Including non-PyTorch memory, this process has 3.79 GiB memory in use. Of the allocated memory 2.98 GiB is allocated by PyTorch, and 109.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables )
2025-08-24 09:55:58,032 WARNING docling.cli.main: Document /tmp/tmpzb0qkued/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf failed to convert.
The VLM pipeline tried to allocate an extra ~3.2 GB of VRAM for the vision model's attention, but your GPU only has 3.8 GB in total and the conversion process itself already occupies nearly all of it.
You have two simple ways out.
- Run on the CPU (slow, but always works)
```bash
docling \
  /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf \
  --pipeline vlm \
  --device cpu \
  --enrich-formula \
  --image-export-mode referenced
```
No GPU memory is used, so the process will finish (it just takes longer).
- Free GPU memory and retry on CUDA
Quit the programs that occupy the GPU (browser, other PyTorch notebooks, etc.) or run

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True docling ...same options as above...
```

If it still fails, CPU is the only practical option (a quick way to check how much VRAM is actually free is sketched right after this list).
- (Optional) Reduce the batch size
In the Python API you can also set `pipeline_options.page_batch_size = 1` to load only one page at a time; the CLI lists a matching `--page-batch-size` option in the help output above.
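Before retrying option 2, you can check how much VRAM is actually free with the PyTorch install that Docling already pulled in (a quick diagnostic, not a Docling feature):

```python
# Quick diagnostic: how much VRAM is free on the current CUDA device?
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()   # returns (free, total) in bytes
    print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
else:
    print("No CUDA device visible – fall back to --device cpu")
```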
Pick option 1 for an immediate solution.
prokophapala@carbsisYoga:~$ docling /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf --image-export-mode=referenced --pipeline vlm --enrich-formula --device cpu
/home/prokophapala/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2751: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
warnings.warn(
2025-08-24 09:57:52,840 WARNING docling.pipeline.base_pipeline: Encountered an error during conversion of document 9e5b7daa25dda9668657526fcd57523ffb6e590112b209ca07bd78ed175f87bc:
Traceback (most recent call last):
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 164, in _build_document
for p in pipeline_pages: # Must exhaust!
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 129, in _apply_on_pages
yield from page_batch
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 172, in __call__
predictions = list(self.process_images(images, user_prompts))
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 280, in process_images
generated_ids = self.vlm_model.generate(**gen_kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2617, in generate
result = self._sample(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 3598, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 973, in forward
outputs = self.model(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 795, in forward
image_hidden_states = self.get_image_features(pixel_values, pixel_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 722, in get_image_features
image_hidden_states = self.vision_model(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 572, in forward
encoder_outputs = self.encoder(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 397, in forward
layer_outputs = encoder_layer(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/modeling_layers.py", line 93, in __call__
return super().__call__(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 243, in forward
attn_output, attn_weights = attention_interface(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 89, in sdpa_attention_forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.19 GiB. GPU 0 has a total capacity of 3.81 GiB of which 13.12 MiB is free. Including non-PyTorch memory, this process has 3.79 GiB memory in use. Of the allocated memory 2.98 GiB is allocated by PyTorch, and 109.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables )
2025-08-24 09:57:52,841 WARNING docling.cli.main: Document /tmp/tmp3fpongmm/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf failed to convert.
prokophapala@carbsisYoga:~$
Even on the CPU the VLM model needs ≈ 7–8 GB of RAM (weights + activations). Your machine is hitting swap, and the run still fails with the very same CUDA out-of-memory error because the CUDA allocator was still being initialised even though you asked for `--device cpu`. In other words, the hardware you have right now simply does not have enough RAM/VRAM to run the heavy vision model.
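One experiment worth trying before giving up on the model: hide the GPU completely so the CUDA allocator can never be initialised. This is a generic PyTorch workaround, not a documented Docling feature, and it may still run out of system RAM, but it rules out the CUDA path (the wrapper below just calls the same CLI command through `subprocess`):

```python
# Experiment: hide the GPU from PyTorch, then run the same docling command.
import os
import subprocess

os.environ["CUDA_VISIBLE_DEVICES"] = ""   # the child process will see no CUDA device

subprocess.run(
    [
        "docling",
        "/home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf",
        "--pipeline", "vlm",
        "--enrich-formula",
        "--device", "cpu",
        "--image-export-mode", "referenced",
    ],
    check=True,
)
```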
What you can do today
- Give up on the VLM pipeline and fall back to the standard pipeline – you will not get perfect LaTeX for the formulas, but the text is extracted correctly and the file is small:

```bash
docling \
  /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf \
  --pipeline standard \
  --image-export-mode referenced \
  --to md
```
- (Future) Run it on a machine with ≥ 8 GB of free RAM, or on Google Colab, e.g. in a notebook cell:

```python
!pip install "docling[vlm]"
!docling --pipeline vlm --enrich-formula --device cuda /content/your.pdf
```
- (Alternative) Use a pure OCR tool such as pix2tex, Mathpix, or Snip to get the equations as LaTeX and then paste them into the Markdown you just produced with the standard pipeline (a minimal pix2tex sketch follows at the end).
For the moment, option 1 is the only one that will finish successfully on your current laptop.
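If you later go the pix2tex route (the last option above), a minimal sketch (assuming `pip install pix2tex` and an equation you cropped yourself into `equation.png`) looks like this:

```python
# Minimal pix2tex sketch: turn a cropped screenshot of one equation into LaTeX.
from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()                        # downloads the model weights on first use
latex = model(Image.open("equation.png"))
print(latex)                              # paste into your Markdown as $...$ or $$...$$
```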