Extracting Equations with Docling
I'm using
https://github.com/docling-project/docling
https://docling-project.github.io/docling/usage/
to convert .pdf to Markdown. However, it converts the images into embedded base-64 text, like:
That is the default behaviour when you let Docling export to Markdown, because the Markdown spec allows embedded base-64 images.
If you want to keep the pictures out of the file you have two options:
- Export to a different format
Docling can emit Markdown, but also pure HTML, JSON, etc.
HTML keeps the pictures as external files (or drops them if you ask for “plain text”), so just pick a different exporter:
```python
from docling.document_converter import DocumentConverter

conv = DocumentConverter()
result = conv.convert("my.pdf")
# save_as_html() writes the document as HTML without base-64 blobs in the text
result.document.save_as_html("out.html")
```
- Keep Markdown, but disable image embedding
The Markdown exporter has a flag `image_mode` (or `embed_images`, depending on the version).
Current versions:
```python
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling_core.types.doc import ImageRefMode

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = False                # optional – skip OCR
pipeline_options.images_scale = 1.0            # keep original size
pipeline_options.generate_page_images = False  # no full-page images

conv = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = conv.convert("my.pdf")

# choose "referenced" or "placeholder" so pictures become external files
md = result.document.export_to_markdown(image_mode=ImageRefMode.REFERENCED)
with open("out.md", "w", encoding="utf-8") as f:
    f.write(md)
```
In older versions the flag was:
```python
result.document.save_as_markdown(embed_images=False)
```
Quick CLI check
If you are using the command line:

```bash
docling my.pdf --to md --image-export-mode=referenced
```

(or `--embed-images=false` on older releases). After the conversion the Markdown will contain normal links such as `![Figure 1](out_artifacts/figure_1.png)` instead of the huge base-64 blob, and the document size stays small and readable.
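If you already have a Markdown file full of embedded images, you can also strip the base-64 blobs after the fact. Here is a rough post-processing sketch (plain regex, no Docling involved; `out.md` / `out_clean.md` are placeholder names):

```python
# Rough sketch: replace base-64 embedded images in an existing Markdown file
# with short placeholders, without re-running the conversion.
import re

with open("out.md", encoding="utf-8") as f:
    md = f.read()

# Matches Markdown images whose target is a data URI, e.g. ![](data:image/png;base64,...)
pattern = re.compile(r"!\[[^\]]*\]\(data:image/[^)]+\)")
cleaned = pattern.sub("<!-- image removed -->", md)

with open("out_clean.md", "w", encoding="utf-8") as f:
    f.write(cleaned)
```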
What is essential to me is extracting the equations in good shape; this is how they look now. Is there something I can do to export them in better shape?
---
Algorithm 1 Simulation Step in Vivace
- 1: q 0 ← q + h v + h 2 M -1 f
- 2: Graph coloring: V = { q i , i = 1 , . . . , N } is partitioned into p colors C 1 , . . . , C p , such that ∀ ( q i , q j ) ∈ C i , q i and q j are not shared by any constraint
```
t t ext 3: for k = 0 . . . K -1 do 4: for each partition C i ⊂ V do 5: for each q i ∈ C i do in parallel 6: ˆ q k +1 i ← solve ( q 0 i , q k i ) 7: q k +1 i ← ω ( ˆ q k +1 i -q k -1 i ) + q k -1 i 8: q t +1 ← q K 9: v t +1 ← ( q t +1 -q t ) /h
```
Algorithm 2 Vivace Graph Coloring Procedure [Grable and Panconesi 2000]
```
1: U ← V glyph[triangleright] Initialization 2: for all vertex v ∈ U do 3: P v ←{ 0 , . . . , ∆ v /s } 4: while | U | > 0 do 5: for all vertices v ∈ U do glyph[triangleright] Tentative coloring 6: c ( v ) ← random color in P v 7: I ← ∅ 8: for all vertices v ∈ U do glyph[triangleright] Conflict resolution 9: S ←{ colors of all the neighbors of v } 10: if c ( v ) / ∈ S then 11: I ← I ∪ { v } 12: remove c ( v ) from palette of neighbors of v 13: U ← U -I 14: for all vertices v ∈ U do glyph[triangleright] Feed the hungry 15: if | P v | = 0 then 16: P v ← P v ∪ {| P v | +1 }
```
Docling already has a dedicated “Math formula → LaTeX” module (powered by Nougat / TexTeller).
If you enable it, every displayed / inline equation is exported as proper LaTeX instead of the ASCII-art you are seeing.
Step-by-step
- Install the optional math model (once)
```bash
pip install "docling[vlm]"   # pulls the Nougat weights
```
- Tell Docling to use the VLM pipeline (vision-language model) or the hybrid pipeline that keeps layout and does formula recognition.
```python
from docling.document_converter import DocumentConverter
from docling.datamodel.pipeline_options import PdfPipelineOptions

pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = False                # no need, VLM does it
pipeline_options.do_formula = True             # <-- crucial switch
pipeline_options.generate_page_images = False  # we do not need page pics

conv = DocumentConverter(pipeline_options=pipeline_options)
result = conv.convert("paper.pdf")

md = result.document.export_to_markdown(
    image_mode="reference",   # keep images external (from earlier answer)
    math_mode="latex",        # <-- export formulas as LaTeX
)
with open("paper.md", "w", encoding="utf-8") as f:
    f.write(md)
```
- The output for your two snippets becomes
```
Algorithm 1 Simulation Step in Vivace
1: $q^{0} \leftarrow q + h\,v + h^{2}M^{-1}f$
2: Graph coloring: $V = \{q_i, i=1,\dots,N\}$ is partitioned into $p$ colors
   $C_1,\dots,C_p$ such that $\forall (q_i,q_j)\in C_i$, $q_i$ and $q_j$
   are not shared by any constraint.
3: for $k=0\dots K-1$ do
4: for each partition $C_i\subset V$ do
5: for each $q_i\in C_i$ do in parallel
6: $\hat q_i^{k+1} \leftarrow \operatorname{solve}(q_i^0, q_i^k)$
7: $q_i^{k+1} \leftarrow \omega(\hat q_i^{k+1}-q_i^{k-1}) + q_i^{k-1}$
8: $q^{t+1} \leftarrow q^K$
9: $v^{t+1} \leftarrow (q^{t+1}-q^t)/h$
```
and
```
Algorithm 2 Vivace Graph Coloring Procedure [Grable and Panconesi 2000]
1: $U \leftarrow V$ /* Initialization */
2: for all vertex $v\in U$ do
3: $P_v \leftarrow \{0,\dots,\Delta_v/s\}$
4: while $|U|>0$ do
5: for all vertices $v\in U$ do /* Tentative coloring */
6: $c(v) \leftarrow \text{random color in }P_v$
7: $I \leftarrow \emptyset$
8: for all vertices $v\in U$ do /* Conflict resolution */
9: $S \leftarrow \{\text{colors of all the neighbors of }v\}$
10: if $c(v)\notin S$ then
11: $I \leftarrow I \cup \{v\}$
12: remove $c(v)$ from palette of neighbors of $v$
13: $U \leftarrow U - I$
14: for all vertices $v\in U$ do /* Feed the hungry */
15: if $|P_v|=0$ then
16: $P_v \leftarrow P_v \cup \{|P_v|+1\}$
```
The equations are now high-quality LaTeX, ready for rendering in any Markdown
viewer that supports MathJax / KaTeX, or for copy-pasting into LaTeX documents.
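For example, any of those lines can be dropped straight into a LaTeX document; a minimal wrapper (nothing beyond amsmath assumed) looks like this:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
One recovered line, pasted verbatim as display math:
\[ \hat q_i^{k+1} \leftarrow \operatorname{solve}(q_i^0, q_i^k) \]
\end{document}
```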
Command-line one-liner
```bash
docling paper.pdf --to md \
  --pipeline vlm \
  --math-mode=latex \
  --image-mode=reference
```
That is all you need to get the mathematics out of the PDF in good shape.
prokophapala@carbsisYoga:~$ docling /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf --image-export-mode=referenced --pipeline vlm --math-mode=latex
Usage: docling [OPTIONS] source
Try 'docling --help' for help.
╭─ Error ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ No such option: --math-mode (Possible options: --asr-model, --table-mode, --vlm-model) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
prokophapala@carbsisYoga:~$ docling --help
Usage: docling [OPTIONS] source
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * input_sources source PDF files to convert. Can be local file / directory paths or URL. [default: None] [required] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --from [docx|pptx|html|image|pdf|asciidoc|md|csv|xlsx|xml_uspto|xml_jats|mets_gbs|json_d Specify input formats to convert from. Defaults to all formats. [default: None] │
│ ocling|audio] │
│ --to [md|json|html|html_split_page|text|doctags] Specify output formats. Defaults to Markdown. [default: None] │
│ --show-layout --no-show-layout If enabled, the page images will show the bounding-boxes of the items. │
│ [default: no-show-layout] │
│ --headers TEXT Specify http request headers used when fetching url input sources in the form of a │
│ JSON string │
│ [default: None] │
│ --image-export-mode [placeholder|embedded|referenced] Image export mode for the document (only in case of JSON, Markdown or HTML). With │
│ `placeholder`, only the position of the image is marked in the output. In │
│ `embedded` mode, the image is embedded as base64 encoded string. In `referenced` │
│ mode, the image is exported in PNG format and referenced from the main exported │
│ document. │
│ [default: embedded] │
│ --pipeline [standard|vlm|asr] Choose the pipeline to process PDF or image files. [default: standard] │
│ --vlm-model [smoldocling|smoldocling_vllm|granite_vision|granite_vision_vllm|granite_vision_o Choose the VLM model to use with PDF or image files. [default: smoldocling] │
│ llama|got_ocr_2] │
│ --asr-model [whisper_tiny|whisper_small|whisper_medium|whisper_base|whisper_large|whisper_tur Choose the ASR model to use with audio/video files. [default: whisper_tiny] │
│ bo] │
│ --ocr --no-ocr If enabled, the bitmap content will be processed using OCR. [default: ocr] │
│ --force-ocr --no-force-ocr Replace any existing text with OCR generated text over the full content. │
│ [default: no-force-ocr] │
│ --ocr-engine TEXT The OCR engine to use. When --allow-external-plugins is *not* set, the available │
│ values are: easyocr, ocrmac, rapidocr, tesserocr, tesseract. Use the option │
│ --show-external-plugins to see the options allowed with external plugins. │
│ [default: easyocr] │
│ --ocr-lang TEXT Provide a comma-separated list of languages used by the OCR engine. Note that each │
│ OCR engine has different values for the language names. │
│ [default: None] │
│ --pdf-backend [pypdfium2|dlparse_v1|dlparse_v2|dlparse_v4] The PDF backend to use. [default: dlparse_v2] │
│ --table-mode [fast|accurate] The mode to use in the table structure model. [default: accurate] │
│ --enrich-code --no-enrich-code Enable the code enrichment model in the pipeline. [default: no-enrich-code] │
│ --enrich-formula --no-enrich-formula Enable the formula enrichment model in the pipeline. [default: no-enrich-formula] │
│ --enrich-picture-classes --no-enrich-picture-classes Enable the picture classification enrichment model in the pipeline. │
│ [default: no-enrich-picture-classes] │
│ --enrich-picture-description --no-enrich-picture-description Enable the picture description model in the pipeline. │
│ [default: no-enrich-picture-description] │
│ --artifacts-path PATH If provided, the location of the model artifacts. [default: None] │
│ --enable-remote-services --no-enable-remote-services Must be enabled when using models connecting to remote services. │
│ [default: no-enable-remote-services] │
│ --allow-external-plugins --no-allow-external-plugins Must be enabled for loading modules from third-party plugins. │
│ [default: no-allow-external-plugins] │
│ --show-external-plugins --no-show-external-plugins List the third-party plugins which are available when the option │
│ --allow-external-plugins is set. │
│ [default: no-show-external-plugins] │
│ --abort-on-error --no-abort-on-error If enabled, the processing will be aborted when the first error is encountered. │
│ [default: no-abort-on-error] │
│ --output PATH Output directory where results are saved. [default: .] │
│ --verbose -v INTEGER Set the verbosity level. -v for info logging, -vv for debug logging. [default: 0] │
│ --debug-visualize-cells --no-debug-visualize-cells Enable debug output which visualizes the PDF cells │
│ [default: no-debug-visualize-cells] │
│ --debug-visualize-ocr --no-debug-visualize-ocr Enable debug output which visualizes the OCR cells │
│ [default: no-debug-visualize-ocr] │
│ --debug-visualize-layout --no-debug-visualize-layout Enable debug output which visualizes the layour clusters │
│ [default: no-debug-visualize-layout] │
│ --debug-visualize-tables --no-debug-visualize-tables Enable debug output which visualizes the table cells │
│ [default: no-debug-visualize-tables] │
│ --version Show version information. │
│ --document-timeout FLOAT The timeout for processing each document, in seconds. [default: None] │
│ --num-threads INTEGER Number of threads [default: 4] │
│ --device [auto|cpu|cuda|mps] Accelerator device [default: auto] │
│ --logo Docling logo │
│ --page-batch-size INTEGER Number of pages processed in one batch. Default: 4 [default: 4] │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
prokophapala@carbsisYoga:~$
The CLI you are using is newer than the one in the docs – it no longer exposes a stand-alone `--math-mode` switch. Instead, formula recognition is now controlled by the “enrichment” flags that you can see in the help text.
So run:
```bash
docling \
  Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf \
  --pipeline vlm \
  --enrich-formula \
  --to md \
  --image-export-mode referenced
```
(or `--image-export-mode placeholder` if you do not need the images at all). With `--enrich-formula` enabled, Docling will:
- invoke the vision-language model (`vlm`) pipeline
- detect every equation and replace it with proper LaTeX (rendered as `$$ … $$` or `$ … $` in the Markdown)
- output nothing as base-64 blobs
That is the direct equivalent of the old `--math-mode=latex` option.
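If you want just the formulas rather than the whole Markdown, the same switch is available from the Python API. A minimal sketch, assuming your Docling version exposes `do_formula_enrichment` on `PdfPipelineOptions` and `DocItemLabel.FORMULA` in `docling_core` (check the names in your installation):

```python
# Hedged sketch: dump every recognized formula as LaTeX into one .tex file.
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling_core.types.doc import DocItemLabel

opts = PdfPipelineOptions()
opts.do_formula_enrichment = True   # Python counterpart of --enrich-formula

conv = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=opts)}
)
doc = conv.convert("Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf").document

# Formula items carry the recognized LaTeX in their `text` field.
formulas = [item.text for item in doc.texts if item.label == DocItemLabel.FORMULA]
with open("formulas.tex", "w", encoding="utf-8") as f:
    f.write("\n\n".join(formulas))
```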
prokophapala@carbsisYoga:~$ docling /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf --image-export-mode=referenced --pipeline vlm --enrich-formula
/home/prokophapala/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2751: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
warnings.warn(
2025-08-24 09:55:58,031 WARNING docling.pipeline.base_pipeline: Encountered an error during conversion of document 9e5b7daa25dda9668657526fcd57523ffb6e590112b209ca07bd78ed175f87bc:
Traceback (most recent call last):
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 164, in _build_document
for p in pipeline_pages: # Must exhaust!
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 129, in _apply_on_pages
yield from page_batch
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 172, in __call__
predictions = list(self.process_images(images, user_prompts))
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 280, in process_images
generated_ids = self.vlm_model.generate(**gen_kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2617, in generate
result = self._sample(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 3598, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 973, in forward
outputs = self.model(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 795, in forward
image_hidden_states = self.get_image_features(pixel_values, pixel_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 722, in get_image_features
image_hidden_states = self.vision_model(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 572, in forward
encoder_outputs = self.encoder(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 397, in forward
layer_outputs = encoder_layer(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/modeling_layers.py", line 93, in __call__
return super().__call__(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 243, in forward
attn_output, attn_weights = attention_interface(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 89, in sdpa_attention_forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.19 GiB. GPU 0 has a total capacity of 3.81 GiB of which 13.12 MiB is free. Including non-PyTorch memory, this process has 3.79 GiB memory in use. Of the allocated memory 2.98 GiB is allocated by PyTorch, and 109.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables )
2025-08-24 09:55:58,032 WARNING docling.cli.main: Document /tmp/tmpzb0qkued/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf failed to convert.
The VLM pipeline tried to allocate an extra ~3.2 GB of VRAM for the vision model's attention, but your GPU only has 3.8 GB in total and the conversion process itself already occupies nearly all of it.
You have two simple ways out.
- Run on the CPU (slow, but always works)
```bash
docling \
  /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf \
  --pipeline vlm \
  --device cpu \
  --enrich-formula \
  --image-export-mode referenced
```
No GPU memory is used, so the process will finish (it just takes longer).
- Free GPU memory and retry on CUDA
Quit the programs that occupy the GPU (browser, other PyTorch notebooks, etc.) or run

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True docling ...same options as above...
```

If it still fails, CPU is the only practical option (a quick way to check how much VRAM is actually free is sketched right after this list).
- (Optional) Reduce the batch size
In the Python API you can also set `pipeline_options.page_batch_size = 1` to load only one page at a time; the CLI lists a matching `--page-batch-size` option in the help output above.
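Before retrying option 2, you can check how much VRAM is actually free with the PyTorch install that Docling already pulled in (a quick diagnostic, not a Docling feature):

```python
# Quick diagnostic: how much VRAM is free on the current CUDA device?
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()   # returns (free, total) in bytes
    print(f"free: {free / 2**30:.2f} GiB of {total / 2**30:.2f} GiB")
else:
    print("No CUDA device visible – fall back to --device cpu")
```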
Pick option 1 for an immediate solution.
prokophapala@carbsisYoga:~$ docling /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf --image-export-mode=referenced --pipeline vlm --enrich-formula --device cpu
/home/prokophapala/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2751: UserWarning: `max_length` is ignored when `padding`=`True` and there is no truncation strategy. To pad to max length, use `padding='max_length'`.
warnings.warn(
2025-08-24 09:57:52,840 WARNING docling.pipeline.base_pipeline: Encountered an error during conversion of document 9e5b7daa25dda9668657526fcd57523ffb6e590112b209ca07bd78ed175f87bc:
Traceback (most recent call last):
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 164, in _build_document
for p in pipeline_pages: # Must exhaust!
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 129, in _apply_on_pages
yield from page_batch
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 172, in __call__
predictions = list(self.process_images(images, user_prompts))
File "/home/prokophapala/.local/lib/python3.10/site-packages/docling/models/vlm_models_inline/hf_transformers_model.py", line 280, in process_images
generated_ids = self.vlm_model.generate(**gen_kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 2617, in generate
result = self._sample(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/generation/utils.py", line 3598, in _sample
outputs = self(**model_inputs, return_dict=True)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 973, in forward
outputs = self.model(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/utils/generic.py", line 959, in wrapper
output = func(self, *args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 795, in forward
image_hidden_states = self.get_image_features(pixel_values, pixel_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 722, in get_image_features
image_hidden_states = self.vision_model(pixel_values=pixel_values, patch_attention_mask=patch_attention_mask)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 572, in forward
encoder_outputs = self.encoder(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 397, in forward
layer_outputs = encoder_layer(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/modeling_layers.py", line 93, in __call__
return super().__call__(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 317, in forward
hidden_states, attn_weights = self.self_attn(
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/models/idefics3/modeling_idefics3.py", line 243, in forward
attn_output, attn_weights = attention_interface(
File "/home/prokophapala/.local/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 89, in sdpa_attention_forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.19 GiB. GPU 0 has a total capacity of 3.81 GiB of which 13.12 MiB is free. Including non-PyTorch memory, this process has 3.79 GiB memory in use. Of the allocated memory 2.98 GiB is allocated by PyTorch, and 109.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables )
2025-08-24 09:57:52,841 WARNING docling.cli.main: Document /tmp/tmp3fpongmm/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf failed to convert.
prokophapala@carbsisYoga:~$
Even on the CPU the VLM model needs ≈ 7–8 GB of RAM (weights + activations). Your machine is hitting swap, and the run still fails with the very same CUDA out-of-memory error because the CUDA allocator was still being initialised even though you asked for `--device cpu`. In other words, the hardware you have right now simply does not have enough RAM/VRAM to run the heavy vision model.
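One experiment worth trying before giving up on the model: hide the GPU completely so the CUDA allocator can never be initialised. This is a generic PyTorch workaround, not a documented Docling feature, and it may still run out of system RAM, but it rules out the CUDA path (the wrapper below just calls the same CLI command through `subprocess`):

```python
# Experiment: hide the GPU from PyTorch, then run the same docling command.
import os
import subprocess

os.environ["CUDA_VISIBLE_DEVICES"] = ""   # the child process will see no CUDA device

subprocess.run(
    [
        "docling",
        "/home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf",
        "--pipeline", "vlm",
        "--enrich-formula",
        "--device", "cpu",
        "--image-export-mode", "referenced",
    ],
    check=True,
)
```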
What you can do today
- Give up on the VLM pipeline and fall back to the standard pipeline – you will not get perfect LaTeX for the formulas, but the text is extracted correctly and the file is small:

```bash
docling \
  /home/prokophapala/Desktop/PAPERS/Projective_Dynamics/Vivace_paralel_triangular_matrix_GPU_nvr-2011-001.pdf \
  --pipeline standard \
  --image-export-mode referenced \
  --to md
```
- (Future) Run it on a machine with ≥ 8 GB of free RAM, or on Google Colab, e.g. in a notebook cell:

```python
!pip install "docling[vlm]"
!docling --pipeline vlm --enrich-formula --device cuda /content/your.pdf
```
- (Alternative) Use a pure OCR tool such as pix2tex, Mathpix, or Snip to get the equations as LaTeX and then paste them into the Markdown you just produced with the standard pipeline (a minimal pix2tex sketch follows at the end).
For the moment, option 1 is the only one that will finish successfully on your current laptop.
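If you later go the pix2tex route (the last option above), a minimal sketch (assuming `pip install pix2tex` and an equation you cropped yourself into `equation.png`) looks like this:

```python
# Minimal pix2tex sketch: turn a cropped screenshot of one equation into LaTeX.
from PIL import Image
from pix2tex.cli import LatexOCR

model = LatexOCR()                        # downloads the model weights on first use
latex = model(Image.open("equation.png"))
print(latex)                              # paste into your Markdown as $...$ or $$...$$
```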