uv
. However, uv
and venv
are not required.
To work with all supported file types, run:
.txt
), HTML files (.html
), XML files (.xml
), and emails (.eml
, .msg
, and .p7s
) by default.
To further conserve disk space and reduce code dependencies, you can run the following command instead, replacing <extra>
with the appropriate extra for the target file type:
all-docs
(for all supported file types in this list)csv
(for .csv
files only)docx
(for .doc
and .docx
files only)epub
(for .epub
files only)image
(for all supported image file types: .bmp
, .heic
, .jpeg
, .png
, and .tiff
)md
(for .md
files only)odt
(for .odt
files only)org
(for .org
files only)pdf
(for .pdf
files only)pptx
(for .ppt
and .pptx
files only)rst
(for .rst
files only)rtf
(for .rtf
files only)tsv
(for .tsv
files only)xlsx
(for .xls
and .xlsx
files only)tesseract-lang
(for additional language support).epub
, .odt
, and .rtf
files. For .rtf
files, you must have version 2.14.2 or newer. Running this script will install the correct version for you.)