To extract tables from PDF files using partition_pdf, set the skip_infer_table_types parameter to False and the strategy parameter to hi_res.
Usage
from unstructured.partition.pdf import partition_pdf

fname = "example-docs/pdf/layout-parser-paper.pdf"
elements = partition_pdf(
    filename=fname,
    skip_infer_table_types=False,
    strategy='hi_res',
)
tables = [el for el in elements if el.category == "Table"]

print(tables[0].text)
print(tables[0].metadata.text_as_html)
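The text_as_html value printed above is an HTML table string, so a common next step is to turn it into rows of cell text. The following is a minimal sketch using only the Python standard library. The names TableRowParser and html_table_to_rows are hypothetical helpers, not part of unstructured, and sample_html is made-up illustrative input; in practice you would pass tables[0].metadata.text_as_html. It assumes a simple table without nested tables or merged cells.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of each <tr> in an HTML table into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(html):
    """Hypothetical helper: return the table as a list of rows of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for tables[0].metadata.text_as_html.
sample_html = (
    "<table><thead><tr><th>Model</th><th>Score</th></tr></thead>"
    "<tbody><tr><td>A</td><td>0.91</td></tr>"
    "<tr><td>B</td><td>0.87</td></tr></tbody></table>"
)
rows = html_table_to_rows(sample_html)
print(rows)
```

Header cells (th) and data cells (td) are treated alike here, so the first row holds the column names whenever the table has a header row.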
Method 2: Using Auto Partition or Unstructured API
Table extraction is enabled by default for all file types. To extract tables from PDFs and images with Auto Partition or the Unstructured API, set the strategy parameter to hi_res.
Usage: Auto Partition
from unstructured.partition.auto import partition

filename = "example-docs/pdf/layout-parser-paper.pdf"
elements = partition(filename=filename, strategy='hi_res')
tables = [el for el in elements if el.category == "Table"]

print(tables[0].text)
print(tables[0].metadata.text_as_html)
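Both usage examples index tables[0] directly, which raises IndexError when a document contains no tables. A minimal sketch of a more defensive lookup follows. The helpers tables_from and first_table_html are hypothetical names, and the SimpleNamespace objects are stand-ins that only mimic the category, text, and metadata.text_as_html attributes used above; they are not real unstructured elements.

```python
from types import SimpleNamespace


def tables_from(elements):
    """Return the elements whose category is "Table", in document order."""
    return [el for el in elements if getattr(el, "category", None) == "Table"]


def first_table_html(elements):
    """Return the HTML of the first table, or None if no table (or no HTML) exists."""
    tables = tables_from(elements)
    if not tables:
        return None
    metadata = getattr(tables[0], "metadata", None)
    return getattr(metadata, "text_as_html", None)


# Stand-in objects for illustration only; replace with the list returned
# by partition(...) or partition_pdf(...).
fake_elements = [
    SimpleNamespace(category="Title", text="Results", metadata=SimpleNamespace()),
    SimpleNamespace(
        category="Table",
        text="A 1",
        metadata=SimpleNamespace(
            text_as_html="<table><tr><td>A</td><td>1</td></tr></table>"
        ),
    ),
]

print(first_table_html(fake_elements))      # HTML of the single table
print(first_table_html(fake_elements[:1]))  # None: no Table element present
```

Returning None instead of raising lets a batch job skip documents without tables and continue with the rest.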