After partitioning and chunking, you can have Unstructured generate an HTML representation of each detected table.
This table-to-HTML output is generated by GPT-4o, provided through OpenAI.
Here is an example of the HTML markup output of a detected table using GPT-4o. Note specifically the text_as_html field that is added.
Line breaks have been inserted here for readability. The output will not contain these line breaks.
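As a minimal sketch of how the added field might be consumed downstream, the following Python reads a list of output elements and collects the text_as_html value of each table. The sample data is hand-written and hypothetical, not real Unstructured or GPT-4o output, and the sketch assumes that each element is a JSON object with a type of Table and that text_as_html sits under the element's metadata. The helper name extract_table_html is also hypothetical.

```python
import json

# Hypothetical, hand-written sample; NOT real Unstructured or GPT-4o output.
# Assumption: text_as_html is stored under each Table element's "metadata".
SAMPLE_OUTPUT = """
[
  {"type": "Title", "text": "Quarterly results", "metadata": {}},
  {
    "type": "Table",
    "text": "Region Revenue North 10 South 12",
    "metadata": {
      "text_as_html": "<table><tr><th>Region</th><th>Revenue</th></tr><tr><td>North</td><td>10</td></tr><tr><td>South</td><td>12</td></tr></table>"
    }
  }
]
"""


def extract_table_html(elements):
    """Return the text_as_html string of every Table element that has one."""
    return [
        element["metadata"]["text_as_html"]
        for element in elements
        if element.get("type") == "Table"
        and "text_as_html" in element.get("metadata", {})
    ]


tables = extract_table_html(json.loads(SAMPLE_OUTPUT))
for html in tables:
    print(html)
```

Elements without a text_as_html value are skipped rather than treated as errors, because not every workflow or file produces the field.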
To generate table-to-HTML output, in an Enrichment node in a workflow, for Model, select OpenAI (GPT-4o). After you choose this provider and model, make sure that Table to HTML is also selected.
You can change a workflow’s table-to-HTML settings only through Custom workflow settings.
Unstructured can potentially generate table-to-HTML output only for workflows that are configured as follows:
With a Partitioner node set to use the Auto or High Res partitioning strategy, and with a table-to-HTML output node added.
With a Partitioner node set to use the VLM partitioning strategy. No table-to-HTML output node is needed (or allowed).
Even with these configurations, Unstructured actually generates table-to-HTML output only for files that contain tables and are also eligible for processing with the following partitioning strategies:
High Res, when the workflow’s Partitioner node is set to use Auto or High Res.
VLM or High Res, when the workflow’s Partitioner node is set to use VLM.
Unstructured never generates table-to-HTML output for workflows that are configured as follows:
With a Partitioner node set to use the Fast partitioning strategy.
With a Partitioner node set to use the Auto, High Res, or VLM partitioning strategy, for any file that does not contain tables.
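The eligibility rules above can be summarized as a small decision function. This is only an illustrative sketch of the rules as stated on this page, not Unstructured's implementation. The function name, its parameters, and the use of the strategy labels as plain strings are all assumptions made for the example.

```python
def generates_table_html(
    partitioner_strategy,     # strategy set on the Partitioner node: "Auto", "High Res", "VLM", or "Fast"
    has_table_to_html_node,   # True if a table-to-HTML output node was added to the workflow
    file_has_tables,          # True if the file contains at least one table
    file_strategy,            # strategy the file is eligible to be processed with
):
    """Hypothetical helper mirroring the rules described above."""
    # Files without tables never produce table-to-HTML output.
    if not file_has_tables:
        return False

    # Auto or High Res: the output node is required, and the file must be
    # eligible for High Res processing.
    if partitioner_strategy in ("Auto", "High Res"):
        return has_table_to_html_node and file_strategy == "High Res"

    # VLM: no output node is needed; the file must be eligible for
    # VLM or High Res processing.
    if partitioner_strategy == "VLM":
        return file_strategy in ("VLM", "High Res")

    # Fast (or any other value) never produces table-to-HTML output.
    return False


print(generates_table_html("Auto", True, True, "High Res"))
print(generates_table_html("Fast", True, True, "Fast"))
```

For instance, a workflow set to Auto with the output node added still yields no table HTML for a file that is only eligible for Fast processing, because the second condition is evaluated per file.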