After partitioning, you can have Unstructured generate a list of recognized entities and their types (such as the names of organizations, products, and people) in the content, through a process known as named entity recognition (NER). You can also have Unstructured generate a list of relationships between the recognized entities. NER is performed by models offered through various model providers.

Here is an example of a list of recognized entities and their entity types, along with a list of relationships between those entities and their relationship types, generated by using GPT-4o. Note specifically the entities field that is added to the metadata field:
{
    "type": "CompositeElement",
    "element_id": "bc8333ea0d374670ff0bd03c6126e70d",
    "text": "SECTION. 3\n\nThe Senate of the United States shall be composed of two Senators from each State, 
        [chosen by the Legislature there- of,]* for six Years; and each Senator shall have one Vote.\n\n
        Immediately after they shall be assembled in Consequence of the first Election, they shall be divided
        as equally as may be into three Classes. The Seats of the Senators of the first Class shall be vacated
        at the Expiration of the second Year, of the second Class at the Expiration of the fourth Year, and of
        the third Class at the Expiration of the sixth Year, so that one third may be chosen every second Year;
        [and if Vacan- cies happen by Resignation, or otherwise, during the Recess of the Legislature of any
        State, the Executive thereof may make temporary Appointments until the next Meeting of the Legislature,
        which shall then fill such Vacancies.]*\n\nC O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
    "metadata": {
        "filename": "constitution.pdf",
        "filetype": "application/pdf",
        "languages": [
            "eng"
        ],
        "page_number": 2,
        "entities": {
            "items": [
                {
                    "entity": "Senate",
                    "type": "ORGANIZATION"
                },
                {
                    "entity": "United States",
                    "type": "LOCATION"
                },
                {
                    "entity": "Senator",
                    "type": "ROLE"
                },
                {
                    "entity": "State",
                    "type": "LOCATION"
                },
                {
                    "entity": "Legislature",
                    "type": "ORGANIZATION"
                },
                {
                    "entity": "Executive",
                    "type": "ROLE"
                },
                {
                    "entity": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
                    "type": "DOCUMENT"
                }
            ],
            "relationships": [
                {
                    "from": "Senate",
                    "relationship": "based_in",
                    "to": "United States"
                },
                {
                    "from": "Senator",
                    "relationship": "has_role",
                    "to": "Senate"
                },
                {
                    "from": "Legislature",
                    "relationship": "has_office_in",
                    "to": "State"
                },
                {
                    "from": "Executive",
                    "relationship": "has_role",
                    "to": "State"
                },
                {
                    "from": "C O N S T I T U T I O N O F T H E U N I T E D S T A T E S",
                    "relationship": "dated",
                    "to": "DATE"
                }
            ]
        }
    }
}
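
To show how the entities field might be consumed downstream, here is a minimal Python sketch. It assumes an element has already been loaded as a dictionary shaped like the example above. The `summarize_entities` helper and the trimmed sample element are hypothetical illustrations, not part of any Unstructured library.

```python
import json
from collections import defaultdict

# A trimmed-down element shaped like the example output above (illustrative only).
element_json = """
{
  "type": "CompositeElement",
  "metadata": {
    "entities": {
      "items": [
        {"entity": "Senate", "type": "ORGANIZATION"},
        {"entity": "United States", "type": "LOCATION"},
        {"entity": "Senator", "type": "ROLE"}
      ],
      "relationships": [
        {"from": "Senate", "relationship": "based_in", "to": "United States"},
        {"from": "Senator", "relationship": "has_role", "to": "Senate"}
      ]
    }
  }
}
"""


def summarize_entities(element):
    """Group entity names by type and return relationships as (from, relationship, to) triples.

    Hypothetical helper: elements without an entities field yield empty results.
    """
    entities = element.get("metadata", {}).get("entities") or {}
    by_type = defaultdict(list)
    for item in entities.get("items", []):
        by_type[item["type"]].append(item["entity"])
    triples = [
        (rel["from"], rel["relationship"], rel["to"])
        for rel in entities.get("relationships", [])
    ]
    return dict(by_type), triples


element = json.loads(element_json)
by_type, triples = summarize_entities(element)

for entity_type, names in by_type.items():
    print(entity_type, names)
for source, relationship, target in triples:
    print(f"{source} --{relationship}--> {target}")
```

Because not every element necessarily carries an entities field, the helper falls back to empty results instead of raising an error.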
Here is another example of some of the entities, their entity types, and relationships that are recognized for a given paragraph of text. This information is generated by GPT-4o by OpenAI:

[Image: Named entity recognition for information in a paragraph of text]

By default, the following entity types are supported for NER:
  • PERSON
  • ORGANIZATION
  • LOCATION
  • DATE
  • TIME
  • EVENT
  • MONEY
  • PERCENT
  • FACILITY
  • PRODUCT
  • ROLE
  • DOCUMENT
  • DATASET
By default, the following entity relationships are supported for NER:
  • PERSON - ORGANIZATION: works_for, affiliated_with, founded
  • PERSON - LOCATION: born_in, lives_in, traveled_to
  • ORGANIZATION - LOCATION: based_in, has_office_in
  • Entity - DATE: occurred_on, founded_on, died_on, published_in
  • PERSON - PERSON: married_to, parent_of, colleague_of
  • PRODUCT - ORGANIZATION: developed_by, owned_by
  • EVENT - LOCATION: held_in, occurred_in
  • Entity - ROLE: has_title, acts_as, has_role
  • DATASET - PERSON: mentions
  • DATASET - DOCUMENT: located_in
  • PERSON - DATASET: published
  • DOCUMENT - DOCUMENT: referenced_in, contains
  • DOCUMENT - DATE: dated
  • PERSON - DOCUMENT: published
You can add, rename, or delete items in this list of default entity types and default entity relationship types. You can also add any clarifying instructions to the prompt that is used to run NER. To do this, see the next section.
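
Because the entities and relationships are generated by a model, it can be useful to sanity-check them against the default entity types before loading them into something like a graph store. The following Python sketch is one possible way to do that. It assumes the default entity types listed above are in effect. The `check_entities` helper and its problem codes are hypothetical and are not part of Unstructured.

```python
# Default entity types, copied from the list above.
DEFAULT_ENTITY_TYPES = {
    "PERSON", "ORGANIZATION", "LOCATION", "DATE", "TIME", "EVENT", "MONEY",
    "PERCENT", "FACILITY", "PRODUCT", "ROLE", "DOCUMENT", "DATASET",
}


def check_entities(entities):
    """Return a list of (problem_code, detail) tuples for an entities dict.

    Hypothetical checks:
    - "unknown_type": an item's type is not one of the default entity types.
    - "undeclared_endpoint": a relationship's "from" or "to" value does not
      match any entity name in the items list.
    """
    problems = []
    names = set()
    for item in entities.get("items", []):
        names.add(item["entity"])
        if item["type"] not in DEFAULT_ENTITY_TYPES:
            problems.append(("unknown_type", item["type"]))
    for rel in entities.get("relationships", []):
        for end in ("from", "to"):
            if rel[end] not in names:
                problems.append(("undeclared_endpoint", rel[end]))
    return problems


# Sample modeled on the last relationship in the first example, where the
# "to" value is the type name DATE rather than a recognized entity.
sample = {
    "items": [
        {"entity": "Senate", "type": "ORGANIZATION"},
        {"entity": "United States", "type": "LOCATION"},
    ],
    "relationships": [
        {"from": "Senate", "relationship": "based_in", "to": "United States"},
        {"from": "Senate", "relationship": "dated", "to": "DATE"},
    ],
}

problems = check_entities(sample)
print(problems)
```

If you customize the entity types, the set of allowed types in a check like this would need to be updated to match.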

Generate a list of entities and their relationships

To have Unstructured generate a list of entities and their relationships, do the following: