
An integration package connecting Unstructured and LangChain

Project description

langchain-unstructured

This package contains the LangChain integration with Unstructured.

Installation

pip install -U langchain-unstructured

Configure credentials by setting the following environment variable:

export UNSTRUCTURED_API_KEY="your-api-key"
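If the variable is missing, the failure will only show up once the loader tries to call the API. The sketch below checks for the key up front. The helper name `require_api_key` is hypothetical and not part of langchain-unstructured. You could pass its return value to the loader as `api_key`.

```python
import os


def require_api_key(var_name: str = "UNSTRUCTURED_API_KEY") -> str:
    """Hypothetical helper: return the API key from the environment or fail early."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail before any network call so a missing key is easy to diagnose.
        raise RuntimeError(f"{var_name} is not set; export it before using the API.")
    return key
```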

Loaders

Partition and load files either through the Unstructured API using the unstructured-client SDK, or locally using the unstructured library.

API: To partition via the Unstructured API, install unstructured-client (pip install unstructured-client), set partition_via_api=True, and pass api_key. If you are running the Unstructured API locally, you can change the API URL by passing url when you initialize the loader. The hosted Unstructured API requires an API key. See the links below to learn more about our API offerings and get an API key.
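The sketch below collects the API-mode options described above into a dict that could be unpacked into the loader, for example `UnstructuredLoader(file_path="example.pdf", **hosted)`. The helper `api_loader_kwargs` is hypothetical. The local endpoint `http://localhost:8000/general/v0/general` is an assumption, so adjust host, port, and path to match your own deployment.

```python
from typing import Any, Optional


def api_loader_kwargs(api_key: str, url: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for partitioning via the API."""
    kwargs: dict[str, Any] = {
        "partition_via_api": True,  # send files to the Unstructured API
        "api_key": api_key,         # required by the hosted API
    }
    if url is not None:
        # Only override the endpoint when targeting a self-hosted API.
        kwargs["url"] = url
    return kwargs


hosted = api_loader_kwargs("your-api-key")
# Assumed local endpoint; change it to wherever your API is running.
local = api_loader_kwargs("your-api-key", url="http://localhost:8000/general/v0/general")
```

If `url` is not given, the dict leaves the loader's default endpoint untouched.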

Local: By default, the file loader uses the Unstructured partition function and automatically detects the file type.
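Local partitioning only works when the unstructured library is installed. The sketch below checks for it with the standard library's `importlib.util.find_spec` and chooses a value for `partition_via_api`. Both helpers and the "prefer local, fall back to the API" policy are hypothetical. `find_spec` only confirms that the base package is importable. It does not check for the format-specific dependencies that some file types may need.

```python
import importlib.util


def local_partitioning_available() -> bool:
    """True if the `unstructured` package is importable in this environment."""
    return importlib.util.find_spec("unstructured") is not None


def choose_partition_via_api(has_api_key: bool) -> bool:
    """Hypothetical policy: prefer local partitioning, fall back to the API."""
    if local_partitioning_available():
        return False
    if has_api_key:
        return True
    raise RuntimeError("Install `unstructured` or set an API key to partition files.")
```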

In addition to document-specific partition parameters, Unstructured has a rich set of "chunking" parameters for post-processing elements into more useful text segments for use cases such as Retrieval Augmented Generation (RAG). You can pass additional Unstructured kwargs to the loader to configure these settings.
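Because chunking options are passed as plain kwargs, you can keep them in one dict and merge them with the other settings. This is a minimal sketch. The names `max_characters` and `new_after_n_chars` are assumed from Unstructured's chunking options, so verify them against your installed version. The helper `with_chunking` is hypothetical. The merged dict could be unpacked into the loader alongside `file_path`.

```python
from typing import Any

# Chunking options forwarded to Unstructured as extra kwargs (names assumed
# from Unstructured's chunking documentation; verify for your version).
chunking_kwargs: dict[str, Any] = {
    "chunking_strategy": "by_title",  # start new chunks at section titles
    "max_characters": 1000,           # hard upper bound on chunk size
    "new_after_n_chars": 800,         # soft limit: start a new chunk after this size
}


def with_chunking(base: dict[str, Any], **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: base settings plus chunking, with overrides applied last."""
    return {**base, **chunking_kwargs, **overrides}


kwargs = with_chunking({"strategy": "fast"}, max_characters=1500)
```

Because overrides are applied last, you can change one chunking setting per call without editing the shared defaults.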

Setup:

    pip install -U langchain-unstructured
    pip install -U unstructured-client
    export UNSTRUCTURED_API_KEY="your-api-key"

Instantiate:

import os

from langchain_unstructured import UnstructuredLoader

loader = UnstructuredLoader(
    file_path=["example.pdf", "fake.pdf"],
    api_key=os.environ["UNSTRUCTURED_API_KEY"],  # read the key exported during Setup
    partition_via_api=True,
    chunking_strategy="by_title",
    strategy="fast",
)

Load:

docs = loader.load()

print(docs[0].page_content[:100])
print(docs[0].metadata)
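When several files are loaded at once, it can help to see how many chunks came from each file. The sketch below assumes that each Document's metadata has a `source` key. Check the output of `print(docs[0].metadata)` to confirm this for your version. The `Doc` dataclass is a hypothetical stand-in for LangChain's `Document`, used so the example runs without the package. The counting function works the same way on real loader output.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (page_content + metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def chunks_per_source(docs: Iterable[Doc]) -> Counter:
    """Count chunks per source file; documents without a source count as 'unknown'."""
    return Counter(d.metadata.get("source", "unknown") for d in docs)


docs = [
    Doc("Intro text", {"source": "example.pdf"}),
    Doc("More text", {"source": "example.pdf"}),
    Doc("Other text", {"source": "fake.pdf"}),
]
print(chunks_per_source(docs))
```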

References

Download files

Download the file for your platform.

Source Distribution

langchain_unstructured-1.0.1.tar.gz (6.4 kB)

Uploaded Source

Built Distribution


langchain_unstructured-1.0.1-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file langchain_unstructured-1.0.1.tar.gz.

File metadata

  • Download URL: langchain_unstructured-1.0.1.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for langchain_unstructured-1.0.1.tar.gz
  • SHA256: 9b4b832b8fcdef8598ff634ec6fc0e344e6d8fdae854c5727f717d605d28e406
  • MD5: ec5ec7121255d878560504051374bc5c
  • BLAKE2b-256: de837d0821c03868d69c52a385891772a9c6f931a8a6cd7c16a2e397c01a5ee2


File details

Details for the file langchain_unstructured-1.0.1-py3-none-any.whl.


File hashes

Hashes for langchain_unstructured-1.0.1-py3-none-any.whl
  • SHA256: 1cdb00b3bccc05daa6f03bc3991b0a76270fd2fc095b358c23bd08c5f0e05f50
  • MD5: 3ef68b9997de5b81af9a73d14c8fa0f8
  • BLAKE2b-256: cc17803613614fa4cec18d7dc9953a6029aa038ae4e693ef1b8375d08c30718c

