It extracts attachments, such as images from a PDF, in their original binary format. Note that this endpoint is not recursive by default; it extracts only the immediate child documents.
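The extraction behaviour described above can be sketched as follows. This assumes the endpoint in question is Tika Server's `/unpack`, that the server runs at the default `http://localhost:9998`, and that it returns the attachments as a zip archive; the function names `fetch_unpacked` and `list_attachments` are illustrative, not part of Tika.

```python
import io
import urllib.request
import zipfile

TIKA_URL = "http://localhost:9998"  # assumption: default local Tika Server address


def fetch_unpacked(path, base_url=TIKA_URL):
    """PUT a file to /unpack (assumed endpoint) and return the raw archive bytes."""
    with open(path, "rb") as fh:
        payload = fh.read()
    req = urllib.request.Request(
        f"{base_url}/unpack",
        data=payload,
        method="PUT",
        headers={"Accept": "application/zip"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def list_attachments(archive_bytes):
    """Map each attachment name in the zip archive to its size in bytes."""
    if not archive_bytes:
        # An empty body is treated here as "no attachments found".
        return {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}
```

A call such as `list_attachments(fetch_unpacked("report.pdf"))` would list only the first level of attachments, since nested children are not unpacked by default.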
Ideal for simple search indexing where you only need a single blob of text and don't care about the distinct metadata of embedded attachments.

2. The /rmeta Endpoint: Detailed Hierarchy
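The /rmeta (Recursive Metadata) endpoint is the preferred choice for modern, complex data processing. Unlike the other endpoints, it provides a structured view of a file and all its internal components: the response is a JSON array with one metadata object for the container file and one for each embedded document, at any nesting depth.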
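As a minimal sketch of how that structured view might be consumed, the snippet below sends a file to `/rmeta/text` and reduces the returned list to one row per document. It assumes a Tika Server at the default `http://localhost:9998`; the helper names `fetch_rmeta` and `summarize` are hypothetical, and the metadata keys used (`X-TIKA:content`, `X-TIKA:embedded_resource_path`, `Content-Type`) may vary between Tika versions.

```python
import json
import urllib.request

TIKA_URL = "http://localhost:9998"  # assumption: default local Tika Server address


def fetch_rmeta(path, base_url=TIKA_URL):
    """PUT a file to /rmeta/text and return the decoded JSON list."""
    with open(path, "rb") as fh:
        payload = fh.read()
    req = urllib.request.Request(
        f"{base_url}/rmeta/text",
        data=payload,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize(entries):
    """Return one (embedded path, content type, text length) row per document."""
    rows = []
    for entry in entries:
        # The container file carries no embedded path, so it is shown as "/".
        path = entry.get("X-TIKA:embedded_resource_path", "/")
        ctype = entry.get("Content-Type", "unknown")
        text = entry.get("X-TIKA:content") or ""
        rows.append((path, ctype, len(text.strip())))
    return rows
```

Keeping the network call separate from `summarize` means the parsing step can be checked against a saved response without a running server.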
| Aspect | Generic Endpoint | TiKA Endpoint |
| :--- | :--- | :--- |
| | Stateless (except login) | Stateless but token-bound |
| Cacheability | Segment URLs are static | Segment URLs change per token (harder for public CDN) |
| Security Model | One token = all assets | One token = one asset, limited time, optional IP binding |
| Seamless Seek | Relies on Range header support | Uses explicit start/end query params |
| Logging Granularity | Per request (may lack session context) | Session ID + sequence number embedded in most endpoints |
| CORS Complexity | Needs per-endpoint config | Uniform handling via /v1/info endpoint |
Deep analysis or manual inspection of individual file components.