
Get Semantic Insights

POST /api/discovery/enhanced-search/semantic-insights

Generate semantic insights and analysis for a collection of documents.

Request Body:

  • document_ids (JSON key: documentIds): List of document IDs to analyze (required)
  • model_name (JSON key: modelName): Embedding model reported in the response (optional; default: “multilingual-e5-large-instruct”)
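
As a rough illustration of the request body described above, the sketch below builds the POST request with Python's standard library. The base URL and the document IDs are placeholders, and it assumes no authentication headers are needed, which this page does not state. The JSON keys follow the camelCase names shown in the SemanticInsightsRequest schema.

```python
import json
import urllib.request

# Assumption: placeholder host; replace with the real deployment URL.
BASE_URL = "http://localhost:8000"
ENDPOINT = "/api/discovery/enhanced-search/semantic-insights"


def build_insights_request(document_ids, model_name=None, base_url=BASE_URL):
    """Build (but do not send) a semantic-insights POST request."""
    payload = {"documentIds": list(document_ids)}
    # modelName is optional; omit it to let the server apply its default.
    if model_name is not None:
        payload["modelName"] = model_name
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document IDs, for illustration only.
req = build_insights_request(["doc_1", "doc_2", "doc_45"])
```

The request can then be sent with `urllib.request.urlopen(req)` or translated to whichever HTTP client the project already uses.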

Returns:

  • document_count: Number of documents analyzed
  • average_similarity_to_centroid: Average similarity to collection centroid
  • cohesion_score: Overall collection cohesion (0.0-1.0)
  • most_representative_document: Document ID closest to centroid
  • similarity_distribution: Statistical distribution of similarities
  • embedding_dimensions: Dimension count of embedding vectors

Raises:

  • 400: Invalid document IDs or analysis error
  • 404: No document embeddings found (reindex required)
  • 500: Insights generation failed
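
One way a client might react to the status codes listed above is sketched here. The helper names `describe_failure` and `fetch_insights` are hypothetical, and the hint texts simply paraphrase the list; the shape of the server's error body is not assumed.

```python
import json
import urllib.error
import urllib.request

# Hints paraphrased from the documented status codes.
STATUS_HINTS = {
    400: "Invalid document IDs or analysis error; check the request body",
    404: "No document embeddings found; the documents may need a reindex",
    500: "Insights generation failed on the server",
}


def describe_failure(status):
    """Map an HTTP status code to a human-readable hint."""
    return STATUS_HINTS.get(status, f"Unexpected status {status}")


def fetch_insights(req):
    """Send a prepared urllib Request and return the decoded JSON body."""
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as exc:
        # Re-raise with the documented meaning attached.
        raise RuntimeError(describe_failure(exc.code)) from exc
```

A 404 here points at missing embeddings rather than a missing route, so retrying without reindexing is unlikely to help.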
SemanticInsightsRequest (object)

  • documentIds (Array<string>, required): List of document IDs to analyze
  • modelName (any of: string)

Successful Response

SemanticInsights

Semantic insights response model.

Summarizes how semantically cohesive a collection of documents is. The response includes a cohesion score, the distribution of document similarities to the collection centroid, and the ID of the most representative document.

Fields:

  • document_count: Number of documents analyzed (non-negative integer)
  • average_similarity_to_centroid: Average similarity score (0.0-1.0) of all documents to the collection centroid
  • cohesion_score: Overall collection cohesion score (0.0-1.0) indicating how semantically cohesive the document collection is
  • most_representative_document: Document ID closest to the collection centroid, representing the most typical document
  • similarity_distribution: Dictionary containing statistical distribution of similarity scores (e.g., mean, median, std_dev)
  • embedding_dimensions: Dimension count of embedding vectors used (non-negative integer)
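
This page does not say how the service derives these values. Purely as an illustration of centroid-based metrics, the sketch below assumes cosine similarity against the mean embedding vector and a population standard deviation. The function name `centroid_insights` and the toy vectors are hypothetical. The cohesion score is left out because its formula is not documented here.

```python
import math
import statistics


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid_insights(embeddings):
    """Compute centroid-based statistics for a dict of {doc_id: vector}."""
    ids = list(embeddings)
    dims = len(embeddings[ids[0]])
    # The centroid is the component-wise mean of all vectors.
    centroid = [sum(embeddings[i][d] for i in ids) / len(ids) for d in range(dims)]
    sims = {i: cosine(embeddings[i], centroid) for i in ids}
    values = list(sims.values())
    return {
        "documentCount": len(ids),
        "averageSimilarityToCentroid": statistics.fmean(values),
        "mostRepresentativeDocument": max(sims, key=sims.get),
        "similarityDistribution": {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "stdDev": statistics.pstdev(values),
        },
        "embeddingDimensions": dims,
    }


# Toy two-dimensional vectors; real embeddings have far more dimensions.
result = centroid_insights({"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]})
```

Raw cosine similarity can be negative, while the fields above are documented as 0.0-1.0, so the service presumably normalizes or clamps in some way that this sketch does not reproduce.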

Usage: POST /api/discovery/enhanced-search/semantic-insights returns this response model.

JSON Example:

{
  "documentCount": 150,
  "averageSimilarityToCentroid": 0.75,
  "cohesionScore": 0.82,
  "mostRepresentativeDocument": "doc_45",
  "similarityDistribution": {
    "mean": 0.75,
    "median": 0.78,
    "stdDev": 0.12
  },
  "embeddingDimensions": 384
}
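
A client could map a response like the example above onto a typed structure. The dataclass below is a sketch, not part of the API: its name and snake_case attribute names are choices made here, and it treats similarityDistribution as optional because the schema does not mark it as required.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SemanticInsights:
    document_count: int
    average_similarity_to_centroid: float
    cohesion_score: float
    most_representative_document: str
    embedding_dimensions: int
    similarity_distribution: Dict[str, float] = field(default_factory=dict)

    @classmethod
    def from_json(cls, data):
        """Build an instance from the decoded camelCase JSON response."""
        return cls(
            document_count=data["documentCount"],
            average_similarity_to_centroid=data["averageSimilarityToCentroid"],
            cohesion_score=data["cohesionScore"],
            most_representative_document=data["mostRepresentativeDocument"],
            embedding_dimensions=data["embeddingDimensions"],
            # Optional field: fall back to an empty dict when absent or null.
            similarity_distribution=dict(data.get("similarityDistribution") or {}),
        )


raw = """{
  "documentCount": 150,
  "averageSimilarityToCentroid": 0.75,
  "cohesionScore": 0.82,
  "mostRepresentativeDocument": "doc_45",
  "similarityDistribution": {"mean": 0.75, "median": 0.78, "stdDev": 0.12},
  "embeddingDimensions": 384
}"""
insights = SemanticInsights.from_json(json.loads(raw))
```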
SemanticInsights (object)

  • documentCount (integer, required): Number of documents analyzed
  • averageSimilarityToCentroid (number, <= 1, required): Average similarity to collection centroid (0.0-1.0)
  • cohesionScore (number, <= 1, required): Collection cohesion score (0.0-1.0)
  • mostRepresentativeDocument (string, required): Document ID closest to centroid
  • similarityDistribution (object; additional properties: number): Similarity statistics
  • embeddingDimensions (integer, required): Embedding vector dimensions

Validation Error

HTTPValidationError (object)

  • detail (Array<object>): Detail; each item is a ValidationError

ValidationError (object)

  • loc (Array, required): Location
  • msg (string, required): Message
  • type (string, required): Error Type
  • input: Input
  • ctx (object): Context
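
To show how a client might surface these validation errors, the sketch below flattens the `detail` array into readable lines. The helper name `format_validation_errors` and the sample body are made up for illustration; only the `loc`, `msg`, and `type` fields documented above are relied on.

```python
def format_validation_errors(body):
    """Turn an HTTPValidationError body into one readable line per error."""
    lines = []
    for err in body.get("detail", []):
        # loc is an array of path segments (strings or integer indexes).
        location = ".".join(str(part) for part in err["loc"])
        lines.append(f"{location}: {err['msg']} ({err['type']})")
    return lines


# Hypothetical error body, shaped like the schema above.
sample_body = {
    "detail": [
        {"loc": ["body", "documentIds"], "msg": "Field required", "type": "missing"},
    ]
}
messages = format_validation_errors(sample_body)
```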