Get Semantic Insights
POST /api/discovery/enhanced-search/semantic-insights
Generate semantic insights and analysis for a collection of documents.
Request Body:
- document_ids: List of document IDs to analyze
- model_name: Embedding model reported in the response (default: "multilingual-e5-large-instruct")
Returns:
- document_count: Number of documents analyzed
- average_similarity_to_centroid: Average similarity to collection centroid
- cohesion_score: Overall collection cohesion (0.0-1.0)
- most_representative_document: Document ID closest to centroid
- similarity_distribution: Statistical distribution of similarities
- embedding_dimensions: Dimension count of embedding vectors
Raises:
- 400: Invalid document IDs or analysis error
- 404: No document embeddings found (reindex required)
- 500: Insights generation failed
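A minimal sketch of how a client might build this request, using only the Python standard library. The base URL, the helper names `build_insights_request` and `explain_status`, and the assumption that the body is JSON keyed exactly as listed under Request Body (`document_ids`, `model_name`) are illustrative and not confirmed by this page; authentication is not covered.

```python
import json
import urllib.request

ENDPOINT = "/api/discovery/enhanced-search/semantic-insights"

# Status codes listed under "Raises" above, mapped to short hints.
ERROR_HINTS = {
    400: "Invalid document IDs or analysis error",
    404: "No document embeddings found (reindex required)",
    500: "Insights generation failed",
}


def build_insights_request(base_url, document_ids, model_name=None):
    """Build (but do not send) the POST request for semantic insights.

    Hypothetical helper: assumes a JSON body with snake_case keys.
    """
    payload = {"document_ids": list(document_ids)}
    if model_name is not None:
        # Omitted by default so the server-side default model applies.
        payload["model_name"] = model_name
    return urllib.request.Request(
        url=base_url.rstrip("/") + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain_status(status):
    """Return the documented meaning of an error status, if any."""
    return ERROR_HINTS.get(status, "Undocumented status")
```

To send the request, pass the result to `urllib.request.urlopen`; on failure, the `code` attribute of the raised `urllib.error.HTTPError` can be handed to `explain_status`.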
Request Body required
Responses
Successful Response
SemanticInsights
Semantic insights response model.
Provides semantic analysis for a collection of documents: cohesion scoring, the distribution of similarity scores, and identification of the most representative document.
Fields:
- document_count: Number of documents analyzed (non-negative integer)
- average_similarity_to_centroid: Average similarity score (0.0-1.0) of all documents to the collection centroid
- cohesion_score: Overall collection cohesion score (0.0-1.0) indicating how semantically cohesive the document collection is
- most_representative_document: Document ID closest to the collection centroid, representing the most typical document
- similarity_distribution: Dictionary containing statistical distribution of similarity scores (e.g., mean, median, std_dev)
- embedding_dimensions: Dimension count of embedding vectors used (non-negative integer)
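To make the centroid-based fields concrete, here is a rough sketch of how such statistics could be computed from raw embeddings. It assumes cosine similarity and an unweighted mean centroid, neither of which this page confirms, and it leaves out cohesion_score because the formula is not documented here. The function names are hypothetical.

```python
import math
import statistics


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid_stats(embeddings):
    """Compute centroid-based statistics.

    embeddings: dict mapping document ID -> list of floats.
    Assumes at least one document and no zero vectors.
    """
    ids = list(embeddings)
    dims = len(embeddings[ids[0]])
    # Unweighted mean of all vectors, one coordinate at a time.
    centroid = [
        sum(embeddings[doc_id][d] for doc_id in ids) / len(ids)
        for d in range(dims)
    ]
    sims = {doc_id: cosine(embeddings[doc_id], centroid) for doc_id in ids}
    values = list(sims.values())
    return {
        "document_count": len(ids),
        "average_similarity_to_centroid": statistics.fmean(values),
        # The document whose vector is most similar to the centroid.
        "most_representative_document": max(sims, key=sims.get),
        "similarity_distribution": {
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
            "std_dev": statistics.pstdev(values),
        },
        "embedding_dimensions": dims,
    }
```

The service may normalize vectors, weight documents, or use a different similarity measure, so values from this sketch should not be expected to match the API's output.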
Usage: POST /api/discovery/enhanced-search/semantic-insights returns this response model.
JSON Example:
{
  "documentCount": 150,
  "averageSimilarityToCentroid": 0.75,
  "cohesionScore": 0.82,
  "mostRepresentativeDocument": "doc_45",
  "similarityDistribution": {
    "mean": 0.75,
    "median": 0.78,
    "stdDev": 0.12
  },
  "embeddingDimensions": 384
}
object
- documentCount (integer, required): Number of documents analyzed
- averageSimilarityToCentroid (number, required): Average similarity to collection centroid (0.0-1.0)
- cohesionScore (number, required): Collection cohesion score (0.0-1.0)
- mostRepresentativeDocument (string, required): Document ID closest to centroid
- similarityDistribution (object): Similarity statistics; additional properties, each key mapping to a number
- embeddingDimensions (integer, required): Embedding vector dimensions
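The response schema can be mirrored on the client with a small typed container. The sketch below assumes the camelCase keys shown in the JSON example; the `SemanticInsights` dataclass and its `from_json` method are illustrative client-side code, not the service's own model.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SemanticInsights:
    """Client-side mirror of the SemanticInsights response (a sketch)."""

    document_count: int
    average_similarity_to_centroid: float
    cohesion_score: float
    most_representative_document: str
    embedding_dimensions: int
    # Not marked as required in the schema, so default to an empty dict.
    similarity_distribution: Dict[str, float] = field(default_factory=dict)

    @classmethod
    def from_json(cls, text):
        """Parse a response body that uses the camelCase keys shown above."""
        raw = json.loads(text)
        return cls(
            document_count=raw["documentCount"],
            average_similarity_to_centroid=raw["averageSimilarityToCentroid"],
            cohesion_score=raw["cohesionScore"],
            most_representative_document=raw["mostRepresentativeDocument"],
            embedding_dimensions=raw["embeddingDimensions"],
            similarity_distribution=raw.get("similarityDistribution") or {},
        )
```

A missing required key raises `KeyError`, which makes schema drift visible early instead of silently producing partial objects.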
Validation Error
HTTPValidationError (object)
- detail (Array<object>): Detail; each item is a ValidationError
ValidationError (object)
- loc (Array, required): Location
- msg (string, required): Message
- type (string, required): Error Type
- input: Input
- ctx: Context
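A validation error body can be flattened into readable messages with a few lines of code. This sketch relies only on the `detail`, `loc`, `msg`, and `type` fields listed above; the function name and the sample payload values are hypothetical, and this page does not state which status code accompanies the response.

```python
def format_validation_errors(body):
    """Turn an HTTPValidationError body (already parsed from JSON)
    into one human-readable line per ValidationError item."""
    messages = []
    for item in body.get("detail", []):
        # loc is an array of path segments; join them with dots.
        location = ".".join(str(part) for part in item["loc"])
        messages.append(f"{location}: {item['msg']} ({item['type']})")
    return messages


# Hypothetical payload, shaped like the schema above.
sample = {
    "detail": [
        {"loc": ["body", "document_ids"], "msg": "Field required", "type": "missing"}
    ]
}
print(format_validation_errors(sample))
```

The optional `input` and `ctx` fields are ignored here; they could be appended to each message when more diagnostic detail is useful.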