Extract entities

Entity Extractor uses statistical or deep-neural-network-based models, patterns, and exact matching to identify entities in documents. An entity is an object of interest such as a person, organization, location, date, or email address. Identifying entities can help you classify documents and the kinds of data they contain.

The statistical models are based on computational linguistics and human-annotated training documents. The patterns are regular expressions that identify entities such as dates, times, and geographical coordinates. The exact matcher checks words in the document against lists of entities and reports exact matches.

Statistical-model-based extractions can return a confidence score for each entity. Confidence scores correlate well with precision, so they can be used to set a threshold and remove false positives. Confidence is calculated by default when linkEntities is on. Otherwise, add the calculateConfidence option to the request to include the scores in the result.
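As a minimal sketch of thresholding, assume each returned entity is a JSON object carrying a numeric confidence field. The field names mention and confidence, the sample values, and the 0.5 cutoff are illustrative assumptions, not taken from this reference.

```python
def filter_by_confidence(entities, threshold):
    """Keep entities whose confidence score is at least `threshold`.

    Entities without a score are kept, on the assumption that only
    statistical extractions carry one and there is nothing to threshold on.
    """
    kept = []
    for entity in entities:
        score = entity.get("confidence")
        if score is None or score >= threshold:
            kept.append(entity)
    return kept


# Options object that asks the service to include confidence scores.
options = {"calculateConfidence": True}

# Hypothetical sample data, not real API output.
sample = [
    {"mention": "Boston", "confidence": 0.91},
    {"mention": "Apple", "confidence": 0.32},
    {"mention": "2024-01-05"},  # no score attached
]

high_confidence = filter_by_confidence(sample, threshold=0.5)
```

A suitable cutoff depends on your data; tuning it against a small annotated sample is usually safer than picking a fixed value.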

The entities endpoint can also return a salience score for each extracted entity. Salience indicates whether the entity is important to the overall scope of the document, for example, whether it would be included in a summary of the document. Returned salience scores are binary: either 0 (not salient) or 1 (salient). To include the salience scores in your results, add the calculateSalience option to the request.
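The following sketch shows one way to request salience scores and keep only the salient entities. It assumes each returned entity exposes the score in a field named salience; that field name, the mention field, and the sample data are assumptions for illustration.

```python
import json


def salience_request_body(content):
    """Build a JSON request body that asks for salience scores."""
    return json.dumps({
        "content": content,
        "options": {"calculateSalience": True},
    })


def salient_entities(entities):
    """Return only the entities whose salience score is 1 (salient)."""
    return [e for e in entities if e.get("salience") == 1]


# Hypothetical sample data, not real API output.
sample = [
    {"mention": "Marie Curie", "salience": 1},
    {"mention": "Tuesday", "salience": 0},
]

body = salience_request_body("Marie Curie gave a lecture on Tuesday.")
important = salient_entities(sample)
```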

The normalized field returns the entity name with normalized white space: exactly one space per token. This allows mentions of the same entity that differ only in white space to be clustered together.
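To illustrate why normalization helps with clustering, the sketch below groups mentions by a whitespace-normalized key. It approximates tokens with whitespace-separated words, which may differ from the service's own tokenizer; in practice you would group on the returned normalized field instead. The function names and sample mentions are hypothetical.

```python
from collections import defaultdict


def normalize_whitespace(text):
    """Collapse every run of whitespace to one space and trim the ends."""
    # str.split() with no argument splits on any whitespace run.
    return " ".join(text.split())


def cluster_mentions(mentions):
    """Group raw mentions under their whitespace-normalized form."""
    clusters = defaultdict(list)
    for mention in mentions:
        clusters[normalize_whitespace(mention)].append(mention)
    return dict(clusters)


# Hypothetical mentions that differ only in white space.
mentions = ["New  York", "New York", "New\nYork", "Boston"]
clusters = cluster_mentions(mentions)
```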

Headers

X-RosetteAPI-Key string Required

Query parameters

output string Optional
When set to Rosette, the output is in ADM format. Required to see entity mentions from in-document coreference.

Request

This endpoint expects an object.
content string Required
options object Optional
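A request combining the header, query parameter, and body fields described above might be assembled as in this sketch. The endpoint URL is a placeholder, and the POST method and JSON Content-Type header are assumptions; check your deployment's documentation for the actual values. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

# Placeholder URL (assumption); replace with your actual entities endpoint.
ENTITIES_URL = "https://example.invalid/rest/v1/entities"


def build_entities_request(api_key, content, options=None, output=None,
                           url=ENTITIES_URL):
    """Build (but do not send) a request to the entities endpoint."""
    body = {"content": content}
    if options:
        body["options"] = options
    if output:
        # The optional `output` query parameter, e.g. "Rosette" for ADM output.
        url = url + "?" + urllib.parse.urlencode({"output": output})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-RosetteAPI-Key": api_key,
            "Content-Type": "application/json",  # assumed
        },
        method="POST",  # assumed
    )


request = build_entities_request(
    "YOUR_API_KEY",
    "Marie Curie was born in Warsaw.",
    options={"calculateConfidence": True},
    output="Rosette",
)
# To send it: urllib.request.urlopen(request)
```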

Response

OK
entitiesResponse list of objects or null
List of extracted entities.