Analytics Overview
Babel Street Analytics uses natural language processing, statistical modeling, and machine learning to analyze unstructured and semi-structured text across hundreds of language-script combinations, revealing valuable information and actionable data. Analytics provides endpoints for extracting entities and relationships, translating names and comparing their similarity, categorizing text, adding linguistic tags, and more.
Each Analytics endpoint processes either names or documents. The structure and type of input data depend on the type of endpoint. The name endpoints match names, addresses, and records; there is also an endpoint to deduplicate names and one to translate names. The text analytics endpoints process unstructured text documents, identifying languages and topics and extracting critical business information.
Server is the on-premises installation of Analytics, which exposes Analytics’ functions as RESTful web service endpoints. Running on-premises addresses cloud security concerns and allows customization (models and indexes) as needed for your business.
Cloud Limits
The maximum payload size is 600KB, with a maximum character count of 50,000.
By default, Analytics will only process one call (active HTTP connection) at a time. You can send a second call once you have received a response from the first. Interested in making multiple concurrent calls? Contact us.
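A client that issues requests from several threads can enforce the one-call-at-a-time default on its own side. The following is a minimal sketch, not part of the Analytics API: SerialGate and its send argument are hypothetical names, and send stands in for whatever function actually performs the HTTP request.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class SerialGate:
    """Lets only one call be in flight at a time (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def call(self, send: Callable[[], T]) -> T:
        # Waits until the previous call has returned, then runs this one.
        with self._lock:
            return send()
```

Wrapping every request in gate.call(...) means a second call is sent only after the first has returned its response.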
The maximum name length for any of the name processing endpoints (address-similarity, name-similarity, name-deduplication, name-translation) is 500 characters.
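Checking these limits before sending a request avoids a round trip that is bound to fail. The sketch below rests on assumptions: it reads "600KB" conservatively as 600,000 bytes, and it counts characters with Python's len (Unicode code points), which may differ from how the service counts. The function names are hypothetical.

```python
MAX_PAYLOAD_BYTES = 600 * 1000   # assumption: "600KB" read as 600,000 bytes
MAX_CHARACTERS = 50_000
MAX_NAME_CHARACTERS = 500


def document_limit_problems(payload: bytes, text: str) -> list:
    """Return descriptions of any limits exceeded (empty list if none)."""
    problems = []
    if len(payload) > MAX_PAYLOAD_BYTES:
        problems.append(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    if len(text) > MAX_CHARACTERS:
        problems.append(f"text is {len(text)} characters, limit is {MAX_CHARACTERS}")
    return problems


def name_within_limit(name: str) -> bool:
    """True if a name fits the limit for the name processing endpoints."""
    return len(name) <= MAX_NAME_CHARACTERS
```

Here payload is the encoded request body as it would be sent, and text is the document text inside it.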
Using the Analytics API
Input Parameters
All input parameters, including the text or names being analyzed, are defined in the request body, along with any options. Babel Street Analytics analyzes two types of input data, and each endpoint analyzes only one of them:
- Names and addresses
- Documents of unstructured text
Options
The endpoints use options to pass settings or override default values for settings. Each option is specified in the request body as a name-value pair, where optionName is the name of the option and value is the selected value for the call.
For example, to return the confidence score for each entity returned by the /entities endpoint, the option calculateConfidence must be set to true.
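As an illustration, a request body carrying this option could be assembled as follows. This is a sketch under assumptions: the JSON field names "content" and "options" are not given on this page and should be checked against the endpoint's own documentation, and build_request_body is a hypothetical helper.

```python
import json
from typing import Optional


def build_request_body(content: str, options: Optional[dict] = None) -> str:
    """Serialize document text and any options into a JSON request body."""
    body = {"content": content}    # assumed field name for the document text
    if options:
        body["options"] = options  # assumed field name for the options object
    return json.dumps(body)


# Ask for a confidence score on each entity returned by /entities.
body = build_request_body(
    "Bill Murray will appear in the new Ghostbusters film.",
    {"calculateConfidence": True},
)
```

Leaving options out produces a body without that key, so the endpoint's default settings apply.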
Language Support and Selection
The documentation for each endpoint contains a Language Support section, which lists the endpoint’s supported languages and scripts. This list can be retrieved dynamically using the endpoint’s GET /[endpoint]/supported-languages method.
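A request for this list might be prepared as in the sketch below. The base URL is a placeholder, not a real service address, and authentication is omitted because this page does not describe it; supported_languages_request is a hypothetical helper.

```python
import urllib.request

BASE_URL = "https://analytics.example.com/rest/v1"  # placeholder, not a real address


def supported_languages_request(endpoint: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the GET request for an endpoint's language list."""
    url = f"{base_url.rstrip('/')}/{endpoint.strip('/')}/supported-languages"
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


request = supported_languages_request("entities")
# To send it, pass the request to urllib.request.urlopen and decode the response.
```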
If you know the language of your input, include the three-letter language code as a parameter in your call. This will improve response time and the accuracy of your results. If you do not know the language of your text, the endpoint will run an extra process to automatically detect it.
If no language is provided and the endpoint is unable to auto-detect it, the endpoint may return a “Language xxx is not supported” error, where xxx indicates that the language could not be determined.
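The two cases above, supplying a known language and recognizing a failed auto-detection, can be sketched as follows. The "content" and "language" field names are assumptions, as are both helper names; only the error text comes from this page.

```python
import json
from typing import Optional


def build_body(content: str, language: Optional[str] = None) -> str:
    """Build a JSON body, adding the language code only when it is known."""
    body = {"content": content}              # assumed field name
    if language is not None:
        if len(language) != 3 or not (language.isascii() and language.isalpha()):
            raise ValueError("expected a three-letter language code")
        body["language"] = language.lower()  # assumed field name
    return json.dumps(body)


def language_not_detected(error_message: str) -> bool:
    """True if the error reports that the language could not be determined."""
    return "Language xxx is not supported" in error_message
```

When language_not_detected returns True, a reasonable response is to retry with an explicit language code if the caller can determine one.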