Getting Started with the Analytics API
This guide walks you through your first requests to the Babel Street Analytics API. We’ll call both a Match endpoint and a Text Analytics endpoint.
The API is stateless: each request is independent of any previous request, so you can make requests in any order without maintaining state between them.
Authentication
This guide assumes you have an API key if you are using Babel Street Hosted Services, the SaaS version of the API.
On Hosted Services, all endpoints expect the API key to be passed as an HTTP header:
X-BabelStreetAPI-Key: [your_api_key]
Replace [your_api_key] with your personal API key.
If you are using an on-premise version of the API installed on your server, no API key is necessary.
The API key is used to authenticate your requests and ensure that only authorized users can access the API. If you do not have an API key, please contact your Babel Street account manager or support team to obtain one.
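As a minimal sketch of the authentication rule above, the helper below builds request headers in Python. The header name comes from this guide. The function name build_headers and the JSON content-type headers are illustrative assumptions, not part of the official API documentation.

```python
from typing import Dict, Optional

# Header name as given in this guide.
API_KEY_HEADER = "X-BabelStreetAPI-Key"


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for a JSON request (hypothetical helper).

    Pass api_key when calling Hosted Services. Leave it as None for an
    on-premise installation, where no API key is necessary.
    """
    headers = {
        # Assumption: requests and responses are exchanged as JSON.
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    if api_key:
        headers[API_KEY_HEADER] = api_key
    return headers


hosted_headers = build_headers("[your_api_key]")  # Hosted Services
on_prem_headers = build_headers()                 # on-premise, no key
```

Keeping the key optional lets the same helper serve both deployment types.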
Name Matching
The first request we’ll make is to a Match endpoint. The Match endpoints are used to match names, addresses, and records.
The only required fields when matching names are the names themselves. However, you can specify the language, entity type, and script. We strongly recommend adding the language if the language is known. If entity type is not specified, the default is PERSON. If the script is not specified, the default is Latin.
Compare two names
Let’s start with the /name-similarity endpoint. This endpoint compares two names, returning a similarity score between 0 and 1. A score of 1 indicates the names are identical, while a score of 0 indicates the names are completely different.
Let’s compare a name written in different languages. In this example, we compare the names “Michael Jackson” and “迈克尔·杰克逊”. As with all the Analytics endpoints, the /name-similarity endpoint is multilingual. The endpoint will automatically detect the language of the input names, but you should specify the language if you know it.
First, let’s run the command through this portal.
- Navigate to the Analytics API Reference
- Select the /name-similarity endpoint.
- Click on the Try it out button. This will bring you to the SDK documentation platform, where you can interactively modify parameters and execute calls to the API.
- In the X-BabelStreetAPI-Key field, enter your API key.
- The example is precoded for you. Select Send Request to execute the request.
Alternatively, you can send the same request with cURL from a command window.
The response is a similarity score between 0 and 1.
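For readers who prefer a script, the same comparison can be sketched in Python using only the standard library. Several details here are assumptions rather than confirmed API facts: BASE_URL is a placeholder to replace with the base URL of your own deployment, the nesting of text under name1 and name2 is inferred from the field names shown in the portal, and the helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with the base URL of your deployment.
BASE_URL = "https://analytics.example.com/rest/v1"


def build_name_similarity_request(name1, name2, api_key=None):
    """Build (but do not send) a POST request to /name-similarity."""
    # Assumption: each name is an object with a "text" field.
    payload = {"name1": {"text": name1}, "name2": {"text": name2}}
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if api_key:  # only needed on Hosted Services
        headers["X-BabelStreetAPI-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + "/name-similarity",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_name_similarity_request(
    "Michael Jackson", "迈克尔·杰克逊", api_key="[your_api_key]"
)
# result = send(request)  # uncomment once BASE_URL and the key are real
```

Building the request separately from sending it makes it easy to inspect the body and headers before making a live call.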
Compare company names
While the /name-similarity endpoint compares names, there are different kinds of names. Match selects the algorithms, stop words, overrides, and parameter values based on the entity type. If no entityType is specified, the default is PERSON.
When comparing company names, be sure to set the entityType to ORGANIZATION. This enables matching using Real World Ids.
Real World Ids
Organizations and companies often have nicknames that are very different from the company’s official name. For example, International Business Machines, or IBM, is known by the nickname Big Blue. Because there is no phonetic similarity between the two names, a match query between them would result in a low score. A Real World Id associates a company, along with its nicknames and name variations, with a single identifier. When enabled, a comparison between two company names also compares their Real World Ids, so dissimilar names for the same corporate entity can still match.
Let’s use the portal and change the example to compare two company names. We’ll be staying on the /name-similarity endpoint, since it is used to compare all kinds of names, including company names.
- On the left side of the screen, go to the Body Parameters section.
- In the name1 text field, enter Dunkin.
- In the name2 text field, enter Dunkin Donuts.
- Select Send Request to execute the request.
Remember, the default entity type is PERSON, so we are not using the Real World Ids. The response is a similarity score.
Now let’s add in the entityType parameter.
- In the name1 entityType field, enter ORGANIZATION.
- In the name2 entityType field, enter ORGANIZATION.
- Select Send Request to execute the request.
This time, because the type is ORGANIZATION, Real World Ids will be used for matching.
The response is again a similarity score.
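The two portal runs above differ only in the request body. As a rough sketch, the Python below builds both bodies side by side. The nesting of text and entityType under name1 and name2 is inferred from the portal field names, and the helper names are hypothetical.

```python
import json


def name_object(text, entity_type=None):
    """Build one name object for the request body.

    entityType is left out unless given. Per this guide, the API then
    treats the name as PERSON.
    """
    name = {"text": text}
    if entity_type is not None:
        name["entityType"] = entity_type
    return name


def similarity_payload(text1, text2, entity_type=None):
    """Build a /name-similarity body using one entity type for both names."""
    return {
        "name1": name_object(text1, entity_type),
        "name2": name_object(text2, entity_type),
    }


# First run: no entityType, so the default (PERSON) applies.
default_payload = similarity_payload("Dunkin", "Dunkin Donuts")

# Second run: ORGANIZATION, which enables Real World Id matching.
org_payload = similarity_payload("Dunkin", "Dunkin Donuts", "ORGANIZATION")

print(json.dumps(org_payload, indent=2))
```

Either dictionary can be serialized with json.dumps and sent as the body of a POST to /name-similarity.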
Text Analytics
The text analytics endpoints process unstructured text documents, identifying languages and topics and extracting critical business information.
The unstructured text can be provided in one of the following ways:
- content: plain text passed directly as a request parameter.
- contentUri: a valid HTTP, HTTPS, or FTP URL pointing to publicly accessible text content. The content will be downloaded and processed.
- contentFile: a file uploaded to the API. The file must be a plain text file with a .txt extension.
We’re going to provide the text content directly in the request.
Extract Entities
Let’s use the /entities endpoint to extract entities, such as names, locations, and dates, from a document. The endpoint returns the entities found in the document, along with their types and their positions in the text.
You don’t need to specify the language, though you can if you know it. If you don’t know it, the endpoint will automatically detect it.
The content is provided as a JSON object in the request body.
The response is a structure listing all entities, along with their positions in the text.
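A request to /entities might be prepared in Python roughly as follows. This is a sketch under stated assumptions: BASE_URL is a placeholder for your deployment, the language field name and the rule that exactly one of content or contentUri is supplied are assumptions, the sample sentence is made up for illustration, and the helper names are hypothetical.

```python
import json
import urllib.request

# Placeholder: replace with the base URL of your deployment.
BASE_URL = "https://analytics.example.com/rest/v1"


def build_entities_payload(content=None, content_uri=None, language=None):
    """Build the JSON body from inline text or a URI.

    Assumption: exactly one input source should be supplied per request.
    """
    if (content is None) == (content_uri is None):
        raise ValueError("provide exactly one of content or content_uri")
    if content is not None:
        payload = {"content": content}
    else:
        payload = {"contentUri": content_uri}
    if language is not None:
        # Optional: when omitted, the endpoint detects the language itself.
        payload["language"] = language
    return payload


def build_entities_request(payload, api_key=None):
    """Build (but do not send) a POST request to /entities."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if api_key:  # only needed on Hosted Services
        headers["X-BabelStreetAPI-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + "/entities",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Made-up sample text, passed directly as content.
sample_text = "Ada Lovelace met Charles Babbage in London in 1833."
entities_request = build_entities_request(
    build_entities_payload(content=sample_text), api_key="[your_api_key]"
)
# To send: urllib.request.urlopen(entities_request), then parse the JSON body.
```

The contentFile option is left out of this sketch because uploading a file requires a different request encoding than a plain JSON body.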