Getting Started with the Analytics API | Babel Street

This guide walks you through your first requests to the Babel Street Analytics API: one call to a Match endpoint and one call to a Text Analytics endpoint.

The API is stateless: each request is independent of any previous request, so you can make requests in any order without maintaining state between them.

Authentication

This guide assumes you have an API key if you are using Babel Street Hosted Services, the SaaS version of the API.

On Hosted Services, all endpoints expect the API key to be passed as an HTTP header:

-H "X-BabelStreetAPI-Key: [your_api_key]"

Replace [your_api_key] with your personal API key.

If you are using an on-premise version of the API installed on your server, no API key is necessary.

The API key is used to authenticate your requests and ensure that only authorized users can access the API. If you do not have an API key, please contact your Babel Street account manager or support team to obtain one.
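The header rule above can be sketched in Python. This is a minimal illustration, not an official client: the helper name build_headers is hypothetical, and only the header name X-BabelStreetAPI-Key comes from this guide.

```python
from typing import Dict, Optional

# Header name as documented in this guide.
API_KEY_HEADER = "X-BabelStreetAPI-Key"


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for a JSON request (hypothetical helper).

    Pass an API key for Hosted Services. Leave it as None for an
    on-premise installation, which needs no key.
    """
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
    if api_key:
        headers[API_KEY_HEADER] = api_key
    return headers


hosted_headers = build_headers("your_api_key")
on_premise_headers = build_headers()
```

Keeping the key out of the on-premise headers mirrors the two deployment options described above.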

Name Matching

The first request we’ll make is to a Match endpoint. Match endpoints are used to match names, addresses, and records.

The only required fields when matching names are the names themselves. However, you can specify the language, entity type, and script. We strongly recommend adding the language if the language is known. If entity type is not specified, the default is PERSON. If the script is not specified, the default is Latin.
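As a rough sketch, a name object with the optional fields could be assembled like this in Python. The entityType key is named later in this guide; the language and script keys, and the language code "eng", are assumptions made for illustration, so check the API Reference for the exact names and values. The helper make_name is hypothetical.

```python
from typing import Dict, Optional


def make_name(
    text: str,
    language: Optional[str] = None,
    entity_type: Optional[str] = None,
    script: Optional[str] = None,
) -> Dict[str, str]:
    """Build one name object, leaving out any optional field that is not set."""
    name = {"text": text}  # the only required field
    if language:
        name["language"] = language  # assumed key name
    if entity_type:
        name["entityType"] = entity_type
    if script:
        name["script"] = script  # assumed key name
    return name


# Only the text is required; here the first name also carries optional hints.
payload = {
    "name1": make_name("Michael Jackson", language="eng", entity_type="PERSON"),
    "name2": make_name("迈克尔·杰克逊"),
}
```

Omitting unset fields lets the service fall back to its own defaults and automatic detection.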

Compare two names

Let’s start with the /name-similarity endpoint. This endpoint compares two names, returning a similarity score between 0 and 1. A score of 1 indicates the names are identical, while a score of 0 indicates the names are completely different.

Let’s compare a name written in different languages. In this example, we compare the names “Michael Jackson” and “迈克尔·杰克逊”. As with all the Analytics endpoints, the /name-similarity endpoint is multilingual. The endpoint will automatically detect the language of the input names, but you should specify the language if you know it.

First, let’s run the command through this portal.

Navigate to the Analytics API Reference.
Select the /name-similarity endpoint.
Click on the Try it out button. This will bring you to the SDK documentation platform, where you can interactively modify parameters and execute calls to the API.
In the X-BabelStreetAPI-Key field, enter your API key.
The example is precoded for you. Select Send Request to execute the request.

Or you could copy and run the following cURL example from a command window:

$ curl -X POST https://analytics.babelstreet.com/rest/v1/name-similarity \
>      -H "X-BabelStreetAPI-Key: YOURAPIKEYHERE" \
>      -H "Content-Type: application/json" \
>      -d '{
>   "name1": {
>     "text": "Michael Jackson"
>   },
>   "name2": {
>     "text": "迈克尔·杰克逊"
>   }
> }'

The response will be a similarity score:

{
  "similarity": 0.98
}
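If you prefer a script to cURL, the same call can be sketched with Python’s standard library. The URL, header, and body fields are taken from the cURL example above; the function names are hypothetical, and this is an illustration rather than an official SDK.

```python
import json
import urllib.request

# Base URL taken from the cURL example in this guide.
BASE_URL = "https://analytics.babelstreet.com/rest/v1"


def build_name_similarity_request(
    name1: str, name2: str, api_key: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to /name-similarity."""
    body = json.dumps(
        {"name1": {"text": name1}, "name2": {"text": name2}}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/name-similarity",
        data=body,
        headers={
            "X-BabelStreetAPI-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_similarity(request: urllib.request.Request) -> float:
    """Send the request and return the similarity score from the response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["similarity"]


request = build_name_similarity_request(
    "Michael Jackson", "迈克尔·杰克逊", "YOURAPIKEYHERE"
)
```

Calling fetch_similarity(request) needs network access and a valid API key, so it is left out of the sketch.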

Compare company names

While the /name-similarity endpoint compares names, there are different kinds of names. Match selects the algorithms, stop words, overrides, and parameter values based on the entity type. If no entityType is specified, the default is PERSON.

When comparing company names, be sure to set the entityType to ORGANIZATION. This enables matching using Real World Ids.

Real World Ids

Organizations and companies often have nicknames which are very different from the company’s official name. For example, International Business Machines, or IBM, is known by the nickname Big Blue. As there is no phonetic similarity between the two names, a match query between those two organization names would result in a low score. A real world identifier associates companies, along with their associated nicknames and permutations, with an identifier. When enabled, a search between two company names will include a comparison between the real world identifiers for the two names, thus matching dissimilar names for the same corporate entity.

Let’s use the portal and change the example to compare two company names. We’ll be staying on the /name-similarity endpoint, since it is used to compare all kinds of names, including company names.

On the left side of the screen, go to the Body Parameters section.
In the name1 text field, enter Dunkin.
In the name2 text field, enter Dunkin Donuts.
Select Send Request to execute the request.

Remember, the default entity type is PERSON, so we are not using the Real World Ids. The response will be a similarity score:

{
  "similarity": 0.7273777
}

Now let’s add in the entityType parameter.

In the name1 entityType field, enter ORGANIZATION.
In the name2 entityType field, enter ORGANIZATION.
Select Send Request to execute the request. This time, because the type is ORGANIZATION, Real World Ids will be used for matching.

The response will be a similarity score:

{
  "similarity": 0.98
}
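Outside the portal, the only change to the request body is the entityType field on each name. A minimal Python sketch of that body follows; the helper name company_payload is hypothetical.

```python
import json
from typing import Dict


def company_payload(name1: str, name2: str) -> Dict[str, Dict[str, str]]:
    """Build a /name-similarity body that marks both names as organizations."""
    return {
        "name1": {"text": name1, "entityType": "ORGANIZATION"},
        "name2": {"text": name2, "entityType": "ORGANIZATION"},
    }


body = company_payload("Dunkin", "Dunkin Donuts")
# Serialize to the JSON string that would be sent as the request body.
body_json = json.dumps(body, indent=2)
```

The resulting JSON string can be passed to cURL with -d or sent from any HTTP client.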

Text Analytics

The text analytics endpoints process unstructured text documents, identifying languages, topics, and extracting critical business information.

The unstructured text can be provided in one of three ways:

content: plain text passed directly in as a request parameter
contentUri: a valid HTTP, HTTPS, or FTP URL pointing to publicly accessible text content. The content will be downloaded and processed.
contentFile: a file uploaded to the API. The file must be a plain text file with a .txt extension.
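For the two JSON-based options, a request body might be assembled as in the Python sketch below. It assumes that exactly one of content or contentUri is supplied per request, which this guide implies but does not state outright. File uploads through contentFile are not covered, and the helper name is hypothetical.

```python
from typing import Dict, Optional


def build_document_payload(
    content: Optional[str] = None, content_uri: Optional[str] = None
) -> Dict[str, str]:
    """Return a JSON-ready body holding either inline text or a URL."""
    # Assumption: the two sources are mutually exclusive.
    if (content is None) == (content_uri is None):
        raise ValueError("provide exactly one of content or content_uri")
    if content is not None:
        return {"content": content}
    return {"contentUri": content_uri}


inline_body = build_document_payload(content="Some plain text to analyze.")
remote_body = build_document_payload(content_uri="https://example.com/article.txt")
```

Rejecting ambiguous input before sending keeps mistakes out of the request.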

We’re going to provide the text content directly in the request.

Extract Entities

Let’s use the /entities endpoint to extract entities, such as names, locations, and dates, from a document. The endpoint returns the entities found in the document, along with their types and positions in the text.

You don’t need to specify the language, though you can if you know it. If you don’t know it, the endpoint will automatically detect it.

The content is provided as a JSON object in the request body.

$ curl -s -X POST \
>     -H "X-BabelStreetAPI-Key: your_api_key" \
>     -H "Content-Type: application/json" \
>     -H "Accept: application/json" \
>     -H "Cache-Control: no-cache" \
>     -d '{"content": "Taylor Swift will close the European leg of her record-breaking Eras tour in London on Tuesday, drawing fans from near and far for the last opportunity to see the critically acclaimed show in Europe. The U.S. singer-songwriter returned to London'\''s Wembley Stadium last week for five performances following the cancellation of her shows in Vienna, when a planned attack was foiled by authorities. Some of the 195,000 disappointed fans in Vienna rushed to buy tickets for the London dates on resale sites, where they were changing hands for up to 10 times face value. Eras, the first tour to surpass $1 billion in revenue, showcases all 11 of Swift'\''s studio albums in dedicated sections."}' \
>     "https://analytics.babelstreet.com/rest/v1/entities"
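Text with apostrophes and quotation marks is awkward to quote in a shell, so building the body in code can be easier. The Python sketch below uses only the standard library. The URL and headers come from the cURL example; the function name is hypothetical, and the short sample sentence is only there to show the quoting.

```python
import json
import urllib.request

# Endpoint URL taken from the cURL example in this guide.
ENTITIES_URL = "https://analytics.babelstreet.com/rest/v1/entities"


def build_entities_request(text: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to /entities."""
    # json.dumps escapes quotes and other special characters in the text.
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        ENTITIES_URL,
        data=body,
        headers={
            "X-BabelStreetAPI-Key": api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


sample_text = 'Swift\'s "Eras" tour returned to London\'s Wembley Stadium.'
request = build_entities_request(sample_text, "your_api_key")
```

Passing the prepared request to urllib.request.urlopen sends it, which needs network access and a valid API key.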

This will return a structure listing all entities, along with their positions in the text.

{
  "entities": [
    {
      "type": "LOCATION",
      "mention": "London",
      "normalized": "London",
      "count": 3,
      "mentionOffsets": [
        {
          "startOffset": 77,
          "endOffset": 83
        },
        {
          "startOffset": 239,
          "endOffset": 245
        },
        {
          "startOffset": 474,
          "endOffset": 480
        }
      ],
      "entityId": "Q84",
      "confidence": 0.73554683,
      "linkingConfidence": 0.73475044
    },
    {
      "type": "PERSON",
      "mention": "Taylor Swift",
      "normalized": "Taylor Swift",
      "count": 2,
      "mentionOffsets": [
        {
          "startOffset": 0,
          "endOffset": 12
        },
        {
          "startOffset": 641,
          "endOffset": 646
        }
      ],
      "entityId": "Q26876",
      "confidence": 0.18091202,
      "linkingConfidence": 0.44089604
    },
    {
      "type": "LOCATION",
      "mention": "Vienna",
      "normalized": "Vienna",
      "count": 2,
      "mentionOffsets": [
        {
          "startOffset": 339,
          "endOffset": 345
        },
        {
          "startOffset": 437,
          "endOffset": 443
        }
      ],
      "entityId": "Q1741",
      "linkingConfidence": 0.70196899
    },
    {
      "type": "NATIONALITY",
      "mention": "European",
      "normalized": "European",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 28,
          "endOffset": 36
        }
      ],
      "entityId": "T1"
    },
    {
      "type": "TEMPORAL:DATE",
      "mention": "Tuesday",
      "normalized": "Tuesday",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 87,
          "endOffset": 94
        }
      ],
      "entityId": "T3"
    },
    {
      "type": "LOCATION",
      "mention": "Europe",
      "normalized": "Europe",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 192,
          "endOffset": 198
        }
      ],
      "entityId": "T4",
      "confidence": 0.90722972
    },
    {
      "type": "LOCATION",
      "mention": "U.S.",
      "normalized": "U.S.",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 204,
          "endOffset": 208
        }
      ],
      "entityId": "T5"
    },
    {
      "type": "LOCATION",
      "mention": "Wembley Stadium",
      "normalized": "Wembley Stadium",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 248,
          "endOffset": 263
        }
      ],
      "entityId": "Q128468",
      "confidence": 0.55430579,
      "linkingConfidence": 0.52152627
    },
    {
      "type": "IDENTIFIER:MONEY",
      "mention": "$1 billion",
      "normalized": "$1 billion",
      "count": 1,
      "mentionOffsets": [
        {
          "startOffset": 598,
          "endOffset": 608
        }
      ],
      "entityId": "T11"
    }
  ]
}
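To work with a response like this one, you might group the mentions by type and use the offsets to recover each mention from the original text. The Python sketch below uses a trimmed copy of the response above and the opening of the sample document. It treats endOffset as exclusive, which matches the values shown (for example, London at 77 to 83); the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Opening words of the sample document sent to /entities.
TEXT = (
    "Taylor Swift will close the European leg of her "
    "record-breaking Eras tour in London on Tuesday"
)

# Trimmed copy of the response above, limited to mentions inside TEXT.
response = {
    "entities": [
        {"type": "PERSON", "mention": "Taylor Swift",
         "mentionOffsets": [{"startOffset": 0, "endOffset": 12}]},
        {"type": "LOCATION", "mention": "London",
         "mentionOffsets": [{"startOffset": 77, "endOffset": 83}]},
        {"type": "TEMPORAL:DATE", "mention": "Tuesday",
         "mentionOffsets": [{"startOffset": 87, "endOffset": 94}]},
    ]
}


def mentions_by_type(resp: dict) -> Dict[str, List[str]]:
    """Group entity mentions under their entity type."""
    grouped = defaultdict(list)
    for entity in resp["entities"]:
        grouped[entity["type"]].append(entity["mention"])
    return dict(grouped)


def spans(text: str, entity: dict) -> List[str]:
    """Slice the source text at each reported offset pair."""
    return [
        text[o["startOffset"]:o["endOffset"]]
        for o in entity["mentionOffsets"]
    ]


grouped = mentions_by_type(response)
london_spans = spans(TEXT, response["entities"][1])
```

Slicing by offset works directly here because the sample text is plain ASCII; text with characters outside the Basic Multilingual Plane may need extra care, depending on how the service counts offsets.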