Web APIs


Amtera also provides solutions through Web APIs, letting other developers take advantage of part of our technology.

ESA Semantic Relatedness

This API allows one to measure the similarity in meaning between two text excerpts. Click on the link below to start consuming this API and add semantics to your application. The implementation is based on Explicit Semantic Analysis (ESA), a method for calculating semantic similarity between linguistic items based on their distributional properties over Wikipedia. Distributional semantic models such as ESA are based on the Distributional Hypothesis, which states that words co-occurring in similar contexts tend to have similar meanings.

What is behind it?

Semantic flexibility and approximation are a fundamental part of the intelligent behaviour that semantic applications are expected to deliver. However, most people working with semantic technologies associate semantic flexibility with manually built models such as ontologies or dictionaries. In fact, ontologies and dictionaries have been the primary way for practitioners to deliver semantic flexibility in applications. Beyond their high development costs, these models tend to produce semantic approximation solutions that are incomplete in knowledge coverage, expensive to deploy, and dependent on a complex development skill set.

More recently, distributional semantic models have emerged as alternative solutions for building simplified but comprehensive semantic models based on word co-occurrence patterns derived from large text collections. Due to their comprehensive nature, these models can work as a large common-sense knowledge base which applications can access to provide semantic capabilities such as flexibility and automatic disambiguation.

The Amtera Semantic Relatedness API provides an easy-to-use service which gives applications access to a large-scale distributional knowledge base. The core operation behind the API is the computation of a semantic relatedness measure, which allows the systematic determination of the semantic proximity between terms. This core operation can be used to support applications in building semantic approximation and disambiguation functionalities. Developers can start experimenting with the API after a setup that takes less than two minutes.

The service is highly optimized to support the high throughput needed by industry applications. Currently the service is available in English and Portuguese. Additional languages (German, Spanish, French and Italian) will become available shortly.
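As a quick orientation, the sketch below shows the request/response shape a client would work with. The base URL and parameter names here are illustrative assumptions, not the documented endpoint; the JSON body mirrors the example responses later on this page.

```python
import json
from urllib.parse import urlencode

# HYPOTHETICAL endpoint, shown only to illustrate the call shape;
# consult the Amtera documentation for the real URL and credentials.
BASE_URL = "https://api.example.com/relatedness"

def build_query(t1, t2):
    """Build a pairwise-relatedness request URL for terms t1 and t2."""
    return BASE_URL + "?" + urlencode({"t1": t1, "t2": t2})

def parse_response(body):
    """Decode the service's JSON body into (t1, t2, relatedness score)."""
    data = json.loads(body)
    return data["t1"], data["t2"], data["v"]

url = build_query("performer, actor", "Angelina Jolie")
# Sample body copied from the responses shown further down this page.
t1, t2, score = parse_response(
    '{"t1": "performer, actor", "t2": "Angelina Jolie", "v": 0.0011905249}'
)
```

The response is a flat JSON object: the two input terms echoed back and a single relatedness value `v`.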


Use Case: Word Sense Disambiguation for Tagging

The first example uses the semantic relatedness measure to disambiguate words.
Suppose we want to help a tagging application disambiguate tags extracted from text into their correct senses.
Given a word like “star”, which can have multiple meanings, the text must be tagged with the
correct associated meaning.

To disambiguate the word into its proper sense, we can use the information present in the surrounding text.
In this example we use the article title as the context string; the idea is to show that the API can disambiguate
with a minimal amount of information. Using the Semantic Relatedness API, we compute the semantic relatedness between
each possible sense of star and the context string:

{"t1": "performer, actor", "t2": "Angelina Jolie", "v": 0.0011905249}
{"t1": "celestial body", "t2": "Angelina Jolie", "v": 0.0000578361}

In the example, the semantic relatedness measure determines that the correct sense for star is “performer, actor”.
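The selection step is a simple argmax over the per-sense scores. A minimal sketch, reusing the values from the responses above (in practice each score would come from an API call):

```python
# Relatedness of each candidate sense of "star" to the context string
# "Angelina Jolie", copied from the example responses above.
sense_scores = {
    "performer, actor": 0.0011905249,
    "celestial body": 0.0000578361,
}

def disambiguate(scores):
    """Return the candidate sense with the highest relatedness score."""
    return max(scores, key=scores.get)

best_sense = disambiguate(sense_scores)
```

Here `best_sense` resolves to the performer/actor sense, matching the result described in the text.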

As a complementary exercise, we use the Semantic Relatedness API to rank the set of candidate tags, both to provide a better description of the text and to eliminate unrelated tags. Suppose for this text we have the following candidate tags: fame, video game, heroine, actresses, star, career, commercial, critical, revenues, dramas. Using the same context string, we send the pairwise requests to the API and get the following values:

{"t1": "actresses", "t2": "Angelina Jolie", "v": 0.0153965631}
{"t1": "dramas", "t2": "Angelina Jolie", "v": 0.0028367762}
{"t1": "heroine", "t2": "Angelina Jolie", "v": 0.0018140874}
{"t1": "fame", "t2": "Angelina Jolie", "v": 0.0005110626}
{"t1": "career", "t2": "Angelina Jolie", "v": 0.0005090217}

Threshold = 0.0005
{"t1": "critical", "t2": "Angelina Jolie", "v": 0.0004876082}
{"t1": "revenues", "t2": "Angelina Jolie", "v": 0.0004872711}
{"t1": "star", "t2": "Angelina Jolie", "v": 0.0004780830}
{"t1": "video game", "t2": "Angelina Jolie", "v": 0.0003381776}
{"t1": "commercial", "t2": "Angelina Jolie", "v": 0.0000000000}

Note that semantic relatedness values define a relative, comparative measure. We can also define an absolute threshold (here 0.0005) on the semantic relatedness value to eliminate unrelated terms. Threshold values are specific to the distributional model and the associated corpus.
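The filtering step described above can be sketched as follows, using the scores from the example responses; the threshold value is the one shown in the listing and would need to be tuned per model and corpus:

```python
# Pairwise relatedness to the context string "Angelina Jolie",
# copied from the example responses above.
tag_scores = {
    "actresses": 0.0153965631,
    "dramas": 0.0028367762,
    "heroine": 0.0018140874,
    "fame": 0.0005110626,
    "career": 0.0005090217,
    "critical": 0.0004876082,
    "revenues": 0.0004872711,
    "star": 0.0004780830,
    "video game": 0.0003381776,
    "commercial": 0.0,
}

THRESHOLD = 0.0005  # model- and corpus-specific; tune empirically

def filter_tags(scores, threshold=THRESHOLD):
    """Keep tags at or above the threshold, ranked most to least related."""
    kept = [tag for tag, v in scores.items() if v >= threshold]
    return sorted(kept, key=scores.get, reverse=True)

related_tags = filter_tags(tag_scores)
```

Only the five tags above the cutoff survive, in descending order of relatedness.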


Use Case: News Classification

In this example, suppose we want to categorize an RSS feed using its headline titles. One of our critical categories is
“regulation”: we want to retrieve all news that can be related to regulation. Taking some real news items from the
Reuters RSS feed, we send the headline titles to the API, where the context word is now regulation:

{"t1": "U.S. allows Indiana to offer health program outside of Obamacare", "t2": "regulation", "v": 0.0141632554}
{"t1": "London's ICAP next to plan U.S. swaps trading platform", "t2": "regulation", "v": 0.0109662349}
{"t1": "Macquarie U.S. unit fined for client money error", "t2": "regulation", "v": 0.0065025463}

Threshold = 0.005

{"t1": "Tom Hardy unveils new role at Venice", "t2": "regulation", "v": 0.0027370773}
{"t1": "The Harry Potter star causes fan fever in Venice", "t2": "regulation", "v": 0.0019474802}
{"t1": "Science fiction writer Frederik Pohl dies at 93", "t2": "regulation", "v": 0.0010345204}
{"t1": "Timberlake super-singer says no to being a superhero", "t2": "regulation", "v": 0.0010331645}

For these longer input strings, we set the semantic relatedness threshold to 0.005.
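The classification step is the same thresholding pattern as in the tagging use case, only with whole headlines as input. A minimal sketch using the scores from the responses above:

```python
# Relatedness of each Reuters headline to the context word "regulation",
# copied from the example responses above.
headline_scores = {
    "U.S. allows Indiana to offer health program outside of Obamacare": 0.0141632554,
    "London's ICAP next to plan U.S. swaps trading platform": 0.0109662349,
    "Macquarie U.S. unit fined for client money error": 0.0065025463,
    "Tom Hardy unveils new role at Venice": 0.0027370773,
    "The Harry Potter star causes fan fever in Venice": 0.0019474802,
    "Science fiction writer Frederik Pohl dies at 93": 0.0010345204,
    "Timberlake super-singer says no to being a superhero": 0.0010331645,
}

THRESHOLD = 0.005  # chosen for longer input strings; corpus-specific

def classify_as_regulation(scores, threshold=THRESHOLD):
    """Keep headlines whose relatedness to "regulation" meets the threshold."""
    return [headline for headline, v in scores.items() if v >= threshold]

regulation_news = classify_as_regulation(headline_scores)
```

The three finance/policy headlines clear the threshold; the entertainment items fall below it.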

© 2014 Amtera Semantic Technologies.
All rights reserved.