Semiocast API Documentation
Micromessage analysis

About

The Semiocast API provides methods for analysing micromessages such as Twitter status updates, Facebook posts or short raw text (less than 256 Unicode characters), as well as user profiles from Twitter and Facebook. It accepts different types of input: a single micromessage, a list of micromessages (stream, timeline, feed), a single user profile, or a list of user profiles. These methods return information about the message, such as its language or the sentiment expressed. Geographic location (country and city) can be deduced from a short text or from the user's profile.

Other types of short message processing, such as tokenization and main topic extraction, will be made available through the Semiocast API soon.

Analysis, description and status:

  • Message language
    Description: Main text of micromessage. The result is the script name as an ISO 15924 code (which set of graphic characters is mainly used) and the language name as an ISO 639-1 code (which language the words are mainly written in).
    Status: Available
  • Geographic location
    Description: Message location (if present) or user location (if the message is not located or its location cannot be deduced). The result is the country code in ISO 3166-1 alpha-2 and the city name.
    Status: Available
  • Mood
    Description: Main text of micromessage. The result is Positive, Neutral or Negative.
    Status: Available upon request
  • Tokens
    Description: Main text of micromessage. Decomposes the message into elementary parts (word, URL, hash tag, username...).
    Status: Coming soon
  • Topics
    Description: Main text of micromessage. The result is the main topics.
    Status: Coming soon

Rate limited

1 call per micromessage analyzed (read API levels for more information).

URL

https://api.semiocast.com/1/analyze/network.format

Formats

Input/Output in JSON or XML.

Network models

Network, supported API and expected data:

  • Facebook
    Facebook REST API: Single post, stream, user profile, list of user profiles.
    Facebook Graph API: Single post, feed, user profile, list of user profiles.
  • Status.net
    Twitter-compatible API: Single status or a set of statuses (user_timeline, friends_timeline, favorites, mentions, direct_messages).
  • Twitter
    REST API: Single status or a set of statuses (home_timeline, user_timeline, friends_timeline, favorites, mentions, direct_messages).
    Search API: Single status or a set of statuses.
    Stream API: Single status or a set of statuses (sample, filter, firehose, links, retweets).
  • Raw
    Supported API: (none)
    Expected data: UTF-8 encoded string (less than 256 unicode characters).
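
The length limit of the Raw model can be checked before a call is spent on it. The following is a minimal sketch, assuming the limit is counted in Unicode code points (the documentation only says "unicode characters"). `prepare_raw_text` is a hypothetical helper, not part of the Semiocast API.

```python
MAX_RAW_CHARS = 256  # documented limit: fewer than 256 Unicode characters


def prepare_raw_text(text: str) -> bytes:
    """Return the UTF-8 bytes of a raw micromessage, or raise ValueError.

    Assumption: the limit is counted in Unicode code points, which is what
    len() measures on a Python str.
    """
    if len(text) >= MAX_RAW_CHARS:
        raise ValueError(
            f"raw text has {len(text)} characters; "
            f"it must have fewer than {MAX_RAW_CHARS}"
        )
    return text.encode("utf-8")


encoded = prepare_raw_text("Bonjour Paris")
```

Counting characters rather than bytes matters for non-ASCII text: a 255-character message made of accented letters is within the limit even though its UTF-8 encoding is longer than 255 bytes.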

Notes

Any field available in a message may be used during analysis. Altering or removing data provided by an external API may decrease analysis reliability or, worse, prevent the current analysis or future analyses. It is therefore better to provide messages received from other networks as-is.

Input

Parameter, mandatory, type and description:

  • data
    Mandatory: Yes
    Type: json/xml
    Description: Micromessage, timeline or raw text. See models above.
  • ident
    Mandatory: No
    Type: Comma-separated list of all, language, location, mood:model (default: all)
    Description: Analyses to execute. Currently, all is equivalent to language,location. Other analyses may be added later.
  • output
    Mandatory: No
    Type: short|enriched (default: short)
    Description: By default, only the result of the analysis is returned. output=enriched returns the original message extended to contain the analysis result.
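
The parameters above can be assembled into a request as in the following sketch. Several details are assumptions, because this page does not state them: that the call is an HTTP POST with form-encoded parameters, that `network` takes values such as "twitter", and that authentication is handled elsewhere. `build_analyze_request` is a hypothetical helper, and the request is built but never sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.semiocast.com/1/analyze"  # from the URL section


def build_analyze_request(network, fmt, data, ident="all", output="short"):
    """Build (but do not send) a request for /1/analyze/<network>.<format>.

    Assumptions, not confirmed by this documentation:
      - the call is an HTTP POST with form-encoded parameters;
      - `network` takes values such as "twitter";
      - authentication is handled elsewhere.
    """
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    if output not in ("short", "enriched"):
        raise ValueError("output must be 'short' or 'enriched'")
    # A string is passed through untouched, so a message received from
    # another network can be forwarded as-is.
    payload = data if isinstance(data, str) else json.dumps(data)
    body = urllib.parse.urlencode(
        {"data": payload, "ident": ident, "output": output}
    ).encode("ascii")
    url = f"{BASE_URL}/{network}.{fmt}"
    return urllib.request.Request(url, data=body, method="POST")


# A made-up status, for illustration only.
request = build_analyze_request(
    "twitter", "json", {"id": 1, "text": "Bonjour Paris"},
    ident="language,location",
)
print(request.full_url)
```

Passing the original response body as a string, rather than a re-serialized object, is the safer choice when the message comes from an external API.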

Status codes

Read Errors for general information about error messages and interpretation of returned HTTP status codes.

Output

On success, the result is returned in one of two formats:

  • A short format returns only the analysis result. It is the default and preferred format because it minimizes network bandwidth. If you send a single message, the Semiocast API returns the analysis result:
    { "language": {"script_code":"latn", "language_code":"fr"}, "location": {"country_code":"FR", "city_name":"Paris"}, "mood": {"sentiment":"positive"} }
    <result> <language> <script_code>latn</script_code> <language_code>fr</language_code> </language> <location> <country_code>FR</country_code> <city_name>Paris</city_name> </location> <mood> <sentiment>positive</sentiment> </mood> </result>

    If you send a timeline or a list of messages, the API returns the id and analysis result of each message:

    [ { "id": "123456789-1234", "language": {"script_code":"latn", "language_code":"fr"}, "location": {"country_code":"FR", "city_name":"Paris"}, "mood": {"sentiment":"positive"} }, ... ]
    <result type="array"> <message> <id>123456789-1234</id> <language> <script_code>latn</script_code> <language_code>fr</language_code> </language> <location> <country_code>FR</country_code> <city_name>Paris</city_name> </location> <mood> <sentiment>positive</sentiment> </mood> </message> ... </result>
  • An enriched format, where the original input is modified to include the analysis result inside the corresponding message. If the micromessage has already been analyzed by the Semiocast API, previous results are replaced with the new ones. Other parts of the original input are neither modified nor removed.
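
Because the short format returns either one object or an array, client code usually has to handle both shapes. The following sketch indexes short-format JSON results by message id. `results_by_id` is a hypothetical helper. A single-message result has no id in the example above, so it is stored under the key `None`, which is a choice made for this sketch.

```python
import json


def results_by_id(response_text):
    """Index short-format JSON results by message id.

    Accepts both documented shapes: a single object or an array of objects.
    """
    parsed = json.loads(response_text)
    items = [parsed] if isinstance(parsed, dict) else parsed
    return {
        item.get("id"): {key: value for key, value in item.items() if key != "id"}
        for item in items
    }


# The array example from above, without the elided entries.
sample = (
    '[{"id": "123456789-1234",'
    ' "language": {"script_code": "latn", "language_code": "fr"},'
    ' "location": {"country_code": "FR", "city_name": "Paris"},'
    ' "mood": {"sentiment": "positive"}}]'
)
indexed = results_by_id(sample)
print(indexed["123456789-1234"]["language"]["language_code"])
```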

When the Semiocast API is unable to identify the script or language a message is written in, or the country or city where the user is located, the unknown field is set to null (in JSON) or left empty (in XML). For instance, if only the country is known, with no information about script, language or city name:

{ "id": "123456789-1234", "language": {"script_code":null, "language_code":null}, "location": {"country_code":"FR", "city_name":null} }
<result> <id>123456789-1234</id> <language> <script_code></script_code> <language_code></language_code> </language> <location> <country_code>FR</country_code> <city_name></city_name> </location> </result>
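
In XML, an unknown value is an empty element rather than an explicit null, so a client has to map it itself. The following sketch uses Python's standard `xml.etree.ElementTree` to convert a single-message `<result>` into a dictionary in which empty elements become `None`. `parse_xml_result` is a hypothetical helper, and it does not handle the `type="array"` form, where `<message>` elements repeat.

```python
import xml.etree.ElementTree as ET


def parse_xml_result(xml_text):
    """Convert a single-message short-format XML result into a dict.

    Empty elements become None, mirroring null in the JSON output.
    """
    def convert(element):
        children = list(element)
        if children:
            return {child.tag: convert(child) for child in children}
        text = (element.text or "").strip()
        return text or None

    return convert(ET.fromstring(xml_text))


# The XML example from above.
sample = (
    "<result> <id>123456789-1234</id>"
    " <language> <script_code></script_code>"
    " <language_code></language_code> </language>"
    " <location> <country_code>FR</country_code>"
    " <city_name></city_name> </location> </result>"
)
parsed = parse_xml_result(sample)
print(parsed["location"])
```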

When the Semiocast API does not find any information that would allow it to identify the message language (no text) or the user location (no location field in the user profile), the whole analysis result is set to null. For instance, if there is a text but no user location:

{ "id": "123456789-1234", "language": {"script_code":null, "language_code":null}, "location": null }
<result> <id>123456789-1234</id> <language> <script_code></script_code> <language_code></language_code> </language> <location /> </result>
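
The two cases above (individual fields set to null, or the whole analysis set to null) can be told apart in client code as in the following sketch. `describe_location` is a hypothetical helper that works on a result already parsed from JSON. It treats a missing "location" key the same as a null one, which is a simplification of this sketch.

```python
import json


def describe_location(result):
    """Summarise the location part of a short-format result (parsed JSON)."""
    location = result.get("location")
    if location is None:
        # The whole analysis is null: there was no location data to analyse.
        return "no location data"
    parts = [location.get("city_name"), location.get("country_code")]
    known = [part for part in parts if part is not None]
    # Fields are null individually when data existed but was not identified.
    return ", ".join(known) if known else "location not identified"


# The two JSON examples from above.
country_only = json.loads(
    '{"id": "123456789-1234",'
    ' "language": {"script_code": null, "language_code": null},'
    ' "location": {"country_code": "FR", "city_name": null}}'
)
no_location = json.loads(
    '{"id": "123456789-1234",'
    ' "language": {"script_code": null, "language_code": null},'
    ' "location": null}'
)
print(describe_location(country_only))
print(describe_location(no_location))
```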

Examples

  • Twitter examples
  • Facebook examples
  • Raw text examples