Semiocast API Documentation
Micromessage filtering

About

Filter micromessages, such as a Twitter home timeline or a Facebook feed, by message language and user location.

URL

https://api.semiocast.com/1/filter/network.format
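
The last path component is a template: network stands for the source network model and format for the exchange format. A minimal Python sketch of filling it in follows. The helper name build_filter_url and the lowercase identifier "twitter" are assumptions made for illustration; the exact network identifiers accepted by the service are not listed on this page.

```python
BASE_URL = "https://api.semiocast.com/1/filter"

# Exchange formats named in the Formats section of this page.
SUPPORTED_FORMATS = ("json", "xml")


def build_filter_url(network: str, fmt: str) -> str:
    """Fill in the network.format template of the filter endpoint.

    Hypothetical helper: `network` is passed through unchanged because
    the accepted identifiers are not documented here.
    """
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/{network}.{fmt}"


# "twitter" is an assumed identifier, used only for illustration.
print(build_filter_url("twitter", "json"))
```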

Network models

Network | Supported API | Expected data
Facebook | Facebook REST API | Single post, stream, user profile, list of user profiles.
Facebook | Facebook Graph API | Single post, feed, user profile, friends list.
Status.net | Twitter-compatible API | Single status update and timelines (user_timeline, friends_timeline, favorites, mentions, direct_messages).
Twitter | REST API | Single status update and timelines (home_timeline, user_timeline, friends_timeline, favorites, mentions, direct_messages).
Twitter | Search API | Output from search request.
Twitter | Stream API | Statuses extracted from streams (sample, filter, firehose, links, retweets).

Formats

Input and output in JSON or XML.

Rate limit

1 call per micromessage analyzed (read API levels for more information).

Notes

Any field available in micromessages may be used during semantic analysis. Altering or removing data provided by the external API may reduce the reliability of the analysis, or even prevent the current or future analyses. It is therefore best to provide the statuses received from the other networks as-is. Extensions of existing models are acceptable as long as they do not collide with anything defined by the official API models.

Input

Parameter | Mandatory | Type | Description
data | Yes | json/xml | Set of micromessages. See the network models above.
languages | No | string | Comma-separated list of script codes and/or language codes. Script codes are based on ISO 15924 and language codes on ISO 639-1. The value none may be used to catch messages without a specific language (such as a smiley or a bare URL). Example: languages=latn-fr,arab,ru selects all messages written in French, in Arabic characters, or in Russian.
countries | No | string | Comma-separated list of country codes, based on ISO 3166-1 alpha-2. The value none may be used to catch messages for which no country has been identified. Example: countries=fr,jp selects all messages coming from France or Japan.
output | No | short, filtered or enriched (default: short) | By default (output=short), only a list of the filtered message IDs with their analysis results is returned. output=filtered returns the original input without the messages that do not meet the conditions. output=enriched returns the filtered messages together with their analysis results.
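
As an illustration of how these parameters fit together, the Python sketch below assembles them into a URL-encoded string. It only builds the parameters: the HTTP method, authentication, and transport details are not described in this section, so nothing is sent. The function name encode_filter_params is hypothetical, and JSON is assumed for the data parameter.

```python
import json
from urllib.parse import urlencode

OUTPUT_MODES = ("short", "filtered", "enriched")


def encode_filter_params(messages, languages=None, countries=None, output="short"):
    """Assemble the documented parameters into a URL-encoded string.

    Hypothetical helper; it does not perform any network request.
    """
    if output not in OUTPUT_MODES:
        raise ValueError(f"unknown output mode: {output!r}")
    # Messages are serialized untouched, as recommended in the Notes.
    params = {
        "data": json.dumps(messages, ensure_ascii=False),
        "output": output,
    }
    if languages:
        params["languages"] = ",".join(languages)
    if countries:
        params["countries"] = ",".join(countries)
    return urlencode(params)


messages = [{"id": "123456789-1234", "text": "Texte de démonstration"}]
body = encode_filter_params(messages, languages=["latn-fr", "arab", "ru"])
print(body)
```

Optional parameters are left out entirely when they are not set, so the service-side defaults described in the table apply.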

When languages and countries are both specified, they are combined with OR: a message is selected as soon as it matches one of the listed languages or one of the listed countries.
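
The selection rule can be sketched locally in Python. This is an illustration of the semantics described here, not the service's implementation. It relies on the fact that ISO 15924 script codes have four letters and ISO 639-1 language codes have two. The function names, the country_code key, and the use of None to represent "no language" or "no country" are assumptions.

```python
from typing import Optional, Tuple


def parse_language_spec(spec: str) -> Tuple[Optional[str], Optional[str]]:
    """Split one languages entry such as "latn-fr", "arab" or "ru".

    Four-letter parts are read as script codes, two-letter parts as
    language codes; either half may be absent.
    """
    script = language = None
    for part in spec.strip().lower().split("-"):
        if len(part) == 4 and script is None:
            script = part
        elif len(part) == 2 and language is None:
            language = part
        else:
            raise ValueError(f"unrecognized language spec: {spec!r}")
    return script, language


def language_matches(spec: str, script_code, language_code) -> bool:
    if spec.strip().lower() == "none":
        # Assumption: "no specific language" is modelled as None.
        return language_code is None
    script, language = parse_language_spec(spec)
    return (script is None or script == script_code) and (
        language is None or language == language_code
    )


def country_matches(spec: str, country_code) -> bool:
    spec = spec.strip().lower()
    if spec == "none":
        # Assumption: "no country identified" is modelled as None.
        return country_code is None
    return spec == country_code


def is_selected(analysis: dict, languages=None, countries=None) -> bool:
    """OR-combine the language and country conditions for one message."""
    if not languages and not countries:
        return True  # assumption: without conditions, nothing is filtered out
    by_language = any(
        language_matches(s, analysis.get("script_code"), analysis.get("language_code"))
        for s in languages or ()
    )
    by_country = any(
        country_matches(s, analysis.get("country_code")) for s in countries or ()
    )
    return by_language or by_country
```

For example, a message analyzed as English in Latin script and located in Japan is rejected by languages=latn-fr,arab,ru alone, but selected once countries=fr,jp is added, because the country condition is enough on its own.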

Status codes

Read Errors for general information about error messages and interpretation of returned HTTP status codes.

Output

To filter the micromessages written in Latin characters from the following data:

[ {"id": "123456789-1234", "text": "Texte de démonstration", ...}, {"id": "123456789-4321", "text": "Show case", ...}, {"id": "123456789-5555", "text": "テキストのデモ ", ...} ]

Three outputs are available:

  • a short output containing only the message IDs and analysis results. It is the default and the preferred way to return the analysis, since it minimizes network bandwidth. Example in JSON format:
    [ {"id": "123456789-1234", "languages": {"script_code": "latn", "language_code": "fr"}}, {"id": "123456789-4321", "languages": {"script_code": "latn", "language_code": "en"}} ]
  • a filtered output: the original input, restricted to the micromessages that meet the specified conditions. Messages are not modified.
    [ {"id": "123456789-1234", "text": "Texte de démonstration", ...}, {"id": "123456789-4321", "text": "Show case", ...} ]
  • an enriched output: the original input, restricted to the micromessages that meet the specified conditions and enriched with the analysis result for each micromessage:
    [ {"id": "123456789-1234", "text": "Texte de démonstration", ..., "annotations":[{"language":{"script_code": "latn", "language_code": "fr"}}]}, {"id": "123456789-4321", "text": "Show case", ..., "annotations":[{"language":{"script_code": "latn", "language_code": "en"}}]} ]
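
Because the short output carries only IDs and analysis results, a client that still holds the original messages can rebuild the filtered view itself. The Python sketch below shows one way to do that. It assumes a well-formed JSON response in which each entry maps "languages" to an object with script_code and language_code; the function names are hypothetical and the sample data mirrors the example above with the elided fields left out.

```python
import json

# Original messages, as sent in the data parameter (elided fields omitted).
messages = [
    {"id": "123456789-1234", "text": "Texte de démonstration"},
    {"id": "123456789-4321", "text": "Show case"},
    {"id": "123456789-5555", "text": "テキストのデモ"},
]

# Assumed shape of a short output for a Latin-script filter.
short_output = """
[
  {"id": "123456789-1234", "languages": {"script_code": "latn", "language_code": "fr"}},
  {"id": "123456789-4321", "languages": {"script_code": "latn", "language_code": "en"}}
]
"""


def languages_by_id(raw: str) -> dict:
    """Map each selected message id to its (script_code, language_code)."""
    return {
        entry["id"]: (
            entry["languages"]["script_code"],
            entry["languages"]["language_code"],
        )
        for entry in json.loads(raw)
    }


def keep_selected(original: list, raw: str) -> list:
    """Rebuild the filtered view locally, leaving messages unmodified."""
    selected = languages_by_id(raw)
    return [message for message in original if message["id"] in selected]


for message in keep_selected(messages, short_output):
    print(message["id"], message["text"])
```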

Examples