Semiocast API tutorial
Twitter
About
This chapter of the tutorial explains how to analyze Twitter status updates with the Semiocast API.
Prerequisite
Read the Raw text section first to become familiar with the Semiocast API.
Analyze Twitter status updates
The Semiocast API is designed to analyze micromessages from various social networks, such as Twitter status updates or Facebook posts. Read the micromessage analysis documentation for a complete list of the features, recognized languages, accepted location formats, and results provided by this query.
First get statuses from the Twitter public timeline and store them in a file named statuses.json:
curl "http://api.twitter.com/1/statuses/public_timeline.json" > statuses.json
Then ask the Semiocast API to analyze the data:
curl -E semiocast-api.pem:PASSWORD \
     --data-urlencode data@statuses.json \
     https://api.semiocast.com/1/analyze/twitter.json
This call returns:
[...
{"id":"16475229000","language":{"script_code":"latn","language_code":"id"},
"location":{"country_code":"ID","city_name":null}},
{"id":"16475228000","language":{"script_code":"jpan","language_code":"ja"},
"location":{"country_code":"JP","city_name":"福岡市"}},
{"id":"16475227000","language":{"script_code":"latn","language_code":"id"},
"location":{"country_code":"ID","city_name":"Jakarta"}},
{"id":"16475226000","language":{"script_code":"latn","language_code":"pt"},
"location":{"country_code":"BR","city_name":"Vitória"}},
{"id":"16475225000","language":{"script_code":"latn","language_code":"pt"},
"location":{"country_code":"BR","city_name":"São Paulo"}},
{"id":"16475222000","language":{"script_code":"latn","language_code":"id"},
"location":{"country_code":"FR","city_name":"Paris"}},
{"id":"16475221000","language":{"script_code":"latn","language_code":"en"},
"location":{"country_code":"MX","city_name":"Chihuahua"}},
{"id":"16475220000","language":{"script_code":"latn","language_code":"en"},
"location":{"country_code":null,"city_name":null}},
{"id":"16475219000","language":{"script_code":"latn","language_code":"es"},
"location":null},
{"id":"16475215000","language":{"script_code":"latn","language_code":"en"},
"location":{"country_code":"CA","city_name":"Oromocto"}},
...]
The result associates each status with a message language and a user location. location is null if there is no location to analyze. country_code and city_name are both null if there is a location but the Semiocast API is unable to analyze it. country_code is set and city_name is null when the Semiocast API identifies only the country and not the city.
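The cases described above can be told apart in client code. The following Python sketch is illustrative only: the helper name describe_location and the labels it returns are assumptions made for this example, not part of the Semiocast API. It relies only on the result shape shown in the response above.

```python
import json

def describe_location(result):
    """Classify the location field of one analyze result (hypothetical helper)."""
    location = result.get("location")
    if location is None:
        return "no location"       # the status had no location to analyze
    country = location.get("country_code")
    city = location.get("city_name")
    if country is None and city is None:
        return "unrecognized"      # a location was present but not identified
    if city is None:
        return "country only"      # country identified, city not
    return "city"                  # city identified (with its country in the examples shown)

# Entries copied from the response above.
sample = json.loads("""[
 {"id":"16475229000","language":{"script_code":"latn","language_code":"id"},
  "location":{"country_code":"ID","city_name":null}},
 {"id":"16475220000","language":{"script_code":"latn","language_code":"en"},
  "location":{"country_code":null,"city_name":null}},
 {"id":"16475219000","language":{"script_code":"latn","language_code":"es"},
  "location":null},
 {"id":"16475215000","language":{"script_code":"latn","language_code":"en"},
  "location":{"country_code":"CA","city_name":"Oromocto"}}
]""")

for entry in sample:
    print(entry["id"], describe_location(entry))
```

Checking for a null location first matters, because the nested keys exist only when location is an object.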
If you want this information injected directly into each message (through annotations), call the Semiocast API with the output=enriched parameter:
curl -E semiocast-api.pem:PASSWORD \
     -d output=enriched \
     --data-urlencode data@statuses.json \
     https://api.semiocast.com/1/analyze/twitter.json
Each returned message then carries an annotations field:
[...,
{"in_reply_to_user_id":null,"geo":null,...,
"text":"sipp\" teh, sama\" ; ) RT @heyvira: thanks follownya :) @heeyjeng @malindayuse",
"annotations":[{"language":{"script_code":"latn","language_code":"id"}},
{"location":{"country_code":"ID","city_name":null}}], ...},
...]
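In the enriched output shown above, annotations is a list of single-key objects rather than one object. A minimal Python sketch for reading it follows; the helper name annotations_to_dict is hypothetical, and the sample message is trimmed to the fields visible in the response above.

```python
import json

def annotations_to_dict(message):
    """Merge the single-key annotation objects of one enriched message (hypothetical helper)."""
    merged = {}
    for annotation in message.get("annotations") or []:
        merged.update(annotation)
    return merged

# One enriched message, reduced to the fields visible in the tutorial output.
message = json.loads(r'''
{"in_reply_to_user_id": null, "geo": null,
 "text": "sipp\" teh, sama\" ; ) RT @heyvira: thanks follownya :) @heeyjeng @malindayuse",
 "annotations": [{"language": {"script_code": "latn", "language_code": "id"}},
                 {"location": {"country_code": "ID", "city_name": null}}]}
''')

info = annotations_to_dict(message)
print(info["language"]["language_code"])
print(info["location"]["country_code"])
```

Merging into a dictionary assumes each annotation type appears at most once per message, as in the example above.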
Filter Twitter status updates
The Semiocast API also provides a method for filtering micromessages by message language or user location. It lets you retrieve only the statuses written in a specific language or sent from a specific location.
Read the micromessage filtering documentation for a complete list of features, parameters and results provided by this query.
Example: filtering by language
For instance, if you want only messages written in Japanese or Portuguese:
curl -E semiocast-api.pem:PASSWORD \
     -d languages=ja,pt \
     --data-urlencode data@statuses.json \
     https://api.semiocast.com/1/filter/twitter.json
This call returns:
[{"id":"16475228000","language":{"script_code":"jpan","language_code":"ja"}},
{"id":"16475226000","language":{"script_code":"latn","language_code":"pt"}},
{"id":"16475225000","language":{"script_code":"latn","language_code":"pt"}},
{"id":"16475211000","language":{"script_code":"latn","language_code":"pt"}}]
The result contains only messages written in Japanese or Portuguese. Each message id is associated with the identified language.
Example: filtering by location
If you want only messages from a specific location like Indonesia:
curl -E semiocast-api.pem:PASSWORD \
     -d locations=ID \
     --data-urlencode data@statuses.json \
     https://api.semiocast.com/1/filter/twitter.json
This call will return:
[{"id":"16475229000","location":{"country_code":"ID","city_name":null}},
{"id":"16475227000","location":{"country_code":"ID","city_name":"Jakarta"}}]
Filtering according to city name will be added later.
Example: filtering by language and location
You can combine message language and user location. The two criteria are alternatives: a message is returned when it matches either one. For instance, if you want all messages written in Indonesian or coming from Indonesia:
curl -E semiocast-api.pem:PASSWORD \
     -d locations=ID \
     -d languages=id \
     --data-urlencode data@statuses.json \
     https://api.semiocast.com/1/filter/twitter.json
This call will give you the following result:
[{"id":"16475229000","location":{"country_code":"ID","city_name":null},
"language":{"script_code":"latn","language_code":"id"}},
{"id":"16475227000","location":{"country_code":"ID","city_name":"Jakarta"},
"language":{"script_code":"latn","language_code":"id"}},
{"id":"16475222000","location":{"country_code":"FR","city_name":"Paris"},
"language":{"script_code":"latn","language_code":"id"}}]
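The combined example above keeps a message from Paris because it is written in Indonesian, which suggests that the language and location criteria are joined with OR. The Python sketch below reproduces that behavior locally on analyze results. It is an approximation inferred from the example output, not the service's implementation, and the function name matches and its parameters are hypothetical.

```python
import json

def matches(result, languages=(), locations=()):
    """Return True if the result matches any requested language OR any requested country."""
    language = (result.get("language") or {}).get("language_code")
    country = (result.get("location") or {}).get("country_code")
    return language in languages or country in locations

# A few analyze results copied from the first response in this chapter.
results = json.loads("""[
 {"id":"16475229000","language":{"script_code":"latn","language_code":"id"},
  "location":{"country_code":"ID","city_name":null}},
 {"id":"16475228000","language":{"script_code":"jpan","language_code":"ja"},
  "location":{"country_code":"JP","city_name":"福岡市"}},
 {"id":"16475227000","language":{"script_code":"latn","language_code":"id"},
  "location":{"country_code":"ID","city_name":"Jakarta"}},
 {"id":"16475222000","language":{"script_code":"latn","language_code":"id"},
  "location":{"country_code":"FR","city_name":"Paris"}},
 {"id":"16475219000","language":{"script_code":"latn","language_code":"es"},
  "location":null}
]""")

selected = [r for r in results if matches(r, languages=("id",), locations=("ID",))]
print([r["id"] for r in selected])
```

On this sample the sketch selects the same three ids as the filter call above, including the message located in Paris.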
Prepare annotations before posting Twitter status
Twitter will allow you to add metadata, called annotations, to your status update when posting. The Semiocast API already offers a method to build annotations containing the message language and user location.
The following diagram illustrates how to do so:

Suppose that you are in Paris and want to tweet "I'm tweeting from France" (Step 1 on the diagram):
curl -E semiocast-api.pem:PASSWORD \
     -d status="I'm tweeting from France" \
     -d location="Paris" \
     https://api.semiocast.com/1/prepare/twitter.json > annotations.json
The file annotations.json now contains (Step 2 on the diagram):
[{"language":{"provider":"http://semiocast.com/", "script_code":"latn", "language_code":"en"}},
{"location":{"provider":"http://semiocast.com/", "country_code":"FR", "city_name":"Paris"}}]
Now you can update your status, including annotations that describe the message language and user location (Step 3 on the diagram):
curl -u TWITTER_USERNAME:TWITTER_PASSWORD \
     -d status="I'm tweeting from France" \
     --data-urlencode annotations@annotations.json \
     "http://api.twitter.com/1/statuses/update.json"
Read the Prepare annotations documentation for a complete list of features, parameters and results provided by this query.
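For readers scripting Step 3 without curl, the following Python sketch builds the same kind of form-encoded body: the annotations JSON is serialized and URL-encoded as the annotations field, which is what --data-urlencode does in the command above. The function name build_update_body is hypothetical, and the sketch only constructs the body; it does not send any request.

```python
import json
from urllib.parse import urlencode, parse_qs

# Annotations as returned by the prepare call shown above.
annotations = [
    {"language": {"provider": "http://semiocast.com/",
                  "script_code": "latn", "language_code": "en"}},
    {"location": {"provider": "http://semiocast.com/",
                  "country_code": "FR", "city_name": "Paris"}},
]

def build_update_body(status, annotations):
    """Return a form-encoded body with the status and its JSON annotations (hypothetical helper)."""
    return urlencode({"status": status, "annotations": json.dumps(annotations)})

body = build_update_body("I'm tweeting from France", annotations)
print(body)

# Decoding the body gives back the original values.
decoded = parse_qs(body)
print(decoded["status"][0])
```

Encoding both fields keeps the spaces and the apostrophe in the status text, and the braces and quotes in the JSON, from corrupting the form data.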
Other data sources
The Semiocast API analyze method also works on other Twitter data (search, home timeline, direct messages, single status update).
On Twitter search results
Since Twitter search does not provide user information, location is available only when the message is geotagged. First retrieve some search results:
curl -d q=twitter \
     "http://search.twitter.com/search.json?rpp=5" > results.json
Then ask the Semiocast API to analyze the results:
curl -E semiocast-api.pem:PASSWORD \
     --data-urlencode data@results.json \
     https://api.semiocast.com/1/analyze/twitter.json
This call should return the following:
[{"id":16472814373,"language":{"script_code":"latn","language_code":"en"},"location":null},
{"id":16472812097,"language":{"script_code":"latn","language_code":"pt"},"location":null},
{"id":16472808055,"language":{"script_code":"jpan","language_code":"ja"},"location":null},
{"id":16472807801,"language":{"script_code":"latn","language_code":"en"},"location":null},
{"id":16472807725,"language":{"script_code":"latn","language_code":"es"},"location":null}]
The results associate each status with its message language and, if available, the user location.
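Results like the ones above are easy to summarize once parsed. The Python sketch below counts the detected languages and lists the statuses that carry a location. The variable names are illustrative, and the data is copied from the search example above.

```python
import json
from collections import Counter

# Analyze results for the search example above.
results = json.loads("""[
 {"id":16472814373,"language":{"script_code":"latn","language_code":"en"},"location":null},
 {"id":16472812097,"language":{"script_code":"latn","language_code":"pt"},"location":null},
 {"id":16472808055,"language":{"script_code":"jpan","language_code":"ja"},"location":null},
 {"id":16472807801,"language":{"script_code":"latn","language_code":"en"},"location":null},
 {"id":16472807725,"language":{"script_code":"latn","language_code":"es"},"location":null}
]""")

# Count statuses per detected language code.
language_counts = Counter(r["language"]["language_code"] for r in results)

# Ids of statuses for which a location was returned (none in this sample).
located = [r["id"] for r in results if r["location"] is not None]

print(dict(language_counts))
print(located)
```

In this sample no search result is geotagged, so the list of located statuses is empty.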
On Twitter home timeline
Retrieve some messages:
curl -u TWITTER_USERNAME:TWITTER_PASSWORD \
     "http://api.twitter.com/1/statuses/home_timeline.json?count=5" > home_timeline.json
Then make the call:
curl -E semiocast-api.pem:PASSWORD \
     --data-urlencode data@home_timeline.json \
     https://api.semiocast.com/1/analyze/twitter.json
On Twitter direct messages
Retrieve some direct messages:
curl -u TWITTER_USERNAME:TWITTER_PASSWORD \
     "http://api.twitter.com/1/direct_messages.json?count=5" > direct_messages.json
Then make the call:
curl -E semiocast-api.pem:PASSWORD \
     --data-urlencode data@direct_messages.json \
     https://api.semiocast.com/1/analyze/twitter.json