Sunday, June 17, 2012

[Twitter Streaming API] Shouting "apple" at the Center of Twitter

I have been looking into collecting information from Twitter's global timeline. From what I found, the Twitter Streaming API is the best fit, so I tried it right away.

Twitter Streaming API documentation

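Unlike the request/response REST style that web application APIs usually follow, the Twitter Streaming API takes a single request and, once the connection to Twitter is established, keeps sending tweets back over it. The client waits for them and processes them as they arrive. In other words, it is a push-style API.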

People shouting "apple" at the center of Twitter

I use Node.js.

The prototype uses the filter endpoint to fetch tweets from the global timeline that contain the keyword _apple_, and writes them to standard output. To stop it, press Ctrl-C (not very elegant).

// The file is named app.js
var https = require('https'); // connect with the https module
var host  = 'stream.twitter.com'; // the host that serves the Streaming API

var find_one = 'apple';

// Options passed to the get method
var options = {
    host : host,
    port : 443,
    path : '/1/statuses/filter.json?track=' + encodeURIComponent(find_one),
    // Uses the filter endpoint. I am not entirely sure about this part
    auth : 'TWITTERID:PASSWORD' // set your own Twitter user ID and password
};

// Get the request object
var req = https.get(options, 
  function(response){
    console.log('Response: ' + response.statusCode);
  }
);

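// When the request object receives the "response" event
req.on('response', function(response){
    response.setEncoding('utf8'); // decode as text so multibyte characters are not split
    var buffer = ''; // a chunk does not necessarily end on a tweet boundary
    // Each time a piece of the response body arrives
    response.on('data', function(chunk){
        buffer += chunk;
        var lines = buffer.split('\r\n'); // the stream separates messages with \r\n
        buffer = lines.pop(); // keep the trailing partial message for the next chunk
        lines.forEach(function(line){
            if(line.length === 0){ return; } // blank keep-alive line
            var tweet;
            // Convert the received text from JSON into an object
            try { tweet = JSON.parse(line); } catch(e){ return; }
            // Only handle messages that have user.name and text
            if(tweet.user && tweet.user.name && typeof tweet.text === 'string'){
                // Tweets in kanji, Japanese and so on arrive unfiltered, so I drop them here
                // This is a bit sloppy (future work)
                if(tweet.text.toLowerCase().indexOf(find_one) >= 0){
                    console.log('[' + tweet.user.name + ']\n' + tweet.text);
                }
            }
        });
    });
});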

// When the request object receives the "error" event
req.on('error', function(e){
  console.log(e);
});

// Keep the program from stopping on an unhandled exception
process.on('uncaughtException', function(err){
  console.log('uncaughtException: ' + err);
});

Let's run it.
$ node app.js 
Response: 200
[Q U E]
so my Screen just went black!! I don't feel like going downtown to the Apple store today
[Yoel Wonsey]
@JmSamuel1415 Just go eat a rocky mountain apple
[jose antonio rivera ]
RT @ComediaChistes: — Oye, me enteré que Apple demandó a tu novia por violación de patentes. —¿Si? ¿Por qué? —Porqué está tan plana como ...
[iPhone iPad Actu]
#iphone #geek Post-Jobs, Apple unleashes new iPhone http://t.co/nYJFdAYA #ipod #ipad #apple
[iPhone iPad Actu]
#iphone #geek Steel Battalion Heavy Armor Gameplay Partie 2HD (fr) http://t.co/kqwcvnBt #ipod #ipad #apple
[danielle Bergman]
RT @AcneSkinSite: An apple a day keeps the acne away! Honey helps restore a youthful look! Try an Apple Honey Mask for endless benefits& ...
[Dol Noppadol]
RT @JAlLBREAK: Apple Now Throws in a MagSafe 2 Converter with Every Thunderbolt Display http://t.co/Jdi486kK
[Alniedawn F E]
RT @AcneSkinSite: An apple a day keeps the acne away! Honey helps restore a youthful look! Try an Apple Honey Mask for endless benefits& ...
[TAYLOR Ann'Marie]
Shorty jus gave me sum fye shxt..apple bacardi sprite tini! Fye

(and so it goes on and on)
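The keyword check in app.js is admittedly rough. One possible refinement, sketched here and not verified against the live stream, is to match the keyword as a whole word and to treat a tweet as "probably English" when most of its characters are ASCII. The helper names (escapeRegExp, containsWord, asciiRatio, looksEnglish) and the 0.9 threshold are my own assumptions for illustration.

```javascript
// Hypothetical helpers; names and the default 0.9 threshold are illustrative assumptions.

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when `word` appears in `text` as a whole word, ignoring case.
function containsWord(text, word) {
  return new RegExp('\\b' + escapeRegExp(word) + '\\b', 'i').test(text);
}

// Fraction of UTF-16 code units in `text` that are 7-bit ASCII (0 for an empty string).
function asciiRatio(text) {
  if (text.length === 0) { return 0; }
  var ascii = 0;
  for (var i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) < 128) { ascii++; }
  }
  return ascii / text.length;
}

// Crude language guess: mostly-ASCII text is treated as English.
function looksEnglish(text, threshold) {
  return asciiRatio(text) >= (threshold === undefined ? 0.9 : threshold);
}
```

This is only a heuristic: text in other Latin-script languages is also mostly ASCII, so it separates Japanese from English far better than it separates, say, Spanish from English.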


Remaining issues

I put this prototype together while reading various people's blogs, but several things still bother me.

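  • According to the Twitter Streaming API documentation, the filter endpoint is meant to be called with the POST method, but GET worked.
  • Tweets containing kanji do not seem to be filtered out and are delivered anyway. I may need to detect whether a tweet is English-only.
  • I need to use a developer account.
  • Above all, I need to read the Twitter Streaming API documentation carefully.
  • I need to restudy Node.js's http and https modules and the process object.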
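On the first point, a POST version of the request might look roughly like the sketch below. It assumes the same host, path and Basic Auth string as app.js and sends the track parameter as a form-encoded body. buildFilterPost is a hypothetical helper name of mine, and I have not run this against Twitter.

```javascript
var querystring = require('querystring');

// Hypothetical helper: builds the request options and the form body for a
// POST to the filter endpoint. Keywords are joined with commas.
function buildFilterPost(keywords, auth) {
  var body = querystring.stringify({ track: keywords.join(',') });
  return {
    options: {
      host: 'stream.twitter.com',
      port: 443,
      path: '/1/statuses/filter.json',
      method: 'POST',
      auth: auth, // 'TWITTERID:PASSWORD', as in app.js
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Usage sketch (not executed here):
//   var https = require('https');
//   var r = buildFilterPost(['apple', 'orange'], 'TWITTERID:PASSWORD');
//   var req = https.request(r.options, function(response){ /* same handlers as app.js */ });
//   req.write(r.body);
//   req.end();
```

Putting the keywords in the body also takes care of URL-encoding, which matters once the keyword contains spaces or non-ASCII characters.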

The module I plan to try next is ntwitter.

ntwitter: a handy Node.js module for working with the Twitter Streaming API


About the process object

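From the Node.js documentation for process: Event: 'uncaughtException' function (err) { } Emitted when an exception bubbles all the way back to the event loop. If a listener is added for this exception, the default action (which is to print a stack trace and exit) will not occur.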

Part of the Twitter Streaming API documentation

I have quoted some of the Streaming API material from the Twitter API documentation. I plan to translate it later.


Overview

The set of streaming APIs offered by Twitter give developers low latency access to Twitter's global stream of Tweet data. A proper implementation of a streaming client will be pushed messages indicating Tweets and other events have occurred, without any of the overhead associated with polling a REST endpoint.

Twitter offers several streaming endpoints, each customized to certain use cases.

  • Public streams: Streams of the public data flowing through Twitter. Suitable for following specific users or topics, and data mining.
  • User streams: Single-user streams, containing roughly all of the data corresponding with a single user's view of Twitter.
  • Site streams: The multi-user version of user streams. Site streams are intended for servers which must connect to Twitter on behalf of many users.

An app which connects to the Streaming APIs will not be able to establish a connection in response to a user request, as shown in the above example. Instead, the code for maintaining the Streaming connection is typically run in a process separate from the process which handles HTTP requests:

From the FAQ

How do I use the Twitter platform?

Twitter offers a platform with a number of different ways to interact with it.

Web Intents, Tweet Button and Follow Button is the simplest way to bring basic Twitter functionality to your site. It provides features like the ability to tweet, retweet, or follow using basic HTML and Javascript. You can also embed individual tweets.

More complex integrations can utilize our REST, Search, and Streaming APIs. The Streaming API allows you to stream tweets in real time as they happen. The Search API provides relevant results to ad-hoc user queries from a limited corpus of recent tweets. The REST API allows access to the nouns and verbs of Twitter such as reading timelines, tweeting, and following.

To use the REST and Streaming API, you should register an application and get to know the ways of OAuth and explore Twitter Libraries.

What is the version of the REST API?

In the API documentation there is a version place marker in the example request URL. Currently only one version of the API exists, that version is 1. This means any REST API queries will be of the format: https://api.twitter.com/1/statuses/user_timeline.json. The Streaming API is currently on version 1 as well, while the Search API is unversioned.

How are rate limits determined on the Streaming API?

At any one moment of time there are X amount of tweets in the public firehose. You're allowed to be served up to 1% of whatever X is per a "streaming second."

If you're streaming from the sample hose at https://stream.twitter.com/1/statuses/sample.json, you'll receive a steady stream of tweets, never exceeding 1% X tweets in the public firehose per "streaming second."

If you're using the filter feature of the Streaming API, you'll be streamed Y tweets per "streaming second" that match your criteria, where Y tweets can never exceed 1% of X public tweets in the firehose during that same "streaming second." If there are more tweets that would match your criteria, you'll be streamed a rate limit message indicating how many tweets fell outside of 1%.
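(My note on the quoted answer.) To notice when the filter is being rate limited, the client can look for the limit message among the parsed stream messages. The sketch below assumes, from my reading of the Streaming API documentation, that a limit notice looks like {"limit":{"track":N}} and a deletion notice like {"delete":{...}}. classifyMessage is a hypothetical helper name.

```javascript
// Hypothetical helper. The message shapes are assumptions taken from my
// reading of the Streaming API documentation, not verified here.
function classifyMessage(msg) {
  if (msg && msg.limit && typeof msg.limit.track === 'number') {
    // `track` is the count of matching tweets that were not delivered
    return { type: 'limit', missed: msg.limit.track };
  }
  if (msg && msg.delete) {
    return { type: 'delete' };
  }
  if (msg && typeof msg.text === 'string' && msg.user) {
    return { type: 'tweet' };
  }
  return { type: 'other' };
}
```

A client could log the missed count whenever the type is 'limit' and narrow its track keywords if the number keeps growing.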

How do I keep from running into the rate limit?

  • Caching. We recommend that you cache API responses in your application or on your site if you expect high-volume usage. For example, don't try to call the Twitter API on every page load of your hugely popular website. Instead, call our API once a minute and save the response to your local server, displaying your cached version on your site. Refer to the Terms of Service for specific information about caching limitations.
  • Rate limiting by active user. If your site keeps track of many Twitter users (for example, fetching their current status or statistics about their Twitter usage), please consider only requesting data for users who have recently signed in to your site.
  • Scale your use of the API with the number of users you have. When using OAuth to authenticate requests with the API, the rate limit applied is specific to that user_token. This means, every user who authorizes your application to act on their behalf, has their own bucket of API requests for you to use.
  • Request only what you need, and only when you need it. For example, polling the REST API looking for new data is inefficient for both your application, and the Twitter API. Instead consider using one of the Streaming APIs as a signal of when to make REST API requests.
  • Consider using a combination of the APIs to achieve your goal. You can't do everything with one API, but by combining them you can do most things. For example, instead of using the Search API for all your querying, use the Streaming API to track keywords and follow users Tweets, and save the Search API for the more complex queries.

These are just some example strategies. To work out different solutions for you to achieve your goals, search through discussions on the topic or start your own.
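(My note on the caching advice.) The "call our API once a minute and save the response" idea can be sketched as a small time-based cache. createCache is a hypothetical helper, and fetchFn stands in for whatever actually calls the API. It is synchronous here to keep the sketch short, whereas a real API call would be asynchronous. The clock is injectable so the behavior can be checked without waiting.

```javascript
// Hypothetical helper: wraps `fetchFn` so that it runs at most once per `ttlMs`
// milliseconds; in between, the previously saved value is returned.
function createCache(fetchFn, ttlMs, now) {
  now = now || Date.now; // injectable clock, defaults to the real one
  var value;
  var fetchedAt = null;
  return function get() {
    var t = now();
    if (fetchedAt === null || t - fetchedAt >= ttlMs) {
      value = fetchFn();
      fetchedAt = t;
    }
    return value;
  };
}

// Usage sketch: refresh at most once a minute.
//   var getTimeline = createCache(fetchTimelineSomehow, 60 * 1000);
//   getTimeline(); // calls the API
//   getTimeline(); // served from the cache for the next minute
```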

How do password resets effect application authorization?

When using OAuth, application connectivity and permissions do not change when a user resets their password on twitter.com. The relationship between Twitter, a user, and a third-party application do not involve a username and password combination. When a Twitter user changes their password, we'll now ask the user whether they would also like to revoke any of their application authorizations, but any revocations are manually executed by the end user.

As of March 12, 2012 it is still possible to connect to the Streaming APIs via Basic Auth credentials. If the password belonging to a user account that connects to the Streaming API via basic auth is changed, the new password will need to be used to regain that connection.

I don't want to require users to authenticate but 150 requests per hour is not enough for my app, what should I do?

Rethink not wanting to require authentication. It's the primary means to grow your application's capabilities. We recommend requiring authentication to make use of potentially 350 requests per hour per access token/user. Consider also investigating whether the Streaming API's follow filter will work for you.
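(My note on the numbers above.) As rough arithmetic, an hourly limit translates into a minimum spacing between requests if they are spread evenly over the hour. minIntervalMs is a hypothetical helper name. The limits of 150 and 350 requests per hour are the ones quoted above.

```javascript
// Hypothetical helper: smallest whole number of milliseconds to wait between
// requests so that an evenly paced client stays within `requestsPerHour`.
function minIntervalMs(requestsPerHour) {
  return Math.ceil(3600 * 1000 / requestsPerHour);
}

// Usage sketch:
//   minIntervalMs(150); // unauthenticated: one request every 24 seconds
//   minIntervalMs(350); // authenticated: roughly one request every 10.3 seconds
```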

How do I count favorites?

Favorite counts aren't available as part of tweet objects in the REST, Streaming or Search APIs at this time. User streams and Site streams both stream events when an authenticated user favorites tweets or has their tweets favorited. Using these authenticated streaming APIs, you can count favorites in real-time as they happen. This is currently the only scalable means to count favorite activity.

How do I count retweets?

Tweets in the REST and Streaming APIs contain a field called retweet_count that provides the number of times that tweet has been retweeted. You can obtain the retweet count for any arbitrary tweet by using GET statuses/show/:id.

You can count retweets as they happen by using the Streaming APIs. In particular, User streams and Site streams allow you to be streamed retweet events about/around an authenticated user in real time.

I keep hitting the rate limit. How do I get more requests per hour?

REST & Search API Whitelisting is not provided. Resourceful use of more efficient REST API requests, authentication, and Streaming APIs will allow you to build your integration successfully without requiring whitelisting. Learn more about rate limits or see the rate limiting FAQ for more information.
