Writing an auto-retweet service
I'm working with a couple of friends on a community service, @PortsmouthTweet. It started as an auto-retweeter: people post tweets mentioning @PortsmouthTweet, and a bot picks them up and retweets them. This may sound simple, but it's surprisingly difficult.
People use Twitter for a lot of things: posting one-shot updates, having quick conversations, retweeting what someone else said, and encouraging followers to check out another user (#FollowFriday). Only the first of these is worth retweeting. At best, retweeting everything will overwhelm your followers; at worst, you may look invasive and uncouth by retweeting a semi-personal conversation. So how can a bot tell the valuable content from the chatter?
Here's the code @PortsmouthTweet uses, where $t is an associative array built from a tweet in a Twitter API call:
if ( stripos($t['text'], 'rt') !== 0 &&               // not an RT (text does not start with "rt")
     empty($t['in_reply_to_status_id']) &&            // not a reply
     stripos($t['text'], 'followfriday') === false && // not followfriday (or #followfriday)
     stripos($t['text'], '#ff') === false &&          // not #ff
     $t['user']['id'] != 10855142 &&                  // not from @lazytweet
     $t['user']['id'] != 45366008 ) {                 // not from us (change to your id!)
    // ... retweet $t here ...
}
It works with some intelligence, at least for now. Compiling these rules has been an empirical, trial-and-error process; when I first wrote the retweeter, I didn't have any of them.
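Since the rule list keeps growing, it may help to wrap the checks in a function so new rules and blocked accounts are easy to add. The following is only a sketch, not the code @PortsmouthTweet runs: the function name shouldRetweet and the $blockedUserIds parameter are made up for illustration, and the whole-word matching for "RT" and "#ff" is a slightly stricter variant of the substring checks above.

```php
<?php
// Hypothetical helper: the filtering rules packaged as a function.
// $t is a tweet as an associative array; $blockedUserIds is an assumed
// list of user ids whose tweets should never be retweeted.
function shouldRetweet(array $t, array $blockedUserIds): bool
{
    $text = $t['text'] ?? '';

    // Skip retweets: text that starts with "RT" as a whole word.
    if (preg_match('/^\s*rt\b/i', $text) === 1) {
        return false;
    }

    // Skip replies: the reply field is empty on ordinary tweets.
    if (!empty($t['in_reply_to_status_id'])) {
        return false;
    }

    // Skip #FollowFriday traffic: "followfriday" anywhere, or "#ff" as a whole tag.
    if (stripos($text, 'followfriday') !== false
        || preg_match('/#ff\b/i', $text) === 1) {
        return false;
    }

    // Skip blocked accounts (for example @lazytweet and the bot itself).
    return !in_array($t['user']['id'], $blockedUserIds);
}

// Example call with a made-up tweet and the two ids from the post.
$blocked = [10855142, 45366008];
$tweet = [
    'text' => '@PortsmouthTweet Farmers market on Saturday!',
    'in_reply_to_status_id' => null,
    'user' => ['id' => 12345],
];
var_dump(shouldRetweet($tweet, $blocked)); // bool(true)
```

Matching "#ff" on a word boundary means a tweet containing something like a hex colour (#ffffff) is not thrown away by accident, which the plain substring check would do.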
Soon we'll build a website and more robust databases for @PortsmouthTweet. With a user table in particular, we can keep track of our followers and the accounts we follow, and use this data to further filter the retweeter. For example, if a new user tweets at us, and this person follows a number of people we 'trust', we can likely trust the tweet. I'd also recommend checking for a followers/following ratio of 0.6 or higher; most real people have this.
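As a rough sketch of how that trust check might look once the user table exists: the function below is hypothetical, as are the $followingIds and $trustedIds parameters and the default of three trusted accounts (the post only says "a number of"). It assumes the user array carries the followers_count and friends_count fields from the Twitter API user object, and that both conditions must hold.

```php
<?php
// Hypothetical sketch of the trust heuristic described above.
// $user         : user array with 'followers_count' and 'friends_count'
// $followingIds : assumed list of ids this user follows
// $trustedIds   : assumed list of ids we already trust
// $minTrusted   : assumed threshold for "a number of" trusted accounts
// $minRatio     : followers/following ratio suggested in the post
function looksTrustworthy(
    array $user,
    array $followingIds,
    array $trustedIds,
    int $minTrusted = 3,
    float $minRatio = 0.6
): bool {
    $followers = (int) ($user['followers_count'] ?? 0);
    $following = (int) ($user['friends_count'] ?? 0);

    // Guard against division by zero: an account that follows nobody
    // passes the ratio check only if it has at least one follower.
    if ($following > 0) {
        $ratio = $followers / $following;
    } else {
        $ratio = $followers > 0 ? INF : 0.0;
    }
    if ($ratio < $minRatio) {
        return false;
    }

    // Count how many of the accounts this user follows are ones we trust.
    $trustedFollowed = count(array_intersect($followingIds, $trustedIds));

    return $trustedFollowed >= $minTrusted;
}
```

Requiring both conditions is the stricter reading; a more lenient bot could accept either one, or use the result only to rank tweets instead of rejecting them outright.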
Update: You might also want to "blacklist" @lazytweet, as the code above does, to avoid a recursion loop if someone includes #lazytweet in a tweet. Thanks to @Sturta for this feedback.