{"id":135,"date":"2014-05-31T11:42:22","date_gmt":"2014-05-31T16:42:22","guid":{"rendered":"http:\/\/thenerdis.me\/blog\/?p=135"},"modified":"2014-05-31T11:42:22","modified_gmt":"2014-05-31T16:42:22","slug":"pangramtweets","status":"publish","type":"post","link":"http:\/\/thenerdis.me\/blog\/?p=135","title":{"rendered":"PangramTweets"},"content":{"rendered":"<p>Bots searching for linguistic gems on Twitter. <\/p>\n<hr>\n<p><b>PangramTweets<\/b><br \/>\nBy Ben Zimmer<\/p>\n<p>The <a href=\"http:\/\/ift.tt\/jTKeWz\">Twitter API<\/a>, beyond its great utility for corpus linguistics (see &#8220;<a href=\"http:\/\/ift.tt\/trJMJ2\">On the front lines of Twitter linguistics<\/a>,&#8221; &#8220;<a href=\"http:\/\/ift.tt\/1oKhvSQ\">The he&#8217;s and she&#8217;s of Twitter<\/a>&#8221;), has made possible a lot of fun automated text-mining projects. One fertile area is algorithmic found poetry: there have been Twitter bots designed to find <a href=\"http:\/\/ift.tt\/1h1wAaZ\">accidental<\/a> <a href=\"http:\/\/ift.tt\/1djhV8B\">haikus<\/a>, and even more impressively, a bot named <a href=\"http:\/\/ift.tt\/13bhtnL\">@Pentametron<\/a> that finds rhyming tweets in iambic pentameter and fashions sonnets out of them. <\/p>\n<p>And then there is found wordplay, which is its own kind of found poetry. I&#8217;m a big fan of <a href=\"http:\/\/ift.tt\/1cks9FQ\">@Anagramatron<\/a>, which discovers paired tweets that form serendipitous anagrams of each other. (<a href=\"http:\/\/ift.tt\/1h1wCzK\">Example<\/a>: &#8220;Last time I do anything&#8221; \u21d4 &#8220;That&#8217;s it. 
I&#8217;m dying alone.&#8221;) Now, courtesy of Jesse Sheidlower, comes <a href=\"http:\/\/ift.tt\/1oKhvST\">@PangramTweets<\/a>, in which each tweet contains every letter of the alphabet at least once.<br \/><span id=\"more-12478\"><\/span><br \/> Jesse explains the project <a href=\"http:\/\/ift.tt\/1h1wCzM\">on his site<\/a>:<\/p>\n<p style=\"padding-left: 30px;\"><a href=\"http:\/\/ift.tt\/1oKhuya\">PangramTweets<\/a> is a bot (a computer program that runs on its own) that searches Twitter for, and then retweets, pangrams\u2014texts that contain every letter of the alphabet. A famous pangram, sometimes used as a typing test, is \u201cThe quick brown fox jumps over the lazy dog.\u201d [&#8230;]<\/p>\n<p style=\"padding-left: 30px;\">You may find the results interesting, or dull. I make no judgment on this. The bot is entirely automated; I do not curate the results.<\/p>\n<p style=\"padding-left: 30px;\">I strip out user names and URLs from the results, but hashtags are included. I also do some very basic filtering to try to ensure that the results are in English, and not in another language or complete gibberish (random letters), though earlier versions of the bot did retweet nonsense or foreign-language pangrams.<\/p>\n<p>The bot originally did not filter out known pangrams of the &#8220;quick brown fox&#8221; variety, but <a href=\"http:\/\/ift.tt\/1oKhvSV\">by popular demand<\/a> Jesse\u00a0put a filter in place for that as well. 
The results are not as rich as Anagramatron, but that&#8217;s to be expected given the constraints: Jesse <a href=\"http:\/\/ift.tt\/1h1wCzQ\">says<\/a> he gets &#8220;one real pangram in every few million tweets scanned.&#8221; Here&#8217;s a sampling of what has turned up so far.<\/p>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>I&#8217;ve just (with the help of google) realized I wrote about the wrong experiment in my 12 mark psychology question<\/p>\n<p>oops<\/p>\n<p>\u2014 s (@bricktop___) <a href=\"http:\/\/ift.tt\/1h1wCQ4\">May 13, 2014<\/a><\/p>\n<\/blockquote>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>It&#8217;s official: Arthur Sulzberger names Dean Baquet executive editor of The New York Times, replacing Jill Abramson.<\/p>\n<p>\u2014 Vindu Goel (@vindugoel) <a href=\"http:\/\/ift.tt\/1oKhuyi\">May 14, 2014<\/a><\/p>\n<\/blockquote>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>Looking for a new job is exhausting. Every one I want requires a bazillion years of experience I don&#8217;t have. FML.<\/p>\n<p>\u2014 Ryan Stephens (@Integrity1stziB) <a href=\"http:\/\/ift.tt\/1h1wCQ7\">May 16, 2014<\/a><\/p>\n<\/blockquote>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>Thanks JMM for boosting my boxing prediction confidence again. The Mayweather card did a number on a lot of boxing fans. <a href=\"http:\/\/ift.tt\/O83ij5\">#MarquezAlvarado<\/a><\/p>\n<p>\u2014 E.J.O. (@ElioOrtiz11) <a href=\"http:\/\/ift.tt\/1h1wCQb\">May 18, 2014<\/a><\/p>\n<\/blockquote>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>SHUT THE FUCK UP ABOUT THE \u201cFRIENDZONE\u201d. MAYBE YOU SHOULD JUST VALUE A WOMAN\u2019S FRIENDSHIP AND QUIT EXPECTING THEM TO FUCK YOU. 
JESUS FUCK.<\/p>\n<p>\u2014 \u30fb\u3002\u3002\u30fb\u309c\u2606\u309c\u30fb\u3002\u3002\u30fb (@chrstnmchd) <a href=\"http:\/\/ift.tt\/1oKhuym\">May 19, 2014<\/a><\/p>\n<\/blockquote>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>Juan Manuel Marquez boxes Alvarado on weekday to line up fifth fight alongside Pacquiao <a href=\"http:\/\/ift.tt\/1h1wCQf\">@SportsMomentz<\/a> <a href=\"http:\/\/t.co\/e5CyDwDXFd\">http:\/\/t.co\/e5CyDwDXFd<\/a><\/p>\n<p>\u2014 Rinaldo Jonathan (@testeronline12) <a href=\"http:\/\/ift.tt\/1oKhuyo\">May 19, 2014<\/a><\/p>\n<\/blockquote>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p>Maybe Joe needs to take some advice from Iceland and arrest the rich people who are stealing from the rest of us tax paying citizens. <a href=\"http:\/\/ift.tt\/17iayKJ\">#qanda<\/a><\/p>\n<p>\u2014 Toby Owens (@TehMegaWiz) <a href=\"http:\/\/ift.tt\/1h1wCQk\">May 19, 2014<\/a><\/p>\n<\/blockquote>\n<p> It will be interesting to see if the bot turns up a naturally occurring &#8220;<a href=\"http:\/\/ift.tt\/1h1wAb9\">pangrammatic window<\/a>&#8221; that beats the current record-holder of 42 letters, from Piers Anthony&#8217;s <i>Cube Route<\/i>: <\/p>\n<p style=\"padding-left: 30px;\">&#8220;<strong>We are all from Xanth,&#8221; Cube said quickly. &#8220;Just visiting Phaz<\/strong>e\u2026&#8221; <\/p>\n<p>Sean Irvine announced the discovery of this pangrammatic window in <em><a href=\"http:\/\/ift.tt\/1oKhuOF\">Word Ways<\/a><\/em> in 2012. It beat out Eric Chaikin&#8217;s <a href=\"http:\/\/ift.tt\/1h1wCQm\">47-letter find<\/a>, which he discovered by Googling for &#8220;Joaquin Phoenix&#8221;:<\/p>\n<p style=\"padding-left: 30px;\">&#8220;JoBlo&#8217;s movie re<strong>view of The Yards: Mark Wahlberg, Joaquin Phoenix, Charliz<\/strong>e Theron\u2026&#8221;<\/p>\n<p>Of course, determining if a pangram is &#8220;naturally occurring&#8221; may be difficult, since it&#8217;s always possible to game the system! 
But with half a billion tweeters tweeting, maybe someday one of them will authentically produce a winner like &#8220;Mr. Jock, TV quiz PhD, bags few lynx.&#8221;<\/p>\n<p><ins><em>Update<\/em>: Jesse is attempting to filter out non-English tweets, but Indonesian tweets keep seeping through. Since I&#8217;ve done research on colloquial varieties of Indonesian, I find these tweets fascinating. I was initially surprised that the Indonesian Twittersphere would be generating pangrams, considering that the letters Q, V, X, and Z appear only in loanwords. But Indonesian participants on Twitter are using quite a lot of Anglicisms, along with a plethora of <a href=\"http:\/\/ift.tt\/1ktKC5v\">txtspk<\/a>-style abbreviations of Indonesian words. An example that just popped up:<\/ins><\/p>\n<p><ins> <\/ins><\/p>\n<blockquote class=\"twitter-tweet\" lang=\"en\">\n<p><ins><a href=\"http:\/\/ift.tt\/1gQcsht\">@PutriAZSYA<\/a> EXCITED BGT GRGR 1D MW K INDO. LBH EXCITED LG KLO JOIN LITTLEQUIZ <a href=\"http:\/\/ift.tt\/1ktKC5z\">@1D_CrazyLovers<\/a> DAN BCA JG FFNY.PASTI LO MKIN EXCITED.CEK FAV6<\/ins><\/p>\n<p><ins> <\/ins><\/p>\n<p><ins>\u2014 winda (@windaameliasar1) <a href=\"http:\/\/ift.tt\/1gQcsxJ\">May 20, 2014<\/a><\/ins><\/p>\n<\/blockquote>\n<p><ins> <\/ins><\/p>\n<p><ins><\/ins><\/p>\n<p><ins> <\/ins><\/p>\n<p><ins>The loanwords here are EXCITED, JOIN, and LITTLEQUIZ, and 1D refers to the band One Direction. 
Here&#8217;s a key to the abbreviation-heavy Indonesian items:<\/ins><\/p>\n<p><ins> <\/ins><\/p>\n<p style=\"padding-left: 30px;\"><ins>BGT = banget &#8216;very&#8217;<br \/> GRGR = gara-gara &#8216;just because&#8217;<br \/> MW = mau &#8216;will&#8217;<br \/> K = ke &#8216;(come) to&#8217;<br \/> INDO = Indonesia<br \/> LBH = lebih &#8216;more&#8217;<br \/> LG = lagi &#8216;(even) more&#8217;<br \/> KLO = kalau &#8216;if&#8217;<br \/> DAN = dan &#8216;and&#8217;<br \/> BCA = baca &#8216;read&#8217;<br \/> JG = juga &#8216;also&#8217;<br \/> PASTI = pasti &#8216;definitely&#8217;<br \/> LO = (e)lo &#8216;you&#8217;<br \/> MKIN = makin &#8216;more and more&#8217;<br \/> CEK = cek &#8216;check&#8217;<\/ins><\/p>\n<p><ins> <\/ins><\/p>\n<p><ins>So that would work out to: &#8220;@PutriAZSYA Very excited just because One Direction is coming to Indonesia. You&#8217;ll be even more excited if you join LittleQuiz @1D_CrazyLovers, and also read FFNY. You&#8217;ll definitely get more and more excited. Check Fav6.&#8221;<\/ins><\/p>\n<p>May 19, 2014 at 5:07PM<br \/>\nvia Language Log http:\/\/ift.tt\/1h1wAaW<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Bots searching for linguistic gems on Twitter. 
PangramTweets By Ben Zimmer The Twitter API, beyond its great utility for corpus linguistics (see &#8220;On the front lines of Twitter linguistics,&#8221; &#8220;The he&#8217;s and she&#8217;s of Twitter&#8221;), has made possible a lot &hellip; <a href=\"http:\/\/thenerdis.me\/blog\/?p=135\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[14,15],"_links":{"self":[{"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=\/wp\/v2\/posts\/135"}],"collection":[{"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=135"}],"version-history":[{"count":1,"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=\/wp\/v2\/posts\/135\/revisions"}],"predecessor-version":[{"id":136,"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=\/wp\/v2\/posts\/135\/revisions\/136"}],"wp:attachment":[{"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=135"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=135"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/thenerdis.me\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}