id: github-com-1536
author:
title: twarc/unshrtn.py at main · DocNow/twarc · GitHub
date:
pages:
extension: .html
mime: text/html
words: 682
sentences: 163
flesch: 70
summary:
  Unfortunately, the "expanded_url" values supplied by Twitter aren't fully unshortened. unshrtn.py will attempt to completely unshorten URLs, add each result under the "unshortened_url" key of the corresponding url, and emit the tweet as JSON again on stdout.
  http://github.com/edsu/unshrtn
  import time
  import urllib.request, urllib.parse, urllib.error
  # number of urls to look up in parallel
  unshrtn_url = "http://localhost:3000"
  return url
  unshrtn_url, urllib.parse.urlencode({"url": url.encode("utf8")})
  resp = json.loads(urllib.request.urlopen(u).read().decode("utf-8"))
  return resp[key]
  tweet = json.loads(line)
  return line
  if "expanded_url" in url_dict:
  url = url_dict["expanded_url"]
  global unshrtn_url, retries, wait
  help="number of urls to look up in parallel",
  "--unshrtn", help="url of the unshrtn service", default=unshrtn_url
  help="number of time to retry if error from unshrtn service",
  help="number of seconds to wait between retries if error from unshrtn service",
  unshrtn_url = args.unshrtn
cache: ./cache/github-com-1536.html
txt: ./txt/github-com-1536.txt
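
The summary describes the per-tweet rewrite only in fragments, so the following is a minimal sketch of that flow, not the script's actual code. It assumes the URLs live under `entities.urls` in each tweet and that the service is queried at `/?url=...`; neither detail is confirmed by the excerpt. The names `build_lookup_url`, `rewrite_line`, and the injected `resolve` callable are hypothetical, introduced so the example runs without a live unshrtn service.

```python
import json
import urllib.parse

# Default service address as shown in the summary.
UNSHRTN_URL = "http://localhost:3000"


def build_lookup_url(url, base=UNSHRTN_URL):
    """Build the request URL for the unshrtn service.

    The "/?url=..." layout is an assumption; the summary only shows the
    base URL and the urlencode() call side by side.
    """
    query = urllib.parse.urlencode({"url": url.encode("utf8")})
    return "{}/?{}".format(base, query)


def rewrite_line(line, resolve):
    """Add "unshortened_url" to every url entry of one tweet JSON line.

    `resolve` is a hypothetical callable mapping an expanded URL to its
    unshortened form; in a real run it would call the service.
    """
    try:
        tweet = json.loads(line)
    except ValueError:
        # Unparseable input is passed through unchanged.
        return line
    if not isinstance(tweet, dict):
        return line
    # Assumed location of url entries: tweet["entities"]["urls"].
    url_dicts = (tweet.get("entities") or {}).get("urls") or []
    for url_dict in url_dicts:
        if "expanded_url" in url_dict:
            url = url_dict["expanded_url"]
            url_dict["unshortened_url"] = resolve(url)
    return json.dumps(tweet)


if __name__ == "__main__":
    sample = json.dumps(
        {"entities": {"urls": [{"expanded_url": "http://bit.ly/x"}]}}
    )
    # A stand-in resolver; it only shows which service URL would be requested.
    print(rewrite_line(sample, build_lookup_url))
```

Passing the resolver in as an argument keeps the JSON rewriting separate from the network call, which also makes it easy to wrap the lookup with the retry and wait settings the script exposes.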