As a student fellow at the Knight Lab, I get the opportunity to work on a variety of different projects. Recently, I’ve been working with Larry Birnbaum, a Knight Lab co-founder, and Shawn O’Banion, a computer science Ph.D. student, to build an application that takes a user’s Twitter handle, analyzes their activity and returns a list of celebrities that they tweet most like.
It’s not an earth-shattering project, but it is a fun way for Twitter users to see who they tweet like and perhaps discover a few interesting things about themselves in the process. It also gave me a great excuse to experiment with the tools available in the open source community for web scraping and mining Twitter data, which you can read about below.
The tools listed here are primarily for Python, but equivalent versions of these libraries exist in other languages — just search around!
Who’s a celebrity, exactly?
The first step in building this project was to gather a list of celebrities to compare users against. To do this, I searched the web for sites that had celebrity information. IMDB was the perfect solution, as it had an extensive list of celebrities (actors, movie directors, singers, sports figures, etc.) and provided the information in a structured format that was straightforward to collect using a web scraping tool.
- Beautiful Soup — A useful Python library for scraping web pages that has extensive documentation and community support. Choosing elements to save from a page is as simple as writing a CSS selector.
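As a rough sketch of what selector-based scraping looks like, the snippet below parses a small inline HTML fragment rather than a live page. The markup, the `celeb-list` class name, and the `extract_names` helper are invented for illustration and do not reflect IMDB's actual page structure. It assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a scraped page; not IMDB's real HTML.
SAMPLE_HTML = """
<ul class="celeb-list">
  <li><a href="/name/1">Ada Example</a></li>
  <li><a href="/name/2">Ben Placeholder</a></li>
</ul>
"""

def extract_names(html):
    """Return the link text of every anchor inside the celebrity list."""
    soup = BeautifulSoup(html, "html.parser")
    # select() takes a CSS selector and returns the matching tags in document order.
    return [a.get_text(strip=True) for a in soup.select("ul.celeb-list li a")]

names = extract_names(SAMPLE_HTML)
print(names)
```

For a real page you would fetch the HTML first and adjust the selector to match whatever structure that page actually uses.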
After gathering a list of celebrities, I needed to find them on Twitter and save their handles. Twitter’s API provides a straightforward way to query for users and returns results in a JSON format which makes it easy to parse in a Python script. One wrinkle when dealing with celebrities is that fake accounts use similar or identical names and could be difficult to detect. Luckily, Twitter includes a handy data field in each user object that indicates whether the account is verified, which I checked before saving the handle.
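The verified check can be sketched roughly as follows. The sample response is made up and trimmed to three fields, `find_verified_handle` is a hypothetical helper, and requiring an exact (case-insensitive) name match is an assumption rather than the rule used in the project.

```python
import json

# Hypothetical API response: a JSON array of user objects, trimmed to the fields used here.
SAMPLE_RESPONSE = """
[
  {"name": "Ada Example", "screen_name": "ada_fan_page", "verified": false},
  {"name": "Ada Example", "screen_name": "adaexample", "verified": true},
  {"name": "Ada Example Jr", "screen_name": "ada_jr", "verified": true}
]
"""

def find_verified_handle(celebrity_name, response_text):
    """Return the handle of the first verified account whose name matches, or None."""
    for user in json.loads(response_text):
        same_name = user.get("name", "").lower() == celebrity_name.lower()
        if same_name and user.get("verified", False):
            return user["screen_name"]
    return None

handle = find_verified_handle("Ada Example", SAMPLE_RESPONSE)
print(handle)
```

Returning `None` when no verified match exists makes it easy to skip celebrities who are not on Twitter instead of saving a fake account.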
Once the celebrity name was associated with a Twitter handle, the next step was to again use Twitter’s API to download the user’s tweets and save them into a database.
When gathering data you will often encounter the “rate limit exceeded” error message. This is because Twitter imposes a limit on the number of API calls a single app can make in a set “window” of time (currently 15 minutes). To get around this problem, you can either create multiple Twitter Apps and request additional OAuth credentials or set up a cron job to run every 15 minutes. A cron job lets your script run in the background at scheduled times or intervals, leaving you free to perform other tasks.
A few tips for writing cronjob tasks that I found extremely helpful when collecting data:
- Construct your scripts in a way that cycles through your API keys to stay within the rate limit.
- Be sure to catch exceptions that may occur when accessing Twitter’s API and write them to an error file for later review. This lets your scripts run unattended without a single error crashing the entire program.
- Run your scripts on a remote computer (unless you want to keep your computer on the entire time the scripts are running!).
- Twitter API — A Python wrapper for performing API requests such as searching for users and downloading tweets. This library handles all of the OAuth and API queries for you and provides it to you in a simple Python interface. Be sure to create a Twitter App and get your OAuth keys — you will need them to get access to Twitter’s API.
- PyMongo — A Python wrapper for interfacing with a MongoDB instance. This library lets you connect your Python scripts with your database and read/insert records.
- Cronjobs — A time based job scheduler that lets you run scripts at designated times or intervals (e.g. always at 12:01 a.m. or every 15 minutes).
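The first two tips, cycling through API keys and logging errors instead of crashing, might look something like the sketch below. `fetch_tweets` is a hypothetical stand-in that simulates a failure for one handle; in a real script it would call the Twitter API wrapper. The key strings and the log file name are placeholders.

```python
import itertools
import logging

# Write errors to a file so an unattended run can be reviewed later.
logging.basicConfig(filename="collector_errors.log", level=logging.ERROR)

# Placeholder credentials; real ones come from your Twitter Apps.
API_KEYS = ["key-a", "key-b", "key-c"]

def fetch_tweets(handle, api_key):
    """Hypothetical stand-in for a real API call; fails for one handle to show error handling."""
    if handle == "broken_handle":
        raise RuntimeError("rate limit exceeded")
    return [f"tweet from {handle} via {api_key}"]

def collect(handles, keys):
    """Fetch tweets for each handle, rotating keys and logging failures instead of crashing."""
    results, failed = {}, []
    key_cycle = itertools.cycle(keys)
    for handle in handles:
        api_key = next(key_cycle)
        try:
            results[handle] = fetch_tweets(handle, api_key)
        except Exception:
            # Record the traceback and move on to the next handle.
            logging.exception("Failed to fetch %s", handle)
            failed.append(handle)
    return results, failed

results, failed = collect(["alice", "broken_handle", "bob", "carol"], API_KEYS)
print(sorted(results), failed)
```

A crontab entry such as `*/15 * * * * python /path/to/collect.py` would run a script like this every 15 minutes; the path is a placeholder. Keeping the list of failed handles makes it possible to retry them on the next run.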
Once the tweets have been successfully stored in your database, you can manipulate the data to fit the needs of your project. For my project, I removed common words and created an index on the text of the collected tweets to perform the similarity comparisons.
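One simple way to do this kind of comparison is sketched below: strip common words, count the remaining terms, and rank celebrities by cosine similarity to the user. The post does not say which similarity measure the project used, so cosine similarity over raw term counts is an assumption, as are the tiny stop-word list, the sample tweets, and the handle names.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real project would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "i", "my", "for", "on"}

def term_counts(tweets):
    """Lowercase, tokenize, drop stop words, and count the remaining terms."""
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z']+", tweet.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 when either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical data: celebrity handle -> term counts built from sample tweets.
celebrity_index = {
    "chef_celeb": term_counts(["Cooking pasta for dinner", "New pasta recipe on my blog"]),
    "sport_celeb": term_counts(["Great game tonight", "Training for the big game"]),
}
user_counts = term_counts(["I love cooking pasta", "Pasta recipe ideas"])

# Rank celebrities from most to least similar to the user.
ranked = sorted(
    celebrity_index,
    key=lambda h: cosine_similarity(user_counts, celebrity_index[h]),
    reverse=True,
)
print(ranked)
```

With thousands of celebrities, the per-celebrity term counts would be computed once and stored in the database, so only the user's tweets need processing at request time.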
Accessing the Firehose
If you’re ready to go beyond the data limits that Twitter imposes for free access, you can upgrade to Twitter’s Firehose API, which gives you nearly unlimited access to Twitter’s data stream through one of the data providers that Twitter partners with, including Dataminr (CNN recently partnered with Dataminr to build an application that alerts journalists in newsrooms to breaking news and emerging trends), Datasift, Gnip, Lithium, and Topsy.
While the number of projects you could build using Twitter data is close to infinite, there are a few cool and fun civic-minded projects already out there. NoHomophobes.com gives you a glimpse of how prevalent homophobic speech is on Twitter. Closer to home, Knight Lab has developed a number of different projects using the tools above: twXplorer, BookRx, and NeighborhoodBuzz, to name a few. While the scope of these projects ranges from text aggregation to recommendation engines to sentiment analysis, they all use open source tools to access Twitter data and build applications on top of it.
Advice for completing your application
Find help related to application requirements, such as your letters of recommendation and personal essay, and answers to common questions.
Standardized test scores
TIP: Take whichever standardized test best suits you, and submit SAT Subject Test Scores if you’d like to demonstrate additional areas of academic strength.
You may send in either an ACT or SAT score. There is no preference for either exam.
We review test scores slightly differently between the ACT and the SAT. For students taking the ACT, we use the highest official composite score that you choose to report. For students taking the SAT, we will simply add together your highest scores in the Critical Reading and Math categories to make a higher total score. This is the only area in which we create a “superscore” for applicants. You may use Score Choice to select which official SAT scores we receive.
SAT subject tests are optional. Subject exams, offered in topics such as molecular biology, English and physics, allow students to demonstrate their skill level(s) in a particular area. For instance, a future engineering student may choose to take the chemistry and math subject tests. However, subject tests may also cover topics outside of your potential major. Prospective students who do not submit SAT subject test results will not be penalized; we recognize that many applicants may not have the opportunity to sign up for the exams.
Letters of recommendation
TIP: Letters of recommendation should be from individuals who can best speak to the range of your strengths and abilities.
We require two letters of recommendation.
One letter should come from your high school counselor. This could be your college counselor, guidance counselor, academic advisor, career center specialist or whoever can best speak to your overall high school curriculum and involvement within the context of your high school. The second letter should come from one of your teachers who can address your strengths as a student in the classroom; this recommender should most likely be a teacher from one of your core subject areas, in your junior or senior year.
If you have an additional reference who would like to submit a letter on your behalf, we will accept supplemental letters of recommendation. It is in your best interest that each letter provides new or different information about you.
TIP: Essays are an opportunity – they are one of the few sections of your application that you can manage right now, rather than being dependent on your past performance.
Essay writing is an excellent opportunity for personal expression and original thought. Applicants to Northwestern complete two sets of essays: essays appearing on the Common Application or Coalition Application, and the Northwestern Writing Supplement essay. The suggested word limit guideline gives you the chance to answer each question in detail, while also challenging you to write in a concise and clear manner.
In the Northwestern Writing Supplement, we ask students to explain why they would like to attend Northwestern. This question is intentionally open-ended. You may choose one or several aspects of Northwestern to focus your writing, though the majority of the essay’s content should relate to your own interests or experiences.
TIP: The activity chart is your opportunity to be thorough about the depth and range of your involvement, whatever it may be.
The activity chart is your chance to explain any and all activities in which you’ve been involved outside of your high school classes. Provide as much detail as you can, explaining any abbreviations or acronyms that may be unique to your school. If you have held any leadership positions or received any awards, honors or distinctions, be sure to include that information on the activities chart as well. There’s no “right answer” to what kind of activities we like to see – Northwestern has over 500 different clubs and activities on campus, so we appreciate a very wide range of activities and value diversity of student interests.
TIP: Use the “Additional Information” section of the Common Application to share any information that may have significantly impacted your academic performance or other involvement.
If you have experienced any special or outstanding circumstances that may have interrupted or significantly affected your academic performance in high school, you may write about those in the “additional information” section of the Common Application. If your high school counselor is aware of these circumstances, he or she may also use the Counselor Recommendation to explain this information. Should you have additional circumstances that need to be addressed, you can email a brief summary to firstname.lastname@example.org.
Selecting Early or Regular Decision
TIP: If Northwestern is your first choice for college, applying Early Decision best positions you within a competitive applicant pool. If you’re applying for financial aid, we use the same need-based process for financial aid awards for early decision and regular decision; your aid package will be the same regardless of when you apply.
If Northwestern is your top choice, you are strongly encouraged to consider applying Early Decision. We use the same review criteria for both early and regular decision. Applicants in both cycles are very competitive. Last year we enrolled 49% of our incoming freshman class from early decision.
Northwestern allocates financial aid on the basis of demonstrated financial need. Should you receive an offer of admission, your financial aid (including scholarships) will not differ whether you apply under the early decision or regular decision time frame. Please use our Net Price Calculator to determine your expected family contribution. Northwestern guarantees to meet 100% of the demonstrated need between your expected family contribution and the total cost of attendance.
Interviews and meetings
TIP: Admissions staff members do not conduct interviews, but optional, informational alumni interviews are available in some cities.
Alumni Interviews are an optional component of the application process, available on a limited basis. Alumni feedback is included in your file, but not participating in an interview has no negative effect on your chance of admission. Alumni interviews allow applicants to ask questions, and are primarily informational. Read about alumni interviews and availability.
Once you’ve hit “submit”
TIP: Once you’ve submitted your application, keep an eye on your email account associated with your Common Application or Coalition Application – that’s where we’ll send any important updates regarding your application status.
Congratulations! You’ve completed your application. Sit back and relax. Keep your eye on the email address associated with your Common Application or Coalition Application. If we are missing any of your application materials, you will receive an email from email@example.com. Otherwise, you’ll hear from us with an admission decision, by mid-December for Early Decision applicants, and by the end of March for Regular Decision applicants.