Archive for January, 2010


Adventures in awk

I’ve been working on a literature review lately so I’ve been sifting through tons of articles.  In my search, I’ve come across a few bibliographies of papers on specific topics which were compiled by researchers interested in those topics.  When the bibliography is reasonably small (in the case of a very narrowly defined topic), it’s usually fastest to sift through it by hand to find articles that might be of interest.  However, the most recent bibliography I found contains 8342 papers.  I am definitely not about to print that out and go through it by hand.

This bibliography is available as an EndNote library file and as a Rich Text Format (rtf) document.  Apparently, if you have a recent version of EndNote installed you can use its search features to sift through the data.  However, the file won’t open in EndNote 6, which is what I have on my laptop.  Zotero wouldn’t import it either.  I tried using the built-in search capabilities in Word and even jEdit on the rtf but nothing could provide me with what I wanted.  Basically, I wanted the ability to export entries that matched a given search criterion to a separate file.  Presumably, there are programs that can do this for you, but I don’t know of them and don’t have them installed.

In the end, I decided to convert the rtf to a simple txt file.  This put each entry on its own line.  With each entry occupying a single line in the text file, I just needed some way to search for a given term and then output each line that contains that term.  I have used sed and awk a little bit in the past and I knew that there must be some way to do that with either or both of those, so I looked into their syntax online.  I found this awk tutorial and, using the examples there, I was able to put together a command that does exactly what I needed:

awk '/term/ {print $0}' < bibliography.txt > term.txt

where “term” is replaced by whatever term you want to match.  You can further automate this by putting a bunch of these commands into a shell script or writing a little Perl program that will take a command line argument and insert it as “term” in the command.
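Here is a minimal sketch of what the shell-script version might look like.  The function name extract_terms and the choice to name each output file after its term are assumptions made for illustration, not something from the original command.  It assumes each term is a plain word that is also safe to use as a file name.

```shell
#!/bin/sh
# extract_terms: hypothetical helper, not part of the original post.
# Usage: extract_terms BIBLIOGRAPHY TERM [TERM...]
# For each TERM, writes the lines of BIBLIOGRAPHY that match TERM
# to a file called TERM.txt in the current directory.
extract_terms() {
    bib=$1
    shift
    for term in "$@"; do
        # Hand the term to awk with -v instead of splicing it into the
        # program text; $0 ~ t treats it as a regular expression,
        # just like /term/ does in the one-line version.
        awk -v t="$term" '$0 ~ t {print $0}' < "$bib" > "$term.txt"
    done
}
```

With that function defined, something like `extract_terms bibliography.txt firstterm secondterm` would produce firstterm.txt and secondterm.txt in one go (both terms are placeholders).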

Now I have a list of papers related to all the terms I’m interested in and I’ve got a fast way to search for further terms in the future if I need to.  The approach is a little “awk”ward (groan!) because I have to run it in Linux and I use Windows most of the time.  I no longer have Linux installed as a virtual machine on my laptop and I don’t even have Cygwin installed anymore.  So, I had to upload my text file to one of my research group’s Linux servers, run the scripts, and download the results back to my laptop.  Once I figured out what I needed to do it took me less than half an hour to do it, though, so even if it’s kludgy, it’s still a lot faster than reading through the bibliography manually.

UPDATE: I just realized that all I did was replicate the functionality of grep using awk.  That is, I could achieve the same result with the following code:

grep term < bibliography.txt > term.txt

Additionally, it turns out that you can produce this functionality with sed as well, using the following code:

sed -n 's/term/&/p' < bibliography.txt > term.txt

I guess I missed the fact that I could use grep because I started thinking about using sed or awk before I converted the file to plain text.  Each entry was spread over multiple lines so I was thinking about needing something fairly sophisticated.  I know that grep can do regular expressions but my first thought went to sed and awk, which are like one logical unit in my brain because of the O’Reilly books that cover both.
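If you want to convince yourself that the awk, grep, and sed commands above really do select the same lines, a rough check like the following should do it.  The function name same_matches is made up for this sketch, and it assumes the term is a plain word with no slashes or regex metacharacters, since awk and grep/sed use slightly different regular expression dialects by default.

```shell
# same_matches: hypothetical helper; succeeds if awk, grep, and sed
# select the same lines of FILE for the search term TERM.
# Usage: same_matches TERM FILE
same_matches() {
    term=$1
    file=$2
    dir=$(mktemp -d) || return 1
    # The three commands from the post, with the term passed in.
    awk -v t="$term" '$0 ~ t {print $0}' < "$file" > "$dir/awk.txt"
    grep -e "$term" < "$file" > "$dir/grep.txt"
    sed -n "s/$term/&/p" < "$file" > "$dir/sed.txt"
    # cmp -s is silent and only reports through its exit status.
    cmp -s "$dir/awk.txt" "$dir/grep.txt" &&
        cmp -s "$dir/grep.txt" "$dir/sed.txt"
    status=$?
    rm -rf "$dir"
    return $status
}
```

As an aside, sed -n '/term/p' is a shorter way to write the sed version: it prints matching lines directly instead of substituting each match with itself.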


URL hijacking

I’ve been using the online resources provided by the university library a lot lately.  Mostly I’ve been accessing online journals and downloading pdfs of articles that are pertinent to my work.  Generally, access to these journals seems to be afforded to people on campus by routing the request from the library website through a proxy server that the library maintains.

Unfortunately, I’ve found that basically any URL ending with gets hijacked by the proxy server, or my browser somehow routes the request through the proxy server, if I’m using Firefox.  This isn’t a problem for sites that don’t require authentication but it’s produced a number of problems for sites that do require authentication.  Basically, I can no longer access the online purchasing portal or even the student services portal using Firefox.  This might not seem like a huge problem since I should just be able to use a different browser.  However, MIT doesn’t provide a means for installing personal certificates in Chrome or IE 8 (which are the only other browsers I have installed), so I can’t use those browsers to access sites requiring certificate-based authentication either.  I ended up having to use a different computer altogether in order to pay my student account and register for the Spring semester.

I’ve been in contact with the help desk at the library.  They’ve offered some suggestions but, so far, nothing has worked.  First, they recommended that I check the proxy settings in Firefox to make sure that I didn’t have the library proxy set as my means of connecting to the internet.  I checked my settings and, sure enough, the option selected was “No Proxy”.

They suggested clearing my browsing history in Firefox.  So, I opened Firefox and chose Tools -> Options -> Privacy -> “clear your recent history”.  I told it to remove everything and I selected all the various types of data available to delete – browsing and download history, form and search history, cookies, cache, active logins, and site preferences.  After clearing them I attempted to go to the MIT purchasing portal but the request was redirected again through the library proxy.  I tried typing a few characters of the URL and I found that the address bar still offered a bunch of choices from sites I had visited in the past.  What?  I thought this stuff was supposed to be cleared!

Next, I downloaded CCleaner, a donation-ware program that can search your system and clear out temporary files, cached files, and various things that browsers tend to accumulate.  I ran that and cleared out everything, including my history and cache in IE 8 and Chrome.  I went back to Firefox and, although everything was gone when I chose View -> Sidebar -> History (Ctrl + H) or History -> Show All History (Ctrl + Shift + H), typing a few letters in the address bar still brought up suggestions for sites that I’d visited in the past.  How do I get rid of the stuff in the address bar?

It turns out that the sites that still come up in the address bar are sites I’ve bookmarked.  I had forgotten that the address bar in Firefox searches your bookmarks as well as your browsing history.  So, my history was clear after all and using CCleaner was unnecessary.  That said, it did clear out like 2GB of temporary files, which is a nice little bonus.

I was at a loss about what to do when I had a flash of insight – maybe the problem is related to the DNS cache.  So, I followed these instructions and flushed my DNS cache.  I was certain this would work.  But, it didn’t.

As a last resort, I decided to check my Firefox add-ons.  My first thought was that the problem could be related to NoScript since it’s the most invasive of the add-ons that I use.  However, the problem persisted even when I turned NoScript off.  Then, looking through the list of add-ons I use, I spotted Zotero.  This add-on is for managing journal article and book references.  I have used this add-on extensively during my library research, so I decided to look under the hood.  What I found was an entry in Zotero’s list of proxy settings.

I took a screenshot of this for future reference and then removed the entry.  Voila!  I’m now able to access the purchasing portal, the student services site, etc.

So, if you’re having weird problems with certain sites being redirected through library proxies and you have Zotero installed, Zotero is probably the culprit.  I suggest you start by checking on Zotero’s proxy settings.  If there isn’t anything in the list or if removing items on the list doesn’t fix your problem, try some of the other things I tried.  In particular, I recommend you check Firefox’s proxy settings, followed by flushing your DNS cache.

I hope that someone will find this useful!


My URL-shortening service of choice

As far as I’m concerned, URL-shortening services have one objective: to make URLs short.  I am a lot less concerned with the various features those services might offer than I am with how short they can make URLs.  As far as I can tell, creates the shortest URLs around, so it has become my URL-shortening service of choice.  Not only does it have the shortest possible domain name but it also uses both upper- and lower-case letters as well as numbers as the database key.  This means that it can index more URLs using a given number of characters than shortening services that use, say, only numbers and lower-case letters.
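To put rough numbers on that, here is a small sketch that counts how many distinct keys fit in a given number of characters.  The function name keyspace is made up for illustration, and the alphabet sizes are assumptions based on the description above: 62 symbols for upper-case, lower-case, and digits versus 36 for lower-case and digits only.

```shell
# keyspace: hypothetical helper; prints how many distinct keys of
# LENGTH characters exist over an alphabet of SIZE symbols
# (that is, SIZE multiplied by itself LENGTH times).
# Usage: keyspace SIZE LENGTH
keyspace() {
    size=$1
    length=$2
    count=1
    i=0
    while [ "$i" -lt "$length" ]; do
        count=$((count * size))
        i=$((i + 1))
    done
    echo "$count"
}

# 62 = 26 lower-case + 26 upper-case + 10 digits
# 36 = 26 lower-case + 10 digits
for n in 4 5 6; do
    echo "$n characters: $(keyspace 62 "$n") mixed-case keys vs $(keyspace 36 "$n") lower-case keys"
done
```

With 4-character keys, for example, that works out to 14776336 possible keys for the mixed-case alphabet versus 1679616 for the lower-case one, so nearly nine times as many URLs before the keys have to get longer.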

However, is owned by, which has a lot more visibility as a URL-shortening service.  While owns, they are not separate services that happen to have the same owner; rather, they are highly integrated.  A single user account gets you access to both and  Furthermore, and actually share their database, since and, for example, both redirect to the same site.  This raises the question of why I would want to use a shortening service like when is actually the same but produces URLs that are 2 characters shorter.  In many ways, is the best of both worlds.  Because it’s connected to, it offers a slew of nice features but it produces the shortest URLs around.

Unfortunately, it’s clear from the website that is interested in promoting and not  At some point in the past I went to and grabbed their “Shorten with” bookmarklet, which I put in my bookmark toolbar so I could easily shorten URLs for use on Twitter.  However, this bookmarklet appears to be gone.  If you go to, it offers a “Shorten with” bookmarklet rather than a “Shorten with” bookmarklet.  Doesn’t that seem strange?

Fortunately, there is a simple solution: manually edit the bookmarklet.  It turns out that simply changing in the “URL” field code to is sufficient to change this “Shorten with” bookmarklet into a “Shorten with” bookmarklet.  In addition to making the change in the “URL” field, you probably want to change the name of the bookmarklet just so it’s clear that you’re using rather than

