Raam Dev’s Weblog

The power of knowledge is not realized until that knowledge plays a key role in bridging an otherwise impossible gap.

HTML Radio Buttons: A blast from the past!

So there I was, sitting in my C/Unix class at Harvard and barely paying attention to the professor as he talked about HTML forms (!), when I heard him start talking about the history of the HTML radio button. I had often wondered why they were called “radio” buttons, so I shifted my attention and listened.

He started by trying to explain to a room full of people a third his age that car radios did not always have tiny touch-sensitive buttons. They used to have a row of mechanical buttons, and pressing one in would make the others pop out (much like the buttons on old cassette Walkmans).

This little fact fascinated me because I have been using HTML radio buttons for so long and until now, I have been so oblivious to the history behind their name. A quick search on Wikipedia confirmed my professor’s story:

A radio button or option button is a type of graphical user interface widget that allows the user to choose one of a predefined set of options. They were named after the physical buttons used on car radios to select preset stations - when one of the buttons was pressed, other buttons would pop out, leaving the pressed button the only button in the “pushed in” position.

Googlebot Relentlessly Using Bandwidth

When one of my hosting clients complained about continuously running out of bandwidth on his low-traffic site, I took a peek at the access logs and discovered that Googlebot was indexing every single possible day on a simple calendar addon for the phpBB2 forum software installed on the site. (Googlebot is the program that crawls the web indexing everything so you can search for it using Google.)

The logs showed thousands of Googlebot requests for the forum calendar:

66.249.71.39 - - [01/Sep/2008:17:09:12 -0400] "GET /forums/calendar.php?m=7&d=21&y=1621&sid=79b643b30eer7140adcd2ba76732688a HTTP/1.1" 200 44000 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.40 - - [01/Sep/2008:17:09:33 -0400] "GET /forums/calendar.php?m=4&d=2&y=2188&sid=e4da1ee0a488096e3897a8f15c31cea2 HTTP/1.1" 200 43997 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.40 - - [01/Sep/2008:17:09:44 -0400] "GET /forums/calendar.php?m=12&d=4&y=1624&sid=cc5d5084d158457ce3c7a9d38263f553 HTTP/1.1" 200 44076 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.41 - - [01/Sep/2008:17:10:05 -0400] "GET /forums/calendar.php?m=10&d=15&y=1621&sid=a4e8af0d20715g965b3e616ae6f95004 HTTP/1.1" 200 43751 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.41 - - [01/Sep/2008:17:10:15 -0400] "GET /forums/calendar.php?m=9&d=13&y=2187&sid=80c79b2491ddf3d8d46076d48a6282d1 HTTP/1.1" 200 43896 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.40 - - [01/Sep/2008:17:10:26 -0400] "GET /forums/calendar.php?m=5&d=30&y=1618&sid=f0619ba6517an57bcd6a7e9ca6289a32 HTTP/1.1" 200 43820 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.39 - - [01/Sep/2008:17:10:38 -0400] "GET /forums/calendar.php?m=11&y=2189&d=30&sid=97c0a58bbd2b3914dbf255ea0a2b1a4c HTTP/1.1" 200 44107 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
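
To put a number on how much bandwidth a crawler is using, a small sketch like the one below could total the hits and bytes for one path. It assumes the log uses the Apache combined format shown above (request path in field 7, response size in field 10). The function name googlebot_usage and the log path in the usage comment are made up for illustration.

```shell
# Summarize Googlebot requests for one path prefix in a combined-format log.
# Usage (hypothetical log path):
#   googlebot_usage /var/log/apache2/access.log /forums/calendar.php
googlebot_usage() {
    # $1 = log file, $2 = path prefix to match against the request path
    awk -v prefix="$2" '
        # Count only lines that mention Googlebot and whose path starts with the prefix
        /Googlebot/ && index($7, prefix) == 1 { hits++; bytes += $10 }
        END { printf "%d hits, %d bytes\n", hits, bytes }
    ' "$1"
}
```

Matching on the user-agent string alone trusts whatever the client claims to be, which is good enough for a rough bandwidth estimate but not for proving a request really came from Google.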

A quick Google search turned up many others who’ve had the same problem:

Just found exactly the same on one of my client’s sites. They were complaining that despite being a small site, they’d apparently used all of their bandwidth within 4 days.

They had one of these PHP calendars on their site, where you click the day and it tells you what’s on. Googlebot had tried to index EVERY SINGLE POSSIBLE DAY. And, in the first four days of September, had used up all this site’s bandwidth, clocking up an impressive 19,000 hits and 800MB of bandwidth.

You can use robots.txt to tell all decent robots to push off. I’ve just done that. Let’s see if it works!

So I added a file to the root web directory for the site and named it robots.txt. Inside, I put the following:

User-agent: *
Disallow: /forums/calendar.php

Sure enough, the next time the Googlebot came through it ignored /forums/calendar.php and didn’t use up ridiculous amounts of bandwidth indexing something that need not be indexed.

I can’t blame the Googlebot though. It was just doing its job. The fault goes to the creators of the calendar addon. What they should have done was add rel="nofollow" to all the links in the calendar. Adding the nofollow attribute to an individual link asks Googlebot not to follow it. Google started using nofollow as a method of preventing comment spam back in 2005.

Google Reported Attack Site

I’m sure some of you must have seen this warning when you tried to visit my site. Fear not, I have fixed the problem. There was an old file on my domain that had a link to a site that was defined as “malicious” by Google, so they basically added my entire domain to the watch list. I removed the file and, after asking Google to check my site again using Google’s Webmaster Tools, they removed my domain from the list.

So, how did I find the few pages (among thousands of files on my site) that contained a link to the malicious site Google was blocking me for? I logged into my site via SSH and ran a command like the following:

for i in $(find . -name "*.ht*"); do echo "$i"; grep '195\.2\.252' "$i"; done

This basically searched every single .htm or .html file inside my public_html directory and returned anything that contained the IP address I was looking for. Whenever there was a match, the filename that preceded the output was the offending file. I’m sure there’s a more elegant way of doing this, but hell, I just wanted to fix the problem!
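
A tidier variant of the same search might look like the sketch below. It lets find hand the files straight to grep, so only the names of matching files are printed and filenames with spaces are handled. The function name find_matching_pages is hypothetical, and grep's -F option treats the search text as a plain string, so the dots in the IP address need no escaping.

```shell
# List .htm/.html files under a directory that contain a fixed string.
# Usage: find_matching_pages . '195.2.252'
find_matching_pages() {
    # $1 = directory to search, $2 = literal text to look for
    find "$1" -type f \( -name '*.htm' -o -name '*.html' \) \
        -exec grep -lF -e "$2" {} +
}
```

Unlike the "*.ht*" pattern in the loop, this only looks at files ending in .htm or .html, so files such as .htaccess are skipped.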

Although this was annoying to deal with, it made me feel good that Google is actually keeping track of these things and, with the help of Firefox, is warning people of such sites. Site owners must be vigilant in fixing such problems or they risk losing loads of traffic from Google (and from visitors with Firefox).

  • One of the downsides of being organized and having a clear picture of everything you need to do is that you realize just how much stuff you need to do! I’ve been using David Allen’s GTD method a lot lately, along with the help of an application called OmniFocus, and I can’t believe how much stuff I have written down since I started using this method. I have 300+ items in OmniFocus. Try to imagine the relief and freedom I have given my brain by allowing it to let go of trying to remember (subconsciously) 300+ items and you’ll begin to see why the GTD method is so powerful. (2)

Escaping Filename or Directory Spaces for rsync

To rsync a file or directory whose name contains spaces, you must escape the spaces for both the remote shell and the local shell. I tried doing one or the other and it never worked. Now I know that I need to do both!

So let’s say I’m trying to rsync a remote directory to my local machine and the remote directory contains a space (oh so unfortunately common with Windows files). Here’s what the command should look like:

rsync 'raam@example.com:/path/with\ spaces/' /local/path/

The single quotes protect the space (and the backslash) from my local shell, and the backslash escapes the space for the remote shell.
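
If the remote path lives in a variable, the backslashes can be added with a small helper. This is only a sketch: the function name remote_spec is made up, and it escapes spaces only, not other characters the remote shell might interpret.

```shell
# Build a "host:path" argument with spaces backslash-escaped for the remote shell.
# Usage:
#   rsync -av "$(remote_spec raam@example.com '/path/with spaces/')" /local/path/
remote_spec() {
    # $1 = user@host, $2 = remote path that may contain spaces
    printf '%s:%s\n' "$1" "$(printf '%s' "$2" | sed 's/ /\\ /g')"
}
```

The double quotes around $(...) in the usage line do the local-shell half of the job, and the helper does the remote half. rsync also has a -s (--protect-args) option that is meant to keep the remote shell from interpreting the path at all, and whether you need either trick may depend on your rsync version, so it is worth checking the man page.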

Comment Threading Coming to WordPress 2.7

It looks like the WordPress commenting system will finally be overhauled in v2.7, including support for threaded comments! If there is one thing I wish were done better in WP, it’s definitely the comment system. After all, second only to the actual blog content itself, comments and discussion are the most important aspect of a blog.

Intense Debate, the makers of software which adds commenting features like threading, reply by email, voting, reputation, and global profiles to WordPress and other blogging software, was just acquired by Automattic, the company behind WordPress. I think this goes to show how serious the improvements coming to the commenting system will be.

Matt Mullenweg, the founding developer of WordPress, has a post about the acquisition and talks about what it means for WordPress.

  • I’ve been consistently getting rid of stuff that I don’t need or that I rarely use. I’m going to continue doing so until I’m down to the bare minimum. Then I’ll implement a personal rule that prohibits me from taking on more stuff unless I absolutely need it or unless I get rid of something else first. Not only have I been doing this with material stuff, but with mental stuff too. It’s a freeing experience to just let go of stuff. I realize how unnecessary certain things were after they’re gone. (0)

A Week of Big Decisions

This past week my head has been full of big decisions. Late last week, I was given the opportunity to make a big chunk of cash for three hours of work if I was willing to accept a slight (although real) risk to my health by removing asbestos from an old basement boiler. I needed the money, but in the end I realized that trying to put a price on my health was just stupid.

Then yesterday, the first day of spring, I signed the paperwork to file for Chapter 7. The decision had already been made, but it was a big event in my life nonetheless.

And lastly, since a huge portion of my income currently goes to rent, I’ve been trying to decide what I can do to decrease that expense considerably. I have been crawling Craigslist looking at what’s out there and then checking the MBTA’s website to determine commuting distance and cost.

Since I’ve accepted that I’ll probably need to spend three hours a day commuting (total), I have even checked out a few places near the ocean (I love the ocean). However, the more time I spent looking for an apartment and planning my potential commute, the further from my original goal I unconsciously drifted. The original goal was to decrease expenses and to save as much money as possible. I’m still making my decision on where to move, but it’s going to be an important one that will have the potential to save me a lot of money.

xkcd: Tones

Amen!

A Unique Perspective: College Campus

Today marks the first time in my twenty-six years that I have spent time inside a college classroom and on a college campus. (There was one time I attended an Indian classical music concert with my dad at MIT, but I was young and barely remember it.) I’m taking the Introduction to C/Unix/CGI Programming class at Harvard Extension. As I walked around campus on the first day of class, I very quickly observed how different things felt from the “normal world”.

My perspective is probably somewhat unique in that I have been around business for as long as I can remember. My parents have always owned their own business, and I myself just went through being a landlord for a few years and then lost all three of my houses to foreclosure. I also had my own consulting business going for a while. Being home-schooled my whole life also meant that I saw nothing of the public school system.

The atmosphere of being on campus felt very unfamiliar to me — almost alien. I was only a few hundred feet from streets I had driven on every day for the past few years and yet I felt as if I was on a different planet. It’s a hard feeling to describe. As I stood there looking around, I could almost fool myself into believing I was in the middle of a utopian alien society where everything was about peace, harmony, learning, and knowledge. (Then I turned around, looked across the street, and saw all the money-hungry shops trying to buy your soul. I was quickly reminded that I am, in fact, still on Earth. Damn.)

As strange and different as the atmosphere felt, it also felt relaxing, as if there was nothing to do except learn and relax. I was able to walk into a building and have instant access to dozens upon dozens of computers just waiting for me to log in and start using them for whatever constructive thing I needed to do. Everything outside was clean and there were plenty of benches and places to sit.

But my perspective is flawed. The company where I work is paying for the class and I’m sure things wouldn’t feel quite as “free” and relaxing if I had student loans riding on my back. But that should say something for the current system. Imagine what society would be like if all education was free. Imagine the atmosphere it would create. People learning because they want to learn and because they can learn. Not learning because they want to make money and get an awesome job to pay off their student loans. No, learning because they want to create, explore, and evolve. Learning because it’s fun. Learning to learn.

  • It’s frustrating when I’m looking for a good FOSS app and the best one I can find was developed by non-native English-speaking developers. (1)
  • One of the new features of iTunes 8 is called Genius. It basically allows you to select any song in your library, click the Genius button, and it will generate a playlist of songs from your library that it thinks go great together. I was a bit optimistic at first, but holy crap! I tried it and I suddenly feel like I have a real live DJ picking songs for me to listen to! I feel like I’m rediscovering all the music in my collection! (0)
  • How to become an astronaut is an interesting article. It describes the risks to consider (which definitely made me reassess my desire for space travel) and the kind of education path required to apply. (0)
  • I rarely read the articles posted on Slashdot. Instead, after reading the article summary, I go straight to the user comments for interesting information and true tech humor. Take for example the first comment on the Apple Losing Touchscreen War post by ardor: Steve will fix it, don’t worry. Steve Jobs is not a human with a reality distortion field, Steve Jobs is a reality distortion field with a human body inside. (0)
  • As of late, I’ve been spending more of my free time reading and doing yoga/stretching. One of my 2008 New Year’s resolutions was to read one book a month. Since that definitely hasn’t happened, I thought it made the most sense to make my first cover-to-cover read for this year a book about getting things done: Getting Things Done by David Allen. (0)