My BIT 330 Experience

Exercise 15 - Tag Based Sites

by BrianHeM10 (29 Oct 2008 00:40; last edited on 29 Oct 2008 03:01)

Digg vs Slashdot - Social News War


The number of social news sites has grown dramatically during the past few years; some succeed while most fail. Digg and Slashdot are two that have succeeded. Digg has by far the most daily unique hits, but it also caters to most types of news readers. Slashdot is extremely popular as well, but it serves a much more niche market ("News for Nerds"). I am going to compare both sites in terms of the quality of the articles I find, the browsing experience, and any additional features that separate one site from the other.

Digg

The first thing that jumps out at you when you go to Digg is that each article has a number of "Diggs" next to its title. Diggs are basically votes from site readers that help the community know whether an article is of good or bad quality and whether it contains relevant or useless information. The process is explained here. This process gives the site a real democratic/social feel and creates the same sense of impact that drives other social collaboration services and sites. The inclusion of videos and images is also extremely interesting and helpful. I've always been frustrated that, with the exception of the "Popular" videos on YouTube, there is little to no way to distinguish a good video from something that is a waste of my time.

Part of the value of going to a news site is finding information you weren't sure you were looking for. When you use a search engine, you are looking for something specific. But when I go to a news site, I want to be told something random or interesting. Digg definitely delivers on that. I wasted 30 minutes between paragraphs reading about Six Creepy Urban Legends, 10 Foods to Boost My Immune System (and that fix my cholesterol problem), Top 10 Ways Harry Potter Should Have Died in the Deathly Hallows, 6 Best Video Game Costumes for Halloween, The Most Ridiculous Video Game Box Quotes Ever, 7 Actors and 2 Directors Who Should Retire... you get the idea. I didn't even care that they had a "search box" (but apparently I am really attracted to articles that are Top "X" lists). I could find things that interest me so quickly for two reasons: Digg has a fantastic user interface that is easy to navigate, and a rabid fan base that does a great job of voting on content.

The past 10 days have been mind-blowing for the gaming industry (and things will get more intense over the next few weeks) with the releases of Fallout 3, Fable II, and Far Cry 2. Since Fallout 3 comes out today, I searched Digg for Fallout 3. I didn't like how outdated most of the top results seemed: only 1 result on the first page was from the past month. I understand the "Best Match First" ranking, but I think a news site should always take the date into consideration. When I switched to "Newest First", the results were fairly relevant (3 of the results on the first page were random), but all but one was less than a week old. This is crazy considering Fallout 3 is one of the most anticipated game launches of this quarter; there should be at least 10-15 articles from today or yesterday about it. That was a little strange.
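The two sort orders can behave very differently on the same set of results. The Python sketch below is a toy model and does not reflect Digg's actual ranking. It assumes each result carries a hypothetical `match_score` that stands in for "Best Match First", plus a posting date. The article titles and dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    match_score: float  # hypothetical relevance score, higher means a better match
    posted: date


def best_match_first(articles):
    """Order results by relevance alone, ignoring how old they are."""
    return sorted(articles, key=lambda a: a.match_score, reverse=True)


def newest_first(articles):
    """Order results by posting date, most recent first."""
    return sorted(articles, key=lambda a: a.posted, reverse=True)


# Made-up search results for "Fallout 3".
results = [
    Article("Fallout 3 preview", 0.9, date(2008, 6, 1)),
    Article("Fallout 3 review", 0.7, date(2008, 10, 28)),
    Article("Fallout 3 launch day", 0.5, date(2008, 10, 27)),
]

for article in best_match_first(results):
    print(article.posted, article.title)
```

`best_match_first` never looks at `posted`. In this sketch, that is why an old preview with a high match score outranks coverage from launch week.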

Digg is a daily visit for a lot of people I know. Whenever people ask me about it or talk about it, I just nod my head and pretend I actually go to the site. Yet after only a short time playing with the site, I can see why it is so popular. Still, it is much better for finding random entertaining or interesting articles than for use as a search tool.

Slashdot

It seems like Slashdot has been around forever (even though it has not), because as a computer maniac in high school I was always being linked to articles on it. The most interesting thing about the site is that if someone hadn't told me it was a "social news" site, I would never have guessed. Its interface is extremely different from Digg's (much more "newsy" and formal), and it doesn't emphasize the community aspect as much. This is mainly because articles are submitted by members but actually posted by site staff (or so it appears). Slashdot also seems to have a lot less content than Digg, so it may be better used through its RSS feed.

In terms of content, about 90% of the articles in the Developers, Games, Hardware, Mobile, and Tech sections are interesting to me so I won't go into specific articles I like/read. Although there is not a ton of new content every day, whatever is posted is generally very high-level and destined to be referenced by a plethora of bloggers and wanna-be tech news sites. Slashdot is much better than Digg as a news site, but I am really struggling to see the social elements of it.

Summary / Other Odds & Ends

There is no clear winner between Digg and Slashdot, as they seem to serve two different purposes. Slashdot, unfortunately, is a site I can stick into an RSS feed and let it tell me what to read and when. Digg is more entertaining to visit and can be a vacuum for my personal time. Plus, there is one huge, huge thing that separates Digg from Slashdot: Digg has a Microsoft section! Why does Slashdot only have an Apple section? Slashdot is missing out on articles like Windows 7 to be Released in 20 Versions (not true, but a hilarious article).


Exercise 14 - Page Monitors

by BrianHeM10 (26 Oct 2008 04:25; last edited on 26 Oct 2008 05:18)

Sending PricewaterhouseCoopers Through Pipes


I am currently preparing for the first interview of my recruiting process, for a Technology Advisory Intern position at PricewaterhouseCoopers. In class, one suggestion was to use one of the new page monitor tools to keep track of new information about a company, so that is exactly what I did. I will give a brief overview of what Yahoo Pipes is, show how I created my pipe, and comment on the results.

Why Use Yahoo Pipes

Yahoo Pipes is an extremely interesting and powerful tool that lets people take information from across the web and combine it, filter it, and mash it up in almost any way possible. Some examples of cool Pipe creations are Hot Deals Pipe, which lets you enter 1-3 items you want to find good deals on, and People Search, which lets you enter a first/last name and find images and news about that person.

Yahoo Pipes is essentially a more advanced version of FeedRinse, a tool that takes a set of RSS feeds and filters the content based on your interests. FeedRinse works only with RSS/XML pages, while Yahoo Pipes also lets users extract data from HTML pages, Flickr photos, Google Base, and Yahoo Search.

How I Created the PwC Pipe

Creating a new pipe is a fun experience for anyone who likes Legos or just mashing things together. Its user interface is pretty slick and really makes you feel like you have complete control over what you're making. My PricewaterhouseCoopers pipe is extremely simple compared to the ultra-complex creations some people have made. Going beyond what I can do takes time and a strong programming background.

After deciding "What is my objective?" for my pipe, the next step is to decide "Where am I going to get my data from?" I usually use Google Base and Yahoo Search in my pipes, so those were a given. The queries are the same for both (intitle:PricewaterhouseCoopers AND advisory). Google Base will give me links to articles, blogs, and press releases about my topic while Yahoo Search will return basic search results. However, I restricted the Yahoo Search result to not include any result from "pwc.com" because I have page monitors from WatchThatPage and Feed43 to watch for new job postings.

I then found 5-6 RSS feeds related to the Big 4 and PricewaterhouseCoopers. For the Big 4 blogs, I used the "Filter" operator to permit only entries that contain PwC or PricewaterhouseCoopers. Unfortunately, the screenshots of my final pipe do not work.
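The "Filter" step can also be sketched outside of Pipes as a keyword test over feed entries. This is a minimal Python approximation. It assumes each entry is a dict with `title` and `description` fields; those field names are chosen for illustration and are not taken from Pipes. It also assumes that a case-insensitive substring match comes close enough to the permit rule described above. The feed entries are hypothetical.

```python
KEYWORDS = ("PwC", "PricewaterhouseCoopers")


def permits(entry, keywords=KEYWORDS):
    """Return True if the entry's title or description mentions any keyword."""
    # Assumed field names: "title" and "description".
    text = f"{entry.get('title', '')} {entry.get('description', '')}".lower()
    return any(keyword.lower() in text for keyword in keywords)


def filter_entries(entries, keywords=KEYWORDS):
    """Keep only the entries that pass the permit rule, preserving order."""
    return [entry for entry in entries if permits(entry, keywords)]


# Hypothetical entries from two Big 4 blog feeds, merged into one list.
feed_a = [{"title": "PwC expands its advisory practice", "description": ""}]
feed_b = [
    {"title": "Deloitte opens new office", "description": "Local news"},
    {"title": "Audit season", "description": "Notes from PricewaterhouseCoopers staff"},
]
merged = feed_a + feed_b

for entry in filter_entries(merged):
    print(entry["title"])
```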

What I Discovered

My pipe worked perfectly and I already have a lot of good information to read on PwC. I will also have access to any new information between now and my interview date. Here is a partial list of the articles/blogs my pipe returned:

I hope this pipe will help me impress the interviewer with some knowledge of the latest PwC news. I also look forward to using Yahoo Pipes more in the future and learning how to use its more advanced features.


Exercise 6 - Web Directories

by BrianHeM10 (23 Sep 2008 21:27; last edited on 24 Sep 2008 15:48)

Google Directory Leads the Way

Over the last few years, Google's lead over everyone else in the search field has continued to grow. Nevertheless, I stuck with Yahoo Directory when browsing for information. I liked the presentation of Yahoo's directory service, and it usually helped me find information. I never really knew (or cared) that Google had its own version. Yet today, when I compared the results of Google Directory and Yahoo Directory for "Software as a Service", I came away with a lot of useful information from Google and absolutely nothing from Yahoo.


The Query & The Browse

The benefit of directories is to be able to go in without really knowing what you are looking for and find a list of categories and websites that relate to a general idea you have in mind. Thus, the purpose of using these two directory services to search for "Software as a Service" was really to find some categories that relate to SaaS to use for future searches. My decision on which tool was more helpful is based on three criteria: 1) what the directories found when I typed in "Software as a Service", 2) what categories I found as a result of those results, and 3) what I found on my own starting from the home page.

Yahoo Directory Results - Very Poor

Unfortunately, the search results from Yahoo Directory were extremely poor: only 79 listings. The one highlight was that SalesForce.com was the 2nd result. Everything else was somewhat unrelated or inconsistent, and the top three related categories were Application Service Providers, B2B Customer Service Software, and B2B Electronic Data Interchange. It just was not what I wanted.

Google Directory Results - Very Good

Google Directory was on the complete opposite end of the spectrum. There were 636 results and all the results on the first two pages actually had Software-as-a-Service in the title or description. SalesForce was the first result! It is really nice when the leading company in the SaaS industry is the first thing to come up when you search "Software-as-a-Service".

The categories that were attached to some of the first 5-10 results were also very helpful. For example:

Computers > Internet > On the Web > Web Applications
Computers > Software > Business > E-Commerce > Business-to-Business
Computers > Software > Rentable
Computers > Internet > Web Design and Development > Hosted Components and Services
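Each of these breadcrumb paths is a route through a tree of categories, and that tree is what makes browsing one level at a time possible. This small Python sketch builds the tree from the four paths above and shows where they branch. It assumes only the " > " separator shown in the paths.

```python
def build_tree(paths, separator=">"):
    """Turn breadcrumb strings into a nested dict of categories."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split(separator):
            # Reuse an existing child category or create a new empty one.
            node = node.setdefault(part.strip(), {})
    return tree


paths = [
    "Computers > Internet > On the Web > Web Applications",
    "Computers > Software > Business > E-Commerce > Business-to-Business",
    "Computers > Software > Rentable",
    "Computers > Internet > Web Design and Development > Hosted Components and Services",
]

tree = build_tree(paths)
print(sorted(tree["Computers"]))              # ['Internet', 'Software']
print(sorted(tree["Computers"]["Internet"]))  # ['On the Web', 'Web Design and Development']
```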

All of these led me to some really interesting listings of companies and products I had never heard of but that were grouped with companies in the SaaS 20. Fantastic! Searching is not always about finding something you know is out there but do not have in front of you right now. It is also about finding things you did not know were out there at all. Proof:

SmugMug - A really awesome online photo editing/sharing service that sat below Flickr, Webshots, and Photobucket in the "Computers > Internet > On the Web > Web Applications > Photo Sharing" directory. I signed up, played with it, and it actually has a better interface than anything I've used so far!

The Browse Test

After looking over the results of my searches, I went back to the home pages of both directories and started browsing manually, going deeper into the directories one level at a time. Yahoo Directory really lost me once I got into the Computers > Internet area; it was just very difficult to tell where to go next. In Google Directory, I got very deep and was able to find my way back to each of the four categories I mentioned above without going directly there (that is, I reached them through other related categories).

In summary, Google Directory worked like a charm and Yahoo Directory was a sad disappointment. As usual, Google seems to just take the lead over everything Yahoo tries/tried to do.


Exercise 4 - Advanced Search Techniques

by BrianHeM10 (16 Sep 2008 02:51; last edited on 17 Sep 2008 03:00)

Become More Lucky with Advanced Search Techniques

Introduction


Everyone has experienced that moment where they are sitting around and all of a sudden think of a question they do not know the answer to. What is the best type of TV to buy? Who was the longest-living Queen of England? How did Genghis Khan come to power?

Today, the vast majority of people fire up their computer, double-click their favorite web browser, and type a few words into Google in search of an answer. Some people, depending on the mood, "feel lucky" and take the search out of searching. For those of you who do not know what I am referring to, Google offers an "I'm Feeling Lucky" button next to its search button. By clicking it, I am telling Google to take me straight to the first search result that would have come up.

Unfortunately, if that vast majority of people felt lucky pretty often, odds are they would frequently be disappointed with where Google took them. This is because most people type simple queries into Google, for example Genghis Khan or Queens of England. If you are searching for general information about a specific topic, you might end up on its Wikipedia page. However, finding specific information about any topic requires a much more specific query.

All of this raises the question: how can you become more lucky? How do you improve your chances of getting the result you want when you click the "I'm Feeling Lucky" button? Hopefully I can illustrate for you the need for advanced queries and the techniques to create them. There are two sections to this blog post: 1) Simple Queries versus Phrases and 2) Query Operators. For all of the examples, I am going to focus on the subject of dog breeds and assume that I always click the "I'm Feeling Lucky" button (and never see the actual Google results page until afterwards).

Simple Queries and Phrases

As I mentioned earlier, simple queries are searches that are just words entered into the search box. They can be a general term like "cats" or something more specific such as "Black Lab Dogs". These are the searches that the majority of people in the world do. Simple queries sometimes work and almost always give you a page that is somewhat relevant to what you want, but there will be plenty of instances where that is just not enough.

Query: dog breeds | Result: JustDogBreeds.com - This website has some information about dog breeds, but I am really looking for a stronger list of dog breeds with more information about each one such as colors, character traits, pros and cons, etc.

Query: cocker spaniel | Result: [http://en.wikipedia.org/wiki/Cocker_spaniel] - Wikipedia is a great source of information, but if I wanted information from Wikipedia, I would go straight to Wikipedia.

Query: best dog breed | Result: [http://www.glowdog.com/bestdog/] - This is certainly the worst site I was taken to. There is nothing on this site that even talks about dog breeds; the site simply talks about dogs and how to make them the best.

The last example is really the most important of the three because it highlights one of the key issues with simple queries: independent word relationships. When I typed in "best dog breed" (without the quotes), Google searched for the most relevant pages related to best, dog, and breed. It picks the most relevant and highest-ranked pages for "best", "dog", "breed", "dog breed", "best dog", "best breed", etc. If you look at the search results, you will see that only about half of the first ten results have all three words in either the title or the description! Why?

This is because Google, like most search engines, treats words independently unless they are grouped together in a phrase. A phrase is a collection of words bounded by quotes (" " or ' '). Phrases are very helpful for ensuring that the words you enter are looked at as a group. After I re-entered the above queries in Google as phrases, all three of the top results stayed the same, mostly because the queries were only two or three words long. The search results pages are pretty different, though. Now, for "best dog breed", all the [http://www.google.com/search?hl=en&q="best+dog+breed"&btnG=Search Search Results] have those three words in the title. Compare these results:
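A toy matcher shows the difference between independent words and a phrase. This Python sketch only illustrates the two matching rules. It is not how Google actually ranks pages, and the page titles are made up.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_any_word(query, text):
    """Toy simple query: the page qualifies if it contains any of the words."""
    words = set(tokenize(text))
    return any(word in words for word in tokenize(query))


def matches_phrase(query, text):
    """Toy phrase query: the words must appear next to each other, in order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


# Made-up page titles.
titles = [
    "How to make your dog the best dog",
    "What is the best dog breed for kids?",
]
for title in titles:
    print(title, matches_any_word("best dog breed", title), matches_phrase("best dog breed", title))
```

Both titles pass the simple-query test, but only the second contains "best dog breed" as a phrase. The quotes make that distinction.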

Query: the best kind of dog breed | Result: Animal Planet: Dog Breed Selector

Query: "the best kind of dog breed" | Result: Yahoo Answers: What is the best kind of dog breed?

Notice how the first result was a really popular website related to only a few words from the group, whereas the second result was a page 100% specific to what I was searching for. Awesome! This is all because of the quotes: we forced Google to find pages that contain ALL of the words, in that exact order. As a result, there were only three total results (compared to 3,010,000 for the first search).

Query Operators

Query operators give us the real power to ensure we feel super-lucky. We are specifically going to look at query operators that are about different attributes such as the page-title, page-url, and page-domain.

Page Title (intitle:) and Page URL (inurl:)

We can use the intitle: operator to make sure a word or phrase is in the actual title of the page and not just found within it. Note: for all these operators, the word or phrase must come directly after the colon in the operator, no spaces!

Query: cocker spaniel dog reviews | Result: YourPureBredPuppy.com - Cocker Spaniel

This website is really just about cocker spaniels and is more of an overview than a dog review. We could put "cocker spaniel dog reviews" in quotes as a query, but that restricts the search a little too much; in fact, that query returns something completely irrelevant. We can use the intitle: operator to feel more lucky instead.

Query: intitle:"Cocker Spaniel" dog reviews | Result: ReviewCentre.com - Cocker Spaniels

This website is dead-on with what I was looking for! The Page URL operator (inurl:) serves the same purpose, except that it searches within the URL instead of the page title.
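The intitle: and inurl: operators can be modeled as extra conditions on a page's title or URL. This is a minimal Python sketch. It assumes pages are simple records with a title, a URL, and a body. The example.com pages are hypothetical, and the checks are plain substring tests rather than Google's real matching.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    body: str


def intitle(term, page):
    """True if the term appears in the page title (case-insensitive substring)."""
    return term.lower() in page.title.lower()


def inurl(term, page):
    """True if the term appears in the page URL (case-insensitive substring)."""
    return term.lower() in page.url.lower()


def search(pages, title_phrase, words):
    """Pages whose title contains title_phrase and whose text contains every word."""
    hits = []
    for page in pages:
        text = f"{page.title} {page.body}".lower()
        if intitle(title_phrase, page) and all(w.lower() in text for w in words):
            hits.append(page)
    return hits


# Hypothetical pages.
pages = [
    Page("Cocker Spaniel Reviews", "http://example.com/reviews/cocker-spaniel", "Owner reviews of this dog"),
    Page("Dog Breeds Overview", "http://example.com/breeds", "Cocker spaniel reviews and other dog info"),
]

# Roughly models: intitle:"Cocker Spaniel" dog reviews
print([p.title for p in search(pages, "Cocker Spaniel", ["dog", "reviews"])])
print(inurl("reviews", pages[1]))  # False
```

The second page mentions cocker spaniel reviews in its body. It is dropped only because the title condition fails, and that is the job the intitle: operator does.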

Site Domain (site:)

The final, and perhaps most important, operator relates to the site domain. Generally, websites whose domain names end in .gov, .org, or .edu have the most reliable results. When you want the information you find to be as correct as possible, use the site: operator to ensure only those sites are returned to you. Keep in mind that Wikipedia.org still shows up under .org even though it is not always accurate. Here is a great example, though:

Query: number of pets in america | Result: $40 Billion Spent on Pets

This is certainly a number and is certainly about pets... but it's not what I am looking for. Let's see what happens if I add site:gov.

Query: site:gov number of pets in america | Result: Census.gov - Pet Ownership

Once again, this is exactly what I wanted. However, the US Census is typically a common place for statistics so I decided to try to get a similar result without using the site: modifier. It did not go so well.
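The site: operator restricts results by domain. This rough Python model uses only the standard library's `urllib.parse.urlparse`. It assumes site:gov should match any host ending in .gov, and that site:umich.edu should match umich.edu and its subdomains. The URLs are illustrative.

```python
from urllib.parse import urlparse


def in_site(url, site):
    """True if the URL's host is the given domain or a subdomain of it."""
    # Assumption: site: behaves like a domain-suffix match.
    host = (urlparse(url).hostname or "").lower()
    site = site.lower().lstrip(".")
    return host == site or host.endswith("." + site)


def restrict(urls, site):
    """Keep only the URLs that a site: restriction would allow."""
    return [url for url in urls if in_site(url, site)]


urls = [
    "http://www.census.gov/some-page",
    "http://www.example.com/pets-spending",
]
print(restrict(urls, "gov"))  # ['http://www.census.gov/some-page']
```

The check matches on a leading dot, so a host such as notumich.edu does not count as part of umich.edu.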

Query: us census number of pets in america | Result: $40 Billion Spent on Pets

I got the same result as I did without "us census" in the query! Interesting discovery.

Conclusion

In closing, I hope that after reading this it is clear that just typing in words and hitting enter will not always provide you with the result you want. You may get it eventually after trying out a few different word combinations, but why waste time when there are ways to improve the accuracy of Google's results? In the interest of space, I did not go too in-depth on all the different modifiers and operators for queries. If you want more information, please visit: Search Techniques and Strategies


Exercise 2 - Web Search

by BrianHeM10 (10 Sep 2008 16:51; last edited on 16 Sep 2008 02:51)

Web Search Exercise

Introduction

When someone wants to search for something online, what do they do? They don't search for it, they "Google" it. Google has increasingly become the default search engine for people across the world, with a presence in desktop applications, mobile phones, instant messaging programs, etc. There was a time when people would consider using Yahoo, Ask.com, AltaVista, and a few other major search engines when trying to find information. Now, most searches end after one Google search or a few.

This exercise introduces us to the crazy idea that Google is not the best at everything and, by leveraging the core strengths of multiple search tools, there are ways to improve our searching efficiency by thinking outside the gBox. Also, even within the Google search engine, the vast majority of searches are "simple queries" - a few words without any sort of advanced search syntax. This exercise opened our eyes to these more advanced ways of finding information within Google and Yahoo.


Learning Advanced Search Syntax

Although most people do not know this, when one types 'university of michigan' (without the quotes) into Google, Google really searches for anything that has the words University, Michigan, or University and Michigan. Most results will come back related to our University, but there will still be results (especially past page 1) that relate to Michigan State University, Central Michigan University, Eastern Michigan, etc. Hence, Google returns over 25 million results for that search! There must be a way to control these results... and there is. Below are some of the new ways I can search for information:

  • Search for X but Not Y (X -Y)
    • If I wanted to search for cheese but I hate cheddar cheese, my query could be "cheese -cheddar"
    • Returns results related to cheese that do not have cheddar in the result
  • Search X + Words Similar to X (~X)
    • If I wanted to search for different kinds of wood doors, my query could be "~wood doors"
    • Returns results with doors + any word synonymous with wood, such as oak, birch, etc
  • Search X in Page Title (intitle:X)
    • If I wanted to search for websites related to granite that contained 'rocks' in the page title, my query would be "granite intitle:rocks"
    • Returns results for websites related to granite that have "rocks" in the page title
  • Search X in Page URL (inurl:X)
    • If I wanted to search for websites related to speakers that contained 'Logitech' in the page URL, my query would be "speakers inurl:logitech"
    • Returns results about speakers that have Logitech in the URL (e.g. http://www.cnet.com/reviews/logitechz5500.html)
  • Search X within site Y (X site:Y)
    • If I wanted to search websites about the Michigan Union but only wanted sites related to umich.edu, my query would be "union site:umich.edu"
    • Returns results about the Michigan Union that only have umich.edu in the URL

The intitle:, inurl:, and site: operators have been helping me a lot in the last few days. About 50% of my searches now use at least one of those three. They really help when you have a more specific idea of the type of page you are looking for. For example, I often use CNet and Tomshardware for hardware reviews, but I couldn't search both sites at once. Now I can type 'Intel Q9550 site:cnet.com OR site:tomshardware.com' to get information about that processor from only those two websites.
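The operators in the list above are plain text, so queries can be assembled with simple string formatting. The Python helpers below are hypothetical conveniences, not part of any Google API. They rebuild the example queries from this post. Wrapping multi-word terms in quotes is an assumption based on the usual phrase syntax.

```python
# Hypothetical helper functions for building query strings; not a Google API.


def phrase(term):
    """Wrap multi-word terms in quotes so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term


def exclude(query, *terms):
    """X -Y: drop results containing any of the excluded terms."""
    return " ".join([query] + [f"-{t}" for t in terms])


def similar(word):
    """~X: also match words similar to X."""
    return f"~{word}"


def intitle(term):
    return f"intitle:{phrase(term)}"


def inurl(term):
    return f"inurl:{phrase(term)}"


def sites(query, *domains):
    """Limit results to one or more domains, joined with OR."""
    return f"{query} " + " OR ".join(f"site:{d}" for d in domains)


print(exclude("cheese", "cheddar"))                 # cheese -cheddar
print(f"{similar('wood')} doors")                   # ~wood doors
print(f"granite {intitle('rocks')}")                # granite intitle:rocks
print(f"speakers {inurl('logitech')}")              # speakers inurl:logitech
print(sites("union", "umich.edu"))                  # union site:umich.edu
print(sites("Intel Q9550", "cnet.com", "tomshardware.com"))
print(exclude("Gerald Ford", "automotive", "cars"))  # Gerald Ford -automotive -cars
```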

Exploring Web Search Engines

The next part of the exercise was to compare the results of the three largest search engines: Google, Yahoo, and Microsoft Live Search. For the exercise, we tested the results of Google/Yahoo/Live Search when querying "Climate Change". My observations:

  • The first result for both Google and Yahoo was EPA.gov; the first result for Live.com was the Wikipedia entry for climate change
  • Second result for Yahoo was its own News service section about climate change
  • Live's and Yahoo's presentations are much busier than Google's

Exploring Blog Search Engines

Next, we tested and compared the three largest blog search tools: Technorati, Google Blog Search, and Bloglines. I had heard of Technorati and Bloglines before but never used them. Our search query was once again Climate Change. My observations:

  • Technorati's first result about climate change related to polar bears and the ice caps - I like this topic. Seems to rank results by most recently posted as opposed to most linked / most read (Google)
  • Google's first result is strictly about climate change - ranking results by relevancy and apparently by popularity. Note: includes "related blogs" that appear to be directly about the search
  • Bloglines' first result is pretty random; it just happens to have climate change in the body. Results can be ranked by date, relevancy, or popularity.

Technorati's presentation was very nice and has more of a portal/news site look compared to Google Blog Search which is strictly a search-find site. I can now go to Technorati to find blog posts similar to how I can go to New York Times.com to find articles posted about sports. Bloglines has yet to really satisfy me with its results.

Exploring Other Search Engines

To further break the idea that Google's web search solves all problems, we explored some other types of search engines, both within and outside of Google:

  • Google News returns results from newspapers, magazines, and news sites
  • Clipoid (powered by Google) returned primarily YouTube results. Why use Clipoid over Google Video? It is useful for searching video clips.
  • Google Images returns images with climate change in the file-name.
  • Yahoo Directory returns lists of sites or companies that relate to a very specific set of information. The benefit is that you can narrow your search down to a niche topic and get a set of results you know are 100% relevant

I've known Yahoo Directory existed for a while, but never understood its usefulness until now. It's helpful to have a tool that lets you go in without having a strong idea about what you are looking for.

Gerald Ford Exercise

For the final part of the exercise, we used a few different search engines to compare the results of searching for Gerald Ford:

  • Yahoo Directory - Related categories are US Presidents and 20th Century History
  • Yahoo returns Gerald Ford's wikipedia page and mostly information about his presidency (16m results)
  • Google returns 2.3m results
  • Adding -automotive -cars to the query brings Yahoo's results down to 12.9m and Google's to 1.8m

Conclusion

This exercise was surprisingly interesting and very useful. As Professor Moore told us when the class first started, there is more to searching than typing "xxxxx" into Google and rinsing/repeating if necessary. There are some really helpful search syntax techniques that are often overlooked. Also, I had heard of but never used Technorati or Clipoid before, so it was nice to try them out. Considering how much I learned in the class's first exercise (especially for someone with my technical background), I am excited about what is in store for future classes.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License