Death Before Birth - The Life Cycle of a Search Engine
by Aaron Wall
"Nothing lasts forever but the Earth and sky." - from Dust in the Wind by Kansas.
In the past decade search has gone from not existing, to general file matching, to machines so complex that few people can fully understand how they work. As they evolve in complexity, it is certain there will always be a new and better way to restructure the world's largest data set. As old horses such as AltaVista limp toward extinction, Google, now 5 years old, is a seasoned veteran in the game.
Google has spent the last year evolving from a search engine to a giant media corporation. At birth Google was worried about only one thing, search - and that focus is why it became so successful. As Google spreads out many are wondering, are they doing it too fast? Are they letting quality slip? As any company evolves it will make mistakes, but has Google lost sight of its goals?
Google gained large distribution when Yahoo switched to it to drive down Inktomi's stock price, only later to buy Inktomi at $1.65 a share. Google then came to power by stealing traffic from Yahoo with clean, relevant results and good search tools for surfers and webmasters alike. Many estimate that Google controls in excess of 75% of the search market. The problem is that the ideals that gave them this power appear to be fading.
Microsoft is still working on its search engine. Yahoo recently acquired Overture and is working behind closed doors in much the same way as MSN is. LookSmart has not updated WiseNut in a long time, is hated by a large portion of the internet community, and is soon set to lose most of its distribution. And Ask Jeeves (owner of Teoma) has its top results powered by Google AdWords. With Google powering Yahoo, AOL and many other sites, Google lacks a clear competitor today.
Right now Google can take its huge lead and extend it, or let it slip. No system is perfect and there will always be complaints, but I have to wonder if Google has forgotten why or how it became such an icon.
Google grew to popularity by organizing the web based on links. They used PageRank to perform an empirical analysis of web link structure. Many people have reported that their Google Toolbar fails to return PageRank 90% of the time. While the hysteria around PageRank is somewhat overblown, it would be more reassuring if the feature worked more reliably. In fact, it is not just the toolbar which is broken.
As a search engine's distribution grows, more and more people want to take free money from it. Top Google listings may be worth thousands of dollars for some phrases. Over a year ago, articles such as Google Degrades, Geek's Aghast appeared in top web magazines such as Wired. With few alternatives available, many people get frustrated when they see 404 error pages at the top of search results.
Spammers create link networks to manipulate PageRank. The single biggest flaw with PageRank is that it counts every link as a vote. A link is not always a vote.
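To make the "link as a vote" problem concrete, here is a minimal sketch of the textbook PageRank iteration in Python. It is not Google's production algorithm; the damping factor of 0.85, the page names, and the 20-page link farm are all assumptions chosen for illustration. It compares the score of a page called "spam" before and after a network of throwaway pages starts linking to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page name to the list of pages it links to.
    Returns a dict of page name -> score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share: one "vote".
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: nobody links to "spam".
honest_web = {
    "news": ["blog", "shop"],
    "blog": ["news"],
    "shop": ["news"],
    "spam": ["news"],
}

# The same web plus a hypothetical farm of 20 pages that all link to "spam".
link_farm = {f"farm{i}": ["spam"] for i in range(20)}
manipulated_web = {**honest_web, **link_farm}

before = pagerank(honest_web)["spam"]
after = pagerank(manipulated_web)["spam"]
print(f"spam score before the farm: {before:.4f}")
print(f"spam score after the farm:  {after:.4f}")
```

None of the farm pages has a single inbound link, yet their combined "votes" raise the spam page's score even though the total score is now shared among six times as many pages.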
Some types of information are generally link heavy. Weblogs, for example, generally consist of a short entry and a few links. These small entries are frequently created by people with a unique spin on the world. After a few colleagues link in, suddenly these ideas can be misrepresented as worldwide views. Natural blogging is not the only thing degrading search results, though.
Some blogging software (such as Movable Type) supports inline comments. People can thus comment on a popular blog and pass PageRank through to their own site. I am a longtime reader of Steven Berlin Johnson. He recently celebrated his "1 year blogiversery". He linked back to his original post and spammers responded kindly. Again his popularity has earned him more visits and entries by the dubious Lolita and Mr Viagra. Software such as MT Blacklist aims to stop this abuse, but the fact that the abuse is so widespread is a sign of the weakness of the search engines.
While dealing with this massive abuse, Google must find ways to pay for the software, hardware, and engineers to power around 300,000,000 searches each and every day. With a distribution network that large, changes of any kind are not taken lightly.
Earlier search engines used metrics that ignored technological evolution and financial responsibility. They did not care if they lost money. While the financials and technology have improved, one of the biggest problems search engines face today is a lack of quality content on the web.
Earlier this year Google introduced a program called AdSense, which displays its pay per click AdWords ads on many mid-sized web sites. AdSense was designed to help pay for the production of better content sites (and thus, better search results). While still in its infancy, the AdSense program has shown many flaws.
Soon after Google introduced AdSense, they included a related searches link set underneath the ads, which made webmasters angry. This technique was siphoning off traffic from websites back to Google with no payment of any kind. Google quickly had to reverse this move.
Google has also bound its AdSense members to a gagging clause. Beyond that gagging clause, many have complained about getting kicked out for reasons they did not know and could not even challenge. Then, to see how much money Google owed them up to that point, these same members had to agree to another set of terms that prevented them from criticizing the AdSense program. But the ads get worse.
Google was in a race with Overture to be the first to provide broad matching on its search terms. Google got there first. The idea behind broad matching is that it will allow Google to sell more of its ad space by providing ads on similar terms that were not yet sold. Overture allows different bid prices on different levels of matching. Google sets a single price on the ads, and this causes a huge problem for those who do not know how to use the system.
First, the broad matching ads are less relevant, which runs in the exact opposite direction of Google's roots. This new type of broad matching matches many remote search phrases to those paid for by the advertiser. Couple the near matching idea with the fact that AdSense sites and pages are scanned for relevancy (which is frequently inaccurate), and a big problem starts to emerge: how to provide relevant ads on the syndicated results.
While many of the SEO experts know how to use negative keywords, tracking, and other advanced features, the smaller advertisers do not always have the resources or understanding to effectively use this new, more complex medium.
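The difference between exact matching, broad matching, and negative keywords can be sketched in a few lines of Python. This is a deliberately simplified model, not how AdWords or Overture actually match queries: here "broad match" is assumed to mean only that every word of the purchased keyword appears somewhere in the query, and the function names and sample queries are hypothetical.

```python
def exact_match(keyword, query):
    """The ad shows only when the query is the purchased phrase itself."""
    return keyword.lower().split() == query.lower().split()


def broad_match(keyword, query, negatives=()):
    """Simplified broad match (an assumption for illustration).

    The ad shows whenever every keyword word appears in the query,
    in any order, unless the query contains a negative keyword.
    """
    query_words = set(query.lower().split())
    if any(negative.lower() in query_words for negative in negatives):
        return False
    return set(keyword.lower().split()) <= query_words


keyword = "exercise bikes"
queries = [
    "exercise bikes",
    "cheap exercise bikes",
    "free pictures of exercise bikes",
]

for query in queries:
    print(
        f"{query!r}: exact={exact_match(keyword, query)}, "
        f"broad={broad_match(keyword, query)}, "
        f"broad with negative 'free'="
        f"{broad_match(keyword, query, negatives=['free'])}"
    )
```

In this toy model the advertiser who never adds the negative keyword pays for clicks from people looking for free pictures, which is the kind of waste a smaller advertiser may not know how to prevent.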
Now, instead of rewarding businesses for hunting out the phrases that exactly match them (and thus providing higher quality search results), Google is rewarding the largest companies by allowing them to be lazy. Google is shooting itself in the foot by degrading relevancy for short term profits. In the long run, fewer advertisers will eventually bring in less revenue.
Some large companies bid on generic cover-all terms at prices which lose money on every transaction until the competition goes under. Not only do these name brands enjoy higher click through rates (due to brand recognition), but those with stockpiles of cash can afford to burn through thousands without a blink. Many small sites cannot, so the sad state of internet media is that it is now consolidating in much the same way as offline media.
It gets even worse for the small website now though. These same ads which they are using may now appear on pages that sometimes do not even remotely fit the ads. I was looking at the GMT clock time zones to ensure my clock was set at the right time today (so I missed daylight saving time by a few days). The page which had the different time zones listed a few US cities and a group of AdWords. Most of these AdWords were targeting Las Vegas (most likely the most expensive US city). Las Vegas was not even one of the cities mentioned on the list. Not a relevant ad set. Bad for all parties involved.
What is the result of this change to AdWords? Lower quality ads at a higher price. Nick Denton predicted that as these ads expand across the web (especially coupled with decreased relevancy), users will start to ignore them. Much the same way as banners have faded, this advertising medium may have only a few years left before it chips away at and destroys itself.
What about the regular Google search results? At least they are strong, right? Sometimes they are rather weak. Some clients have had search results dominated by the same sets of interlinking sites. Aiming to fight spam, Google is acknowledging this fact by rolling out a major new algorithmic change live on the web. My site went from #17 to #7 to not in the top 1000 websites for "search engine marketing". While I still have customers that need work done on their sites, this sporadic remixing is not refreshing in my mind, or in the minds of many of my customers.
I have already had concerned emails arrive from friends worried about losing thousands of dollars a month as their top listings evaporated. All I can tell them is to wait and see.
Much of what people have feared would happen to Google after it goes public has already happened. As they are just about to go to their IPO it is clear that today Google is not as strong as it once was. Can any company employ enough people to provide long term quality results?
All of these problems exist, and Google has yet to go public. Shareholders are frequently short sighted. Balancing profits, public interest, and the distribution of the world's largest unorganized data set is not an easy task. As the search market consolidates, and spammers and search engines continue the cat and mouse game, it is clear that competitive open source alternative search engines such as Nutch are not desired so much as required.
While verifying that no article sharing this name was in existence, I got to see an AdWords ad for exercise bikes by bigfitness.com.