| What is a portal/directory? Portals/directories are "search engines"
that require submission of our site to see it included in their database. They
normally have quite a number of editors reviewing the sites before they will be
accepted. Many of these are business to business or otherwise focussed on a certain
topic. One of the most popular directories is http://www.yahoo.com. Many portals
are linked with search engines that provide search results if the directories
do not have any sites within a specific search term. What is a search
engine? A search engine is a huge database that is constantly updated by "spiders"
- these are robots or automated programs that constantly crawl the www following
links on home pages (web sites). These capture the text on webpages and, based
on different algorithms, they output results when people search in them. Some
very popular search engines are http://www.google.com, http://www.altavista.com
and http://www.webcrawler.com. Many search engines cooperate with portals/directories
for providing their users an alternative way of finding information instead of
relying on search by keyword. What is cloaking? Cloaking is a
technique used by some web sites to feed different content to search engine spiders
(see above) and to human visitors. This may be employed to improve ranking for
a site as the output to the search engines will usually be optimized, targeting
their specific ranking algorithms. Another major use of cloaking is to protect
web page code from being stolen by competitors. Finally, cloaking may be required
to work around browser incompatibility issues, non-spiderable page code (e.g.
graphics rich sites, splash pages, Flash, Java, JavaScript, etc.), dynamic page
delivery, etc. Please note that many search engines do _not_ approve of this
practice, while a few others encourage it. The disapproval applies mainly if cloaking is employed
in a misleading ("spammy") way, e.g. by redirecting surfers to content they did
not target when clicking the displayed search result URL. What is search
engine optimization? Search engine optimization or search engine positioning is
the art and the science of constructing or organizing web pages in a way to help
them achieve good rankings with the search engines. All search engines follow
their own, proprietary ranking algorithms which are continuously tweaked and improved
upon. These algorithms being treated as trade secrets, the search engines will
obviously not divulge their details. This makes professional search engine optimization
very similar to reverse engineering: some experts will run test pages and even
whole test domains for the sole purpose of determining individual search engines'
ranking behavior. This may involve questions like which engine values meta tags,
titles, alt tags, link popularity, click-through frequency, etc. Hence, efficient
optimization can turn into a very involved affair requiring lots of specialist
knowledge, up-to-date information, statistical analysis, etc. The more competitive
the WWW becomes, the harder it gets to achieve decent rankings in those areas
where many sites are vying for attention. Why is search engine
ranking important? Surveys and studies have shown that surfers searching the engines
for keywords or phrases will typically click through to those sites featured highest.
Rankings on pages one to three account for approximately 90% of all search engine generated
user traffic. What this boils down to is that your web site will not generate
any traffic worth mentioning if it is ranked below (typically) the top 30.
So if you want your site to be known and to draw lots of visitors, a good ranking
with the major search engines is crucial. What keywords or phrases should
I optimize my web site for? Regardless of whether you have a commercial or a non-profit
or amateur web site: picking the keywords or search phrases for optimization of
your site is crucial. A frequent mistake among webmasters is gauging the popularity
of keywords biased by their own tunnel view of what people should be interested
in. Luckily, many search engines (major and minor) offer real time search monitoring
on special pages (so-called "voyeur" function or pages). There is also an abundance
of real life search phrases databases (both free and commercial) available on
the net. Finally, you can make use of special software which can help you automate
the process. For a fairly extensive overview of real life keyword research resources
see "Keyword Research" in the resources section below. Will a search
engine spider my frames page? It will if you link all your subpages from the
text within the noframes tag. However, it will not index your frameset, only each
individual page. This means that users entering your site will most likely NOT load
the frameset. You can use JavaScript to check that the frameset is loaded, but
that presents two problems: 1. Most such scripts do not work very well. 2. The client
side redirection might get your page banned from the search engine. As far as SEO
(not page design) is concerned, it is recommended that you do not employ frames. If you choose
to do so, it is highly recommended that you have navigation within your framed
pages as well so the user can navigate without the frameset. What is
"robots.txt"? The Robots Exclusion Protocol is a method that allows you to tell
visiting spiders what to index and what to leave alone. You can exclude a particular
spider or all spiders (that follow the standard) from your entire site, from particular
directories, or from particular files.
- Should I create a robots.txt file? Only if you want crawlers to stay away
  from your site (or parts of it such as password restricted areas, graphics
  directories, etc.).
- Can I leave the robots.txt blank? Yes, but that will cause some spiders to
  leave without indexing.
- What should my robots.txt look like? Check here:
  http://info.webcrawler.com/mak/projects/robots/exclusion.html
  as this page features links to relevant sites.
- Can I prevent indexing by other means than robots.txt? Yes, you can use a
  robots META tag in your header. However, not all robots respect this.
How can I start my own search-engine? Robots (also known
as spiders, wanderers, worms, crawlers and gatherers) follow links from one web
page to another. They work with indexing code to store data for later searching.
There is a good deal of free open source code available -- you don't have to start
from scratch. You can find a long range of search engines in the programming language
best suited for your needs at: http://www.searchtools.com/robots/robot-code.html
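To make the idea of a robot plus indexing code more concrete, here is a minimal sketch in Python. It is not taken from any of the packages listed at the URL above. It assumes the pages are already held in memory (the example.com URLs, the "toybot" agent name and the function names are all made up for illustration), honors robots.txt rules through the standard library's urllib.robotparser, follows links breadth-first and builds a simple word-to-URL index.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class PageParser(HTMLParser):
    """Collects link targets and lowercase words from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Resolve every <a href="..."> against the page's own URL.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(pages, robots_lines, start_url, agent="toybot"):
    """Breadth-first crawl over an in-memory {url: html} mapping.

    A real robot would fetch pages over HTTP; the mapping is an
    assumption that keeps this sketch self-contained.
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)  # lines of a robots.txt file
    index = defaultdict(set)   # word -> set of URLs containing it
    queue, seen = [start_url], {start_url}
    while queue:
        url = queue.pop(0)
        # Skip unknown pages and anything robots.txt excludes.
        if url not in pages or not rules.can_fetch(agent, url):
            continue
        parser = PageParser(url)
        parser.feed(pages[url])
        parser.close()
        for word in parser.words:
            index[word].add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return dict(index)


def search(index, word):
    """Return the sorted URLs whose text contained the given word."""
    return sorted(index.get(word.lower(), ()))


# Hypothetical sample site, for illustration only.
SAMPLE_PAGES = {
    "http://example.com/": '<a href="/a.html">About</a> '
                           '<a href="/private/b.html">Private</a> welcome',
    "http://example.com/a.html": "<p>spiders crawl links</p>",
    "http://example.com/private/b.html": "<p>secret spiders</p>",
}
SAMPLE_ROBOTS = ["User-agent: *", "Disallow: /private/"]
INDEX = crawl(SAMPLE_PAGES, SAMPLE_ROBOTS, "http://example.com/")
```

Calling search(INDEX, "spiders") on this sample data only returns the page outside /private/, since the excluded page is never read. Ranking, HTTP fetching and politeness delays are left out on purpose; real search engine code has to deal with all of them.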
_________________________________________________________________
II.10 Virtual Hosts / individual IP addresses
It is a common problem that search engines will
occasionally index one site and redirect to another. Usually this issue relates
to problems with the HTTP/1.1 standard. The World-Wide Web Consortium strongly
recommends that web servers use virtual hosts, so as not to waste additional IP
addresses simply for Web hosting. This means that hundreds of domains can reside
on the same IP address. The problem results from the fact that not all search
engines honor the HTTP/1.1 standard which allows for this particular implementation,
or, in rare instances, that the web hosting services have misconfigured their
servers. Avi Rappoport has done research showing that AltaVista, Excite, FAST,
Google, Northern Light, Go (the engine formerly known as Infoseek), etc. do not
have any problems with this at all. The only spiders that failed to send the proper
headers were MOMspider/1.10 libwww-perl/0.40 and PerlMan Surf; additionally, similar
problems with the French search engine Voila have been reported. How can you resolve
this problem? Simply put, leave your current web hosting service if they fail
to address the issue, or get an individual IP address. Contacting the search engines
directly might also produce results. This problem should evaporate in 2001 at
the very latest, as individual IP addresses are getting ever more rare and the
search engines simply cannot avoid adapting to the standards of the World-Wide
Web Consortium.
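The mechanism behind the virtual host problem can be illustrated with a small sketch in Python. It assumes a simplified name-based virtual host lookup: the server picks the site from the Host header of the request and falls back to a default site when the header is missing. The domain names, site labels and function names are invented for the example, and real web servers are considerably more elaborate.

```python
def build_request(host, path="/", send_host=True):
    """Return the raw request text a spider might send for http://host/path.

    send_host=False models an older robot that omits the Host header.
    """
    version = "1.1" if send_host else "1.0"
    lines = [f"GET {path} HTTP/{version}"]
    if send_host:
        # The Host header names the domain the client is asking for.
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    return "\r\n".join(lines) + "\r\n\r\n"


def select_site(raw_request, sites, default_site):
    """Mimic a name-based virtual host lookup on one shared IP address."""
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional port such as ":8080" before the lookup.
            hostname = value.strip().lower().split(":")[0]
            return sites.get(hostname, default_site)
    # No Host header: the server cannot tell the domains apart.
    return default_site


# Hypothetical domains sharing one IP address.
SITES = {"www.example.com": "site A", "www.example.org": "site B"}
```

With these definitions, a request for www.example.org that carries the Host header is answered with "site B", while the same request without the header falls back to the default site. That is exactly how a spider ends up indexing one domain under another domain's name.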
_________________________________________________________________
II.11 What is this Dmoz/Open Directory Project (ODP) everyone rants about?
Throughout the last
year there has been a lot of discussion about the Open Directory Project (dmoz.org)
that delivers directory data to some of the major search engines. It is currently
the largest index on the web, with more than 2 million unique sites and almost
no dead links. There are basically two camps participating in this discussion:
1. People defending the ODP. 2. People that hate the ODP. There
are lots of problems under discussion - the following is intended as a short
non-partisan summary.
1. The directory is owned by AOL/Warner/Netscape, one of the major players in
   the internet market, but it is driven entirely by volunteers who do not get
   paid for their efforts. Some people consider this an "abuse" of volunteers
   for commercial purposes.
2. The Dmoz/ODP people are apparently constantly seeking new editors, but
   rejecting 90% of all applications. It is known that people who are very
   qualified for a particular category (one that does not have any editor yet)
   are being rejected with the same standard reply. Many people have
   experienced their applications not getting any response at all.
3. Dmoz fires many editors without reasons given. This contention should be
   taken with a grain of salt as representing only one of two sides to this
   story. Editors have been known to abuse their privileges before and have
   been fired for this, while others seem to have been fired without apparent
   reason.
4. ODP is quite inaccessible. They don't publish any phone numbers, and their
   address seems to be treated as a state secret and is virtually impossible
   to obtain. Some people consider this a bad thing, while others disagree. It
   is a common occurrence that you won't get any feedback from editors
   explaining why your submissions have been turned down and what you could do
   to rectify the situation - even if you have been indexed in all the other
   major directories.
5. The editors have incredible power. If they manage to get in charge of a
   category in which they themselves have a vested interest (typically a web
   site of their own), they can "cool" their site, change the descriptions of
   the other sites in that category and keep competitors from getting their
   pages indexed at all. It is an established fact that this happens, but it is
   also well known that the Dmoz administration is working on preventing this.
Needless to say, many people nurture hard feelings
towards Dmoz, either for not getting any feedback on their site submissions (e.g.
an explanation for not being indexed) or on their applications for editorship, or
for getting the same boilerplate reply that everyone gets. On the other side
there is the "pro-Dmoz" front, consisting either of established editors or of people
who are simply in favour of the index. |