index alternatives examples Add URL page
+ free PPC
Google toolbar worrying changes
links overheard ranking chris' rant the future Banners - back again - Free banner design
Site Submission Business Run your OwnPay Per Click Search Engine Your Own MetaSearch Engine faq newsgroup search Add search to your site Good site? How to tell
Nitromarketing Overture Kanoodle $5 free Excite The Bald Eagle effect  
workround for google toolbar autolink - free fix for autolink nightmare Directories to link to eyetools Get good links for your site Google doesn't list me Link Machine


FAQ from alt.internet.search-engines
Hope no one minds me posting the FAQ here.

 

What is a portal/directory? Portals/directories are "search engines" that require you to submit your site before it is included in their database. They normally have a number of editors who review sites before accepting them. Many of these are business-to-business or otherwise focussed on a certain topic. One of the most popular directories is http://www.yahoo.com. Many portals are linked with search engines that supply results when the directory has no sites matching a specific search term.

What is a search engine? A search engine is a huge database that is constantly updated by "spiders" - robots or automated programs that constantly crawl the WWW, following links on home pages (web sites). The spiders capture the text on web pages, and the engine applies its own algorithms to that text to rank and return results when people search. Some very popular search engines are http://www.google.com, http://www.altavista.com and http://www.webcrawler.com. Many search engines cooperate with portals/directories to give their users an alternative way of finding information instead of relying on search by keyword.
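
As a rough illustration of the "database" side of a search engine, the following Python sketch builds a tiny inverted index (word -> pages containing it) and answers keyword queries from it. The page texts, the example.com URLs and the function names are made up for illustration; real engines use far more elaborate, proprietary ranking, which this sketch does not attempt.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for text a spider has already captured.
PAGES = {
    "http://example.com/a": "Search engines crawl the web with spiders",
    "http://example.com/b": "Directories rely on human editors",
    "http://example.com/c": "Spiders follow links between web pages",
}

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "spiders web"))
```

A query only matches pages containing all of its words; the order of the results here is alphabetical by URL, not a relevance ranking.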

 

What is cloaking? Cloaking is a technique used by some web sites to feed different content to search engine spiders (see above) and to human visitors. This may be employed to improve ranking for a site, as the output to the search engines will usually be optimized, targeting their specific ranking algorithms. Another major use of cloaking is to protect web page code from being stolen by competitors. Finally, cloaking may be required to work around browser incompatibility issues, non-spiderable page code (e.g. graphics rich sites, splash pages, Flash, Java, JavaScript, etc.), dynamic page delivery, etc. Please note that many search engines do _not_ approve of this practice, while a few others encourage it. The disapproval applies mainly when cloaking is employed in a misleading ("spammy") way, e.g. by redirecting surfers to content they did not target when clicking the displayed search result URL.

What is search engine optimization? Search engine optimization or search engine positioning is the art and the science of constructing or organizing web pages in a way that helps them achieve good rankings with the search engines. All search engines follow their own, proprietary ranking algorithms which are continuously tweaked and improved upon. Because these algorithms are treated as trade secrets, the search engines will not divulge their details. This makes professional search engine optimization very similar to reverse engineering: some experts will run test pages and even whole test domains for the sole purpose of determining individual search engines' ranking behavior. This may involve questions such as which engine values meta tags, titles, alt tags, link popularity, click-through frequency, etc. Hence, efficient optimization can turn into a very involved affair requiring lots of specialist knowledge, up-to-date information, statistical analysis, etc. The more competitive the WWW becomes, the harder it gets to achieve decent rankings in those areas where many sites are vying for attention.


Why is search engine ranking important? Surveys and studies have shown that surfers searching the engines for keywords or phrases will typically click through to those sites featured highest. Page one to page three rankings account for approximately 90% of all search engine generated user traffic. What this boils down to is that your web site will not generate any traffic worth mentioning if it is featured below (typically) the top 30. So if you want your site to be known and to draw lots of visitors, a good ranking with the major search engines is crucial.

What keywords or phrases should I optimize my web site for? Regardless of whether you have a commercial, non-profit or amateur web site: picking the keywords or search phrases for optimization of your site is crucial. A frequent mistake among webmasters is to gauge the popularity of keywords through their own tunnel vision of what people should be interested in. Luckily, many search engines (major and minor) offer real time search monitoring on special pages (so-called "voyeur" function or pages). There is also an abundance of real life search phrase databases (both free and commercial) available on the net. Finally, you can make use of special software which can help you automate the process. For a fairly extensive overview of real life keyword research resources see "Keyword Research" in the resources section below.
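
To make the idea of working from real search phrases concrete, here is a minimal Python sketch that tallies how often each phrase occurs in a list of logged searches. The sample log and the function names are invented for illustration; it assumes you have already collected phrases from a "voyeur" page or a search phrase database by some other means.

```python
from collections import Counter

# Hypothetical sample of search phrases as they might appear in a log.
SEARCH_LOG = [
    "cheap flights",
    "Cheap  Flights ",
    "hotel deals",
    "cheap flights london",
    "hotel deals",
    "cheap flights",
]

def normalize(phrase):
    """Lower-case a phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())

def top_phrases(log, n):
    """Return the n most frequent normalized phrases with their counts."""
    counts = Counter(normalize(phrase) for phrase in log)
    return counts.most_common(n)

print(top_phrases(SEARCH_LOG, 2))
```

Counting what people actually typed, rather than what you expect them to type, is the point of the exercise.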

Will a search engine spider my frames page? It will if you link all your subpages from the text within the noframes tag. However, it will not index your frameset, only each single page. This means that users entering your site will most likely NOT load the frameset. You can use JavaScript to check that the frameset is loaded, but that presents two problems: 1. Most such scripts do not work very well. 2. The client side redirection might get your page banned from the search engine. As far as SEO is concerned (as opposed to page design), it is recommended that you do not employ frames. If you choose to do so, it is highly recommended that you have navigation within each framed page as well, so the user can navigate without the frameset.

What is "robots.txt"? The Robots Exclusion Protocol is a method that allows you to tell visiting spiders what to index and what to leave alone. You can exclude a particular spider or all spiders (that follow the standard) from your entire site, from particular directories, or from particular files.
- Should I create a robots.txt file? Only if you want crawlers to stay away from your site (or parts of it such as password restricted areas, graphics directories, etc.)
- Can I leave the robots.txt blank? Yes, but that will cause some spiders to leave without indexing.
- What should my robots.txt look like? Check here: http://info.webcrawler.com/mak/projects/robots/exclusion.html as this page features links to relevant sites.
- Can I prevent indexing by other means than robots.txt? Yes, you can use a robots META tag in the header of your page. However, not all robots respect this.
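
As a sketch of how the exclusion rules play out, the Python snippet below feeds a small robots.txt to the standard library's urllib.robotparser and asks whether given spiders may fetch given URLs. The robots.txt content, the spider names "BadBot" and "GoodBot" and the example.com URLs are assumptions made up for illustration, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one spider entirely, and keep all other
# spiders out of two directories.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /images/
"""

def make_parser(robots_txt):
    """Build a parser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = make_parser(ROBOTS_TXT)

# BadBot is excluded from the entire site.
print(parser.can_fetch("BadBot", "http://example.com/index.html"))
# Other spiders may fetch ordinary pages ...
print(parser.can_fetch("GoodBot", "http://example.com/index.html"))
# ... but not anything under an excluded directory.
print(parser.can_fetch("GoodBot", "http://example.com/private/a.html"))
```

This only shows what a well-behaved spider would conclude; as noted above, the protocol binds only spiders that choose to follow the standard.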

How can I start my own search engine? Robots (also known as spiders, wanderers, worms, crawlers and gatherers) follow links from one web page to another. They work with indexing code to store data for later searching. There is a good deal of free open source code available -- you don't have to start from scratch. You can find a wide range of search engine code in the programming language best suited to your needs at: http://www.searchtools.com/robots/robot-code.html
_________________________________________________________________
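
To give a feel for the link-following part of a home-grown search engine, here is a minimal Python sketch of a breadth-first spider. To stay self-contained it crawls a made-up, in-memory "web" (the WEB dictionary) rather than the network; the URLs, page contents and function names are all hypothetical, and a real robot would also need politeness delays, robots.txt checks and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML, used instead of real fetches.
WEB = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="news.html">News</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/news.html": '<a href="/about.html">About</a> <a href="/missing.html">Gone</a>',
}

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Follow links breadth-first; return the URLs that could be fetched."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # dead link: nothing to index or follow
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited

print(crawl("http://example.com/", WEB.get))
```

The `fetch` argument is any function that returns a page's HTML or None, so the same loop could be pointed at a real HTTP fetcher; the text of each visited page would then be handed to the indexing code.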
II.10 Virtual Hosts / individual IP addresses
It is a common problem that search engines will occasionally index one site and redirect to another. Usually this issue relates to incomplete support for the HTTP/1.1 standard. The World-Wide Web Consortium strongly recommends that web servers use virtual hosts, so as not to waste additional IP addresses simply for Web hosting. This means that hundreds of domains can reside on the same IP address, with the server deciding which site to serve from the Host header the client sends. The problem results from the fact that not all search engines honor the HTTP/1.1 standard which allows for this particular implementation, or, in rare instances, that the web hosting services have misconfigured their servers. Avi Rappoport has done research that shows that AltaVista, Excite, FAST, Google, Northern Light, Go (the engine formerly known as Infoseek), etc. do not have any problems with this at all. The only spiders that failed to send the proper headers were MOMspider/1.10 libwww-perl/0.40 and PerlMan Surf; additionally, similar problems with the French search engine Voila have been reported. How can you resolve this problem? Simply put, leave your current web hosting service if they fail to address the issue, or get an individual IP address. Contacting the search engines directly might also produce results. This problem should evaporate in 2001 at the very latest, as individual IP addresses are getting ever more rare and the search engines simply cannot avoid adapting to the standards of the World-Wide Web Consortium.
_________________________________________________________________
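
The virtual host problem described in II.10 can be sketched in a few lines of Python: a server sharing one IP address among several domains picks the site from the request's Host header, so a spider that omits the header ends up at whatever the default site is. The host names, directory paths and function names below are hypothetical, and the parsing is deliberately simplified.

```python
# Hypothetical virtual-host table for a server with a single IP address.
VHOSTS = {
    "www.site-one.example": "/var/www/site-one",
    "www.site-two.example": "/var/www/site-two",
}
DEFAULT_ROOT = "/var/www/default"

def build_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request that names the wanted host."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"

def parse_host(raw_request):
    """Return the Host header value (lower-cased), or None if absent."""
    header_block = raw_request.split("\r\n\r\n", 1)[0]
    for line in header_block.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    return None

def document_root(raw_request):
    """Choose the site to serve, falling back to the default site."""
    host = parse_host(raw_request)
    if host is None:
        return DEFAULT_ROOT
    host = host.split(":")[0]  # ignore an optional port number
    return VHOSTS.get(host, DEFAULT_ROOT)

# A spider that sends the Host header reaches the intended site.
print(document_root(build_request("www.site-two.example")))
# A spider that sends no Host header lands on the default site.
print(document_root("GET / HTTP/1.0\r\n\r\n"))
```

The second call is the failure case: the content indexed under your domain name would be that of the server's default site, which is how one site can end up indexed in place of another.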
II.11 What is this Dmoz/Open Directory Project (ODP) everyone rants about?
Throughout last year there has been a lot of discussion about the Open Directory Project (dmoz.org) that delivers directory data to some of the major search engines. It is currently the largest index on the web, with more than 2 million unique sites and almost no dead links.
There are basically two camps participating in this discussion:
1. People defending the ODP.
2. People that hate the ODP.

There are lots of problems under discussion - the following is intended as a short non-partisan summary.
1. The directory is owned by AOL/Warner/Netscape, one of the major players in the internet market, but the directory is driven entirely by volunteers that do not get paid for their efforts. Some people consider this an "abuse" of volunteers for commercial purposes.
2. The Dmoz/ODP people are apparently constantly seeking new editors, but rejecting 90% of all applications. It is known that people who are very qualified for a particular category (that does not have any editor yet) are being rejected with the same standard reply. Many people have experienced their applications not getting any response at all.
3. Dmoz fires many editors without reasons given. This contention should be taken with a grain of salt as representing only one of two sides to this story. Editors have been known to abuse their privileges before and have been fired for this, while others seem to have been fired without apparent reason.
4. ODP is quite inaccessible. They don't publish any phone numbers, and their address seems to be treated as a state secret and is virtually impossible to obtain. Some people consider this a bad thing, while others disagree. It is a common occurrence that you won't get any feedback from editors explaining why your submissions have been turned down and what you could do to rectify the situation - even if you have been indexed in all the other major directories.
5. The editors have incredible power. If they manage to get in charge of a category in which they themselves have a vested interest (typically a web site of their own), they can "cool" their site, change the descriptions of the other sites in that category and keep competitors from getting their pages indexed at all. It is an established fact that this happens, but it is also well known that the Dmoz administration is working on preventing this.

Needless to say, many people nurture hard feelings towards Dmoz, either for not getting any feedback on their site submissions (e.g. an explanation for not being indexed) or their applications for editorship, or for getting the same boilerplate reply that all people get. On the other side there is the "pro-Dmoz" front, consisting either of established editors or of people who are simply in favour of the index.


III.1. Search engines.

Search Engines: -
http://www.altavista.com/
http://www.alltheweb.com/
http://www.directhit.com/
http://www.excite.com/
http://search.go.com/
http://www.google.com/
http://www.goto.com/
http://www.hotbot.com/
http://www.lycos.com/
http://www.northernlight.com/
http://raging.com/

Portals/directories -
http://www.dmoz.com/
http://www.looksmart.com/
http://www.snap.com/
http://www.yahoo.com/

Submission URLs -
http://searchenginebase.com/sbsubmissions.html

You can find a comprehensive list of all major and many minor search engines at: http://www.searchenginebase.com/

 

III.2. Cloaking Tutorial + FAQ -
http://fantomaster.com/fafaqcloak1.html -
http://www.spiderhunter.com/

 

III.3. Keyword Research -
http://fantomaster.com/fasmbres03.html#voyeur

 

III.4. Meta Tags -
The Definitive Resource:
http://vancouver-webpages.com/META/

 

III.5. Search Engine Newsletters
Newsletters Featuring Search Engine News (in alphabetical order) -
Actu Moteurs (in French): http://www.abondance.com/
Google Friends Newsletter: http://www.google.com/
Pay Per Click Search Engines Update: http://PayPerClickSearchEngines.com
Search Engine Guide: http://www.searchengineguide.com/

III.6. Search Engine Optimization Newsletters
Newsletters on Search Engine Optimization (in alphabetical order) -
fantomNews: http://fantomaster.com/fantomnews.html
RankWrite: http://www.rankwrite.com/
Search Engine News: http://www.searchengine-news.com
Search Engine Optimization and User Interface: http://www.cre8pc.com/seui.html
Search Engine Quarterly: http://www.searchengineworld.com/
Search Engine Watch: http://www.searchenginewatch.com/
The Spider Report: http://spider-food.net/

III.7. Discussion Forums (in alphabetical order) -
AIM-Pro: http://www.aim-pro.com/cgi-bin/Ultimate.cgi
Market Position Talk: http://www.marketpositiontalk.com/forums/
SearchEngineBase Forum: http://searchenginebase.com/discussions.html
SearchEngine Discussion Forum: http://searchenginediscussion.com/cgi-bin/ubb/Ultimate.cgi
SearchEngineForums: http://www.searchengineforums.com/
SearchEngineMatrix Forum: http://www.searchenginematrix.com/
SearchEngineWorld Forum: http://www.webmasterworld.com/index.cgi

III.8. General Tips + Tricks (in alphabetical order) -
http://www.aim-pro.com/
http://fantomaster.com/
http://www.searchengineworld.com/
http://spider-food.net/
http://www.spiderhunter.com/

III.9. Search engine spider verification service -
http://spiderscouts.com/

 

Current version and posting-frequency.
The current version of this document can always be found at the following:

WWW
http://searchenginebase.com/aise-charter-faq.html
http://search.mermaidconsulting.com/altinternetsearchengines.txt
http://www.geocities.com/ranktips/faq.htm

USENET
Posted three to four times per month to alt.internet.search-engines
___________________________________________________________

