Posts tagged: Spiders

How to Block Search Engine Robots

Robots.txt is a plain-text file that many web sites use to give instructions to search engine “robots” or “spiders”. The file typically tells them which pages or directories they shouldn’t crawl or index. It must be located in the root (main) directory of the site, since that is the only place robots look for it.

The robots.txt file may be created in a basic text editor like Notepad or Edit. Be sure to save it in pure, text-only format. cPanel’s “File Manager” or FTP Client software may be used to upload it. Each line is a separate instruction. Some sample instructions to include in robots.txt are as follows:

Disallow: /email/
Disallow: /contact.php
Disallow: /

The first example blocks the entire “email” directory (folder) from being accessed by search engine spiders, while the second disallows them from indexing the “Contact” page. The third example requests that they not index any files on the site. The initial slash refers to the site’s root directory, which is where the robots.txt file is located. Do not use full URLs.

Most robots.txt files begin with a line reading “User-agent: *”. The purpose of this is to tell ALL search engine spiders that it applies to them. If the asterisk were replaced with the name of one engine’s robot, instructions would only apply to it and others would ignore them. A 404 error is logged if a robot tries to access the file and it doesn’t exist.
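As a rough illustration of how a well-behaved robot might interpret these instructions, the sketch below feeds a small robots.txt to Python’s standard urllib.robotparser module. The robot name “ExampleBot” and the www.example.com URLs are placeholders chosen for this example, not real crawlers or sites.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt built from the first two instructions discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /email/
Disallow: /contact.php
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robot may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and www.example.com are hypothetical.
    for path in ("/index.html", "/email/list.html", "/contact.php"):
        url = "http://www.example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Keep in mind that robots.txt is only a request: a robot that chooses to ignore the file is not technically prevented from reading the pages.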

If there is a specific type of document which search engine spiders should be prohibited from indexing (such as PDF, DOC, or RTF), consider putting all of these files in the same folder and adding a “Disallow” statement that specifies this directory.

The overall purpose of using a robots.txt file is usually to control which pages visitors enter the web site through, reduce access to certain pages by “spammers”, and/or limit the amount of bandwidth (data transfer) being consumed by search engine spiders as they read from the site.

The robots META tag can be used for much of the same purpose as the robots.txt file, but it is not applicable to non-HTML pages like text files, PDFs, images, and so on. If a web site operator wants all files to be indexed by search engines, there is no real purpose in having a robots.txt file.

Here is an example of a robots.txt file that we have created:

User-agent: Slurp
crawl-delay: 20

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

The first entry in the robots.txt file sets a crawl “delay” for Yahoo’s robot, Slurp. It was hitting our site pretty hard and really slowing down the client’s server. While we do not recommend slowing down a robot that is coming to your site, there are always exceptions to the rule.
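To see how a file like this reads from a robot’s point of view, here is a minimal sketch using Python’s standard urllib.robotparser module. It uses a shortened copy of the file (two of the six entries), and “Googlebot” stands in for any robot the file does not mention. Crawl-delay is an extension that not every robot honors, so treat the delay as a request rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the sample file above (two of the six entries).
ROBOTS_TXT = """\
User-agent: Slurp
crawl-delay: 20

User-agent: EmailSiphon
Disallow: /
"""


def summarize(robots_txt, user_agent):
    """Return (may_fetch_home_page, crawl_delay_in_seconds) for one robot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "/"), parser.crawl_delay(user_agent)


if __name__ == "__main__":
    for robot in ("Slurp", "EmailSiphon", "Googlebot"):
        print(robot, summarize(ROBOTS_TXT, robot))
```

A robot with no matching entry, and no “User-agent: *” entry to fall back on, is allowed everywhere and has no delay set.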

How to Write META Tags Properly

META tags are incorporated into the HTML code of many webpages. The most common META tags are TITLE, DESCRIPTION, and KEYWORDS. They affect not only how a site appears in search listings, but how high or low a position it receives in results. Read on to learn how to write META tags properly and effectively…

TITLE TAG

The TITLE tag is technically a separate HTML element rather than a true META tag, but it is perhaps the most important of the three. It determines the clickable title of a search result, as well as the text which appears on the title bar of a web browser. Here are some tips on how to properly write this tag:

  • Use fewer than sixty-five characters; otherwise, part of the title may be cut off by browsers and search engines.
  • Every page on the website needs to have a unique title.
  • Your primary keyword target should be at the beginning of the TITLE tag.
  • Place your company name at the end of the TITLE tag.

DESCRIPTION TAG

The second most important META tag, DESCRIPTION determines what text appears below the title in search results (with some exceptions). People who visit a webpage will not see this tag unless they look at the source code. Tips on how to use it properly:

  • Keep the description approximately 125-175 characters.
  • Don’t put specific data searchers are looking for in the description tag; they might only read it and not visit the site.
  • Use the description to market to the user and increase the click through to the website.
  • The description is not a ranking factor, but be sure to use your target keyword(s) and keyword phrases so that they will be bold in the search results.
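The length tips for the TITLE and DESCRIPTION tags can be checked automatically. The following is a minimal sketch using Python’s standard html.parser module; the class and function names are made up for this example, and the thresholds are simply the figures from the tips above (fewer than 65 characters for the title, roughly 125-175 for the description).

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_head(html):
    """Return a list of warnings based on the length tips above."""
    parser = HeadTagParser()
    parser.feed(html)
    warnings = []
    title = parser.title.strip()
    description = parser.meta.get("description", "").strip()
    if not title:
        warnings.append("missing title")
    elif len(title) >= 65:
        warnings.append("title has 65 or more characters")
    if not description:
        warnings.append("missing description")
    elif not 125 <= len(description) <= 175:
        warnings.append("description is outside 125-175 characters")
    return warnings
```

An empty list means the page passes both length checks; it says nothing about whether the wording is any good.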

META KEYWORD TAG

A less important META tag is KEYWORDS, as many search engines do not use it as a ranking factor. The KEYWORDS tag contains one or more search keywords that relate to the web page in question. Here are a few tips:

  • Avoid using more than fifteen keywords, and don’t write any word more than once.
  • Use words which aren’t in the TITLE or DESCRIPTION tags but that are in the copy of the page.
  • Avoid words that aren’t relevant to the page’s subject. In fact, using keywords that are irrelevant to your page or do not appear in it can lead to ranking penalties.
  • To properly separate multiple keywords, use commas.
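Several of the keyword tips above are mechanical enough to check with a few lines of code. This is only a sketch: the function name is hypothetical, duplicates are detected per comma-separated keyword rather than per individual word, and “in the copy of the page” is approximated by a simple case-insensitive substring search.

```python
def check_keywords(content, page_text, limit=15):
    """Check a comma-separated META keywords value against the tips above."""
    keywords = [k.strip().lower() for k in content.split(",") if k.strip()]
    warnings = []

    # Tip: avoid using more than fifteen keywords.
    if len(keywords) > limit:
        warnings.append("more than %d keywords" % limit)

    # Tip: don't repeat a keyword.
    duplicates = sorted({k for k in keywords if keywords.count(k) > 1})
    if duplicates:
        warnings.append("repeated keywords: " + ", ".join(duplicates))

    # Tip: keywords should appear in the copy of the page.
    text = page_text.lower()
    missing = []
    for keyword in keywords:
        if keyword not in text and keyword not in missing:
            missing.append(keyword)
    if missing:
        warnings.append("not found in page copy: " + ", ".join(missing))

    return warnings
```

For example, check_keywords("blue widgets, cheap flights", "We sell blue widgets.") flags “cheap flights” as missing from the page copy.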

OTHER META TAGS

Some lesser-known META tags are not used by most search engines and browsers, making them minimally useful. However, one other fairly important META tag is ROBOTS. It gives specific instructions to search engine “robots” or “spiders”, the automated computer programs which visit and index websites, recording information that will appear in search results.
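The ROBOTS tag is commonly written as a META tag named “robots” whose content is a comma-separated list of directives such as “noindex” or “nofollow”. The sketch below, which uses Python’s standard html.parser module and hypothetical class and function names, shows roughly how a robot might read that tag to decide whether a page may be indexed. It treats “none” as equivalent to “noindex, nofollow”.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def may_index(html):
    """Return True unless the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return not ({"noindex", "none"} & parser.directives)


if __name__ == "__main__":
    print(may_index('<meta name="robots" content="noindex, nofollow">'))
    print(may_index('<meta name="robots" content="index, follow">'))
```

A page with no ROBOTS tag at all is treated as indexable, which matches the default behavior of search engine robots.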

You may not need to know HTML to write META tags properly. Programs like Frontpage and Dreamweaver allow users to set the META tags for a webpage design automatically. For example, the function for setting META tags in Frontpage is located in “Page properties”, under the “Custom” tab. The TITLE is set under the “General” tab.

SEO Friendly Directories

Getting your web site listed on directories can be good for search engine optimization (SEO), while increasing direct traffic at the same time. However, not all directories are SEO friendly. When you need to determine if a site can be used for this purpose, please refer to the following SEO Friendly Directory Checklist:

- Do the directory’s links go directly to web sites? Open one of the categories, point at a link, and check the browser’s status bar to see whether the link goes directly to the site it refers to or to a redirection/tracking page. Directories with direct links are SEO friendly, as search engines can easily identify these links and their destination.

- Does the directory have a Google PageRank level of at least one? You can check this by entering its URL in a web site like checkpr.org. Directories with no/zero PageRank are either too new/obscure to affect SEO, or have been penalized by Google for allowing too much “link spam”. Be careful not to use “FFA” link sites or directories full of miscategorized pages.

- Are web site names used as the titles (anchor text) for links? It is more SEO friendly when directories use the title (or other relevant words) in the text of their links, rather than using a URL, an image, or a generic phrase like “Click Here”. However, gaining links with less desirable anchor text still benefits SEO efforts, just not as much.

- Can you get listed without providing a reciprocal link? Some directories demand a reciprocal link in return for approval, while others make it optional or don’t ask for it. It is better to gain listings on directories that don’t require reciprocal linking; generally, one-way inbound links to your web site provide greater SEO benefits.

- Is it possible for search engine “spiders” (a.k.a. “robots”) to navigate/crawl the directory and find your link? The majority of directories are search spider friendly. However, if the site can only be navigated using animated/JavaScript menus (and there is no alternative method like a Site Map), this could be a problem.
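Two items on the checklist, direct links and descriptive anchor text, can be checked from the HTML of a directory page instead of by hovering over each link. The following is a minimal sketch using Python’s standard library; the function names, the list of “generic” phrases, and the example URLs in the usage section are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Illustrative list of generic phrases; extend it to suit.
GENERIC_ANCHORS = {"click here", "here", "link", "visit site"}


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = ""

    def handle_data(self, data):
        if self._href is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, self._text))
            self._href = None


def find_direct_link(directory_url, html, site_url):
    """Return the anchor text of a direct link to site_url, or None.

    None means no direct link was found: the listing may be missing, or it
    may pass through a redirection/tracking page on the directory's own host.
    """
    collector = LinkCollector()
    collector.feed(html)
    target_host = urlparse(site_url).netloc.lower()
    for href, text in collector.links:
        absolute = urljoin(directory_url, href)
        if urlparse(absolute).netloc.lower() == target_host:
            return text.strip()
    return None


def has_descriptive_anchor(text):
    """True if the anchor text is words, not a URL, blank, or generic phrase."""
    text = text.strip().lower()
    if not text or text in GENERIC_ANCHORS:
        return False
    return not text.startswith(("http://", "https://", "www."))


if __name__ == "__main__":
    # Hypothetical directory page with one direct link and one redirect link.
    page = ('<a href="http://www.example.com/">Example Widgets</a>'
            '<a href="/out.php?id=7">Other Site</a>')
    anchor = find_direct_link("http://directory.example.org/widgets/",
                              page, "http://www.example.com")
    print(anchor, has_descriptive_anchor(anchor or ""))
```

This only inspects the HTML as delivered, so links that are written into the page by JavaScript will not be seen, which is also roughly what a search engine spider experiences.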

There are far too many SEO friendly directories to list here, but they include Yahoo! Directory, URLdirectory.org, DMOZ.org, WebWorldIndex.com, and FreeWebsiteDirectory.com. It is especially important to be listed on DMOZ.org; not only does DMOZ/ODP have an impact upon SEO, but it also provides some of the results for many small search engines.

Segment Your Web Site for SEO

A major part of search engine optimization is making sure that all of the pages on your web site are easily accessible by both humans and search engines. Since each page on a web site can be indexed, you can also optimize each page for searchers and search engines.

To optimize your site for search engines, consider that they use search engine robots to find and index a site. These robots, also called spiders, continually look for content on the web that needs to be indexed. Once they find something, these robots will follow the hyperlinks to each web page. This is called “crawling” a site. When the robot arrives on a page, it reads through the content and adds it to the index. Because this is how your site’s pages get added to search engine results, it is important to have a navigational site structure that is friendly to the robots.

In addition, search engines will only rank pages that are perceived as important. That’s why it is necessary to create a content hierarchy in your navigational structure: your most important pages should be at the top of your site structure. Whichever page is at the top, usually your home page, generally attracts the most links. Search engine robots often stop searching after 3 clicks from the homepage, which is another reason to decide on a hierarchy for your site’s pages.
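One way to check a site against the 3-click rule of thumb is to compute each page’s click depth, meaning the fewest clicks needed to reach it from the home page. Here is a minimal sketch using a breadth-first search. The link map is a hand-written, hypothetical example; in practice it would come from crawling your own site.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_clicks=3):
    """Pages more than max_clicks from home, or not reachable from it at all."""
    depths = click_depths(links, home)
    pages = set(links) | {t for targets in links.values() for t in targets}
    return sorted(p for p in pages if depths.get(p, max_clicks + 1) > max_clicks)


if __name__ == "__main__":
    # Hypothetical site: each key is a page, each value the pages it links to.
    site = {
        "/": ["/products/", "/about/"],
        "/products/": ["/products/widgets/"],
        "/products/widgets/": ["/products/widgets/blue/"],
        "/products/widgets/blue/": ["/products/widgets/blue/specs/"],
        "/orphan/": ["/"],
    }
    print(too_deep(site, "/"))
```

Pages reported by too_deep are candidates for a link from a category page or the sitemap, so that they sit closer to the top of the hierarchy.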

This leads to categorization. If you want to organize your content in a natural way, you should create categories for all of your site’s content. Then link those categories to your homepage. This helps create more key phrases to link to, which can help you attract a wider audience for searches.

Keep in mind that search engine robots only follow HTML links. This means that any links using Flash, JavaScript, dropdown menus or submit buttons don’t get picked up by them, and the pages they point to don’t get indexed. Besides that, HTML links are a better choice because the anchor text can describe the destination page for human visitors to see.

Finally, create a sitemap. A sitemap basically works like an index, listing links to all the pages on your web site. If you link a sitemap to your home page, robots have easy access to all your web site’s pages. Be aware, though, that robots generally follow fewer than 100 links from one page. If you have more pages than that, consider creating a multi-page sitemap.
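Splitting a long list of pages into a multi-page sitemap is a simple chunking job. The sketch below assumes a limit of 99 links per sitemap page, which is just one reading of “fewer than 100”; the function name and the example page names are made up.

```python
def split_sitemap(urls, max_links=99):
    """Split a list of page URLs into sitemap pages of at most max_links each."""
    if max_links < 1:
        raise ValueError("max_links must be at least 1")
    return [urls[i:i + max_links] for i in range(0, len(urls), max_links)]


if __name__ == "__main__":
    # Hypothetical site with 250 pages.
    pages = ["/page-%d.html" % n for n in range(250)]
    for number, chunk in enumerate(split_sitemap(pages), start=1):
        print("sitemap page %d: %d links" % (number, len(chunk)))
```

If each sitemap page also carries navigation links (to the home page or to the other sitemap pages), lower max_links so the total number of links on the page stays under the limit.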

How Social Bookmarking Affects SEO

Social bookmarking, including systems like Reddit and Digg, notably affects SEO (Search Engine Optimization) in a few different ways. Social bookmarking links can help determine which SEO keywords a web page shows up under, increase its Google PageRank, and/or make it appear in search results more quickly.

As a link receives additional votes in its favor, it will appear on more prominent pages of the social bookmarking service. This affects SEO more significantly, while producing an increase in direct traffic. Many web site operators have added links or icons to social bookmarking services at the beginning or end of each web page, thus encouraging their visitors to bookmark the pages.

When search engine “spiders” detect a link, its wording (“anchor text”) affects their identification of the content on the page it links to. Then the linked page becomes more likely to show up in search results for those words or phrases. This is why an inbound link with a vague anchor text phrase like “Amazing New Product” affects SEO less favorably than more descriptive links do.

For example, a social bookmarking link titled “New Operating System Released” would make the page it is linked to more likely to show up in search results for each of these four words. Thus, the more relevant and searchable the words in the link, the more useful it is for SEO purposes.

Links on most social bookmarking services are classified into categories or “tags”. Links on web pages with related content usually provide more SEO benefits, so the same is likely true for social bookmarking links which are grouped with other links on similar topics. Thus it is probably best to choose “tags” which are relevant, but not obscure.

A difference between social bookmarking and using a search engine is that people generally use search engines to find specific things they are looking for, while a social bookmarking system is more often used to explore new information they might be interested in. It is somewhat like the difference between advertising in the yellow pages and posting an advertisement on a laundromat’s bulletin board.

Basically, social bookmarking affects SEO in the same way that inbound links on other web sites do, but is different in that anyone can post or vote on these links. Their position is also more subject to change than most other links, and new content is generally favored over older material.

Launching a New SEO-Friendly Web Site

Launching a new web site with SEO (Search Engine Optimization) in mind is less time consuming than waiting until it has been completed to begin SEO efforts. Putting an emphasis on this from the beginning will prevent the web site owner or designer from wasting time on the creation of pages which aren’t search engine friendly. Here are some tips on launching a new site with SEO in mind.

1. Web site owners who create their own sites should learn about META tags and how to achieve the best keyword density. It is also helpful to understand how search engine “spiders” (or “robots”) work, including the types of content they can and cannot read.

2. If you have someone else design your new web site, find a web designer with SEO experience who is willing to optimize the site in the process of designing it. Some designers have little understanding of SEO and create sites which are very unfavorable in this respect.

3. When initially launching and promoting the web site, be careful not to use advertising methods which are frowned upon by search engines. Such methods include posting to FFA link pages or buying links without the “nofollow” attribute/tag.

4. If the new website needs to have text-based content produced for it before launching, it is best if a writer who is skilled in SEO creates this material. This type of content may include informational articles, blog entries, press releases, or product descriptions.

5. Although it may seem more visually appealing or easier to create, designers should avoid putting paragraphs of text inside images. This is detrimental to SEO in most situations, because search engine spiders can’t read this type of text. Navigation systems should also be search engine friendly, and it is a good idea to create a site map before launching the new website.

6. Some types of SEO oriented promotional techniques can draw attention to a new web site as it is launched, while also improving its search engine ranking. These techniques include submitting the site to article directories, posting it on social bookmarking systems, or using it in a forum signature.

Keeping SEO in mind while launching a new website provides some incidental benefits as well. Sites which are search engine friendly are often less difficult to access for people who have less common web browsers or are visually impaired. Sites designed with SEO in mind also tend to have easier navigation and are better-organized.

5 Big SEO Myths

As with any complex subject or issue, there are some big myths about Search Engine Optimization (SEO) which should be dispelled. Read on to learn about five of these myths.

1. Some people claim that the purpose of SEO is to “manipulate” search engines. Actually, SEO helps make it possible for search engine software to identify sites as being relevant to the keyword someone is searching for. For a web site, it is sort of like putting your resume on high-quality paper and using a laser printer to make it more attractive for an employer.

2. Others state that just having high-quality content and writing “naturally” will usually produce a good keyword density, traffic, and high rankings. Realistically, the “spiders” which index pages cannot judge the quality of web sites, and there is a lot of high-quality content on the web that hardly anyone reads. Creating content which is of good quality is important, but does not eliminate a site’s need for SEO.

3. Another one of the myths within SEO is that only getting listed on the major search engines and directories is important. In reality, getting a link to your web site on a little-known site with a link directory about the same topic can bring it a substantial number of hits. It is also worth spending a bit of time to get listed on smaller specialized engines related to your site’s topic.

4. One of the more general myths is that the goal of SEO is purely to obtain higher search engine rankings. It is true that this is a big part of SEO’s purpose. However, high rankings are worthless if people don’t click on them, or quickly leave the page these results link to. SEO work must take into account the readability of pages, as well as the attractiveness of title and description tags. If this page had the keyword MYTHS in it twelve times, it would have a five-percent keyword density, but be less desirable to read.

5. Finally, yet another of the myths is that paying for links is always harmful (with regard to SEO). There are a few reasons why this is incorrect. First, Google appears not to penalize paid listings in (at least some) directories which thoroughly review sites before listing them; its “Webmaster Guidelines” page encourages webmasters to submit their sites to the Yahoo! directory. Also, if the “nofollow” attribute is used in a link, it won’t enhance or worsen these rankings.
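The keyword density figure in myth 4 is simple arithmetic: the number of times the keyword appears divided by the total number of words on the page. Twelve occurrences at five percent implies a page of about 240 words (12 / 240 = 0.05); that word count is inferred from the figures given, not stated. A minimal sketch, with a hypothetical function name and a deliberately simple definition of a “word”:

```python
import re


def keyword_density(text, keyword):
    """Percentage of the words in text that are the given keyword."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


if __name__ == "__main__":
    # Made-up page text: 12 occurrences of the keyword among 240 words.
    sample = " ".join(["myths"] * 12 + ["filler"] * 228)
    print(keyword_density(sample, "MYTHS"))
```

This counts exact single-word matches only, so plurals and multi-word phrases would need extra handling.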

How Long Should an SEO Campaign Take?

One cannot expect to complete any effective SEO campaign in a matter of days, but the length of time individual campaigns should take varies depending upon the type and size of the web site in question. There is also a difference between the time it will take to complete the SEO work and how long it takes for its benefits to materialize.

A sales-based web site with a single promotional page to be advertised is an example of a site for which a relatively short campaign might be suitable. If there are many pages which have not been optimized for search engines and are targeted to new users, the campaign is likely to take longer. Web sites which are based upon earning ad revenue and continuously create new content may never cease their SEO efforts. For example, it could be said that a web site which keyword-optimizes each new article and tries to gain links to it on other sites has an ongoing SEO campaign.

After an SEO campaign is carried out, it should normally take several weeks before these efforts start to provide a benefit. Directories will have to approve links, and search engines have to find new links to the web site and re-spider its newly keyword-optimized pages. It can be preferable to have the SEO campaign extend beyond this time period so that the strategies which have not produced better rankings can be improved upon or replaced.

When a successful SEO campaign is completed, it is effective for a long period of time compared to most other promotional methods, but does not permanently remain as effective as it once was. After years have passed, some inbound links to a web site will likely disappear as the sites they are located on go out of business or change ownership, and search engines will have partially altered how they determine search result rankings.

Basically, the amount of time an SEO campaign should take depends upon the size and type of the web site, whether or not the site is to expand the content it offers, the skill and methods of those carrying out the campaign, and any previous SEO campaigns or promotional efforts which have been applied to it. There is an additional, relatively long period of time waiting for such efforts to take effect, after which some measures may need to be taken based upon the results.