December 3, 2013

SEO essentials for web developers: Website structure

By Joel Klettke

And we’re back! In my last post, we walked through the touchy subject of why knowing SEO is important for web developers, and how spending some time digging into SEO can save you time, earn you dollars and keep you sane.

This time, I want to take on some of the specific, basic knowledge surrounding website structure that web developers should make a part of their repertoire. Along the way, I’ll point you to some resources you can check out to learn more about the topic at hand.

Building websites Google can crawl

It all starts with how the site is coded and structured; the tips in this section address those basics.

Primary domains, sub-domains and subfolders/subdirectories

When laying out websites, there are important choices and distinctions to be made.

The first is whether or not to use the “www.” in the site’s indexed URLs. You can specify which version of the site you want Google to index using Google Webmaster Tools, but it’s important that the live URLs reflect this and that your preferred version be the one that resolves.
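
For example, if the “www.” version is the preferred one, a server-level 301 redirect can enforce it so that only that version resolves. Here’s a rough sketch for an Apache server using mod_rewrite in an .htaccess file (example.com is a placeholder; other servers have equivalent directives):

    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]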

The second is whether to use subdomains or subfolders to handle the different requirements of the website’s content. Either can be used to separate content by language, geography (be sure to indicate the intended geography in Google Webmaster Tools) or any other categorical split. You can reference this helpful guide, but in general:

  • Unless the site is geographically spread across multiple countries, uses a franchise structure or runs many different product lines, subfolders/directories are easier to manage. Subdomains do allow for completely separate websites, logins and administration, though this may not be desirable.
  • Even for businesses that operate in multiple countries or offer multiple languages, subfolders/directories can be the optimal solution. You can specify the intended geographies of folders/directories in Google Webmaster Tools, allowing you to avoid problems with duplicate content.
  • While subfolders/directories always share linking value and authority with the primary domain, subdomains may not, making it much harder to rank them when this is the case.

AJAX, JavaScript, Flash and parallax

Flash websites cannot be read or understood by search engines: because the content is essentially image-based, Google cannot crawl or interpret it the way it can HTML. Generally, avoid the use of Flash whenever possible.

JavaScript should be used very carefully, but there are ways to make many JavaScript elements SEO-friendly. As outlined in this handy list of JavaScript Do’s and Don’ts:

  • Keep JS simple
  • Put heavy JS sections in a separate file from the page
  • Replace JS menus with CSS menus whenever possible (a quick sketch follows this list)
  • Turn off JS when testing pages for a better understanding of what search engines (and not just users) will see.
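
Here’s what that CSS menu suggestion can look like in practice: a dropdown built from plain HTML links and a :hover rule keeps every menu item crawlable even with JavaScript disabled. A minimal sketch (the class names are purely illustrative):

    <ul class="nav">
      <li>
        <a href="/products">Products</a>
        <ul class="sub">
          <li><a href="/products/widgets">Widgets</a></li>
          <li><a href="/products/gadgets">Gadgets</a></li>
        </ul>
      </li>
    </ul>

    <style>
      .nav .sub { display: none; }            /* sub-menu hidden by default */
      .nav li:hover .sub { display: block; }  /* revealed with CSS alone, no JS required */
    </style>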

Though search engines are getting better at understanding and indexing content that utilizes AJAX/JavaScript, they still struggle. If you need more detail, here’s a great guide to implementing AJAX well.

  • Search engines will not treat URLs that differ only by a hash fragment (e.g. #about or #products) as separate URLs. Each unique page of content should have its own static URL accessible from the AJAX pages, as in the sketch after this list.
  • User detection can help ensure you display the right version of the content for the user, depending on their device, sharing method and so on.
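
To make that first point concrete, here’s a rough, hypothetical sketch of AJAX navigation where every view keeps its own static, crawlable URL: the links carry real hrefs, and the History API keeps the address bar in sync. The .ajax-nav class and #content container are placeholders, not part of any particular framework:

    var links = document.querySelectorAll('a.ajax-nav');      // links with real, crawlable hrefs
    Array.prototype.forEach.call(links, function (link) {
      link.addEventListener('click', function (e) {
        if (!window.history.pushState) return;                // older browsers get a normal page load
        e.preventDefault();
        var xhr = new XMLHttpRequest();
        xhr.open('GET', link.href);
        xhr.onload = function () {
          document.getElementById('content').innerHTML = xhr.responseText;
          history.pushState(null, '', link.href);             // address bar shows the static URL
        };
        xhr.send();
      });
    });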

Parallax websites create unique challenges for SEO, as they are essentially one very long page containing all of the content.

Websites perform best when the title tags, meta descriptions and other on-page elements can be optimized on a page-by-page basis for their keyword targets. When just one page is present, the relevance of its title tag and content is severely diluted, making it very hard to compete in competitive niches.

In addition, parallax rarely makes for a good experience on mobile devices, which often forces the creation of a separate, mobile-friendly website. Lastly, parallax makes it very difficult to measure on-page engagement and optimize for better conversions.

Redirects

This one is short and sweet. Whenever redirecting content permanently, use a 301. A 302 redirect will not pass along the link value or authority of the old page, and because Google treats it as a temporary redirect, the new URL may not be indexed and the old URL may remain in the index.

In addition, never kill a URL on a website without first checking to see whether there are external links pointed at that page. When a URL dies, the value of the links pointing there dies too; 301 redirects help preserve rankings, especially across website redesigns or when a new CMS is put in place.
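
For illustration, on an Apache server a retired URL can be permanently redirected with a single line in .htaccess (the paths here are placeholders; most CMSs and other servers offer an equivalent):

    Redirect 301 /old-page.html http://www.example.com/new-page.html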

SEO-friendly URLs

The URL structure of a website can impact the way a site is crawled and indexed. There’s a fantastic cheat sheet here, but in general:

  • URLs should contain targeted keyword phrases, as this influences rankings, but should be kept reasonably short.
  • Even slight changes to URLs, such as mixed casing (e.g. example.com/about vs. example.com/About), will be treated as completely separate URLs by a search engine and could cause duplicate content issues.
  • Don’t use unfriendly syntax in URLs. For example, while search engines understand a hyphen as a word separator in a URL, an underscore is treated as a joiner and part of the word. Note also that search engines ignore everything after a pound sign (#) in a URL. A more complete and brilliant guide to unfriendly syntax and search engine interpretations can be found here. The small slug helper after this list illustrates these conventions.
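
As a quick illustration of those conventions, here’s a rough JavaScript slug helper of the kind many CMSs use to build URLs from page titles; it’s a sketch to adapt to your own stack, not a definitive implementation:

    function slugify(title) {
      return title
        .toLowerCase()                   // avoid mixed-case duplicate URLs
        .replace(/[^a-z0-9\s_-]/g, '')   // strip unfriendly characters
        .trim()
        .replace(/[\s_]+/g, '-')         // hyphens, never underscores or spaces
        .replace(/-+/g, '-');            // collapse repeated hyphens
    }

    slugify('SEO Essentials: Website Structure');  // "seo-essentials-website-structure"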

Duplicate content and canonical tags

As previously mentioned, Google will treat any variation in a URL as a unique page, even when that URL points to an identical page of content.

Example.com, www.example.com, example.com/home and example.com/index.html might all serve the same page, but they are separate, duplicate URLs in the eyes of search engines.

This can create all kinds of problems with “duplicate content,” especially on e-commerce platforms or other websites with dynamically generated URLs.

As a rule, Google doesn’t like duplicate content and will choose which one of the URLs to index, and which to ignore. This may not be ideal if the preferred URL is not chosen. In addition, links pointed to the differing URLs may not be amalgamated, meaning your client will lose linking value — and thus have a harder time ranking. Moreover, when duplicate content runs rampant on a website, Google may take this as a negative quality signal.

To get around duplicate content, enforce a single version of live URLs whenever possible using 301 redirects. When this is not possible, such as in the case of e-commerce CMS platforms, you will need to use the “rel=canonical” tag.

The “rel=canonical” tag allows you to tell search engines which URL is the preferred original. Unlike a 301 redirect, this does not necessarily guarantee that link value will be amalgamated from one URL to another, but it WILL ensure that Google indexes the correct URLs of your website, solving your duplicate content issues. Note that rel=canonical tags can be self-referential in cases where only one physical page exists but can be accessed via multiple URLs.
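
In practice it’s a single line in the <head> of each duplicate (or self-referencing) page, pointing at the clean, preferred URL; for instance, a product page reachable at several parameterized URLs might carry this tag on all of them (the URL is a placeholder):

    <link rel="canonical" href="http://www.example.com/products/blue-widget" />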

Mobile websites: Responsive or specific HTML

When designing a site for mobile, Google has come out in support of responsive design whenever possible, and device-specific HTML where responsive is not an option. Content should also be tailored to mobile devices. With mobile traffic now constituting in excess of 2.1 billion searchers a year, businesses simply cannot afford to ignore device-specific experiences, and neither can developers, at the risk of having the client come knocking months after the site launches.
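
As a bare-bones illustration of the responsive route, one HTML document serves every device and CSS media queries adapt the layout; the breakpoint and class name below are arbitrary placeholders:

    <meta name="viewport" content="width=device-width, initial-scale=1">

    <style>
      .sidebar { float: right; width: 30%; }
      @media (max-width: 600px) {
        .sidebar { float: none; width: 100%; }  /* stack content on small screens */
      }
    </style>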

HTML and XML sitemaps

To close out, let’s quickly touch on HTML and XML sitemaps, both of which are essential. HTML sitemaps should be coded with the site’s primary pages and styled for users; they should be kept current, especially when new pages are added.

XML sitemaps, on the other hand, play a far more important role for SEO. You can specify and submit a sitemap to Google using Google Webmaster Tools, and Google’s crawlers will then use it extensively to understand the site’s architecture.
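
For reference, a bare-bones XML sitemap follows the sitemaps.org protocol, with one <url> entry per page (the URLs and dates here are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2013-12-01</lastmod>
      </url>
      <url>
        <loc>http://www.example.com/products/blue-widget</loc>
      </url>
    </urlset>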

What many do not know, however, is that in the case of more complex sites or sites with many, many pages, it is possible to create a “sitemap of sitemaps”, or an XML sitemap tree. This is especially useful when there are distinct categories or expansive sections of a large website, and can drastically increase the number of pages crawled as it eliminates the onus on the search engine to sort through one enormous sitemap.
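
The “sitemap of sitemaps” uses the same protocol: the index file simply lists the child sitemaps, typically one per major section of the site (again, placeholder URLs):

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://www.example.com/sitemap-products.xml</loc>
      </sitemap>
      <sitemap>
        <loc>http://www.example.com/sitemap-blog.xml</loc>
      </sitemap>
    </sitemapindex>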

Wrapping up

That’s it for this time! In my next post in the series, I’ll delve into on-page elements as well as some of the newer, more interesting schema.org markup that every web developer should know.


About the Author

Joel Klettke led an agency-side SEO team for over four years before going rogue and starting Business Casual Copywriting, where he helps brands create content that gets into people’s heads. You can follow him on Twitter at @JoelKlettke.

Author's Website: http://businesscasualcopywriting.com

