How to Perform a Technical SEO Audit

Dino Kukic

Nov 3, 2017

What most of us are used to hearing about Search Engine Optimization is that we should find some relevant keywords, look at their search volume, check the competition and write content that addresses them, and everything will be alright.

However, this is only one, and a fairly shallow, part of the website optimization story. For anything more, you have to roll up your sleeves and pick up a couple of extra skills.

Preparation for an audit

Of course, primarily a cup of coffee and a glass of water. Before even starting the audit, we need to learn to look at the website through the eyes of a search engine – this means disabling the cache, switching your user agent to “GoogleBot” and turning off JavaScript. Over the last couple of years search engines have become much better at reading and discovering content and links generated by JavaScript, so there is probably nothing to worry about if something becomes non-clickable, but some doubt remains. For a milder version of the analysis, JavaScript can stay on.

What you need to do next is crawl the website. Some of the tools I’d recommend are A1 Website Analyzer, Screaming Frog, Netpeak Spider or even Xenu, which is probably older than the term SEO but can be used in different ways to fit the purpose. If you are working with a fairly large website, you might want to crawl from the cloud, for which you could use DeepCrawl. The idea is to get a list of the URLs that exist on the website, their HTTP status codes and other elements specific to individual pages.

Website Accessibility Analysis


The first thing to check is the robots.txt file, which tells crawlers which parts of the site they may access. With its default content there shouldn’t be any problems; however, if at some point someone thought it was a good idea to play around with it, there is a chance that access to a whole directory is forbidden. That means those pages have no chance of reaching the SERP. Of course, if you don’t want some of your pages ranking, you should forbid crawler access to them, but a better way to do that comes next.
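To sanity-check what a robots.txt actually blocks, Python’s standard `urllib.robotparser` can simulate the crawler’s view. The file content and URLs below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content -- a directory accidentally blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Pages under /private/ can never reach the SERP via crawling.
print(parser.can_fetch("Googlebot", "https://example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post.html"))     # True
```

Running this against your real robots.txt (fetched from `yourdomain.com/robots.txt`) quickly reveals any directory someone “played around with”.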

Meta Directives

What I call a meta directive here is placed in the <head> part of the HTML code among the meta tags; specifically, I mean meta robots:

<meta name="robots" content="index, follow">

The important part is the content attribute, which can hold “index”, “noindex”, “follow” and “nofollow”: a combination of one of the first two with one of the second two. The explanation:

Index – says “index this page, I want it in search results”
Noindex – says exactly the opposite: “don’t index this page”
Follow – says “follow all the links on this page”
Nofollow – of course, the opposite of Follow: “don’t follow any of the links on this page”

Meta robots is also the best way to exclude something from the search results. If you do it through the “Disallow” directive in robots.txt, the page can still be indexed, but with the following information:

A description for this result is not available because of this site’s robots.txt.

If you block crawler access to a certain page through the robots.txt file, the search engine cannot even see that the page shouldn’t be indexed.
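As a sketch of how a crawler-style check for the meta robots tag could look, here is an extractor built on Python’s standard `html.parser`; the page markup is a made-up example:

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collects the content of <meta name="robots"> from a page's HTML."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives = [d.strip().lower()
                               for d in (attrs.get("content") or "").split(",")]

# Hypothetical page that should stay out of the index but pass link value on.
html = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
p = MetaRobotsParser()
p.feed(html)
print(p.directives)  # ['noindex', 'follow']
```

Because this reads the tag from the HTML itself, it only works if the crawler was allowed to fetch the page in the first place, which is exactly the point made above.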

HTTP Status codes

Briefly, HTTP status codes are the codes a server sends in response to a client’s request. Every state/response has its code, and some of them are covered in the following part of this text.

2xx — means OK: the page was found and everything looks good.

3xx — In something that sounds like an SEO utopia, the website would be perfectly planned and we wouldn’t need redirects to make things right. In real life this is practically impossible, but what we can do is use the redirect that passes value from one page to another, and that’s the 301. In the last year or two there were claims that all redirect types, i.e. 302 and 307 as well, now pass link value, but this has turned out not to be entirely correct in multiple instances.

Additionally, a common mistake is redirecting pages without paying attention to whether another page already redirects to them. This creates a redirect chain, which can cause many problems. A common argument for not fixing such chains is the Matt Cutts video where he says that GoogleBot follows up to 4–5 hops in a redirect chain, but it is an undeniable fact that even 301s don’t pass 100% of link value, more like 90% to 99%, so I’ll leave you to calculate where four redirects leave you.
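The arithmetic behind that warning is quick to sketch; the 90%–99% per-hop retention figures come from the paragraph above and are estimates, not published numbers:

```python
# Rough arithmetic for link value surviving a redirect chain, assuming each
# 301 passes somewhere between 90% and 99% of the value (estimates from the
# text above -- Google does not publish the exact retention).
def retained_value(hops, per_hop_retention):
    return per_hop_retention ** hops

for retention in (0.90, 0.99):
    print(f"4 hops at {retention:.0%} per hop -> "
          f"{retained_value(4, retention):.1%} retained")
```

At 90% per hop, four redirects leave only about 66% of the original value; even at 99% per hop you lose roughly 4%.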

4xx — On forums and the Google blog you will usually read that pages with a 404 response code are harmless to the website’s ability to rank in the SERP, but that’s not entirely true. To see the problem you need to look at it from another angle: all the links leading to a page that no longer exists become practically worthless. To make better use of them, set up a permanent redirect (301) to the page most relevant to the deleted one. Pages returning 404 are best found in the Google Search Console property for that domain.

5xx — Response codes starting with 5 mean there is a problem on the server. Even though the problem is not directly tied to SEO, if your website doesn’t exist you’re hardly going to rank it on any search engine.
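To summarize the families above in audit terms, here is a minimal classifier you can run over a crawl export; the action strings are my own shorthand, not an official taxonomy:

```python
# Map an HTTP status code to the audit action discussed in the text.
def classify_status(code):
    if 200 <= code < 300:
        return "ok"
    if 300 <= code < 400:
        return "redirect -- prefer a single 301 hop, no chains"
    if 400 <= code < 500:
        return "client error -- 301 the lost link value to a relevant page"
    if 500 <= code < 600:
        return "server error -- fix before anything else"
    return "unexpected"

print(classify_status(301))
print(classify_status(404))
```

Feeding the status-code column of your crawl through this gives a quick triage list per URL.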

Website architecture

This aspect is unfortunately often neglected, and worst of all, people think that the URL structure is an indicator of the website structure. A giant NO!

The best way to picture your website’s architecture is to start clicking the links on your pages and counting how many clicks it takes to reach a certain page. It is also important that pages link to each other, because a page’s authority drops with the number of clicks required to reach it. Naturally, the majority of links will point to the homepage, so you need to make sure the value of those links is passed on to other pages: the fewer clicks needed to reach a page, the greater the value passed. Of course, this is not all there is to links; some pages attract more links than others (an article compared to a category, for example), and that value can be passed further through internal linking.
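Click depth can be computed mechanically from a crawl with a breadth-first search over the internal link graph; the graph below is hypothetical, built by hand instead of from a real crawl:

```python
from collections import deque

def click_depth(graph, start):
    """Breadth-first search: minimum number of clicks from `start` to each page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for linked in graph.get(page, []):
            if linked not in depth:
                depth[linked] = depth[page] + 1
                queue.append(linked)
    return depth

# Hypothetical internal link graph: page -> pages it links to.
site = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog/post-2"],
}
print(click_depth(site, "/"))
```

Pages that come out with a depth of 3 or more are the ones the paragraph above warns about: they receive the least value from the homepage’s links.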

WordPress website architecture

For websites that use WordPress, there are many ways to achieve this, two of which are tags and so-called breadcrumbs. Tags connect articles that are not necessarily in the same category and thereby create mutual connections between pages. Additionally, people often put a widget on their blogs listing the most used tags, which raises the value of the tag pages as well.

Keep in mind that too many tags eventually has the opposite effect. Each tag represents a new page, and since tags have no value beyond grouping other pages, past a certain number they start losing their significance: the more pages without additional value you have, the weaker their competitiveness in search results, while the same amount of link value is spread across too many pages. Breadcrumbs, on the other hand, allow crossing from individual pages to categories and subcategories, and so on.

Website performance

Website load speed is one of the key things search engines (Google) take into account when deciding what to show first. Good tools for checking your website’s load speed include Google PageSpeed Insights, which gives you a general list of things to change in order to improve load speed; for a more detailed report you should check out Pingdom Tools, where you can see exactly which file takes the most time to load.


Canonical links

Code that looks like this:

<link rel="canonical" href="">

is very useful, but unfortunately also an element that causes a lot of misunderstanding. The canonical link effectively tells the search engine: “the original of the content you see here is in fact over there”.

If you use WordPress with the Yoast plugin, this part of the code is generated automatically so that it references the page it is on. This is useful because if someone copies your entire code, they also copy the canonical, which will keep pointing to the original page. However, if you are the one copying something from the <head> to another page, you might copy this tag as well, and it will become an obstacle to that page ranking well.

The most interesting example of this mistake I have ever seen was a move from www to non-www where the canonical link pointed to www while that page was 301-redirected to non-www. Mind blown.
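A simple audit check for this class of mistakes is to extract the canonical href and compare it with the URL the page lives on. A sketch with Python’s standard `html.parser`, using made-up URLs:

```python
from html.parser import HTMLParser

class CanonicalParser(HTMLParser):
    """Grabs the href of <link rel="canonical"> if present."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

# Hypothetical page copied to a new URL with the old canonical left in place.
page_url = "https://example.com/copy"
html = '<link rel="canonical" href="https://example.com/original">'
p = CanonicalParser()
p.feed(html)
if p.canonical and p.canonical != page_url:
    print("canonical points elsewhere:", p.canonical)
```

Run over a full crawl, any page whose canonical points at a different URL (especially one that itself redirects) is worth a closer look.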

Page indexability

Here the search operator “site:” will be tremendously helpful. By typing “site:yourdomain.com” into Google search you’ll get a list of all indexed pages on that domain. There are three possible outcomes when it comes to the number of those pages.

The number of indexed pages is smaller than the real number — there is a chance that technical problems prevent the crawler from accessing and indexing all the pages.

The number of pages in the search results is roughly the same as the actual number — this is what we are after; it means the website is pretty healthy in this respect.

The number of indexed pages is bigger than the actual number — there are a couple of cases where the number of pages in the search results exceeds the actual number of pages, and one of them is duplicate content. It often happens when a website moves to its https version and some pages are left un-redirected. One of the best tools for discovering these issues is Screaming Frog.

Bear in mind that the more pages your domain has, the less accurate the reported number of indexed pages will be.

You should also pay attention to whether there are both www and non-www pages, whether http and https versions of the website exist, and whether the ordering actually makes sense, i.e. whether the homepage is the first result.
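Spotting un-redirected www/non-www and http/https variants can be automated by grouping crawled URLs by host and path; the URL list below is hypothetical, standing in for a real crawl export:

```python
from urllib.parse import urlparse
from collections import defaultdict

# Hypothetical crawl output -- in practice this comes from your crawler.
urls = [
    "http://example.com/page",
    "https://example.com/page",
    "https://www.example.com/page",
    "https://example.com/other",
]

# Group by (host without www, path): more than one (scheme, host) form per
# group means duplicate, un-redirected versions of the same page.
variants = defaultdict(set)
for url in urls:
    parsed = urlparse(url)
    host = parsed.netloc[4:] if parsed.netloc.startswith("www.") else parsed.netloc
    variants[(host, parsed.path)].add((parsed.scheme, parsed.netloc))

for key, forms in variants.items():
    if len(forms) > 1:
        print(key, "has", len(forms), "un-redirected variants")
```

Every group with more than one variant is a candidate for a 301 to the single canonical version.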

Global navigation

In this part of the audit you might want to turn JavaScript off in the browser after all. Until recently, JavaScript links didn’t pass any link juice at all, and even though this is much better now, this part is too important to gamble with, so you should see what happens once JS is gone. The global navigation should be just a regular unordered list (<ul>) that you can see and click without any interference from CSS or JS.
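A quick way to confirm the navigation survives with JavaScript off is to parse the raw HTML and list the anchor hrefs found in the <ul>; the markup below is a made-up example of the structure described:

```python
from html.parser import HTMLParser

class NavLinkParser(HTMLParser):
    """Collects href values of plain <a> tags -- what a crawler sees with JS off."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

# Hypothetical global navigation markup.
nav = '<ul><li><a href="/">Home</a></li><li><a href="/blog">Blog</a></li></ul>'
p = NavLinkParser()
p.feed(nav)
print(p.links)  # ['/', '/blog']
```

If a link in the rendered navigation is missing from this list, it is being injected by JavaScript and is the kind of gamble the paragraph above warns against.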

Off-page analysis

When we say off-page analysis we normally mean analysis of the backlink profile and social media activity. Even though at a recent conference in Serbia that I attended, Gary Illyes from Google claimed they don’t use any social media activity for ranking, a couple of tests have found a certain level of correlation between social media signals and better positioning in the SERP. Of course, correlation doesn’t mean causation, but you should keep it in mind.

When it comes to backlinks, two tools are truly great for their analysis: MajesticSEO and Open Site Explorer; as a matter of personal preference, I almost always choose the first. Its metrics are Citation Flow and Trust Flow, and a short and shallow intro to both is that Citation Flow is affected by the number of links while Trust Flow is influenced by their authority. They can help you find bad links that negatively affect your website so you can disavow them in Google Search Console.

Unfortunately, no tool will tell you which link actually contributed to your recent Google penalty, but there are certain patterns you can follow until you gain more experience. Above all, it is important to know that “nofollow” links bring you neither value nor damage, but if more than 90% of your website’s links are nofollow, it can be read as a spammy attempt at placing links in comments and the like. This will also manifest as a low Trust Flow.
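The 90% rule of thumb above is easy to check against an exported link list; the data below is made up for illustration, but real input would come from a backlink tool’s export:

```python
# Flag a backlink profile where nofollow links dominate (>90%, per the text).
def nofollow_ratio(links):
    if not links:
        return 0.0
    return sum(1 for link in links if link["nofollow"]) / len(links)

# Hypothetical backlink export: 19 nofollow comment links, 1 followed link.
backlinks = [{"url": f"https://site{i}.example/comment", "nofollow": True}
             for i in range(19)]
backlinks.append({"url": "https://news.example/article", "nofollow": False})

ratio = nofollow_ratio(backlinks)
print(f"{ratio:.0%} nofollow")
print("looks spammy" if ratio > 0.9 else "ok")
```

A profile like this one, at 95% nofollow, is the pattern the paragraph describes: lots of comment-style links and, most likely, a low Trust Flow to match.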

In SEO circles you will often hear that it is extremely important to have links from websites whose Trust Flow is higher than yours, but that is not necessarily true. A link from a contextually relevant, localized website with a relatively low Trust Flow can bring more value than one from elsewhere. Of course, if in doubt, you can check the website the link comes from; if it looks spammy, with too many ads and suspicious content, you probably don’t want its link.

Breaking SEO myths

Google uses bounce rate as a ranking factor — as a matter of fact, Google doesn’t even know your bounce rate. What is actually used: when someone types a keyword and your page ranks first, and that person clicks your result, comes back after a short while and clicks something else, your page is considered not that relevant for the keyword.

Links to other websites “spend” your website’s PageRank — this is very old-school thinking; links to other websites actually raise the value of your website by citing other resources relevant to the topic.

This already too-long article unfortunately didn’t cover on-page ranking factors such as content and its structure, or keyword research, but if you are particularly interested in those, or in any other SEO topic not found in this article, we can cover it in the comments.

I originally wrote this article in Serbian for the organization I work for, Startit, and it can be found here.