What's the maximum file size Google can index?

Posted on 29 Aug 2017

A typical modern-day website relies on interactive elements to provide functionality.

For complex or detailed pages this can result in a webpage's file size getting big, sometimes very big, creating a situation where the content you want Googlebot to read sits a long way down the page.

So how big is too big for Google?

To answer that question we are going to have to get a bit creative. I generated a list of 202 keywords that currently have no indexed pages; they are pretty much just gibberish with a number attached to the end. I then recorded which ones Google picked up over a period of around 10 days.

Each keyword was separated by 100 kb of commented-out text provided by Project Gutenberg, and the page was then submitted to Google via the "Fetch as Google" tool within Search Console.
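For reference, here is a minimal sketch of how a test page like this could be generated. It is purely illustrative: the word list, keyword format and file names are assumptions, not the exact script used for this experiment.

```python
import random
import string

NUM_KEYWORDS = 202
FILLER_SIZE = 100 * 1024  # roughly 100 kb of commented-out text between keywords


def gibberish_keyword(i):
    # A nonsense token with a number attached to the end, so it has no existing results.
    return "".join(random.choices(string.ascii_lowercase, k=12)) + str(i)


def filler_block(text, size=FILLER_SIZE):
    # Wrap public-domain text (e.g. from Project Gutenberg) in an HTML comment,
    # repeated and truncated so each block is roughly `size` bytes.
    # (In practice, strip any "--" sequences first; they are not valid inside comments.)
    padded = (text * (size // len(text) + 1))[:size]
    return "<!-- " + padded + " -->"


gutenberg_text = open("gutenberg.txt", encoding="utf-8").read()  # any long plain-text file

parts = ["<!DOCTYPE html><html><head><title>Index size test</title></head><body>"]
for i in range(1, NUM_KEYWORDS + 1):
    parts.append(f"<p>{gibberish_keyword(i)}</p>")
    parts.append(filler_block(gutenberg_text))
parts.append("</body></html>")

with open("test-page.html", "w", encoding="utf-8") as f:
    f.write("".join(parts))
```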

[Screenshot: max-file-size-01]

You can see from the screenshot above that Google only rendered up to the 158th keyword, i.e. somewhere between 15,700 kb (15.7 mb) and 15,800 kb (15.8 mb) into the page. However, when clicking on the Fetching tab and viewing the HTTP response, I was only shown the first 250 kb of data.

[Screenshot: max-file-size-02]

I didn't expect there to be such a massive discrepancy between what Google can actually render and what it shows in the fetch response: Google only showed three of the 202 keywords in the HTTP response section.
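As a quick sanity check on those numbers, each keyword sits after a known amount of filler, so its approximate position in the file is easy to work out:

```python
FILLER_KB = 100  # commented-out text between consecutive keywords

def keyword_offset_kb(n):
    # Keyword n is preceded by (n - 1) filler blocks of roughly 100 kb each.
    return (n - 1) * FILLER_KB

print(keyword_offset_kb(158))  # 15700 kb, so keyword 158 starts around 15.7 mb
print(keyword_offset_kb(159))  # 15800 kb, so the cut-off falls between 15.7 mb and 15.8 mb
```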

Having submitted the fetched page to Google, I had to wait for it to be picked up, and after a few hours the results began to trickle in. I allowed Google just over a week to index as many keywords as it could.

[Screenshot: max-file-size-03]

As you can see from the screenshot above, Google had picked up and indexed the keywords and included some of them on the page as part of the description. Rather than repeat this process another 202 times, I ran a rank tracker to pull the first page of Google results; the output was surprisingly similar to the Fetch as Google tool.

[Screenshot: max-file-size-04]

Google had indexed up to the 158th keyword, at around 15,700 kb (15.7 mb), which is exactly where the visual portion of the fetch and render stopped.

[Screenshot: max-file-size-05]

This seems to indicate that the "Fetch as Google" rendering really is how Google will see your page; just keep in mind that if you are looking for a specific block of code, the HTTP response only shows the first 250 kb.
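If you wanted to script a rough version of that indexation check rather than use a rank tracker, something like the sketch below would do it. Treat it as illustrative only: the "did not match any documents" message is language-dependent, and Google rate-limits and generally disallows automated querying, so a rank tracker or Search Console data is the realistic route.

```python
import time
import requests

KEYWORDS = [f"examplekeyword{i}" for i in range(1, 203)]  # placeholder; use the real list


def looks_indexed(keyword):
    # Exact-phrase search for a gibberish keyword: if Google's "did not match any
    # documents" message is absent, at least one page containing it has been indexed.
    resp = requests.get(
        "https://www.google.com/search",
        params={"q": f'"{keyword}"', "hl": "en"},
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=10,
    )
    return "did not match any documents" not in resp.text


indexed = []
for kw in KEYWORDS:
    if looks_indexed(kw):
        indexed.append(kw)
    time.sleep(5)  # be gentle; rapid-fire queries will get blocked

print(f"{len(indexed)} of {len(KEYWORDS)} keywords appear to be indexed")
```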

Whilst doing this research I also came across a rather strange situation with Google cache: a user had a very heavy JavaScript page that was indexing fine, but when they viewed the page in the cache it was entirely blank. It appears as though Google cache was only holding onto the first 1 mb of data; this was causing the JavaScript to be truncated and thus leaving the page with no visible content.
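A simple way to guard against both limits is to measure the size of the HTML your server actually sends and flag anything approaching them. A minimal sketch, using the thresholds observed above (these are experimental observations, not documented Google figures):

```python
import requests

CACHE_LIMIT_BYTES = 1 * 1024 * 1024          # ~1 mb: observed Google cache truncation point
INDEX_LIMIT_BYTES = int(15.7 * 1024 * 1024)  # ~15.7 mb: observed indexing cut-off


def check_page_weight(url):
    # Fetch the raw HTML and compare its size against the observed limits.
    size = len(requests.get(url, timeout=10).content)
    print(f"{url}: {size / 1024:.0f} kb")
    if size > INDEX_LIMIT_BYTES:
        print("  warning: content beyond ~15.7 mb is unlikely to be indexed")
    elif size > CACHE_LIMIT_BYTES:
        print("  warning: the Google cache may truncate this page after ~1 mb")


check_page_weight("https://example.com/")  # illustrative URL
```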

So to sum up:

- Fetch and render will get between 15.7 mb and 15.8 mb of data but will only show you the first 250 kb in the HTTP response. This can make it a little difficult to debug issues where your page size goes over 250 kb.
- The Google index seems to stop somewhere between 15.7 mb and 15.8 mb. This is in line with the visual fetch and render, which means you can trust the visual portion of the fetch and render to show you what Googlebot will actually see.
- Google cache is capped at 1 mb; it will truncate content after this, which can lead to some very strange pages in the cache. Watch out for this, as the truncation may break page functionality, leading you to believe there is a problem with the page!

Why does this matter?

John Mueller recently spoke about how SEOs can help devs and share knowledge.


There is an assumption that Google can now crawl so much content that it will be able to index anything; however, there is still a hard line which Google will not cross. You are unlikely to hit it with on-page text alone, but with the increasing use of JavaScript to create interactive elements and dynamic content, it's useful to know exactly where Google draws its line.
