23

Tracking down a client's duplicate content issue ended up taking 14 hours across 4 days

I found 47 exact copies of their homepage floating around from an old staging site we didn't know existed. By the time Google actually recrawled everything, we had lost two weeks of rankings on their main keywords. Has anyone else dealt with hidden duplicate content from old dev environments?
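
For anyone hunting the same kind of problem, here is a minimal sketch of how exact copies of a homepage could be flagged once you already have the HTML for each candidate URL: hash every body and compare it to the homepage's hash. The `pages` mapping, the function names, and the example.com URLs are all hypothetical, and fetching the pages is left out. This is not the exact process used on the client site.

```python
import hashlib


def body_hash(html: str) -> str:
    """Return a SHA-256 digest of the page body, ignoring surrounding whitespace."""
    return hashlib.sha256(html.strip().encode("utf-8")).hexdigest()


def exact_copies(pages: dict[str, str], homepage_url: str) -> list[str]:
    """List URLs whose body is identical to the homepage after stripping whitespace."""
    target = body_hash(pages[homepage_url])
    return sorted(
        url for url, html in pages.items()
        if url != homepage_url and body_hash(html) == target
    )


# Hypothetical data: URL -> fetched HTML.
pages = {
    "https://example.com/": "<html><body>Home</body></html>",
    "https://example.com/staging/": "<html><body>Home</body></html>\n",
    "https://example.com/about/": "<html><body>About</body></html>",
}
print(exact_copies(pages, "https://example.com/"))
```

Hashing only catches byte-for-byte copies. A staging page that differs by a single timestamp or tracking snippet would slip through, so near-duplicates need a fuzzier comparison.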
2 comments

the_daniel
Man, that's a brutal find. I had something similar happen with a client who had three different WordPress installs on the same server from like five years ago. Google had indexed all of them because the old sites were still live with weak .htaccess rules. What finally worked for me was running Screaming Frog on their entire domain and then checking every single URL that came back with a 200 status. Then I set up a rule in the server config to block all the old staging folders with a 401 instead of a 404 so Google would stop trying to crawl them entirely. It took about 6 hours of digging through server logs and old backup files to find the staging sites, but that 401 block cleaned up the issue within a week.
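
The "check every URL that came back with a 200" step can be sketched roughly like this, assuming the crawl is exported as CSV with `Address` and `Status Code` columns (check your own export's headers). The function name, the path prefixes, and the sample rows are made up for illustration.

```python
import csv
import io
from urllib.parse import urlparse


def live_staging_urls(csv_text, staging_prefixes):
    """Return crawled URLs that answer 200 and sit under a suspected staging path."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only pages that are still being served are a duplicate-content risk.
        if row["Status Code"].strip() != "200":
            continue
        path = urlparse(row["Address"]).path
        if any(path.startswith(prefix) for prefix in staging_prefixes):
            hits.append(row["Address"])
    return hits


# Hypothetical crawl export; column names are an assumption.
sample = """Address,Status Code
https://example.com/,200
https://example.com/staging/index.html,200
https://example.com/old-wp/,401
https://example.com/dev/home,200
"""
print(live_staging_urls(sample, ["/staging/", "/dev/", "/old-wp/"]))
```

Anything this returns is still publicly reachable. Rerunning it after the server rule goes in is a quick way to confirm the old folders stopped answering 200.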
3
anderson.jason
Heard Google's John Mueller say 401s work better than 404s for this exact thing.
2