What I learned about security and SEO by auditing 25000 sites
Auditing 25k sites to learn the state of security and SEO in the wild, wild web
#tech, #seo, #security, #martech
Most write out of authority, authority in the field. I don’t. I am a learner. I write for the unlearned about things in which I am unlearned myself. ― C.S. Lewis.
There is an easy way to learn about web security and SEO: read the authorities in those fields, such as Troy Hunt for security and Brian Dean for SEO. When you have read them, you’ll know everything there is to know about security and SEO.
Then there is the hard way: audit the top 25k sites to see what is working in the field.
Summary of my findings:
- Nobody cares about security
- Nobody cares about web standards
- Everyone gzips their site
- There is no standard way to do SEO
- Facebook open graph is popular
- Keywords are still used
- I pity the web-masters
Now to the details.
http vs https
Google announced in August 2014 that https would be a ranking signal. Despite this, only 25% of the sites use https; 19639 sites (of the top 26k) are still on http. Even sites like bbc, bestbuy, and backlinko still run on http (bestbuy’s e-commerce site is served over https, though).
Ad revenue could be one reason popular sites are still on http. Until recently, Google ads weren’t served over https. But I found a Google AdSense FAQ saying Google can now serve ads over https. Maybe this will increase migration to https.
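A split like the one above can be tallied from the final URL each site lands on after redirects. The snippet below is only a minimal sketch of that counting step, not the script used for this audit; the function name `scheme_share` and the sample URLs are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def scheme_share(final_urls):
    """Return {scheme: fraction of sites} for a list of final (post-redirect) URLs."""
    counts = Counter(urlsplit(url).scheme.lower() for url in final_urls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scheme: n / total for scheme, n in counts.items()}


# Hypothetical sample: one https site out of four.
sample = [
    "https://example.com/",
    "http://example.org/",
    "http://example.net/",
    "http://example.edu/",
]
shares = scheme_share(sample)
print(shares)
```

Counting the final URL rather than the one you requested matters, because many sites answer on http only to redirect to https.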
Masking Server Information
Every website emits “meta-information” about itself in the form of server headers. Two such headers reveal the technology behind a website: server and x-powered-by. A third piece of meta-information is the generator tag, part of the html meta tags. Mass-sniffing programs use this data to build a database of websites and their servers. Hackers then look for exploitable bugs in those servers and launch attacks.
So security practitioners recommend masking server details. But even OWASP, a non-profit organization that works to improve software security, emits this information (its web server is Apache, and its application server is mediawiki 1.23.15).
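To make the three disclosure points concrete, here is a rough sketch of how a sniffing program might pull them out of a response it has already fetched. It assumes the headers arrive as a plain dict and the body as a string; `disclosed_stack` and `_GeneratorFinder` are hypothetical names, and the `X-Powered-By` value in the sample is invented.

```python
from html.parser import HTMLParser


class _GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag seen."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "generator":
            self.generator = attr.get("content") or None


def disclosed_stack(headers, html):
    """Collect what a response reveals about the software behind a site."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    finder = _GeneratorFinder()
    finder.feed(html)
    return {
        "server": lowered.get("server"),
        "x-powered-by": lowered.get("x-powered-by"),
        "generator": finder.generator,
    }


# Sample response; the header values here are illustrative only.
sample_headers = {"Server": "Apache", "X-Powered-By": "PHP/5.6"}
sample_html = '<html><head><meta name="generator" content="MediaWiki 1.23.15"></head></html>'
result = disclosed_stack(sample_headers, sample_html)
print(result)
```

A site that masks its details would come back with `None` (or a deliberately vague value) in all three fields.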
The most popular web-servers are:
| Server | Count | | :