Prudent Devs

What I learned about security and SEO by auditing 25000 sites

Auditing 25k sites to learn the state of security and SEO in the wild, wild web

#tech, #seo, #security, #martech

Most write out of authority, authority in the field. I don’t. I am a learner. I write for the unlearned about things in which I am unlearned myself. ― C.S. Lewis.

There is an easy way to learn about web security and SEO: read the authorities in those fields, such as Troy Hunt for security and Brian Dean for SEO. Once you have read them, you’ll know everything there is to know about security and SEO.

Then there is the hard way: audit the top 25k sites to see what actually works in the field.

I took the hard way. I collected information from these sites using Node.js, stored it, and analyzed it.
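The author's Node.js collector isn't shown here. As a rough illustration of the same idea, the following is a minimal Python sketch (every name in it is hypothetical, not the author's code) that fetches each homepage, follows redirects, records the response headers and HTML, and serializes the results for later analysis. Requesting gzip and recording the content-encoding header is an assumed way to check compression.

```python
import gzip
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fetch result: (final_url, lower-cased headers, decoded html)
FetchResult = Tuple[str, Dict[str, str], str]


@dataclass
class AuditRecord:
    requested_url: str
    final_url: str = ""
    headers: Dict[str, str] = field(default_factory=dict)
    html: str = ""
    error: Optional[str] = None


def fetch(url: str, timeout: float = 10.0) -> FetchResult:
    """Fetch one homepage, following redirects and asking for gzip like a browser."""
    req = urllib.request.Request(
        url,
        headers={"User-Agent": "site-audit-sketch/0.1", "Accept-Encoding": "gzip"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        headers = {k.lower(): v for k, v in resp.headers.items()}
        raw = resp.read()
        if headers.get("content-encoding", "").lower() == "gzip":
            raw = gzip.decompress(raw)  # urllib does not decompress for us
        # resp.url is the URL after any redirects were followed
        return resp.url, headers, raw.decode("utf-8", errors="replace")


def audit_sites(urls: List[str], fetcher: Callable[[str], FetchResult] = fetch,
                workers: int = 16) -> List[AuditRecord]:
    """Fetch many sites concurrently; one failing site never stops the run."""
    def one(url: str) -> AuditRecord:
        try:
            final_url, headers, html = fetcher(url)
            return AuditRecord(url, final_url, headers, html)
        except Exception as exc:  # DNS failures, timeouts, TLS errors, ...
            return AuditRecord(url, error=repr(exc))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one, urls))  # map() keeps the input order


def to_jsonl(records: List[AuditRecord]) -> str:
    """Serialize records as JSON Lines, one site per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)
```

Passing the fetcher in as a parameter keeps the network code separate, so the aggregation can be tried against canned responses before pointing it at thousands of real sites.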

Summary of my findings:

  • Nobody cares about security
  • Nobody cares about web standards
  • Everyone gzips their site
  • There is no standard way for SEO
  • Facebook Open Graph is popular
  • Keywords are still used
  • I pity the webmasters

Now to the details.

Security

http vs https

In August 2014, Google announced that HTTPS would be a ranking signal. Despite this, only about 25% of the sites use HTTPS: 19,639 sites (of the top 26k) are still on plain HTTP. Even sites like BBC, Best Buy, and Backlinko are still running on HTTP (though Best Buy’s e-commerce site is served over HTTPS).

(Chart: http vs https)
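One way to produce a figure like the HTTPS share (an assumption, not necessarily how the numbers above were computed) is to request each site over plain http and look at the scheme of the final URL after redirects. The sketch below counts that share; the worked check at the end treats "top 26k" as exactly 26,000, which is also an assumption.

```python
from typing import Iterable
from urllib.parse import urlsplit


def https_share(final_urls: Iterable[str]) -> float:
    """Fraction of sites whose final, post-redirect URL is served over https.

    Empty strings (sites that failed to load) are left out of the denominator.
    """
    schemes = [urlsplit(u).scheme.lower() for u in final_urls if u]
    if not schemes:
        return 0.0
    return sum(s == "https" for s in schemes) / len(schemes)


# Worked check of the numbers in the text, assuming "26k" means exactly 26,000.
http_sites, total = 19639, 26000
print(f"https share: {1 - http_sites / total:.1%}")  # prints "https share: 24.5%"
```

Under that assumption the share comes out at roughly 24.5%, consistent with the "about 25%" above.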

Ad revenue could be one reason popular sites are still on HTTP: until recently, Google ads weren’t served over HTTPS. However, a Google AdSense FAQ now says Google can serve ads over HTTPS. Maybe this will speed up the migration to HTTPS.

Between StartSSL, SSLMate, and Let’s Encrypt, you should be able to find a certificate provider for your site.
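To see which certificate authority a site actually uses (for example, to tally issuers across audited sites), Python’s standard ssl module can retrieve the verified certificate. This is a minimal sketch; the function names are hypothetical, and it reports only the issuer’s organizationName field.

```python
import socket
import ssl
from typing import Optional


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Open a verified TLS connection and return the server certificate as a dict."""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_org(cert: dict) -> Optional[str]:
    """Pull organizationName out of getpeercert()'s nested issuer tuples."""
    # "issuer" is a tuple of RDNs; each RDN is a tuple of (key, value) pairs.
    for rdn in cert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None
```

Calling issuer_org(fetch_peer_cert("example.com")) needs network access. A site whose chain does not verify raises ssl.SSLCertVerificationError, which an audit would probably want to record rather than skip.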

Masking Server Information

Every website emits “meta-information” about itself in its HTTP response headers. Two of these headers reveal the technology behind a website: server and x-powered-by. A third piece of meta-information lives in the HTML meta tags: generator. Mass-scanning programs use this data to build databases of websites and the servers they run. Attackers then look for exploitable bugs in those servers and launch attacks.
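The three fingerprints described above can be read with nothing but the Python standard library. A minimal sketch, assuming the headers and HTML have already been fetched (the class and function names are hypothetical):

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class GeneratorFinder(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values may be None.
        if tag == "meta" and self.generator is None:
            a = {k: (v or "") for k, v in attrs}
            if a.get("name", "").lower() == "generator":
                self.generator = a.get("content")


def fingerprint(headers: Dict[str, str], html: str) -> Dict[str, Optional[str]]:
    """Collect server, x-powered-by, and generator, the values scanners look for."""
    lower = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    finder = GeneratorFinder()
    finder.feed(html)
    return {
        "server": lower.get("server"),
        "x-powered-by": lower.get("x-powered-by"),
        "generator": finder.generator,
    }
```

Any of the three values can be missing, so a site that masks everything simply yields None in each field.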

So security practitioners recommend masking server details. Yet even OWASP, a non-profit organization dedicated to improving software security, emits this information (its web server is Apache, and its application is MediaWiki 1.23.15).
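Masking has to happen at every layer that leaks: the web server’s own configuration (Apache, for example, has the ServerTokens and ServerSignature directives), the CMS template that prints the generator tag, and the application itself. For the application layer, one option, sketched here for a Python WSGI app with hypothetical names, is a small middleware that drops the server and x-powered-by headers before they leave the app. It cannot remove a Server header that the front-end web server adds afterwards.

```python
from typing import Callable, Iterable

# Hypothetical set of header names to hide (compared case-insensitively).
HIDDEN_HEADERS = {"server", "x-powered-by"}


def strip_fingerprint_headers(app: Callable) -> Callable:
    """Wrap a WSGI app so it never emits the headers in HIDDEN_HEADERS."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def filtered_start_response(status, headers, exc_info=None):
            kept = [(k, v) for k, v in headers if k.lower() not in HIDDEN_HEADERS]
            return start_response(status, kept, exc_info)

        return app(environ, filtered_start_response)

    return wrapped
```

Wrapping is a one-liner (application = strip_fingerprint_headers(application)), which keeps the masking in one place instead of scattered across views.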

The most popular web servers are:

| Server | Count | | :