Abhay Rana aka Nemo

Analysing the Indian government cyberspace

I recently did some work on analysing the Indian government cyberspace, and thought I should document it somewhere outside of my Twitter1.

List of GoI websites

I’d made a list of Indian government websites in Jan 2019:

The dataset was from 2 sources:

  1. GoI Directory
  2. crt.sh (All certificates ending in .gov.in were used)

I re-ran the scripts to get an updated list (12842 domains), then tabulated each domain against its public-suffix2. There is a long tail, and I’ve published results here. Here are the top public suffixes for Indian government sites:

Public Suffix    Domains
nic.in           2454
gov.in           7259
in               528
ac.in            490
com              568
co.in            171
org.in           168
edu.in           117
org              844
res.in           134
net.in           12
net              38
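
A minimal sketch of how such a tabulation could be done, assuming a longest-suffix match against a known suffix set. The `SUFFIXES` set below is a tiny illustrative subset rather than the real Public Suffix List, `public_suffix` and `tabulate` are hypothetical helper names, and the wildcard and exception rules of the real list are ignored:

```python
from collections import Counter

# Illustrative subset only (assumption); the real Public Suffix List is far longer.
SUFFIXES = {"in", "gov.in", "nic.in", "ac.in", "co.in", "org.in", "com", "org", "net"}


def public_suffix(domain, suffixes=SUFFIXES):
    """Return the longest known suffix of `domain`, or None if nothing matches."""
    labels = domain.lower().rstrip(".").split(".")
    # Try the longest candidate first so that "gov.in" wins over "in".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return candidate
    return None


def tabulate(domains):
    """Count domains per public suffix, most common first."""
    return Counter(public_suffix(d) for d in domains).most_common()
```

For a real run, a maintained Public Suffix List parser would be a safer choice than a hand-written set like this one.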

Sanskari Proxy

This was a long standing idea on my ideas repo:

A lot of Indian Government websites are inaccessible on the public internet, because they are geo-fenced to within Indian boundaries. The idea is to make an Indian proxy service that works only for the geo-fenced Indian government websites.

For example, if uidai.gov.in is inaccessible, hitting uidai.gov.sanskariproxy.in will get you the same result, proxied via our servers.

Since I’d made an updated list of GoI websites, this seemed easy enough. I realized that setting up uidai.gov.sanskariproxy.in would likely count as impersonation under Indian law, so I did the next best thing: run an actual proxy. Here’s the announcement tweet:

Project page is https://github.com/captn3m0/sanskari-proxy, and if you’d like to get access - please reach out.
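
To illustrate the "works only for government websites" part of the idea, here is a rough sketch of an allowlist check a proxy could apply before forwarding a request. This is an assumption about one possible design, not a description of how sanskari-proxy is implemented; `ALLOWLIST` and `is_allowed` are hypothetical names:

```python
# Hypothetical allowlist; in practice this would be the full GoI domain list.
ALLOWLIST = {"uidai.gov.in"}


def is_allowed(host, allowlist=ALLOWLIST):
    """True if `host` is an allowlisted domain or a subdomain of one."""
    host = host.lower().rstrip(".")
    # Match on a label boundary so "notuidai.gov.in" does not pass as "uidai.gov.in".
    return any(host == d or host.endswith("." + d) for d in allowlist)
```

A proxy using a check like this would refuse any request whose target host fails `is_allowed`.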

Cyberspace Ownership

I’d planned to get a complete list of geoblocked websites next. While I’m progressing on this front, the results have been inconsistent/inaccurate so far. As an intermediate step, I’d made a list of IPs against every domain3, which looked like this:

Domain IP Address
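
A table like this can be built with a plain DNS lookup per domain. The sketch below assumes IPv4 only and uses the standard library resolver; `resolve_ips` and `build_table` are hypothetical helper names, and the `resolver` parameter exists only so the lookup can be swapped out:

```python
import socket


def resolve_ips(domain, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses for `domain`, or [] if it does not resolve."""
    try:
        _name, _aliases, addresses = resolver(domain)
    except OSError:  # socket.gaierror and socket.herror are subclasses of OSError
        return []
    return sorted(set(addresses))


def build_table(domains, resolver=socket.gethostbyname_ex):
    """Map each domain to the list of IPv4 addresses it resolved to."""
    return {d: resolve_ips(d, resolver) for d in domains}
```

This keeps every address returned for a domain instead of only the first one, so domains that resolve to multiple IPs are not silently truncated.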

While running numerous nmap scans (and failing), I started checking the ASN4 for some of these IPs to see who was hosting each website, especially the ones I found to be blocked.

I stumbled upon a bulk IP-to-ASN service by Team Cymru, ran all the IPs against it, and published the results. Here’s the important graph:
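
For reference, a sketch of what a bulk lookup could look like, assuming the service's whois bulk interface (whois.cymru.com on port 43, with the IPs wrapped in `begin`/`end` and `verbose` output). Which interface was actually used for the published results is not stated here, and the function names are hypothetical:

```python
import socket


def build_query(ips):
    """Build a bulk-mode query: one IP per line between 'begin' and 'end'."""
    lines = ["begin", "verbose", *ips, "end"]
    return ("\n".join(lines) + "\n").encode("ascii")


def parse_response(text):
    """Parse verbose rows: AS | IP | BGP Prefix | CC | Registry | Allocated | AS Name."""
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|", 6)]
        if len(fields) != 7:
            continue  # skips the banner line and anything malformed
        asn, ip, prefix, cc, _registry, _allocated, as_name = fields
        rows.append({"asn": asn, "ip": ip, "prefix": prefix, "cc": cc, "as_name": as_name})
    return rows


def lookup(ips, host="whois.cymru.com", port=43, timeout=30):
    """Send one bulk query over TCP and return the parsed rows."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_query(ips))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return parse_response(b"".join(chunks).decode("utf-8", errors="replace"))
```

Sending all IPs in one bulk query is far gentler on the service than opening one connection per address.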

% of Indian Government domains hosted by each ASN. The image is a pie-chart representation showing share of each ASN. 45% of the chart is taken up by

As you can expect, NIC5 has the highest share, with NKN6, BSNL, and CtrlS following at roughly 5% each. There are a few other charts on the Twitter thread, and the raw data is available here with interactive versions of each visualization.
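
The per-ASN share behind a chart like this boils down to a count and a division. A small sketch, assuming a mapping from each domain to a single ASN label (the `asn_share` name and the input shape are assumptions, not the format of the published data):

```python
from collections import Counter


def asn_share(domain_to_asn):
    """Return (asn, percent_of_domains) pairs, largest share first."""
    counts = Counter(domain_to_asn.values())
    total = sum(counts.values())
    # most_common() is empty for empty input, so there is no division by zero.
    return [(asn, round(100 * n / total, 1)) for asn, n in counts.most_common()]
```

Because each domain is counted once, a domain that resolves to IPs in several ASNs would need a tie-breaking rule before this step.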

What next?

I’m working on running and comparing connectivity scans to these IPs to get a better understanding of the geoblocking situation. There are also some issues with the domain list, as it seems to be missing lots of domains, so more corrections are needed.

  1. Twitter decided to suspend 12 different accounts I had access to recently - I’m starting to get wary of using Twitter for archival now. 

  2. A “public suffix” is one under which Internet users can (or historically could) directly register names. For example, nic.in or github.io. Mozilla manages the list at https://publicsuffix.org/

  3. There are issues with this approach, since a domain can resolve to multiple IPs. But this is okay for the rudimentary analysis I’ve been doing so far.

  4. Autonomous Systems (AS) are how the internet is sliced up and managed by different entities. Each AS (usually an ISP) is responsible for routing within its network, while announcing network routes on how it can be reached.

  5. The primary government office (under MeitY) that provides infrastructure and support for government IT services. 

  6. National Knowledge Network is a multi-gigabit research and education network that provides a high speed network backbone for educational institutions in India. 

Published on September 16, 2020 in goi,dataviz