Analysing the Indian government cyberspace
16 Sep 2020
I recently did some work on analysing the Indian government cyberspace, and thought I should document it somewhere outside of my Twitter¹.
List of GoI websites
I’d made a list of Indian government websites in Jan 2019:
I ran @18F /pulse on Indian Government websites to see how many of them support HTTPS. A quick summary:
Total Websites: 14183
Total Live Websites: 11710 (82%)
Websites with Valid HTTPS: 4753 (40% of all live websites)
Raw Dataset for now: docs.google.com/spreadsheets
Nemo (@captn3m0), January 15, 2019
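For readers who want to reproduce the "valid HTTPS" column without the full pulse tooling, here is a minimal sketch using only the Python standard library. It is not the scanner used for the numbers above; `has_valid_https` and `summarise` are hypothetical helper names, and "valid" is assumed to mean "the TLS handshake on port 443 succeeds and the certificate verifies against the system trust store".

```python
import socket
import ssl


def has_valid_https(hostname: str, timeout: float = 5.0) -> bool:
    """Return True if port 443 completes a TLS handshake with a certificate
    that verifies (chain and hostname) against the system trust store."""
    context = ssl.create_default_context()  # verification is on by default
    try:
        with socket.create_connection((hostname, 443), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except (OSError, ValueError):
        # OSError covers DNS failures, timeouts, refused connections and
        # ssl.SSLError (including certificate verification errors).
        return False


def summarise(live_sites: dict) -> str:
    """Summarise a {hostname: has_valid_https} mapping for live sites."""
    valid = sum(1 for ok in live_sites.values() if ok)
    return f"{valid}/{len(live_sites)} live sites with valid HTTPS"
```

Calling `has_valid_https("uidai.gov.in")` from different networks may give different answers, which is part of the geoblocking problem discussed later in this post.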
The dataset was from 2 sources:
I re-ran the scripts to get an updated list (12842 domains), then tabulated them against the public-suffix² for each. There is a long tail, and I’ve published the results here. Here are the top public suffixes for Indian government sites:
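The tabulation step can be sketched roughly as below. This is an illustration, not the script that produced the published results: `KNOWN_SUFFIXES` is a small hand-picked subset standing in for the real Public Suffix List, and `public_suffix` and `tabulate` are hypothetical names.

```python
from collections import Counter

# Assumption: a tiny hand-picked subset of suffixes. The real Public Suffix
# List is far longer and should be used for any serious analysis.
KNOWN_SUFFIXES = {"in", "gov.in", "nic.in", "ac.in", "res.in", "org.in", "co.in"}


def public_suffix(domain, suffixes=KNOWN_SUFFIXES):
    """Return the longest known suffix of `domain`, else its last label."""
    labels = domain.lower().rstrip(".").split(".")
    # Walk from the longest candidate to the shortest so that
    # "gov.in" wins over "in" for "uidai.gov.in".
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in suffixes:
            return candidate
    return labels[-1]


def tabulate(domains):
    """Count how many domains fall under each public suffix."""
    return Counter(public_suffix(d) for d in domains)
```

`tabulate(domains).most_common(10)` would then give the head of the distribution, with the long tail following it.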
This was a long-standing idea in my ideas repo:
A lot of Indian Government websites are inaccessible on the public internet, because they are geo-fenced to within Indian boundaries. The idea is to make an Indian proxy service that works only for the geo-fenced Indian government websites.
For example, if uidai.gov.in is inaccessible, hitting uidai.gov.sanskariproxy.in will get you the same result, proxied via our servers.
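The hostname scheme in that idea amounts to swapping the trailing `.in` for the proxy zone and back again. A minimal sketch of that mapping, purely to illustrate the naming convention: the function names are hypothetical, and it assumes every origin hostname ends in `.in`.

```python
PROXY_ZONE = "sanskariproxy.in"  # zone taken from the example above
ORIGIN_TLD = "in"                # assumption: all origin hostnames end in .in


def to_proxy_host(origin):
    """Map an origin hostname to its proxied name under PROXY_ZONE."""
    origin = origin.lower().rstrip(".")
    tail = "." + ORIGIN_TLD
    if not origin.endswith(tail):
        raise ValueError(f"not a .{ORIGIN_TLD} hostname: {origin}")
    return origin[: -len(tail)] + "." + PROXY_ZONE


def to_origin_host(proxy_host):
    """Recover the origin hostname from a proxied name."""
    proxy_host = proxy_host.lower().rstrip(".")
    tail = "." + PROXY_ZONE
    if not proxy_host.endswith(tail):
        raise ValueError(f"not under {PROXY_ZONE}: {proxy_host}")
    return proxy_host[: -len(tail)] + "." + ORIGIN_TLD
```

A real deployment would also need an allow-list of government domains, since the idea is to proxy only those sites.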
Since I’d made an updated list of GoI websites, this seemed easy enough. I realized that setting up uidai.gov.sanskariproxy.in would likely count as impersonation under Indian law, so I did the next best thing: run an actual proxy. Here’s the announcement tweet:
Are you a security researcher outside India? Do you hate getting geoblocked to Indian government websites?
Well, I made a proxy for security researchers outside India to access Indian government websites without resorting to shady VPNs.
Nemo (@captn3m0), September 5, 2020
Project page is https://github.com/captn3m0/sanskari-proxy, and if you’d like to get access - please reach out.
I’d planned to get a complete list of geoblocked websites next. While I’m progressing on this front, the results have been inconsistent/inaccurate so far. As an intermediate step, I’d made a list of IPs against every domain³, which looked like this:
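A domain-to-IP table like this can be built with the standard library resolver. The sketch below is not the script used for the published list. It assumes IPv4 only and keeps a single address per domain (the lexicographically first, purely to be deterministic), which is the simplification the footnote warns about. `resolve_ipv4` and `first_ip_per_domain` are hypothetical names.

```python
import socket


def resolve_ipv4(domain):
    """Return the sorted, de-duplicated IPv4 addresses for `domain`."""
    try:
        infos = socket.getaddrinfo(
            domain, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []  # NXDOMAIN, no A record, or resolver failure
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def first_ip_per_domain(domains, resolver=resolve_ipv4):
    """Build a {domain: ip-or-None} table, keeping one address per domain."""
    table = {}
    for domain in domains:
        addresses = resolver(domain)
        table[domain] = addresses[0] if addresses else None
    return table
```

The `resolver` parameter exists so the table-building logic can be exercised with canned answers instead of live DNS.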
I stumbled upon a bulk IP to ASN service by Cymru, ran all the IPs against it and published the results. Here’s the important graph:
As you might expect, NIC⁵ has the highest share, with NKN⁶, BSNL, and CtrlS following at roughly 5% each. There are a few other charts in the Twitter thread, and the raw data is available here with interactive versions of each visualization.
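For anyone wanting to repeat the ASN lookup, here is a rough sketch of a bulk query and the share calculation. The wire format (a `begin` / `verbose` / one IP per line / `end` block sent to `whois.cymru.com` on port 43, answered with pipe-separated rows) is my understanding of Team Cymru's bulk whois interface and should be checked against their documentation. All function names are hypothetical, and this is not the code behind the published charts.

```python
import socket
from collections import Counter

CYMRU_HOST = "whois.cymru.com"  # assumption: Team Cymru's whois endpoint
CYMRU_PORT = 43


def build_bulk_query(ips):
    """Wrap the IPs in the begin/verbose/end envelope, one per line."""
    return "\n".join(["begin", "verbose", *ips, "end"]) + "\n"


def query_cymru(ips, timeout=30.0):
    """Send one bulk query and return the raw text response."""
    with socket.create_connection((CYMRU_HOST, CYMRU_PORT), timeout=timeout) as sock:
        sock.sendall(build_bulk_query(ips).encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closes the connection when it is done
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_bulk_response(text):
    """Map each IP to its AS name. Assumed verbose row layout:
    AS | IP | BGP Prefix | CC | Registry | Allocated | AS Name"""
    as_names = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|", 6)]
        # Skip the banner line and rows with no numeric AS (e.g. "NA").
        if len(fields) < 7 or not fields[0].replace(" ", "").isdigit():
            continue
        as_names[fields[1]] = fields[6]
    return as_names


def as_share(as_names):
    """Return {AS name: percentage of IPs}, largest share first."""
    counts = Counter(as_names.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.most_common()}
```

`as_share(parse_bulk_response(query_cymru(ips)))` would then give the per-AS percentages behind a chart like the one above.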
I’m working on running and comparing connectivity scans to these IPs to get a better understanding of the geoblocking situation. There are also some issues with the domain list, as it seems to be missing lots of domains, so more corrections are needed.
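One way such a comparison could work is sketched below. It assumes a plain TCP connect on port 443 as the probe and treats "reachable from inside India, unreachable from outside" as a hint of geoblocking. Both are assumptions for illustration rather than the method actually being used, and a single failed connection is weak evidence on its own. The function names are hypothetical.

```python
import socket


def tcp_reachable(ip, port=443, timeout=5.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:  # timeout, refused, unreachable network, etc.
        return False


def likely_geoblocked(inside, outside):
    """Given {ip: reachable} results from a vantage point inside India and
    one outside, list IPs that answered inside but failed outside."""
    return sorted(
        ip
        for ip, reachable in inside.items()
        if reachable and outside.get(ip) is False
    )
```

Repeating the scans and only trusting IPs that show the same pattern each time would help with the inconsistent results mentioned earlier.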
Twitter decided to suspend 12 different accounts I had access to recently - I’m starting to get wary of using Twitter for archival now. ↩
There are issues with this approach, since domains do resolve to multiple IPs. But this is okay for the rudimentary analysis I’ve been doing so far. ↩
Autonomous Systems (AS) are how the internet is sliced up and managed by different entities. Each AS (usually an ISP) is responsible for routing within its own network, and announces routes that tell other networks how it can be reached. ↩
The primary government office (under MeitY) that provides infrastructure and support for government IT services. ↩
National Knowledge Network is a multi-gigabit research and education network that provides a high speed network backbone for educational institutions in India. ↩
Published on September 16, 2020 in goi,dataviz