Internet Privacy
We built Dombox to address privacy and spam issues. The average internet user doesn't fully understand the importance of privacy. Allow us to explain, with an example, why the current internet lacks privacy.
Hash
Can you identify a piece of text you typed, a photo you took, a video you captured, or any other digital file without looking at its contents?
With the help of a "hash", you can.
A hash is a string that, for practical purposes, uniquely identifies a given file or string.
A hash is a one-way ticket. That means you can create the hash as long as you have the original message, but you cannot recreate the original message from the hash.
A hash is not a secret value. Anyone who has the original message can create it. The hash value for "SomeRandomString" is going to be the same for you and for a person who lives on the other side of the world.
Hashes are fixed-length strings. No matter how much data you feed in, you always get a hash of the same length.
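These properties can be checked with a few lines of Python. This is a minimal sketch that uses MD5 from the standard `hashlib` module, since MD5 is the algorithm behind the Gravatar URLs discussed later; the helper name `md5_hex` is ours and purely illustrative.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of `text` as 32 uppercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


# Same input, same hash, no matter who computes it or where.
first = md5_hex("SomeRandomString")
second = md5_hex("SomeRandomString")
print(first == second)  # True

# The output length is fixed, however much data goes in.
print(len(md5_hex("a")), len(md5_hex("a" * 1_000_000)))  # 32 32

# Changing a single character of the input gives a different hash.
print(md5_hex("SomeRandomString") == md5_hex("SomeRandomStrinG"))  # False
```

Nothing in the sketch is secret: anyone who has the input and knows the algorithm gets the same 32 characters.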
Hashes are about two rules.
Rule 1: Each Hash must be unique
Rule 2: Ditto Rule 1
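When a hashing algorithm produces the same hash for two different inputs, it is called a collision. Because inputs can be of any size while the hash has a fixed length, collisions must exist in theory; a good algorithm makes them practically impossible to find. Algorithms for which collisions can be found are considered broken and get phased out for security reasons, e.g. MD5 and SHA-1 are both vulnerable to collision attacks.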
Use Case 1: Passwords => Your passwords are hashed before they are stored on servers.
Use Case 2: Storage => Identifying duplicate content to save storage space (e.g. file and video hosting websites).
Use Case 3: Anti-Virus => Scanning for malicious files. If the hash of a file on your computer matches a malicious file hash in your anti-virus software's hash list, the file is flagged as malware.
Use Case 4: Integrity => Checking a file's integrity after downloading it from the internet.
Use Case 5: Digital Signatures => Verifying the authenticity of digital messages or documents.
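As an illustration of Use Case 4, here is a minimal sketch of an integrity check. It assumes the publisher lists a SHA-256 hash next to the download; the choice of SHA-256 and the function names `sha256_of` and `is_intact` are our assumptions, not something a particular site prescribes.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def is_intact(downloaded: bytes, published_hash: str) -> bool:
    """Compare the hash of the downloaded bytes with the published one."""
    return sha256_of(downloaded) == published_hash.strip().lower()


# Stand-in for a real download: the "publisher" hashes the original bytes.
original = b"example file contents"
published = sha256_of(original)

print(is_intact(original, published))         # True
print(is_intact(original + b"!", published))  # False
```

The same comparison underlies Use Cases 2 and 3: equal hashes are treated as "same content" without anyone looking at the content itself.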
Gravatar
Have you ever heard of a site called Gravatar? It is one of the most popular avatar services on the internet. Gravatar stands for "Globally Recognized Avatar".
Before Gravatar, you had to upload your avatar manually on every website you signed up for. With Gravatar, it's "one" avatar everywhere.
According to their stats, they serve avatars over 8.6 billion times a day.
WordPress is popular open-source software. More than 60 million of the websites you see on the internet are powered by it. It comes with Gravatar support by default, so more than 60 million websites support Gravatar today.
Even major professional websites such as Stack Overflow and GitHub depend on the Gravatar service for avatars.
This is how Gravatar works. You go to gravatar.com, sign up with your email address, and upload an avatar. This avatar is now linked to your email address.
Gravatar uses the email hash to build the avatar URL. [A hash is a string that identifies the data.]
This is what your avatar image URL looks like: https://secure.gravatar.com/avatar/{MD5 email hash goes here}
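The URL construction described above can be sketched in a few lines. The base URL is the one quoted in this article; trimming whitespace and lowercasing the address before hashing follows Gravatar's documented convention. The function name `gravatar_url` is ours, and the address is a made-up example.

```python
import hashlib

GRAVATAR_BASE = "https://secure.gravatar.com/avatar/"


def gravatar_url(email: str) -> str:
    """Build the avatar URL from the MD5 hash of the normalized address."""
    normalized = email.strip().lower()
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return GRAVATAR_BASE + digest


# Both spellings normalize to the same address, so the URL is identical.
print(gravatar_url("  Someone@Example.com "))
print(gravatar_url("someone@example.com"))
```

Note that the function needs nothing but the email address: no account, no key, no secret.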
Now if you sign up on any third-party website or post a comment with your email address, your Gravatar will be displayed if the site supports it.
Although Gravatar solved a major issue, it created two more major issues.
(1) Email Brute-forcing (2) Privacy
Note: The average internet user may not notice these things, so we will try to explain them as clearly as we can.
Entropy
In a nutshell, entropy is the "degree of unpredictability".
Do you know what the most common password on the internet is?
It's 123456.
So that is the first password a hacker would try. The entropy of that password is effectively zero, because the hacker cracks it on the first attempt.
To increase the entropy, you need to pick a very strong password.
If we give you a "Hash" of an email address and ask you to find the real email address, you would be completely lost. Right?
e.g. 503A8F0B2D11DA49A27150C868A5EEB5 => ?????????@????????
Because there are a gazillion possibilities, the entropy is very high. Its value depends on the number of possible email addresses, so you have no idea where to start.
But if we give you the "name" too, your job becomes much easier. A man named "Donald Trump" is definitely not going to have an email address that looks like "barackobama@gmail.com".
Underline the word "definitely". Although you still don't know the real email address, you are now "sure" of something. So the entropy has been reduced.
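To put a rough number on "weakened entropy": if every candidate is assumed to be equally likely (a simplification), entropy in bits is the base-2 logarithm of the number of candidates. The sketch below uses illustrative figures only. The 1,000 guesses per name is the figure used later in this article; the count for the "nothing known" case is a hypothetical placeholder, not a measurement.

```python
import math


def entropy_bits(equally_likely_candidates: int) -> float:
    """Entropy in bits when every candidate is equally likely."""
    return math.log2(equally_likely_candidates)


# Illustrative figures only.
NOTHING_KNOWN = 10**15   # hypothetical number of plausible addresses
NAME_KNOWN = 1_000       # guesses derived from a displayed name

print(f"hash only:   {entropy_bits(NOTHING_KNOWN):.1f} bits")
print(f"hash + name: {entropy_bits(NAME_KNOWN):.1f} bits")
```

Every bit removed halves the number of guesses needed, which is why a name next to an avatar matters so much.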
Let us give you the "Name" and "Email Hash".
Name | Email Hash |
---|---|
Jeff Bezos | 503A8F0B2D11DA49A27150C868A5EEB5 |
Let's try the following combinations.
Email Address | Hash |
---|---|
jeff@amazon.com | 27D637B6F491BCBEE2C87F13136B675E |
bezos@amazon.com | 12B79F144DBF4AA7FEADFD71679A2F91 |
jbezos@amazon.com | 503A8F0B2D11DA49A27150C868A5EEB5 |
There. We got a matching email hash on the last attempt.
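The experiment above can be sketched as a small script: generate likely addresses from a name, hash each one, and compare with the known hash. The address patterns and the function names `candidate_emails` and `find_email` are our own illustrative choices, and the person, address, and domains below are made up, so the target hash is computed inside the script.

```python
import hashlib
from typing import List, Optional


def md5_hex(email: str) -> str:
    """MD5 of the trimmed, lowercased address, as lowercase hex."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidate_emails(full_name: str, domains: List[str]) -> List[str]:
    """Build a few common address patterns from a displayed name."""
    parts = full_name.lower().split()
    first, last = parts[0], parts[-1]
    local_parts = [first, last, first[0] + last, first + last, first + "." + last]
    return [f"{local}@{domain}" for domain in domains for local in local_parts]


def find_email(full_name: str, email_hash: str, domains: List[str]) -> Optional[str]:
    """Return the first candidate whose hash equals `email_hash`, if any."""
    target = email_hash.lower()
    for email in candidate_emails(full_name, domains):
        if md5_hex(email) == target:
            return email
    return None


# Made-up person and address; the hash is what a crawler would have seen.
target_hash = md5_hex("aexample@example.com").upper()
print(find_email("Alice Example", target_hash, ["example.com", "example.org"]))
# aexample@example.com
```

A real attacker would use far more patterns and domains, but the loop stays the same: no email is ever sent, so no mail server gets a chance to notice.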
So one thing is clear from that experiment:
You can find "valid email addresses" if we give you the "name" and the "email hash".
But if we give you the "date" too, you can easily find "active email addresses", right?
For example, if a user posted a comment within the past 6 months or 1 year, then that user is most likely an active email user.
Known Information | What You Can Find |
---|---|
Email Hash + Name | Valid Email Addresses |
Email Hash + Name + Date | Active Email Addresses |
Brute-forcing
With the brute-force method, spammers have to generate many email addresses and try sending an email to each one. If an email is accepted, the address is valid.
The success rate of this method is very low. Say the success rate is 5%; that means 95 out of 100 emails fail. In such cases, popular mail services like Gmail and Outlook usually block and blacklist the spammer's IP address.
In Gravatar's case, the spammer never has to send those test emails, so the blocking that stops brute-force, dictionary, and combination attacks doesn't apply. All the spammer has to do is generate email hashes based on the name shown next to an avatar and compare them with the hash in the avatar URL. If one matches, they have found a valid email address.
A spammer can find a massive amount of Gravatar URLs by crawling the web.
Efficiency
The Gravatar method is efficient too. Let's measure the efficiency.
Total number of email users in the world: 3.8 billion
Although some users may have multiple accounts, let's go with one email address per user.
So we have 3.8 billion email addresses.
An average consumer computer can generate millions of hashes per second.
A high-end gaming computer with a graphics card can generate billions of hashes per second.
An Application-Specific Integrated Circuit (ASIC) is a chip designed for one specific application. For example, an ASIC designed for Bitcoin mining usually has a huge hash rate.
How much are we talking about?
Let's look at a screenshot for the AntMiner S9.
Can you tell us what "TH" stands for in that screenshot?
Exactly...
Trillion hashes, or terahashes.
In the screenshot, they claim it can generate up to 14 trillion hashes per second. (Those are SHA-256 hashes for Bitcoin, so take the figure as an indication of what dedicated hardware can reach, not as a measured MD5 rate.)
If you try 1,000 name combinations for each email address, you would need only 3.8 trillion hashes for 3.8 billion email addresses.
So you would need roughly a quarter of a second to try all the email addresses in the world.
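The arithmetic behind that estimate, written out as a short script. The 3.8 billion addresses, 1,000 guesses per address, and 14 trillion hashes per second come from this article; reading "millions" and "billions" per second as exactly 1 million and 1 billion is our simplification, and the script assumes the quoted hash rate would carry over to email hashes.

```python
EMAIL_ADDRESSES = 3_800_000_000   # 3.8 billion, one address per user
GUESSES_PER_ADDRESS = 1_000       # name combinations tried per address
TOTAL_HASHES = EMAIL_ADDRESSES * GUESSES_PER_ADDRESS  # 3.8 trillion


def seconds_needed(hashes_per_second: int) -> float:
    """Time to compute every guess at the given hash rate."""
    return TOTAL_HASHES / hashes_per_second


# Assumed rates: the first two are simplified readings of the text.
rates = {
    "consumer computer (1 million/s)": 10**6,
    "gaming computer with GPU (1 billion/s)": 10**9,
    "mining ASIC (14 trillion/s)": 14 * 10**12,
}

for label, rate in rates.items():
    print(f"{label}: {seconds_needed(rate):,.2f} seconds")
```

Under these assumptions the same job takes about 44 days on the consumer computer, about 63 minutes on the gaming computer, and about 0.27 seconds at the ASIC rate.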
That's more efficient than sending emails to services like Gmail to validate email addresses. Wouldn't you agree?
Privacy
Gravatar means "globally recognized avatar", right? If you sign up on any website that supports Gravatar, your avatar URL is going to be the same. This is the real problem.
Let us explain. Let's say you run a website, example.com, and you would like to support Gravatar.
You need no API key or registration with Gravatar. All you have to do is take your user's email address and generate its hash.
Now just load the following URL for the image. That's it.
https://secure.gravatar.com/avatar/{your user's MD5 email hash goes here}
If you can do that, then everyone in the world can do it too, right? That is the problem here.
On the internet, sex sells. There are plenty of people out there who use the same email address for everything, from professional use to signing up for porn websites.
For the sake of argument, imagine you are a woman who is active on such websites, and one of your colleagues is stalking you. If your colleague runs a deep web scan, it would reveal all your activity on every site that supports Gravatar.
As far as we know, the Gravatar TOS doesn't exclude such websites.
Even if a site doesn't support Gravatar today, there is no guarantee the site won't support it in the future.
To be quite honest, we are less concerned about porn websites.
There are things that require more privacy, e.g. a person in a repressive country who protests under a pen name can now be traced.
We can give you more examples: people who hide their sexuality in the real world but are open about it on the internet, people who seek discreet medical help on public forums, and so on.
Let us demonstrate the issue using the avatar of one of their team members.
Pay attention. We are going to use only the avatar URL to find the user's activity on the internet.
Google has indexed only certain pages. But if you built a web crawler just for this job, you would get more results.
Even if "Toni" changes his name to "John" when signing up on a website or commenting on an article, the avatar URL stays the same because it's linked to his email address. So he can be traced.
Again, we found those results using only his avatar URL, not his name.
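A crawler built for this job would boil down to grouping what it finds by the hash inside the avatar URL. The sketch below shows only that grouping step, on hand-written records. The site names and the 32-character hashes are placeholders we made up, and `group_by_avatar` is a hypothetical helper, not part of any real tool.

```python
import re
from collections import defaultdict

# Matches the 32 hex characters that follow ".../avatar/" in a Gravatar URL.
AVATAR_HASH = re.compile(r"gravatar\.com/avatar/([0-9a-fA-F]{32})")


def group_by_avatar(comments):
    """Group (site, display_name, avatar_url) records by email hash."""
    profiles = defaultdict(list)
    for site, name, url in comments:
        match = AVATAR_HASH.search(url)
        if match:
            profiles[match.group(1).lower()].append((site, name))
    return dict(profiles)


# Placeholder data: two different names, one email hash.
HASH_A = "a" * 32
HASH_B = "b" * 32
crawled = [
    ("blog.example", "Toni", f"https://secure.gravatar.com/avatar/{HASH_A}?s=48"),
    ("forum.example", "John", f"https://secure.gravatar.com/avatar/{HASH_A}"),
    ("news.example", "Maria", f"https://secure.gravatar.com/avatar/{HASH_B}"),
]

for email_hash, activity in group_by_avatar(crawled).items():
    print(email_hash, activity)
```

The display name plays no part in the grouping, which is why changing it from "Toni" to "John" does not help.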
Government agencies could build a full-fledged scanning tool just for this purpose.
Now we know what you are going to say.
"I have never heard of Gravatar before. So why should I bother?"
Well, we have news for you. The disturbing thing is that it doesn't matter whether you have signed up for Gravatar or not.
Keep in mind that the subject of our discussion is the "Gravatar URL", not "Gravatar users".
If you have ever used your email address on a third party website for commenting or signing up, chances are your privacy is at risk.
This is because third-party websites have no idea whether you have signed up for Gravatar or not. So they build the Gravatar URL for everyone, using the email hash.
If there is an avatar linked to your email address, that avatar is displayed. Otherwise, a default avatar is displayed.
The blog in the next image contains 500+ user comments with avatars.
The comments that have an avatar come from real "Gravatar" users. The comments that have a dummy avatar come from "non-Gravatar" users.
Pay attention to the avatar URLs of the "non-Gravatar" users. The email addresses are still hashed there.
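Here is a minimal sketch of what a comment template might do for every commenter, Gravatar user or not. It assumes the site passes Gravatar's `d` (default image) parameter with the value `identicon`; the function name `comment_avatar_url` and the address are hypothetical.

```python
import hashlib
from urllib.parse import urlencode


def comment_avatar_url(email: str, default: str = "identicon") -> str:
    """Build the avatar URL a site would emit for any commenter."""
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    query = urlencode({"d": default})
    return f"https://secure.gravatar.com/avatar/{digest}?{query}"


# The site cannot tell whether this address has a Gravatar account,
# so the email hash ends up in the page source either way.
print(comment_avatar_url("never-signed-up@example.com"))
```

The fallback image is chosen by Gravatar when the request arrives, after the hash has already been published on the page.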
There may be only a few million Gravatar users, but you can most likely find billions of Gravatar URLs on the internet.
For what it's worth, we are not blaming Gravatar for this, because the problem they set out to solve is completely different. We are just pointing out the flaws in their system.
Note: The Gravatar privacy issue applies only to public pages that can be crawled.