How To Check If Staff Emails Are In Data Breaches

Our Old Friend, the Password

The humble password is still the most common method of authenticating yourself to gain access to a computer or online account. Other systems exist and will continue to appear and evolve but right now, the password is ubiquitous.

The password is a child of the sixties. During the development of the Compatible Time-Sharing System (CTSS), computer scientists realized the files belonging to each user needed to be isolated and protected. A user should be able to see and amend their own files, but they shouldn’t be allowed to see files belonging to someone else.

The solution meant users had to be identified. They needed a user name. And to prove the user was who they said they were, the password was invented. The credit for the invention of the password goes to Fernando J. Corbató.

The trouble with passwords is that anyone who knows your password can access your account. It’s like giving them a spare key to your house. Two-factor authentication (2FA) improves this situation. It combines something you know (your password) with something you own (typically your smartphone). When you enter your password into a system with 2FA, a code is sent to your smartphone. You need to enter that code into the computer, too. But 2FA doesn’t replace the password. It augments the security model of the standard password.

Biometrics are being introduced in some systems, too. They add a unique biological identifier, something you are, such as a fingerprint or facial recognition, into the mix. This pushes beyond two-factor authentication and into multifactor authentication. These newer technologies will take many more decades to filter through to the majority of computer systems and online services, and will probably never arrive in some systems. The password is going to be with us for a long time.

Data Breaches

Data breaches are happening incessantly. The data from these breaches eventually arrives on the dark web where it is sold to other cybercriminals. It can be used in scam emails, phishing emails, different types of fraud and identity theft, and to access other systems. Credential stuffing attacks use automated software to try to log in to systems. These databases of emails and passwords provide the ammunition for those attacks.

People have a bad habit of reusing passwords. Instead of having a unique robust password per system, they often reuse a single password again and again on multiple systems.

It only takes one of those sites to be compromised for all of the other sites to be at risk. The threat actors don’t just know your password for the breached site, which you will change as soon as you hear there’s been a breach. They can use the same email and password to access your other accounts.

10 Billion Breached Accounts

The Have I Been Pwned website collects the data sets from all the data breaches it can. You can search all of that combined data and see whether your email address has been exposed in a breach. If it has, Have I Been Pwned tells you which site or service the data came from. You can then go to that site and change your password or close your account. And if you’ve used the same password on any other sites, you need to go and change it on those sites, too.

There are currently over 10 billion data records in the Have I Been Pwned database. What are the chances one or more of your email addresses are in there? Perhaps a better question would be what are the odds that your email address isn’t in there?

Searching for an Email Address

Checking is easy. Go to the Have I Been Pwned website, enter your email address into the “Email address” field, and click the “Pwned?” button.

I entered an old email address and found it had been included in six data breaches.

LinkedIn: LinkedIn had a breach in 2016, when 164 million email addresses and passwords were exposed. All of my passwords are unique, so I just had to change one password.

Verifications.io: Verifications.io is, or was, an email address verification service. People entered email addresses to find out if they were valid live email addresses. I’d never used the service, so evidently someone else entered my email address to have it verified. Of course, there was no password involved, so I had no security steps to take, apart from being on the lookout for spam and phishing emails.

Data Enrichment Exposure From PDL: People Data Labs (PDL) makes money collecting and selling data. I requested a copy of my data from PDL and, from the look of it, I’d guess they get it by scraping and cross-referencing LinkedIn, Twitter, business websites, and other sources. Again, there were no passwords involved, so I had no security steps to take. But I did opt out of their “service” so they can’t sell my data anymore.

Onliner Spambot: A spambot called Onliner Spambot had my email address in it, probably lifted from one of the other breaches. But then the Onliner Spambot itself was breached, leaking 711 million personal records, including some passwords.

Collection #1 and Anti Public Combo List: The last two were massive collections of previously breached data, wrapped up into mega-bundles for the convenience of the cybercriminals. So my personal data was in those breaches, but I’d already reacted to and dealt with the original breaches.

The important points to note are:

Your data may be contained in breaches for sites you’ve never even visited.

Even when the data breaches don’t contain passwords, your personal data can still be used for criminal purposes, such as spam emails, scam emails, phishing emails, identity theft, and fraud.

Domain Searches

As illuminating and useful as this is, entering the email addresses for all your staff will be time-consuming. Have I Been Pwned’s answer to this is the domain search function. You can register your domain and obtain a report covering any and all email addresses on that domain that have been found in breaches.

And if any email addresses on your domain appear in future breaches, you’ll be notified. That’s pretty cool.

You have to prove ownership of the domain, of course. There are different ways to achieve this. You can:

Verify by email to security@, hostmaster@, postmaster@, or webmaster@ on your domain.

Add a meta tag containing a unique ID to the home page of your website.

Upload a file to the root of your website, containing a unique ID.

Create a TXT record on the domain, containing a unique ID.

This is a great free service and well worth the few moments it takes to register.

Searching for Unrelated Emails

But what if you have a rag-tag collection of emails to check, scattered across different domains? You might have email addresses for gmail.com, and other domains that you’re obviously not going to be able to prove ownership of.

Here’s a Linux shell script that takes a text file as a command-line parameter. The text file should contain email addresses, one per line. The script performs a Have I Been Pwned email search for each email address in the text file.

The script makes use of an authenticated API. You’re going to need an API key. To get a key, you need to register and pay for the service. Troy Hunt has written a thorough blog post on the topic of charging for the use of the API. He explains with complete candor why he was forced to charge as a way to combat API abuse. The cost is USD 3.50 per month, which is less than a coffee from a high street outlet. You can pay for one month, or you can subscribe for a year.

Here’s the entire script.
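A minimal sketch of such a script, assembled from the description in this article, might look like the following. The function name pwnchk, the User-Agent value, and the exact layout of the jq filter are assumptions made for illustration. The endpoint and the hibp-api-key header follow the Have I Been Pwned v3 API, but check the current API documentation before relying on them.

```shell
#!/bin/bash
# Sketch of the email checker described in this article.
# Assumptions: the function name pwnchk, the User-Agent value, and the
# layout of the jq filter are illustrative choices.

API_KEY="your-API-key-goes-here"   # replace with your own API key
USER_AGENT="your-company-name"     # any descriptive string
API_URL="https://haveibeenpwned.com/api/v3/breachedaccount"

pwnchk() {
  # Exactly one parameter is expected: the file of email addresses.
  if [[ $# -ne 1 ]]; then
    echo "Usage: $0 file-containing-email-addresses"
    return 1
  fi

  local email
  # Word splitting on the output of cat yields one address per iteration.
  for email in $(cat "$1"); do
    echo "$email"

    # -s silences progress output, -A sets the User-Agent,
    # -H adds the header that carries the API key.
    curl -s -A "$USER_AGENT" \
      -H "hibp-api-key: $API_KEY" \
      "$API_URL/$email?truncateResponse=false" \
      | jq -j '.[] | "  ", .Title, " [", .Name, "] ", .BreachDate, "\n"'

    echo "---"
    sleep 1.6   # pause between requests to stay under the rate limit
  done
}

# When saved as a standalone script such as pwnchk.sh, end the file with:
# pwnchk "$@"
```

The logic is wrapped in a function so it can be sourced without side effects. To run it as a standalone script, uncomment the final line so the command-line parameters are passed through to pwnchk.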

Before we explain how the script works, you might have noticed it makes use of curl and jq. If you don’t have these installed on your computer, you’ll need to add them.

On Ubuntu, the command is sudo apt-get install curl jq.

On Fedora, you need to type sudo dnf install curl jq.

On Manjaro, you’ll use pacman: sudo pacman -Syu curl jq.

How the Script Works

The variable $# holds the number of command-line parameters that were passed to the script. If this does not equal one, the usage message is displayed and the script exits. The variable $0 holds the name of the script.
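The parameter check can be sketched in isolation like this. The helper name check_args is hypothetical, and the test that the file is readable is an extra safeguard that the description above does not mention.

```shell
#!/bin/bash
# check_args is a hypothetical helper name used for illustration.
check_args() {
  # $# is the number of parameters passed; $0 is the name of the script.
  if [[ $# -ne 1 ]]; then
    echo "Usage: $0 file-containing-email-addresses"
    return 1
  fi

  # Extra check, not part of the description above: the file must be readable.
  if [[ ! -r "$1" ]]; then
    echo "Cannot read file: $1" >&2
    return 1
  fi
}
```

Calling check_args "$@" at the top of a script and exiting when it fails gives the same behavior as an inline test of $#.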

The script reads the email addresses from the text file using cat, and sets $email to hold the email address currently being processed.
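Looping over the output of cat relies on the shell’s word splitting. As an alternative, and not the approach described above, a while read loop gives a little more control over blank lines and files saved with Windows line endings. The function name read_emails is an assumption for this sketch.

```shell
#!/bin/bash
# read_emails is a hypothetical helper: it prints one address per line,
# skipping blank lines and stripping a trailing carriage return.
read_emails() {
  local line
  # The "|| [[ -n ... ]]" part keeps a final line that lacks a newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"
    if [[ -z "$line" ]]; then
      continue
    fi
    printf '%s\n' "$line"
  done < "$1"
}
```

The output of read_emails can be consumed in the same way as the output of cat, for example with for email in $(read_emails email-list.txt).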

The curl command is used to access the API and to retrieve the result. The options we’re using with it are:

-s: Silent.

-A: User-Agent string. Not all HTTP APIs need to receive one, but it’s good practice to include one. You can put your company name in here.

-H: Extra HTTP header. We’re using an additional HTTP header to pass in the API key. Replace your-API-key-goes-here with your actual API key.

The curl command sends the request to the Have I Been Pwned breached account API URL. The response is piped into jq.

jq extracts the title ( .Title ) of the breach, the internal identifier ( .Name ) for the breach, and the date of the breach ( .BreachDate ) from the unnamed array ( .[] ) holding the JSON information.

A couple of spaces are displayed before the breach title to indent the output. This makes it easier to differentiate between email addresses and breach names. Brackets have been placed on either side of the .Name data item to help with visual parsing. These are simple cosmetics and can be changed or removed, to suit your needs.
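To see the formatting without calling the API, a jq filter of this kind can be run against a small hand-written sample. The breach record below is invented for illustration and is not real data. The function name format_breaches is also an assumption.

```shell
#!/bin/bash
# format_breaches is a hypothetical helper that reads a JSON array of
# breach objects on standard input and prints one line per breach.
format_breaches() {
  # -j joins the outputs with no separators, so the spaces, brackets,
  # and newline come only from the string literals in the filter.
  jq -j '.[] | "  ", .Title, " [", .Name, "] ", .BreachDate, "\n"'
}

# Made-up sample record with the three fields the filter uses.
sample_json='[{"Name":"ExampleBreach","Title":"Example Breach","BreachDate":"2020-01-01"}]'
```

Piping the sample through the function, with printf '%s' "$sample_json" | format_breaches, prints the title indented by two spaces, then the name in brackets, then the date.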

Three dashes are displayed to separate the data for each email address, and a pause of 1.6 seconds is added between checks. This is required to avoid bombarding the API too frequently and getting temporarily blocked.
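If a check returns nothing, the HTTP status code can help tell “no breaches” apart from “blocked.” The following sketch is an optional extension, not part of the script described here. The helper names describe_status and fetch_status are hypothetical, and the status meanings are taken from the Have I Been Pwned API documentation as we understand it.

```shell
#!/bin/bash
# describe_status and fetch_status are hypothetical helper names.

# Translate an HTTP status code from the breached account API into a message.
describe_status() {
  case "$1" in
    200) echo "found in one or more breaches" ;;
    404) echo "not found in any breach" ;;
    401) echo "API key missing or invalid" ;;
    429) echo "rate limit exceeded, slow down" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Fetch one account, save the JSON body to a file, and print only the status code.
# Arguments: email address, API key, output file for the body.
fetch_status() {
  curl -s -A "your-company-name" \
    -H "hibp-api-key: $2" \
    -o "$3" -w '%{http_code}' \
    "https://haveibeenpwned.com/api/v3/breachedaccount/$1?truncateResponse=false"
}
```

A call such as describe_status "$(fetch_status someone@example.com your-API-key-goes-here body.json)" would then report whether the pause between checks needs to be longer.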

There are 15 data items that you could choose to have displayed. The full list is shown on the API pages of the website.

Running the Script

Copy the whole script into an editor, replace your-API-key-goes-here with your API key, then save it as “pwnchk.sh.” To make it executable, run this command: chmod +x pwnchk.sh

We have a text file called “email-list.txt.” It contains these email addresses:

president@whitehouse.gov
vice.president@whitehouse.gov
privateoffice@no10.x.gsi.gov.uk

That’s the president and vice president of the United States, and the private office of the prime minister of the United Kingdom. They’re all publicly available email addresses, so we’re not breaking any privacy or security protocols using them here. For convenience, we’re piping the output into less, with a command such as ./pwnchk.sh email-list.txt | less. You could just as easily redirect the output to a file.

The first line mentions “2,844 Separate Data Breaches.”

That’s the name of a collection of breached data made up of 2,844 smaller breaches. It doesn’t mean that email address has been in that many breaches.

Scroll through the output, and you’ll see that those email addresses have been found in multiple breaches dating all the way back to a Myspace breach of 2008.

A Final Word on Passwords

You can also search for passwords on Have I Been Pwned. If a match is found, it doesn’t necessarily mean the password in the data breach is yours. What it probably means is your password is not unique.

The weaker your password is, the less likely it is to be unique. For example, the favorite password of the lazy user, 123456, had 23.5 million matches. That’s why searching by email is the better option.

Always use robust unique passwords. Use a password manager if you have too many passwords to remember. Where 2FA is offered, use it.

The script we’ve presented will help you to check a disparate list of email addresses. It’ll save you a bunch of time, especially if it is something you’re going to run periodically.
