Checklist for website investigations
Do you regularly investigate websites? Then the checklist on this page can help you. The checklist helps you to structure your website investigations so that you don’t forget anything. In the blog below you can learn how to perform the different steps from the checklist to get even more out of your research.
What are OSINT checklists?
Earlier this month we wrote a blog about how to investigate email addresses. In that blog post we published our first OSINT checklist: a checklist you can use to structure your investigations. Our checklists provide insight into what you can do with some basic information in an investigation, such as an email address, telephone number or website. The checklists mainly show you WHAT you can do with this information; HOW to take the steps from the checklists is explained in our tutorials and training events. Below you can read how to perform extensive investigations on websites using our checklist.
A website you visit is full of visible information. This includes email addresses, telephone and fax numbers, trade names, Chamber of Commerce, VAT and bank account numbers, names, addresses, social media accounts, etcetera. You can easily check and use this information in your further investigations.
Please also take a look at a website’s terms and conditions and privacy disclaimers. Companies often provide their company details there because they are legally required to do so. So please click through the pages and documents of a website to be as complete as possible. However, always keep your personal threat model in mind.
Monitor a website
Most websites are constantly changing. This means that there may be new relevant information on a website tomorrow or in a week. You can monitor these changes by checking a website every now and then, but this can be done a lot more easily. For example, a website like Visualping.io allows you to monitor a website automatically and notifies you by email if any changes occur. You can also indicate exactly which part of the page must change before you are notified.
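If you prefer to script a simple check yourself, the idea behind change monitoring can be sketched in a few lines of Python. This is only a minimal illustration and not how Visualping.io works internally: the function names `fingerprint` and `has_changed` are our own, and the sketch assumes you have already saved two snapshots of the page source as text.

```python
import hashlib
import re


def fingerprint(html: str) -> str:
    """Return a SHA-256 fingerprint of a page, ignoring differences in whitespace."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_html: str, current_html: str) -> bool:
    """Report whether two snapshots differ in anything other than whitespace."""
    return fingerprint(previous_html) != fingerprint(current_html)
```

Storing only the fingerprint of the previous snapshot is enough to detect that something changed. To see what changed, you still need to keep the snapshots themselves.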
The term “WHOIS” refers to the protocol that allows you to ask questions about who is responsible for a domain name or an IP address. By requesting the WHOIS data, you may be able to find out who the registrant of a domain name is and through which company (the “registrar”) the domain name was registered. For example, the WHOIS information may include names, addresses, email addresses, telephone numbers, and information about the technical or administrative contact person.
When retrieving WHOIS data it is important that you combine multiple sources and verify the data you have found. In an upcoming blog post we will explain this in more detail. Sources that can be helpful are sidn.nl (for .NL domains), DomainBigData.com, DomainTools.com (paid), viewdns.info and Whoxy.com (also historical data). You can also retrieve WHOIS data via the “command line” in Linux.
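WHOIS is a very simple protocol: you open a TCP connection to port 43 of a WHOIS server, send the domain name followed by a line break, and read the reply. The sketch below shows this in Python. The default server `whois.iana.org` and the function names are assumptions for illustration; in practice the right server differs per top level domain, and the format of the reply differs per registry, so the parser is deliberately simple.

```python
import socket


def query_whois(domain: str, server: str = "whois.iana.org", timeout: float = 10.0) -> str:
    """Send a WHOIS query over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois(raw: str) -> dict[str, list[str]]:
    """Collect 'key: value' lines from a reply; lines starting with % or # are skipped."""
    fields: dict[str, list[str]] = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line[0] in "%#" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if value.strip():
            fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields
```

You could call `parse_whois(query_whois("aware-online.com"))` and compare the outcome with the sources mentioned above, since a single WHOIS source is rarely complete.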
We already described that website changes can be monitored automatically. In some cases it is also possible to view historical snapshots of websites. Well-known websites that allow you to view archived versions of websites are Archive.org, Archive.is and Cachedpages.com.
Check out our overview of other tools to see more website archive tools. You can also use the Google Cache to retrieve some interesting information! Did you also know that Archive.org offers an “Advanced search”? Super handy!
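Archive.org also offers an availability endpoint for the Wayback Machine that tells you whether a snapshot of a page exists. The Python sketch below builds the request URL and reads the reply. The function names are our own, and the shape of the JSON reply (`archived_snapshots`, `closest`) is an assumption based on how the endpoint has responded in the past, so verify it against a live reply before relying on it.

```python
import json
from urllib.parse import urlencode


def availability_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine availability request; timestamp is optional (YYYYMMDD)."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)


def closest_snapshot(response_text: str):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

The URL from `availability_url` can be opened in a browser or fetched with `urllib.request.urlopen`; the text of the reply then goes into `closest_snapshot`.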
Texts on a website
Most websites consist of a large amount of plain text. That’s good for you, because texts are very easy to investigate. It is worth checking whether a text also appears on third-party websites, because scammers and criminals are sometimes lazy and reuse the same texts on multiple websites or in multiple advertisements. This means that you can use a piece of text to find other websites or advertisements by the same author.
One way to check whether a text appears on another website is to copy a piece of the text and run it through a search engine like Google, placing it between double quotes to search for the exact phrase. The search engine will then display the indexed websites that contain exactly the same text. Websites that do this for you automatically include Copyscape.com and Plagium.com.
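If you want to check many text fragments, building the exact-match query can be automated. The sketch below is a small Python helper with function names of our own choosing. The limit of twelve words is an arbitrary assumption that keeps the query short, not a rule imposed by any search engine.

```python
from urllib.parse import quote_plus


def exact_phrase_query(text: str, max_words: int = 12) -> str:
    """Wrap the first max_words words of a text in double quotes for an exact-match search."""
    words = text.replace('"', " ").split()
    return '"' + " ".join(words[:max_words]) + '"'


def google_search_url(query: str) -> str:
    """Turn a query into a Google search URL that you can open in your browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```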
Photos and videos on a website
In addition to texts, a lot of websites contain photos and videos. These photos and videos can be just the missing puzzle piece in your investigation. First, study these photos and videos to find out whether they contain relevant information, such as a location, a time or relevant people. In addition, perform a reverse image search to see whether the photo or video material is also present on other websites. Finally, also study the Exif data of a photo or video. The Exif data may contain location data, dates and times, device types, technical data, and so on.
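Location data in Exif is usually stored as degrees, minutes and seconds, together with a reference letter (N, S, E or W). To plot such a location on a map you need decimal degrees. The Python sketch below shows the conversion. The function name is our own, and the sketch assumes you have already read the values from the Exif data with a tool of your choice.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees, minutes and seconds plus a reference letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and West are expressed as negative numbers in decimal notation.
    return -value if ref.upper() in ("S", "W") else value
```

For example, 52 degrees, 22 minutes and 12 seconds North becomes roughly 52.37.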
Hidden links and pages
A website you visit may contain more links and pages than you might think. We are talking about links to pages on the website itself (“internal links”) and links to pages on other websites (“external links”). These web pages do exist, but it might be difficult to find them. How do you find hidden links and pages?
A first way to find hidden pages is to search through a search engine like Google with the “site:aware-online.com” operator. This operator will display all web pages of our website aware-online.com. The disadvantage of this is that it only obtains the results indexed by search engines.
A second way is to view the robots.txt file of a website. This file tells search engine crawlers which web pages they should not visit. Web pages that are mentioned in the robots.txt file therefore often do exist, but may not be discoverable via regular search engines.
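Reading a robots.txt file by hand works fine for small websites. For larger files, a short Python sketch like the one below can list the paths that crawlers are asked to stay away from. The function name is our own, and the sketch assumes you have already downloaded the contents of the file (usually found at /robots.txt) as text.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique paths listed after 'Disallow:' in a robots.txt file."""
    paths: list[str] = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        value = value.strip()
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths
```

Each path in the result is a candidate page or folder to look at manually, keeping your threat model in mind.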
A third way is to use a browser extension such as Link Gopher. Such an add-on helps you to map more links from a web page, but in many cases the result is not complete.
A much better way is to use more powerful tooling such as the Photon crawler. This Python script automatically documents all internal and external links it finds and also gives you instant insight into files, email addresses and phone numbers that are used.
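To get a feeling for what such tools do, the sketch below collects the links from a single page and splits them into internal and external links. It only uses the Python standard library and is not how Photon or Link Gopher are implemented. The names `LinkCollector` and `split_links` are our own, and the sketch assumes you pass in the page source and the address it was downloaded from.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the targets of all <a href="..."> tags as absolute URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))


def split_links(html: str, base_url: str) -> tuple[list[str], list[str]]:
    """Return (internal, external) links; non-web links such as mailto: are ignored."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).hostname
    web_links = [u for u in collector.links if urlparse(u).scheme in ("http", "https")]
    internal = sorted(u for u in web_links if urlparse(u).hostname == host)
    external = sorted(u for u in web_links if urlparse(u).hostname != host)
    return internal, external
```

A real crawler repeats this step for every internal link it finds, which is exactly the work that dedicated tooling takes off your hands.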
Subdomains are part of a domain. For example, exam.aware-online.com can be a subdomain of aware-online.com. A subdomain may contain additional information that is useful for your investigation, so it is important to always check whether a website has subdomains. One way to check this is through the website Pentest-Tools.com or through a powerful Python tool like SpiderFoot.
When you visit a website, your web browser translates the source code of the website into a nicely readable format. The source code therefore looks very different from the website you are viewing. The nice thing about the source code is that it can contain a lot of information that you cannot extract from the “normal” website. For example, the website Coolblue.nl has a recruitment text at the top of the source code.
Other relevant information that the source code may contain includes templates, plug-ins, filenames, Google Analytics IDs, Google AdSense IDs, and so on. This can tell you more about the software running on the website. In addition, using data from the source code, you can investigate whether there are other websites that use the same source code. For example, a website administrator often uses the same Google Analytics ID on multiple websites.
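Searching the source code for such identifiers can be automated with a few regular expressions. The Python sketch below is a minimal example. The patterns are assumptions based on commonly seen formats (such as UA-12345678-1 for Google Analytics and ca-pub- followed by sixteen digits for Google AdSense) and may miss other variants or produce false positives.

```python
import re

# Assumed formats of commonly seen identifiers; adjust them to what you encounter.
PATTERNS = {
    "google_analytics_ua": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "google_analytics_ga4": re.compile(r"\bG-[A-Z0-9]{8,12}\b"),
    "google_adsense": re.compile(r"\bca-pub-\d{16}\b"),
}


def find_tracking_ids(source: str) -> dict[str, list[str]]:
    """Return the unique identifiers found in the page source, grouped by type."""
    return {name: sorted(set(pattern.findall(source))) for name, pattern in PATTERNS.items()}
```

An identifier you find this way can then be used as a search term to look for other websites that contain the same ID.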
If you type the address of our website into the address bar of your browser, the Domain Name System (DNS) automatically translates our domain name to an IP address. This is useful, because without this system we would have to remember the IP addresses of all websites we want to visit.
IP addresses for websites can be “unique” or “shared”. With a unique IP address, you can type the IP address of the website directly into the address bar to visit the website. A unique IP address can, for example, help prevent your website from being blocked by a firewall because of blocks imposed on other websites that share the same IP address.
With a shared IP address, multiple websites on a server use the same IP address. You cannot type these IP addresses directly into the address bar, because the web server then does not know which website you want to visit. When you type in the domain name instead, your browser passes that name along to the web server, which then knows which website to show.
With a shared IP address, you can sometimes find out which websites are running on the web server, which may be relevant to your investigation. In addition, you also get information about the organization that controls the IP address.
A simple nslookup via the Windows command prompt indicates that our website aware-online.com has the IPv4 address 220.127.116.11. The website cannot be reached by typing this IP address directly into the address bar, which means that there are multiple websites running on this web server. A number of these websites are visible through DomainBigData.
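The same kind of lookup can also be scripted, which is handy when you need to resolve a list of domain names. The Python sketch below uses the resolver of your own system. The function name is our own, and the answer may differ from what nslookup shows if your system uses a different DNS server.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the unique IPv4 addresses that the system resolver gives for a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each result ends with a (address, port) tuple; we only keep the address.
    return sorted({info[4][0] for info in infos})
```

Keep in mind that a lookup like `resolve_ipv4("aware-online.com")` sends a DNS request from your own connection, which is a trace you leave behind.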
Website certificates are used for the validation and the security (“encryption”) of the traffic between a website (a web server) and a client (your computer). Certificates are issued by so-called Certificate Authorities (CAs), which verify the identity of a website. With a valid certificate, you know as a user that you are visiting the “good” website and that the connection is secure (encrypted).
SSL certificates exist in different forms. For example, there are certificates that are valid for a single domain name, certificates that are valid for multiple (sub)domains, and certificates that are valid for an unlimited number of subdomains (“wildcards”). As an OSINT practitioner, you can therefore use an SSL certificate to investigate whether the certificate is also used on other (sub)domains.
Websites that can help you with this are Shodan.io, Censys.io, Crt.sh and Entrust.com. @Sector035 wrote a nice blog post about it for the Osintcurio.us project!
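Crt.sh can also return its results as JSON, which makes it possible to collect the (sub)domains from certificates with a script. The Python sketch below builds such a request and extracts the names from the reply. The function names are our own, and the `name_value` field is an assumption based on replies we have seen from crt.sh, so check a live reply before relying on it.

```python
import json
from urllib.parse import urlencode


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query for certificates covering subdomains of a domain, as JSON."""
    return "https://crt.sh/?" + urlencode({"q": "%." + domain, "output": "json"})


def names_from_crtsh(response_text: str) -> list[str]:
    """Return the unique (sub)domain names mentioned in a crt.sh JSON reply."""
    names = set()
    for entry in json.loads(response_text):
        # One certificate can list several names, separated by line breaks.
        for name in entry.get("name_value", "").splitlines():
            names.add(name.strip().lower())
    names.discard("")
    return sorted(names)
```

Names that start with `*.` indicate a wildcard certificate; the other names are candidate subdomains to investigate further.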
Other top level domains
A “top level domain” (TLD) is the last part of a domain name, for example .nl, .com or .xyz. Top level domains are managed by so-called “registries”, which are under contract with ICANN. The Stichting Internet Domein Registratie Nederland (SIDN) is responsible for the .nl top level domain.
Companies that have websites often want their website to be reachable via other top level domains as well. We have not only registered the domain name aware-online.com, but also the domain name aware-online.nl.
Even though aware-online.nl redirects directly to aware-online.com when you visit the website, you can get more information about this .nl domain name. In some cases you will simply get to see the website on the other top level domain, which may provide you with relevant information. Therefore, always investigate whether a domain name is registered under multiple top level domains, for example by using a search operator.
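Besides searching, you can also generate a list of candidate domain names yourself and check them one by one, for example with a WHOIS lookup. The Python sketch below is a minimal helper with a function name of our own. The default list of top level domains is an arbitrary assumption, and the sketch does not handle suffixes that consist of two parts, such as .co.uk.

```python
def tld_variants(domain: str, tlds: tuple[str, ...] = ("com", "nl", "org", "net", "eu")) -> list[str]:
    """Return the domain name under other top level domains, skipping the original."""
    domain = domain.lower()
    label = domain.rsplit(".", 1)[0]  # everything before the last dot
    return [f"{label}.{tld}" for tld in tlds if f"{label}.{tld}" != domain]
```

Whether a candidate from this list is actually registered, and by whom, still has to be verified separately.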
Web site references
It may be interesting to investigate which websites refer to your target website, because these websites may have something to do with it. For example, you can investigate this by searching Google for the domain name in quotation marks while excluding the website itself with the -site: operator.
You will then only see results that mention “aware-online.com”, excluding the search results that appear on the website aware-online.com itself.
It may be interesting to check which ports are open on a web server and which services are running on them. For example, this information can be used to map vulnerabilities, which is often done in so-called “penetration tests” or “pentests”.
Websites and tools that can help you include Pentest-Tools.com, theHarvester and Nmap.org. Always consider whether performing a pentest is legally permissible, whether the target website will remain unaffected and whether you have been given permission to perform a pentest.
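To illustrate what “an open port” means in practice, the Python sketch below tries to set up a TCP connection to a single port. It is a minimal illustration and not a replacement for a tool like Nmap. The function name is our own, and the sketch should only be used on systems for which you have permission.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: treat the port as not open.
        return False
```

A port that is reported as closed can also simply be filtered by a firewall, so a negative result is less conclusive than a positive one.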
Used email addresses
Employees of organizations obviously make frequent use of the email addresses they have received from their organization. For example, we use the email address firstname.lastname@example.org for our communication with students. Of course, it may be interesting to find out whether our employees use multiple email addresses. Tools that can help you include SpiderFoot.net and Hunter.io.
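If you have already saved texts or page sources during your investigation, you can also extract the email addresses of one organization from them yourself. The Python sketch below does this with a regular expression. The pattern is a simplified assumption that matches most common addresses but not every valid one, and the function name is our own.

```python
import re

# Simplified pattern for common email addresses; not a full validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_for_domain(text: str, domain: str) -> list[str]:
    """Return the unique email addresses in a text that belong to the given domain."""
    found = {match.lower() for match in EMAIL_RE.findall(text)}
    return sorted(address for address in found if address.endswith("@" + domain.lower()))
```

The addresses you find often reveal the naming convention of an organization, which in turn helps you to verify other addresses.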
Create a personal threat model!
When you visit websites, you leave traces on the website you visit. This directly affects your own safety. Therefore, be aware of the traces you leave and create a personal “threat model” at all times.
Investigate at your own risk
Aware Online has no interest in the websites and tools of third parties mentioned on this website and is not liable for their use. The use of the websites or tools described on this page is therefore entirely at your own risk.
More tutorials or contact?
Want to know more about how you can conduct website investigations? Or do you need support in your projects? Please let us know or follow one of our OSINT training events! Also, we would like to hear from you when you have any comments or suggestions for this article.