UNLOCK THE FULL POTENTIAL OF THE WAYBACK MACHINE FOR BUG BOUNTY

 

INTRODUCTION

Bug hunting requires a combination of sharp skills and effective tools. One essential resource is the Wayback Machine, a digital archive of the Internet. Once you have mastered its features, you will be able to discover sensitive files, identify potential vulnerabilities, and take your ethical hacking skills to the next level. Here is a step-by-step guide to help you make the most of this powerful tool.


WHAT IS THE WAYBACK MACHINE?

The Wayback Machine is a service provided by the Internet Archive, storing snapshots of websites over time. It acts as a time capsule for the web, preserving data that may no longer exist online.


How the Wayback Machine Helps in Bug Hunting

The Wayback Machine is more than just an archive; it's a powerful tool for ethical hackers and bug bounty hunters. Here's how it can help you in your security research:


Hidden or Forgotten Files: Access old configuration files, endpoints, and documents that are no longer reachable on the live site.

Outdated Security Measures: Identify vulnerabilities in older systems.

Historical Data: Analyze previous states of a website for deeper insights.

THE WEB ARCHIVE METHODS

Retrieving Passive URLs Using the CDX API

One of the most efficient ways to enumerate all URLs associated with a specific domain is the Wayback Machine's CDX API. It can return a comprehensive list of archived URLs for a target domain and its subdomains. Here's a simple query you can run in your browser:


https://web.archive.org/cdx/search/cdx?url=*.example.com/*&collapse=urlkey&output=text&fl=original

This query calls the Wayback Machine's CDX API and returns a plain-text list of unique archived URLs for all subdomains and paths under the given domain (e.g. example.com). Make sure to replace example.com with your target domain.
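
The CDX API also supports a few optional parameters that make large queries more manageable, such as from/to for limiting the date range and limit for capping the number of rows returned. Here is a minimal example; the year and row count are just placeholders to adjust for your target:


https://web.archive.org/cdx/search/cdx?url=*.example.com/*&collapse=urlkey&output=text&fl=original&from=2020&limit=1000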



Searching for Sensitive Files

After getting the results, you can search for interesting endpoints, email and password leaks, or files with sensitive extensions such as .db, .rar, .zip, .docx, .xls, and .pdf.
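
For example, assuming you have saved the archived URL list to a file (the curl command in the next section writes it to output.txt), a quick grep pass can surface likely targets. This is only a rough sketch, and the keyword list is a suggestion you can expand:


# Rough sketch: flag URLs that hint at admin panels, backups, or credentials
grep -Ei '(admin|login|backup|config|passwd|password|token|api[_-]?key|secret)' output.txt | sort -u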


Curl-Based Data Retrieval

When dealing with large datasets from the Wayback Machine, using a browser can be slow and unreliable. A better approach is to use curl, which lets you download the results in seconds. Here's a simple curl command:


curl -G "https://web.archive.org/cdx/search/cdx" - data-urlencode "url=*.example.com/*" - data-urlencode "collapse=urlkey" - data-urlencode "output=text" - data-urlencode "fl=original" > output.txt

After running this command, all of the archived URLs for the domain will be saved in output.txt. You can then use grep to extract emails, passwords, and files with specific extensions.
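
As a starting point, here is a rough sketch of a grep that pulls anything resembling an email address out of output.txt; treat the pattern as a suggestion rather than a complete one:


grep -Eoi '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' output.txt | sort -u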



You can also use this one-liner to grep for files with juicy extensions that might contain sensitive information (it pipes through uro to deduplicate the URLs, so install that tool first if you don't have it):


cat output.txt | uro | grep -E '\.xls|\.xml|\.xlsx|\.json|\.pdf|\.sql|\.doc|\.docx|\.pptx|\.txt|\.zip|\.tar\.gz|\.tgz|\.bak|\.7z|\.rar|\.log|\.cache|\.secret|\.db|\.backup|\.yml|\.gz|\.config|\.csv|\.yaml|\.md|\.md5|\.exe|\.dll|\.bin|\.ini|\.bat|\.sh|\.tar|\.deb|\.git|\.env|\.rpm|\.iso|\.img|\.apk|\.msi|\.dmg|\.tmp|\.crt|\.pem|\.key|\.pub|\.asc'


Fetching Files with Specific Extensions

For more targeted results, you can use this query in your browser to fetch only URLs with specific file extensions. The filtering happens on the server side, so the refined list is displayed directly in your browser.


https://web.archive.org/cdx/search/cdx?url=*.example.com/*&collapse=urlkey&output=text&fl=original&filter=original:.*\.(xls|xml|xlsx|json|pdf|sql|doc|docx|pptx|txt|zip|tar\.gz|tgz|bak|7z|rar|log|cache|secret|db|backup|yml|gz|git|config|csv|yaml|md|md5|exe|dll|bin|ini|bat|sh|tar|deb|rpm|iso|img|apk|msi|env|dmg|tmp|crt|pem|key|pub|asc)$


You can also fetch these results directly in the terminal with curl; piping through tee displays the output on screen while simultaneously saving it to a file.


curl "https://web.archive.org/cdx/search/cdx?url=*.example.com/*&collapse=urlkey&output=text&fl=original&filter=original:.*\.(xls|xml|xlsx|json|pdf|sql|doc|docx|pptx|txt|git|zip|tar\.gz|tgz|bak|7z|rar|log|cache|secret|db|backup|yml|gz|config|csv|yaml|md|md5|exe|dll|bin|ini|bat|sh|tar|deb|rpm|iso|img|env|apk|msi|dmg|tmp|crt|pem|key|pub|asc)$" | tee output.txt

Browser-Friendly Filtering

Another option, if you prefer not to use the CDX API or download an output file, is the Wayback URL search. Simply replace the domain name in the URL below and hit enter; it will display all archived URLs associated with the target domain.


https://web.archive.org/web/*/example.com/*


The results page lists all passively collected URLs for the target domain. In the filter box you can type specific extensions such as .xls, .zip, or .pdf to narrow the list down to just those file types. This method is the easiest to use in the browser, so feel free to give it a try.


FIND SENSITIVE DATA IN PDFS

After running the curl one-liners above, you can try this additional one-liner to search for sensitive data inside PDF files. It requires the pdftotext utility, which ships with poppler-utils.


cat output.txt | grep -Ea '\.pdf' | while read -r url; do curl -s "$url" | pdftotext - - | grep -Eaiq '(internal use only|confidential|strictly private|personal & confidential|private|restricted|internal|not for distribution|do not share|proprietary|trade secret|classified|sensitive|bank statement|invoice|salary|contract|agreement|non disclosure|passport|social security|ssn|date of birth|credit card|identity|id number|company confidential|staff only|management only|internal only)' && echo "$url"; done

This command first greps all the PDF URLs from the output, downloads each one, and converts it to text with pdftotext. It then searches the text for the listed sensitive keywords; if any of them are found, the corresponding URL is printed to the screen. After that, manually review those PDFs and report any genuine findings to the bug bounty program.



GOLDEN METHOD: ACCESSING DELETED FILES VIA 404 ERRORS

Now, let me share a golden method that I often use to find secret files. While many hunters are familiar with the Web Archive CDX method and have discovered sensitive information through it in bug bounty programs, I've rarely seen anyone focus on 404 files. Most hunters tend to move on when they encounter a 404 URL, but I'm going to show you how you can still retrieve those deleted files from the archive.



Go to https://web.archive.org/, paste the 404 URL you found into the search bar, and hit enter. You'll see the entire snapshot timeline for that URL. Select an older snapshot from the timeline and click on it; this gives you a working copy (or download link) of the deleted file. You can recover deleted sensitive files from many websites this way, as long as the Wayback Machine captured them before they were removed.
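
If you prefer the terminal, the same trick can be scripted: list the snapshot timeline for the dead URL through the CDX API, then pull one of the archived copies directly. A minimal sketch follows; the file path is just a placeholder for a 404 URL you found, and the id_ suffix after the timestamp asks the Wayback Machine for the raw archived file without its toolbar:


# List every snapshot the Wayback Machine holds for the (now deleted) URL
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/uploads/report.pdf&output=text&fl=timestamp,original"

# Download one snapshot; replace TIMESTAMP with a value from the list above
curl -s -o report.pdf "https://web.archive.org/web/TIMESTAMPid_/https://example.com/uploads/report.pdf"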


Exploring Robots.txt Files

Many websites have a robots.txt file, which tells web crawlers which parts of the site to access and which to avoid. Sometimes these files reveal valuable information, such as hidden endpoints or URLs that aren't meant to be public. By examining older versions of robots.txt in the Wayback Machine, you can uncover additional endpoints for security testing.
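
One quick way to do this from the terminal is to list the archived snapshots of robots.txt through the CDX API and then fetch an old copy. A rough sketch, with example.com as a placeholder:


# List archived snapshots of the target's robots.txt (collapse=digest skips identical captures)
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/robots.txt&output=text&fl=timestamp&collapse=digest"

# Fetch one old copy (replace TIMESTAMP) and show the paths it tried to hide
curl -s "https://web.archive.org/web/TIMESTAMPid_/https://example.com/robots.txt" | grep -i 'disallow'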



STEPS TO REMOVE CONTENT FROM THE WAYBACK MACHINE

If you are a website owner and want to remove sensitive content from the Wayback Machine, follow these steps:


Contact Internet Archive: Reach out to the Internet Archive support team to request the removal of specific pages or files.

Update Robots.txt: Modify your website's robots.txt file to ask the Wayback Machine's crawler to skip certain pages (see the example after this list).

Legal Takedown Notice: If content violates copyright or privacy laws, you can submit a DMCA takedown notice.

Only website owners or authorized parties can request content removal.
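
For reference, the directive that has historically been used to keep the Internet Archive's crawler away targets the ia_archiver user agent. Note that the archive has not always honored robots.txt strictly, so treat this as a request rather than a guarantee:


User-agent: ia_archiver
Disallow: /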


Proven Results: How My Method Helped My Subscribers Earn Bug Bounties


There are numerous success stories from my subscribers who have achieved great results using this method. You can check all the proof in my Telegram channel, and I've also uploaded the method to my YouTube channel (Lostsec), so feel free to watch it there as well!


CONCLUSION

The Wayback Machine is a powerful tool that can help bug bounty hunters uncover hidden vulnerabilities, recover deleted files, and analyze outdated security measures. By leveraging its full potential, you can take your ethical hacking skills to the next level and enhance your bug bounty hunting efforts. Always remember to respect ethical guidelines and follow the rules of the program you are participating in.


DISCLAIMER

The content provided in this article is for educational and informational purposes only. Always ensure you have proper authorization before conducting security assessments. Use this information responsibly.
