Directory Traversal Pwned
Learn the core concepts of Directory Traversal Vulnerability
The article will teach you what directory traversal is, how to find it, how to exploit it and how to prevent it.
What the heck is Directory Traversal?
Directory Traversal, also known as Path Traversal, is a vulnerability that allows an attacker to read files on the server that they should not be able to access.
It is a type of injection vulnerability: the attacker injects a path into a vulnerable endpoint and gets back the contents of the file that the path points to.
For example:
There's a web application: vulnerable-web-app.com
The web app has an endpoint Images: /Images?filename=ProfilePic.png
Assume that the image ProfilePic.png resides in the /var/www/images directory on the server.
The Images endpoint has a "filename" parameter that fetches the user's profile picture, as shown in the endpoint "/Images?filename=ProfilePic.png". Here the profile picture being fetched from the server filesystem is named "ProfilePic.png".
But here the web app's endpoint is vulnerable to Directory Traversal, so an attacker can replace the image's name with the path of the file they want to read.
Suppose the file the attacker is trying to access on the server is "/etc/passwd".
Note for beginners: /etc/passwd is a file in the Linux filesystem that contains a list of user accounts on the system and other details regarding them.
So the URL now looks like this with the payload "/etc/passwd" injected in it:
https://www.vulnerable-web-app.com/Images?filename=/etc/passwd
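To make the mechanics concrete, here is a minimal Python sketch of the kind of server-side code that could produce this behaviour. The helper name build_image_path and the plain path join are assumptions for illustration; the article does not show the web app's real code.

```python
import posixpath

# Base directory assumed in the article's example.
IMAGE_DIR = "/var/www/images"

def build_image_path(filename):
    """Hypothetical vulnerable helper: joins user input onto the base
    directory without validating it first."""
    # posixpath.join discards IMAGE_DIR entirely when filename is absolute.
    return posixpath.join(IMAGE_DIR, filename)

print(build_image_path("ProfilePic.png"))  # /var/www/images/ProfilePic.png
print(build_image_path("/etc/passwd"))     # /etc/passwd
```

The second call shows why the injected value wins: the join keeps only the attacker's absolute path, so the server would open /etc/passwd instead of an image.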
Below is a screenshot of the request and response in Burp Suite to illustrate the above:
Focus on the highlighted parts only.
We have assumed Host ( web app ) to be: www.vulnerable-web-app.com
Current request URL is:
https://www.vulnerable-web-app.com/images?filename=ProfilePic.png
The screenshot above shows the request fetching the "ProfilePic.png" of the current user. It is a valid request that the web app allows users to make.
Now we will alter this request and replace the image name with the path of the file that we want to access, "/etc/passwd".
Request with payload /etc/passwd:
We've altered the request.
Now the request URL with the payload looks like:
https://www.vulnerable-web-app.com/images?filename=/etc/passwd
We'll send it to the server.
The server responds with the content of the file "/etc/passwd" ( shown in the screenshot below ).
The server should have rejected the request with a client-side error ( a status code in the 4xx range ), but instead it responds with a 200, as shown in the screenshot below.
This confirms that the web app's endpoint is vulnerable to Directory Traversal.
Response from the server:
Why /etc/passwd file?
Well, a standard user doesn't have many privileges, not as many as the root user or a sudo user ( the administrator user in Windows ).
So as a standard user, we can't read many of the files in the system's directories.
But there are a few files that even a standard user can read, one of which is the /etc/passwd file. You don't need root privileges to read this file, which is why we use it while checking for a Directory Traversal vulnerability.
But what if the server is a Windows server, not Linux?
Well, then try to read a different file, one of which is:
\WINDOWS\win.ini
And the traversal sequence uses a backslash, unlike on Linux servers:
..\..\WINDOWS\win.ini
NOTE: Traversal sequences are nothing but these: ../ in Linux and ..\ in Windows.
Hopefully, you now have at least the basic concept behind Directory Traversal. Let's move on.
How to Find Directory Traversal Vulnerability?
An application can have a lot of endpoints or instances to test by hand, which is why using automated scanning tools like ZAP or Burp Suite ( Intruder with a wordlist ) is recommended alongside manual testing.
The first thing any tester should do is understand how the web application works; mapping the web app thoroughly is necessary.
While mapping the web app, identify and note down all the endpoints or instances where a file or directory name is present. In simpler words, note down the requests that fetch a file or directory from the server.
Now test each of these identified endpoints or instances with whatever payloads you prefer, either from a wordlist or as manually entered custom paths, and observe the response.
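As a rough illustration of the wordlist idea, the Python sketch below builds one candidate URL per payload for a single parameter. It only constructs strings and sends nothing. The tiny payload list and the helper name candidate_urls are assumptions for this example; the endpoint is the one from the example earlier in the article.

```python
from urllib.parse import urlencode

# Endpoint taken from the article's running example.
BASE_URL = "https://www.vulnerable-web-app.com/images"

# A tiny hand-written payload list; a real wordlist would be much longer.
PAYLOADS = [
    "/etc/passwd",
    "../../../etc/passwd",
    "..\\..\\WINDOWS\\win.ini",
]

def candidate_urls(param="filename"):
    """Build one test URL per payload for the given query parameter."""
    urls = []
    for payload in PAYLOADS:
        # safe="/" keeps forward slashes readable; backslashes get encoded.
        query = urlencode({param: payload}, safe="/")
        urls.append(BASE_URL + "?" + query)
    return urls

for url in candidate_urls():
    print(url)
```

Each generated URL would then be requested, by hand or with a tool, and the response compared with the normal image response.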
Don't worry about this too much yet. If it hasn't clicked already, it will once you get hands-on in the Directory Traversal labs challenge linked at the end of the article.
How to exploit Directory Traversal?
There are 6 main methods by which this vulnerability can be exploited:
- Simple Case
../../../../../etc/passwd --> in Linux
or
..\..\..\..\..\WINDOWS\win.ini --> in Windows
- In simple cases, we try to get out of the directory that the image is being fetched from and reach the root directory, and from there we move into the directory where the file we want is located.
Assuming that the image being fetched resides in the /var/www/images directory:
- We first use the traversal sequences ../../../../../ to move out of "/var/www/images" and reach the root directory ( / ).
NOTE: We can use as many traversal sequences as we like, but not fewer than the number required to reach the root directory. Extra traversal sequences are ignored, so no error is thrown either. For example, to get out of /var/www/images we need 3 traversal sequences ( ../ ) to reach /, but we can use more if we suspect that we're even deeper in the filesystem than /var/www/images.
Then we move into the directory where the passwd file resides, /etc, which gives us /etc/passwd.
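The counting rule in the note above can be checked with a short Python sketch. It uses posixpath.normpath to show where a payload lands once it is appended to the base directory; the helper name resolve is just for this example, and a real server may resolve paths differently.

```python
import posixpath

# Directory the images are assumed to live in, as in the article.
BASE = "/var/www/images"

def resolve(payload):
    """Show which file a payload points at once it is appended to BASE."""
    return posixpath.normpath(BASE + "/" + payload)

print(resolve("../../../etc/passwd"))        # /etc/passwd
print(resolve("../../../../../etc/passwd"))  # /etc/passwd (extra ../ ignored at the root)
print(resolve("../../etc/passwd"))           # /var/etc/passwd (too few, never reaches /)
```

Three sequences are exactly enough here, five still work, and two fall short and point at a path under /var instead.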
- Traversal Sequence blocked with absolute path bypass
/etc/passwd
In this case, developers put up a filter that rejects any request whose payload contains a traversal sequence.
But we can still access the file using its absolute path, "/etc/passwd", which contains no traversal sequence at all.
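A sketch of why this works, assuming a filter that only looks for traversal sequences and a backend that joins the input onto the image directory. Both functions are hypothetical stand-ins, not the code of any real application.

```python
import posixpath

IMAGE_DIR = "/var/www/images"

def naive_filter_allows(filename):
    """Hypothetical filter: reject anything containing a traversal sequence."""
    return "../" not in filename and "..\\" not in filename

def resolve(filename):
    """Hypothetical backend: join the input onto the image directory."""
    return posixpath.join(IMAGE_DIR, filename)

print(naive_filter_allows("../../../etc/passwd"))  # False, blocked
print(naive_filter_allows("/etc/passwd"))          # True, passes the filter
print(resolve("/etc/passwd"))                      # /etc/passwd
```

The absolute path never trips the filter because it has nothing to traverse, and the join then drops the base directory.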
- Traversal Sequences stripped non-recursively
Developers put a filter in place that strips any traversal sequence from the path in the request, but it strips only once instead of repeating until none remain.
So we forge a request with a payload "....//....//....//....//".
The filter strips the "../" in the middle of each "....//", since that is a traversal sequence. What remains of each "....//" is the ".." that came before the stripped sequence and the "/" that came after it, and together they form a new "../".
....//....//....//....//etc/passwd
../../../../etc/passwd --> after the traversal sequences are stripped
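The stripping behaviour can be reproduced in Python with a single str.replace call, which makes one left-to-right pass and does not rescan its own output. The function names below are made up for this sketch; the looping version is included only for contrast.

```python
def strip_once(path):
    """Hypothetical filter: remove every '../' in a single pass."""
    return path.replace("../", "")

def strip_until_clean(path):
    """Contrast: keep stripping until no '../' is left."""
    while "../" in path:
        path = path.replace("../", "")
    return path

payload = "....//....//....//....//etc/passwd"
print(strip_once(payload))         # ../../../../etc/passwd
print(strip_until_clean(payload))  # etc/passwd
```

The single pass turns the padded payload into a working traversal, while the repeated version leaves nothing to traverse with.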
- Traversal Sequences stripped with superfluous URL-decode
Sometimes the web app is protected in a way that rejects any request containing such a custom path.
We then try to bypass this using URL-encoding.
But there are also WAFs ( Web Application Firewalls ) that protect against such attacks by blocking any URL-encoded request that contains a path as its payload.
A WAF checks the encoded URL by decoding it and analysing the request to see if there's a path in it.
But at times the WAF's blocking rules aren't that well written. There is a chance that the WAF only handles one layer of URL-encoding, so what if we encode the URL twice or even thrice? If the WAF decodes the URL only once while the application decodes it again, this too can be bypassed.
Once encoded URL
Twice encoded URL:
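As one illustration of single and double encoding, the Python sketch below encodes a traversal payload once and twice with urllib.parse.quote, then runs both through a hypothetical WAF that decodes once and a hypothetical backend that decodes a second time. The payload and both function names are assumptions for this example, not necessarily what the original screenshots showed.

```python
from urllib.parse import quote, unquote

payload = "../../../etc/passwd"
once = quote(payload, safe="")   # "/" becomes %2F
twice = quote(once, safe="")     # "%" becomes %25

def waf_allows(raw_value):
    """Hypothetical WAF rule: decode once, then look for a traversal sequence."""
    return "../" not in unquote(raw_value)

def app_sees(raw_value):
    """Hypothetical backend that ends up decoding the value a second time."""
    return unquote(unquote(raw_value))

print(once)               # ..%2F..%2F..%2Fetc%2Fpasswd
print(twice)              # ..%252F..%252F..%252Fetc%252Fpasswd
print(waf_allows(once))   # False, caught after one decode
print(waf_allows(twice))  # True, one decode still leaves %2F in place
print(app_sees(twice))    # ../../../etc/passwd
```

The extra decode on the application side is the superfluous step that turns the harmless-looking value back into a traversal.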
- Validation of start of the path
/var/www/images/../../../etc/passwd
Some web apps only check whether the requested path starts with the expected base directory.
For example, our current web app expects that a user trying to fetch ProfilePic.png will request the image from the /var/www/images directory. So we put that at the start of the path so that the web application lets the request pass, and then we use traversal sequences to climb back out to the root directory and on to the /etc/passwd file:
/var/www/images/../../../etc/passwd
And this is how we can bypass this as well.
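A small Python sketch of this check and why it fails, assuming the validation is a plain string prefix test performed before the path is normalised. The function name is hypothetical.

```python
import posixpath

EXPECTED_PREFIX = "/var/www/images"

def prefix_check_allows(path):
    """Hypothetical validation: only the start of the raw string is checked."""
    return path.startswith(EXPECTED_PREFIX)

payload = "/var/www/images/../../../etc/passwd"
print(prefix_check_allows(payload))  # True
print(posixpath.normpath(payload))   # /etc/passwd
```

The raw string starts with the expected directory, but the file it finally points at lives outside it.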
- Validation of file extension with null byte ( %00 ) bypass
%00 is the URL-encoded form of a null byte.
When we add it to the request, anything appearing after the null byte can get ignored by the web app, because some underlying filesystem APIs treat the null byte as the end of the string.
Some web apps check whether the request ends with an allowed file extension.
In our case, it would be jpg or png, as the web app fetches images.
So we use a null byte to bypass this.
Even though the request ends with a valid file extension, ".png", we have added %00 before the image file name "ProfilePic.png", so the image file name gets ignored and only the part of the path before the null byte, "/etc/passwd", is used, as if the request were /images?filename=/etc/passwd.
/images?filename=/etc/passwd%00ProfilePic.png
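The sketch below simulates the idea in Python. Note the assumptions: Python's own open() rejects paths containing a null byte, so the truncation is imitated by hand with as_c_string to mimic a C-style API that stops at the first null byte. Both function names are hypothetical.

```python
from urllib.parse import unquote

def extension_check_allows(filename):
    """Hypothetical validation: only the end of the string is checked."""
    return filename.lower().endswith((".png", ".jpg"))

def as_c_string(filename):
    """Simulate a C-style API that stops reading at the first null byte."""
    return filename.split("\x00", 1)[0]

decoded = unquote("/etc/passwd%00ProfilePic.png")
print(extension_check_allows(decoded))  # True
print(as_c_string(decoded))             # /etc/passwd
```

The check sees a name ending in .png, while the simulated lower layer only ever sees /etc/passwd.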
Those are all 6 methods of exploiting Directory Traversal. Now let's understand how we protect against this attack.
How to Prevent it?
The best way to prevent such vulnerabilities is to avoid passing user-supplied input to filesystem APIs.
Note: Filesystem APIs are the interface through which an application interacts with resources on the filesystem. Here, they are what lets the web app fetch our images.
But if that's not possible, then we can put up a few layers of defence ( best if you use them all together ):
Use whitelisting: allow only user input that matches one of the permitted values.
If that too is not possible, then make sure to allow only user input that contains alphanumeric characters, and reject or strip any input that contains special characters.
Once the input has been validated, the application should add the input to the base directory and utilize a platform-specific filesystem API to obtain the canonicalized path. Subsequently, it should confirm that the canonicalized path indeed begins with the predetermined base directory as expected.
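The whitelisting and canonicalization steps above could look roughly like this in Python. This is a minimal sketch under stated assumptions: ALLOWED_NAMES, is_permitted and resolve_inside are names invented for the example, and os.path.commonpath is used for the "begins with the base directory" check so that a sibling directory such as /var/www/images2 does not pass by accident.

```python
import os

# Hypothetical list of permitted values for the whitelist check.
ALLOWED_NAMES = {"ProfilePic.png"}

def is_permitted(name):
    """Whitelist check: accept only exact, known file names."""
    return name in ALLOWED_NAMES

def resolve_inside(base_dir, user_input):
    """Append the input to the base directory, canonicalize the result,
    and confirm it still lies inside the base directory."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_input))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the base directory")
    return candidate

for name in ("ProfilePic.png", "../../../etc/passwd", "/etc/passwd"):
    try:
        print(name, "->", resolve_inside("/var/www/images", name))
    except ValueError as err:
        print(name, "-> rejected:", err)
```

Only the first name resolves; the two payloads are rejected because their canonical paths fall outside the base directory.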
And that is how we can prevent this attack.
Great job, you did it!!
Challenge:
You've been given all the information that you need to solve these labs.
You will have to make an account on portswigger.net to be able to proceed further.
After you've made an account, visit the following link which will take you to all the 6 labs of Directory Traversal.
Now go and crush it, soldier.
https://portswigger.net/web-security/all-labs#directory-traversal