Basics of Website Hacking | How The Internet Works?

Hello Cyber Learners,

Welcome to my blog on website hacking and how the internet works. In this post, we will explore the underlying principles of the internet that make website hacking possible. If you're new here, you should first understand what website hacking is, so be sure to check out my previous blog post on the lab setup: What is Website Hacking | Web Application Penetration Testing | Lab Setup. Whether you're a curious beginner or an experienced professional, this post will give you useful insight into website hacking and the workings of the internet.

How the Internet Works?
The Client-server Model
The Domain Name System
Internet Ports and Their Role in Network Communication
HTTP Requests and Responses
Internet Security Controls
Content Encoding
JSON Web Tokens
The Same-Origin Policy
Conclusion
Commonly Asked Questions

How the Internet Works?

Before you start looking for website bugs, it's important to know how the internet works. Finding website vulnerabilities means finding weaknesses in internet technology, so every hacker needs a solid understanding of it.

Have you ever wondered what happens when you type www.google.com into your browser? How does your browser know which web page you want to see? To understand this, let's start with a basic question and find out how your browser goes from a domain name, like google.com, to the specific webpage you want to visit.

The Client-Server Model

The internet is made up of two kinds of devices: clients and servers. Clients request resources or services, and servers provide them. When you want to visit a website, your browser acts as a client and asks the web server to give you the web page you're looking for.

A web page is actually a bunch of different files and resources that the server sends to your browser. You'll usually get a text file written in Hypertext Markup Language (HTML), which tells your browser what to display. You might also get some Cascading Style Sheets (CSS) files to make the website look nice, and JavaScript (JS) files that make the website more interactive without needing to talk to the server. JavaScript can do things like resize images as you scroll through the page or check what you've typed before you even send it to the server. Finally, your browser might receive some embedded resources like images or videos, which it combines with everything else to show you the web page you're looking at.

But servers don't just give web pages to users - they also let different applications talk to each other and share data in a controlled way using something called a Web API. For instance, Twitter's API lets other websites ask Twitter's servers for data like public tweets and their authors. APIs like this power all sorts of things on the internet, and we'll talk more about them (and their security issues) later on.

The Domain Name System

Have you ever wondered how your web browser knows where to find the resources you request? Well, every device on the internet has a unique Internet Protocol (IP) address, but these addresses are strings of numbers (or, in IPv6, numbers and letters) that are difficult for humans to remember. That's where the Domain Name System (DNS) comes in to make things easier.

Think of the DNS as the phone book for the internet. It translates domain names (like google.com) into their corresponding IP addresses (like 172.217.0.46). So, when you type a domain name into your browser, the DNS server is the one that converts it into an IP address that can be used to find the resources you're looking for.

For instance, if you type "google.com" into your browser, your browser will ask the DNS server, "What IP address is Google located at?" The DNS server will then respond with the corresponding IP address, which your browser will use to connect to the Google server and request the web page you're looking for.
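To try this lookup yourself, here is a minimal Python sketch that asks your operating system's resolver for the IP addresses behind a hostname. The function name resolve is my own choice for illustration; socket.getaddrinfo comes from Python's standard library.

```python
import socket


def resolve(hostname: str) -> list:
    """Return the unique IP addresses the system resolver reports for hostname."""
    # Restricting to TCP avoids one duplicate entry per socket type.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # sockaddr is (ip, port) for IPv4, (ip, port, ...) for IPv6
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a machine with internet access, calling resolve("google.com") should return one or more addresses for Google. The exact values depend on your location and change over time, so don't expect them to match the example address above.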

Internet Ports and Their Role in Network Communication

When you visit a website, your browser needs to connect to the web server that hosts the website. To do this, your browser needs to know the unique IP address of the server. Once it has this information, it will try to connect to the server using a specific port.

A port is like a gate that allows your browser to access a specific service on the server. It's like a room number in a hotel that tells you where to find a specific guest. There are thousands of ports available on a server, and each one is identified by a unique number between 0 and 65,535.

Ports help servers provide multiple services simultaneously, and they also help to quickly direct incoming traffic to the appropriate service. For example, if you connect to port 80 on a server, the server knows that you are trying to access its web services.

By default, web servers use port 80 for regular HTTP messages and port 443 for secure HTTPS messages. HTTPS is a secure version of HTTP that encrypts your data to protect your privacy and security online.
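As a small illustration of these defaults, the sketch below works out which port a browser would connect to for a given URL. The helper name effective_port and the example.com URLs are assumptions of mine; urlsplit is from Python's standard library.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://example.com/"))        # 80
print(effective_port("https://example.com/"))       # 443
print(effective_port("https://example.com:8080/"))  # 8080
```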

HTTP Requests and Responses

When you use your browser to interact with a server, they communicate using a set of rules called the HyperText Transfer Protocol (HTTP). This protocol specifies how to structure and interpret messages sent over the internet and how web clients and servers should share information. The most common types of HTTP requests are GET and POST, with GET requests retrieving data from the server and POST requests submitting data to it. Other methods include OPTIONS, PUT, and DELETE, with each serving a specific purpose.

Let's take a look at an example GET request that you might encounter:

GET / HTTP/1.1
Host: www.techofide.com
User-Agent: Mozilla/5.0
Accept: text/html,application/xhtml+xml,application/xml
Accept-Language: en-US
Accept-Encoding: gzip, deflate
Connection: close

An HTTP request is composed of three parts: the request line, the request headers, and an optional request body. This GET request has no body.

The request line specifies the request method, the requested URL, and the version of HTTP used. In this example, the request is a GET request to the home page of www.techofide.com using HTTP version 1.1.

The request headers pass additional information about the request to the server, allowing it to customize its response to your needs. In this example, the Host header specifies the hostname of the request, and the User-Agent header contains the operating system and software version of your browser. The Accept, Accept-Language, and Accept-Encoding headers inform the server about the preferred format of responses. The Connection header tells the server whether to keep the connection open after it responds.

You may also encounter other common headers, such as the Cookie header, which sends cookies from the client to the server, or the Referer header, which specifies the address of the previous web page that linked to the current page. The Authorization header contains credentials to authenticate a user to a server.
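To make the structure concrete, here is a rough Python sketch that splits the example GET request into its request line, headers, and body. The parse_request function is a simplified helper of my own, not a full HTTP parser; it assumes well-formed input with CRLF line endings.

```python
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.techofide.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Accept: text/html,application/xhtml+xml,application/xml\r\n"
    "Accept-Language: en-US\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: close\r\n"
    "\r\n"
)


def parse_request(raw: str):
    """Split a raw HTTP/1.1 request into (method, target, version, headers, body)."""
    # A blank line separates the headers from the optional body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The request line holds the method, the requested URL, and the HTTP version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body


method, target, version, headers, body = parse_request(RAW_REQUEST)
print(method, target, version)  # GET / HTTP/1.1
print(headers["Host"])          # www.techofide.com
```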

When you request a web page, the server receives your request and tries to fulfill it. The server will send back an HTTP response containing several things: an HTTP status code, HTTP headers, and the HTTP response body, which is the actual web content you requested. The web content can include HTML code, CSS style sheets, JavaScript code, images, and more.

Let's take a look at an example HTTP response:

HTTP/1.1 200 OK
Date: Tue, 31 Aug 2021 17:38:14 GMT
[...]
Content-Type: text/html; charset=UTF-8
Server: gws
Content-Length: 190532

<!doctype html>
[...]
<title>Google</title>
[...]
</html>

The first line of the response, "HTTP/1.1 200 OK," is the status line, and it contains the status code (here, 200). A status code in the 200 range means the request was successful. Codes in the 300 range indicate a redirect, codes in the 400 range mean the client made an error, and codes in the 500 range mean the server ran into an error. As a bug bounty hunter, understanding these status codes can give you insight into how the server operates.
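If you want to sort responses by these ranges in your own scripts, a short Python sketch could look like this. Both function names are hypothetical helpers of mine, and the labels simply mirror the ranges described above.

```python
def parse_status_line(line: str):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code: int) -> str:
    """Map a status code to the range it belongs to."""
    classes = {
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 gives the leading digit of a three-digit code.
    return classes.get(code // 100, "other")


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(code, status_class(code))  # 200 success
```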

After the status line, you'll see the HTTP response headers, which include information about the response. The headers can include the date and time of the response, the content type, the server software, and more. Other common response headers include Set-Cookie, Location, Access-Control-Allow-Origin, Content-Security-Policy, and X-Frame-Options.

Finally, the response body contains the actual content of the web page, like the HTML and JavaScript code. Once your browser receives all the information needed to construct the web page, it will render everything for you. Keep an eye on the status code and response headers to better understand the server's behavior and potential vulnerabilities.

So, the next time you use your browser to access a website, you'll know what's going on behind the scenes!

Internet Security Controls

So, now that you know the basics of how information travels over the internet, let's explore some essential security measures that keep it safe from attackers. As a bug bounty hunter, you'll need to think outside the box and find innovative ways to get around these measures, which is why it's crucial to have a good grasp of how they function.

Content Encoding

Did you know that when you visit a website, the information that's sent between your computer and the website isn't always sent as plain text? Websites encode data in different ways so that it doesn't get corrupted during transmission.

Encoding represents the data being transferred using only common characters that have no special meaning in internet protocols. This helps ensure that the data arrives at its destination without any corruption. If you don't use encoding, your data could get messed up because special characters are misinterpreted by the internet protocols.

Question: What is Encoding?

Ans: Encoding means converting information from one format into another that computers can store, send, or process more easily. For example, we can convert a message into a limited set of safe characters so that it survives transmission intact, or change the format of pictures, videos, or sound so that they can be viewed or played on different devices. Keep in mind that encoding is not encryption: it uses no secret key, so anyone who knows the scheme can decode the data.

Base64 encoding

One of the most popular encoding methods is Base64 encoding. It's often used to send images and encrypted information within web messages.

Here's an example: the string "Techofide" would be encoded as VGVjaG9maWRl in Base64. The character set for Base64 includes uppercase and lowercase letters, numbers, and the characters "+" and "/", with "=" used for padding when the input length isn't a multiple of three bytes. ("Techofide" is nine bytes long, so it needs no padding.)

There's also a modified version of Base64 called Base64url encoding, which is used specifically for URLs. It's similar to Base64, but uses "-" and "_" in place of "+" and "/", and the padding is usually left out.
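You can check both variants with Python's standard base64 module. This is just a quick sketch; the two-byte sample value is one I picked because it happens to produce the "+" and "/" characters.

```python
import base64

data = b"Techofide"

# Standard Base64: nine input bytes become twelve output characters.
standard = base64.b64encode(data).decode("ascii")
print(standard)  # VGVjaG9maWRl

# Decoding returns the original bytes.
print(base64.b64decode(standard))  # b'Techofide'

# A sample whose encoding contains "+" and "/" shows the Base64url difference.
sample = bytes([0xFB, 0xFF])
plain = base64.b64encode(sample).decode("ascii")
url_safe = base64.urlsafe_b64encode(sample).decode("ascii")
print(plain)     # +/8=
print(url_safe)  # -_8=

# Base64url as used in tokens usually drops the "=" padding.
print(url_safe.rstrip("="))  # -_8
```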

Hexadecimal encoding

Hexadecimal encoding uses a base-16 format to represent characters, using the digits 0 to 9 and the letters A to F. Each byte becomes two hex digits. While hex encoding takes up more space than Base64 encoding, it produces a more human-readable encoded string.

Check out this example: the string "Techofide" in hex encoding is 546563686f66696465, which is longer than the Base64 version we talked about earlier.
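Here is a quick way to reproduce the hex encoding in Python and compare its length with Base64. It uses only built-in bytes methods and the standard base64 module.

```python
import base64

data = b"Techofide"

# Each byte becomes two hexadecimal digits.
hex_text = data.hex()
print(hex_text)  # 546563686f66696465

# Converting back gives the original bytes.
print(bytes.fromhex(hex_text))  # b'Techofide'

# Compare the sizes: hex doubles the length, Base64 grows it by about a third.
base64_text = base64.b64encode(data).decode("ascii")
print(len(data), len(hex_text), len(base64_text))  # 9 18 12
```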

URL encoding

URL encoding is another encoding method that's commonly used on the internet. It's a way of converting characters into a format that can be easily transmitted over the web. Each character in a URL-encoded string is represented by its designated hex number preceded by a "%" symbol.

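For example, the word "Techofide" with every character encoded would be %54%65%63%68%6f%66%69%64%65 . In practice, letters and digits are already safe in URLs, so usually only reserved or special characters are encoded, such as a space (%20).

So, the next time you're transferring data, remember the importance of encoding and the different methods available to help your data arrive intact!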
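The sketch below shows both behaviours in Python. The percent_encode_all helper is my own and encodes every byte for demonstration purposes; quote and unquote come from the standard urllib.parse module and leave letters and digits alone.

```python
from urllib.parse import quote, unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every byte of text, even characters that are URL-safe."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


encoded = percent_encode_all("Techofide")
print(encoded)           # %54%65%63%68%6f%66%69%64%65
print(unquote(encoded))  # Techofide

# The standard library only encodes characters that need it.
print(quote("Techofide"))               # Techofide
print(quote("hello world?", safe=""))   # hello%20world%3F
```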

JSON Web Tokens

The JSON Web Token (JWT) is a widely used type of authentication token that consists of three components: a header, a payload, and a signature.

Header

The header component identifies the algorithm used to generate the signature. It is a base64url-encoded string that contains the algorithm name. Here's an example of a JWT header:

eyBhbGcgOiBIUzI1NiwgdHlwIDogSldUIH0K

This string is the base64url-encoded version of this text (note that the encoded example leaves out the quotation marks):

{ "alg" : "HS256", "typ" : "JWT" }

Payload

The payload section contains information about the user's identity. This section is also base64url encoded before being used in the token. Here's an example of the payload section:

eyB1c2VyX25hbWUgOiBhZG1pbiB9Cg

This is the base64url-encoded version of this text (the encoded example leaves out the quotation marks):

{ "user_name" : "admin" }

Signature

Finally, the signature section lets the server validate that the user hasn't tampered with the token. It is calculated by joining the encoded header and the encoded payload with a period, then signing the result with the algorithm specified in the header and a secret key. Here's an example of a JWT signature:

4Hb/6ibbViPOzq9SJflsNGPWSk6B8F6EqVrkNjpXh7M

For this specific token, the signature was generated by signing the string eyBhbGcgOiBIUzI1NiwgdHlwIDogSldUIH0K.eyB1c2VyX25hbWUgOiBhZG1pbiB9Cg with the HS256 algorithm using the secret key.

The complete token concatenates each section (the header, payload, and signature), separating them with a period (.):

eyBhbGcgOiBIUzI1NiwgdHlwIDogSldUIH0K.eyB1c2VyX25hbWUgOiBhZG1pbiB9Cg.4Hb/6ibbViPOzq9SJflsNGPWSk6B8F6EqVrkNjpXh7M

When implemented correctly, JSON web tokens provide a secure way to identify the user. When the token arrives at the server, the server can verify that the token has not been tampered with by checking that the signature is correct. Then the server can deduce the user's identity by using the information contained in the payload section. And since the user does not have access to the secret key used to sign the token, they cannot alter the payload and sign the token themselves.
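To see the whole flow in one place, here is a minimal Python sketch that builds and verifies an HS256 token using only the standard library. All function names and the demo secret are my own assumptions for illustration, and the sketch always verifies with HS256 instead of trusting the algorithm named in the header. For real projects you would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode data and drop the '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the padding that b64url removed, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(signing_input: str, secret: bytes) -> str:
    """HS256 means HMAC with SHA-256 over 'header.payload'."""
    digest = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return b64url(digest)


def make_token(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + sign(signing_input, secret)


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the payload if the signature is valid, otherwise None."""
    pieces = token.split(".")
    if len(pieces) != 3:
        return None
    header_b64, payload_b64, signature = pieces
    expected = sign(header_b64 + "." + payload_b64, secret)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    return json.loads(b64url_decode(payload_b64))


SECRET = b"demo-secret"  # hypothetical key, for this example only
token = make_token({"user_name": "admin"}, SECRET)
print(verify_token(token, SECRET))        # {'user_name': 'admin'}
print(verify_token(token, b"wrong-key"))  # None
```

If an attacker swaps the payload for a different one without knowing the secret, the recomputed signature no longer matches and verify_token returns None.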

The Same-Origin Policy

The same-origin policy (SOP) is a rule that restricts how a script from one origin can interact with the resources of a different origin. In one sentence, the SOP is this: "a script from page A can access data from page B only if the pages are of the same origin." This rule protects modern web applications and prevents many common web vulnerabilities.

Two URLs are said to have the same origin if they share the same protocol, hostname, and port number.

Let’s look at some examples. Page A is at this URL:

https://techofide.com/@karitkhunt3r

It uses HTTPS, which, remember, uses port 443 by default. Now look at the following pages to determine which has the same origin as page A, according to the SOP:

https://techofide.com/
http://techofide.com/
https://twitter.com/@karitkhunt3r
https://techofide.com:8080/@karitkhunt3r

The https://techofide.com/ URL is of the same origin as page A, because the two pages share the same protocol, hostname, and port number.

The other three pages do not share the same origin as page A. http://techofide.com/ is of a different origin from page A, because their protocols differ. https://techofide.com/ uses HTTPS, whereas http://techofide.com/ uses HTTP. https://twitter.com/@karitkhunt3r is of a different origin as well, because it has a different hostname. Finally, https://techofide.com:8080/@karitkhunt3r is of a different origin because it uses port 8080, instead of port 443.
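The comparison above can be sketched in a few lines of Python. The origin and same_origin helpers are hypothetical names of mine, and the sketch only knows the default ports for HTTP and HTTPS; real browsers follow a more detailed specification.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (protocol, hostname, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a: str, url_b: str) -> bool:
    return origin(url_a) == origin(url_b)


PAGE_A = "https://techofide.com/@karitkhunt3r"
candidates = [
    "https://techofide.com/",
    "http://techofide.com/",
    "https://twitter.com/@karitkhunt3r",
    "https://techofide.com:8080/@karitkhunt3r",
]
for url in candidates:
    print(url, same_origin(PAGE_A, url))
```

Only the first candidate prints True; the other three differ in protocol, hostname, or port.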

Question: What is Session ID?

Ans: A session ID (session identifier) is a unique identifier that is assigned to a user's session when they connect to a website or web application. It is a way for the website or application to keep track of the user's activity during their visit, allowing them to navigate between pages or access different features without having to log in again each time.

Question: What are Cookies?

Ans: Cookies are small text files that are stored on a user's device (such as a computer, smartphone, or tablet) by a website or web application. They are used to remember user preferences, login information, and other data related to the user's activity on the site or application.
Cookies can be managed and deleted by the user through their browser settings, and many websites and applications allow users to control the types of cookies that are used and how they are stored.
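To connect the two answers, here is a small Python sketch of how a session ID typically travels inside a cookie. The cookie name session_id and its value are made up for this example; SimpleCookie is part of Python's standard http.cookies module.

```python
from http.cookies import SimpleCookie

# What a server might send in a Set-Cookie response header (hypothetical values).
set_cookie_header = "session_id=abc123; Path=/; HttpOnly"

jar = SimpleCookie()
jar.load(set_cookie_header)

morsel = jar["session_id"]
print(morsel.value)    # abc123
print(morsel["path"])  # /

# On later requests, the browser sends the name and value back in a Cookie header.
cookie_header = f"{morsel.key}={morsel.value}"
print("Cookie:", cookie_header)  # Cookie: session_id=abc123
```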

Now let’s consider an example to see how SOP protects us. Imagine that you’re logged in to your banking site at onlinebank.com. Unfortunately, you click on a malicious site, attacker.com, in the same browser. The malicious site issues a GET request to onlinebank.com to retrieve your personal information. Since you’re logged into the bank, your browser automatically includes your cookies in every request you send to onlinebank.com, even if the request is generated by a script on a malicious site. Since the request contains a valid session ID, the server of onlinebank.com fulfills the request by sending the HTML page containing your info. Without the SOP, the malicious script could then read the private email addresses, home addresses, and banking information contained on the page.

Luckily, the SOP prevents the malicious script hosted on attacker.com from reading the HTML data returned from onlinebank.com. This keeps a malicious script on one site from obtaining sensitive information embedded within a page from another site.

Conclusion

In this blog post, we have covered a variety of topics related to web security, including how the internet works, what DNS is, how HTTP requests and responses work, how content encoding works, how JSON Web Tokens are used, and what the same-origin policy does. It's important to understand these topics in order to effectively secure web applications.

In the next blog post, I will demonstrate our first vulnerability: SQL injection. Stay tuned for this informative and hands-on tutorial.

Commonly Asked Questions

Q1. What is TCP/IP and how does it enable communication across the internet?

Ans. TCP/IP is a set of protocols used for communication over the internet. TCP ensures reliable transmission of data and establishes a connection between sender and receiver, while IP is responsible for addressing and routing packets across the network. Together, they provide a standardized way for devices to transmit and receive data, enabling communication across the internet.

Q2. How does the DNS translate domain names into IP addresses?

Ans. The DNS (Domain Name System) translates domain names into IP addresses using a hierarchy of name servers. When a user enters a domain name, the browser's query goes to a local DNS resolver. If the resolver has no cached answer, it asks a root DNS server, which refers it to the TLD server for the domain (for example, the servers for .com). The TLD server responds with the address of the authoritative DNS server for the domain, which the resolver then queries for the IP address. The resolver caches the IP address and returns it to the browser, which uses it to connect to the web server hosting the requested website.

Q3. What is the significance of the Accept-Encoding header in an HTTP request?

Ans. The Accept-Encoding header in an HTTP request specifies the content encoding schemes that the client can accept. Content encoding compresses or transforms data before transmission to reduce data sent and improve performance. The header allows negotiation between clients and servers to choose a supported encoding scheme. This improves the efficiency and speed of data transmission over the internet.
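As a rough illustration, the Python sketch below picks an encoding from an Accept-Encoding value and compresses a response body with gzip. The pick_encoding helper and the sample body are assumptions of mine, and the helper ignores quality values (q=...), which a real server would take into account.

```python
import gzip


def pick_encoding(accept_encoding: str, supported=("gzip",)):
    """Return the first client-offered encoding the server supports, or None."""
    offered = [item.split(";")[0].strip() for item in accept_encoding.split(",")]
    for name in offered:
        if name in supported:
            return name
    return None


body = b"<p>Hello Cyber Learners</p>" * 100  # repetitive sample content

encoding = pick_encoding("gzip, deflate")
print(encoding)  # gzip

compressed = gzip.compress(body)
print(len(compressed) < len(body))  # True: repetitive data compresses well

# The client reverses the encoding before using the content.
restored = gzip.decompress(compressed)
print(restored == body)  # True
```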

Q4. How do JWTs work?

Ans. JWTs (JSON Web Tokens) encode a JSON payload and digitally sign it using a secret key. The token is sent with subsequent requests as an HTTP header or URL parameter. When the server receives a JWT, it verifies the signature using the secret key to ensure the payload has not been tampered with. If the signature is valid, the server can extract the information from the payload to authenticate and authorize the user, enabling stateless authentication and authorization for web applications and APIs.