Network Traffic Analysis with Python: Detecting Malicious Activities

by Bogdan Turcanu
in Cybersecurity, Network Traffic Analysis
on 23 April 2023

A Deep Dive into a Custom Network Traffic Analysis Script


Introduction

For network administrators and security professionals, monitoring and analyzing network traffic is crucial to maintaining a secure environment. With the growing number of cyber threats and malicious activities, having the right tools at your disposal is essential. In this blog post, I will share a custom Python script I developed for network traffic analysis. I will discuss its purpose, scope, technical details, challenges faced, and future plans, with the goal of teaching and inspiring you to better detect malicious activities within your network.

Purpose and Scope

I developed this network traffic analysis script to provide a comprehensive solution for detecting various types of malicious activities within a network. The script is designed to help security professionals and network administrators gain valuable insights into their network traffic and take appropriate actions to mitigate potential threats. By analyzing packet capture (PCAP) files, the script can identify different types of attacks and malicious behaviors, such as:

Malware delivery: This script searches for known signatures of malware delivery within the payload of TCP packets.
Phishing attempts: It checks DNS queries for known malicious domains and alerts if a match is found.
Brute force attacks: The script counts TCP packets with the RST flag set, treating them as rejected login or connection attempts, and flags a brute force attack when the count within a given time window exceeds a threshold.
Distributed Denial of Service (DDoS) attacks: It detects potential DDoS attacks by counting IP packets within a given time window and checking whether the count exceeds a specified threshold.
Command and Control (C&C) traffic: It checks DNS queries for known malicious domains and suspicious domain generation algorithm (DGA) patterns, examines HTTP request URIs for suspicious patterns, and inspects IRC traffic for suspicious commands.
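The post does not show how these DNS checks are implemented, so the following is only a sketch of one plausible approach. The helpers is_known_malicious(), shannon_entropy() and looks_like_dga() are hypothetical, the blocklist entries are made-up examples, and the 12-character minimum and 3.5-bit entropy threshold are illustrative values, not tuned ones. Matching parent domains means a query for a subdomain of a listed domain is also caught, and high character entropy in a long label is a common, if noisy, indicator of algorithmically generated names.

```python
import math
from collections import Counter

# Example blocklist; the script loads its real list from malicious_definitions.json
MALICIOUS_DOMAINS = {"phish.example", "bad-example.test"}

def is_known_malicious(qname, blocklist):
    """True if the queried name or any of its parent domains is blocklisted."""
    labels = qname.lower().rstrip(".").split(".")
    # For "a.b.c" check "a.b.c", "b.c" and "c"
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))

def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_dga(qname, min_len=12, threshold=3.5):
    """Crude DGA heuristic: a long leftmost label with high character entropy."""
    label = qname.lower().split(".")[0]
    return len(label) >= min_len and shannon_entropy(label) > threshold

print(is_known_malicious("login.phish.example", MALICIOUS_DOMAINS))  # True
print(looks_like_dga("xj4k9q2vz7lm1p8r.com"))                        # True
print(looks_like_dga("wikipedia.org"))                               # False
```

Entropy alone can misfire on legitimate random-looking hostnames, so a check like this works best as one signal combined with others rather than as a verdict on its own.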

Additionally, my script can detect several other malicious patterns in packet payloads:

Directory Traversal Attack
Cross-site Scripting (XSS) Attack
Email Address Harvesting
SQL Injection Attack

The script serves the following specific functions:

Fetch and cache malicious definitions: The script downloads the latest definitions of malicious IPs, domains, and signatures from specified sources in the malicious_definitions.json configuration file. These definitions are cached to improve performance during analysis.
Read pcap file: The script reads the provided pcap file, which contains captured network traffic, using the pyshark library.
Analyze traffic: Each packet in the pcap file is analyzed for signs of malicious activity. The script checks for known malicious IPs and domains, as well as traffic patterns indicative of attacks (e.g., brute force, DDoS).
Log detected threats: If a packet is found to be malicious, the script logs information about the packet, the type of threat it represents, and any additional details that may aid in understanding the threat (e.g., source and destination IPs, domain, attack signature).
Output results: The script provides a summary of detected threats, including the total number of threats and a breakdown of threats by category. This information can be printed to the console or saved to a specified output file.
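As a rough illustration of the "Output results" step, the sketch below builds the total and the per-category breakdown with collections.Counter. The threat record format (a dict with a "type" key) and the summarize_threats() name are assumptions for this example, not the script's actual data structures.

```python
from collections import Counter

def summarize_threats(threats):
    """Return a text summary: total threats plus a count per category.

    Each threat is assumed to be a dict with a "type" key (hypothetical format).
    """
    by_type = Counter(threat["type"] for threat in threats)
    lines = [f"Total threats detected: {len(threats)}"]
    # most_common() lists categories from most to least frequent
    for threat_type, count in by_type.most_common():
        lines.append(f"  {threat_type}: {count}")
    return "\n".join(lines)

threats = [{"type": "DDoS"}, {"type": "SQL Injection Attack"}, {"type": "DDoS"}]
print(summarize_threats(threats))
```

The returned string can be printed to the console or written to the output file, matching the two output options described above.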

Technical Details

To develop my network traffic analysis script, I leveraged Python’s powerful libraries and tools. In this section I will discuss the specific libraries and techniques I employed to achieve the script’s functionality, as well as the challenges I encountered.

I chose Python for its readability, ease of use, and extensive libraries that simplify the development of network analysis tools. Let’s examine the 10 essential libraries the script uses, with a code example for each, which together give a high-level overview of the script’s workflow. Along the way, I will explain the different stages of the analysis process.

argparse: This library is used to facilitate the input of command-line arguments and options, such as specifying input pcap files, output log files, or cache update settings. This allows users to easily customize the script execution and integrate it with other tools or workflows, enhancing its usability.

import argparse

parser = argparse.ArgumentParser(description="Network Traffic Analysis Script")
parser.add_argument("-i", "--input", help="Path to the pcap file", required=True)
parser.add_argument("-o", "--output", help="Path to the output file", required=False)
args = parser.parse_args()

pcap_file = args.input
output_file = args.output

This snippet creates an ArgumentParser object that will hold the command-line arguments. It defines two arguments -i (input) and -o (output), where the input is the path to the pcap file and the output is the path to the output file. In our script, we also define arguments for the user to specify a threshold and time window for both brute force and DDoS attack detection. These arguments are parsed and stored in the args object for later use.
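The threshold and time-window options mentioned above could be declared along the following lines. The flag names (--bf-threshold, --bf-window, --ddos-threshold, --ddos-window, --update-cache) and their defaults are hypothetical; the repository is the place to check which options the script really accepts. argparse turns hyphens in long option names into underscores, so --ddos-threshold is read back as args.ddos_threshold.

```python
import argparse

parser = argparse.ArgumentParser(description="Network Traffic Analysis Script")
parser.add_argument("-i", "--input", required=True, help="Path to the pcap file")
parser.add_argument("-o", "--output", help="Path to the output file")
# Hypothetical detection options; names and defaults are illustrative
parser.add_argument("--bf-threshold", type=int, default=10,
                    help="Failed attempts within the window that trigger a brute force alert")
parser.add_argument("--bf-window", type=float, default=60.0,
                    help="Brute force time window in seconds")
parser.add_argument("--ddos-threshold", type=int, default=1000,
                    help="Packets within the window that trigger a DDoS alert")
parser.add_argument("--ddos-window", type=float, default=10.0,
                    help="DDoS time window in seconds")
parser.add_argument("--update-cache", action="store_true",
                    help="Ignore the cache and download fresh malicious definitions")

# An explicit list is parsed here for demonstration; a real script calls parse_args()
args = parser.parse_args(["-i", "capture.pcap", "--ddos-threshold", "500"])
print(args.ddos_threshold, args.bf_window)  # 500 60.0
```

Passing a list to parse_args() only serves the demonstration; called with no arguments, it reads sys.argv as in the snippet above.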

requests: This library is used for making HTTP requests. The first stage of the analysis process involves fetching and caching the latest malicious definitions. The requests library downloads this data from the sources specified in the malicious_definitions.json configuration file, pulling the latest threat intelligence from IP, domain, and signature sources so that users work with up-to-date information.

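import logging
import requests

ip_source = "https://example.com/malicious_ips"
malicious_ips = []

try:
    # The timeout keeps an unresponsive source from stalling the script
    response = requests.get(ip_source, timeout=10)
    response.raise_for_status()
    malicious_ips = response.json()
except requests.RequestException as e:
    logging.error("Failed to fetch malicious IPs: %s", e)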

This snippet sends an HTTP GET request to fetch malicious IPs from an example source. If the request succeeds, the JSON response body is parsed into a Python object and stored in the malicious_ips variable; if it fails, an error message is logged.

json: This library is used to work with JSON data. In this script, it loads the malicious_definitions.json file. Keeping the malicious definitions in a structured file that can be updated and maintained separately from the code helps the script adapt to the ever-changing threat landscape.

import json

with open("malicious_definitions.json", "r") as f:
    definitions = json.load(f)

malicious_domains = definitions["domains"]
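Because every packet is checked against these definitions, it helps to normalize them and keep them in sets: a set membership test takes constant time on average, while scanning a list grows with its length. In the sketch below, the JSON layout (including the "ips" key) is a hypothetical example, since only the "domains" key appears in the snippet above, and json.loads() is used on an inline string so the example runs without the file.

```python
import json

# Hypothetical file contents; the real malicious_definitions.json may use
# different keys and also list the sources to download from
example_json = '{"domains": ["Phish.Example.", "bad-example.test"], "ips": ["203.0.113.7"]}'
definitions = json.loads(example_json)

# Normalize case and trailing dots, then store entries in sets so each
# per-packet lookup is an average O(1) membership test
malicious_domains = {d.lower().rstrip(".") for d in definitions.get("domains", [])}
malicious_ips = set(definitions.get("ips", []))

print("phish.example" in malicious_domains)  # True
print("203.0.113.7" in malicious_ips)        # True
```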

os: This library provides a way to interact with the operating system. In this script, it is used to check whether the cache file exists.

import os
import pickle

cache_file = "malicious_definitions_cache.pickle"

# Reuse previously downloaded definitions if a cache file is present
if os.path.exists(cache_file):
    with open(cache_file, "rb") as f:
        cached_data = pickle.load(f)

If the cache file exists, its contents are loaded with the pickle library instead of being downloaded again. Avoiding these redundant downloads speeds up the analysis for users.

pickle: This library is used for object serialization. In this script, it is used to store and retrieve cache data, ultimately saving time and resources for users without compromising the effectiveness of the script.

import pickle

with open(cache_file, "wb") as f:
    pickle.dump(cached_data, f)
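The snippets above load and save the cache but do not show when it gets refreshed. Below is a minimal sketch of an age-based refresh, assuming a hypothetical load_cached_definitions() helper, a one-day maximum age, and a fetch callable that performs the actual downloads (for example with requests). Because unpickling can execute arbitrary code, a cache like this should only ever be loaded from a file the script wrote itself.

```python
import os
import pickle
import time

CACHE_MAX_AGE = 24 * 60 * 60  # hypothetical default: refresh once a day

def load_cached_definitions(cache_file, fetch, max_age=CACHE_MAX_AGE, force_update=False):
    """Return definitions from cache_file, calling fetch() only when the cache
    is missing, older than max_age seconds, or force_update is set."""
    if not force_update and os.path.exists(cache_file):
        age = time.time() - os.path.getmtime(cache_file)
        if age < max_age:
            with open(cache_file, "rb") as f:
                return pickle.load(f)
    data = fetch()  # e.g. download IPs, domains and signatures with requests
    with open(cache_file, "wb") as f:
        pickle.dump(data, f)
    return data
```

A force_update flag like this one is where a command-line option for cache updates would plug in.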

pyshark: Once the malicious definitions have been fetched and cached, the script moves on to reading the provided pcap file containing captured network traffic. The ‘pyshark’ library, which is a wrapper for the popular Wireshark network protocol analyzer, is employed to read and parse pcap files, extracting vital packet information for further analysis.

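import pyshark

capture = pyshark.FileCapture(pcap_file, display_filter="tcp")

for packet in capture:
    # analyze_packet() is the script's own per-packet analysis function
    analyze_packet(packet)

# Release the underlying tshark process once all packets have been read
capture.close()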

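This snippet creates a pyshark FileCapture object to read the pcap file and applies a display filter so that only TCP packets are returned. It then iterates over the packets and analyzes each one with a custom analyze_packet function. Note that with this filter, DNS queries carried over UDP (the usual case) would be skipped, so checks on DNS traffic require a broader filter such as "tcp or dns".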

re: With the pcap file loaded and parsed, the script needs to analyze each packet for signs of malicious activity. This library is used for working with regular expressions. In this script, it is used to search for patterns in payloads, detect malicious patterns, and clean the payload. This is important in our context because it enhances the script’s accuracy in identifying potential threats and helps users better understand and mitigate risks associated with malicious network activities.

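import logging
import re

# packet is one packet from the capture loop; payload extraction is simplified here
payload = packet.payload
# Strip every character that is not a letter, digit, or underscore
cleaned_payload = re.sub(r"\W+", "", payload)
malicious_pattern = re.compile(r"some_malicious_pattern")

if malicious_pattern.search(cleaned_payload):
    logging.warning("Malicious pattern detected")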

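This snippet first removes every character that is not a letter, digit, or underscore from the packet payload using a regular expression substitution. It then compiles a malicious pattern and checks whether it is present in the cleaned payload; if it is, a warning is logged. Because this cleaning also strips characters such as <, / and ., it suits plain keyword signatures, whereas the punctuation-based patterns in the excerpt below are searched for in the raw payload. Let’s now examine an excerpt taken directly from the script: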

import logging
import re

def detect_malicious_patterns(packet_stream):
    # Each entry pairs a compiled regular expression with the attack it indicates
    patterns = [
        (re.compile(r'GET /(?:\.\./\w+)+', re.IGNORECASE), 'Directory Traversal Attack'),
        (re.compile(r'(?:\%3C|\x3C)[\w\s]*?(?:\%2F|\x2F)[\w\s]*?(?:\%3E|\x3E)', re.IGNORECASE), 'Cross-site Scripting (XSS) Attack'),
        (re.compile(r'[\w\.\-_]+@[\w\.\-_]+\.\w+', re.IGNORECASE), 'Email Address Harvesting'),
        (re.compile(r'(?:\%27|\x27|\'|\%2527|%5C)(?:\%45|\x45|E)(?:\%58|\x58|X)(?:\%50|\x50|R)', re.IGNORECASE), 'SQL Injection Attack')
    ]

    for packet in packet_stream:
        # Each packet is expected to be a dict-like object with a 'payload' string
        payload = packet['payload']

        for pattern, attack_name in patterns:
            if pattern.search(payload):
                logging.info(f"Malicious pattern detected! Attack type: {attack_name}, Packet: {packet}")
                # Report only the first matching attack type per packet
                break

The detect_malicious_patterns(packet_stream) function is designed to search for specific malicious patterns within the payload of packets in a given packet stream. The function uses the re library to work with regular expressions for pattern matching.

In the function, a list named patterns is defined, containing tuples that consist of compiled regular expression patterns and their corresponding attack names:

Directory Traversal Attack
Cross-site Scripting (XSS) Attack
Email Address Harvesting
SQL Injection Attack

Each pattern is created with re.compile() from a string containing the regular expression, and the re.IGNORECASE flag makes the search case-insensitive.

Here’s an in-depth explanation of each pattern:

Directory Traversal Attack: This attack aims to access restricted directories by exploiting insufficient validation of user-supplied file names. The pattern r'GET /(?:\.\./\w+)+' looks for HTTP GET requests starting with “GET /”, followed by one or more repetitions of “../” plus at least one word character (a letter, digit, or underscore). The “../” sequence is commonly used in directory traversal attacks to move up the directory hierarchy. Because every repetition must contain a word character after “../”, a request with consecutive sequences such as GET /../../etc/passwd does not match as written; a variant such as r'GET /(?:\.\./)+\w+' would cover that case as well.
Cross-site Scripting (XSS) Attack: XSS attacks inject malicious scripts into trusted websites. These scripts are executed by the victim’s browser, leading to harmful consequences such as cookie theft or account takeover. The pattern r'(?:\%3C|\x3C)[\w\s]*?(?:\%2F|\x2F)[\w\s]*?(?:\%3E|\x3E)' matches a “<” (either literal, written in the regex as \x3C, or URL-encoded as “%3C”), followed by zero or more word characters or whitespace, then a “/” (literal or “%2F”), more word characters or whitespace, and finally a “>” (literal or “%3E”). Because the slash is required, the pattern matches closing tags such as </script> or %3C%2Fscript%3E rather than an opening tag like <script> on its own, so an injected script is detected through its closing tag. It also matches ordinary closing tags such as </p> in benign HTML, so its alerts need review.
Email Address Harvesting: This activity involves collecting email addresses from various sources (such as websites or databases) for use in spamming or phishing campaigns. The pattern r'[\w\.\-_]+@[\w\.\-_]+\.\w+' matches any text resembling an email address: one or more word characters, periods, hyphens, or underscores, followed by the “@” symbol, then one or more word characters, periods, hyphens, or underscores, and finally a period followed by one or more word characters. By detecting email addresses within packet payloads, this pattern can flag potential email address harvesting attempts.
SQL Injection Attack: This attack involves injecting malicious SQL code into queries, allowing an attacker to access, modify, or delete data in a database. The pattern r'(?:\%27|\x27|\'|\%2527|%5C)(?:\%45|\x45|E)(?:\%58|\x58|X)(?:\%50|\x50|R)' matches a single quote (literal, URL-encoded as “%27”, or double-encoded as “%2527”) or a URL-encoded backslash (“%5C”), followed by an “E” and an “X” (literal or URL-encoded as “%45” and “%58”). The last group is inconsistent: “%50” and \x50 encode “P”, while the literal alternative is “R”, so as written the pattern matches a quote followed by “EXP” or “EXR” (or their encoded forms), and the encoded and literal alternatives should be reconciled. Such a sequence might indicate an attempt to break out of a quoted string in a SQL query, for example to trigger an error message that reveals sensitive information. The classic tautology payload ' OR '1'='1 (as in SELECT * FROM users WHERE id='1' OR '1'='1'; --) is not matched by this pattern, so additional signatures are needed to catch it.

The function then iterates through the packet stream, searching for these patterns within the payload of each packet. If a match is found, the function logs the detected malicious pattern, including the attack type and the packet details.
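A quick way to check what these expressions actually catch is to run them against a few hand-written payloads. The sample strings and the matches() helper below are illustrative only, and the "Directory Traversal (variant)" entry is a hypothetical alternative regex, not part of the original script; the other four patterns are copied from it.

```python
import re

# Patterns from detect_malicious_patterns(), plus a hypothetical traversal
# variant that also accepts consecutive "../" sequences
PATTERNS = {
    "Directory Traversal Attack": re.compile(r'GET /(?:\.\./\w+)+', re.IGNORECASE),
    "Directory Traversal (variant)": re.compile(r'GET /(?:\.\./)+\w+', re.IGNORECASE),
    "Cross-site Scripting (XSS) Attack": re.compile(r'(?:\%3C|\x3C)[\w\s]*?(?:\%2F|\x2F)[\w\s]*?(?:\%3E|\x3E)', re.IGNORECASE),
    "Email Address Harvesting": re.compile(r'[\w\.\-_]+@[\w\.\-_]+\.\w+', re.IGNORECASE),
    "SQL Injection Attack": re.compile(r'(?:\%27|\x27|\'|\%2527|%5C)(?:\%45|\x45|E)(?:\%58|\x58|X)(?:\%50|\x50|R)', re.IGNORECASE),
}

SAMPLES = [
    "GET /../etc/passwd HTTP/1.1",
    "GET /../../etc/passwd HTTP/1.1",
    "q=<script>alert(1)</script>",
    "q=%3Cscript%3Ealert(1)",
    "contact: admin@example.com",
    "id=1%27EXP",
    "id=1' OR '1'='1",
]

def matches(payload):
    """Return the names of all patterns that match the payload."""
    return [name for name, rx in PATTERNS.items() if rx.search(payload)]

for sample in SAMPLES:
    print(f"{sample!r}: {matches(sample)}")
```

Running it shows that only the variant catches GET /../../etc/passwd, that an encoded opening tag without a closing slash (%3Cscript%3E) is not flagged as XSS, and that the tautology payload ' OR '1'='1 goes undetected.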

collections: Like re, this library plays a key role in packet analysis. Its specialized container datatypes, such as Counter, defaultdict, and deque, let the script efficiently track and process network traffic data. This enables timely detection of DDoS attacks and other malicious activities, allowing users to respond quickly to potential threats. For example:

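from collections import Counter, defaultdict, deque

packet_counter = Counter()
ddos_detection = defaultdict(deque)  # source IP -> timestamps of its packets

# Example values; in the script they come from each parsed packet
source_ip = "192.0.2.10"
timestamp = 1682208000.0

packet_counter["total_packets"] += 1
ddos_detection[source_ip].append(timestamp)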

This snippet uses the Counter, defaultdict, and deque data structures from the collections library. The Counter object is used to count packets, while the defaultdict and deque are used for detecting DDoS attacks by keeping track of packet timestamps.
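Recording timestamps is only half of the check: to turn them into alerts, timestamps that fall outside the window have to be discarded as it slides forward. The sketch below shows one way to do that with a deque per source. The SlidingWindowDetector class and its parameters are hypothetical, and it assumes packets are processed in non-decreasing timestamp order, as they usually are when reading a capture file. The same logic could serve the brute force check by recording only RST packets.

```python
from collections import defaultdict, deque

class SlidingWindowDetector:
    """Flag a key (such as a source IP) that produces more than `threshold`
    events within `window` seconds. Hypothetical helper for illustration."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # key -> timestamps still in the window

    def record(self, key, timestamp):
        """Record one event; return True if the key exceeds the threshold."""
        times = self.events[key]
        times.append(timestamp)
        # Discard timestamps more than `window` seconds older than this one
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.threshold

ddos = SlidingWindowDetector(threshold=3, window=1.0)
alerts = [ddos.record("192.0.2.10", t) for t in (0.0, 0.2, 0.4, 0.6, 2.0)]
print(alerts)  # [False, False, False, True, False]
```

Each deque only holds timestamps inside the current window, so memory per source stays bounded by the traffic rate rather than by the size of the capture.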

logging: This library is used for flexible event logging. Upon detecting a packet with malicious characteristics, the script logs pertinent information about the packet, the type of threat it represents, and any additional details that may aid in understanding the threat (e.g., source and destination IPs, domain, attack signature).

import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
logging.info("Starting analysis...")

This snippet sets up the logging configuration, including the log level (INFO) and the format of the log messages. It then logs an informational message stating that the analysis has started. Informative messages like this are not only user-friendly but also invaluable for debugging and for following the script’s progress.

sys: This library provides access to variables used or maintained by the Python interpreter. The script uses it to report errors and exit with a non-zero status when an exception occurs, and to redirect output to a file, which makes the script more robust and lets users save the results wherever they choose. For example:

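import sys

try:
    # Some code that may raise an exception
    pass
except Exception as e:
    # Report the error on stderr and exit with a non-zero status code
    sys.stderr.write(f"Error: {e}\n")
    sys.exit(1)

# Send everything printed from here on to the output file, if one was given
if output_file:
    sys.stdout = open(output_file, "w")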
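Reassigning sys.stdout works, but the file is never explicitly closed and output stays redirected for the rest of the run. A scoped alternative, sketched below with a hypothetical write_results() helper, uses contextlib.redirect_stdout so that standard output is restored and the file is closed when the with block ends, even if an exception is raised.

```python
import contextlib

def _print_lines(lines):
    for line in lines:
        print(line)

def write_results(lines, output_file=None):
    """Print result lines to the console, or to output_file when one is given."""
    if output_file:
        # The file is closed and sys.stdout restored when the block exits
        with open(output_file, "w") as f, contextlib.redirect_stdout(f):
            _print_lines(lines)
    else:
        _print_lines(lines)

write_results(["Total threats detected: 2"])                  # printed to the console
write_results(["Total threats detected: 2"], "results.txt")   # written to results.txt
```

Keeping the redirection inside one function also means any later print() calls, such as progress messages, still reach the console.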

Future Plans and Improvements

As cybersecurity threats continue to evolve, it is essential to keep network traffic analysis tools updated and capable of handling new challenges. Below are some potential future improvements and enhancements to the script to make it more effective and versatile.

Support for additional protocols: Currently, the script analyzes pcap files containing TCP packets, DNS queries, and HTTP requests. To broaden its scope, future versions could add support for other network protocols such as UDP and ICMP, or even for encrypted traffic such as HTTPS and other SSL/TLS sessions.
Machine learning for threat detection: Incorporating machine learning techniques could improve the accuracy and efficiency of threat detection. By training a model on a dataset of malicious and benign network traffic, the script could learn to recognize patterns that indicate malicious activity more effectively. This could also help it adapt to new and emerging threats with less reliance on manual updates.
Integration with network monitoring tools: To streamline the network traffic analysis process, the script could be integrated with existing network monitoring tools, such as Security Information and Event Management (SIEM) systems, Intrusion Detection Systems (IDS), or firewalls. This would allow for real-time analysis and alerting, enabling security professionals to act quickly on detected threats.
Automation and scalability: Enhancing the script to run automatically at scheduled intervals or upon certain triggers would increase its usability in larger environments. Additionally, optimizing the script to handle large volumes of network traffic efficiently would enable it to scale more effectively for use in enterprise networks.
Customizable threat detection rules: Allowing users to create and manage their custom threat detection rules would enable the script to be tailored to specific environments and security requirements. This could involve creating a user-friendly interface for rule creation or importing rules from external sources, such as threat intelligence feeds.
Comprehensive reporting and visualization: Improving the script’s reporting capabilities by generating detailed reports and visualizations of detected threats would make it easier for security professionals to analyze the results and take appropriate actions. This could include creating graphs, charts, or even interactive dashboards that provide insights into the network’s security posture.

By implementing these improvements and staying up-to-date with the latest developments in cybersecurity, the script can continue to provide valuable insights and help protect networks against malicious activities.

Conclusion

The custom Python script for network traffic analysis offers a powerful and comprehensive solution for detecting various types of malicious activities within a network. By leveraging Python’s extensive libraries and tools, security professionals and network administrators can gain valuable insights into their network traffic, allowing them to act swiftly in mitigating potential threats. As cyber threats continue to evolve, it is essential to stay ahead by continually improving and enhancing the script to address new challenges and maintain a secure environment.

With plans to expand the script’s capabilities, integrate it with existing network monitoring tools, and incorporate machine learning techniques for improved threat detection, this script promises to be an invaluable asset in the ongoing battle against cybercrime. By sharing the development process, challenges faced, and future plans, I hope to inspire others in the cybersecurity community to develop their own tools, collaborate, and contribute to the collective effort of securing our digital world.

Stay vigilant, keep learning, and together we can build a safer and more secure cyberspace for everyone.

To access the complete Python script for network traffic analysis and to stay updated with the latest enhancements, visit the GitHub repository at https://github.com/bturcanu/Intrustion-Detection-System. We encourage you to explore, contribute, and collaborate on this project to help improve its capabilities and effectiveness.
