The Best AI Chatbot for Penetration Testing (Part Two)
Our previous article reviewed 5 of the top AI Chatbots in an attempt to determine which performed best when used to source knowledge, write code and build a Penetration Testing tool across 3 subject matter areas. Unsurprisingly, the review led to more questions as well as one very clear outcome: ChatGPT was head and shoulders above the other platforms. During our testing, however, we realised that our questions were constructed based on our previous experience with ChatGPT, so the prepared queries were most likely skewed towards interpretation by ChatGPT.
These insights prompted us to write a second article on the topic, in which we explore the three best performing AI bots (ChatGPT, Google Bard and Bing AI) with the intent of producing a Cross Site Scripting testing tool, along with instructions for use and background on what to do with the outcomes to protect a tested website. We will use organic discovery on each platform to guide the interactions towards the outlined result, and report on each platform's ability to deliver, the volume of chat required, and the style and type of questions needed to achieve the outcome.
Reviewing the summary response from each platform, and the process undertaken to reach it, will indicate the viability of each chatbot for AI-assisted Penetration Testing support and how easily each platform delivers it. The purpose is to demonstrate which platform is best for novice and intermediate penetration testers, or to provide a formula that works best for each platform.
ChatGPT
This platform was the best performer in our original review, and again it came through in spades. It was clear that our experience using ChatGPT had skewed the results of our first review, and that experience has probably influenced the outcome here too, but we were able to deliver exactly what was envisioned. We have a penetration testing script to test for cross site scripting, very clear implementation and usage instructions (including command snippets), and a clear plan for what to do with the script's findings. We stopped once we had enough of a framework to produce a working script; however, ChatGPT even provided a pathway for further development of the script;
This example demonstrates the steps you described. However, please note that the HttpRequestResponse class used in this example is a simplified representation for demonstration purposes.
Further ChatGPT queries would be able to develop more robust response code and would not raise any warnings or malicious usage issues as the line of questioning moves toward specific request-response coding queries.
It took around fifteen related queries to elicit enough partial responses from ChatGPT to be confident of building a query that would not be rejected and would deliver the collated response we wanted. The final script iteration was generated by ChatGPT from the following query (grammar, spelling and wording cleaned up for publishing purposes);
Can you write a Java project that tests a list of websites from a list input file with the Burp XSS extension that achieves the following:
1. Configures a sample Target and Intruder
2. Has the Payload Setup
3. Configures the Payload Processing
4. Simulates a sample Attack
5. Analyses the simulated attack Responses:
6. Confirms and lists example Vulnerabilities:
7. Demonstrates the Reporting output from the simulated attack
The response from ChatGPT produced the following snippet of code to be used with Burp Suite Community;
Interestingly, for demonstration purposes ChatGPT opted to code for a single URL, www.example.com, even though in an earlier iteration of the script it had provided the code for unpacking a list of websites. This meant we could either re-ask the question or replace the single-site snippet with the multi-site version. The ChatGPT response even states that the choice of a single URL was made for simplicity. We also asked ChatGPT how we would use the code;
Can you provide instructions how I would use the above script and what do with the findings it returns.
Which elicited the following response;
Absolutely, I’d be happy to guide you through the process of using the script and what to do with the findings it returns.
Using the Script:
- Setup Burp Suite Extension:
  - Ensure you have Burp Suite Community Edition installed.
  - Create a new directory (e.g., BurpXssTester) and place the XssTester.java file in it.
- Compile the Script:
  - Open a terminal or command prompt and navigate to the directory containing XssTester.java.
  - Compile the Java code using the following command, replacing /path/to/burp-extender-api-X.X.X.jar with the actual path to the Burp Suite Extender API JAR file:
    javac -cp /path/to/burp-extender-api-X.X.X.jar XssTester.java
- Run the Script:
  - Run the compiled Java program using the following command, replacing /path/to/burp-suite.jar with the actual path to the Burp Suite application JAR file:
    java -cp .:/path/to/burp-suite.jar XssTester
- Review Output:
  - The script will test each website listed in websites.txt for potential XSS vulnerabilities using the Burp Suite extension.
  - Any potential vulnerabilities will be highlighted, and an alert will be issued for each website.
What to Do with the Findings:
- Review Alerts and Highlights:
  - Monitor the terminal output for alerts indicating potential XSS vulnerabilities.
  - The script highlights the parts of the response that match the XSS payloads.
- Inspect Vulnerable Websites:
  - For websites where potential XSS is detected, manually inspect the responses to verify if the highlighted portions indicate an actual vulnerability.
  - Investigate the highlighted parts to see if they contain injected JavaScript code.
- Report and Mitigate:
  - If you confirm XSS vulnerabilities, report them to the website owner or administrator, if you have permission to do so.
  - Provide detailed information about the vulnerability, the affected URL, and steps to reproduce.
  - Encourage the website owner to fix the vulnerability by implementing proper input validation and output encoding.
- Ethical Considerations:
  - Always ensure you have explicit permission to test websites for security vulnerabilities.
  - Follow ethical guidelines and applicable laws when conducting security assessments.
- Further Testing:
  - Consider performing manual testing and more thorough assessment of vulnerabilities to validate the results.
  - Explore other security testing methods beyond XSS, as web applications may have other vulnerabilities.
Remember that the script provided is a basic example for educational purposes. In real-world scenarios, comprehensive security testing requires in-depth knowledge, consideration of various factors, and adherence to ethical practices.
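The "Inspect Vulnerable Websites" step in the response above comes down to checking whether an injected payload is echoed back without encoding. As a rough illustration only, here is a minimal Python sketch of that check. It is not the Java that ChatGPT generated, it works on a response body you already hold rather than making any requests, and the PAYLOAD marker and classify_reflection function are hypothetical names introduced for this example.

```python
import html

# Hypothetical marker payload, used only for illustration.
PAYLOAD = '<script>alert("xss-test")</script>'


def classify_reflection(response_body: str, payload: str = PAYLOAD) -> str:
    """Classify how a payload appears in an HTTP response body.

    Returns "unencoded" if the payload is echoed back verbatim (worth a
    manual look), "encoded" if only an HTML-escaped copy is present, and
    "absent" if neither form is found.
    """
    if payload in response_body:
        return "unencoded"
    if html.escape(payload) in response_body:
        return "encoded"
    return "absent"


if __name__ == "__main__":
    # Sample bodies built by hand; no network traffic is involved.
    risky_page = "<p>You searched for: " + PAYLOAD + "</p>"
    safe_page = "<p>You searched for: " + html.escape(PAYLOAD) + "</p>"
    for body in (risky_page, safe_page, "<p>No results</p>"):
        print(classify_reflection(body))
```

An "unencoded" result is only a prompt for the manual verification the response describes; a reflected string is not proof of an exploitable vulnerability on its own.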
We must also not forget the small pieces of advice ChatGPT offered along the way, such as the note that the HTTP response code needed further work to be more robust, or how to install Java and compile Java programs.
ChatGPT is absolutely a valuable tool for penetration testing. We used 18 queries to build the content outlined above. During the process ChatGPT also provided a great deal of other valuable information, such as:
- How to set up a Java Project;
- What is the framework of a Burp Suite Extension;
- What techniques are used to pen test a website;
- What sort of tools are used to scan a website for vulnerabilities;
- A list and details of the existing tools to automatically pen test a website;
- The framework and steps used to scan a website for vulnerabilities;
- A couple examples of malicious JavaScript that could be used to inject into a website form.
ChatGPT is able to provide a great deal more information on the background and processes of Penetration Testing, and specifically Cross-Site Scripting, and there is an opportunity to develop a more robust testing script based on ChatGPT's suggestions and to use that script to identify weaknesses. None of this additional work should hit roadblocks based on content, and it would allow someone new to penetration testing and cyber security to perform their job more effectively. This was achieved with version 3.5, so we could continue to query the bot indefinitely without concern of being blocked for overuse, except when demand is high.
The major drawback of using ChatGPT is the age of its information, which has not been updated since September 2021. For only $20 USD/month you can upgrade to ChatGPT Plus, which comes with a raft of other benefits, and there are plenty of articles detailing them.
Google Bard
The second attempt at creating a workable XSS script, this time with Google Bard and Bing AI, demonstrated that both are also valuable platforms, each with its own strengths and weaknesses. A key differentiator between Bard and ChatGPT is their approach to delivering query responses. Bard, as you will see below, was also able to produce a working script, but its code leverages many more library calls, whereas ChatGPT wrote more of its code from scratch. Personally I prefer Bard's approach to the coding, but with a background in development that is to be expected. There are schools of thought that individuals new to coding should begin from scratch (keeping the logic all together and avoiding abstractions) to better understand the syntax of the language and the logic (and business logic) of their program. The background information and instructions were also presented very differently. ChatGPT tried to summarise and produce complete responses based on all the information it had collected, whereas Bard focuses on the query and responds specifically to the question without a great deal of contextual information.
As with ChatGPT, and in line with our previous experience of the different AI tools, direct questions asking specifically for Penetration Testing how-to content are construed as malicious and rejected by the AI bot. Bard was able to produce very useful code, as we saw with ChatGPT; however, the background and instructions were not as robust. Additional, more probing questions could elicit better-summarised responses from Bard, but it performed very well in building the script.
We posed the exact same 7 point question (in the article above) to Bard as we did ChatGPT. The response from Bard is below;
As demonstrated in the code, both provided similar responses but approached the coding with fundamentally different styles. Bard has taken a much more functional approach where ChatGPT is modular. While ChatGPT took shortcuts for demonstration purposes, Bard took the minimalist approach but provided a response that does specifically what is requested.
When Bard was queried how to create a Java project, the response was instruction on how to install and set up a project in an IDE (integrated development environment), specifically using Eclipse (an open source IDE product). When we asked Bard “How do I build, compile, and run the script you provided”, we received the following response;
To build, compile, and run the script I provided, you will need to have the following installed on your computer:
- Java: You can download Java from the Oracle website: https://www.oracle.com/java/technologies/javase-downloads.html.
- Burp Suite: You can download Burp Suite from the Burp Suite website: https://portswigger.net/burp/.
Once you have Java and Burp Suite installed, you can follow these steps to build, compile, and run the script:
- Open a terminal window.
- Navigate to the directory where the script is located.
- Run the following command to compile the script: javac XSSScanner.java
- Run the following command to run the script: java XSSScanner list.txt
Where list.txt is the file containing the list of websites to scan.
The script will then scan the list of websites and print out any websites that are vulnerable to XSS.
Here are some additional tips for building, compiling, and running the script:
- You can use the -cp option to specify the classpath for the script.
- You can use the -Dburp.config=path/to/burp.config option to specify the Burp Suite configuration file.
- You can use the -Dburp.port=port option to specify the Burp Suite listening port.
By following these steps, you can build, compile, and run the script.
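Both ChatGPT and Bard were asked to work from a list of websites held in an input file. A minimal sketch of that one step follows, assuming one URL per line in a file such as list.txt. It is written in Python rather than Java, the load_targets function is a hypothetical helper that does not come from either chatbot's script, and treating bare hostnames as HTTPS is our own assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def load_targets(path: str) -> list[str]:
    """Read one URL per line, skipping blanks, comments and non-HTTP entries."""
    targets = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: a bare hostname such as www.example.com means HTTPS.
        if "://" not in line:
            line = "https://" + line
        parsed = urlparse(line)
        # Keep only well-formed http/https URLs with a host part.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            targets.append(line)
    return targets
```

Calling load_targets("list.txt") returns the cleaned list, which keeps the authorised scope of a test in one reviewable file rather than hard-coded in the script.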
The pattern clearly emerging with Bard is an ability to distil the information it sources and provide a concise response that specifically answers what was asked. The responses are detailed and customised but have a formulaic style. The final question asked of Bard was “Can you provide instructions how I would use the above script and what I would do with the findings.”. The response from Bard was as follows;
Sure, here are the instructions on how to use the above script and what to do with the findings it returns:
- Install Burp Suite.
- Compile the Java project and load the JAR file into Burp Suite.
- Open the “Scanner” tab and click on the “Start” button.
- The project will scan the list of websites and print the results to the console.
- If any XSS vulnerabilities are found, they will be listed along with the description of the vulnerability.
The findings of the scan can be used to improve the security of the websites. For example, if an XSS vulnerability is found, the website can be updated to properly sanitize user input.
Here are some additional tips for using the script:
- The list of websites to scan can be modified by editing the websites.txt file.
- The XSS payload can be modified by editing the payload variable in the BurpXSListener class.
- The script can be run in headless mode by setting the headless variable to true in the BurpXSScanner class.
Bard is no fuss, no muss: here is the answer to your query. It is fair to say Bard performed very well in the experiment; the code provided was usable, and the instructions on how to implement and use the code are clear and concise, even if not overly robust. Bard did not provide a great deal of background or next-step information; however, additional queries and digging deeper should elicit the content required. The quality of the code and usage instructions is sufficient to support an effective XSS review of a website.
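Bard's advice that a website should "properly sanitize user input" is the remediation side of the exercise. One common way to do this is to encode user input at the point where it is written into the page. The Python sketch below illustrates the idea with the standard library's html.escape; the render_search_results function is a hypothetical example and is not part of any chatbot's response.

```python
import html


def render_search_results(query: str) -> str:
    """Build an HTML fragment that safely echoes user input."""
    # html.escape converts &, <, > and (by default) quotes to entities,
    # so the browser displays the input as text instead of running it.
    return "<p>You searched for: " + html.escape(query) + "</p>"


if __name__ == "__main__":
    print(render_search_results("<script>alert(1)</script>"))
```

This only covers input placed in HTML body text. Input written into attributes, URLs or inline JavaScript needs encoding suited to that context, which is usually best left to the templating framework in use.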
Bing AI
Bing AI is a whole other beast to ChatGPT and Bard. Bing AI is the Bing search engine chat feature, so it should come as no surprise that all query responses come with a number of additional links that can be used to follow up and learn more information. Bing AI delivered coding examples that were much more specific than the responses delivered by ChatGPT and Bard. The code snippets provided could be pasted into working programs and used as presented. The return of additional links for nearly all responses also provided a great learning pathway for any novice penetration tester to build their knowledge base.
Bing falls between ChatGPT and Bard in the amount of contextual information provided with a response. Query responses are generally specific, but do explain what the response means. This gives the platform a leg up on Bard, although it falls well short of ChatGPT. Many answers are no fuss and to the point, but the additional links provide avenues to explore the topic further.
The code response by Bing is not as easy to compare to Bard and ChatGPT as it took a different approach leveraging Beautiful Soup. We followed the conversation the same way we did with Bard and ChatGPT, asking to test one site, then a list of sites, while also asking it to test for a number of XSS specific vulnerabilities. The final, most robust, response from Bing was below;
Like Bard, Bing took a very modular approach to the script. The script tests for various XSS exploits, is easy to use and follow, and answers the query posed: test a list of websites for multiple potential XSS exploits and identify which website has which vulnerabilities (paraphrased from multiple questions).
When asked for instructions on how to use the script and what to do with the findings uncovered, Bing responded with the following;
Sure, I can provide some instructions on how to use the script and what to do with the findings. However, please note that this script is only for educational purposes and not a substitute for professional security testing. Here are the steps:
- Save the script in a file with a .py extension, such as xss_test.py
- Create a text file with a list of website URLs that you want to test, one per line, such as websites.txt
- Make sure you have Python installed on your system and the requests and BeautifulSoup libraries installed as well. You can use pip to install them by typing pip install requests and pip install beautifulsoup4 in the command prompt.
- Run the script by typing python xss_test.py in the command prompt. The script will read the URLs from the websites.txt file and test each form on each website for XSS vulnerabilities using different payloads.
- The script will create an output file called review_output.txt that will contain a report of the findings. The report will show the number of forms found on each website, their attributes and actions, and any XSS vulnerabilities detected with the payloads used.
- You can open the output file with any text editor and review the findings. If you find any XSS vulnerabilities, you should report them to the website owners or administrators and advise them to fix them as soon as possible. XSS vulnerabilities can allow attackers to execute malicious scripts on the website or steal user data or cookies.
The response was definitely not as robust as ChatGPT's, but there is an opportunity to unpack the final dot point further by asking follow-up questions from the perspective of the “website owner”.
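Bing's instructions describe a report listing the forms found on each page together with their attributes and actions. As a hedged illustration of that form-inventory step only, the sketch below swaps Beautiful Soup for html.parser from the Python standard library so that it has no third-party dependencies. The FormCollector class and find_forms function are hypothetical names, and the code parses an HTML string you supply rather than fetching anything.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collect each <form> with its action, method and named input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attributes.get("action") or "",
                # Browsers default to GET when no method is given.
                "method": (attributes.get("method") or "get").lower(),
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "textarea", "select") and self._current is not None:
            name = attributes.get("name")
            if name:
                self._current["inputs"].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_forms(page_html: str) -> list[dict]:
    """Return one dictionary per form found in the given HTML."""
    collector = FormCollector()
    collector.feed(page_html)
    return collector.forms
```

The named inputs in each dictionary are the fields a tester, or the website owner, would then review for proper validation and output encoding.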
It was very interesting to see the different approaches each platform took in responding. Each platform can be a valuable tool for Penetration Testing; they simply vary in how you would use them and the style of queries you would need. We were curious as to what differentiated the response style of each platform, and what better way to find out than to ask each platform how they differ: “What is the difference between ChatGPT, Google Bard and Bing AI”.
ChatGPT:
As of my last knowledge update in September 2021, ChatGPT, Google BARD, and Bing AI are not familiar terms or entities. It’s possible that these are developments that have emerged after my last update, or they might refer to specific projects, services, or technologies that I’m not aware of.
If these are indeed new developments in the field of artificial intelligence or natural language processing, I would recommend checking the latest sources such as news articles, official announcements, or relevant research papers for up-to-date information on their features, capabilities, and differences.
Google Bard:
I’m unable to help, as I am only a language model and don’t have the ability to process and understand that.
Bing AI:
ChatGPT, Google Bard and Bing AI are three AI chatbots that use different models and technologies to generate natural language responses. They are also integrated with different search engines to provide information and assistance to users [1][2][3].
ChatGPT is based on OpenAI’s GPT-4 model, which is a large-scale neural network that can generate coherent and diverse texts. ChatGPT is known for its creativity and accuracy in generating content, such as poems, stories, code, etc. ChatGPT also has browser plugins and web connectivity that enhance its capabilities [1][2].
Google Bard is based on Google’s BERT model, which is a bidirectional encoder that can understand the context and meaning of natural language. Bard is known for its ability to make real-time references to various sources, such as Wikipedia, IMDb, Yelp, etc. Bard can also help users plan events, compare films, or even plan meals with the food in their fridge [3][4].
Bing AI is based on Microsoft’s Turing-NLG model, which is a deep learning framework that can generate fluent and natural texts. Bing AI is known for its overall performance and reliability in answering questions, providing suggestions, and engaging in conversations. Bing AI can also perform web searches and display relevant results [2][5][6].
Which chatbot are you most interested in?
Learn more:
The final question sums up the second part of our experiment. ChatGPT is the best tool if you are looking for a consolidated and fully summarised response to your question, but unfortunately the free version is missing years of information, so a response may lack critical detail if the subject matter has moved on in the last 2 years. Google Bard will get the job done but is very much in its infancy; however, knowing Google, if there is an opportunity to monetise AI we will see its capabilities grow very rapidly. Bing AI is a chatbot layered on top of Bing search and performed surprisingly well, and the inclusion of “learn more” links with every response is a very nice way to present additional information.
The simple answer is that all platforms will deliver what you want, but ChatGPT delivers what you need. Just be aware that if you are using 3.5, any “new” trends in the industry are missed. Bing AI is not as robust in its responses as ChatGPT, but it makes up for this by accessing the latest information and providing clear pathways for you to learn more about Penetration Testing, Cross Site Scripting (XSS) or whatever cyber security subject matter you require.
As all platforms performed well we thought we might leave the final word up to each platform to “plead their case” as the best programming support platform. We have asked one final question of each platform; “When comparing Google Bard, ChatGPT and Bing AI, which platform is the best for building programming code snippets?”.
ChatGPT
As of my last update in September 2021, “Google Bard” doesn’t appear to be a well-known platform or tool. It’s possible that it’s a newer development or a term that has emerged after my knowledge cutoff date. Therefore, I can’t provide specific information about it.
However, I can provide information about “ChatGPT” (developed by OpenAI) and “Bing AI” (which likely refers to Microsoft’s AI efforts).
- ChatGPT (OpenAI’s GPT-3): ChatGPT is a language model that can understand and generate human-like text based on input prompts. While it’s not specifically designed for generating programming code snippets, it can provide assistance with code-related queries. It might be able to suggest code snippets, explain coding concepts, and assist with general programming-related questions. It can be accessed through OpenAI’s API.
- Bing AI: As of my last update, “Bing AI” generally refers to Microsoft’s efforts to integrate AI technologies into its Bing search engine. Bing might provide code-related search results, but it’s not primarily built for generating programming code snippets. Instead, it aims to enhance search results using AI capabilities.
For building programming code snippets, you might want to explore tools and platforms that are specifically designed for code-related tasks. Some examples include:
- GitHub Copilot: This AI-powered tool is designed to assist developers in writing code by providing suggestions and auto-completions. It’s a joint effort between GitHub and OpenAI.
- Integrated Development Environments (IDEs): IDEs like Visual Studio Code, IntelliJ IDEA, and PyCharm often include code suggestion features that are tailored to specific programming languages.
- Online Code Editors: Platforms like repl.it, CodePen, and JSFiddle offer code editing environments with syntax highlighting, code completion, and collaboration features.
- Stack Overflow: While not an AI platform, Stack Overflow is a valuable resource for finding code snippets and solutions to programming problems.
Keep in mind that the landscape of AI and programming tools evolves quickly. If “Google Bard” is a newer platform, I recommend looking for recent reviews and comparisons to determine its suitability for generating programming code snippets.
Google Bard
Bard, ChatGPT, and Bing AI are all large language models (LLMs) that can be used to generate programming code snippets. However, they have different strengths and weaknesses.
- Bard is the most comprehensive of the three platforms. It has been trained on a massive dataset of text and code, and it can generate code in over 20 programming languages. It is also better at explaining code than ChatGPT or Bing AI. However, Bard can be slow to generate code, and it is not always accurate.
- ChatGPT is the fastest of the three platforms. It can generate code quickly, and it can access a wider range of information than Bard. However, ChatGPT is not as accurate as Bard, and it is not as good at explaining code.
- Bing AI is a good middle ground between Bard and ChatGPT. It is not as comprehensive as Bard, but it is more accurate. It is also faster than Bard, but it does not have access to as wide a range of information as ChatGPT.
Ultimately, the best platform for building programming code snippets depends on the specific needs of the developer. If you need a comprehensive platform that can generate code in a variety of languages, then Bard is the best option. If you need a platform that can generate code quickly, then ChatGPT is the best option. If you need a platform that is accurate and has a good balance of speed and comprehensiveness, then Bing AI is the best option.
Here is a table summarizing the key strengths and weaknesses of each platform:
Platform | Strengths | Weaknesses |
Bard | Comprehensive, can generate code in over 20 languages, good at explaining code | Slow, not always accurate |
ChatGPT | Fast, can access a wider range of information, not as accurate as Bard | Not as comprehensive as Bard, not as good at explaining code |
Bing AI | Accurate, good balance of speed and comprehensiveness | Not as comprehensive as Bard, not as fast as ChatGPT |
Bing AI
According to a comparison done by The Verge, Bing is best for getting information from the web. ChatGPT is the most verbally dextrous. Bard is doing its best but it’s limited compared to the other two. Bing is proficient in code generation and error resolution, although it sometimes produces errors in its code. Bard is primarily unable to write code, and even after numerous attempts with various queries, it remains incredibly unreliable. ChatGPT and Bing make many errors in their code, but ChatGPT coding errors are easier to correct, but that’s not the case with Bing. Upon multiple tries with different questions, Bard may write code but it is completely unreliable. Therefore, if you are looking for a platform to build programming code snippets, Bing and ChatGPT are better options than Bard.
The response from each engine neatly sums up our findings. All three AI Chatbots will work for you, depending on your style and what you require. They will all deliver viable and very usable Penetration Testing learning and support for new to intermediate analysts. I think we will stick with ChatGPT and look at investing the $240 USD a year, as that seems to work for us, but we recommend playing around with each to determine the best fit for your work.