
Sanitizing User Input: How OWASP Protects Your Database from Malicious HTML

  • Daniel Hirtenlehner
  • July 24
  • 4 min read

User-generated content is everywhere, and with it comes the need for robust input sanitization. Without proper handling of that input, your app becomes vulnerable to attacks like cross-site scripting (XSS) or SQL injection. Sanitizing user input is not just best practice; it's essential for protecting your data and users. In this article, we'll explore how the OWASP Java HTML Sanitizer library streamlines the HTML side of this process, offering a secure and efficient way to empower users while maintaining control over the content.

[Image: a friendly wasp sweeping, playfully illustrating the idea of OWASP keeping code clean.]
OWASP helps you keep your code clean and secure.

Understanding the Challenge 


One of our clients wanted more flexibility with their dashboard content. Previously, updates had to go through us — requiring a full redeploy on their environment. This process was slow and inefficient.


To solve this, we implemented a simple yet secure solution that allows selected users to edit dashboard content directly, while still protecting the system against malicious input. Here's how it works:


  1. Quill.js serves as the WYSIWYG editor, as it works well with our Angular frontend.

  2. The content entered in Quill.js is stored in an MS SQL database.

  3. The stored content is loaded dynamically whenever the app starts, so changes are visible to users right away.

  4. Only certain (super)users at the client have the rights to edit the dashboard content.


While basic security was already in place through write-access restrictions for superusers, the system still needed protection against malicious content. The superuser-generated content arrives as HTML, and rendering it without proper validation can lead to security vulnerabilities such as Cross-Site Scripting (XSS).


While HTML is the backbone of web content, its ability to include dynamic user-generated content also opens the door to potential security risks. Malicious actors can embed harmful scripts or exploit vulnerabilities within HTML, which, when executed on other users' browsers, can lead to compromised user data and system integrity.


Therefore, sanitizing HTML input is not just about preserving the appearance of web content, but also about neutralizing these threats and ensuring the safety and privacy of all users interacting with the application. The goal was to permit the HTML elements and attributes needed for text formatting and images while removing or neutralizing potentially harmful code.


When sanitizing HTML, there are two main approaches: blocklisting (removing disallowed elements) and allowlisting (explicitly permitting only specific elements). Blocklisting can be risky — new or obscure threats might still slip through. That’s why we chose the safer route: an allowlist that defines exactly which HTML tags, attributes, and URL protocols are permitted. This gives us full control over what’s allowed in the content and reduces potential vulnerabilities, even when inputs come from trusted superusers.
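To see why blocklisting is fragile, consider a minimal sketch that uses only the JDK. The NaiveBlocklist class and its regular expression are hypothetical and not part of any library; they stand in for a typical "just strip the script tags" filter.

```java
import java.util.regex.Pattern;

/** Hypothetical blocklist filter, for illustration only. Not suitable for production. */
final class NaiveBlocklist {

    // Removes <script>...</script> blocks, case-insensitively and across line breaks.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
        "<script\\b[^>]*>.*?</script\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String stripScripts(String html) {
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The filter handles the one payload it was written for ...
        System.out.println(stripScripts("<p>Hi</p><script>alert(1)</script>"));

        // ... but a script hidden in an event-handler attribute passes through unchanged.
        System.out.println(stripScripts("<img src=\"x\" onerror=\"alert(1)\">"));
    }
}
```

The second input contains no script tag at all, so the blocklist never sees a reason to touch it. An allowlist avoids this class of problem because onerror would have to be permitted explicitly before it could appear in the output.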


The OWASP Java HTML Sanitizer Library 


The Java code below uses the OWASP Java HTML Sanitizer library to implement a robust HTML sanitization process. The library offers a flexible, customizable way to define a policy that allows or disallows specific HTML elements and attributes, depending on the application's needs.


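```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public static String sanitizeHTML(String dashboardText) {
	// Allowlist policy: any element, attribute, or URL protocol not named here is removed.
	PolicyFactory policy = new HtmlPolicyBuilder()
		// Text formatting, headings, lists, links, and images
		.allowElements(
			"a", "p", "img", "strong", "em", "i", "u", "s", "blockquote",
			"h1", "h2", "h3", "h4", "h5", "h6", "ol", "ul", "li", "span", "br"
		)
		// Attributes permitted on specific elements only
		.allowAttributes("src", "alt").onElements("img")
		.allowAttributes("href", "rel", "target").onElements("a")
		// Attributes permitted on every allowed element
		.allowAttributes("style").globally()
		.allowAttributes("class").globally()
		// Protocols accepted in URL attributes such as href and src
		.allowUrlProtocols("https", "data", "mailto")
		// Adds rel="nofollow" to links
		.requireRelNofollowOnLinks()
		.toFactory();
	return policy.sanitize(dashboardText);
}
```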

Method 1: Sanitizing HTML 

The sanitizeHTML method is the heart of the sanitization process. It takes an input string, representing the saved HTML content, and applies a predefined policy to sanitize it. Let's break down the key components of the policy:


Allowed Elements: Certain HTML elements are explicitly allowed, including common ones like paragraphs (p), images (img), links (a), and various heading tags (h1 to h6), among other tags like "br" for line breaks.

Allowed Attributes: For specific elements like images and links, only certain attributes (src, alt for images; href, rel, target for links) are permitted.

Global Attributes: The policy globally allows the usage of style and class attributes across all elements.

URL Protocols: Only the specified URL protocols (https, data, mailto) are allowed in URL attributes such as href and src. The data protocol has to be allowed because we also want to display pictures. Mailto is allowed for email links.

nofollow Requirement: The sanitizer adds rel="nofollow" to every link, so all links in the output carry that attribute.

The result is a sanitized HTML string that retains only the allowed elements and attributes while removing any potentially harmful content.
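Conceptually, the URL protocol rule described above behaves like a scheme allowlist. The following sketch illustrates that idea with the JDK's java.net.URI. UrlSchemeAllowlist and isAllowed are hypothetical names, this is not how the OWASP library is implemented internally, and the sketch assumes that relative URLs without a scheme should be accepted.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper that illustrates scheme allowlisting; not the library's code. */
final class UrlSchemeAllowlist {

    // Mirrors the protocols named in the policy: https, data, mailto.
    private static final Set<String> ALLOWED = Set.of("https", "data", "mailto");

    static boolean isAllowed(String url) {
        try {
            String scheme = new URI(url.trim()).getScheme();
            // Assumption for this sketch: scheme-less (relative) URLs are accepted.
            return scheme == null || ALLOWED.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Anything that does not parse as a URI is rejected.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("https://example.com/logo.png")); // true
        System.out.println(isAllowed("mailto:info@example.com"));      // true
        System.out.println(isAllowed("javascript:alert(1)"));          // false
        System.out.println(isAllowed("http://example.com/"));          // false
    }
}
```

The point of the sketch is the direction of the check: a javascript: URL is rejected not because it is listed as dangerous, but because it was never listed as safe.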


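```java
import org.owasp.html.HtmlPolicyBuilder;
import org.owasp.html.PolicyFactory;

public static int countCharacters(String dashboardText) {
	if (dashboardText == null || dashboardText.isEmpty()) {
		return 0;
	}
	// No elements are allowed, so the policy strips all tags and keeps only the text.
	PolicyFactory policy = new HtmlPolicyBuilder()
		.requireRelNofollowOnLinks()
		.toFactory();
	// Length of the remaining, HTML-encoded text
	return policy.sanitize(dashboardText).length();
}
```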

Method 2: Counting Characters 

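Our customer also wanted a limit on how many characters can be saved. The countCharacters method complements the sanitization process by measuring the length of the content without its markup. First, it checks whether the input is null or empty and returns 0 if so. Otherwise, it runs the input through a policy that allows no elements, which strips all HTML tags, and returns the length of the remaining text. Note that the sanitizer returns HTML-encoded text, so a character such as & is counted as the five-character entity &amp;.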
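One way the two methods could work together is sketched here. It assumes the check runs before the content is written to the database, which the article does not specify. DashboardContentValidator, prepareForStorage, and the limit value are hypothetical. The sanitizer and counter are passed in as functions, so the sketch runs with simple stand-ins and without the sanitizer library.

```java
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

/** Hypothetical save-time check; all names here are illustrative. */
final class DashboardContentValidator {

    private final UnaryOperator<String> sanitizer; // e.g. a reference to sanitizeHTML
    private final ToIntFunction<String> counter;   // e.g. a reference to countCharacters
    private final int maxCharacters;

    DashboardContentValidator(UnaryOperator<String> sanitizer,
                              ToIntFunction<String> counter,
                              int maxCharacters) {
        this.sanitizer = sanitizer;
        this.counter = counter;
        this.maxCharacters = maxCharacters;
    }

    /** Returns the sanitized HTML to store, or throws if the content exceeds the limit. */
    String prepareForStorage(String rawHtml) {
        int length = counter.applyAsInt(rawHtml);
        if (length > maxCharacters) {
            throw new IllegalArgumentException(
                "Content has " + length + " characters; the limit is " + maxCharacters);
        }
        return sanitizer.apply(rawHtml);
    }

    public static void main(String[] args) {
        // Stand-ins so the sketch runs without the sanitizer library on the classpath.
        DashboardContentValidator validator =
            new DashboardContentValidator(html -> html, String::length, 20);

        System.out.println(validator.prepareForStorage("<p>Hello</p>"));
        try {
            validator.prepareForStorage("<p>This text is far too long</p>");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In the real application, the two lambdas would be replaced by method references to sanitizeHTML and countCharacters.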


Conclusion 

In today's security-conscious development world, sanitizing user-generated content is non-negotiable. The OWASP Java HTML Sanitizer library, as the Java code in this article demonstrates, offers a powerful and customizable way to mitigate the risks associated with untrusted HTML input.


By carefully defining a policy that allows only the necessary elements and attributes, developers can strike a balance between user interactivity and security. Whether you're building a blog, forum, or any web application that handles user inputs, incorporating HTML sanitization is a proactive step toward creating a safer online experience for users.


Want to empower your users without compromising on security? Let’s talk. At open200, we build tailored solutions that combine flexibility and best-in-class security. Contact us to learn how we can support your next project.
