DEV Community

Excalibra

Summary of Methods to Obtain MIME Types of Files in Java

Preface

In daily work, it's often necessary to determine a file's type. This summary outlines two common principles for identifying file types:

  1. Based on File Extension

    • Advantages: Fast, simple code.
    • Disadvantages: Cannot detect the true file type for forged files or files without extensions.
  2. Based on the First Few Bytes of the File Stream (the file signature, or "magic number")

    • Advantages: Can identify the true file type.
    • Disadvantages: Slower, more complex code.
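
The second principle can be sketched in a few lines. This is only an illustration, not one of the methods tested below: the class and method names are hypothetical, it recognizes PNG only (every PNG file begins with the same fixed 8-byte signature), and it assumes Java 11 or later for `InputStream.readNBytes(int)`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class PngSignatureCheck {
    // The fixed 8-byte signature that begins every PNG file.
    private static final byte[] PNG_SIGNATURE = {
        (byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
    };

    // True when the given bytes start with the PNG signature.
    static boolean hasPngSignature(byte[] header) {
        return header.length >= PNG_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(header, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }

    // Reads only the first 8 bytes of the file; the file name is ignored.
    static boolean isPng(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hasPngSignature(in.readNBytes(PNG_SIGNATURE.length));
        }
    }
}
```

Because the name is never consulted, a PNG renamed to test.doc is still recognized. The cost is one file read per check, plus one such rule for every format you want to support.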

To illustrate, tests were conducted with the following files:

  • test.png: A standard PNG file named test.png.
  • test.doc: A copy of test.png renamed to test.doc.

1. Using Files.probeContentType

Introduced in Java 7, the Files.probeContentType method detects MIME types.

public static void test() throws IOException {
    Path path = new File("d:/test.png").toPath();
    String mimeType = Files.probeContentType(path);
    System.out.println(mimeType);
}

Results

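| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | application/msword | ❌ |

  • Mechanism: Delegates to the installed FileTypeDetector implementations, then to a platform-specific default detector.
  • Limitation: Because the default detector is platform-specific, results can differ between operating systems.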

Conclusion: In this test the method followed the file extension rather than the file content. It returns null when the type cannot be determined.
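
Since Files.probeContentType can return null, callers usually want a fallback. The sketch below shows one way to handle that; the class name, the method name, and the choice of application/octet-stream as the default are assumptions for illustration, not part of the original tests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
final class ProbeWithFallback {
    // Generic "unknown binary data" type, used when detection yields nothing.
    static final String DEFAULT_TYPE = "application/octet-stream";

    static String probeOrDefault(Path path) throws IOException {
        // probeContentType returns null when no detector recognizes the file.
        String mimeType = Files.probeContentType(path);
        return mimeType != null ? mimeType : DEFAULT_TYPE;
    }
}
```

The value returned for a recognized file still depends on the platform's detector, so only the non-null guarantee is portable.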


2. Using URLConnection

The URLConnection class offers several APIs for detecting MIME types.

2.1 Using getContentType

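public void test() throws IOException {
    File file = new File("d:/test.png");
    // File.toURL() is deprecated; go through toURI() so special characters are escaped
    URLConnection connection = file.toURI().toURL().openConnection();
    String mimeType = connection.getContentType();
    System.out.println(mimeType);
}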

Results

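| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | image/png | ✔️ |

  • Conclusion: Reported the true file type in both tests, but it is comparatively slow, since a connection to the file must be opened first.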

2.2 Using guessContentTypeFromName

public void test() {
    File file = new File("d:/test.png");
    String mimeType = URLConnection.guessContentTypeFromName(file.getName());
}

Results

| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | null | ❌ (see 2.4 below for details) |

This method uses the internal FileNameMap to determine the MIME type.

  • Conclusion: Relies on file extensions.

2.3 Using guessContentTypeFromStream

public static void test() throws IOException {
    // try-with-resources closes the file; BufferedInputStream adds the mark/reset support the method needs
    try (InputStream in = new BufferedInputStream(new FileInputStream("d:/test.doc"))) {
        String mimeType = URLConnection.guessContentTypeFromStream(in);
        System.out.println(mimeType);
    }
}

Results

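| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | image/png | ✔️ |

  • Conclusion: Detects the true file type by inspecting the leading bytes of the stream. The stream must support mark/reset (hence the BufferedInputStream); otherwise the method returns null.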
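
The same call also works on data that never touches the disk, such as an uploaded byte array. A minimal sketch, with a hypothetical class and method name:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Hypothetical helper, for illustration only.
final class StreamSniffer {
    static String sniff(byte[] data) throws IOException {
        // ByteArrayInputStream supports mark/reset, which the method relies on
        // to peek at the leading bytes and then rewind the stream.
        try (InputStream in = new ByteArrayInputStream(data)) {
            return URLConnection.guessContentTypeFromStream(in);
        }
    }
}
```

Passing the 8-byte PNG signature (89 50 4E 47 0D 0A 1A 0A) to `sniff` yields image/png. The set of signatures the JDK recognizes is small, so null is a common result for other formats.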

2.4 Using getFileNameMap

public void test() {
    File file = new File("d:/test.png");
    FileNameMap fileNameMap = URLConnection.getFileNameMap();
    String mimeType = fileNameMap.getContentTypeFor(file.getName());
}

Results

| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | null | ❌ |

getFileNameMap returns the MIME type table shared by all URLConnection instances. The table maps file names to MIME types by extension.

The built-in table is quite limited, which is why test.doc resolves to null here and in 2.2.

By default, this class uses the content-types.properties file, located in the JRE_HOME/lib directory. However, we can extend it by specifying a user-specific table using the content.types.user.table property:

System.setProperty("content.types.user.table","<path-to-file>");

Conclusion: Relies on file extensions.
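
Besides the user table file, the lookup can also be extended in code: FileNameMap is a single-method interface, and URLConnection.setFileNameMap installs a replacement. The sketch below is one possible approach under that assumption. ExtendedFileNameMap, its constructor parameters, and the install helper are hypothetical names, not part of the JDK.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Map;

// Hypothetical wrapper, for illustration only.
final class ExtendedFileNameMap implements FileNameMap {
    private final FileNameMap fallback;
    private final Map<String, String> extraTypes; // lower-case extension -> MIME type

    ExtendedFileNameMap(FileNameMap fallback, Map<String, String> extraTypes) {
        this.fallback = fallback;
        this.extraTypes = extraTypes;
    }

    @Override
    public String getContentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            String type = extraTypes.get(ext);
            if (type != null) {
                return type;
            }
        }
        // Anything not listed falls through to the built-in table.
        return fallback.getContentTypeFor(fileName);
    }

    // Installs the wrapper for the whole JVM, keeping the current map as fallback.
    static void install(Map<String, String> extraTypes) {
        URLConnection.setFileNameMap(
            new ExtendedFileNameMap(URLConnection.getFileNameMap(), extraTypes));
    }
}
```

With an entry mapping doc to application/msword, the wrapper resolves test.doc instead of returning null. The map is global state, so it is best installed once at startup. Detection is still purely extension-based.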


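3. Using MimetypesFileTypeMap

MimetypesFileTypeMap (package javax.activation) shipped with Java SE 6 through 10 and resolves MIME types from a mime.types file. The java.activation module was removed in Java 11, so newer JDKs need the activation API as a separate dependency.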

public void test() {
    File file = new File("d:/test.png");
    MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
    // Only the name is consulted; the file itself is never opened
    String mimeType = fileTypeMap.getContentType(file.getName());
    System.out.println(mimeType);
}

Results

| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | application/octet-stream | ❌ |

getContentType accepts either a file name or a File instance. The File overload simply delegates to the file-name overload, so the file's content is never read.

Internally, the class resolves types from mime.types entries, searching the following sources in order:

  1. Entries added programmatically to the MimetypesFileTypeMap instance
  2. .mime.types in the user's home directory
  3. <java.home>/lib/mime.types
  4. A resource named META-INF/mime.types
  5. A resource named META-INF/mimetypes.default (usually found only in the activation.jar file)

If no matching entry is found, the method returns application/octet-stream.

Conclusion: The file type is determined based on the file extension.


4. Using jMimeMagic

jMimeMagic is a third-party library for detecting MIME types.

Configure the Maven dependency:

<dependency>
    <groupId>net.sf.jmimemagic</groupId>
    <artifactId>jmimemagic</artifactId>
    <version>0.1.5</version>
</dependency>

We can find the latest version of this library on Maven Central.

Next, let’s explore how to use this library:

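public void test() throws MagicParseException, MagicMatchNotFoundException, MagicException {
    File file = new File("d:/test.doc");
    // false: do not use the file extension as a hint, match on content only
    MagicMatch match = Magic.getMagicMatch(file, false);
    System.out.println(match.getMimeType());
}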

The library also accepts raw data as a byte array, so the content does not need to exist as a file on disk.

Results

| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | image/png | ✔️ |

  • Conclusion: Detects true file types based on file streams.

5. Using Apache Tika

Apache Tika is a toolkit that can detect and extract metadata and text from various files. It features a rich and powerful API, and with tika-core, we can use it to detect the MIME type of files.

Configure the Maven dependency:

<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.18</version>
</dependency>

Next, we use the detect() method to determine the type:

public void test() throws IOException {
    File file = new File("d:/test.doc");
    Tika tika = new Tika();
    // detect(File) reads the file, so it can throw IOException
    String mimeType = tika.detect(file);
    System.out.println(mimeType);
}

Results

| File | Result | Conclusion |
|---|---|---|
| test.png | image/png | ✔️ |
| test.doc | image/png | ✔️ |

  • Conclusion: Accurately detects true file types using file streams.

Summary

The classification based on the detection principles is summarized as follows:

| Detection Principle | Methods |
|---|---|
| Based on File Extension | 1. Files.probeContentType 2. URLConnection.guessContentTypeFromName 3. URLConnection.getFileNameMap 4. MimetypesFileTypeMap |
| Based on File Stream | 1. URLConnection.getContentType 2. URLConnection.guessContentTypeFromStream 3. jMimeMagic 4. Apache Tika |
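
The two principles can also be combined to flag files whose extension and content disagree. The following is a minimal sketch using only the JDK methods from section 2. The class and method names are hypothetical, and the check is limited to the types the JDK's built-in tables know.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.URLConnection;

// Hypothetical helper, for illustration only.
final class ExtensionMismatchCheck {
    // Returns true when the name-based and content-based guesses disagree.
    static boolean looksForged(String fileName, byte[] content) throws IOException {
        String byName = URLConnection.guessContentTypeFromName(fileName);
        String byContent =
            URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(content));
        // If either guess is unavailable there is nothing to compare.
        return byName != null && byContent != null && !byName.equals(byContent);
    }
}
```

PNG data named photo.gif is flagged, while the same data named photo.png is not. Note the limitation: for test.doc the name-based guess is null (see 2.2), so this sketch cannot flag it; a richer extension table or a library such as Apache Tika would be needed.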
