Preface
In daily work, it's often necessary to determine a file's type. This summary outlines two common approaches to identifying file types:
- Based on the file extension
  - Advantages: Fast, simple code.
  - Disadvantages: Cannot detect the true file type for forged files or files without extensions.
- Based on the first few bytes of the file stream
  - Advantages: Can identify the true file type.
  - Disadvantages: Slower, more complex code.
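As a minimal sketch of the two approaches, the first helper below looks only at the file name, while the second compares the leading bytes against the 8-byte PNG signature. The class and method names are illustrative assumptions and do not come from any library.

```java
import java.util.Arrays;
import java.util.Locale;

// Illustrative helper class; all names here are hypothetical.
final class DetectionPrinciples {
    // The 8-byte signature that starts every PNG file.
    static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    // Extension-based: returns the lower-cased text after the last dot, or "" if there is none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Content-based: true if the data starts with the PNG signature.
    static boolean startsWithPngSignature(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }
}
```

For a PNG renamed to test.doc, extensionOf reports "doc", whereas startsWithPngSignature still recognizes the content as PNG.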
 
To illustrate, tests were conducted with the following files:
- test.png: A standard PNG file named test.png.
- test.doc: A copy of test.png renamed to test.doc.
  
  
  1. Using Files.probeContentType
Introduced in Java 7, the Files.probeContentType method detects MIME types.
public static void test() throws IOException {
    Path path = new File("d:/test.png").toPath();
    String mimeType = Files.probeContentType(path);
    System.out.println(mimeType);
}
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | application/msword | ❌ | 
- Mechanism: Uses OS-specific FileTypeDetector implementations to determine MIME types.
- Limitation: Accuracy depends on the operating system.

Conclusion: This method relies on file extensions.
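Because Files.probeContentType consults the installed FileTypeDetector implementations, one possible way to make it content-aware is to supply a detector of your own. The sketch below is an assumption-laden illustration and is not part of the JDK: PngSignatureDetector and probeRenamedPng are hypothetical names, and the detector only knows the PNG signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.spi.FileTypeDetector;
import java.util.Arrays;

// Hypothetical detector: reports image/png when the file starts with the
// 8-byte PNG signature, regardless of the file name.
final class PngSignatureDetector extends FileTypeDetector {
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    @Override
    public String probeContentType(Path path) throws IOException {
        byte[] head = new byte[PNG_SIGNATURE.length];
        int read = 0;
        try (InputStream in = Files.newInputStream(path)) {
            // A single read() may return fewer bytes than requested, so loop.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
        }
        // Returning null signals that this detector does not recognize the file.
        return read == head.length && Arrays.equals(head, PNG_SIGNATURE) ? "image/png" : null;
    }

    // Demo helper: writes the signature to a temporary *.doc file and probes it directly.
    static String probeRenamedPng() {
        try {
            Path tmp = Files.createTempFile("renamed-png", ".doc");
            try {
                Files.write(tmp, PNG_SIGNATURE);
                return new PngSignatureDetector().probeContentType(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The demo calls the detector directly. For Files.probeContentType to pick it up, the class would additionally have to be public and registered through the ServiceLoader mechanism, in a file named META-INF/services/java.nio.file.spi.FileTypeDetector.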
  
  
  2. Using URLConnection
The URLConnection class offers several APIs for detecting MIME types.
  
  
  2.1 Using getContentType
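public void test() throws IOException {
    File file = new File("d:/test.png");
    // File.toURL() is deprecated; convert through a URI instead
    URLConnection connection = file.toURI().toURL().openConnection();
    String mimeType = connection.getContentType();
    System.out.println(mimeType);
}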
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | image/png | ✔️ | 
- Conclusion: Detects the true file type, but is slow.
  
  
  2.2 Using guessContentTypeFromName
public void test() {
    File file = new File("d:/test.png");
    String mimeType = URLConnection.guessContentTypeFromName(file.getName());
}
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | null | ❌ Please refer to 2.4 below for details | 
This method uses the internal FileNameMap to determine the MIME type.
- Conclusion: Relies on file extensions.
  
  
  2.3 Using guessContentTypeFromStream
public static void test() throws IOException {
    // guessContentTypeFromStream needs mark/reset support, hence the BufferedInputStream
    try (InputStream in = new BufferedInputStream(new FileInputStream("d:/test.doc"))) {
        String mimeType = URLConnection.guessContentTypeFromStream(in);
        System.out.println(mimeType);
    }
}
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | image/png | ✔️ | 
- Conclusion: Detects the true file type by analyzing the file stream.
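The BufferedInputStream wrapper in the example above matters: guessContentTypeFromStream returns null for any stream that does not support mark/reset, and a plain FileInputStream does not. The following sketch illustrates this with in-memory data; MarkSupportSketch and its helpers are hypothetical names, and PNG_HEADER holds only the first 16 bytes of a PNG file.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Illustrative helper class; all names here are hypothetical.
final class MarkSupportSketch {
    // PNG signature followed by the start of the IHDR chunk.
    static final byte[] PNG_HEADER = {
            (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
            0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52};

    // A stream that reports no mark/reset support, like FileInputStream.
    static InputStream withoutMarkSupport(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    // Wraps the checked IOException so the call is easy to use in examples.
    static String guess(InputStream in) {
        try {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling guess(withoutMarkSupport(PNG_HEADER)) yields null, while wrapping the same stream in a BufferedInputStream first yields image/png.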
  
  
  2.4 Using getFileNameMap
public void test() {
    File file = new File("d:/test.png");
    FileNameMap fileNameMap = URLConnection.getFileNameMap();
    String mimeType = fileNameMap.getContentTypeFor(file.getName());
}
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | null | ❌ | 
The method returns the MIME type table used by all instances of URLConnection. This table is then used to determine the type of input files.  
When it comes to URLConnection, the built-in table of MIME types is quite limited.  
By default, this class uses the content-types.properties file, located in the JRE_HOME/lib directory. However, we can extend it by specifying a user-specific table using the content.types.user.table property:
System.setProperty("content.types.user.table","<path-to-file>");
Conclusion: Relies on file extensions.
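Besides the content.types.user.table property, the table can also be extended programmatically through URLConnection.setFileNameMap. The sketch below shows that alternative, not the property-based route: FallbackFileNameMap is a hypothetical class, and the .doc to application/msword mapping is an assumption chosen to match the test file. It asks the default table first and falls back to its own entries only when the default returns null.

```java
import java.net.FileNameMap;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical FileNameMap that adds extra extensions on top of the default table.
final class FallbackFileNameMap implements FileNameMap {
    private final FileNameMap delegate;
    private final Map<String, String> extras;

    FallbackFileNameMap(FileNameMap delegate, Map<String, String> extras) {
        this.delegate = delegate;
        this.extras = extras;
    }

    @Override
    public String getContentTypeFor(String fileName) {
        // Prefer whatever the built-in table already knows.
        String type = delegate.getContentTypeFor(fileName);
        if (type != null) {
            return type;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return extras.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Installs the map globally; it affects every later name-based lookup in this JVM.
    static void install() {
        Map<String, String> extras = new HashMap<>();
        extras.put("doc", "application/msword"); // assumed mapping for illustration
        URLConnection.setFileNameMap(
                new FallbackFileNameMap(URLConnection.getFileNameMap(), extras));
    }
}
```

After FallbackFileNameMap.install(), URLConnection.guessContentTypeFromName("test.doc") no longer returns null. The lookup is still purely name-based, so it does not reveal that test.doc is really a PNG.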
  
  
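  3. Using MimetypesFileTypeMap
Bundled with Java SE 6 through 10 as part of javax.activation (the package was removed from the JDK in Java 11 and must be added as a separate dependency there), this class uses a predefined mime.types file for MIME type detection.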
public void test() {
    File file = new File("d:/test.png");
    MimetypesFileTypeMap fileTypeMap = new MimetypesFileTypeMap();
    String mimeType = fileTypeMap.getContentType(file.getName());
}
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | application/octet-stream | ❌ | 
Here we can pass either the file name or the File instance itself. The overload that takes a File simply calls the overload that takes the file name.  
Internally, this class resolves the type from mime.types entries. It is important to note that the entries are searched in a specific order:
- Entries added programmatically to the MimetypesFileTypeMap instance
- .mime.types in the user's home directory
- <java.home>/lib/mime.types
- A resource named META-INF/mime.types
- A resource named META-INF/mimetypes.default (usually found only in the activation.jar file)

If no matching entry is found, the method returns application/octet-stream.  
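The mime.types format is one MIME type per line followed by its space-separated extensions. As a sketch, a user-level file (the .mime.types file in the user's home directory) could look like the fragment below; the .doc entry is an assumption added for illustration, chosen because the default table returned application/octet-stream for test.doc.

```text
# Format: <MIME type> <space-separated file extensions>
image/png            png
application/msword   doc
```

With such an entry in place, test.doc would resolve to application/msword, which is still an extension-based answer and not the file's true type.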
Conclusion: The file type is determined based on the file extension.
  
  
  4. Using jMimeMagic
jMimeMagic is a third-party library for detecting MIME types.
Configure Maven Dependency:
<dependency>
    <groupId>net.sf.jmimemagic</groupId>
    <artifactId>jmimemagic</artifactId>
    <version>0.1.5</version>
</dependency>
We can find the latest version of this library on Maven Central.
Next, let’s explore how to use this library:
public void test() throws Exception {
    File file = new File("d:/test.doc");
    // getMagicMatch declares checked exceptions, so the method must propagate them
    MagicMatch match = Magic.getMagicMatch(file, false);
    System.out.println(match.getMimeType());
}
The library can handle data streams, so the file does not need to exist in the file system.
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | image/png | ✔️ | 
- Conclusion: Detects true file types based on file streams.
  
  
  5. Using Apache Tika
Apache Tika is a toolkit that can detect and extract metadata and text from various files. It features a rich and powerful API, and with tika-core, we can use it to detect the MIME type of files.
Configuring Maven Dependency:
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.18</version>
</dependency>
Next, we will use the detect() method to determine the type:
public void whenUsingTika_thenSuccess() throws IOException {
    File file = new File("d:/test.doc");
    Tika tika = new Tika();
    String mimeType = tika.detect(file);
    System.out.println(mimeType);
}
Results
| File | Result | Conclusion | 
|---|---|---|
| test.png | image/png | ✔️ | 
| test.doc | image/png | ✔️ | 
- Conclusion: Accurately detects true file types using file streams.
Summary
The classification based on the detection principles is summarized as follows:
| Detection Principle | Methods | 
|---|---|
| Based on File Extension | 1. Files.probeContentType; 2. URLConnection.guessContentTypeFromName; 3. URLConnection.getFileNameMap; 4. MimetypesFileTypeMap | 
| Based on File Stream | 1. URLConnection.getContentType; 2. URLConnection.guessContentTypeFromStream; 3. jMimeMagic; 4. Apache Tika | 
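The two principles can also be combined. The sketch below is one possible arrangement using only JDK calls, not a recommendation from the tests above: it trusts the content when guessContentTypeFromStream recognizes it and falls back to the file name otherwise. CombinedDetection and detect are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

// Illustrative helper class; all names here are hypothetical.
final class CombinedDetection {
    // leadingBytes: the first bytes of the file (16 are enough for the JDK's checks).
    static String detect(String fileName, byte[] leadingBytes) {
        // ByteArrayInputStream supports mark/reset, as guessContentTypeFromStream requires.
        try (InputStream in = new ByteArrayInputStream(leadingBytes)) {
            String fromContent = URLConnection.guessContentTypeFromStream(in);
            if (fromContent != null) {
                return fromContent;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Content not recognized: fall back to the extension-based table (may return null).
        return URLConnection.guessContentTypeFromName(fileName);
    }
}
```

For the renamed file, detect("test.doc", header) returns image/png when header holds the leading bytes of the PNG, because the content check runs before the name is consulted.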
 

 
    