org.weborganic.ox.util
Class CharsetDetector

java.lang.Object
  extended by org.weborganic.ox.util.CharsetDetector

public final class CharsetDetector
extends Object

This class can be used to detect the Charset for a file.

Version:
4 December 2012
Author:
Christophe Lauret

Nested Class Summary
static class CharsetDetector.ByteOrderMark
          An enumeration of byte order marks supported by this class.
 
Method Summary
static CharBuffer decode(File f)
          Decodes the content of the specified file into a character buffer, detecting its encoding by examining the content.
static Charset getFromBOM(File f)
          Returns the encoding indicated by the byte order mark of the specified file, if the mark matches a charset defined in this class.
static Charset getFromContent(File f)
          Returns the encoding of the specified file by examining its content.
static void main(String[] args)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getFromBOM

public static Charset getFromBOM(File f)
                          throws IOException
Returns the encoding indicated by the byte order mark of the specified file, if the mark matches a charset defined in this class. Returns null if there is no byte order mark.

Parameters:
f - The file to check the encoding for.
Returns:
The corresponding charset, or null if the file has no byte order mark.
Throws:
IOException - If thrown while attempting to read the file.
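
As an illustration of byte order mark detection, the following is a minimal sketch and not this class's actual implementation. The class name BomSketch and its methods are hypothetical, and the sketch assumes only the UTF-8, UTF-16BE and UTF-16LE marks are of interest; the marks actually supported are those listed in CharsetDetector.ByteOrderMark. UTF-32 marks are deliberately ignored here, so a UTF-32LE file would be reported as UTF-16LE.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names and supported marks are assumptions, not the library's API.
final class BomSketch {

  /** Returns the charset indicated by a leading byte order mark, or null if none is recognized. */
  static Charset fromBom(byte[] head, int length) {
    // UTF-8 mark: EF BB BF
    if (length >= 3 && (head[0] & 0xFF) == 0xEF && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
      return StandardCharsets.UTF_8;
    }
    // UTF-16 big-endian mark: FE FF
    if (length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
      return StandardCharsets.UTF_16BE;
    }
    // UTF-16 little-endian mark: FF FE
    if (length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
      return StandardCharsets.UTF_16LE;
    }
    return null;
  }

  /** Reads at most the first three bytes of the file and checks them for a mark. */
  static Charset fromBom(File f) throws IOException {
    byte[] head = new byte[3];
    int n = 0;
    try (InputStream in = new FileInputStream(f)) {
      // A single read may return fewer bytes than requested, so loop until full or end of file.
      while (n < head.length) {
        int r = in.read(head, n, head.length - n);
        if (r < 0) {
          break;
        }
        n += r;
      }
    }
    return fromBom(head, n);
  }
}
```

Because a null result only means that no mark was recognized, a caller would typically fall back to a content-based check such as getFromContent.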

getFromContent

public static Charset getFromContent(File f)
                              throws IOException
Returns the encoding of the specified file by examining its content. This method does the following:
- Checks whether there are any non-ASCII characters; if not, returns US-ASCII.
- Attempts to read the content as Unicode UTF-8 or UTF-16.
- Defaults to the machine default charset.

Parameters:
f - The file to check the encoding for.
Returns:
The corresponding charset.
Throws:
IOException - If thrown while attempting to read the file.
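
The content-based steps described for this method can be sketched roughly as follows. This is an assumption-laden illustration, not the library's code: the class name ContentSketch and its methods are hypothetical, the sketch works on a byte array rather than a file, and it omits the UTF-16 attempt because this page does not say how that check is performed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: ASCII check, then strict UTF-8, then the platform default.
final class ContentSketch {

  static Charset guess(byte[] data) {
    // Step 1: bytes with the high bit set are negative in Java and are not ASCII.
    boolean ascii = true;
    for (byte b : data) {
      if (b < 0) {
        ascii = false;
        break;
      }
    }
    if (ascii) {
      return StandardCharsets.US_ASCII;
    }
    // Step 2: accept UTF-8 only if the whole input decodes without errors.
    if (decodesStrictly(data, StandardCharsets.UTF_8)) {
      return StandardCharsets.UTF_8;
    }
    // Step 3: fall back to the machine default (the UTF-16 attempt is omitted in this sketch).
    return Charset.defaultCharset();
  }

  /** Returns true if the data decodes in the given charset with no malformed or unmappable input. */
  static boolean decodesStrictly(byte[] data, Charset cs) {
    try {
      cs.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(data));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }
}
```

The decoder is configured with CodingErrorAction.REPORT because the default behaviour of Charset.decode is to replace invalid sequences silently, which would make every input look like valid UTF-8.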

decode

public static CharBuffer decode(File f)
                         throws IOException
Decodes the content of the specified file into a character buffer, choosing the encoding by examining the content:
- If there are no non-ASCII characters, decodes the content as US-ASCII.
- Otherwise, attempts to decode the content as Unicode UTF-8 or UTF-16.
- Defaults to the machine default charset.

Parameters:
f - The file to decode.
Returns:
The decoded content of the file.
Throws:
IOException - If thrown while attempting to read the file.
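
To show how a CharBuffer can be produced once an encoding has been chosen, here is a minimal sketch under stated assumptions. DecodeSketch and its methods are hypothetical names, not part of this class, and the sketch simplifies the detection to a strict UTF-8 attempt (which also covers pure ASCII input) followed by the platform default.

```java
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch: strict UTF-8 first, platform default as the fallback.
final class DecodeSketch {

  static CharBuffer decode(byte[] data) {
    ByteBuffer bytes = ByteBuffer.wrap(data);
    try {
      // Strict decoding: any malformed sequence raises an exception instead of being replaced.
      return StandardCharsets.UTF_8.newDecoder()
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(bytes);
    } catch (CharacterCodingException e) {
      // The failed attempt may have advanced the buffer position, so start again from the beginning.
      bytes.rewind();
      return Charset.defaultCharset().decode(bytes);
    }
  }

  /** Reads the whole file into memory and decodes it; suitable for small files only. */
  static CharBuffer decode(File f) throws IOException {
    return decode(Files.readAllBytes(f.toPath()));
  }
}
```

Calling toString() on the returned buffer yields the decoded text as a String.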

main

public static void main(String[] args)