A Simple Java Program
Let’s look more closely at one of the simplest Java programs you can have, one that merely prints a message to the console:
public class FirstSample
{
    public static void main(String[] args)
    {
        System.out.println("We will not use 'Hello, World!'");
    }
}
It is worth spending all the time you need to become comfortable with the framework of this sample; the pieces will recur in all applications. First and foremost, Java is case sensitive. If you made any mistakes in capitalization (such as typing Main instead of main), the program will not run.
Now let’s look at this source code line by line. The keyword public is called an access modifier; these modifiers control the level of access other parts of a program have to this code. The keyword class reminds you that everything in a Java program lives inside a class. For now, think of a class as a container for the program logic that defines the behavior of an application. Classes are the building blocks with which all Java applications are built. Everything in a Java program must be inside a class.
Following the keyword class is the name of the class. The rules for class names in Java are quite generous. Names must begin with a letter, and after that, they can have any combination of letters and digits. The length is essentially unlimited. You cannot use a Java reserved word (such as public or class) for a class name.
The standard naming convention (used in the name FirstSample) is that class names are nouns that start with an uppercase letter. If a name consists of multiple words, use an initial uppercase letter in each of the words. This use of uppercase letters in the middle of a name is sometimes called “camel case” or, self-referentially, “CamelCase.”
You need to make the file name for the source code the same as the name of the public class, with the extension .java appended. Thus, you must store this code in a file called FirstSample.java. (Again, case is important—don’t use firstsample.java.)
You compile the file with the command
javac FirstSample.java
If you have named the file correctly and not made any typos in the source code, you end up with a file containing the bytecodes for this class. The Java compiler names the bytecode file FirstSample.class and stores it in the same directory as the source file. Finally, launch the program by issuing the following command:
java FirstSample
(Remember to leave off the .class extension.) When the program executes, it simply displays the string We will not use 'Hello, World!' on the console. When you use
java ClassName
to run a compiled program, the Java virtual machine always starts execution with the code in the main method in the class you indicate. (The term “method” is Java-speak for a function.) Thus, you must have a main method in the source of your class for your code to execute. You can, of course, add your own methods to a class and call them from the main method.
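As a minimal sketch of that last point, the following class adds one helper method and calls it from main. The class name Greeter, the method greeting, and the message text are illustrative assumptions, not part of the sample program above. The helper is declared static so that main can call it without creating an object.

```java
class Greeter
{
    // Hypothetical helper method; its name and message are illustrative only.
    static String greeting(String name)
    {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args)
    {
        // Execution starts here when you run: java Greeter
        System.out.println(greeting("Java")); // Hello, Java!
    }
}
```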
Notice the braces { } in the source code. In Java, as in C/C++, braces delineate the parts (usually called blocks) in your program. In Java, the code for any method must be started by an opening brace { and ended by a closing brace }.
Brace styles have inspired an inordinate amount of useless controversy. As whitespace is irrelevant to the Java compiler, you can use whatever brace style you like.
For now, don’t worry about the keywords static void; just think of them as part of what you need to get a Java program to compile. The point to remember for now is that every Java application must have a main method that is declared in the following way:
public class ClassName
{
    public static void main(String[] args)
    {
        program statements
    }
}
The byzantine complexity of public static void main is about to give way to something simpler. Java 21 has a preview feature, described in JDK Enhancement Proposal (JEP) 445, that allows you to write the main method like this:
class FirstSample
{
    void main()
    {
        System.out.println("We will not use 'Hello, World!'");
    }
}
That is nicer. No public, no static, no String[] args.
For now, you need to compile and run the program with a command-line flag:
javac --enable-preview --source 21 FirstSample.java
java --enable-preview FirstSample
Even simpler, since there is a single source file, you can skip the javac step:
java --enable-preview --source 21 FirstSample.java
For very simple programs, JEP 445 allows you to omit the class, provided, of course, there is only one. The file FirstSample.java can simply contain:
void main()
{
    System.out.println("We will not use 'Hello, World!'");
}
Since these features are still in preview, we won't use them, but in the future, the first sample program will be a bit simpler.
As a C++ programmer, you know what a class is. Java classes are similar to C++ classes, but there are a few differences that can trap you. For example, in Java all functions are methods of some class. (The standard terminology refers to them as methods, not member functions.) Thus, in Java you must have a shell class for the main method. You may also be familiar with the idea of static member functions in C++. These are member functions defined inside a class that do not operate on objects.
The main method in Java is always static. Finally, as in C/C++, the void keyword indicates that this method does not return a value. Unlike C/C++, the main method does not return an “exit code” to the operating system. If the main method exits normally, the Java program has the exit code 0, indicating successful completion. To terminate the program with a different exit code, call System.exit(code).
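The following sketch shows one way the exit code could be set. The class name ExitCodeDemo and the rule that a missing argument counts as an error are assumptions made up for this illustration.

```java
class ExitCodeDemo
{
    // Hypothetical rule for this sketch: running without arguments is a usage error.
    static int exitCodeFor(String[] args)
    {
        return args.length == 0 ? 1 : 0;
    }

    public static void main(String[] args)
    {
        int code = exitCodeFor(args);
        if (code != 0)
        {
            System.err.println("Usage: java ExitCodeDemo <name>");
            System.exit(code); // terminates the program with a nonzero exit code
        }
        System.out.println("Hello, " + args[0]);
        // Returning normally from main results in exit code 0.
    }
}
```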
Now that you have seen the basic structure of all Java programs, turn your attention to the contents of the main method:
{
System.out.println("We will not use 'Hello, World!'");
}
Braces mark the beginning and end of the body of the method. This method has only one statement in it. As with most programming languages, you can think of Java statements as sentences of the language. In Java, every statement must end with a semicolon. In particular, carriage returns do not mark the end of a statement, so statements can span multiple lines if need be.
The body of the main method contains a statement that outputs a single line of text to the console.
Here, we are using the System.out object and calling its println method. Notice the periods used to invoke a method. Java uses the general syntax
object.method(arguments)
as its equivalent of a function call.
In this case, the println method receives a string argument. The method displays the string argument on the console. It then terminates the output line, so that each call to println displays its output on a new line. Notice that Java, like C/C++, uses double quotes to delimit strings.
Methods in Java, like functions in any programming language, can use zero, one, or more arguments. Even if a method has no arguments, you must still use empty parentheses. For example, a variant of the println method with no arguments just prints a blank line. You invoke it with the call
System.out.println();
System.out also has a print method that doesn’t add a newline character to the output. For example, System.out.print("Hello") prints Hello without a newline. The next output appears immediately after the letter o.
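To see the three calls side by side, here is a small sketch. The class name PrintDemo and the message texts are assumptions chosen for illustration.

```java
class PrintDemo
{
    public static void main(String[] args)
    {
        System.out.print("Hello");    // no line terminator is added
        System.out.print(", ");       // output continues on the same line
        System.out.println("World!"); // prints the text, then ends the line
        System.out.println();         // prints a blank line
        System.out.println("Done");
    }
}
```

The program should display Hello, World! on the first line, leave the second line blank, and show Done on the third.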
Comments
Comments in Java, as in most programming languages, do not show up in the executable program. Thus, you can add as many comments as needed without fear of bloating the code. Java has three ways of marking comments. The most common form is a //. Use this for a comment that runs from the // to the end of the line.
System.out.println("We will not use 'Hello, World!'"); // is this too cute?
When longer comments are needed, you can mark each line with a //, or you can use the /* and */ comment delimiters that let you block off a longer comment.
Finally, a third kind of comment is used to generate documentation automatically. This comment uses a /** to start and a */ to end.
/* */ comments do not nest in Java. That is, you might not be able to deactivate code simply by surrounding it with /* and */ because the code you want to deactivate might itself contain a */ delimiter.
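The three comment forms (//, /* ... */, and /** ... */) might appear together as in the following sketch. The class name CommentDemo and the method answer are hypothetical names used only for this illustration.

```java
/**
 * A documentation comment. Tools such as javadoc can turn it into
 * reference documentation for the class that follows.
 */
class CommentDemo
{
    /* A block comment can span
       several lines. */
    static int answer()
    {
        return 42; // a line comment runs to the end of the line
    }

    public static void main(String[] args)
    {
        System.out.println(answer()); // 42
    }
}
```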
Data Types
Java is a strongly typed language. This means that every variable must have a declared type. There are eight primitive types in Java. Four of them are integer types; two are floating-point number types; one is the character type char, used for UTF-16 code units in the Unicode encoding scheme; and one is a boolean type for truth values.
Java has an arbitrary-precision arithmetic package. However, “big numbers,” as they are called, are Java objects and not a primitive Java type.
Integer Types
The integer types are for numbers without fractional parts. Negative values are allowed. Java provides the four integer types:
Type    Storage   Range (inclusive)
byte    1 byte    –128 to 127
short   2 bytes   –32,768 to 32,767
int     4 bytes   –2,147,483,648 to 2,147,483,647 (just over 2 billion)
long    8 bytes   –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
In most situations, the int type is the most practical. If you want to represent the number of inhabitants of our planet, you’ll need to resort to a long. The byte and short types are mainly intended for specialized applications, such as low-level file handling, or for large arrays when storage space is at a premium.
Under Java, the ranges of the integer types do not depend on the machine on which you will be running the Java code. This alleviates a major pain for the programmer who wants to move software from one platform to another, or even between operating systems on the same platform. In contrast, C and C++ programs use the most efficient integer type for each processor. As a result, a C program that runs well on a 32-bit processor may exhibit integer overflow on a 16-bit system. Since Java programs must run with the same results on all machines, the ranges for the various types are fixed.
Long integer numbers have a suffix L or l (for example, 4000000000L). Hexadecimal numbers have a prefix 0x or 0X (for example, 0xCAFE). Octal numbers have a prefix 0 (for example, 010 is 8)—naturally, this can be confusing, and few programmers use octal constants.
You can write numbers in binary, with a prefix 0b or 0B. For example, 0b1001 is 9. You can add underscores to number literals, such as 1_000_000 (or 0b1111_0100_0010_0100_0000) to denote one million. The underscores are for human eyes only. The Java compiler simply removes them.
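The literal forms described above can be checked with a short sketch. The class and constant names are assumptions for illustration; the literal values are the ones used in the text.

```java
class IntegerLiteralDemo
{
    static final long BIG = 4000000000L;  // exceeds the int range, so the L suffix is required
    static final int HEX = 0xCAFE;        // hexadecimal
    static final int OCTAL = 010;         // octal, easy to misread as ten
    static final int BINARY = 0b1001;     // binary
    static final int MILLION = 1_000_000; // underscores only group the digits
    static final int BINARY_MILLION = 0b1111_0100_0010_0100_0000;

    public static void main(String[] args)
    {
        System.out.println(BIG);                       // 4000000000
        System.out.println(HEX);                       // 51966
        System.out.println(OCTAL);                     // 8
        System.out.println(BINARY);                    // 9
        System.out.println(MILLION == BINARY_MILLION); // true
    }
}
```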
In C and C++, the sizes of types such as int and long depend on the target platform. On a 16-bit processor such as the 8086, integers are 2 bytes, but on a 32-bit processor like a Pentium or SPARC they are 4-byte quantities. Similarly, long values are 4-byte on 32-bit processors and 8-byte on 64-bit processors. These differences make it challenging to write cross-platform programs. In Java, the sizes of all numeric types are platform-independent.
Note that Java does not have any unsigned versions of the int, long, short, or byte types. If you work with integer values that can never be negative and you really need an additional bit, you can, with some care, interpret signed integer values as unsigned. For example, instead of having a byte value b represent the range from –128 to 127, you may want a range from 0 to 255. You can store it in a byte. Due to the nature of binary arithmetic, addition, subtraction, and multiplication will work provided they don't overflow. For other operations, call Byte.toUnsignedInt(b) to get an int value between 0 and 255, then process the integer value and cast back to byte. The Integer and Long classes have methods for unsigned division and remainder.
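A minimal sketch of this technique follows. The class name UnsignedDemo, the helper unsignedValue, and the sample value 200 are assumptions; Byte.toUnsignedInt, Integer.divideUnsigned, Integer.remainderUnsigned, and Integer.toUnsignedString are standard library methods.

```java
class UnsignedDemo
{
    // Interprets the eight bits of b as a value between 0 and 255.
    static int unsignedValue(byte b)
    {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args)
    {
        byte b = (byte) 200;                  // the bit pattern of 200, stored as -56
        System.out.println(b);                // -56
        System.out.println(unsignedValue(b)); // 200

        // Process the value as an int, then cast back to byte.
        byte halved = (byte) (unsignedValue(b) / 2);
        System.out.println(halved);           // 100

        // The int -1 has the same bits as the unsigned value 4294967295.
        System.out.println(Integer.toUnsignedString(-1));      // 4294967295
        System.out.println(Integer.divideUnsigned(-1, 2));     // 2147483647
        System.out.println(Integer.remainderUnsigned(-1, 10)); // 5
    }
}
```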
Floating-Point Types
The floating-point types denote numbers with fractional parts. Java provides two floating-point types:
Type     Storage   Range
float    4 bytes   Approximately ±3.40282347 × 10^38 (6–7 significant decimal digits)
double   8 bytes   Approximately ±1.79769313486231570 × 10^308 (15 significant decimal digits)
The name double refers to the fact that these numbers have twice the precision of the float type. (Some people call these double-precision numbers.) The limited precision of float (6-7 significant digits) is simply not sufficient for many situations. Use float values only when you work with a library that requires them, or when you need to store a very large number of them.
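One way to observe the limited precision of float is to check whether an eight-digit integer survives a round trip through each type. This is only a sketch; the class name, the helper methods, and the test value 16,777,217 (2^24 + 1) are choices made for this illustration.

```java
class FloatPrecisionDemo
{
    // 2^24 + 1, an arbitrary test value chosen for this sketch.
    static final int TEST_VALUE = 16_777_217;

    // True if n is unchanged after converting to float and back.
    static boolean survivesAsFloat(int n)
    {
        return (int) (float) n == n;
    }

    // True if n is unchanged after converting to double and back.
    static boolean survivesAsDouble(int n)
    {
        return (int) (double) n == n;
    }

    public static void main(String[] args)
    {
        System.out.println(survivesAsFloat(TEST_VALUE));  // false
        System.out.println(survivesAsDouble(TEST_VALUE)); // true
    }
}
```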
Java 20 adds a couple of methods (Float.floatToFloat16 and Float.float16ToFloat) for storing “half-precision” 16-bit floating-point numbers in short values. These are used for implementing neural networks.
Numbers of type float have a suffix F or f (for example, 3.14F). Floating-point numbers without an F suffix (such as 3.14) are always considered to be of type double. You can optionally supply the D or d suffix (for example, 3.14D). An E or e denotes a decimal exponent. For example, 1.729E3 is the same as 1729.
You can specify floating-point literals in hexadecimal. For example, 0.125 = 2^–3 can be written as 0x1.0p-3. In hexadecimal notation, you use a p, not an e, to denote the exponent. (An e is a hexadecimal digit.) Note that the mantissa is written in hexadecimal and the exponent in decimal. The base of the exponent is 2, not 10.
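The literal forms from the preceding paragraphs can be compared in a short sketch. The class and constant names are assumptions for illustration; the literal values come from the text.

```java
class FloatingLiteralDemo
{
    static final float SINGLE = 3.14F;        // float literal
    static final double PLAIN = 3.14;         // double by default
    static final double SUFFIXED = 3.14D;     // explicit double suffix
    static final double EXPONENT = 1.729E3;   // decimal exponent
    static final double HEX_EIGHTH = 0x1.0p-3; // hexadecimal mantissa, binary exponent

    public static void main(String[] args)
    {
        System.out.println(PLAIN == SUFFIXED); // true
        System.out.println(EXPONENT);          // 1729.0
        System.out.println(HEX_EIGHTH);        // 0.125
        // The float value nearest to 3.14 differs from the double value nearest to 3.14.
        System.out.println(SINGLE == PLAIN);   // false
    }
}
```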
All floating-point computations follow the IEEE 754 specification. In particular, there are three special floating-point values to denote overflows and errors:
- Positive infinity
- Negative infinity
- NaN (not a number)
For example, the result of dividing a positive floating-point number by 0 is positive infinity. Dividing 0.0 by 0 or taking the square root of a negative number yields NaN.
The constants Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, and Double.NaN (as well as corresponding Float constants) represent these special values, but they are rarely used in practice. In particular, you cannot test
if (x == Double.NaN) // is never true
to check whether a particular result equals Double.NaN. All “not a number” values are considered distinct. However, you can use the Double.isNaN method:
if (Double.isNaN(x)) // check whether x is "not a number"
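The special values can be produced and tested as in the following sketch. The class name SpecialValuesDemo and the helper divide are assumptions made for this illustration.

```java
class SpecialValuesDemo
{
    // Floating-point division never throws; it yields a special value instead.
    static double divide(double x, double y)
    {
        return x / y;
    }

    public static void main(String[] args)
    {
        double notANumber = divide(0.0, 0);

        System.out.println(divide(1.0, 0));                // Infinity
        System.out.println(divide(-1.0, 0));               // -Infinity
        System.out.println(notANumber);                    // NaN
        System.out.println(Math.sqrt(-1));                 // NaN
        System.out.println(notANumber == Double.NaN);      // false
        System.out.println(Double.isNaN(notANumber));      // true
    }
}
```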
Floating-point numbers are not suitable for financial calculations in which roundoff errors cannot be tolerated. For example, the command System.out.println(2.0 - 1.1) prints 0.8999999999999999, not 0.9 as you would expect. Such roundoff errors are caused by the fact that floating-point numbers are represented in the binary number system. There is no precise binary representation of the fraction 1/10, just as there is no accurate representation of the fraction 1/3 in the decimal system. If you need precise numerical computations without roundoff errors, use the BigDecimal class.
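As a brief sketch of the difference, the same subtraction can be carried out with both double and BigDecimal. The class name and the helper difference are assumptions. The BigDecimal values are constructed from strings here, because constructing them from double values would carry the binary roundoff along.

```java
import java.math.BigDecimal;

class RoundoffDemo
{
    // Subtracts two decimal numbers given as strings, without roundoff.
    static BigDecimal difference(String a, String b)
    {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    public static void main(String[] args)
    {
        System.out.println(2.0 - 1.1);                // 0.8999999999999999
        System.out.println(difference("2.0", "1.1")); // 0.9
    }
}
```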
The char Type
The char type was originally intended to describe individual characters. However, this is no longer the case. Nowadays, some Unicode characters can be described with one char value, and other Unicode characters require two char values.
Literal values of type char are enclosed in single quotes. For example, 'A' is a character constant with value 65. It is different from "A", a string containing a single character. Values of type char can be expressed as hexadecimal values that run from \u0000 to \uFFFF.
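The following sketch illustrates these points. The class name CharLiteralDemo and the helper codeOf are assumptions chosen for this example.

```java
class CharLiteralDemo
{
    // A char converts to int without a cast; the result is its UTF-16 code unit value.
    static int codeOf(char c)
    {
        return c;
    }

    public static void main(String[] args)
    {
        char letter = 'A';
        System.out.println(codeOf(letter));              // 65
        System.out.println(letter == '\u0041');          // true
        System.out.println((char) (letter + 1));         // B
        System.out.println("A".length());                // 1
        System.out.println("A".charAt(0) == letter);     // true
    }
}
```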
Besides the \u escape sequences, there are several escape sequences for special characters. You can use these escape sequences inside quoted character literals and strings, such as '\u005B' or "Hello\n". The \u escape sequence (but none of the other escape sequences) can even be used outside quoted character constants and strings. For example,
public static void main(String\u005B\u005D args)
is perfectly legal: \u005B and \u005D are the encodings for [ and ].
Escape sequence   Name                                                        Unicode value
\b                Backspace                                                   \u0008
\t                Tab                                                         \u0009
\n                Line feed                                                   \u000a
\r                Carriage return                                             \u000d
\f                Form feed                                                   \u000c
\"                Double quote                                                \u0022
\'                Single quote                                                \u0027
\\                Backslash                                                   \u005c
\s                Space (used in text blocks to retain trailing whitespace)   \u0020
\newline          In text blocks only: join this line with the next           none
Unicode escape sequences are processed before the code is parsed. For example, "\u0022+\u0022" is not a string consisting of a plus sign surrounded by quotation marks (U+0022). Instead, the \u0022 are converted into " before parsing, yielding ""+"", or an empty string.
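A short sketch makes the difference visible by comparing the Unicode escape with the ordinary \" escape sequence. The class and constant names are assumptions for illustration.

```java
class UnicodeEscapeDemo
{
    // Both Unicode escapes turn into quotation marks before parsing,
    // so the compiler sees two empty strings joined by +.
    static final String EMPTY = "\u0022+\u0022";

    // The backslash-quote escape is handled inside the string literal instead.
    static final String QUOTED_PLUS = "\"+\"";

    public static void main(String[] args)
    {
        System.out.println(EMPTY.length());       // 0
        System.out.println(QUOTED_PLUS.length()); // 3
        System.out.println(QUOTED_PLUS);          // "+"
    }
}
```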
Even more insidiously, you must beware of \u inside comments. The comment
// \u000A is a newline
yields a syntax error since \u000A is replaced with a newline when the program is read. Similarly, a comment
// look inside c:\users
yields a syntax error because the \u is not followed by four hex digits.
You can have any number of u in a Unicode escape sequence: \u00E9 and \uuu00E9 both denote the character é. There is a reason for this oddity. Consider a programmer happily coding in Unicode who is forced, for some archaic reason, to check in code as ASCII only. A conversion tool can turn any character > U+007F into a Unicode escape and add a u to every existing Unicode escape. That makes the conversion reversible. For example, \uD800 é is turned into \uuD800 \u00E9 and can be converted back to \uD800 é.
Unicode and the char Type
To fully understand the char type, you have to know about the Unicode encoding scheme. Before Unicode, there were many different character encoding standards: ASCII in the United States, ISO 8859-1 for Western European languages, KOI-8 for Russian, GB18030 and BIG-5 for Chinese, and so on. This caused two problems. First, a particular code value corresponds to different letters in the different encoding schemes. Second, the encodings for languages with large character sets have variable length: Some common characters are encoded as single bytes, others require two or more bytes.
Unicode was designed to solve both problems. When the unification effort started in the 1980s, a fixed 2-byte code was more than sufficient to encode all characters used in all languages in the world, with room to spare for future expansion (or so everyone thought at the time). In 1991, Unicode 1.0 was released, using slightly less than half of the available 65,536 code values. Java was designed from the ground up to use 16-bit Unicode characters, which was a major advance over other programming languages that used 8-bit characters.
Unfortunately, over time, the inevitable happened. Unicode grew beyond 65,536 characters, primarily due to the addition of a very large set of ideographs used for Chinese, Japanese, and Korean. Now, the 16-bit char type is insufficient to describe all Unicode characters.
We need a bit of terminology to explain how this problem is resolved in Java. A code point is an integer value associated with a character in an encoding scheme. In the Unicode standard, code points are written in hexadecimal and prefixed with U+, such as U+0041 for the code point of the Latin letter A. Unicode has code points that are grouped into 17 code planes, each holding 65536 characters. The first code plane, called the basic multilingual plane, consists of the “classic” Unicode characters with code points U+0000 to U+FFFF. Sixteen additional planes, with code points U+10000 to U+10FFFF, hold many more characters called supplementary characters.
How a Unicode code point (that is, an integer ranging from 0 to hexadecimal 10FFFF) is represented in bits depends on the character encoding. You could encode each character as a sequence of 21 bits, but that is impractical for computer hardware. The UTF-32 encoding simply places each code point into 32 bits, where the top 11 bits are zero. That is rather wasteful. The most common encoding on the Internet is UTF-8, using between one and four bytes per character.
Java strings use the UTF-16 encoding. It encodes all Unicode code points in a variable-length code of 16-bit units, called code units. The characters in the basic multilingual plane are encoded as a single code unit. All other characters are encoded as consecutive pairs of code units. Each of the code units in such an encoding pair falls into a range of 2048 unused values of the basic multilingual plane, called the surrogates area ('\uD800' to '\uDBFF' for the first code unit, '\uDC00' to '\uDFFF' for the second code unit). This is rather clever, because you can immediately tell whether a code unit encodes a single character or it is the first or second part of a supplementary character. For example, the beer mug emoji 🍺 has code point U+1F37A and is encoded by the two code units '\uD83C' and '\uDF7A'. Each code unit is stored as a char value. The details are not important. All you need to know is that a single Unicode character may require one or two char values.
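For the curious, the beer mug example can be reproduced in a few lines. This is a sketch: the class name and the helpers firstUnit and secondUnit are assumptions, and the arithmetic in them is the standard UTF-16 rule for supplementary characters, which the text above does not spell out. Character.toChars, Character.isHighSurrogate, Character.isLowSurrogate, and Character.toCodePoint are standard library methods.

```java
class SurrogateDemo
{
    static final int BEER_MUG = 0x1F37A;

    // First code unit of a supplementary character:
    // 0xD800 plus the top 10 bits of (codePoint - 0x10000).
    static char firstUnit(int codePoint)
    {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Second code unit: 0xDC00 plus the bottom 10 bits of (codePoint - 0x10000).
    static char secondUnit(int codePoint)
    {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    public static void main(String[] args)
    {
        char[] units = Character.toChars(BEER_MUG);
        System.out.println(units.length);                        // 2
        System.out.println(Integer.toHexString(units[0]));       // d83c
        System.out.println(Integer.toHexString(units[1]));       // df7a
        System.out.println(units[0] == firstUnit(BEER_MUG));     // true
        System.out.println(units[1] == secondUnit(BEER_MUG));    // true
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
        System.out.println(Character.toCodePoint(units[0], units[1]) == BEER_MUG); // true
    }
}
```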
You cannot ignore characters with code points above U+FFFF. Your customers may well write in a language where these characters are needed, or they may be fond of putting emojis such as 🍺 into their messages.
Nowadays, Unicode has become so complex that even code points no longer correspond to what a human viewer would perceive as a single character or symbol. This happens with languages whose characters are made from smaller building blocks, with emojis that can have modifiers for gender and skin tone, and with an ever-growing number of other compositions.
Consider the Italian flag emoji 🇮🇹. You perceive a single symbol: the flag of Italy. However, this symbol is composed of two Unicode code points: U+1F1EE (regional indicator symbol letter I) and U+1F1F9 (regional indicator symbol letter T). About 250 flags can be formed with these regional indicators. The pirate flag, on the other hand, is composed of U+1F3F4 (waving black flag), U+200D (zero width joiner), U+2620 (skull and crossbones), and U+FE0F (variation selector-16). In Java, you need four char values to represent the first flag, five for the second.
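The counts for the two flags can be verified with a sketch like the following. The class name FlagDemo and the helper fromCodePoints are assumptions; the String constructor that accepts code points and the codePointCount method are part of the standard library.

```java
class FlagDemo
{
    // Builds a string from a sequence of Unicode code points.
    static String fromCodePoints(int... codePoints)
    {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args)
    {
        String italy = fromCodePoints(0x1F1EE, 0x1F1F9);
        String pirate = fromCodePoints(0x1F3F4, 0x200D, 0x2620, 0xFE0F);

        System.out.println(italy.length());                           // 4 char values
        System.out.println(italy.codePointCount(0, italy.length()));  // 2 code points
        System.out.println(pirate.length());                          // 5 char values
        System.out.println(pirate.codePointCount(0, pirate.length())); // 4 code points
    }
}
```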
In summary, a visible character or symbol is encoded as a sequence of some number of char values, and there is almost never a need to look at the individual values. Always work with strings and don't worry about their representation as char sequences.
The boolean Type
The boolean type has two values, false and true. It is used for evaluating logical conditions. You cannot convert between integers and boolean values.
In C++, numbers and even pointers can be used in place of boolean values. The value 0 is equivalent to the bool value false, and a nonzero value is equivalent to true. This is not the case in Java. Thus, Java programmers are shielded from accidents such as
if (x = 0) // oops... meant x == 0
In C++, this test compiles and runs, always evaluating to false. In Java, the test does not compile because the integer expression x = 0 cannot be converted to a boolean value.
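Since there is no conversion between integers and boolean values, any such mapping has to be written out explicitly. The following sketch shows one way to do that; the class name BooleanDemo and the helpers isZero and toInt are assumptions made for this illustration.

```java
class BooleanDemo
{
    // An explicit comparison produces the boolean value.
    static boolean isZero(int x)
    {
        return x == 0;
    }

    // An explicit conditional expression maps a boolean to an integer.
    static int toInt(boolean flag)
    {
        return flag ? 1 : 0;
    }

    public static void main(String[] args)
    {
        int x = 5;
        // Writing if (x = 0) here would not compile,
        // because an int expression cannot be used as a condition.
        if (isZero(x))
        {
            System.out.println("x is zero");
        }
        System.out.println(isZero(x));        // false
        System.out.println(toInt(true));      // 1
        System.out.println(toInt(isZero(x))); // 0
    }
}
```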