I need to read a large text file of around 5-6 GB line by line using Java.
How can I do this quickly?
A common pattern is to use
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line;
while ((line = br.readLine()) != null) {
// process the line.
}
}
You can read the data faster if you assume there is no character encoding. e.g. ASCII-7 but it won't make much difference. It is highly likely that what you do with the data will take much longer.
EDIT: A less common pattern to use which avoids the scope of line
leaking.
try(BufferedReader br = new BufferedReader(new FileReader(file))) {
for(String line; (line = br.readLine()) != null; ) {
// process the line.
}
// line is not visible here.
}
UPDATE: In Java 8 you can do
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
}
NOTE: You have to place the Stream in a try-with-resource block to ensure the #close method is called on it, otherwise the underlying file handle is never closed until GC does it much later.
Look at this blog:
Java Read File Line by Line - Java Tutorial
The buffer size may be specified, or the default size may be used. The default is large enough for most purposes.
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
fstream.close();
DataInputStream
, and the wrong stream is closed. Nothing wrong with the Java Tutorial, and no need to cite arbitrary third-party Internet rubbish like this.
Once Java 8 is out (March 2014) you'll be able to use streams:
try (Stream<String> lines = Files.lines(Paths.get(filename), Charset.defaultCharset())) {
lines.forEachOrdered(line -> process(line));
}
Printing all the lines in the file:
try (Stream<String> lines = Files.lines(file, Charset.defaultCharset())) {
lines.forEachOrdered(System.out::println);
}
StandardCharsets.UTF_8
, use Stream<String>
for conciseness, and avoid using forEach()
and especially forEachOrdered()
unless there's a reason.
forEach(this::process)
, but it gets ugly if you write blocks of code as lambdas inside forEach()
.
forEachOrdered
in order to execute in-order. Be aware that you won't be able to parallelize the stream in that case, although I've found that parallelization doesn't turn on unless the file has thousands of lines.
Here is a sample with full error handling and supporting charset specification for pre-Java 7. With Java 7 you can use try-with-resources syntax, which makes the code cleaner.
If you just want the default charset you can skip the InputStream and use FileReader.
InputStream ins = null; // raw byte-stream
Reader r = null; // cooked reader
BufferedReader br = null; // buffered for readLine()
try {
String s;
ins = new FileInputStream("textfile.txt");
r = new InputStreamReader(ins, "UTF-8"); // leave charset out for default
br = new BufferedReader(r);
while ((s = br.readLine()) != null) {
System.out.println(s);
}
}
catch (Exception e)
{
System.err.println(e.getMessage()); // handle exception
}
finally {
if (br != null) { try { br.close(); } catch(Throwable t) { /* ensure close happens */ } }
if (r != null) { try { r.close(); } catch(Throwable t) { /* ensure close happens */ } }
if (ins != null) { try { ins.close(); } catch(Throwable t) { /* ensure close happens */ } }
}
Here is the Groovy version, with full error handling:
File f = new File("textfile.txt");
f.withReader("UTF-8") { br ->
br.eachLine { line ->
println line;
}
}
ByteArrayInputStream
fed by a string literal have to do with reading a large text file?
I documented and tested 10 different ways to read a file in Java and then ran them against each other by making them read in test files from 1KB to 1GB. Here are the fastest 3 file reading methods for reading a 1GB test file.
Note that when running the performance tests I didn't output anything to the console since that would really slow down the test. I just wanted to test the raw reading speed.
1) java.nio.file.Files.readAllBytes()
Tested in Java 7, 8, 9. This was overall the fastest method. Reading a 1GB file was consistently just under 1 second.
import java.io..File;
import java.io.IOException;
import java.nio.file.Files;
public class ReadFile_Files_ReadAllBytes {
public static void main(String [] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-1GB.txt";
File file = new File(fileName);
byte [] fileBytes = Files.readAllBytes(file.toPath());
char singleChar;
for(byte b : fileBytes) {
singleChar = (char) b;
System.out.print(singleChar);
}
}
}
2) java.nio.file.Files.lines()
This was tested successfully in Java 8 and 9 but it won't work in Java 7 because of the lack of support for lambda expressions. It took about 3.5 seconds to read in a 1GB file which put it in second place as far as reading larger files.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;
public class ReadFile_Files_Lines {
public static void main(String[] pArgs) throws IOException {
String fileName = "c:\\temp\\sample-1GB.txt";
File file = new File(fileName);
try (Stream linesStream = Files.lines(file.toPath())) {
linesStream.forEach(line -> {
System.out.println(line);
});
}
}
}
3) BufferedReader
Tested to work in Java 7, 8, 9. This took about 4.5 seconds to read in a 1GB test file.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class ReadFile_BufferedReader_ReadLine {
public static void main(String [] args) throws IOException {
String fileName = "c:\\temp\\sample-1GB.txt";
FileReader fileReader = new FileReader(fileName);
try (BufferedReader bufferedReader = new BufferedReader(fileReader)) {
String line;
while((line = bufferedReader.readLine()) != null) {
System.out.println(line);
}
}
}
You can find the complete rankings for all 10 file reading methods here.
System.out.print/println()
here; you are also assuming the file will fit into memory in your first two cases.
In Java 8, you could do:
try (Stream<String> lines = Files.lines (file, StandardCharsets.UTF_8))
{
for (String line : (Iterable<String>) lines::iterator)
{
;
}
}
Some notes: The stream returned by Files.lines
(unlike most streams) needs to be closed. For the reasons mentioned here I avoid using forEach()
. The strange code (Iterable<String>) lines::iterator
casts a Stream to an Iterable.
Iterable
this code is definitively ugly although useful. It needs a cast (ie (Iterable<String>)
) to work.
for(String line : (Iterable<String>) lines.skip(1)::iterator)
Stream
features, using Files.newBufferedReader
instead of Files.lines
and repeatedly calling readLine()
until null
instead of using constructs like (Iterable<String>) lines::iterator
seems to be much simpler…
Unlike readAllLines, [File.lines] does not read all lines into a List, but instead populates lazily as the stream is consumed... The returned stream encapsulates a Reader.
What you can do is scan the entire text using Scanner and go through the text line by line. Of course you should import the following:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
public static void readText throws FileNotFoundException {
Scanner scan = new Scanner(new File("samplefilename.txt"));
while(scan.hasNextLine()){
String line = scan.nextLine();
//Here you can manipulate the string the way you want
}
}
Scanner basically scans all the text. The while loop is used to traverse through the entire text.
The .hasNextLine()
function is a boolean that returns true if there are still more lines in the text. The .nextLine()
function gives you an entire line as a String which you can then use the way you want. Try System.out.println(line)
to print the text.
Side Note: .txt is the file type text.
BufferedReader.readLine()
, and he asked for the best-performing method.
FileReader won't let you specify the encoding, use InputStreamReader
instead if you need to specify it:
try {
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), "Cp1252"));
String line;
while ((line = br.readLine()) != null) {
// process the line.
}
br.close();
} catch (IOException e) {
e.printStackTrace();
}
If you imported this file from Windows, it might have ANSI encoding (Cp1252), so you have to specify the encoding.
In Java 7:
String folderPath = "C:/folderOfMyFile";
Path path = Paths.get(folderPath, "myFileName.csv"); //or any text file eg.: txt, bat, etc
Charset charset = Charset.forName("UTF-8");
try (BufferedReader reader = Files.newBufferedReader(path , charset)) {
while ((line = reader.readLine()) != null ) {
//separate all csv fields into string array
String[] lineVariables = line.split(",");
}
} catch (IOException e) {
System.err.println(e);
}
StandardCharsets.UTF_8
to avoid the checked exception in Charset.forName("UTF-8")
In Java 8, there is also an alternative to using Files.lines()
. If your input source isn't a file but something more abstract like a Reader
or an InputStream
, you can stream the lines via the BufferedReader
s lines()
method.
For example:
try (BufferedReader reader = new BufferedReader(...)) {
reader.lines().forEach(line -> processLine(line));
}
will call processLine()
for each input line read by the BufferedReader
.
For reading a file with Java 8
package com.java.java8;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;
/**
* The Class ReadLargeFile.
*
* @author Ankit Sood Apr 20, 2017
*/
public class ReadLargeFile {
/**
* The main method.
*
* @param args
* the arguments
*/
public static void main(String[] args) {
try {
Stream<String> stream = Files.lines(Paths.get("C:\\Users\\System\\Desktop\\demoData.txt"));
stream.forEach(System.out::println);
}
catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
You can use Scanner class
Scanner sc=new Scanner(file);
sc.nextLine();
Scanner
is fine, but this answer does not include the full code to use it properly.
BufferedReader.readLine()
is certainly several times as fast. If you think otherwise please provide your reasons.
Java 9:
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(System.out::println);
}
System.getProperty("os.name").equals("Linux")
==
!
You need to use the readLine()
method in class BufferedReader
. Create a new object from that class and operate this method on him and save it to a string.
The clear way to achieve this,
For example:
If you have dataFile.txt
on your current directory
import java.io.*;
import java.util.Scanner;
import java.io.FileNotFoundException;
public class readByLine
{
public readByLine() throws FileNotFoundException
{
Scanner linReader = new Scanner(new File("dataFile.txt"));
while (linReader.hasNext())
{
String line = linReader.nextLine();
System.out.println(line);
}
linReader.close();
}
public static void main(String args[]) throws FileNotFoundException
{
new readByLine();
}
}
https://i.stack.imgur.com/W3xLx.jpg
BufferedReader br;
FileInputStream fin;
try {
fin = new FileInputStream(fileName);
br = new BufferedReader(new InputStreamReader(fin));
/*Path pathToFile = Paths.get(fileName);
br = Files.newBufferedReader(pathToFile,StandardCharsets.US_ASCII);*/
String line = br.readLine();
while (line != null) {
String[] attributes = line.split(",");
Movie movie = createMovie(attributes);
movies.add(movie);
line = br.readLine();
}
fin.close();
br.close();
} catch (FileNotFoundException e) {
System.out.println("Your Message");
} catch (IOException e) {
System.out.println("Your Message");
}
It works for me. Hope It will help you too.
You can use streams to do it more precisely:
Files.lines(Paths.get("input.txt")).forEach(s -> stringBuffer.append(s);
I usually do the reading routine straightforward:
void readResource(InputStream source) throws IOException {
BufferedReader stream = null;
try {
stream = new BufferedReader(new InputStreamReader(source));
while (true) {
String line = stream.readLine();
if(line == null) {
break;
}
//process line
System.out.println(line)
}
} finally {
closeQuiet(stream);
}
}
static void closeQuiet(Closeable closeable) {
if (closeable != null) {
try {
closeable.close();
} catch (IOException ignore) {
}
}
}
By using the org.apache.commons.io package, it gave more performance, especially in legacy code which uses Java 6 and below.
Java 7 has a better API with fewer exceptions handling and more useful methods:
LineIterator lineIterator = null;
try {
lineIterator = FileUtils.lineIterator(new File("/home/username/m.log"), "windows-1256"); // The second parameter is optionnal
while (lineIterator.hasNext()) {
String currentLine = lineIterator.next();
// Some operation
}
}
finally {
LineIterator.closeQuietly(lineIterator);
}
Maven
<!-- https://mvnrepository.com/artifact/commons-io/commons-io -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
You can use this code:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class ReadTextFile {
public static void main(String[] args) throws IOException {
try {
File f = new File("src/com/data.txt");
BufferedReader b = new BufferedReader(new FileReader(f));
String readLine = "";
System.out.println("Reading file using Buffered Reader");
while ((readLine = b.readLine()) != null) {
System.out.println(readLine);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
You can also use Apache Commons IO:
File file = new File("/home/user/file.txt");
try {
List<String> lines = FileUtils.readLines(file);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
FileUtils.readLines(file)
is a deprecated method. Additionally, the method invokes IOUtils.readLines
, which uses a BufferedReader and ArrayList. This is not a line-by-line method, and certainly not one that would be practical for reading several GB.
You can read file data line by line as below:
String fileLoc = "fileLocationInTheDisk";
List<String> lines = Files.lines(Path.of(fileLoc), StandardCharsets.UTF_8).collect(Collectors.toList());
OP
asked for it to be done quickly, which this also doesn't answer because processing line by line would be much more efficient
Success story sharing
for(String line = br.readLine(); line != null; line = br.readLine())
Btw, in Java 8 you can dotry( Stream<String> lines = Files.lines(...) ){ for( String line : (Iterable<String>) lines::iterator ) { ... } }
Which is hard not to hate.