mapreduce - hadoop - how would input splits form if a file has only one record and the file size is larger than the block size?


An example to explain the question:

I have a file of size 500 MB (input.csv).

The file contains only 1 line (record).

How is this file stored in HDFS blocks, and how are the input splits computed?

You should check this link: How does Hadoop process records split across block boundaries? Pay attention to the 'remote read' mentioned there.

The single record mentioned in the question is stored across many blocks. If you read it with TextInputFormat, the mapper has to perform remote reads across blocks to process the record.
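To make this concrete, here is a minimal sketch that prints the splits TextInputFormat would compute for such a file. The path /user/hadoop/input.csv and the 128 MB block size are assumptions for illustration, not part of the original question, and the file must actually exist in HDFS for getSplits() to return anything.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch: inspect the input splits TextInputFormat would produce
// for input.csv (hypothetical path, assumed 128 MB block size).
public class SplitInspector {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setInputFormatClass(TextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/user/hadoop/input.csv"));

        // With a 128 MB block size, a 500 MB file occupies 4 HDFS blocks
        // (3 x 128 MB + 1 x ~116 MB). By default getSplits() returns one
        // split per block, regardless of where record boundaries fall.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        for (InputSplit split : splits) {
            System.out.println(split); // printed as file:start+length
        }
    }
}
```

Under these assumptions you would see 4 splits, but only the mapper for the split containing the record's start (offset 0) emits the record: its LineRecordReader keeps reading past the end of its split, remotely fetching the blocks stored on other datanodes, until it reaches a newline or end of file. The mappers for the remaining splits skip ahead to the next line start, find none before the file ends, and emit no records.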

