mapreduce - hadoop - How would input splits form if a file has only one record and the file size is larger than the block size?
An example to explain the question:

I have a file of size 500 MB (input.csv), and the file contains only one line (record). How is the file stored in HDFS blocks, and how are the input splits computed?
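For concreteness, here is a minimal standalone sketch of the block/split arithmetic for such a file, assuming the default 128 MB HDFS block size; the class name SplitMath is purely illustrative (in a real job the splits come from FileInputFormat.getSplits):

// Illustrative only: how a 500 MB file is cut into blocks / default
// FileInputFormat splits when the block size is 128 MB.
public class SplitMath {
    public static void main(String[] args) {
        long fileSize  = 500L * 1024 * 1024;   // input.csv, 500 MB
        long blockSize = 128L * 1024 * 1024;   // assumed HDFS default block size

        long offset = 0;
        int index = 0;
        while (offset < fileSize) {
            long length = Math.min(blockSize, fileSize - offset);
            System.out.printf("split %d: offset=%d length=%d%n", index++, offset, length);
            offset += length;
        }
        // Prints 4 splits: three of 128 MB and a final one of 116 MB.
    }
}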
You should check this link: How does Hadoop process records split across block boundaries? Pay attention to the 'remote read' mentioned there.

The single record mentioned in the question is stored across several blocks. If you read it with TextInputFormat, the mapper has to perform remote reads across those blocks to process the record.
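Roughly, the rule that makes this work is: the reader for every split except the first skips everything up to the first newline after its start, and every reader keeps reading past its split's end until it finishes the current line. The sketch below imitates that rule with plain Java I/O; it is not the actual Hadoop LineRecordReader code, and the class name and argument layout (file path, split start, split end) are made up for illustration:

import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of the "finish the current line" rule; not Hadoop source code.
public class LineOverSplits {
    public static void main(String[] args) throws IOException {
        long splitStart = Long.parseLong(args[1]);
        long splitEnd   = Long.parseLong(args[2]);

        try (RandomAccessFile in = new RandomAccessFile(args[0], "r")) {
            long pos = splitStart;
            if (splitStart != 0) {
                // Not the first split: skip the partial line that the previous
                // split's reader is responsible for finishing.
                in.seek(splitStart);
                while (pos < in.length() && in.read() != '\n') pos++;
                pos++;
            }
            // Emit every line that *starts* before splitEnd, even if its body
            // runs past splitEnd -- in HDFS those extra bytes live in other
            // blocks, i.e. the remote reads mentioned above.
            while (pos < splitEnd && pos < in.length()) {
                in.seek(pos);
                long lineLen = 0;
                int b;
                while ((b = in.read()) != -1 && b != '\n') lineLen++;
                System.out.println("record at offset " + pos + ", length " + lineLen);
                pos += lineLen + 1;   // move past the newline (or EOF)
            }
        }
    }
}

For the 500 MB single-line file, only the reader for the first split emits a record; the readers for the remaining splits skip to end of file and emit nothing, and the bytes the first reader pulls from the other three blocks are the remote reads.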