hadoop - How to reduce the number of mappers when running a Hive query?


I am using Hive.

I have 24 JSON files with a total size of 300 MB, all in one folder. I created an external table (table1) and loaded the data (the 24 files) into it.

When I run a SELECT query on that external table (table1), I see 3 mappers and 1 reducer running.

After that, I created one more external table (table2).

I compressed the input files (the folder containing the 24 files), for example with bzip2.

Compressing the 24 files produced 24 files with the “.bzip2” extension (i.e. file1.bzp2, …, file24.bzp2).
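A minimal Python sketch of that compression step, assuming the files sit in one local folder before being uploaded to the table location. It illustrates that bzip2 compression is per file, so 24 input files still give 24 output files. The function name `compress_each_file` and the folder layout are hypothetical; the sketch uses `.bz2`, the extension the bzip2 tool writes and that Hadoop's `BZip2Codec` recognizes.

```python
import bz2
import shutil
from pathlib import Path


def compress_each_file(src_dir: Path, dst_dir: Path) -> list:
    """Write a bzip2-compressed copy of every regular file in src_dir.

    Hypothetical helper: compression is per file, so N input files
    always produce N compressed output files.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        # Keep the original name and append the standard bzip2 extension.
        dst = dst_dir / (src.name + ".bz2")
        with open(src, "rb") as fin, bz2.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        outputs.append(dst)
    return outputs
```

Pointing table2's location at `dst_dir` would then show the same file count as before compression, which is why the mapper count does not drop.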

After that, I loaded the compressed files into the external table (table2).

Now, when I run the SELECT query, it takes 24 mappers and 1 reducer, and the CPU time is higher than with the uncompressed data (files).

How can I reduce the number of mappers when the data is in compressed format (i.e. the table2 SELECT query)?

How can I reduce the CPU time when the data is in compressed format (i.e. the table2 SELECT query)? How does CPU time affect performance?

The number of mappers can be less than the number of files only if the files are on the same data node. If the files are located on different data nodes, the number of mappers will never be less than the number of files. Concatenate all or some of the files and put them into the table location; use the cat command to concatenate non-compressed files. You got 24 mappers because you have 24 files. The parameters mapreduce.input.fileinputformat.split.minsize / maxsize are for splitting bigger files.
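A rough Python sketch of why the file count drives the mapper count. It follows the split-size formula in Hadoop's `FileInputFormat` (`max(minSize, min(maxSize, blockSize))`) and its 1.1 "slop" rule, but it deliberately ignores input formats that combine small files (such as Hive's `CombineHiveInputFormat`), so its numbers are per-file estimates only. The function names and the 128 MB block size are assumptions for illustration.

```python
# FileInputFormat keeps carving full splits while the remainder is more
# than 10% larger than the split size (its SPLIT_SLOP constant).
SPLIT_SLOP = 1.1
MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # default upper bound for split.maxsize


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Same formula as FileInputFormat.computeSplitSize."""
    return max(min_size, min(max_size, block_size))


def splits_for_file(length: int, split_size: int, splittable: bool = True) -> int:
    """Number of input splits (one map task each) that a single file yields."""
    if length == 0 or not splittable:
        # Empty or non-splittable files still yield exactly one split.
        return 1
    count = 0
    remaining = length
    while remaining / split_size > SPLIT_SLOP:
        count += 1
        remaining -= split_size
    if remaining > 0:
        count += 1
    return count


def estimate_mappers(file_sizes, block_size=128 * MB, min_size=1,
                     max_size=LONG_MAX, splittable=True) -> int:
    """Sum of per-file splits; assumes small files are never combined."""
    split_size = compute_split_size(block_size, max(1, min_size), max_size)
    return sum(splits_for_file(size, split_size, splittable) for size in file_sizes)


if __name__ == "__main__":
    # 24 files totalling 300 MB, as in the question (12.5 MB each).
    separate = [300 * MB // 24] * 24
    print(estimate_mappers(separate))
    # Raising split.minsize does not merge separate files.
    print(estimate_mappers(separate, min_size=256 * MB))
    # The same 300 MB concatenated into one file.
    print(estimate_mappers([300 * MB]))
    print(estimate_mappers([300 * MB], min_size=256 * MB))
```

Under these assumptions the 24 separate files give 24 splits regardless of `min_size`, while one concatenated 300 MB file gives 3 splits at a 128 MB split size and 2 with `min_size` at 256 MB. Pass `splittable=False` to model a codec Hadoop cannot split, such as gzip.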

