hadoop - Java: Downloading .zip files from an FTP and extracting the contents without saving the files on the local system
I have a requirement where I need to download .zip files from an FTP server and push the contents of the archives (the contents are XML files) to HDFS (Hadoop Distributed File System). As of now, I'm using the Apache Commons Net FTPClient to connect to the FTP server and download the files to the local machine first. Later I unzip them and pass the folder path to a method that iterates over the local folder and pushes the files to HDFS. For easier understanding, I'm attaching code snippets below.
    // Gives me an active FTPClient
    FTPClient ftpClient = getActiveFTPConnection();
    ftpClient.changeWorkingDirectory(remoteDirectory);
    FTPFile[] ftpFiles = ftpClient.listFiles();
    if (ftpFiles.length <= 0) {
        logger.info("Unable to find files in the given location!!");
        return;
    }
    // Iterate over the files
    for (FTPFile eachFtpFile : ftpFiles) {
        String ftpFileName = eachFtpFile.getName();
        // Skip files that are not .zip files
        if (!ftpFileName.endsWith(".zip")) {
            continue;
        }
        System.out.println("Reading file -->" + ftpFileName);
        /*
         * location is a path on the local system given by the user,
         * loaded from a property file.
         *
         * Create archiveFileLocation for the archived files
         * downloaded from FTP.
         */
        String archiveFileLocation = location + File.separator + ftpFileName;
        String localDirName = ftpFileName.replaceAll(".zip", "");
        /*
         * localDirLocation is the location of the folder created with the
         * name of the archive on the FTP, into which the files are copied.
         */
        String localDirLocation = location + File.separator + localDirName;
        File localDir = new File(localDirLocation);
        localDir.mkdir();

        File archiveFile = new File(archiveFileLocation);
        FileOutputStream archiveFileOutputStream = new FileOutputStream(archiveFile);
        ftpClient.retrieveFile(ftpFileName, archiveFileOutputStream);
        archiveFileOutputStream.close();

        // Delete the archive file after copying its contents
        FileUtils.forceDeleteOnExit(archiveFile);

        // Read the archive file from archiveFileLocation
        ZipFile zip = new ZipFile(archiveFileLocation);
        Enumeration entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry entry = (ZipEntry) entries.nextElement();
            if (entry.isDirectory()) {
                logger.info("Extracting directory " + entry.getName());
                (new File(entry.getName())).mkdir();
                continue;
            }
            logger.info("Extracting file: " + entry.getName());
            IOUtils.copy(zip.getInputStream(entry), new FileOutputStream(
                    localDir.getAbsolutePath() + File.separator + entry.getName()));
        }
        zip.close();

        /*
         * Iterates over the folder location provided and loads the files into HDFS
         */
        loadFilesToHdfs(localDirLocation);
    }
    disconnectFTP();
Now, the problem with this approach is that the app takes a lot of time to download the files to the local path, unzip them, and load them into HDFS. Is there a better way in which I can extract the contents of the zip from the FTP on the fly and give a stream of the contents directly to the method loadFilesToHdfs(), rather than a path on the local system?
Use a zip stream. See here: http://www.oracle.com/technetwork/articles/java/compress-1565076.html
Specifically, see code sample 1 there.
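A minimal sketch of that idea, assuming Apache Commons Net's retrieveFileStream() wrapped in a java.util.zip.ZipInputStream, with each entry written straight to HDFS via the Hadoop FileSystem API. The method name streamZipToHdfs and the hdfsTargetDir parameter are illustrative, not from the original post:

    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    import org.apache.commons.net.ftp.FTPClient;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class FtpZipToHdfs {

        /**
         * Streams one remote .zip straight from FTP into HDFS, entry by entry,
         * without writing anything to the local file system.
         * hdfsTargetDir is a hypothetical destination, e.g. "/user/data/incoming".
         */
        public static void streamZipToHdfs(FTPClient ftpClient, String remoteZipName,
                                           String hdfsTargetDir, Configuration conf) throws Exception {
            FileSystem fs = FileSystem.get(conf);

            // retrieveFileStream exposes the FTP data connection as an InputStream;
            // wrapping it in a ZipInputStream lets us unzip on the fly.
            InputStream ftpIn = ftpClient.retrieveFileStream(remoteZipName);
            ZipInputStream zipIn = new ZipInputStream(ftpIn);
            try {
                ZipEntry entry;
                while ((entry = zipIn.getNextEntry()) != null) {
                    if (entry.isDirectory()) {
                        continue; // directories are implied by the HDFS paths below
                    }
                    Path target = new Path(hdfsTargetDir, entry.getName());
                    FSDataOutputStream out = fs.create(target, true);
                    try {
                        // Copy only the current entry; the 'false' flag keeps zipIn open
                        IOUtils.copyBytes(zipIn, out, conf, false);
                    } finally {
                        out.close();
                    }
                    zipIn.closeEntry();
                }
            } finally {
                zipIn.close();
            }
            // Required after retrieveFileStream to finalize the FTP transfer
            ftpClient.completePendingCommand();
        }
    }

Two things to watch with this approach: set the transfer type to binary (ftpClient.setFileType(FTP.BINARY_FILE_TYPE)) before retrieving, and always call completePendingCommand() after the stream is fully consumed, otherwise subsequent FTP commands on the same connection may hang.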