hadoop - Java: Downloading .zip files from an FTP and extracting the contents without saving the files on the local system


I have a requirement where I need to download .zip files from an FTP server and push the contents of each archive (the contents are XML files) to HDFS (Hadoop Distributed File System). As of now, I'm using the Apache Commons Net FTPClient to connect to the FTP server and download the files to the local machine first. Later I unzip them and hand the folder path to a method that iterates over the local folder and pushes the files to HDFS. For easy understanding, I'm attaching code snippets below.

    // Gives me an active FTPClient
    FTPClient ftpClient = getActiveFtpConnection();
    ftpClient.changeWorkingDirectory(remoteDirectory);

    FTPFile[] ftpFiles = ftpClient.listFiles();
    if (ftpFiles.length <= 0) {
        logger.info("Unable to find files in the given location!!");
        return;
    }
    // Iterate over the files
    for (FTPFile eachFtpFile : ftpFiles) {
        String ftpFileName = eachFtpFile.getName();

        // Skip files that are not .zip files
        if (!ftpFileName.endsWith(".zip")) {
            continue;
        }

        System.out.println("Reading file --> " + ftpFileName);
        /*
         * 'location' is a path on the local system given by the user,
         * loaded from a property file.
         *
         * Create archiveFileLocation for the archived files
         * downloaded from FTP.
         */
        String archiveFileLocation = location + File.separator + ftpFileName;
        // Strip only the trailing ".zip" (a bare ".zip" regex would match any character before "zip")
        String localDirName = ftpFileName.replaceAll("\\.zip$", "");
        /*
         * localDirLocation is the location of the folder created with the
         * name of the archive on the FTP, into which the files are copied
         * to their respective folders.
         */
        String localDirLocation = location + File.separator + localDirName;
        File localDir = new File(localDirLocation);
        localDir.mkdir();

        File archiveFile = new File(archiveFileLocation);
        FileOutputStream archiveFileOutputStream = new FileOutputStream(archiveFile);
        ftpClient.retrieveFile(ftpFileName, archiveFileOutputStream);
        archiveFileOutputStream.close();

        // Delete the archive file after copying its contents
        FileUtils.forceDeleteOnExit(archiveFile);

        // Read the archive file from archiveFileLocation.
        ZipFile zip = new ZipFile(archiveFileLocation);
        Enumeration<? extends ZipEntry> entries = zip.entries();

        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();

            if (entry.isDirectory()) {
                logger.info("Extracting directory " + entry.getName());
                // Create the directory inside localDir, not the working directory
                (new File(localDir, entry.getName())).mkdir();
                continue;
            }

            logger.info("Extracting file: " + entry.getName());
            IOUtils.copy(zip.getInputStream(entry), new FileOutputStream(
                    localDir.getAbsolutePath() + File.separator + entry.getName()));
        }

        zip.close();

        /*
         * Iterates over the folder location provided and loads the files into HDFS
         */
        loadFilesToHdfs(localDirLocation);
    }
    disconnectFtp();

Now, the problem with this approach is that the app takes a lot of time to download the files to the local path, unzip them, and load them into HDFS. Is there a better way in which I can extract the contents of the zip from the FTP on the fly and give the stream of contents directly to the method loadFilesToHdfs(), rather than a path on the local system?

Use a zip stream. See here: http://www.oracle.com/technetwork/articles/java/compress-1565076.html

Specifically, see code sample 1 there.
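In other words, wrap the stream coming off the FTP connection in a `java.util.zip.ZipInputStream` and read each entry directly, so nothing is ever written to the local disk. Here is a minimal sketch of that streaming idea; to keep it self-contained, the FTP and HDFS ends are abstracted into a plain `InputStream` and a consumer callback (those names are my own, not from the question):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.BiConsumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipStreamDemo {

    /**
     * Reads every file entry from a zip stream and hands its name and
     * bytes to the consumer, without writing anything to local disk.
     */
    static void forEachEntry(InputStream rawZip,
                             BiConsumer<String, byte[]> consumer) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(rawZip)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // nothing to copy for directory entries
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zin.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                consumer.accept(entry.getName(), buf.toByteArray());
                zin.closeEntry();
            }
        }
    }
}
```

In your setting, `rawZip` would come from `ftpClient.retrieveFileStream(ftpFileName)` (call `completePendingCommand()` after the stream is fully read), and the consumer would write each entry's bytes into an HDFS output stream from `FileSystem.create(...)` instead of buffering them, so large entries are not held in memory either.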

