Package com.gpudb.filesystem.common
Class FileOperation
java.lang.Object
    com.gpudb.filesystem.common.FileOperation
Direct Known Subclasses:
FileDownloader, FileUploader
public class FileOperation extends Object
This is an internal class and is not meant to be used by end users of the filesystem API. The consequences of using this class directly in client code are not guaranteed and may be undesirable.
This is the base class from which the classes FileUploader and FileDownloader are derived. It models basic functions such as searching directories for specific file patterns, retrieving file attributes, and determining transfer strategies (single-part vs. multi-part).
-
-
Field Summary
Fields
Modifier and Type, Field - Description
protected GPUdb db
protected String dirName - Target directory name (KiFS for upload, local for download).
protected GPUdbFileHandler.Options fileHandlerOptions
protected List<String> fileNames - The list of source file names/patterns to be processed.
protected FullFileBatchManager fullFileBatchManager
protected List<String> fullFileList - List of file names that can be downloaded in full.
protected List<String> multiPartList - List of file names that are multi-part uploads/downloads.
protected List<String> multiPartRemoteFileNames
protected Set<String> namesOfFilesUploaded - The list of files uploaded, used by the FileIngestor to insert records from the uploaded files into the database.
protected OpMode opMode
protected boolean recursive - Indicates whether the search for files is to be recursive through a local directory hierarchy.
-
Constructor Summary
Constructors
Constructor - Description
FileOperation(GPUdb db, OpMode opMode, List<String> fileNames, String dirName, boolean recursive, GPUdbFileHandler.Options fileHandlerOptions) - Constructs a new file operation instance, managing the transfer of a set of files to a target directory.
-
Method Summary
Methods
Modifier and Type, Method - Description
protected void decideMultiPart() - Resolves file names and categorizes them into single-part or multi-part transfers.
protected List<KifsFileInfo> getFileInfoFromServer(String path) - Retrieves the file stats for the files residing in KiFS.
static String getKifsPathSeparator() - Deprecated, for removal: This API element is subject to removal in a future version.
Set<String> getNamesOfFilesUploaded() - This method returns the list of file names uploaded without the path information.
static boolean localDirExists(String localDirName) - Checks if a local directory exists or not.
static boolean localFileExists(String localFileName) - Checks if a local file exists or not.
static List<org.apache.commons.lang3.tuple.Triple<String,String,String>> parseFileNames(List<String> fileNamesToParse) - Parses the given file paths into structured path components.
protected org.apache.commons.lang3.tuple.Pair<List<String>,List<String>> searchLocalDirectories(String baseDir, String pattern) - Searches the local filesystem starting from a specified base directory using standard Java NIO glob patterns.
protected void sortFilesIntoFullAndMultipartLists(List<String> fileList, List<String> remoteFileList) - Buckets files to be uploaded into full or multi-part groups, based on local file sizes.
-
-
-
Field Detail
-
opMode
protected final OpMode opMode
-
db
protected final GPUdb db
-
recursive
protected final boolean recursive
Indicates whether the search for files is to be recursive through a local directory hierarchy.
-
namesOfFilesUploaded
protected Set<String> namesOfFilesUploaded
The list of files uploaded, used by the FileIngestor to insert records from the uploaded files into the database.
-
dirName
protected String dirName
Target directory name (KiFS for upload, local for download).
-
multiPartList
protected List<String> multiPartList
List of file names that are multi-part uploads/downloads.
-
fullFileBatchManager
protected FullFileBatchManager fullFileBatchManager
-
fileHandlerOptions
protected final GPUdbFileHandler.Options fileHandlerOptions
-
-
Constructor Detail
-
FileOperation
public FileOperation(GPUdb db, OpMode opMode, List<String> fileNames, String dirName, boolean recursive, GPUdbFileHandler.Options fileHandlerOptions) throws GPUdbException
Constructs a new file operation instance, managing the transfer of a set of files to a target directory.
Parameters:
db - The GPUdb instance used to access KiFS.
opMode - Indicates whether this is an upload or download operation.
fileNames - List of source file names.
dirName - Name of the local or remote target directory, depending on whether this is a download or an upload operation.
recursive - Indicates whether any directories given in fileNames should be searched for files recursively.
fileHandlerOptions - Options for setting up the files for transfer.
Throws:
GPUdbException - Propagates exceptions raised by the argument validations.
-
-
Method Detail
-
getNamesOfFilesUploaded
public Set<String> getNamesOfFilesUploaded()
Returns the set of uploaded file names without the path information. It is used by the FileIngestor.ingestFromFiles() method to prepare the list of file names passed to the GPUdb.insertRecordsFromFiles(InsertRecordsFromFilesRequest) endpoint.
Returns:
A set of file names without the path information.
-
decideMultiPart
protected void decideMultiPart() throws GPUdbException
Resolves file names and categorizes them into single-part or multi-part transfers.
Throws:
GPUdbException
-
sortFilesIntoFullAndMultipartLists
protected void sortFilesIntoFullAndMultipartLists(List<String> fileList, List<String> remoteFileList)
Buckets files to be uploaded into full or multi-part groups, based on local file sizes.
Parameters:
fileList - A list of source file names to triage by size.
remoteFileList - A list of target file names for the given sources.
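The size-based triage described above can be pictured with a minimal sketch. It is not the GPUdb implementation: the class name SizeBucketSketch, the method sortBySize, and the thresholdBytes cut-off are all hypothetical, and the real method also receives the remote file names, which this sketch leaves out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and threshold are assumptions, not GPUdb API.
class SizeBucketSketch {
    // Files small enough to be sent in one piece.
    final List<String> fullFileList = new ArrayList<>();
    // Files large enough to be sent in several parts.
    final List<String> multiPartList = new ArrayList<>();

    // thresholdBytes is a hypothetical cut-off supplied by the caller.
    void sortBySize(List<String> fileList, long thresholdBytes) {
        for (String fileName : fileList) {
            // File.length() returns 0 for a file that does not exist.
            long size = new File(fileName).length();
            if (size > thresholdBytes) {
                multiPartList.add(fileName);
            } else {
                fullFileList.add(fileName);
            }
        }
    }
}
```

With a threshold of 50 bytes, a 10-byte file would land in fullFileList and a 100-byte file in multiPartList.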
-
searchLocalDirectories
protected org.apache.commons.lang3.tuple.Pair<List<String>,List<String>> searchLocalDirectories(String baseDir, String pattern) throws IOException
Searches the local filesystem starting from a specified base directory using standard Java NIO glob patterns. This method resolves local file paths and calculates their corresponding target remote paths, preserving the relative directory structure found during the search.
This method leverages FileSystem.getPathMatcher(String) and supports the full standard glob syntax.
Supported Glob Patterns
- *.java - Matches any file ending with the specific extension.
- * - Matches any number of characters within a single path component (e.g., *.csv matches all CSV files).
- ** - Matches any number of characters across directory boundaries. Used implicitly if the recursive flag is set, but can be used explicitly (e.g., **/test/*.xml).
- ? - Matches exactly one character (e.g., data_?.txt matches data_1.txt but not data_10.txt).
- {sun,moon,stars} - Matches any of the comma-separated subpatterns (e.g., *.{jpg,png} matches both JPG and PNG files).
- [A-Z] - Matches any single uppercase letter in the range (e.g., grade_[A-F].txt).
- [0-9] - Matches any single digit (e.g., file[0-9].log).
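The patterns listed above can be tried out directly against the standard java.nio.file API. The following is a minimal sketch, not the code used by this class; the GlobSketch class and its matches helper are hypothetical names introduced only for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Illustrative helper; not part of the GPUdb API.
class GlobSketch {

    // Returns true if the given (relative) path matches the glob pattern.
    static boolean matches(String glob, String path) {
        // The "glob:" prefix selects glob syntax (as opposed to "regex:").
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.csv", "sales.csv"));             // true
        System.out.println(matches("*.csv", "2023/sales.csv"));        // false: * stays within one component
        System.out.println(matches("**/*.csv", "2023/jan/sales.csv")); // true: ** crosses directories
        System.out.println(matches("data_?.txt", "data_10.txt"));      // false: ? is exactly one character
        System.out.println(matches("*.{jpg,png}", "photo.png"));       // true
    }
}
```

Note that the matcher tests the whole path it is given, so a pattern without ** only matches files directly at the level being compared.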
Remote Path Calculation
For every file found, the method calculates a "Remote Path" to ensure the directory structure is mirrored on the destination (KiFS).
Logic: TargetRemoteDir + (FoundFilePath - BaseSearchDir)
Example:
Base Dir: /data/logs
Found File: /data/logs/2023/jan/access.log
Target Remote Dir: /kifs/backup
Result: /kifs/backup/2023/jan/access.log
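The calculation above can be sketched with Path.relativize. This is an illustration under assumptions rather than the method's actual code: RemotePathSketch and remotePath are hypothetical names, and "/" is assumed as the remote separator because that is what the example paths use.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative sketch; not part of the GPUdb API.
class RemotePathSketch {

    // Computes TargetRemoteDir + (FoundFilePath - BaseSearchDir).
    // Both local paths must be of the same kind (both absolute or both relative),
    // otherwise Path.relativize throws IllegalArgumentException.
    static String remotePath(String baseSearchDir, String foundFilePath, String targetRemoteDir) {
        Path relative = Paths.get(baseSearchDir).relativize(Paths.get(foundFilePath));

        // Drop a trailing separator so the result has no doubled "/".
        String prefix = targetRemoteDir.endsWith("/")
                ? targetRemoteDir.substring(0, targetRemoteDir.length() - 1)
                : targetRemoteDir;

        // Join the relative segments with "/" regardless of the local OS separator.
        StringBuilder remote = new StringBuilder(prefix);
        for (Path segment : relative) {
            remote.append('/').append(segment.toString());
        }
        return remote.toString();
    }

    public static void main(String[] args) {
        // prints /kifs/backup/2023/jan/access.log
        System.out.println(remotePath("/data/logs", "/data/logs/2023/jan/access.log", "/kifs/backup"));
    }
}
```

Iterating over the relative path segment by segment, instead of calling toString() on it, keeps the remote path independent of the local platform's separator.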
Parameters:
baseDir - The absolute or relative path to the local directory where the search begins.
pattern - The glob pattern to match against file names (e.g., "*.csv", "data_2023_*.{json,xml}").
Returns:
A Pair where:
- Left: A list of absolute local file paths found.
- Right: A list of corresponding full remote target paths.
Throws:
IOException - If an I/O error occurs during the file walk (e.g., permission denied).
See Also:
FileSystem.getPathMatcher(String)
-
parseFileNames
public static List<org.apache.commons.lang3.tuple.Triple<String,String,String>> parseFileNames(List<String> fileNamesToParse)
Parses the given file paths into structured path components. It resolves the file names, normalizes them, and returns a corresponding list of absolute paths.
Parameters:
fileNamesToParse - List of file names to parse.
Returns:
A list of Triple objects where the first element is the root of the file path, the second is the full path without the file name, and the third is the file name itself.
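To make the three components concrete, here is a minimal sketch of one way to split a path into root, parent path, and file name with java.nio.file. It is an assumption about the general approach, not the library's code: ParseSketch and parse are hypothetical names, and a plain three-element list stands in for the Triple so the example has no external dependency.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; not part of the GPUdb API.
class ParseSketch {

    // Returns [root, path without the file name, file name] for one file name.
    static List<String> parse(String fileName) {
        // Resolve against the working directory and remove "." and ".." segments.
        Path path = Paths.get(fileName).toAbsolutePath().normalize();

        String root = path.getRoot() == null ? "" : path.getRoot().toString();
        String parent = path.getParent() == null ? "" : path.getParent().toString();
        String name = path.getFileName() == null ? "" : path.getFileName().toString();
        return Arrays.asList(root, parent, name);
    }

    public static void main(String[] args) {
        // The ".." segment is removed by normalize() before the path is split.
        System.out.println(parse("/data/logs/../logs/access.log"));
    }
}
```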
-
localDirExists
public static boolean localDirExists(String localDirName)
Checks whether a local directory exists.
Parameters:
localDirName - Name of the local directory to check for.
Returns:
True if the directory exists; false if it does not exist or if localDirName is null or empty.
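The return contract above (false for null, empty, or missing directories) could be met by a check along these lines. This is a sketch of the documented behavior only, written against java.nio.file; the LocalDirSketch class is a hypothetical stand-in and may differ from the actual implementation.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustrative sketch; not part of the GPUdb API.
class LocalDirSketch {

    // Returns true only if the name is non-empty and refers to an existing directory.
    static boolean localDirExists(String localDirName) {
        if (localDirName == null || localDirName.isEmpty()) {
            return false;
        }
        // Files.isDirectory returns false both for missing paths and for regular files.
        return Files.isDirectory(Paths.get(localDirName));
    }

    public static void main(String[] args) {
        System.out.println(localDirExists(System.getProperty("java.io.tmpdir")));
        System.out.println(localDirExists(""));
    }
}
```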
-
localFileExists
public static boolean localFileExists(String localFileName)
Checks whether a local file exists. If the file name is a wildcard pattern, the check is skipped.
Parameters:
localFileName - Name of the file.
Returns:
True if the file exists, false otherwise.
-
getKifsPathSeparator
@Deprecated(since="7.2.3", forRemoval=true) public static String getKifsPathSeparator()
Deprecated, for removal: This API element is subject to removal in a future version.
Use GPUdbFileHandler.KIFS_PATH_SEPARATOR directly instead.
Returns:
The separator character used by KiFS between a directory and the files it contains; can also be used in file names to create "virtual" subdirectories.
-
getFileInfoFromServer
protected List<KifsFileInfo> getFileInfoFromServer(String path) throws GPUdbException
Retrieves the file stats for the files residing in KiFS.
Parameters:
path - Name of the KiFS file or directory of files to retrieve info on.
Returns:
List of KifsFileInfo objects for the KiFS file(s) found.
Throws:
GPUdbException - If the KiFS lookup fails.
-
-