Archiver
ArchiverError
Bases: Exception
Base exception for archiver operations.
This is the parent class for all archiver-related exceptions. Concrete implementations should raise more specific exceptions that inherit from this base class.
Examples:
>>> try:
...     archive.read_file("nonexistent.txt")
... except ArchiverError as e:
...     print(f"Archive operation failed: {e}")
ArchiverReadError
Bases: ArchiverError
Raised when reading from archive fails.
This exception is raised when:
- A file doesn't exist in the archive
- The archive is corrupted or unreadable
- Permission issues prevent reading
- The archive format is unsupported
Examples:
>>> try:
...     content = archive.read_file("missing.txt")
... except ArchiverReadError as e:
...     print(f"Failed to read file: {e}")
ArchiverWriteError
Bases: ArchiverError
Raised when writing to archive fails.
This exception is raised when:
- The archive is read-only
- Disk space is insufficient
- Permission issues prevent writing
- The archive format doesn't support the operation
Examples:
>>> try:
...     archive.write_file("new.txt", "content")
... except ArchiverWriteError as e:
...     print(f"Failed to write file: {e}")
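The hierarchy described above can be sketched as follows. The class bodies are minimal stand-ins written for illustration, not the library's actual definitions, and the `describe` helper is hypothetical. The sketch shows that catching the base class also catches both subclasses.

```python
class ArchiverError(Exception):
    """Stand-in for the base archiver exception."""


class ArchiverReadError(ArchiverError):
    """Stand-in for read failures."""


class ArchiverWriteError(ArchiverError):
    """Stand-in for write failures."""


def describe(exc: ArchiverError) -> str:
    """Hypothetical helper: classify an error by its most specific type."""
    if isinstance(exc, ArchiverReadError):
        return "read"
    if isinstance(exc, ArchiverWriteError):
        return "write"
    return "other"


# A single 'except ArchiverError' handles both specific error types.
caught = []
for error in (ArchiverReadError("missing.txt"), ArchiverWriteError("read-only")):
    try:
        raise error
    except ArchiverError as e:
        caught.append(describe(e))
```

Catching the specific subclass first and the base class last lets calling code react differently to read and write failures while still having a catch-all.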
Archiver(path: Path)
Bases: ABC
Abstract base class for archive operations.
Provides a common interface for reading, writing, and managing files within different archive formats. This class defines the contract that all concrete archive implementations must follow.
The class is designed to work with various archive formats such as ZIP, RAR, and TAR. Each concrete implementation should handle the specifics of its archive format while maintaining the same public interface.
Features
- Context manager support for automatic resource cleanup
- Unified error handling with specific exception types
- Support for both text and binary data
- Batch operations for multiple files
- Archive-to-archive copying functionality
- Built-in file existence checking
Thread Safety
The base class does not provide thread safety guarantees. Concrete implementations should document their thread safety characteristics and implement appropriate locking if needed.
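One way a concrete implementation might add locking is sketched below. `LockedStore` is a hypothetical in-memory stand-in, not part of the library; it only illustrates guarding shared state with a single `threading.Lock`.

```python
import threading


class LockedStore:
    """Hypothetical in-memory store guarded by a single lock."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        self._lock = threading.Lock()

    def write_file(self, archive_file: str, data: bytes) -> bool:
        # Serialize writers so concurrent calls cannot interleave.
        with self._lock:
            self._files[archive_file] = data
            return True

    def get_filename_list(self) -> list[str]:
        with self._lock:
            return sorted(self._files)


store = LockedStore()
threads = [
    threading.Thread(target=store.write_file, args=(f"file{i}.txt", b"x"))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A real archive format may need coarser locking, for example around an entire read-modify-write cycle of the archive file.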
Performance Considerations
- File operations are performed individually; batch operations may be more efficient for large numbers of files
- The get_filename_list() method may be expensive for large archives
- Consider caching file lists in concrete implementations
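The caching suggestion above could look roughly like this. `CachingStore` and its `scans` counter are hypothetical names used only for illustration; the point is that the cached listing is invalidated whenever the contents change.

```python
class CachingStore:
    """Hypothetical in-memory archive that caches its file list."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        self._cached_names: list[str] | None = None
        self.scans = 0  # counts how often the full listing is rebuilt

    def get_filename_list(self) -> list[str]:
        if self._cached_names is None:
            self.scans += 1
            self._cached_names = sorted(self._files)
        return list(self._cached_names)  # copy so callers cannot mutate the cache

    def write_file(self, archive_file: str, data: bytes) -> bool:
        self._files[archive_file] = data
        self._cached_names = None  # invalidate: the listing changed
        return True


store = CachingStore()
store.write_file("a.txt", b"1")
first = store.get_filename_list()
second = store.get_filename_list()  # served from the cache
store.write_file("b.txt", b"2")
third = store.get_filename_list()  # rebuilt after invalidation
```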
| ATTRIBUTE | DESCRIPTION |
|---|---|
| IMAGE_EXT_RE | Compiled regex for matching image file extensions. Matches: .jpg, .jpeg, .png, .webp, .gif (case-insensitive) |
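The exact pattern behind IMAGE_EXT_RE is not shown here. A regex with the documented behavior might look like the sketch below; the pattern itself is an assumption.

```python
import re

# Assumed pattern: the documentation lists the extensions but not the regex itself.
IMAGE_EXT_RE = re.compile(r"\.(jpe?g|png|webp|gif)$", re.IGNORECASE)

names = ["cover.JPG", "page01.png", "notes.txt", "art.webp", "gif.tar"]
images = [n for n in names if IMAGE_EXT_RE.search(n)]
```

Anchoring the pattern with `$` keeps names such as "gif.tar" from matching on a substring.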
Examples:
Implementing a concrete archiver:
>>> class MyArchiver(Archiver):
...     def read_file(self, archive_file: str) -> bytes:
...         # Implementation specific to your archive format
...         pass
...
...     def write_file(self, archive_file: str, data: str | bytes) -> bool:
...         # Implementation specific to your archive format
...         pass
...
...     # ... implement other abstract methods
Initialize an Archiver with the specified path.
Creates a new archiver instance that will operate on the specified archive file. The path validation is performed during initialization, but the actual archive operations are deferred until method calls.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to the archive file. Can be an existing file or a path where a new archive will be created. The path should include the appropriate file extension for the archive format. TYPE: Path |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the archive file doesn't exist and the archiver is expected to perform read operations on an existing archive. This is determined by the is_write_operation_expected() method. |
Note
The constructor does not immediately open or create the archive. The actual file operations are performed when methods are called. This allows for lazy initialization and better error handling.
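The validation rule and the deferred open described above might be implemented along these lines. `PathCheckedArchiver` is a hypothetical stand-in, and the exact check performed by the real constructor is an assumption.

```python
from pathlib import Path


class PathCheckedArchiver:
    """Hypothetical stand-in showing constructor-time path validation."""

    def __init__(self, path: Path) -> None:
        # Only reject a missing file when no write is expected to create it.
        if not self.is_write_operation_expected() and not path.exists():
            raise FileNotFoundError(f"Archive not found: {path}")
        self._path = path  # nothing is opened or created yet

    def is_write_operation_expected(self) -> bool:
        return True


class ReadOnlyArchiver(PathCheckedArchiver):
    def is_write_operation_expected(self) -> bool:
        return False


missing = Path("definitely_missing_archive_for_demo.cbz")
writer = PathCheckedArchiver(missing)  # accepted: the archive may be created later

try:
    ReadOnlyArchiver(missing)
    rejected = False
except FileNotFoundError:
    rejected = True
```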
Examples:
>>> from pathlib import Path
>>>
>>> # For existing archives
>>> archiver = MyArchiver(Path("existing.cbz"))
>>>
>>> # For new archives to be created
>>> archiver = MyArchiver(Path("new_archive.cbz"))
Attributes
path: Path
property
Get the path associated with this archiver.
| RETURNS | DESCRIPTION |
|---|---|
| Path | The Path object representing the archive file location. |
Examples:
>>> archiver = MyArchiver(Path("example.cbz"))
>>> print(archiver.path) # Output: example.cbz
Functions
__enter__()
Context manager entry.
Enables the archiver to be used in a 'with' statement for automatic resource management. The archiver will be properly initialized and any necessary resources will be acquired.
Examples:
>>> with MyArchiver(Path("archive.cbz")) as archive:
...     content = archive.read_file("file.txt")
...     # Archive is automatically closed when exiting the block
__exit__(exc_type: object, exc_val: object, exc_tb: object) -> None
Context manager exit.
Performs cleanup when exiting a 'with' statement. This method should be overridden by concrete implementations to release any resources such as file handles, network connections, or temporary files.
| PARAMETER | DESCRIPTION |
|---|---|
| exc_type | The exception type if an exception was raised, None otherwise. TYPE: object |
| exc_val | The exception value if an exception was raised, None otherwise. TYPE: object |
| exc_tb | The exception traceback if an exception was raised, None otherwise. TYPE: object |
Note
This base implementation does nothing. Concrete implementations should override this method to perform proper cleanup.
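An override that releases a file handle could look like the following sketch. `ZipHandle` is a hypothetical wrapper around the standard library's `zipfile.ZipFile`, used here only to show where cleanup would go.

```python
import io
import zipfile


class ZipHandle:
    """Hypothetical wrapper that closes its ZipFile on exit."""

    def __init__(self, buffer: io.BytesIO) -> None:
        self._buffer = buffer
        self._zip: zipfile.ZipFile | None = None

    def __enter__(self) -> "ZipHandle":
        self._zip = zipfile.ZipFile(self._buffer, mode="w")
        return self

    def __exit__(self, exc_type: object, exc_val: object, exc_tb: object) -> None:
        # Release the handle whether or not the block raised.
        if self._zip is not None:
            self._zip.close()
            self._zip = None

    def write_file(self, archive_file: str, data: bytes) -> bool:
        assert self._zip is not None, "use inside a 'with' block"
        self._zip.writestr(archive_file, data)
        return True


buffer = io.BytesIO()
with ZipHandle(buffer) as handle:
    handle.write_file("file.txt", b"hello")

closed_after_block = handle._zip is None
```

Returning None from `__exit__` means exceptions raised inside the block still propagate to the caller.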
copy_from_archive(other_archive: Archiver) -> bool
abstractmethod
Copy files from another archive to this archive.
Copies all files from the source archive to this archive. This is useful for converting between archive formats, merging archives, or creating backups.
| PARAMETER | DESCRIPTION |
|---|---|
| other_archive | Source archive to copy files from. Must be a valid Archiver instance that can read files. TYPE: Archiver |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if all files were successfully copied, False if any copy operation failed. The operation may be partially successful. |
Examples:
>>> # Copy from ZIP to 7Z
>>> with ZipArchiver(Path("source.cbz")) as source:
...     with SevenZipArchiver(Path("destination.7z")) as dest:
...         success = dest.copy_from_archive(source)
>>>
>>> # Merge two archives
>>> with ZipArchiver(Path("archive1.cbz")) as arch1:
...     with ZipArchiver(Path("archive2.cbz")) as arch2:
...         success = arch2.copy_from_archive(arch1)
Note
- Files with the same name will be overwritten in the destination
- The source archive must be readable and the destination writable
- Large archives may take significant time to copy
- Consider implementing progress callbacks in concrete classes
exists(archive_file: str) -> bool
Check if a file exists in the archive.
Determines whether a specific file exists within the archive without actually reading its contents. This is useful for conditional operations and avoiding exceptions when checking for file presence.
| PARAMETER | DESCRIPTION |
|---|---|
| archive_file | Path of the file to check within the archive. Should use forward slashes as path separators. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the file exists in the archive, False otherwise. |
Examples:
>>> # Check before reading
>>> if archive.exists("config.txt"):
...     content = archive.read_file("config.txt")
... else:
...     print("Config file not found")
>>>
>>> # Conditional writing
>>> if not archive.exists("backup.txt"):
...     archive.write_file("backup.txt", "backup data")
Performance Note
This method calls get_filename_list() internally, which may be expensive for large archives. Consider caching the file list if you need to check existence of multiple files.
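On the calling side, checking many names against a single listing avoids repeated calls. The `existing_files` helper below is hypothetical and works on any list of names.

```python
def existing_files(filename_list: list[str], wanted: list[str]) -> dict[str, bool]:
    """Hypothetical helper: check many names against one cached listing."""
    known = set(filename_list)  # build once, then constant-time membership checks
    return {name: name in known for name in wanted}


# In real use the listing would come from archive.get_filename_list().
listing = ["config.txt", "data/users.json", "images/logo.png"]
result = existing_files(listing, ["config.txt", "backup.txt"])
```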
get_filename_list() -> list[str]
abstractmethod
Get a list of all files in the archive.
Returns a complete list of all files contained in the archive. The returned paths use forward slashes as separators and are relative to the archive root.
| RETURNS | DESCRIPTION |
|---|---|
| list[str] | List of file paths within the archive. The list is sorted alphabetically by most implementations. Returns an empty list if the archive is empty or doesn't exist. |
Examples:
>>> files = archive.get_filename_list()
>>> print(files)
>>> # Output: ['config.txt', 'data/users.json', 'images/logo.png']
>>>
>>> # Check if archive is empty
>>> if not files:
...     print("Archive is empty")
>>>
>>> # Filter for specific file types
>>> text_files = [f for f in files if f.endswith('.txt')]
>>> image_files = [f for f in files if archive.IMAGE_EXT_RE.search(f)]
Performance Note
This method may be expensive for large archives as it typically requires reading the archive's central directory. Consider caching the result if you need to call this method multiple times.
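For ZIP archives, a listing like this can be obtained from the standard library's `zipfile` module. The sketch below builds a small archive in memory and is not the library's implementation.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("images/logo.png", b"\x89PNG")
    zf.writestr("config.txt", b"setting=value")

with zipfile.ZipFile(buffer) as zf:
    # namelist() returns entries in archive order, so sort explicitly.
    files = sorted(zf.namelist())
```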
is_write_operation_expected() -> bool
Check if this archiver is expected to be used for write operations.
This method helps determine whether the archiver will be used primarily for writing (creating new archives) or reading (accessing existing archives). It is used during path validation to decide how a missing archive file is handled.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if write operations are expected (default), False if the archiver is read-only or primarily intended for reading existing archives. |
Note
Override this method in read-only implementations to return False. This will change the validation behavior to expect existing files.
Examples:
>>> class ReadOnlyArchiver(Archiver):
...     def is_write_operation_expected(self) -> bool:
...         return False  # This archiver only reads existing archives
read_file(archive_file: str) -> bytes
abstractmethod
Read the contents of a file from the archive.
Extracts and returns the complete contents of the specified file from the archive. The file path should use forward slashes as separators regardless of the operating system.
| PARAMETER | DESCRIPTION |
|---|---|
| archive_file | Path of the file within the archive. Should use forward slashes as path separators (e.g., "folder/file.txt"). The path is relative to the archive root. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| bytes | The complete file contents as bytes. For text files, you'll need to decode the bytes using the appropriate encoding. |

| RAISES | DESCRIPTION |
|---|---|
| ArchiverReadError | If the file cannot be read. This includes cases where the file doesn't exist, the archive is corrupted, or there are permission issues. |
Examples:
>>> # Reading a text file
>>> content = archive.read_file("config.txt")
>>> config_text = content.decode('utf-8')
>>>
>>> # Reading a binary file
>>> image_data = archive.read_file("images/photo.jpg")
>>> with open("extracted_photo.jpg", "wb") as f:
...     f.write(image_data)
Note
The entire file is loaded into memory. For very large files, consider implementing streaming read methods in concrete classes.
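A streaming read for ZIP archives could be built on `zipfile.ZipFile.open`, as sketched below. The `iter_file_chunks` helper is hypothetical and is not part of the Archiver interface.

```python
import io
import zipfile
from collections.abc import Iterator


def iter_file_chunks(
    zf: zipfile.ZipFile, archive_file: str, chunk_size: int = 64 * 1024
) -> Iterator[bytes]:
    """Hypothetical helper: yield a member's contents in bounded-size chunks."""
    with zf.open(archive_file) as member:
        while chunk := member.read(chunk_size):
            yield chunk


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as zf:
    zf.writestr("big.bin", b"x" * 10_000)

with zipfile.ZipFile(buffer) as zf:
    chunks = list(iter_file_chunks(zf, "big.bin", chunk_size=4096))
```

Collecting the chunks into a list, as done here for demonstration, defeats the memory saving; real callers would process each chunk as it arrives.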
remove_files(filename_list: list[str]) -> bool
abstractmethod
Remove multiple files from the archive.
Batch operation to remove multiple files from the archive in a single call.
| PARAMETER | DESCRIPTION |
|---|---|
| filename_list | List of file paths to remove from the archive. Each path should use forward slashes as separators. TYPE: list[str] |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if ALL files were successfully removed (or didn't exist), False if ANY file removal failed. The operation may be partially successful - some files may be removed even if the method returns False. |
Examples:
>>> # Remove multiple files at once
>>> files_to_remove = ["old1.txt", "old2.txt", "temp/cache.dat"]
>>> success = archive.remove_files(files_to_remove)
>>> if success:
...     print("All files removed successfully")
... else:
...     print("Some files may not have been removed")
Note
The atomic nature of this operation depends on the archive format and implementation. Some formats may support atomic batch operations while others may process files individually.
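One approach that keeps the original intact until the very end is to write a new archive without the unwanted members. The `without_files` helper below is a hypothetical sketch using the standard `zipfile` module; it is not how any particular implementation is known to work.

```python
import io
import zipfile


def without_files(source: io.BytesIO, filename_list: list[str]) -> io.BytesIO:
    """Hypothetical helper: return a new ZIP without the listed members."""
    to_remove = set(filename_list)
    result = io.BytesIO()
    with zipfile.ZipFile(source) as src, zipfile.ZipFile(result, mode="w") as dst:
        for info in src.infolist():
            if info.filename not in to_remove:
                dst.writestr(info.filename, src.read(info.filename))
    return result


original = io.BytesIO()
with zipfile.ZipFile(original, mode="w") as zf:
    for name in ("old1.txt", "keep.txt", "temp/cache.dat"):
        zf.writestr(name, b"data")

# Names that are not in the archive are simply ignored.
cleaned = without_files(original, ["old1.txt", "temp/cache.dat", "not_there.txt"])
with zipfile.ZipFile(cleaned) as zf:
    remaining = zf.namelist()
```

Because nothing is modified in place, the caller can swap in the new archive only after the rewrite has fully succeeded.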
test() -> bool
abstractmethod
Test whether the archive is valid.
| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the file is a valid archive, False otherwise. |
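For ZIP data, a validity check could combine `zipfile.is_zipfile` with a CRC pass over every member. The `is_valid_zip` function is a hypothetical sketch of what a concrete test() might do; other formats need their own checks.

```python
import io
import zipfile


def is_valid_zip(data: bytes) -> bool:
    """Hypothetical check: structure is readable and every member's CRC matches."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    try:
        with zipfile.ZipFile(buffer) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except zipfile.BadZipFile:
        return False


good = io.BytesIO()
with zipfile.ZipFile(good, mode="w") as zf:
    zf.writestr("page.txt", b"content")

valid = is_valid_zip(good.getvalue())
invalid = is_valid_zip(b"this is not an archive")
```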
write_file(archive_file: str, data: str | bytes) -> bool
abstractmethod
Write data to a file in the archive.
Creates or overwrites a file in the archive with the provided data. If the file already exists, it will be replaced. Directory structure within the archive is created automatically as needed.
| PARAMETER | DESCRIPTION |
|---|---|
| archive_file | Path of the file within the archive. Should use forward slashes as path separators (e.g., "folder/file.txt"). The path is relative to the archive root. TYPE: str |
| data | Data to write to the file. Can be either a string (which will be encoded as UTF-8) or bytes for binary data. TYPE: str \| bytes |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if the write operation was successful, False otherwise. Note that returning False doesn't necessarily mean an error occurred - check the logs for detailed error information. |

| RAISES | DESCRIPTION |
|---|---|
| ArchiverWriteError | If the write operation fails due to serious errors such as disk full, permission denied, or archive format limitations. |
Examples:
>>> # Writing text content
>>> success = archive.write_file("config.txt", "setting=value")
>>>
>>> # Writing binary content
>>> with open("image.jpg", "rb") as f:
...     image_data = f.read()
>>> success = archive.write_file("images/photo.jpg", image_data)
>>>
>>> # Writing JSON data
>>> import json
>>> data = {"name": "example", "version": "1.0"}
>>> success = archive.write_file("data.json", json.dumps(data))
Note
String data is automatically encoded as UTF-8 bytes. For other encodings, encode the string manually before passing it to this method.