BlobStore

From GnuCash

GnuCash Paperless-ngx Integration

Overview

Integrate GnuCash with Paperless-ngx for document management. PDFs and images attached to transactions and invoices are uploaded to Paperless-ngx, and only the returned document ID is stored in KVP. Opening a transaction's attachments retrieves the document list from Paperless, with options to view an individual document or open the Paperless document-management UI.

Key properties:

  • Upload via Paperless API at document attach time
  • Store Paperless document_id in transaction/invoice KVP
  • Launchable: fetch attached doc list, view doc or open Paperless UI
  • Configurable per-book or global: hostname, API port, authentication
  • No local blob storage; full outsourcing to Paperless

Configuration

Global Preferences (GConf/gsettings)

Path: org.gnucash.general.paperless

paperless_enabled     : boolean = false
paperless_hostname    : string  = "localhost"
paperless_port        : integer = 8000
paperless_api_token   : string  = ""  [encrypted in gconf]
paperless_use_https   : boolean = false

Example:

paperless_hostname = "paperless.example.com"
paperless_port = 8000
paperless_api_token = "abc123def456..."
paperless_use_https = true

Per-Book Override (Optional)

Book KVP at /kvp/paperless-config/:

/kvp/paperless-config/hostname  → "paperless.example.com"
/kvp/paperless-config/port      → "8000"
/kvp/paperless-config/api_token → "abc123def456..."
/kvp/paperless-config/use_https → "true"

Priority: Per-book config overrides global prefs (allows multi-Paperless setups).
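The override rule can be pictured as a two-layer merge. Below is a minimal C++17 sketch, assuming the per-book KVP entries have already been read into a string map keyed by the leaf names above (`hostname`, `port`, `api_token`, `use_https`). `PaperlessConfig` and `resolve_config` are hypothetical names, not existing GnuCash API. The `api_token` override is handled only because the example lists it; Security Considerations below advises against keeping tokens in KVP.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical effective configuration handed to the client.
struct PaperlessConfig {
    std::string hostname = "localhost";
    int port = 8000;
    std::string api_token;
    bool use_https = false;
};

// Start from the global preferences; any per-book KVP entry wins.
// Book KVP values are strings ("8000", "true"), so they are converted here.
PaperlessConfig resolve_config(const PaperlessConfig& global,
                               const std::map<std::string, std::string>& book_kvp) {
    PaperlessConfig cfg = global;
    if (auto it = book_kvp.find("hostname"); it != book_kvp.end())
        cfg.hostname = it->second;
    if (auto it = book_kvp.find("port"); it != book_kvp.end()) {
        try {
            cfg.port = std::stoi(it->second);
        } catch (const std::exception&) {
            // Malformed per-book port: keep the global value.
        }
    }
    if (auto it = book_kvp.find("api_token"); it != book_kvp.end())
        cfg.api_token = it->second;
    if (auto it = book_kvp.find("use_https"); it != book_kvp.end())
        cfg.use_https = (it->second == "true");
    return cfg;
}
```

A book with no `/kvp/paperless-config/` entries simply gets the global preferences back unchanged.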

Configuration UI

New tab in Edit → Preferences → Paperless:

┌─ Preferences: Paperless Integration ─────────────┐
│                                                    │
│ ☑ Enable Paperless integration                   │
│                                                    │
│ Hostname:      [ localhost            ]          │
│ Port:          [ 8000                 ]          │
│ Use HTTPS:     ☐                                  │
│                                                    │
│ API Token:     [ ••••••••••••••••••   ]          │
│                [ Show ]  [ Generate... ]          │
│                                                    │
│ [ Test Connection ]                              │
│                                                    │
│ ☐ Override per-book (advanced)                   │
│                                                    │
│          [ OK ]  [ Cancel ]  [ Help ]            │
└────────────────────────────────────────────────────┘

Test Connection button: GET /api/documents/?page_size=1 with the API token; show "✓ Connected" on 200 OK, or the error message otherwise.

---

API Layer

PaperlessClient Class

C++ wrapper around Paperless REST API.

class PaperlessClient {
public:
    // Lifecycle
    PaperlessClient(QofBook* book);
    ~PaperlessClient();
    
    // Connection & auth
    gboolean test_connection(char** out_error);
    // -> GET /api/documents/ with token in Authorization header
    //    Returns TRUE if 200 OK, FALSE + error message otherwise
    
    // Upload a document
    gint upload_document(const char* filepath, const char* title,
                         const char* filename, char** out_error);
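    // -> POST /api/documents/post_document/ with the file in the
    //    multipart/form-data field "document"; optional form field "title"
    //    (filename is sent as the file part's filename)
    //    Paperless replies with a consumption task UUID; poll
    //    GET /api/tasks/?task_id=<uuid> until it reports the new document
    //    Returns document_id (int) on success, -1 on error + out_error
    //    Blocks until upload and consumption complete (may be slow for large PDFs)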
    
    // Retrieve document info
    gboolean get_document_info(gint doc_id, 
                               char** out_title, char** out_filename,
                               char** out_error);
    // -> GET /api/documents/{id}/
    //    Returns title and original_filename for UI display
    
    // Build URLs for user
    char* get_document_url(gint doc_id);
    // -> "https://paperless.example.com:8000/documents/{id}/"
    
    char* get_management_url();
    // -> "https://paperless.example.com:8000/documents/"
    
    // List documents (optional, for future search/filter UI)
    GList* list_documents(const char* query, char** out_error);
    // -> GET /api/documents/?query=...
    //    Returns list of (doc_id, title, filename) tuples
};

Implementation notes:

  • Use libcurl for HTTP requests
  • Parse JSON responses with json-glib or similar
  • Store API token in memory (gconf stores encrypted); clear on shutdown
  • Synchronous calls (block the main thread); wrap long operations in a progress/timeout dialog if responsiveness is a concern

---

Storage Model

Transaction-Level Attachment

Paperless document ID stored in transaction KVP:

/kvp/attachments/<index>   → "doc_id:12345"

Or (simpler):

/kvp/paperless_docs       → [12345, 12346, ...]  (JSON array of doc IDs)

Design decision: Store only the document_id integer in KVP. Title, filename, and URL are fetched from Paperless on demand (live data; user can rename in Paperless and GnuCash reflects it).

Invoice-Level Attachment

Same pattern for invoices (if GnuCash models invoices as objects with KVP):

/kvp/paperless_docs       → [12345]

Or per-line-item:

/kvp/paperless_line_items/<item-id>/doc_id  → "12346"

---

Attachment Workflow

User Attaches PDF to Transaction

  1. User clicks Attach Document in transaction editor
  2. File picker dialog opens
  3. User selects invoice_2024_q1.pdf
  4. Dialog: optional title override for Paperless (default: filename)
     Title in Paperless: [ invoice_2024_q1.pdf ]
  5. User clicks "Attach"
  6. Blocking upload:
     * Progress dialog: "Uploading to Paperless..." with cancel button
     * Call paperless_client->upload_document()
     * upload_document() returns doc_id = 12345
     * Append 12345 to the transaction's /kvp/paperless_docs array
     * Close dialog
  7. UI shows attachment icon (📎) next to the transaction
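Paperless-ngx consumes uploads asynchronously: as far as its REST API goes, the upload request answers with a consumption task UUID, and the new document's ID appears on `GET /api/tasks/?task_id=<uuid>` once the task succeeds. A blocking `upload_document()` therefore needs a polling loop. The sketch below assumes the HTTP and JSON work is wrapped in an injected `fetch_status` callable; `TaskStatus`, `wait_for_document_id`, and the status strings are assumptions to check against the Paperless version in use.

```cpp
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Hypothetical snapshot of one task record from GET /api/tasks/?task_id=<uuid>.
// The HTTP layer is assumed to parse the task's related_document into an int.
struct TaskStatus {
    std::string status;                   // e.g. "PENDING", "STARTED", "SUCCESS", "FAILURE"
    std::optional<int> related_document;  // set once consumption succeeded
};

// Poll the consumption task until it finishes or max_attempts is reached.
// Returns the new document_id, or std::nullopt on failure or timeout.
std::optional<int> wait_for_document_id(
        const std::function<TaskStatus()>& fetch_status,
        int max_attempts,
        std::chrono::milliseconds interval) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        TaskStatus s = fetch_status();
        if (s.status == "SUCCESS")
            return s.related_document;   // may still be empty if the field was missing
        if (s.status == "FAILURE")
            return std::nullopt;         // e.g. duplicate or unreadable file
        std::this_thread::sleep_for(interval);
    }
    return std::nullopt;                 // gave up while the task was still pending
}
```

Keeping the network call behind `fetch_status` also lets the progress dialog's cancel button stop the loop between attempts.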

User Views/Launches Attachment

  1. User clicks attachment icon (📎) in transaction view
  2. Blob Viewer Dialog opens:
  ┌─ Transaction Attachments ──────────────────┐
  │ TX: 2024-01-15 Invoice from Acme Corp      │
  │                                             │
  │ Attached Documents:                        │
  │ ─────────────────────────────────────────  │
  │ ☑ [12345] invoice_2024_q1.pdf              │
  │      [View]  [Manage in Paperless]         │
  │                                             │
  │ ☑ [12346] receipt_support.jpg              │
  │      [View]  [Manage in Paperless]         │
  │                                             │
  │ [ + Attach New ]  [ - Remove ]  [Close]   │
  └─────────────────────────────────────────────┘
  
  3. User clicks [View] → opens Paperless download URL in browser/PDF viewer
     https://paperless.example.com:8000/api/documents/12345/download/
  4. User clicks [Manage in Paperless] → opens document edit UI in browser
     https://paperless.example.com:8000/documents/12345/
  5. User clicks [+ Attach New] → repeats attach workflow above
  6. User clicks [- Remove] for a doc → removes its doc_id from the transaction's KVP (does NOT delete the document from Paperless)
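Both launch targets are plain string concatenations on the configured host. A minimal sketch with hypothetical names (`PaperlessEndpoint`, `base_url`, `download_url`, `document_ui_url`) that reproduces the two example URLs above. The browser does not receive GnuCash's API token, so opening the download URL presumably relies on the user already being logged in to the Paperless web UI.

```cpp
#include <string>

// Hypothetical connection settings (mirrors the preferences above).
struct PaperlessEndpoint {
    std::string hostname;
    int port;
    bool use_https;
};

// Scheme, host and port, e.g. "https://paperless.example.com:8000".
std::string base_url(const PaperlessEndpoint& ep) {
    return std::string(ep.use_https ? "https" : "http") + "://" + ep.hostname +
           ":" + std::to_string(ep.port);
}

// Target of [View]: raw file download through the REST API.
std::string download_url(const PaperlessEndpoint& ep, int doc_id) {
    return base_url(ep) + "/api/documents/" + std::to_string(doc_id) + "/download/";
}

// Target of [Manage in Paperless]: the document page in the web UI.
std::string document_ui_url(const PaperlessEndpoint& ep, int doc_id) {
    return base_url(ep) + "/documents/" + std::to_string(doc_id) + "/";
}
```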

---

KVP Structure

Minimal Design (Recommended)

Transaction KVP:

/kvp/paperless_docs    → JSON array: "[12345, 12346, ...]"

Rationale:

  • Simple, flat structure
  • No need to track per-doc metadata (fetch from Paperless)
  • No orphan cleanup logic (docs live in Paperless independently)
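Because the KVP value is just a string, GnuCash needs a serializer and a strict parser for it. A minimal, dependency-free C++17 sketch (hypothetical helper names; a real implementation might use json-glib instead) that writes the `[12345, 12346]` form and rejects anything that is not a flat array of non-negative integers:

```cpp
#include <cctype>
#include <climits>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Serialize doc IDs into the string stored at /kvp/paperless_docs,
// e.g. {12345, 12346} -> "[12345, 12346]".
std::string serialize_doc_ids(const std::vector<int>& ids) {
    std::string out = "[";
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i > 0) out += ", ";
        out += std::to_string(ids[i]);
    }
    return out + "]";
}

// Parse the stored string back into IDs. Anything other than a flat array
// of non-negative integers yields std::nullopt (corrupt KVP value).
std::optional<std::vector<int>> parse_doc_ids(const std::string& s) {
    std::vector<int> ids;
    std::size_t i = 0;
    auto skip_ws = [&] {
        while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
    };
    skip_ws();
    if (i == s.size() || s[i] != '[') return std::nullopt;
    ++i;
    skip_ws();
    if (i < s.size() && s[i] == ']') {
        ++i;  // empty array "[]"
    } else {
        while (true) {
            skip_ws();
            std::size_t start = i;
            long long value = 0;
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
                value = value * 10 + (s[i] - '0');
                if (value > INT_MAX) return std::nullopt;  // does not fit in int
                ++i;
            }
            if (i == start) return std::nullopt;           // expected a digit
            ids.push_back(static_cast<int>(value));
            skip_ws();
            if (i < s.size() && s[i] == ',') { ++i; continue; }
            if (i < s.size() && s[i] == ']') { ++i; break; }
            return std::nullopt;                           // missing ',' or ']'
        }
    }
    skip_ws();
    if (i != s.size()) return std::nullopt;                // trailing characters
    return ids;
}
```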

Alternative: Verbose Design

/kvp/paperless/<doc_id>/uploaded_at    → "2024-01-15T10:30:00Z"
/kvp/paperless/<doc_id>/local_title    → "Original filename" (optional)

(More complex; unlikely needed for MVP.)


Detachment & Lifecycle

User Detaches Document

  1. User clicks [- Remove] in Blob Viewer dialog
  2. Local KVP entry removed: /kvp/paperless_docs updates to remove doc_id
  3. Document stays in Paperless (user must delete manually there if desired)
  4. Transaction saved to disk

Rationale: Paperless is the source of truth. GnuCash only tracks which Paperless docs are relevant to a transaction. User can manage doc lifecycle in Paperless separately (e.g., re-use a doc across multiple transactions).
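Since the transaction holds only references, attaching and detaching reduce to edits on the parsed ID list. A minimal sketch with hypothetical helpers. Neither touches the network, which matches the rule that detaching never deletes anything from Paperless, and the same doc_id may sit in several transactions' lists.

```cpp
#include <algorithm>
#include <vector>

// Attach: add doc_id unless this transaction already references it.
// Returns true if the list changed (the caller then marks the transaction dirty).
bool attach_doc_id(std::vector<int>& ids, int doc_id) {
    if (std::find(ids.begin(), ids.end(), doc_id) != ids.end())
        return false;
    ids.push_back(doc_id);
    return true;
}

// Detach: drop the local reference only; the document stays in Paperless.
bool detach_doc_id(std::vector<int>& ids, int doc_id) {
    auto it = std::find(ids.begin(), ids.end(), doc_id);
    if (it == ids.end())
        return false;
    ids.erase(it);
    return true;
}
```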

Book Close

  1. No special cleanup needed
  2. KVP with doc IDs persists

Paperless Outage/Unavailability

  1. At attach time: if Paperless unreachable, error dialog; user retries or cancels
  2. At view time: if Paperless unreachable, error dialog; doc URL shown but not accessible
  3. KVP reference remains; will work again when Paperless comes back up

UI Components

Transaction Editor

Add Attachments section below date/description/amount:

┌─ Transaction Editor ────────────────────┐
│ Date:    [ 2024-01-15 ]                 │
│ Account: [ Assets:Bank ]                │
│ Memo:    [ Invoice from Acme ]          │
│ Amount:  [ 1000.00 ]                    │
│                                          │
│ Attachments:                             │
│ 📎 [1] invoice_2024_q1.pdf [✕]          │
│    [ + Add Document ]                   │
│                                          │
│ [ OK ]  [ Cancel ]                      │
└──────────────────────────────────────────┘

Clicking on the PDF → opens Blob Viewer dialog. Clicking [✕] → removes from KVP. Clicking [+ Add Document] → file picker + upload.

Blob Viewer Dialog

Standalone modal (reusable for transactions, invoices, splits):

┌─ Attachments ───────────────────────────────┐
│ Parent: TX 2024-01-15 Acme Invoice          │
│                                             │
│ Documents:                                  │
│ ┌─────────────────────────────────────────┐ │
│ │ [12345] invoice_2024_q1.pdf             │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ [View] [Open in Paperless] [Remove] │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ │                                         │ │
│ │ [12346] receipt_support.jpg             │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ [View] [Open in Paperless] [Remove] │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
│                                             │
│ [ + Attach New ]  [Close]                   │
└─────────────────────────────────────────────┘

  • [View] → open the download URL (/api/documents/{id}/download/) in browser/viewer
  • [Open in Paperless] → PaperlessClient::get_document_url() → doc edit page (/documents/{id}/)
  • [Remove] → delete doc_id from KVP array
  • [+ Attach New] → file picker + upload workflow

Preferences Dialog

(See Configuration section above.)


Error Handling

Upload Failure

┌─ Upload Error ────────────────────┐
│ Failed to upload to Paperless:    │
│                                    │
│ [Connection refused]               │
│ (Is Paperless running?)            │
│                                    │
│ Hostname: paperless.example.com   │
│ Port:     8000                     │
│                                    │
│ [ Retry ]  [ Cancel ]              │
└────────────────────────────────────┘

Common errors:

  • Connection refused → Paperless not running
  • 401 Unauthorized → invalid API token
  • 413 Payload Too Large → file too big
  • Timeout → slow network / large file
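The list above maps onto a small classification function that produces the first line of the Upload Error dialog. This is a sketch under assumptions: `Transport` and `describe_upload_error` are hypothetical, and the libcurl codes named in the comment (`CURLE_COULDNT_CONNECT`, `CURLE_OPERATION_TIMEDOUT`) show one plausible way a wrapper might classify transport failures.

```cpp
#include <string>

// Transport-level outcome, as a libcurl wrapper might classify it
// (e.g. CURLE_COULDNT_CONNECT -> ConnectionRefused,
//  CURLE_OPERATION_TIMEDOUT -> Timeout).
enum class Transport { Ok, ConnectionRefused, Timeout };

// Hypothetical mapping to the dialog text. http_status is only
// meaningful when transport == Transport::Ok.
std::string describe_upload_error(Transport transport, long http_status) {
    switch (transport) {
    case Transport::ConnectionRefused:
        return "Connection refused (Is Paperless running?)";
    case Transport::Timeout:
        return "Timeout (slow network or large file)";
    case Transport::Ok:
        break;
    }
    switch (http_status) {
    case 401: return "401 Unauthorized (invalid API token)";
    case 413: return "413 Payload Too Large (file too big)";
    default:  return "HTTP " + std::to_string(http_status) + " from Paperless";
    }
}
```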

Missing Configuration

  1. User clicks "Attach Document" but Paperless is disabled
  2. Dialog: "Paperless integration not enabled. Configure in Preferences."
  3. Offer quick link to Preferences

Stale Document References

  1. User opens transaction with doc_id that no longer exists in Paperless
  2. Blob Viewer shows: "[12345] (document not found)"
  3. [View] and [Open in Paperless] buttons disabled
  4. User can still [Remove] the KVP entry locally
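The stale-reference rules amount to a per-row decision in the Blob Viewer. A minimal sketch with hypothetical names, assuming the caller passes `std::nullopt` only when `GET /api/documents/{id}/` answered 404; an unreachable server should instead raise the outage error dialog described earlier.

```cpp
#include <optional>
#include <string>

// One row of the Blob Viewer dialog.
struct AttachmentRow {
    std::string label;
    bool can_view;    // enables [View] and [Open in Paperless]
    bool can_remove;  // [Remove] stays available so the KVP entry can be dropped
};

// title is std::nullopt when the document no longer exists in Paperless.
AttachmentRow make_row(int doc_id, const std::optional<std::string>& title) {
    std::string prefix = "[" + std::to_string(doc_id) + "] ";
    if (!title)
        return {prefix + "(document not found)", false, true};
    return {prefix + *title, true, true};
}
```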

---

Implementation Roadmap

Phase 1: Core

  1. PaperlessClient: `test_connection()`, `upload_document()`, `get_document_url()`
  2. Preferences UI + gconf storage
  3. Transaction editor: [+ Attach] button + file picker
  4. Minimal Blob Viewer: list docs with [View] and [Remove] buttons
  5. KVP: `/kvp/paperless_docs` → JSON array

Phase 2: Polish

  1. [Open in Paperless] button (management UI link)
  2. Invoice attachment support
  3. Upload progress dialog with cancel
  4. Error messages + retry logic
  5. Paperless doc title caching in KVP (optional metadata)

Phase 3: Future

  1. Search/filter attached docs by title
  2. Drag-and-drop attachment to transactions
  3. Paperless tag integration (tag tx based on Paperless tags)
  4. Bulk upload from file picker
  5. Scheduled sync: periodic check for orphaned docs

---

Paperless API Reference (Minimal)

Test Connection

```
GET /api/documents/?page_size=1

Headers:
  Authorization: Token <api_token>

Response (200 OK):
{
  "count": 42,
  "results": [...]
}
```

Upload Document

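```
POST /api/documents/post_document/

Headers:
  Authorization: Token <api_token>

Body (multipart/form-data):
  document=<binary file; its filename becomes original_file_name>
  title=<optional string>

Response (200 OK):
  "<consumption task UUID>"

The document is consumed asynchronously; its ID comes from the task:

GET /api/tasks/?task_id=<uuid>

Response (200 OK, after consumption):
[
  {
    "task_id": "<uuid>",
    "status": "SUCCESS",
    "related_document": <new document id, e.g. 12345>,
    ...
  }
]
```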
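To show what the multipart body looks like on the wire, the sketch below assembles it by hand: a `document` part carrying the file (its name travels in the part's `Content-Disposition` header) and an optional `title` part. `build_upload_body` is hypothetical; in practice libcurl's `curl_mime_*` functions would build this body. The boundary is assumed not to occur in the file bytes, and the filename is assumed to contain no double quotes.

```cpp
#include <string>

// Assemble a multipart/form-data body with an optional "title" field and
// the file in the "document" field. The request's Content-Type header would
// be "multipart/form-data; boundary=<boundary>".
std::string build_upload_body(const std::string& boundary,
                              const std::string& filename,
                              const std::string& file_bytes,
                              const std::string& title) {
    const std::string crlf = "\r\n";
    std::string body;
    if (!title.empty()) {
        body += "--" + boundary + crlf;
        body += "Content-Disposition: form-data; name=\"title\"" + crlf + crlf;
        body += title + crlf;
    }
    body += "--" + boundary + crlf;
    body += "Content-Disposition: form-data; name=\"document\"; filename=\"" +
            filename + "\"" + crlf;
    body += "Content-Type: application/octet-stream" + crlf + crlf;
    body += file_bytes + crlf;
    body += "--" + boundary + "--" + crlf;  // closing delimiter
    return body;
}
```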

Get Document Info

```
GET /api/documents/{id}/

Headers:
  Authorization: Token <api_token>

Response (200 OK):
{
  "id": 12345,
  "title": "invoice_2024_q1",
  "original_file_name": "invoice_2024_q1.pdf",
  "created": "2024-01-15T10:30:00Z",
  "modified": "2024-01-15T10:30:00Z",
  ...
}
```

Download Document

```
GET /api/documents/{id}/download/

Response (200 OK):
  <binary PDF/image content>

Or redirect to:
  /documents/{id}/ (web UI)
```

Document Management UI

```
https://<hostname>:<port>/documents/{id}/
```

Allows user to edit title, tags, archive, delete, etc.

---

Configuration File Format (Optional)

If per-book config stored in KVP becomes unwieldy, alternative: plaintext config file.

  • File: `<book-path>.paperless` (for `/path/to/book.gnucash`, that is `/path/to/book.gnucash.paperless`)

```ini
[paperless]
hostname = paperless.example.com
port = 8000
use_https = true
api_token = abc123def456...
```

  • Loader:

```cpp
gboolean load_paperless_config(const char* book_path, PaperlessConfig* cfg);
// Tries <book-path>.paperless first; falls back to gconf global
```

(Avoids cluttering book KVP but requires file management; recommend KVP for simplicity.)
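If the plaintext file were adopted, the loader's first step is reading `key = value` pairs from the `[paperless]` section. A minimal sketch of that step only (hypothetical `parse_paperless_ini`; real INI dialects have more rules). The resulting keys match the per-book KVP leaf names, so the same override-over-global rule could be applied to them.

```cpp
#include <map>
#include <sstream>
#include <string>

// Trim spaces, tabs and line endings from both ends.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Collect key = value pairs from the [paperless] section only.
// Lines starting with ';' or '#' are comments; other sections are ignored.
std::map<std::string, std::string> parse_paperless_ini(const std::string& text) {
    std::map<std::string, std::string> kv;
    std::istringstream in(text);
    std::string line;
    bool in_section = false;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;
        if (line.front() == '[' && line.back() == ']') {
            in_section = (trim(line.substr(1, line.size() - 2)) == "paperless");
            continue;
        }
        auto eq = line.find('=');
        if (!in_section || eq == std::string::npos) continue;
        kv[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return kv;
}
```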

---

Security Considerations

API Token Storage

  • **In memory:** Keep decrypted token only in PaperlessClient; never log
  • **In GConf:** Store encrypted (use gnome-keyring if available)
  • **In KVP:** Never store; reference global config only
  • **Cleanup:** Clear token on app exit (destructor)

HTTPS

  • Default: `use_https = false` (local Paperless on `localhost:8000`)
  • Production: enable `use_https = true`
  • Validate SSL cert (libcurl default)

API Token Scoping

  • Paperless token is global (no per-transaction auth)
  • Assume trusted GnuCash environment (user has API token access)
  • No per-user/per-doc ACLs (GnuCash talks to Paperless as single identity)

Filename Validation

  • Trust Paperless-returned filenames (already sanitized by Paperless); GnuCash only displays them
  • No symlink or path traversal risk, since GnuCash writes no attachment files to the local filesystem

---

Testing Strategy

Unit Tests

  • Mock PaperlessClient: stub upload, return fake doc IDs
  • KVP serialization: attach 2 docs, save/reload, verify IDs persist
  • Error cases: 401, 500, timeout → verify error dialogs
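For the first bullet, a fake client can stand in for `PaperlessClient`. The sketch below is hypothetical: it swaps the `char**`/`gint` signatures for `std::string` and `int` to stay self-contained, hands out deterministic IDs starting at 12345, and can simulate both a rejected token and a document deleted on the Paperless side.

```cpp
#include <map>
#include <string>

// Hypothetical in-memory stand-in for PaperlessClient: no network access.
class FakePaperlessClient {
public:
    // Mirrors upload_document(): returns a doc_id, or -1 and sets error.
    int upload_document(const std::string& title, std::string& error) {
        if (reject_token) {
            error = "401 Unauthorized";
            return -1;
        }
        int id = next_id_++;
        titles_[id] = title;
        return id;
    }

    // Mirrors get_document_info(): false if the ID is unknown (stale reference).
    bool get_document_info(int doc_id, std::string& title) const {
        auto it = titles_.find(doc_id);
        if (it == titles_.end()) return false;
        title = it->second;
        return true;
    }

    // Simulate the user deleting a document in the Paperless UI.
    void delete_remote(int doc_id) { titles_.erase(doc_id); }

    bool reject_token = false;  // set to true to simulate an invalid API token

private:
    int next_id_ = 12345;
    std::map<int, std::string> titles_;
};
```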

Integration Tests

  • Spin up real Paperless (Docker) for test suite
  • Upload file → verify doc appears in Paperless UI
  • Detach → verify KVP updated, doc stays in Paperless
  • View → verify browser opens correct URL

Manual Testing

  • Configure against live Paperless instance
  • Attach PDF to transaction → verify upload succeeds
  • Open Preferences → click [Test Connection] → verify ✓
  • Disable Paperless → attach button disabled
  • Restart GnuCash → re-open transaction → docs still listed

---

Open Questions

  1. **Large file handling:** Progress bar for slow uploads? Recommend max file size?
  2. **Batch upload:** Support drag-and-drop multiple files at once?
  3. **Paperless search:** Integrate Paperless search into GnuCash doc picker?
  4. **Tagging:** Sync Paperless tags ↔ transaction memo or custom KVP field?
  5. **Archiving:** When user archives doc in Paperless, should GnuCash warn?
  6. **Offline mode:** Graceful degradation if Paperless unavailable?

---

Migration (If Existing Blobs)

If GnuCash already has local blob storage, migration strategy:

  1. Read old blob storage: `/kvp/attachments/` with SHA256 refs
  2. For each blob:
     * Read file from disk
     * Upload to Paperless → get doc_id
     * Replace KVP entry: SHA256 ref → Paperless doc_id
  3. Clean up local blobs directory
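The loop in step 2 can be sketched as follows, assuming a hypothetical `upload` callable that reads one blob from disk, uploads it, and returns the doc_id or -1. Failed references are collected rather than aborting the run, so step 3 (deleting the local blobs) would only be safe when `failed` is empty.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical result of one migration run.
struct MigrationResult {
    std::map<std::string, int> migrated;  // sha256 ref -> Paperless doc_id
    std::vector<std::string> failed;      // refs left in place for a retry
};

// Upload each referenced blob once and record its new doc_id.
MigrationResult migrate_blobs(const std::vector<std::string>& sha256_refs,
                              const std::function<int(const std::string&)>& upload) {
    MigrationResult r;
    for (const auto& ref : sha256_refs) {
        if (r.migrated.count(ref)) continue;  // blob already migrated via another reference
        int doc_id = upload(ref);
        if (doc_id < 0)
            r.failed.push_back(ref);
        else
            r.migrated[ref] = doc_id;
    }
    return r;
}
```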

CLI tool: `gnucash --migrate-blobs-to-paperless /path/to/book.gnucash`

---

Appendix: Sample KVP After Attach

Transaction KVP after attaching two Paperless docs:

```
/kvp/paperless_docs = "[12345, 12346]"
/kvp/notes = "Invoice from vendor"
```

Or verbose:

```
/kvp/paperless_docs/12345 = "{}"   (empty; metadata on-demand from Paperless)
/kvp/paperless_docs/12346 = "{}"
```

When GnuCash loads the transaction:

  • Fetch `/kvp/paperless_docs`
  • Parse JSON array: `[12345, 12346]`
  • On UI render: call `paperless_client->get_document_info(12345, ...)` → title, filename
  • Display in transaction view with attachment icon

---

Conclusion

Advantages over local blob storage:
  • ✅ No local filesystem management
  • ✅ Deduplication handled by Paperless (across books)
  • ✅ Full-text search in Paperless
  • ✅ Tagging, archiving, deletion in Paperless UI
  • ✅ Backup: Paperless is a separate backup target
  • ✅ Scalability: Paperless handles large collections

Disadvantages:
  • ❌ Requires Paperless running + network access
  • ❌ Attachments inaccessible if Paperless is down or the documents were deleted there
  • ❌ Upload latency (especially for large files)

Best for: Professional workflows with Paperless already in use; document-heavy organizations; multi-computer setups where Paperless is centralized.