Files
LexiChain/docs/03-database-schema.md
2026-05-13 21:08:27 +01:00

231 lines
7.3 KiB
Markdown

# 03 — Database Schema
**ORM:** Prisma
**Database:** PostgreSQL
**Schema file:** `prisma/schema.prisma`
---
## Entity Relationship Diagram
```
User
├──< Contract (one user → many contracts)
│ ├──< ContractRagChunk (one contract → many RAG chunks)
│ ├──< Notification (one contract → many notifications)
│ └──< BlockchainTransaction (one contract → many blockchain txs)
├──< Notification (one user → many notifications)
└──< BlockchainTransaction (one user → many blockchain txs)
```
---
## Models
### `User`
Stores platform users, synchronized from Clerk via webhook.
| Column | Type | Notes |
|--------|------|-------|
| `id` | `String` (cuid) | Primary key |
| `clerkId` | `String` | Unique Clerk user ID |
| `email` | `String` | Unique email address |
| `firstName` | `String?` | Optional first name |
| `lastName` | `String?` | Optional last name |
| `imageUrl` | `String?` | Clerk profile image URL |
| `createdAt` | `DateTime` | Record creation timestamp |
| `updatedAt` | `DateTime` | Auto-updated on changes |
**Indexes:** `clerkId`, `email`
**Sync:** Created/updated by `app/api/webhooks/clerk/` in response to Clerk `user.created` and `user.updated` events.
---
### `Contract`
The core entity — one record per uploaded document.
**File Information (from upload):**
| Column | Type | Notes |
|--------|------|-------|
| `id` | `String` (cuid) | Primary key |
| `userId` | `String` | FK → User.id (cascade delete) |
| `fileName` | `String` | Original file name |
| `fileUrl` | `String` | UploadThing CDN URL |
| `fileSize` | `Int` | File size in bytes |
| `mimeType` | `String` | `application/pdf` or `image/*` |
**AI-Extracted Fields (filled after analysis):**
| Column | Type | Notes |
|--------|------|-------|
| `title` | `String?` | Contract title |
| `type` | `ContractType?` | Enum (see below) |
| `provider` | `String?` | Insurance company / bank name |
| `policyNumber` | `String?` | Policy or contract reference number |
| `startDate` | `DateTime?` | Contract start date |
| `endDate` | `DateTime?` | Contract end / expiry date |
| `premium` | `Decimal(10,2)?` | Annual/monthly premium amount |
| `extractedText` | `String?` (Text) | Full text extracted from document |
| `summary` | `String?` (Text) | 4-6 sentence AI summary |
| `keyPoints` | `Json?` | Guarantees, exclusions, explainability |
**Processing:**
| Column | Type | Notes |
|--------|------|-------|
| `status` | `ContractStatus` | Default: `UPLOADED` |
**Blockchain Proof:**
| Column | Type | Notes |
|--------|------|-------|
| `documentHash` | `String?` | SHA-256 hash of the file (0x-prefixed) |
| `txHash` | `String?` | Ethereum transaction hash |
| `blockNumber` | `Int?` | Block number where tx was mined |
| `blockTimestamp` | `DateTime?` | Block timestamp = legal proof date |
| `blockchainNetwork` | `String?` | `"hardhat"` or `"sepolia"` |
| `contractAddress` | `String?` | DocumentRegistry contract address |
**Indexes:** `userId`, `status`, `type`, `endDate`
**Relations:** `ContractRagChunk[]`, `Notification[]`, `BlockchainTransaction[]`
---
### `ContractRagChunk`
Stores text chunks and their vector embeddings for RAG (Retrieval-Augmented Generation).
| Column | Type | Notes |
|--------|------|-------|
| `id` | `String` (cuid) | Primary key |
| `contractId` | `String` | FK → Contract.id (cascade delete) |
| `chunkIndex` | `Int` | Position in the document |
| `content` | `String` | Raw text content of the chunk |
| `contentHash` | `String` | SHA-256 of content (deduplication) |
| `embedding` | `Float[]` | Gemini text-embedding-004 vector |
| `createdAt` | `DateTime` | — |
| `updatedAt` | `DateTime` | — |
**Unique constraint:** `(contractId, chunkIndex)` — no duplicate chunks per contract.
**Indexes:** `contractId`, `contentHash`, `chunkIndex`
> **Note:** PostgreSQL stores `Float[]` as a native array column. Cosine similarity is computed in application code, not via pgvector. A future enhancement could migrate to pgvector for large-scale similarity search.
---
### `Notification`
In-app notifications for users.
| Column | Type | Notes |
|--------|------|-------|
| `id` | `String` (cuid) | Primary key |
| `userId` | `String` | FK → User.id (cascade delete) |
| `contractId` | `String?` | Optional FK → Contract.id (set null on delete) |
| `type` | `NotificationType` | Enum: SUCCESS, WARNING, ERROR, INFO, DEADLINE |
| `title` | `String` | Short notification title |
| `message` | `String` | Full notification message |
| `icon` | `String?` | Icon identifier for UI rendering |
| `actionType` | `String?` | e.g., `"RENEWAL_REMINDER"`, `"UPLOAD_SUCCESS"` |
| `actionData` | `Json?` | Additional action-specific payload |
| `read` | `Boolean` | Default: `false` |
| `createdAt` | `DateTime` | — |
| `expiresAt` | `DateTime?` | Optional expiration for auto-cleanup |
**Indexes:** `userId`, `contractId`, `type`, `read`, `createdAt`
---
### `BlockchainTransaction`
Audit log of every on-chain document registration.
| Column | Type | Notes |
|--------|------|-------|
| `id` | `String` (cuid) | Primary key |
| `userId` | `String` | FK → User.id (cascade delete) |
| `contractId` | `String` | FK → Contract.id (cascade delete) |
| `documentHash` | `String` | SHA-256 hash of the document |
| `txHash` | `String` (unique) | Ethereum transaction hash |
| `blockNumber` | `Int` | Block number where tx was mined |
| `blockTimestamp` | `DateTime` | Block timestamp |
| `network` | `String` | `"hardhat"` or `"sepolia"` |
| `contractAddress` | `String` | Smart contract address |
| `status` | `String` | Default: `"CONFIRMED"` (PENDING/CONFIRMED/FAILED) |
| `createdAt` | `DateTime` | — |
**Indexes:** `userId`, `contractId`, `txHash`, `network`
---
## Enums
### `ContractType`
| Value | Description |
|-------|-------------|
| `INSURANCE_AUTO` | Auto/vehicle insurance |
| `INSURANCE_HOME` | Home/property insurance |
| `INSURANCE_HEALTH` | Health insurance |
| `INSURANCE_LIFE` | Life insurance |
| `LOAN` | Bank loan / mortgage |
| `CREDIT_CARD` | Credit card agreement |
| `INVESTMENT` | Investment / fund agreement |
| `OTHER` | Any other contract type |
### `ContractStatus`
| Value | Description |
|-------|-------------|
| `UPLOADED` | File received, awaiting analysis |
| `PROCESSING` | AI analysis in progress |
| `COMPLETED` | Analysis done and blockchain registered |
| `FAILED` | Analysis or registration failed |
### `NotificationType`
| Value | Description |
|-------|-------------|
| `SUCCESS` | Successful operation (analysis complete, etc.) |
| `WARNING` | Non-critical issue or approaching deadline |
| `ERROR` | Failed operation |
| `INFO` | General informational message |
| `DEADLINE` | Contract expiry deadline approaching |
---
## Database Operations
### Migrations
```bash
# Create a new migration in development
npm run db:migrate:dev -- --name <migration_name>
# Apply pending migrations in production
npm run db:migrate
# Regenerate Prisma client after schema changes
npm run db:generate
```
### Prisma Studio (visual DB browser)
```bash
npm run db:studio
```
### Connection Pooling (Production)
For high-traffic production deployments, use **PgBouncer** or **Prisma Accelerate** to pool PostgreSQL connections. The `DATABASE_URL` format with connection pooling:
```
DATABASE_URL="postgresql://USER:PASS@HOST:PORT/DB?pgbouncer=true&connection_limit=1"
```
Set `connection_limit=1` when using PgBouncer in transaction mode.