About CloudDriveRAG
A powerful open-source RAG (Retrieval-Augmented Generation) application that transforms your Google Drive documents into an intelligent knowledge base.
What is CloudDriveRAG?
CloudDriveRAG is a self-hosted application that enables you to:
1. Connect to Google Drive: Securely authorize CloudDriveRAG to access your documents (PDFs, Excel files, Word documents)
2. Ingest Documents: Extract text from your files, then automatically chunk it and convert it to vector embeddings
3. Build Knowledge Base: Store embeddings in Qdrant (local or remote vector database) for semantic search
4. Query with AI: Ask questions about your documents and receive contextual answers powered by LLMs
How It Works
Google Drive Connection
You authorize CloudDriveRAG to access your Google Drive using OAuth 2.0. Your Google password is never seen or stored by the application; only OAuth access tokens are stored.
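As a rough illustration of the first step of such an OAuth 2.0 flow, the sketch below builds a Google authorization URL for the authorization-code flow. The helper name `buildAuthUrl`, the read-only Drive scope, and the callback route are assumptions made for this example, not CloudDriveRAG's confirmed configuration.

```typescript
// Hypothetical helper: builds a Google OAuth 2.0 authorization URL.
// Scope, redirect route, and parameter choices are assumptions for illustration.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

interface OAuthConfig {
  clientId: string;
  redirectUri: string;
  scopes: string[];
}

function buildAuthUrl(config: OAuthConfig, state: string): string {
  const params = new URLSearchParams({
    client_id: config.clientId,
    redirect_uri: config.redirectUri,
    response_type: "code", // authorization-code flow: the app never sees a password
    scope: config.scopes.join(" "),
    access_type: "offline", // also request a refresh token
    state, // opaque value to verify on the callback (CSRF protection)
  });
  return `${GOOGLE_AUTH_ENDPOINT}?${params.toString()}`;
}

const authUrl = buildAuthUrl(
  {
    clientId: "YOUR_CLIENT_ID",
    redirectUri: "http://localhost:3000/auth/callback", // hypothetical route
    scopes: ["https://www.googleapis.com/auth/drive.readonly"],
  },
  "random-state-value",
);
```

The user is sent to `authUrl`, signs in with Google, and is redirected back with a short-lived code that the server exchanges for tokens.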
Document Ingestion
Select documents from your Google Drive. CloudDriveRAG downloads and parses PDFs, Excel sheets, and Word documents to extract text.
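One way to picture the parsing step is a small dispatch from file extension to parsing library. The function `pickParser` is hypothetical, and the mapping only mirrors the libraries named on this page (pdf-parse, xlsx, mammoth); legacy formats and Google-native files are left out of this sketch.

```typescript
// Hypothetical dispatch: which parsing library handles which file extension.
type ParserName = "pdf-parse" | "xlsx" | "mammoth";

function pickParser(fileName: string): ParserName | null {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return null; // no extension, nothing to dispatch on
  const ext = fileName.slice(dot + 1).toLowerCase();
  switch (ext) {
    case "pdf":
      return "pdf-parse";
    case "xlsx":
      return "xlsx";
    case "docx":
      return "mammoth";
    default:
      return null; // unsupported in this sketch
  }
}
```

A caller would skip or report any file for which `pickParser` returns `null` rather than failing the whole ingestion run.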
Embedding & Storage
Text is split into chunks, converted to vector embeddings using the embedding model of your chosen provider (OpenAI, Gemini, or Ollama), and stored in Qdrant.
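The chunking step can be sketched as a fixed-width sliding window. This is a minimal version assuming plain character counts and the 1000/200 sizes listed under Technical Architecture; the function name `chunkText` is hypothetical, and the real chunker may also respect sentence or paragraph boundaries.

```typescript
// Minimal sketch: fixed-size character chunks with overlap.
// Defaults (1000 chars, 200 overlap) follow the figures quoted on this page.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With these defaults each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.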
RAG Query & Response
Ask questions in the Chat tab. CloudDriveRAG searches your knowledge base, retrieves relevant chunks, and uses an LLM to generate contextual answers.
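The retrieval half of this step can be illustrated with an in-memory stand-in for the vector database. The sketch below assumes embeddings are already computed and uses a brute-force cosine-similarity scan in place of Qdrant; all type and function names are hypothetical, and the prompt wording is only an example.

```typescript
// In-memory stand-in for the vector search and context assembly steps.
interface StoredChunk {
  text: string;
  source: string; // originating document name
  vector: number[];
}

interface ScoredChunk {
  chunk: StoredChunk;
  score: number; // cosine similarity to the query
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: treat as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query vector and keep the k best.
function topK(query: number[], chunks: StoredChunk[], k: number): ScoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Assemble the retrieved chunks into a prompt for the LLM.
function buildPrompt(question: string, hits: ScoredChunk[]): string {
  const context = hits
    .map((hit, i) => `[${i + 1}] (${hit.chunk.source}) ${hit.chunk.text}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Keeping the source name and score on each hit is what makes it possible to show source documents and relevance scores next to the answer.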
Technical Architecture
User → Google Drive
OAuth2 authentication & token storage
Document Parser
PDF: pdf-parse | Excel: xlsx | DOCX: mammoth
Text Chunker
Chunking with overlap (1000-character chunks, 200-character overlap)
Embedding Generator
OpenAI (1536-dim) | Gemini (3072-dim) | Ollama (768-dim)
Qdrant Vector DB
Local or remote (cloud) storage with cosine similarity search
RAG Pipeline
Query embedding → Search → Context assembly → LLM response
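Because each provider produces vectors of a different size, the vector collection has to be created with a matching dimension. The sketch below derives a collection configuration from the provider using the dimensions listed above; the `{ vectors: { size, distance } }` shape follows Qdrant's collection-creation request body, while the helper names are hypothetical and the actual dimension depends on the embedding model configured.

```typescript
type Provider = "openai" | "gemini" | "ollama";

// Dimensions as listed on this page; real values depend on the chosen model.
const EMBEDDING_DIMENSIONS: Record<Provider, number> = {
  openai: 1536,
  gemini: 3072,
  ollama: 768,
};

interface CollectionConfig {
  vectors: { size: number; distance: "Cosine" };
}

// Hypothetical helper: collection settings matching a provider's embeddings.
function collectionConfigFor(provider: Provider): CollectionConfig {
  return { vectors: { size: EMBEDDING_DIMENSIONS[provider], distance: "Cosine" } };
}

// Guard to run before storing a vector in a collection.
function assertVectorFits(vector: number[], config: CollectionConfig): void {
  if (vector.length !== config.vectors.size) {
    throw new Error(
      `vector has ${vector.length} dimensions, collection expects ${config.vectors.size}`,
    );
  }
}
```

One consequence of this design is that vectors from one provider cannot be searched with query embeddings from another, so switching providers generally means re-ingesting documents into a collection of the new size.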
Key Features
Self-Hosted
Runs entirely on your own infrastructure; no data is sent to our servers
Multi-LLM Support
Use OpenAI, Google Gemini, or local Ollama, and switch providers easily
Persistent Authentication
Stay logged in across sessions with securely stored tokens
Configurable Settings
Manage API keys, Qdrant connection, and LLM provider on the Settings page
Real-Time Ingestion
Live progress updates as documents are processed and stored
Source Attribution
Chat responses include source documents and relevance scores
Open Source
Free to use, modify, and distribute under an open-source license
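The live progress updates mentioned under Real-Time Ingestion are delivered with Server-Sent Events. As an illustration of the wire format only, the sketch below serializes one progress update as an SSE message; the event name and payload fields are assumptions, not the application's actual schema.

```typescript
// Hypothetical payload for one ingestion progress update.
interface IngestProgress {
  file: string;
  processedChunks: number;
  totalChunks: number;
}

// Serialize an update in the text/event-stream format:
// an "event:" line, a "data:" line, then a blank line ending the message.
function formatSseEvent(event: string, payload: IngestProgress): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const message = formatSseEvent("progress", {
  file: "report.pdf",
  processedChunks: 3,
  totalChunks: 10,
});
```

On the browser side, an `EventSource` pointed at the streaming endpoint can listen for the named event with `addEventListener("progress", ...)` and update a progress bar from the parsed JSON.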
Technology Stack
Backend
- Node.js + Express
- Google APIs (Drive, OAuth)
- Qdrant Client
- PDF Parse, XLSX, Mammoth
LLM Integration
- OpenAI SDK
- Google Generative AI
- Ollama
- Custom RAG Pipeline
Frontend
- HTML5 + Vanilla JS
- Tailwind CSS
- Responsive Design
- Server-Sent Events (SSE)
Vector Database
- Qdrant (Local/Cloud)
- Cosine Similarity
- Vector Indexing
- Payload Storage
Security
- OAuth 2.0
- Session Management
- Encrypted Storage
- HTTPS/TLS
Data Handling
- Document Parsing
- Text Chunking
- Embedding Generation
- Vector Search
Getting Started
→ Clone or Download: Get CloudDriveRAG from GitHub
→ Configure: Copy .env.example to .env and add your credentials
→ Install: Run npm install
→ Start: Run npm start
→ Open: Visit http://localhost:3000
→ Configure Settings: Add API keys and Qdrant details
→ Connect Drive: Authorize access to your documents
→ Ingest Documents: Select and process your files
→ Start Chatting: Ask questions about your documents
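Since the required credentials differ by LLM provider, a startup check along the following lines can catch a missing setting early. This is only a sketch: every variable name here (for example `QDRANT_URL` or `OPENAI_API_KEY`) is a hypothetical placeholder, and the real names are whatever `.env.example` defines.

```typescript
type LlmProvider = "openai" | "gemini" | "ollama";

// Hypothetical setting names; consult .env.example for the real ones.
const ALWAYS_REQUIRED = ["GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "QDRANT_URL"];

const PROVIDER_REQUIRED: Record<LlmProvider, string[]> = {
  openai: ["OPENAI_API_KEY"],
  gemini: ["GEMINI_API_KEY"],
  ollama: ["OLLAMA_BASE_URL"],
};

// Return the names of settings that are absent or blank for the chosen provider.
function missingSettings(
  env: Record<string, string | undefined>,
  provider: LlmProvider,
): string[] {
  return [...ALWAYS_REQUIRED, ...PROVIDER_REQUIRED[provider]].filter(
    (key) => (env[key] ?? "").trim() === "",
  );
}
```

In a Node.js process this could be called with `process.env` at startup, printing the returned names and exiting if the list is not empty.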
Frequently Asked Questions
Is my data safe?
Yes. CloudDriveRAG is self-hosted on your infrastructure, and OAuth tokens and API keys are stored locally. If you choose OpenAI or Gemini, document text is sent to that provider for embedding and chat; with local Ollama, processing stays on your machine.
Can I use it offline?
You need internet access to authenticate with Google Drive and to download documents from it. For chat, you can use local Ollama entirely offline, but OpenAI and Gemini require internet access.
What file types are supported?
PDF, Excel (.xlsx, .xls), Word (.docx, .doc), and Google Docs/Sheets (exported formats).
How much does it cost?
CloudDriveRAG itself is free (open-source). You only pay for third-party APIs you use (OpenAI embeddings/chat, Google Gemini, Qdrant Cloud if you use it).
Can I modify the source code?
Yes! CloudDriveRAG is open-source. You can modify, extend, and redistribute it according to the license terms.
Built with ❤️ for knowledge workers, researchers, and data enthusiasts