About CloudDriveRAG

A powerful open-source RAG (Retrieval-Augmented Generation) application that transforms your Google Drive documents into an intelligent knowledge base.

What is CloudDriveRAG?

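CloudDriveRAG is a self-hosted application that lets you connect your Google Drive, turn selected documents into a searchable vector knowledge base, and ask questions about them in a chat interface.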

How It Works

1. Google Drive Connection

You authorize CloudDriveRAG to access your Google Drive using OAuth 2.0. Your Google password is never stored; only OAuth access tokens are kept.

2. Document Ingestion

Select documents from your Google Drive. CloudDriveRAG downloads and parses PDFs, Excel sheets, and Word documents to extract text.

3. Embedding & Storage

Text is split into chunks, converted to vector embeddings with your chosen provider's embedding model (OpenAI, Gemini, or Ollama), and stored in Qdrant.

4. RAG Query & Response

Ask questions in the Chat tab. CloudDriveRAG embeds your question, retrieves the most relevant chunks from your knowledge base in Qdrant, and passes them to an LLM as context for a grounded answer.
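Step 4 can be sketched as a small context-assembly function. This is a minimal illustration, not CloudDriveRAG's actual code: it assumes each retrieved chunk carries its text, a source file name, and a similarity score (the field names `text`, `source`, and `score` and the function `buildRagPrompt` are hypothetical). Numbering the chunks lets the model cite them, which is one common way to support source attribution.

```javascript
// Minimal sketch of RAG context assembly (hypothetical names and fields).
// Each retrieved chunk is assumed to look like { text, source, score }.
function buildRagPrompt(question, chunks, maxChars = 4000) {
  // Highest-scoring chunks first, so the most relevant text survives truncation.
  const ranked = [...chunks].sort((a, b) => b.score - a.score);

  const parts = [];
  let used = 0;
  for (const [i, chunk] of ranked.entries()) {
    const block = `[${i + 1}] (${chunk.source})\n${chunk.text}`;
    if (used + block.length > maxChars) break; // stay within a rough context budget
    parts.push(block);
    used += block.length;
  }

  return [
    "Answer the question using only the context below.",
    "Cite sources by their [number]. If the context is insufficient, say so.",
    "",
    "Context:",
    parts.join("\n\n"),
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

The character budget is a crude stand-in for a token limit; a real pipeline would size the context to the chosen model.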

Technical Architecture

User → Google Drive
OAuth2 authentication & token storage

Document Parser
PDF: pdf-parse | Excel: xlsx | DOCX: mammoth

Text Chunker
Smart chunking with overlap (1000-character chunks, 200-character overlap)

Embedding Generator
OpenAI (1536-dim) | Gemini (3072-dim) | Ollama (768-dim); exact size depends on the embedding model

Qdrant Vector DB
Local or remote (cloud) storage with cosine similarity search

RAG Pipeline
Query embedding → Search → Context assembly → LLM response
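The Text Chunker stage can be illustrated with a fixed-size sliding window using the 1000-character / 200-character overlap values above. This is a simplified sketch: the function name `chunkText` is hypothetical, and the actual "smart" chunker may also respect sentence or paragraph boundaries, which this version does not.

```javascript
// Fixed-size sliding-window chunker: each chunk shares its last `overlap`
// characters with the start of the next one.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new RangeError("overlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break; // last window reached the end of the text
  }
  return chunks;
}
```

With the defaults, a 2,500-character document yields three chunks starting at offsets 0, 800, and 1600. The overlap means text near a chunk boundary appears in both neighbouring chunks, so a search can still match it.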

Key Features

Self-Hosted

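Run entirely on your own infrastructure; no data is sent to our servers (document text reaches a third party only if you choose a cloud provider such as OpenAI or Gemini)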

Multi-LLM Support

Use OpenAI, Google Gemini, or local Ollama, and switch between providers easily

Persistent Authentication

Stay signed in across sessions with securely stored OAuth tokens

Configurable Settings

Manage API keys, the Qdrant connection, and the LLM provider on the Settings page

Real-Time Ingestion

Live progress updates as documents are processed and stored

Source Attribution

Chat responses include source documents and relevance scores

Open Source

Free to use, modify, and distribute under an open-source license
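Real-time ingestion progress is delivered over Server-Sent Events (SSE). The sketch below shows only the SSE wire framing; the event name `progress`, the payload fields, the endpoint path in the comments, and the function `formatSseEvent` are all hypothetical, not CloudDriveRAG's actual protocol.

```javascript
// Frame one Server-Sent Event: an "event:" line, a "data:" line, and a blank
// line that terminates the message.
function formatSseEvent(eventName, payload) {
  // JSON.stringify without indentation yields a single line, so one "data:" field suffices.
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// In an Express route, the response would be opened as an event stream, e.g.:
//   res.setHeader("Content-Type", "text/event-stream");
//   res.setHeader("Cache-Control", "no-cache");
//   res.write(formatSseEvent("progress", { file: "report.pdf", processed: 3, total: 10 }));
//
// In the browser, the page could listen with (hypothetical path):
//   const source = new EventSource("/api/ingest/progress");
//   source.addEventListener("progress", (e) => console.log(JSON.parse(e.data)));
```

SSE fits one-way progress updates because it runs over plain HTTP and the browser's `EventSource` reconnects automatically, without the extra machinery of WebSockets.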

Technology Stack

Backend

  • Node.js + Express
  • Google APIs (Drive, OAuth)
  • Qdrant Client
  • pdf-parse, xlsx, mammoth

LLM Integration

  • OpenAI SDK
  • Google Generative AI
  • Ollama
  • Custom RAG Pipeline

Frontend

  • HTML5 + Vanilla JS
  • Tailwind CSS
  • Responsive Design
  • Server-Sent Events (SSE)

Vector Database

  • Qdrant (Local/Cloud)
  • Cosine Similarity
  • Vector Indexing
  • Payload Storage
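To make "cosine similarity search" concrete, here is a brute-force, in-memory version of what Qdrant does at scale with an approximate (HNSW) index. It is an illustration only, not the Qdrant client API; `cosineSimilarity`, `searchTopK`, and the `{ id, vector, payload }` point shape are assumptions for this sketch.

```javascript
// Cosine similarity: dot product divided by the product of the vector norms.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    // Vectors from different embedding models (e.g. 1536 vs 768 dims) cannot be compared.
    throw new RangeError(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero for all-zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored point against the query and keep the k best matches.
function searchTopK(queryVector, points, k = 5) {
  return points
    .map((p) => ({ id: p.id, payload: p.payload, score: cosineSimilarity(queryVector, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The dimension check matters in practice: a Qdrant collection has a fixed vector size, so embeddings from one provider cannot be searched with query vectors from another provider of a different dimension.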

Security

  • OAuth 2.0
  • Session Management
  • Encrypted Storage
  • HTTPS/TLS
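The OAuth 2.0 flow starts by redirecting the user to Google's consent page. The sketch below builds that URL by hand. The Drive scope shown is an assumption (CloudDriveRAG may request a different one), and `buildGoogleAuthUrl` is a hypothetical helper; in the googleapis package, the OAuth2 client's `generateAuthUrl` method builds an equivalent URL. Requesting `access_type: "offline"` is what makes Google return a refresh token, which is how sign-in can persist across sessions.

```javascript
// Google's OAuth 2.0 authorization endpoint for web server applications.
const GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

function buildGoogleAuthUrl({ clientId, redirectUri, state }) {
  if (!state) throw new Error("state is required (a random value checked on callback)");
  const url = new URL(GOOGLE_AUTH_ENDPOINT);
  url.search = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: "code", // authorization-code flow: the server exchanges the code for tokens
    scope: "https://www.googleapis.com/auth/drive.readonly", // assumed scope
    access_type: "offline", // ask for a refresh token so the session can persist
    prompt: "consent", // re-show consent so Google issues a refresh token again
    state, // protects the callback against CSRF
  }).toString();
  return url.toString();
}
```

Usage: generate `state` with something like `crypto.randomBytes(16).toString("hex")`, store it in the session, and reject any callback whose `state` does not match.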

Data Handling

  • Document Parsing
  • Text Chunking
  • Embedding Generation
  • Vector Search

Getting Started

Clone or Download: Get CloudDriveRAG from GitHub

Configure: Copy .env.example to .env and add your credentials

Install: Run npm install

Start: Run npm start

Open: Visit http://localhost:3000

Configure Settings: Add API keys and Qdrant details

Connect Drive: Authorize access to your documents

Ingest Documents: Select and process your files

Start Chatting: Ask questions about your documents
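After copying .env.example to .env, a startup check can catch missing credentials early. This is a sketch only: every variable name below (`GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`, `PORT`, `QDRANT_URL`, `OPENAI_API_KEY`) and the function `loadConfig` are hypothetical, so check .env.example for the real names. Only the Google OAuth credentials are treated as required here, on the assumption that API keys and Qdrant details can be added later on the Settings page.

```javascript
// Validate settings read from the environment (e.g. after a .env loader has run).
// All variable names are hypothetical; consult .env.example for the actual ones.
function loadConfig(env = process.env) {
  const required = ["GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }

  return {
    port: Number(env.PORT) || 3000, // matches the default http://localhost:3000
    googleClientId: env.GOOGLE_CLIENT_ID,
    googleClientSecret: env.GOOGLE_CLIENT_SECRET,
    // Optional here because these can also be set on the Settings page.
    qdrantUrl: env.QDRANT_URL || "http://localhost:6333", // Qdrant's default REST port
    openaiApiKey: env.OPENAI_API_KEY || null,
  };
}
```

Failing fast at startup gives a clear error message instead of a confusing OAuth failure later in the browser.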

Frequently Asked Questions

Is my data safe?

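Yes. CloudDriveRAG is self-hosted on your infrastructure, and OAuth tokens and API keys are stored locally and securely. Your documents stay under your control: the only text that leaves your machine is what you send to a cloud provider (OpenAI or Gemini) for embeddings and chat, or to Qdrant Cloud if you use it. With Ollama and a local Qdrant instance, everything stays on your own hardware.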

Can I use it offline?

You need an internet connection to authenticate with Google Drive and to download documents for ingestion. Chat can then run entirely offline with local Ollama and a local Qdrant instance; OpenAI, Gemini, and Qdrant Cloud require internet access.

What file types are supported?

PDF, Excel (.xlsx, .xls), Word (.docx, .doc), and Google Docs/Sheets (exported to a supported format during ingestion).
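One way to route Drive files to the parsers named in the architecture is by MIME type. The sketch below is illustrative, not CloudDriveRAG's actual mapping, and `planIngestion` is a hypothetical helper. Google Docs and Sheets have no binary content of their own, so the Drive API must export them (for example to .docx or .xlsx) before parsing, while regular files can be downloaded directly. Legacy .doc files are left out because mammoth reads only .docx.

```javascript
// MIME types for the Office Open XML formats used as export targets.
const DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
const XLSX = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";

// Which parser library handles each downloadable file type.
const PARSERS = {
  "application/pdf": "pdf-parse",
  [DOCX]: "mammoth",
  [XLSX]: "xlsx",
  "application/vnd.ms-excel": "xlsx", // legacy .xls, which SheetJS can also read
};

// Google Workspace files must be exported to a concrete format first.
const GOOGLE_EXPORTS = {
  "application/vnd.google-apps.document": DOCX,
  "application/vnd.google-apps.spreadsheet": XLSX,
};

// Returns { exportMimeType, parser }, or null if the type is unsupported in this sketch.
function planIngestion(mimeType) {
  const exportMimeType = GOOGLE_EXPORTS[mimeType] || null;
  const parser = PARSERS[exportMimeType || mimeType];
  if (!parser) return null;
  return { exportMimeType, parser };
}
```

A non-null `exportMimeType` means the file should be fetched with the Drive API's export operation; otherwise its content can be downloaded as-is.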

How much does it cost?

CloudDriveRAG itself is free (open-source). You only pay for third-party APIs you use (OpenAI embeddings/chat, Google Gemini, Qdrant Cloud if you use it).

Can I modify the source code?

Yes! CloudDriveRAG is open-source. You can modify, extend, and redistribute it according to the license terms.

Built with ❤️ for knowledge workers, researchers, and data enthusiasts