
Danish Khan

Full Stack Developer

AI Tool · Shipped

Scroll - PDF Chat

AI Chatbot for PDFs

Client: Personal
Year: 2024
Role: Solo Developer
Duration: 6 weeks
// Impact

500 pages: max PDF size supported, with sub-second responses

95%: accuracy on QA benchmarks

200+: beta users (students and researchers)

Top 10: Product Hunt daily ranking

// Overview

Scroll transforms the way people interact with documents. Upload any PDF (research papers, legal contracts, textbooks) and start asking questions in natural language. The AI understands context, provides citations, and can summarize entire sections on demand.

// The Problem

Reading through lengthy PDFs to find specific information is time-consuming and error-prone. Existing search tools only match keywords, missing the semantic meaning behind queries.

// The Solution

I built a RAG (Retrieval-Augmented Generation) pipeline that chunks documents, creates vector embeddings, and retrieves the most relevant passages to answer user queries. The chat interface provides page references and highlighted excerpts for every answer.
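The retrieval step described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's actual code: the `Chunk` shape, the function names, and the prompt wording are all assumptions, and plain number arrays stand in for embeddings that a real embedding model and vector store would supply.

```typescript
// Hypothetical data shape; the source does not describe the real data model.
interface Chunk {
  text: string;
  page: number;        // page the passage came from, kept for citations
  embedding: number[]; // stand-in for a vector from an embedding model
}

// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((ai, i) => {
    const bi = b[i] ?? 0;
    dot += ai * bi;
    normA += ai * ai;
    normB += bi * bi;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function retrieveTopK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Build a prompt with page-tagged passages so the answer can cite its sources.
function buildPrompt(question: string, passages: Chunk[]): string {
  const context = passages.map((p) => `[page ${p.page}] ${p.text}`).join("\n");
  return (
    "Answer using only the passages below and cite page numbers.\n\n" +
    `${context}\n\nQuestion: ${question}`
  );
}
```

Keeping the page number on every chunk is what makes page references possible later: the passages that reach the model already carry the citation data.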

// How I Built It
01

Discovery

Talked to 15 researchers and students about their PDF workflows. The clearest pain: skimming a 60-page paper for one specific finding.

02

RAG Pipeline

Implemented a chunking strategy with overlapping windows to preserve context across chunk boundaries. Embeddings are stored in Pinecone, and retrieval is tuned with MMR (maximal marginal relevance) to reduce redundancy among the returned passages.
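To illustrate the MMR idea, here is a small standalone sketch of maximal marginal relevance over plain vectors. It is not the project's retrieval code (which the source says runs against Pinecone); the function names and the default `lambda` of 0.5 are assumptions chosen for the example.

```typescript
// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((ai, i) => {
    const bi = b[i] ?? 0;
    dot += ai * bi;
    normA += ai * ai;
    normB += bi * bi;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface Candidate {
  vector: number[];
  index: number; // position in the original candidate list
}

// Greedy MMR: at each step pick the candidate that maximizes
//   lambda * sim(query, c) - (1 - lambda) * max sim(c, alreadySelected).
// lambda = 1 is plain relevance ranking; lower values favor diversity.
function mmrSelect(
  query: number[],
  candidates: number[][],
  k: number,
  lambda: number = 0.5
): number[] {
  let remaining: Candidate[] = candidates.map((vector, index) => ({ vector, index }));
  const selected: Candidate[] = [];

  while (selected.length < k && remaining.length > 0) {
    let best: Candidate | null = null;
    let bestScore = -Infinity;
    for (const candidate of remaining) {
      const relevance = cosine(query, candidate.vector);
      const redundancy =
        selected.length === 0
          ? 0
          : Math.max(...selected.map((s) => cosine(candidate.vector, s.vector)));
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        best = candidate;
      }
    }
    if (best === null) break;
    const chosen: Candidate = best;
    selected.push(chosen);
    remaining = remaining.filter((c) => c !== chosen);
  }
  return selected.map((s) => s.index);
}
```

With two near-duplicate passages and one distinct passage, plain similarity ranking returns both duplicates, while MMR with a balanced `lambda` tends to swap the second duplicate for the distinct passage. That is the redundancy reduction the step refers to.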

03

Chat Interface

Built a streaming chat UI in React with citation cards that deep-link to the exact PDF page. Source highlighting was the feature users loved most.
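One plausible way to produce the deep link behind a citation card is the `#page=N` URL fragment, which common browser PDF viewers honor. The source does not say how Scroll implements it, so the `Citation` shape and `buildCitation` helper below are hypothetical.

```typescript
// Hypothetical citation payload that a chat UI could render as a card.
interface Citation {
  label: string;   // short text shown on the card, e.g. "p. 12"
  page: number;    // 1-based page number in the PDF
  href: string;    // link that opens the PDF at that page
  excerpt: string; // passage to highlight
}

// Build a citation whose link targets a page via the "#page=N" fragment.
function buildCitation(pdfUrl: string, page: number, excerpt: string): Citation {
  if (page % 1 !== 0 || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  // Drop any existing fragment so the page fragment is not appended twice.
  const hashAt = pdfUrl.indexOf("#");
  const base = hashAt === -1 ? pdfUrl : pdfUrl.slice(0, hashAt);
  return { label: `p. ${page}`, page, href: `${base}#page=${page}`, excerpt };
}
```

A React citation card could then render `href` as an ordinary link, or pass `page` to an embedded viewer if the PDF is displayed inside the app.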

04

Launch

Soft-launched on Twitter, where a PhD student community picked it up. Scroll reached the Product Hunt top 10 organically within 48 hours of going public.

// Key Results
Supports PDFs up to 500 pages with sub-second response times
95% accuracy on factual question-answering benchmarks
Used by 200+ students and researchers in beta
Featured in Product Hunt's daily top 10
// Tech Stack
React, Node.js, LangChain, Pinecone, OpenAI API, Express, MongoDB
// Learnings & Reflection

Chunking strategy matters more than model choice. Naive fixed-size chunking caused the model to miss answers that spanned chunk boundaries. Overlapping windows with semantic re-ranking fixed 80% of the accuracy issues before I touched the LLM at all.
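The boundary problem is easy to reproduce with a toy example. The sketch below uses word-based windows and a hypothetical `chunkWithOverlap` helper; the real window sizes and tokenization are not stated in the source, and the semantic re-ranking step is left out.

```typescript
// Split a word list into windows of `size` words, where each window repeats
// the last `overlap` words of the previous one. overlap = 0 gives naive
// fixed-size chunking.
function chunkWithOverlap(words: string[], size: number, overlap: number): string[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size));
    if (start + size >= words.length) break; // this window already reaches the end
  }
  return chunks;
}

const sentence = "the trial found that dosage above 40 mg doubled the risk".split(" ");

// Naive chunking: "40 mg" and "doubled" land in different chunks.
const fixedChunks = chunkWithOverlap(sentence, 4, 0).map((c) => c.join(" "));
// ["the trial found that", "dosage above 40 mg", "doubled the risk"]

// Overlapping windows: one chunk now holds "40 mg doubled the" intact.
const overlappingChunks = chunkWithOverlap(sentence, 4, 2).map((c) => c.join(" "));
// ["the trial found that", "found that dosage above", "dosage above 40 mg",
//  "40 mg doubled the", "doubled the risk"]
```

The overlap costs extra chunks to embed and store (five instead of three here), which is the trade-off for keeping boundary-spanning answers retrievable.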
