RAGFlowChainR is an R package that brings Retrieval-Augmented Generation (RAG) capabilities to R, inspired by LangChain. It enables intelligent retrieval of documents from a local vector store (DuckDB), enhanced with optional web search, and seamless integration with Large Language Models (LLMs).
Features include:
- Ingestion of local documents and websites via `fetch_data()`
- Local vector storage and search backed by DuckDB
- Optional web search through Tavily
- LLM integration (OpenAI, Groq, Anthropic) via `call_llm()`
For the Python version, see RAGFlowChain on PyPI.
GitHub (R): RAGFlowChainR
GitHub (Python): RAGFlowChain
# Install from GitHub
if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("knowusuboaky/RAGFlowChainR")
To use features like web search (Tavily) and LLMs (OpenAI, Groq, Anthropic), you'll need to set your API keys as environment variables. This ensures that sensitive credentials are never hardcoded in your scripts.
# Add these to your .Renviron file or run once per session
Sys.setenv(TAVILY_API_KEY = "your-tavily-api-key")
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")
Sys.setenv(GROQ_API_KEY = "your-groq-api-key")
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")
💡 Tip: To persist these keys across sessions, add them to a ~/.Renviron file (not tracked by git) instead of your code.
Place this in a file named .Renviron
in your home
directory:
TAVILY_API_KEY=your-tavily-api-key
OPENAI_API_KEY=your-openai-api-key
GROQ_API_KEY=your-groq-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
Then restart R for the changes to take effect.
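After restarting, you can confirm that the keys are visible to your R session. This is a minimal sketch (not part of the package API); `nzchar()` returns `FALSE` for variables that are unset or empty:

```r
# Check which API keys the current R session can see
keys <- c("TAVILY_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY", "ANTHROPIC_API_KEY")
status <- vapply(keys, function(k) nzchar(Sys.getenv(k)), logical(1))
print(status)  # TRUE for each key that is set and non-empty
```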
fetch_data()
library(RAGFlowChainR)

# Read local files and websites
local_files  <- c("documents/sample.pdf", "documents/sample.txt")
website_urls <- c("https://www.r-project.org")
crawl_depth  <- 1

data <- fetch_data(local_paths = local_files, website_urls = website_urls, crawl_depth = crawl_depth)
head(data)
con <- create_vectorstore("my_vectors.duckdb", overwrite = TRUE)

docs <- data.frame(
  source        = "Test Source",
  title         = "Test Title",
  author        = "Test Author",
  publishedDate = "2025-01-01",
  description   = "Test Description",
  content       = "Hello world",
  url           = "https://example.com",
  source_type   = "txt",
  stringsAsFactors = FALSE
)

insert_vectors(
  con         = con,
  df          = docs,
  embed_fun   = embed_openai(), # Or embed_ollama()
  chunk_chars = 12000
)

build_vector_index(con, type = c("vss", "fts"))

results <- search_vectors(con, query_text = "Who is Messi?", top_k = 5)
print(results)

dbDisconnect(con)
rag_chain <- create_rag_chain(
  llm                       = call_llm,
  vector_database_directory = "my_vectors.duckdb",
  method                    = "DuckDB",
  embedding_function        = embed_openai(),
  use_web_search            = FALSE
)

# Ask a question
response <- rag_chain$invoke("Tell me about Messi")
cat(response$answer)

# Get related documents
context <- rag_chain$custom_invoke("Tell me about Messi")
print(context$documents)

# Review and clear chat history
print(rag_chain$get_session_history())
rag_chain$clear_history()
rag_chain$disconnect()
RAGFlowChainR includes built-in support for calling LLMs from
providers such as OpenAI, Groq, and
Anthropic via the call_llm()
utility:
call_llm(
prompt = "Summarize the capital of France.",
provider = "groq",
model = "llama3-8b",
temperature = 0.7,
max_tokens = 200
)
chatLLM
We're developing a standalone R package, chatLLM, that will offer a unified, modular interface for interacting with popular LLM providers (OpenAI, Groq, and Anthropic) via a clean, extensible API.
Features planned:
- A single interface across providers (openai, groq, anthropic)
- Integration with RAGFlowChainR
Stay tuned on GitHub for updates!