RAGFlowChainR is an R package that brings Retrieval-Augmented Generation (RAG) capabilities to R, inspired by LangChain. It enables intelligent retrieval of documents from a local vector store (DuckDB), enhanced with optional web search, and seamless integration with Large Language Models (LLMs).
Features include:
- Ingestion of local documents and websites via `fetch_data()`
- Local vector storage and search backed by DuckDB
- Optional web search through Tavily
- LLM integration (OpenAI, Groq, Anthropic) via `call_llm()`
For the Python version, see RAGFlowChain on PyPI.
GitHub (R): RAGFlowChainR
GitHub (Python): RAGFlowChain
# Install from GitHub
if (!requireNamespace("remotes")) install.packages("remotes")
remotes::install_github("knowusuboaky/RAGFlowChainR")
To use features like web search (Tavily) and LLMs (OpenAI, Groq, Anthropic), you'll need to set your API keys as environment variables. This ensures that sensitive credentials are never hardcoded in your scripts.
# Add these to your .Renviron file or run once per session
Sys.setenv(TAVILY_API_KEY = "your-tavily-api-key")
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")
Sys.setenv(GROQ_API_KEY = "your-groq-api-key")
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")
💡 Tip: To persist these keys across sessions, add them to a ~/.Renviron file (not tracked by git) instead of your code.
Place this in a file named .Renviron
in your home
directory:
TAVILY_API_KEY=your-tavily-api-key
OPENAI_API_KEY=your-openai-api-key
GROQ_API_KEY=your-groq-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key
Then restart R for the changes to take effect.
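After restarting, you can confirm that the keys are visible to your R session. This is a minimal sketch (not part of the package API); `nzchar()` returns `FALSE` for variables that are unset or empty:

```r
# Check which API keys the current R session can see
keys <- c("TAVILY_API_KEY", "OPENAI_API_KEY", "GROQ_API_KEY", "ANTHROPIC_API_KEY")
status <- vapply(keys, function(k) nzchar(Sys.getenv(k)), logical(1))
print(status)  # TRUE for each key that is set and non-empty
```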
fetch_data()
library(RAGFlowChainR)

# Read local files and websites
local_files  <- c("documents/sample.pdf", "documents/sample.txt")
website_urls <- c("https://www.r-project.org")
crawl_depth  <- 1

data <- fetch_data(local_paths = local_files, website_urls = website_urls, crawl_depth = crawl_depth)
head(data)
con <- create_vectorstore("my_vectors.duckdb", overwrite = TRUE)

docs <- data.frame(
  source        = "Test Source",
  title         = "Test Title",
  author        = "Test Author",
  publishedDate = "2025-01-01",
  description   = "Test Description",
  content       = "Hello world",
  url           = "https://example.com",
  source_type   = "txt",
  stringsAsFactors = FALSE
)

insert_vectors(
  con         = con,
  df          = docs,
  embed_fun   = embed_openai(), # Or embed_ollama()
  chunk_chars = 12000
)

build_vector_index(con, type = c("vss", "fts"))

results <- search_vectors(con, query_text = "Who is Messi?", top_k = 5)
print(results)

dbDisconnect(con)
rag_chain <- create_rag_chain(
  llm                       = call_llm,
  vector_database_directory = "my_vectors.duckdb",
  method                    = "DuckDB",
  embedding_function        = embed_openai(),
  use_web_search            = FALSE
)

# Ask a question
response <- rag_chain$invoke("Tell me about Messi")
cat(response$answer)

# Get related documents
context <- rag_chain$custom_invoke("Tell me about Messi")
print(context$documents)

# Review and clear chat history
print(rag_chain$get_session_history())
rag_chain$clear_history()
rag_chain$disconnect()
RAGFlowChainR includes built-in support for calling LLMs from
providers such as OpenAI, Groq, and
Anthropic via the call_llm()
utility:
call_llm(
prompt = "Summarize the capital of France.",
provider = "groq",
model = "llama3-8b",
temperature = 0.7,
max_tokens = 200
)
chatLLM
We're developing a standalone R package, chatLLM, that will offer a unified, modular interface for interacting with popular LLM providers (OpenAI, Groq, and Anthropic) via a clean, extensible API.
Features planned:
- A single interface across providers (openai, groq, anthropic)
- Integration with RAGFlowChainR
Stay tuned on GitHub for updates!