swiss_army_llama

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

1.1k stars · +88 stars/month · 0 releases in the last 6 months

Overview

Swiss Army Llama is a comprehensive FastAPI service that streamlines semantic text search and document processing using local LLMs. It automatically generates and caches text embeddings for a wide range of file types, including PDFs (with OCR support), Word documents, and audio files via Whisper transcription.

The tool uses llama_cpp for local LLM inference and a high-performance Rust library for advanced similarity measures such as Spearman correlation, Kendall tau, and Hoeffding's D statistic. Beyond basic cosine similarity, it offers semantic search through FAISS vector indexing with multiple embedding pooling methods, including mean pooling, SVD, and Independent Component Analysis.

Embeddings are cached in SQLite to avoid redundant computation, and an optional RAM disk can speed up LLM loading. All functionality is exposed through REST endpoints with an integrated Swagger UI, making the service easy to integrate into existing applications. This makes it particularly valuable for organizations that want semantic search and document analysis capabilities while keeping full control of their data through local deployment.
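To make the rank-based similarity measures mentioned above concrete, here is a minimal pure-Python illustration of Spearman correlation and Kendall tau (tau-a, without tie correction). This is only a sketch of the statistics themselves; the service computes them with an optimized Rust library, not this code.

```python
def rankdata(xs):
    # Assign average ranks (1-based), handling ties.
    sorted_idx = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[sorted_idx[j + 1]] == xs[sorted_idx[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[sorted_idx[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman = Pearson correlation of the ranks.
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

def kendall_tau(x, y):
    # Tau-a: (concordant - discordant) / total pairs.
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

a = [0.1, 0.4, 0.2, 0.9]
b = [0.2, 0.5, 0.3, 1.0]
print(round(spearman(a, b), 6))  # 1.0 — identical rank order
print(kendall_tau(a, b))         # 1.0
```

Because these measures depend only on rank order, they can detect monotonic relationships between embedding dimensions that plain cosine similarity misses.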

Pros

  • + Comprehensive document processing pipeline that handles diverse file types including PDFs with OCR, Word documents, and audio transcription
  • + Advanced similarity measures beyond cosine similarity, including statistical correlation methods and dependency measures via optimized Rust library
  • + Intelligent caching system with SQLite storage prevents redundant computations and includes automatic RAM disk management for performance optimization
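The caching pattern described above can be sketched as follows. The table schema, key format, and `fake_embed` stand-in are illustrative assumptions, not Swiss Army Llama's actual implementation.

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS embeddings"
    " (cache_key TEXT PRIMARY KEY, vector TEXT)"
)

def fake_embed(text):
    # Stand-in for a real llama_cpp embedding call (assumption for illustration).
    return [float(len(text)), float(sum(map(ord, text)) % 97)]

def get_embedding(text, model="llama2-7b"):
    # Key the cache on a hash of model name plus input text.
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE cache_key = ?", (key,)
    ).fetchone()
    if row:                       # cache hit: skip recomputation
        return json.loads(row[0])
    vec = fake_embed(text)        # cache miss: compute and store
    conn.execute("INSERT INTO embeddings VALUES (?, ?)", (key, json.dumps(vec)))
    conn.commit()
    return vec

print(get_embedding("hello"))  # computed on first call
print(get_embedding("hello"))  # served from the SQLite cache
```

Keying on both model and text means switching models never returns stale vectors from a different embedding space.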

Cons

  • - Requires significant local computational resources for running multiple LLMs and processing large document collections
  • - Setup complexity may be challenging for users without experience in local LLM deployment and configuration
  • - Limited to local deployment model which may not suit teams requiring cloud-native or distributed processing solutions

Use Cases

  • Semantic search across mixed document collections, including scanned PDFs (via OCR) and Word documents
  • Making audio archives searchable by generating embeddings from Whisper transcriptions
  • Adding document analysis to existing applications through REST endpoints while keeping all data on local infrastructure

Getting Started

Install the service by cloning the repository and installing Python dependencies via pip or conda. Configure your local LLM models and optional RAM disk settings in the configuration file. Launch the FastAPI server and access the Swagger UI to start uploading documents or submitting text for embedding generation and semantic search.
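The steps above might look roughly like the following. The repository URL, script name, and port are assumptions; consult the project README for the exact commands and configuration options.

```shell
# Illustrative sequence only — names and port are assumptions.
git clone https://github.com/Dicklesworthstone/swiss_army_llama.git
cd swiss_army_llama
pip install -r requirements.txt      # or set up a conda environment
# Edit the configuration: local LLM models, optional RAM disk settings.
python swiss_army_llama.py           # launch the FastAPI server
# Then open the Swagger UI in a browser, e.g. http://localhost:8089/docs
# (the actual port depends on the project's configuration).
```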