swiss_army_llama
A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.
Overview
Swiss Army Llama is a comprehensive FastAPI service that streamlines semantic text search and document processing using local LLMs. It automatically generates and caches text embeddings for various file types, including PDFs (with OCR support), Word documents, and audio files via Whisper transcription.

The tool leverages llama_cpp for local LLM integration and employs a high-performance Rust-based library for advanced similarity measures such as Spearman correlation, Kendall tau, and Hoeffding's D statistic. Beyond basic cosine similarity, it offers sophisticated semantic search through FAISS vector indexing with multiple embedding pooling methods, including mean pooling, SVD, and Independent Component Analysis.

The service caches embeddings in SQLite to prevent redundant computations and supports optional RAM disk usage for faster LLM loading. All functionality is exposed through REST endpoints with an integrated Swagger UI, making it easy to integrate into existing applications. This makes the tool particularly valuable for organizations that want semantic search and document analysis while maintaining full control over their data through local deployment.
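The rank-based measures mentioned above behave quite differently from cosine similarity: they reward any monotonic relationship between two vectors, not just angular closeness. A minimal sketch using scipy (as a stand-in for the project's optimized Rust library) illustrates the contrast:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Standard cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
emb_a = rng.normal(size=128)
# Cubing preserves order, so the vectors are perfectly monotonically
# related: rank-based measures score 1.0, while cosine similarity
# only measures angular closeness and lands somewhere below that.
emb_b = emb_a ** 3

rho, _ = spearmanr(emb_a, emb_b)
tau, _ = kendalltau(emb_a, emb_b)
print(f"cosine:   {cosine_similarity(emb_a, emb_b):.3f}")
print(f"spearman: {rho:.3f}")   # 1.000
print(f"kendall:  {tau:.3f}")   # 1.000
```

Hoeffding's D goes a step further and detects arbitrary (even non-monotonic) dependence, which is why exposing several measures side by side can surface relationships that cosine similarity alone would miss.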
Pros
- Comprehensive document processing pipeline that handles diverse file types including PDFs with OCR, Word documents, and audio transcription
- Advanced similarity measures beyond cosine similarity, including statistical correlation methods and dependency measures via optimized Rust library
- Intelligent caching system with SQLite storage prevents redundant computations and includes automatic RAM disk management for performance optimization
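The caching approach in the last point can be sketched in a few lines: key each (model, text) pair by a content hash, and only invoke the model on a cache miss. The table name, key scheme, and helper below are illustrative assumptions, not the project's actual schema:

```python
import hashlib
import sqlite3
import numpy as np

def get_cached_embedding(conn, text, model_name, compute_fn):
    """Return a cached embedding if present; otherwise compute and store it.
    The (model, text) pair is keyed by a SHA-256 hash, so repeated
    requests for the same input never re-run the model."""
    key = hashlib.sha256(f"{model_name}:{text}".encode()).hexdigest()
    row = conn.execute(
        "SELECT vector FROM embeddings WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:  # cache hit: deserialize the stored blob
        return np.frombuffer(row[0], dtype=np.float32)
    vec = np.asarray(compute_fn(text), dtype=np.float32)
    conn.execute(
        "INSERT INTO embeddings (key, vector) VALUES (?, ?)",
        (key, vec.tobytes()),
    )
    conn.commit()
    return vec

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (key TEXT PRIMARY KEY, vector BLOB)")

calls = []  # track how many times the "model" actually runs
fake_model = lambda t: (calls.append(t), [0.1, 0.2, 0.3])[1]

v1 = get_cached_embedding(conn, "hello", "llama-7b", fake_model)
v2 = get_cached_embedding(conn, "hello", "llama-7b", fake_model)  # cache hit
assert len(calls) == 1 and np.allclose(v1, v2)
```

Storing vectors as raw float32 blobs keeps the cache compact and avoids any parsing cost on hits; including the model name in the key prevents stale results when switching models.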
Cons
- Requires significant local computational resources for running multiple LLMs and processing large document collections
- Setup complexity may be challenging for users without experience in local LLM deployment and configuration
- Limited to a local deployment model, which may not suit teams requiring cloud-native or distributed processing solutions
Use Cases
- Enterprise document search across mixed file types (PDFs, Word docs, audio recordings) while keeping data on-premises for security compliance
- Research applications requiring sophisticated similarity analysis beyond basic cosine similarity for academic paper analysis or content clustering
- Knowledge management systems that need to process and search through large document repositories with automatic embedding generation and caching