GovBD-MRA: Federal Contract Intelligence Platform 🏛️
GovBD-MRA (Market Research Analytics) is a sophisticated enterprise platform designed to revolutionize how government contractors discover, analyze, and track federal procurement opportunities. Built as part of the Kontratar ecosystem, MRA combines real-time data ingestion, AI-powered analytics, and intelligent search to provide contractors with actionable insights into government spending patterns, entity relationships, and award trends.
"Transforming government procurement data into strategic business intelligence."
🎯 The Challenge We Solve
Government contractors face significant challenges in the federal marketplace:
- 📊 Data Overload: 2M+ entities, 1M+ awards annually across multiple sources (SAM.gov, USAspending.gov, FPDS)
- 🔍 Complex Search: Multi-dimensional queries across entities, awards, agencies, NAICS codes, PSC codes
- 📈 Analytics Gap: Difficulty identifying spending trends, agency patterns, and competitive landscapes
- 🤖 Manual Research: Time-consuming manual research and opportunity tracking
- 🔗 Fragmented Data: Entity data, award data, and historical relationships scattered across systems
- ⏰ Time Sensitivity: Missing critical opportunities due to delayed notifications
- 💼 Team Collaboration: Coordinating research across multiple team members and projects
GovBD-MRA addresses these challenges with an integrated, AI-powered platform that automates data aggregation, analysis, and insight generation.
✨ Key Features
🏢 SAM Entity Management
Comprehensive Entity Database
- 2M+ Registered Entities: Complete SAM.gov entity registry
- Daily Synchronization: Updates pulled from the SAM.gov API every day
- Historical Downloads: Automated historical data ingestion
- Staging Pipeline: Multi-stage validation and deduplication
- Vector Embeddings: AI-powered semantic search using E5-Large-v2
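The validation and deduplication stage can be pictured with a minimal sketch. The `StagedEntity` record, its field names, and the "keep the most recently updated record per UEI" rule are assumptions made for illustration, not the platform's actual staging schema or logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedEntity:
    """Hypothetical staging record; field names are illustrative."""
    uei: str         # Unique Entity Identifier (12 alphanumeric characters)
    legal_name: str
    updated_at: str  # ISO-8601 date, so string order matches date order


def is_valid(entity: StagedEntity) -> bool:
    """Basic shape check: 12-character alphanumeric UEI and a non-empty name."""
    return (
        len(entity.uei) == 12
        and entity.uei.isalnum()
        and bool(entity.legal_name.strip())
    )


def deduplicate(entities: list[StagedEntity]) -> list[StagedEntity]:
    """Drop invalid rows, then keep the most recently updated row per UEI."""
    latest: dict[str, StagedEntity] = {}
    for entity in entities:
        if not is_valid(entity):
            continue
        current = latest.get(entity.uei)
        if current is None or entity.updated_at > current.updated_at:
            latest[entity.uei] = entity
    return list(latest.values())
```

Validating before deduplicating means a malformed row can never displace a good one for the same UEI.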
Entity Data Points
- Core registration information (UEI, CAGE code, legacy DUNS)
- Business types and classifications
- NAICS codes (primary and secondary)
- PSC codes (Product/Service Codes)
- Points of contact (POC) with full contact details
- Physical and mailing addresses
- Certifications (8(a), HUBZone, WOSB, SDVOSB, etc.)
- Financial information and banking details
- Geographical service areas
Advanced Entity Search
- Full-text search across all entity fields
- Filter by business type, NAICS, PSC, location
- Certification filtering (small business, veteran-owned, etc.)
- Relationship mapping (parent companies, subsidiaries)
- Export to CSV for bulk analysis
- Async extract for large datasets
💰 Award Intelligence System
USAspending.gov Integration
- 1M+ Awards Annually: Complete federal contract awards
- Automated ETL Pipeline: Daily data extraction
- Award Staging: Validation and enrichment pipeline
- Vector Database: Qdrant integration for semantic search
- Historical Tracking: Multi-year award history
Award Data Coverage
- Prime contract awards (all federal agencies)
- Award amounts, dates, and durations
- Contracting agency and office
- Recipient information (entity linkage)
- NAICS and PSC codes
- Place of performance
- Contract type and pricing
- Competition type
- Set-aside categories
Award Analytics
- Spending trends by agency, time period
- Top recipients and contractors
- NAICS/PSC spending distribution
- Geographic spending patterns
- Small business utilization
- Award size distribution
- Competition analysis
📊 Advanced Analytics & Drilldown
Automated Analytics Jobs
- Daily Drilldown: Comprehensive agency spending analysis
- Weekly Rollups: Historical trend aggregation
- Scheduled Processing: APScheduler-based automation
- Incremental Updates: Efficient delta processing
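Incremental (delta) processing is commonly built around a watermark: each run handles only records modified since the previous run, then advances the watermark. The sketch below assumes each record carries a `modified_at` timestamp; that field name and the `select_delta` helper are hypothetical.

```python
from datetime import datetime


def select_delta(
    records: list[dict], watermark: datetime
) -> tuple[list[dict], datetime]:
    """Return records changed after `watermark` and the new watermark."""
    changed = [r for r in records if r["modified_at"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["modified_at"] for r in changed), default=watermark)
    return changed, new_watermark
```

A scheduled job would persist the returned watermark and pass it back in on the next run, so repeated runs with no new data do no work.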
Analytics Dimensions
- By Agency: Agency-level spending patterns
- By NAICS: Industry sector analysis
- By PSC: Product/Service category trends
- By Entity: Contractor performance tracking
- By Time: Temporal trend analysis
- By Geography: Regional spending patterns
Drilldown Capabilities
- Top 10 recipients per category
- Spending distribution charts
- Year-over-year comparisons
- Award count vs. amount analysis
- Competition metrics
- Set-aside utilization
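As a rough illustration of the "top recipients per category" drilldown, the sketch below sums award amounts per recipient within each category and keeps the largest `n`. The `(category, recipient, amount)` tuple shape is an assumption for the example; the real jobs presumably aggregate in the database.

```python
from collections import defaultdict


def top_recipients(
    awards: list[tuple[str, str, float]], n: int = 10
) -> dict[str, list[tuple[str, float]]]:
    """Group awards by category and return the top-n recipients by total amount."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for category, recipient, amount in awards:
        totals[category][recipient] += amount
    return {
        category: sorted(by_recipient.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        for category, by_recipient in totals.items()
    }
```

Ties are broken alphabetically by recipient name so the output is deterministic.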
🤖 AI-Powered Chat & Research
Intelligent Conversational Interface
- LangChain Integration: Multi-LLM support (Ollama, OpenAI, Gemini)
- Streaming Responses: Server-Sent Events (SSE) for real-time chat
- Context-Aware: RAG (Retrieval-Augmented Generation) using vector search
- Memory Management: Automatic conversation history and summarization
- Multi-Thread: Concurrent conversation threads per user
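For readers unfamiliar with Server-Sent Events, the sketch below shows how streamed tokens can be framed on the wire: each event is one or more `data:` lines followed by a blank line. The closing `done` event and its `[DONE]` payload are an illustrative convention, not a documented part of the MRA API.

```python
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in an SSE frame, then emit a final 'done' event."""
    for token in tokens:
        # A payload containing newlines needs one "data:" field per line.
        data_lines = "\n".join(f"data: {line}" for line in token.split("\n"))
        yield f"{data_lines}\n\n"
    # Hypothetical end-of-stream marker so clients know when to stop reading.
    yield "event: done\ndata: [DONE]\n\n"
```

In a FastAPI service, a generator like this would typically be returned through a streaming response with the `text/event-stream` media type.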
Research Automation
- Scheduled Research: Automated research prompt execution
- Entity Research: Deep-dive entity analysis with AI
- Award Research: Contract opportunity research
- Market Analysis: Competitive landscape assessment
- Trend Identification: AI-powered trend detection
Chat Features
- Natural language queries across entities and awards
- File upload support (PDF, DOCX, XLSX)
- Document Q&A with RAG
- Export chat history
- Share research threads
- Team collaboration on research
🔍 Multi-Source Search Engine
Unified Search Interface
- Elasticsearch Integration: Fast full-text search across 2M+ records
- Vector Search: Semantic similarity using Qdrant
- Hybrid Search: Combined keyword + semantic ranking
- Fuzzy Matching: Typo-tolerant searches
- Faceted Filtering: Multi-dimensional filtering
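One common way to combine keyword and semantic results is reciprocal rank fusion (RRF), sketched below. This README does not state which fusion method MRA uses, so treat RRF, and the conventional constant `k = 60`, as assumptions chosen for illustration.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]   # e.g. IDs from a full-text query
semantic_hits = ["b", "c", "d"]  # e.g. IDs from a vector query
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

RRF needs only ranks, not raw scores, which avoids having to normalize full-text relevance scores against vector similarities.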
Search Capabilities
- Entity search (name, UEI, CAGE, DUNS)
- Award search (title, description, agency)
- NAICS code search
- PSC code search
- Geographic search (state, city, zip)
- Combined entity + award searches
- Advanced boolean queries
📁 Project & Document Management
Research Projects
- Create and organize research projects
- Associate entities and awards
- Tag and categorize opportunities
- Track project status
- Team collaboration
- Share project insights
Document Processing
- Upload RFPs, RFQs, solicitations
- AI-powered document parsing
- Extract key information
- Q&A on uploaded documents
- Document versioning
- Attachment management
👥 Team Collaboration & Access Control
Multi-Tenant Architecture
- Tenant isolation (data segregation)
- Role-based access control (RBAC)
- Team management
- Invitation system
- Permission management
Team Features
- Create and manage teams
- Invite team members
- Assign roles (admin, member, viewer)
- Share research and projects
- Collaborative chat threads
- Activity tracking
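Tenant isolation and role-based access can be combined in a single check, as in the sketch below. The three roles come from the list above; the permission sets attached to each role and the `can_access` helper are hypothetical and only illustrate the pattern.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    MEMBER = "member"
    VIEWER = "viewer"


# Hypothetical permission mapping; the real platform may differ.
PERMISSIONS: dict[Role, set[str]] = {
    Role.ADMIN: {"read", "write", "invite"},
    Role.MEMBER: {"read", "write"},
    Role.VIEWER: {"read"},
}


def can_access(user_tenant: str, resource_tenant: str, role: Role, action: str) -> bool:
    """Deny cross-tenant access outright, then check the role's permissions."""
    if user_tenant != resource_tenant:
        return False
    return action in PERMISSIONS[role]
```

Checking the tenant first means no role, not even admin, can reach another tenant's data.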
💳 Subscription & Billing
Stripe Integration
- Multiple pricing tiers
- Monthly and annual billing
- Usage-based limits
- Automatic renewals
- Payment method management
- Invoice generation
Subscription Plans
- Free Tier: Limited searches and entities
- Professional: Enhanced search, analytics
- Enterprise: Unlimited access, team features
- Custom: Tailored solutions for large organizations
🔔 Expiring Opportunities & Alerts
Opportunity Tracking
- Track expiring solicitations
- Custom alert thresholds
- Email notifications
- Dashboard widgets
- Favorite opportunities
- Calendar integration
⭐ Favorites & Watchlists
Personal Tracking
- Favorite entities
- Favorite awards
- Save searches
- Track competitors
- Monitor agencies
- Export watchlists
🏗️ Technical Architecture
GovBD-MRA is built using a modern microservices architecture combining multiple technologies for optimal performance and scalability.
Architecture Overview
The platform utilizes a multi-tier architecture:
- Frontend Layer: Modern React-based web application
- Application Layer: Microservices handling business logic and data processing
- Data Layer: Multiple specialized databases for different use cases
- External Integration: Connections to government data sources
Technology Stack
Backend Technologies:
- Java 17 with Spring Boot
- Python 3.11+ with FastAPI
- PostgreSQL for data storage
- Elasticsearch for full-text search
- Vector databases for semantic search
Frontend Technologies:
- Next.js 15 with React 19
- TypeScript for type safety
- Modern UI framework with responsive design
AI/ML Technologies:
- LangChain for LLM orchestration
- Multiple LLM providers (OpenAI, Google Gemini, Ollama)
- Advanced embedding models for semantic search
- RAG (Retrieval-Augmented Generation) pipeline
📊 Data Flow & Integration
The platform integrates data from multiple government sources, processes it through AI-powered analytics, and presents actionable insights to users through an intuitive interface.
Key Data Sources
- SAM.gov Entity API
- USAspending.gov Award API
- FPDS Contract API
Processing Pipeline
- Data Ingestion: Automated collection from government APIs
- Validation & Storage: Data quality checks and secure storage
- AI Enhancement: Vector embeddings and semantic analysis
- Search & Discovery: Fast, intelligent search capabilities
- Analytics Generation: Automated reporting and insights
📊 Performance & Scale
1. Microservices Architecture
- 3 Specialized Services: Entity, Award, Backend (chat/analytics)
- Service Isolation: Each service has dedicated database schemas
- Independent Scaling: Scale services based on load
- Technology Diversity: Java, Python, and TypeScript in a single platform
2. High-Performance Data Ingestion
- 10 SAM API Keys: Parallel entity fetching (100 req/sec)
- Batch Processing: 10,000 entities per batch
- Staging Pipeline: Validate before production insertion
- Incremental Updates: Only fetch changed entities
- Historical Downloads: Automated monthly historical data ingestion
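The batching and multi-key approach can be sketched as follows: split the work into fixed-size batches and hand them out to API keys in round-robin order. The helper names and key labels are placeholders, and the sketch deliberately leaves out the HTTP calls and rate limiting a real ingester would need.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def assign_keys(
    batches: Iterable[list[str]], api_keys: list[str]
) -> list[tuple[str, list[str]]]:
    """Pair each batch with an API key, cycling through the keys in order."""
    return list(zip(cycle(api_keys), batches))
```

With 10 keys and a batch size of 10,000, consecutive batches would land on different keys, spreading request load evenly across them.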
3. AI-Powered Intelligence
- Multi-LLM Support: Ollama (local), OpenAI, Gemini
- RAG Pipeline: Vector search + LLM generation
- Intent Classification: Route queries to optimal handlers
- Streaming Responses: Real-time SSE chat
- Memory Management: Conversation summarization
- Web Search: Gemini-powered web search for recent info
4. Vector Search & Embeddings
- E5-Large-v2: State-of-the-art embedding model (1024D)
- Qdrant Integration: High-performance vector database
- Semantic Search: Find similar entities/awards by meaning
- Hybrid Search: Combine keyword + semantic
- Cosine Similarity: Efficient similarity calculations
- HNSW Indexing: Fast approximate nearest neighbor search
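To make the similarity step concrete, here is a small pure-Python sketch of cosine similarity with an exact (brute-force) top-k search. A vector database with an HNSW index approximates this ranking without scanning every vector; the function names below are illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: list[float], vectors: dict[str, list[float]], k: int = 10
) -> list[tuple[str, float]]:
    """Exact nearest neighbours by cosine similarity (linear scan)."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: -pair[1])[:k]
```

The linear scan costs one similarity computation per stored vector, which is why an approximate index becomes worthwhile at millions of vectors.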
5. Advanced Analytics
- Automated Drilldown: Daily and weekly analytics jobs
- Pre-computed Aggregations: Fast dashboard loading
- Multi-Dimensional: Agency, NAICS, PSC, Entity, Time, Geography
- Top-K Analysis: Top 10 recipients per category
- Trend Detection: Year-over-year comparisons
- Exportable: Excel export for offline analysis
6. Scalability & Performance
- Async/Await: Fully asynchronous Python backend
- Connection Pooling: Database connection reuse
- Worker Pool: 4 Uvicorn workers in production
- Compression: Gzip for responses >1KB
- Caching: TanStack Query caching in frontend
- Lazy Loading: Infinite scroll for large datasets
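The size-based compression rule can be sketched with the standard library alone: bodies above a threshold are gzipped and labeled, smaller ones pass through untouched. The 1024-byte constant mirrors the ">1KB" figure above; the `maybe_compress` helper is hypothetical, and in practice a web framework's gzip middleware would normally handle this.

```python
import gzip

MIN_COMPRESS_BYTES = 1024  # mirrors the ">1KB" threshold described above


def maybe_compress(body: bytes) -> tuple[bytes, dict[str, str]]:
    """Gzip bodies larger than the threshold; return the body and extra headers."""
    if len(body) <= MIN_COMPRESS_BYTES:
        return body, {}
    return gzip.compress(body), {"Content-Encoding": "gzip"}
```

Skipping small bodies avoids paying compression overhead where the gzip header could outweigh any savings.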
7. Enterprise Features
- Multi-Tenancy: Data isolation per tenant
- RBAC: Role-based access control
- Team Collaboration: Shared projects and research
- Audit Trails: Track all entity/award changes
- Subscription Management: Stripe integration
- Usage Limits: Tier-based feature access
📊 Performance & Scale
Data Volume
- Entities: 2M+ SAM.gov registered entities
- Awards: 1M+ new awards annually (cumulative 5M+)
- Analytics: 10K+ pre-computed aggregations
- Chat Messages: 1M+ AI chat interactions
- Vector Embeddings: 2M+ entity vectors, 5M+ award vectors
Processing Speed
- Entity Ingestion: 10,000 entities/min (10 API keys)
- Award Ingestion: 5,000 awards/min
- Embedding Generation: 100 embeddings/sec (batch)
- Vector Search: <50ms for top-10 similarity search
- Chat Response: 1-2s first token, 50-100 tokens/sec streaming
- Analytics Drilldown: Complete daily job in 10-15 minutes
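As a back-of-the-envelope check, the stated rates imply rough lower bounds for a full reload. The calculation below uses the "2M+" entity and "5M+" award figures as round numbers and assumes the quoted rates are sustained, which real runs may not achieve.

```python
def minutes_to_process(total_items: int, items_per_minute: float) -> float:
    """Time in minutes to process `total_items` at a steady rate."""
    return total_items / items_per_minute


entity_minutes = minutes_to_process(2_000_000, 10_000)       # 200.0 min, about 3 h 20 min
award_minutes = minutes_to_process(5_000_000, 5_000)         # 1000.0 min, about 16 h 40 min
embedding_hours = minutes_to_process(7_000_000, 100 * 60) / 60  # about 19.4 h for 7M vectors
```

These estimates suggest why incremental updates matter: a full rebuild takes hours, while a daily delta is a small fraction of that volume.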
API Performance
- Entity Search: <100ms (indexed queries)
- Award Search: <150ms (Elasticsearch)
- Semantic Search: <200ms (vector + rerank)
- Chat Endpoint: <2s (SSE start)
- Analytics Dashboard: <500ms (pre-computed data)
Resource Usage
EntityData Service:
- Memory: 2GB (JVM heap)
- CPU: 2 cores
- Database: 50GB (entities + relationships)
AwardLoad Service:
- Memory: 2GB (JVM heap)
- CPU: 2 cores
- Database: 100GB (awards + staging)
MRA Backend:
- Memory: 1GB (per worker)
- CPU: 4 cores (4 workers)
- Database: 20GB (users, projects, analytics)
MRA Frontend:
- Memory: 512MB (Node process)
- CPU: 1 core
- Disk: 500MB (build artifacts)
Databases:
- PostgreSQL: 200GB total
- Elasticsearch: 50GB (indexed data)
- Qdrant: 100GB (vector storage)
🎯 Use Cases
1. Competitive Intelligence
Scenario: Track competitors' contract wins
Workflow:
- Search for competitor entities by name
- Add to favorites
- View award history for each competitor
- Analyze spending trends by agency
- Identify agencies they frequently win from
- Export data for presentation
Result: Understand competitive landscape and target similar opportunities
2. Market Research
Scenario: Identify agencies spending in your NAICS code
Workflow:
- Navigate to Analytics dashboard
- Filter by NAICS code (e.g., 541512 - Computer Systems Design)
- View top agencies by spending
- Drill down into agency details
- View recent awards in that NAICS
- Chat with AI: "What are the trends in DoD IT spending?"
Result: Data-driven agency targeting strategy
3. Opportunity Tracking
Scenario: Monitor expiring solicitations
Workflow:
- Set up expiring opportunities alerts
- Define threshold (e.g., 7 days before deadline)
- Receive email notifications
- Review opportunities on dashboard
- Add relevant opportunities to projects
- Collaborate with team on responses
Result: Never miss critical deadlines
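The alert threshold in this workflow amounts to a date-window filter. The sketch below assumes opportunities are `(title, deadline)` pairs and uses the 7-day example threshold; both the data shape and the `expiring_within` name are illustrative.

```python
from datetime import date, timedelta


def expiring_within(
    opportunities: list[tuple[str, date]], today: date, threshold_days: int = 7
) -> list[tuple[str, date]]:
    """Return opportunities due between today and today + threshold, soonest first."""
    cutoff = today + timedelta(days=threshold_days)
    due = [(title, deadline) for title, deadline in opportunities
           if today <= deadline <= cutoff]
    return sorted(due, key=lambda item: item[1])
```

Deadlines already in the past are excluded, so the alert list only contains items that can still be acted on.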
4. Entity Research
Scenario: Deep-dive research on a potential teaming partner
Workflow:
- Search entity by name/UEI
- View entity profile (NAICS, PSC, certifications)
- View award history
- Create research prompt: "Analyze this entity's past performance"
- AI generates comprehensive research report
- Share research with team
- Export to PDF for meeting
Result: Informed teaming decisions
5. Award Analysis
Scenario: Understand past awards for an upcoming RFP
Workflow:
- Search for similar past awards by title/description
- View award details (amount, dates, awardee)
- Identify incumbent contractor
- View incumbent's other awards
- Chat: "What is the typical contract value for this type of award?"
- Export similar awards to Excel
Result: Better pricing and strategy for proposal
🐛 Support
For technical support, feature requests, or bug reports, please contact the Kontratar engineering team.
📝 License
This project is proprietary software owned by Kontratar LLC.
GovBD-MRA - Market Research Analytics Platform
Copyright (C) 2024 Kontratar LLC
All rights reserved. Unauthorized copying, modification, distribution,
or use of this software, via any medium, is strictly prohibited.
🤝 Contributing
GovBD-MRA is a proprietary platform. Contributions are limited to authorized Kontratar team members.
For team members:
- Create feature branch: `git checkout -b feature/amazing-feature`
- Make changes and test thoroughly
- Run tests: `pytest` (backend), `bun test` (frontend)
- Update documentation if needed
- Commit: `git commit -m 'Add amazing feature'`
- Push: `git push origin feature/amazing-feature`
- Create Pull Request with detailed description
- Request code review from team lead
- Address review comments
- Merge after approval
🛣️ Roadmap
Q2 2026
- 🔍 Advanced Search: Boolean operators, proximity search
- 📊 Enhanced Analytics: Predictive spending models
- 🤖 AI Agents: Autonomous opportunity monitoring agents
- 📱 Mobile App: React Native iOS/Android apps
Q3 2026
- 🌐 FPDS Full Integration: Complete contract data from FPDS
- 📈 Real-time Dashboards: WebSocket-based live analytics
- 🔗 API Marketplace: Public API for third-party integrations
- 🎓 Knowledge Base: AI-powered procurement knowledge base
Q4 2026
- 🧠 Advanced AI: GPT-4-turbo fine-tuned on procurement data
- 🌍 International Expansion: Support for non-US contracts
- 🔐 SOC 2 Compliance: Enterprise security certification
- 📊 Custom Reports: Drag-and-drop report builder
📞 Support
For Issues
- 🐛 Bug Reports: Contact engineering team
- 💬 Questions: Internal Slack #mra-support
- 📧 Email: mra-support@kontratar.com
- 📚 Documentation: Internal wiki
Service Status
- 🟢 Production: https://mra.govbd.com/status
- 🟡 QA: http://qa.mra.govbd.com/status
- 🔵 Dev: http://dev.mra.govbd.com/status
🏆 Project Stats
Codebase Metrics
- Total Lines: 150,000+ lines
- Java: 50,000 lines (EntityData + AwardLoad)
- Python: 40,000 lines (MRA Backend)
- TypeScript/TSX: 60,000 lines (MRA Frontend)
- Files: 850+ files
- Java: 150 files
- Python: 166 files
- TypeScript: 450+ files
- Services: 3 backend services + 1 frontend
- API Endpoints: 80+ REST endpoints
- Database Tables: 100+ tables
- Vector Collections: 3 collections (entities, awards, documents)
Technology Diversity
- Languages: Java, Python, TypeScript/JavaScript
- Frameworks: Spring Boot, FastAPI, Next.js
- Databases: PostgreSQL, Elasticsearch, Qdrant, ChromaDB
- LLM Providers: Ollama, OpenAI, Gemini
- Cloud Services: AWS RDS, S3 (planned)
💖 Acknowledgments
GovBD-MRA is built on the shoulders of giants:
- Spring Team: Excellent enterprise Java framework
- FastAPI Team: High-performance Python web framework
- Next.js Team: Revolutionary React framework
- LangChain Team: LLM orchestration framework
- Qdrant Team: High-performance vector database
- Elastic Team: Powerful search engine
- OpenAI: GPT models powering AI chat
- Google: Gemini API for web search
- Ollama: Local LLM runtime
- SAM.gov: Entity data API
- USAspending.gov: Award data API
🌟 Why Choose GovBD-MRA?
Comparison with Alternatives
| Feature | GovBD-MRA | GovWin | Deltek | BGov | SAM.gov |
|---|---|---|---|---|---|
| Entity Data | ✅ 2M+ | ✅ 2M+ | ✅ 2M+ | ⚠️ Limited | ✅ Native |
| Award Data | ✅ 5M+ | ✅ 5M+ | ✅ 5M+ | ✅ 5M+ | ⚠️ Partial |
| AI Chat | ✅ RAG-powered | ❌ No | ⚠️ Basic | ❌ No | ❌ No |
| Vector Search | ✅ Semantic | ❌ No | ❌ No | ❌ No | ❌ No |
| Analytics | ✅ Pre-computed | ⚠️ Basic | ✅ Advanced | ✅ Advanced | ⚠️ Basic |
| Team Collaboration | ✅ Full | ✅ Full | ✅ Full | ⚠️ Limited | ❌ No |
| API Access | ✅ REST API | ⚠️ Paid | ⚠️ Paid | ❌ No | ✅ Free (limited) |
| Pricing | $$ | $$$ | $$$$ | $$$ | Free (limited) |
| Self-Hosted | ✅ Yes | ❌ No | ❌ No | ❌ No | N/A |
Ready to revolutionize your government contracting intelligence? Get started with GovBD-MRA today! 🚀🏛️
Built with 💙 by the Kontratar Engineering Team
"Empowering government contractors with data-driven intelligence." ⚡