ingestion-pipeline

Here are 64 public repositories matching this topic...

bruin-data / ingestr

ingestr is a CLI tool that seamlessly copies data between databases with a single command.

bigquery postgresql snowflake mssql data-integration data-pipeline data-ingestion copy-database ingestion-pipeline duckdb

Updated Mar 15, 2026
Python

opensemanticsearch / open-semantic-etl

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database

Updated Oct 9, 2022
Python

AstraBert / ingest-anything

From data to vector database effortlessly

pdf automation ingestion-pipeline vector-database qdrant llamaindex chonkie

Updated May 17, 2025
Python

KnudsenMorten / AzLogDcrIngestPS

AzLogDcrIngestPS - Unleashing the power of Log Ingestion API with Azure LogAnalytics custom table v2, Azure Data Collection Rules and Azure Data Ingestion Pipeline

data log powershell azure manipulation dcr ingestion-pipeline azure-pipeline data-collection-rules log-ingestion

Updated Jan 26, 2025
PowerShell

Morphl-AI / MorphL-Model-User-Search-Intent

Google Cloud Storage connector, pre-processor and model for predicting user search intent based on keywords

machine-learning preprocessor pyspark nlp-machine-learning predict ingestion-pipeline morphl-platform

Updated Nov 2, 2019
Python

Morphl-AI / MorphL-Model-Publishers-Churning-Users

Google Analytics connector, pre-processor and model for predicting churning users for digital publishers.

machine-learning google-analytics preprocessor prediction pyspark ingestion-pipeline morphl-platform

Updated May 27, 2019
Python

Clarifai / clarifai-python-datautils

Extract, transform, and load unstructured data into Clarifai's AI platform

ingestion dataengineering dataanalysis unstructured-data unstructured-text ingestion-pipeline unstructured-data-analysis unstructured-image

Updated Oct 22, 2025
Python

mcplusa / elastic-ingest-http

This is an Elasticsearch Ingest Pipeline Processor that calls an HTTP(s) endpoint and adds the response back to the ingest document for further processing.

plugin elasticsearch embeddings ingest ingestion-pipeline

Updated Jan 1, 2025
Java

xinmiao14 / opensky-flight-pipeline

Real-time flight data fetching, cleaning, and analytics API using FastAPI, Pandas, PostgreSQL, and Python.

postgresql data-engineering backend-api ingestion-pipeline fastapi

Updated May 19, 2025
Python

akshaybahadur21 / Emancipitaion-of-Apache-Spark

My experiments with Apache Spark for Humans ⭐

spark apache-spark architecture ingestion-pipeline

Updated Mar 22, 2023
Java

SandeepGitGuy / Insurance_Documents_QA_Chatbot_RAG_LlamaIndex_LangGraph

A question answering (Q/A) chatbot for insurance documents, powered by Retrieval Augmented Generation (RAG), LlamaIndex and LangGraph. Inspired by my Upgrad_IIITB PG Course.

chatbot python3 openai question-answering diskcache ingestion-pipeline serpapi llm langchain openaiapi llama-index vector-store chromadb insurance-dataset openai-embeddings langgraph gpt-4o-mini langgraph-chabot

Updated Dec 12, 2024
Jupyter Notebook

anhtuan284 / chest-xray-multi-disease

Multi-disease segmentation of chest X-rays using YOLO, DenseNet121, and CoAtNet models

computer-vision deep-learning yolo chest-xray-images flask-api flutter-apps rag ingestion-pipeline densenet121 llamaindex ollama

Updated May 10, 2025
Jupyter Notebook

azuregig / work_with_OrdnanceSurvey_data

Sample Azure Data Factory pipeline for ingesting Data Packages directly from the Download API of the Ordnance Survey Data Hub into Azure Storage.

azure geospatial-data ingestion geospatial-processing azure-data-factory ordnance-survey ordnancesurvey mastermap ingestion-pipeline os-data-hub-api

Updated Jul 5, 2022

tmcgrath / cassandra-ingest

DataStax or Cassandra Ingest from Relational Databases with StreamSets

cassandra rdbms cdc ingest streamsets datastax ingestion-pipeline

Updated Mar 5, 2019
PLSQL

2dogsandanerd / DAUT

DAUT – Documentation Auto Updater - AI-powered documentation generator for your codebase. MCP-Connector

pdf documentation code rag ingestion-pipeline mcp-server

Updated Mar 14, 2026
Python

victor-antoniassi / postgres-to-databricks-cdc

Enterprise-grade ingestion blueprint for Postgres to Databricks powered by dlt. Features dual-mode operation (Full Load + CDC Load) and robust CI/CD via Databricks Asset Bundles.

data postgresql data-engineering cdc databricks dlt data-ingestion data-engineering-pipeline ingestion-pipeline lakehouse dlthub data-engineering-project lakeflow-jobs

Updated Dec 10, 2025
Python

CyberCRI / welearn-datastack

Data stack for WeLearn LPI projects. This pipeline can collect, vectorize and store data from various sources.

learning data sdg ingestion-pipeline sutainability

Updated Mar 12, 2026
HTML

SandeepGitGuy / Insurance_Documents_QA_Chatbot_RAG_LlamaIndex_LangChain

A question answering (Q/A) chatbot for insurance documents, powered by Retrieval Augmented Generation (RAG), LlamaIndex and LangChain. Inspired by my Upgrad_IIITB PG Course.

chatbot python3 openai question-answering diskcache ingestion-pipeline serpapi llm langchain llama-index vector-store chromadb insurance-dataset langchain-agent openai-embeddings gpt-4o-mini

Updated Dec 12, 2024
Jupyter Notebook

siddharth271101 / Stock-Exchange-Analysis

A data pipeline that uses Sqoop to ingest data from SQL Server into a Hive table, with Hive used for feature engineering and analysis.

mysql big-data hive sqoop ingestion-pipeline

Updated Jun 5, 2020
Shell

sabit-shaiholla / agentic-file-query

AI-powered document search agent using Google ADK and Gemini — scans, reasons, and follows cross-references instead of blind retrieval.

cli adk ai-agents rag ingestion-pipeline fastapi vector-database docling

Updated Feb 23, 2026
Python
