{"schema_version":"onlylabs.public_signal.v1","title":"Parasail Repo: parasail-ai/ocr_pipeline","description":"Parasail repo signal with public source context, captured evidence pages, related signals, and category-scoped analysis context.","url":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65","json_url":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65/signal.json","generated_at":"2026-06-11T04:09:58.640117+00:00","org":{"slug":"parasail","name":"Parasail","category":"neocloud","category_label":"Neocloud","dossier_url":"https://onlylabs.fyi/labs/parasail","dossier_json_url":"https://onlylabs.fyi/labs/parasail/dossier.json"},"related_urls":{"signal":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65","signal_json":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65/signal.json","source":"https://github.com/parasail-ai/ocr_pipeline","lab_dossier":"https://onlylabs.fyi/labs/parasail","lab_dossier_json":"https://onlylabs.fyi/labs/parasail/dossier.json","analysis":"https://onlylabs.fyi/analysis/parasail","analysis_json":"https://onlylabs.fyi/analysis/parasail/analysis.json","analysis_evidence_json":"https://onlylabs.fyi/analysis/parasail/evidence.json","category":"https://onlylabs.fyi/neoclouds","category_json":"https://onlylabs.fyi/neoclouds.json","category_feed":"https://onlylabs.fyi/neoclouds/feed.xml","category_signals_json":"https://onlylabs.fyi/signals.json?category=neocloud","topic":null,"topic_signals_json":null,"topic_feed":null,"data_business":null},"answer_pack":{"answer":"Parasail published parasail-ai/ocr_pipeline (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo parasail-ai/ocr_pipeline · language Python · New OCR pipeline repo, no traction yet. 
onlylabs links this event to 1 captured evidence page and 3 related repo signals.","signal_desk":"repos","source_context":{"source_url":"https://github.com/parasail-ai/ocr_pipeline","source_host":"github.com","occurred_at":"2025-11-01T17:16:04+00:00","first_seen_at":"2026-06-05T22:32:12.402256+00:00","date_source":"source","context":"Python"},"context_markers":[{"label":"Lab","value":"Parasail","source":"signal"},{"label":"Signal desk","value":"repos","source":"signal"},{"label":"Source host","value":"github.com","source":"source"},{"label":"Repository","value":"parasail-ai/ocr_pipeline","source":"source"},{"label":"Language","value":"Python","source":"source"},{"label":"Notability","value":"New OCR pipeline repo, no traction yet","source":"signal"},{"label":"Watch term","value":"Infrastructure","source":"evidence"},{"label":"Watch term","value":"Agents and tool use","source":"evidence"}],"evidence_coverage":{"target_pages":1,"captured_pages":1,"readable_pages":1,"capture_methods":["plain"],"missing_page_urls":[],"failed_page_urls":[],"blocked_page_urls":[],"page_urls":["https://github.com/parasail-ai/ocr_pipeline"],"related_signals":3,"has_source_url":true,"latest_page_fetched_at":"2026-06-11T04:09:58.640117+00:00"},"data_business":{"matches":false,"lanes":[],"matched_terms":[],"score":null,"reason":null},"agent_handoff":{"signal_json":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65/signal.json","dossier_json":"https://onlylabs.fyi/labs/parasail/dossier.json","analysis_json":"https://onlylabs.fyi/analysis/parasail/analysis.json","analysis_evidence_json":"https://onlylabs.fyi/analysis/parasail/evidence.json","topic_signals_json":null,"topic_feed":null,"category_signals_json":"https://onlylabs.fyi/signals.json?category=neocloud","data_radar_json":null,"opportunities_json":null},"analysis_playbook":{"objective":"Turn new repository signals into early evidence of tooling, eval, infrastructure, model-adjacent, or product work before it appears in 
polished launch channels.","evidence_focus":["repo name","owner","description","language","stars","source URL","first seen time","data, eval, infra, safety, and product terms"],"extraction_questions":["What technical area does this repository expose?","Does the repo imply eval, data, infrastructure, agent, or deployment work?","Is the repo new evidence for a lab direction that is not yet in writing or releases?","Which related signals should an analyst inspect next?"],"signal_questions":["What does this new repository reveal before a formal announcement exists?","What technical area does this repository expose?","Does the repo imply eval, data, infrastructure, agent, or deployment work?","Do the 3 related repo signals show a repeated pattern?"],"output_fields":["org","repo","technical_theme","evidence_url"],"data_business_relevance":"Data-business lane extraction is scoped to frontier labs; for this category, interpret the repository as source-grounded category strategy evidence.","required_sources":[{"label":"signal_json","url":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65/signal.json","required":true},{"label":"source","url":"https://github.com/parasail-ai/ocr_pipeline","required":true},{"label":"dossier_json","url":"https://onlylabs.fyi/labs/parasail/dossier.json","required":true},{"label":"analysis_evidence_json","url":"https://onlylabs.fyi/analysis/parasail/evidence.json","required":true},{"label":"topic_signals_json","url":null,"required":false},{"label":"data_radar_json","url":null,"required":false}],"expected_output":["one-paragraph source-grounded interpretation","category-specific implication","confidence and missing evidence","recommended next source to inspect"],"prompt_seed":"Using only the linked onlylabs JSON, captured source context, and cited evidence, analyze Parasail's repo signal \"parasail-ai/ocr_pipeline\" for neocloud strategy."},"semantic_triples":[{"subject":"Parasail","predicate":"published 
repo","object":"parasail-ai/ocr_pipeline","text":"Parasail published repo parasail-ai/ocr_pipeline."},{"subject":"parasail-ai/ocr_pipeline","predicate":"is classified as","object":"repo signal","text":"parasail-ai/ocr_pipeline is classified as repo signal."},{"subject":"parasail-ai/ocr_pipeline","predicate":"belongs to","object":"repos desk","text":"parasail-ai/ocr_pipeline belongs to repos desk."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has context","object":"Python","text":"parasail-ai/ocr_pipeline has context Python."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has evidence coverage","object":"1 captured evidence page","text":"parasail-ai/ocr_pipeline has evidence coverage 1 captured evidence page."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has captured page count","object":"1","text":"parasail-ai/ocr_pipeline has captured page count 1."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has readable page count","object":"1","text":"parasail-ai/ocr_pipeline has readable page count 1."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has related signal count","object":"3","text":"parasail-ai/ocr_pipeline has related signal count 3."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has analysis playbook objective","object":"Turn new repository signals into early evidence of tooling, eval, infrastructure, model-adjacent, or product work before it appears in polished launch channels.","text":"parasail-ai/ocr_pipeline has analysis playbook objective Turn new repository signals into early evidence of tooling, eval, infrastructure, model-adjacent, or product work before it appears in polished launch channels."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has source host","object":"github.com","text":"parasail-ai/ocr_pipeline has source host github.com."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has lab","object":"Parasail","text":"parasail-ai/ocr_pipeline has lab 
Parasail."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has signal desk","object":"repos","text":"parasail-ai/ocr_pipeline has signal desk repos."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has repository","object":"parasail-ai/ocr_pipeline","text":"parasail-ai/ocr_pipeline has repository parasail-ai/ocr_pipeline."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has language","object":"Python","text":"parasail-ai/ocr_pipeline has language Python."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has notability","object":"New OCR pipeline repo, no traction yet","text":"parasail-ai/ocr_pipeline has notability New OCR pipeline repo, no traction yet."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has watch term","object":"Infrastructure","text":"parasail-ai/ocr_pipeline has watch term Infrastructure."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has watch term","object":"Agents and tool use","text":"parasail-ai/ocr_pipeline has watch term Agents and tool use."}]},"intelligence":{"signal_desk":"repos","answer":"Parasail published parasail-ai/ocr_pipeline (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo parasail-ai/ocr_pipeline · language Python · New OCR pipeline repo, no traction yet. 
onlylabs links this event to 1 captured evidence page and 3 related repo signals.","semantic_triples":[{"subject":"Parasail","predicate":"published repo","object":"parasail-ai/ocr_pipeline","text":"Parasail published repo parasail-ai/ocr_pipeline."},{"subject":"parasail-ai/ocr_pipeline","predicate":"is classified as","object":"repo signal","text":"parasail-ai/ocr_pipeline is classified as repo signal."},{"subject":"parasail-ai/ocr_pipeline","predicate":"belongs to","object":"repos desk","text":"parasail-ai/ocr_pipeline belongs to repos desk."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has context","object":"Python","text":"parasail-ai/ocr_pipeline has context Python."},{"subject":"parasail-ai/ocr_pipeline","predicate":"has evidence coverage","object":"1 captured evidence page","text":"parasail-ai/ocr_pipeline has evidence coverage 1 captured evidence page."}]},"signal":{"id":"d1a4dbae-f132-45ed-9bd1-09f40c300a65","url":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65","json_url":"https://onlylabs.fyi/signals/d1a4dbae-f132-45ed-9bd1-09f40c300a65/signal.json","source_url":"https://github.com/parasail-ai/ocr_pipeline","title":"parasail-ai/ocr_pipeline","summary":"Parasail published a new repository. 
onlylabs watches repos for tooling, eval, infra, and model-adjacent work.","context":"Python","kind":{"key":"repo_new","label":"Repo"},"org":{"slug":"parasail","name":"Parasail","category":"neocloud"},"occurred_at":"2025-11-01T17:16:04+00:00","first_seen_at":"2026-06-05T22:32:12.402256+00:00","date_source":"source","evidence_coverage":{"target_pages":1,"captured_pages":1,"readable_pages":1,"capture_methods":["plain"],"missing_page_urls":[],"failed_page_urls":[],"blocked_page_urls":[],"page_urls":["https://github.com/parasail-ai/ocr_pipeline"]},"facets":{"repo":"parasail-ai/ocr_pipeline","language":"Python"},"traction":{"github_stars":0,"hn_points":null,"hn_comments":null,"hn_story_id":null,"hf_downloads":null,"hf_likes":null},"data_radar":null},"primary_evidence_page":{"url":"https://github.com/parasail-ai/ocr_pipeline","final_url":"https://github.com/parasail-ai/ocr_pipeline","title":"parasail-ai/ocr_pipeline repository metadata","http_status":200,"content_type":"application/json","capture_method":"plain","fetched_at":"2026-06-11T04:09:58.640117+00:00","bytes":13086,"raw_path":"398d072741cc4758e81f786673c6ea8d98b3a547ba5e844ce9d4fe6101c2b9cf.json","content_hash":"d36d9c6d393946794706dc138a29b0abfa95543ded2541cc41c6c2830d6f2159","excerpt_chars":1200,"truncated":true,"excerpt":"parasail-ai/ocr_pipeline Language: Python Stars: 0 Forks: 0 Open issues: 0 Created: 2025-11-01T17:16:04Z Pushed: 2025-11-11T17:35:28Z Default branch: main Fork: no Archived: no README: Parasail OCR Pipeline FastAPI-based web application for ingesting contract documents, storing them in Azure Blob Storage, and orchestrating OCR extraction and schema management. Designed to align with Material Design 3 principles and deploy to Azure App Service. Features - Upload contracts via Material 3-inspired interface with real-time status updates. - Persist file metadata and processing status in PostgreSQL (Azure Database for PostgreSQL). 
- Store raw documents in Azure Blob Storage with background Docling extraction scaffolding. - Parasail OCR integration wired for OpenAI-compatible API key usage with selectable models. - Persist OCR text fragments and classification suggestions for each document. - Schema builder API and UI for defining reusable key-value mappings and reapplying them to documents. - Automatic document-type heuristics suggest schemas and fields when none is selected. - Swagger/OpenAPI documentation automatically exposed at `/docs` and compatible with Scalar. - GitHub Actions..."},"evidence_pages":[{"url":"https://github.com/parasail-ai/ocr_pipeline","final_url":"https://github.com/parasail-ai/ocr_pipeline","title":"parasail-ai/ocr_pipeline repository metadata","http_status":200,"content_type":"application/json","capture_method":"plain","fetched_at":"2026-06-11T04:09:58.640117+00:00","bytes":13086,"raw_path":"398d072741cc4758e81f786673c6ea8d98b3a547ba5e844ce9d4fe6101c2b9cf.json","content_hash":"d36d9c6d393946794706dc138a29b0abfa95543ded2541cc41c6c2830d6f2159","excerpt_chars":1200,"truncated":true,"excerpt":"parasail-ai/ocr_pipeline Language: Python Stars: 0 Forks: 0 Open issues: 0 Created: 2025-11-01T17:16:04Z Pushed: 2025-11-11T17:35:28Z Default branch: main Fork: no Archived: no README: Parasail OCR Pipeline FastAPI-based web application for ingesting contract documents, storing them in Azure Blob Storage, and orchestrating OCR extraction and schema management. Designed to align with Material Design 3 principles and deploy to Azure App Service. Features - Upload contracts via Material 3-inspired interface with real-time status updates. - Persist file metadata and processing status in PostgreSQL (Azure Database for PostgreSQL). - Store raw documents in Azure Blob Storage with background Docling extraction scaffolding. - Parasail OCR integration wired for OpenAI-compatible API key usage with selectable models. 
- Persist OCR text fragments and classification suggestions for each document. - Schema builder API and UI for defining reusable key-value mappings and reapplying them to documents. - Automatic document-type heuristics suggest schemas and fields when none is selected. - Swagger/OpenAPI documentation automatically exposed at `/docs` and compatible with Scalar. - GitHub Actions..."}],"related_signals":[{"id":"f4ab57b0-1aa8-4c04-9258-3c46a0b00968","url":"https://onlylabs.fyi/signals/f4ab57b0-1aa8-4c04-9258-3c46a0b00968","source_url":"https://github.com/parasail-ai/speedboat-pub","title":"parasail-ai/speedboat-pub","context":"Python","kind":{"key":"repo_new","label":"Repo"},"org":{"slug":"parasail","name":"Parasail","category":"neocloud"},"occurred_at":"2026-05-12T23:35:40+00:00","first_seen_at":"2026-06-05T22:32:12.402256+00:00","date_source":"source"},{"id":"87e2198d-448b-4d18-ba0f-795a01ebc7f9","url":"https://onlylabs.fyi/signals/87e2198d-448b-4d18-ba0f-795a01ebc7f9","source_url":"https://github.com/parasail-ai/openai-batch","title":"parasail-ai/openai-batch","context":"Python","kind":{"key":"repo_new","label":"Repo"},"org":{"slug":"parasail","name":"Parasail","category":"neocloud"},"occurred_at":"2024-11-11T23:44:07+00:00","first_seen_at":"2026-06-05T22:32:12.402256+00:00","date_source":"source"},{"id":"1ba7765a-4da7-4d5b-8944-33f9d8d93d11","url":"https://onlylabs.fyi/signals/1ba7765a-4da7-4d5b-8944-33f9d8d93d11","source_url":"https://github.com/parasail-ai/cookbook","title":"parasail-ai/cookbook","context":"Jupyter Notebook","kind":{"key":"repo_new","label":"Repo"},"org":{"slug":"parasail","name":"Parasail","category":"neocloud"},"occurred_at":"2024-10-07T17:55:46+00:00","first_seen_at":"2026-06-05T22:32:12.402256+00:00","date_source":"source"}]}