Open Evaluation Rubrics & Internal Methodology
Idea OS publishes how it evaluates ideas: seven category rubrics with explicit 1–5 criteria, LLM-based fact extraction, and deterministic scoring functions that map structured facts to those bands. An internal weighted aggregate, the Global Strategic Strength Score (0–100), keeps comparisons and history consistent across the engine. This page documents the methodology for transparency and trust, so you can see exactly which criteria the product applies beneath its narrative outputs.
Overview
Idea OS evaluates startup ideas across 7 categories: Market Demand, ICP Specificity, TAM/SAM/SOM, Competition & Differentiation, Monetization Strength, Execution Feasibility, and Risk Exposure. LLMs extract structured facts from your idea description; pure functions then map those facts to 1–5 category bands deterministically. Cross-agent validation detects contradictions and applies internal penalties so the engine stays internally consistent.
At a glance: seven evaluation dimensions (Market Demand through Risk Exposure), each with a published 1–5 rubric. Facts are scored with deterministic rules, and bands combine under published weights into an internal 0–100 aggregate (the Global Strategic Strength Score), with penalties applied when agents contradict each other. We publish the full rubrics so you can verify how the engine works, not to promise a single number as the primary user experience.
What a score of 5 requires in each category:
- Market Demand: validated demand signals (revenue or preorders) with evidence confidence ≥ 0.75.
- ICP Specificity: a niche segment, acute pain specificity, and B2B/B2C, demographic, and behavioral traits all defined.
- TAM/SAM/SOM: TAM, SAM, and SOM all estimated, data sources cited, and a large TAM size category.
- Competition & Differentiation: moat-level differentiation, switching costs present, and a clear differentiation statement.
- Monetization Strength: strong willingness-to-pay evidence, pricing defined, and LTV logic present.
- Execution Feasibility: low technical complexity, low dependencies, realistic team requirements, and a defined MVP scope.
- Risk Exposure: no single point of failure, no high platform dependency, and low market timing risk and adoption barriers.
Internal aggregate: Global Strategic Strength Score (0–100)
The engine’s weighted sum of category bands, minus contradiction penalties: Σ (score/5 × 100 × weight) − contradiction_penalty, rounded to an integer in the 0–100 range. Used internally for consistency across versions, experiments, and exports; it is not the product’s primary user-facing output.
Category Rubrics
Each category has explicit criteria for scores 1–5, evaluated from the top band down. A score of 5 represents the strongest evidence; 1 is the baseline when no higher band's criteria are met.
Market Demand (20%)
Measures validated demand signals, problem clarity, and trend/community indicators.
| Score | Criteria |
|---|---|
| 5 | validated_signal_present AND validated_signal_type in [revenue, preorders] AND evidence_confidence ≥ 0.75 |
| 4 | validated_signal_present AND validated_signal_type in [interviews, waitlist] AND evidence_confidence ≥ 0.60 |
| 3 | explicit_problem_defined AND (search_trend_signal = strong OR community_signal = strong) |
| 2 | explicit_problem_defined AND (search_trend_signal in [moderate, weak] OR community_signal in [moderate, weak]) |
| 1 | Otherwise |
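The table above can be read as a deterministic band function whose criteria are checked from the strongest band down. A minimal Python sketch, assuming extracted facts arrive as a plain dict keyed by the field names in the table (the function name and dict representation are illustrative, not the engine's actual API):

```python
def market_demand_band(f: dict) -> int:
    """Map extracted Market Demand facts to a 1-5 band, strongest criteria first."""
    if (f.get("validated_signal_present")
            and f.get("validated_signal_type") in ("revenue", "preorders")
            and f.get("evidence_confidence", 0) >= 0.75):
        return 5
    if (f.get("validated_signal_present")
            and f.get("validated_signal_type") in ("interviews", "waitlist")
            and f.get("evidence_confidence", 0) >= 0.60):
        return 4
    if f.get("explicit_problem_defined") and (
            f.get("search_trend_signal") == "strong"
            or f.get("community_signal") == "strong"):
        return 3
    if f.get("explicit_problem_defined") and (
            f.get("search_trend_signal") in ("moderate", "weak")
            or f.get("community_signal") in ("moderate", "weak")):
        return 2
    return 1  # baseline when no higher band's criteria are met
```

Because the checks run top-down, an idea with both revenue evidence and a strong community signal still lands in band 5, and missing fields simply fall through to the baseline. The other six rubrics follow the same pattern.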
ICP Specificity (15%)
Measures target audience definition clarity, pain specificity, and segment precision.
| Score | Criteria |
|---|---|
| 5 | icp_segment_clarity = niche AND pain_specificity_level = acute AND b2b_or_b2c_defined AND demographic_defined AND behavioral_traits_defined |
| 4 | icp_segment_clarity = niche AND pain_specificity_level = specific AND b2b_or_b2c_defined |
| 3 | icp_segment_clarity = defined AND pain_specificity_level in [specific, acute] |
| 2 | icp_segment_clarity in [broad, defined] |
| 1 | Otherwise |
TAM/SAM/SOM (10%)
Measures market sizing completeness, data source citation, and TAM size category.
| Score | Criteria |
|---|---|
| 5 | tam_estimated AND sam_estimated AND som_estimated AND data_source_cited AND tam_size_category = large |
| 4 | tam_estimated AND sam_estimated AND som_estimated AND data_source_cited AND tam_size_category in [medium, small] |
| 3 | tam_estimated AND sam_estimated AND data_source_cited |
| 2 | tam_estimated AND NOT data_source_cited |
| 1 | Otherwise |
Competition & Differentiation (15%)
Measures competitive landscape awareness, differentiation clarity, and defensibility.
| Score | Criteria |
|---|---|
| 5 | differentiation_type = moat AND switching_cost_present AND clear_differentiation_statement |
| 4 | differentiation_type = business_model OR (differentiation_type = experience AND switching_cost_present) |
| 3 | clear_differentiation_statement AND differentiation_type in [feature, experience] |
| 2 | (direct_competitors_identified OR indirect_competitors_identified) AND NOT clear_differentiation_statement |
| 1 | Otherwise |
Monetization Strength (15%)
Measures revenue model clarity, pricing definition, willingness-to-pay evidence, and LTV logic.
| Score | Criteria |
|---|---|
| 5 | willingness_to_pay_evidence = strong AND pricing_defined AND ltv_logic_present |
| 4 | pricing_defined AND pricing_logic_explained AND ltv_logic_present |
| 3 | revenue_model_defined AND pricing_defined AND NOT pricing_logic_explained |
| 2 | revenue_model_defined AND NOT pricing_defined |
| 1 | Otherwise |
Execution Feasibility (15%)
Measures MVP scope clarity, technical complexity, dependencies, and regulatory risk.
| Score | Criteria |
|---|---|
| 5 | technical_complexity = low AND dependency_count = low AND team_requirement_realistic AND mvp_scope_defined |
| 4 | mvp_scope_defined AND technical_complexity in [low, medium] AND regulatory_risk = none |
| 3 | mvp_scope_defined AND dependency_count = medium |
| 2 | dependency_count = high OR regulatory_risk = moderate OR NOT mvp_scope_defined |
| 1 | Otherwise |
Risk Exposure (10%)
Measures single points of failure, platform dependency, market timing risk, and adoption barriers.
| Score | Criteria |
|---|---|
| 5 | NOT single_point_failure AND NOT platform_dependency_high AND market_timing_risk = low AND customer_adoption_barrier = low |
| 4 | NOT single_point_failure AND NOT platform_dependency_high AND customer_adoption_barrier = medium |
| 3 | market_timing_risk = medium OR customer_adoption_barrier = high |
| 2 | platform_dependency_high |
| 1 | Otherwise |
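Risk Exposure follows the same top-down pattern, but mostly over negated boolean flags. A hedged Python sketch using the same illustrative dict-of-facts convention (not the engine's real interface):

```python
def risk_exposure_band(f: dict) -> int:
    """Map extracted Risk Exposure facts to a 1-5 band, strongest criteria first."""
    if (not f.get("single_point_failure")
            and not f.get("platform_dependency_high")
            and f.get("market_timing_risk") == "low"
            and f.get("customer_adoption_barrier") == "low"):
        return 5
    if (not f.get("single_point_failure")
            and not f.get("platform_dependency_high")
            and f.get("customer_adoption_barrier") == "medium"):
        return 4
    if (f.get("market_timing_risk") == "medium"
            or f.get("customer_adoption_barrier") == "high"):
        return 3
    if f.get("platform_dependency_high"):
        return 2
    return 1
```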
Global Strategic Strength Score
Internally, Idea OS computes the Global Strategic Strength Score (0–100) as the weighted aggregate of all 7 category bands, minus penalties for cross-agent contradictions. Each category contributes (score/5 × 100 × weight) to the total, and the contradiction penalty is subtracted from the raw sum before the result is rounded to 0–100.
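The aggregation step can be sketched as follows. Only the arithmetic and the percentage weights come from the published formula; the dict key names, function name, and the explicit clamp to the 0–100 range are illustrative assumptions:

```python
# Published category weights (sum to 1.0).
WEIGHTS = {
    "market_demand": 0.20,
    "icp_specificity": 0.15,
    "tam_sam_som": 0.10,
    "competition_differentiation": 0.15,
    "monetization_strength": 0.15,
    "execution_feasibility": 0.15,
    "risk_exposure": 0.10,
}

def global_strength_score(bands: dict, contradiction_penalty: int = 0) -> int:
    """Weighted sum of 1-5 bands, minus penalties, rounded into 0-100."""
    raw = sum(bands[cat] / 5 * 100 * w for cat, w in WEIGHTS.items())
    return max(0, min(100, round(raw - contradiction_penalty)))
```

For example, seven bands of 5 with no penalty yield 100, and seven bands of 1 yield the floor of 20 before any penalty is applied.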
Cross-Agent Validation
Idea OS uses cross-agent validation to ensure consistency when multiple agents analyze an idea. The following contradictions trigger a penalty:
- Market Demand claims validated signals but no explicit problem is defined
- Strong willingness-to-pay evidence but ICP is completely undefined
- Low technical complexity claimed but high systemic risk flags present
- Moat-level differentiation claimed but no competitors identified to differentiate from
- Large TAM claimed without cited data sources
- Revenue validation claimed but no pricing is defined in monetization
- High regulatory risk but technical complexity rated as low
- Hyper-niche ICP defined but TAM categorized as large — potential sizing inconsistency
Penalty: 5 points per violation, up to 20 points total (max 4 violations counted).
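The penalty rule reduces to a capped linear count. A short sketch (the function signature and the idea of passing violations as a list are illustrative):

```python
def contradiction_penalty(violations: list) -> int:
    """5 points per detected contradiction, capped at 20 (at most 4 counted)."""
    return min(len(violations) * 5, 20)
```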
Ready for a structured evaluation with full reasoning?
Evaluate your idea