Building in Public · 10 min read

Building SpaceNexus: How We Aggregate Space Industry Data at Scale

SpaceNexus ingests data from 50+ sources — NASA APIs, NOAA feeds, SAM.gov, CelesTrak, and more — into a unified platform. Here is an honest look at the engineering, data quality, and product decisions behind the infrastructure.

By SpaceNexus Team, March 22, 2026

When we started building SpaceNexus, we made a decision that has shaped everything: we would integrate primary data sources directly rather than scraping content or relying on third-party data brokers. That meant building direct integrations with NASA APIs, NOAA's Space Weather Prediction Center, CelesTrak, SAM.gov, the FCC licensing database, and dozens more. Here is an honest account of what that looks like in practice.

The Data Source Landscape

The space industry's data infrastructure is a patchwork of public APIs, structured data feeds, semi-structured HTML, and completely unstructured documents. Our integrations fall into a few categories:

  • Well-maintained REST APIs: NASA has excellent public APIs (DONKI for space weather events, EPIC for Earth imagery, Exoplanet Archive, etc.) with consistent schemas and reasonable uptime. These are the easiest integrations to maintain
  • File-based feeds: CelesTrak distributes TLE (two-line element set) data as plain text files in a format unchanged since the 1980s. Authoritative, widely used, and reliably updated — but requiring custom parsing
  • Government procurement databases: SAM.gov offers a contract opportunities API and bulk data downloads. The data quality varies significantly by agency; some filings have excellent structured metadata, others are PDFs with minimal machine-readable content
  • RSS and structured news: The space industry has a rich journalism ecosystem — SpaceNews, NASASpaceFlight, SpaceFlightNow, Ars Technica, and dozens more. We aggregate via RSS where available, with content categorized using our own classification model
  • Financial data: Space-adjacent public companies trade on major exchanges; we integrate with market data providers for real-time and historical quotes, earnings data, and fundamentals
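
To give a sense of the custom parsing that file-based feeds need, here is a minimal TypeScript sketch of a fixed-column TLE reader. It is an illustration rather than the SpaceNexus parser: the `TleRecord` shape and the `tleChecksum`, `cols`, `checkLine`, and `parseTle` names are assumptions made for this example. The column positions and the modulo-10 checksum follow the published TLE layout, and the sample is a widely circulated ISS element set from 2008.

```typescript
// Illustrative only: type and function names are assumptions for this sketch.
interface TleRecord {
  name: string;
  catalogNumber: number;
  inclinationDeg: number;
  raanDeg: number;
  eccentricity: number;
  meanMotionRevPerDay: number;
}

// TLE checksum: sum of all digits in columns 1-68, plus 1 per minus sign, modulo 10.
function tleChecksum(line: string): number {
  let sum = 0;
  for (let i = 0; i < 68; i++) {
    const ch = line.charAt(i);
    if (ch >= "0" && ch <= "9") {
      sum += Number(ch);
    } else if (ch === "-") {
      sum += 1;
    }
  }
  return sum % 10;
}

// Read a 1-indexed, inclusive column range, the way TLE field tables are written.
function cols(line: string, from: number, to: number): string {
  return line.substring(from - 1, to);
}

// Reject lines that are too short, carry the wrong line number, or fail the checksum.
function checkLine(line: string, lineNumber: string): void {
  if (line.length < 69 || line.charAt(0) !== lineNumber) {
    throw new Error("Malformed TLE line " + lineNumber);
  }
  if (tleChecksum(line) !== parseInt(line.charAt(68), 10)) {
    throw new Error("Checksum mismatch on TLE line " + lineNumber);
  }
}

function parseTle(name: string, line1: string, line2: string): TleRecord {
  checkLine(line1, "1");
  checkLine(line2, "2");
  return {
    name: name.trim(),
    catalogNumber: Number(cols(line2, 3, 7)),
    inclinationDeg: Number(cols(line2, 9, 16)),
    raanDeg: Number(cols(line2, 18, 25)),
    // Eccentricity is stored with an implied leading decimal point.
    eccentricity: Number("0." + cols(line2, 27, 33)),
    meanMotionRevPerDay: Number(cols(line2, 53, 63)),
  };
}

const iss = parseTle(
  "ISS (ZARYA)",
  "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927",
  "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537"
);
```

Here `iss.catalogNumber` is 25544 and `iss.inclinationDeg` is read from columns 9-16 of line 2. A line that was corrupted in transit fails the checksum and throws instead of yielding a silently wrong orbit.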

Architecture Decisions

SpaceNexus is built on Next.js 14 with the App Router, PostgreSQL via Prisma ORM, and deployed on Railway. A few architectural decisions are worth explaining:

  • Server-side rendering with aggressive caching: Space data has widely varying freshness requirements. TLE data is updated multiple times per day; launch schedules change on a timescale of hours to days; company profiles are relatively static. We use per-route caching headers and background revalidation to serve fresh data without hammering upstream APIs
  • Scheduled fetchers, not webhooks: Very few space data sources offer webhooks or push notifications. Nearly all our data ingestion is poll-based, with fetch intervals tuned to each source's update cadence and our freshness requirements
  • Graceful degradation: Upstream APIs go down. NOAA has occasional outages; SAM.gov bulk exports sometimes fail; third-party feeds go stale. We treat external data source failures as expected events, not exceptions. Every module has fallback behavior — either serving cached data or displaying a clearly labeled "data temporarily unavailable" state rather than breaking the page
  • AI-assisted categorization: With 50+ news sources generating hundreds of items per day, human curation does not scale. We use language model classification to tag news items by topic (launch, policy, funding, technology, etc.), company mentions, and urgency. The classifier does make mistakes; we review flagged edge cases and iterate on the prompts
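
The polling and graceful-degradation points above can be sketched as one small decision function. This is a simplified illustration under stated assumptions, not our production code: `SourceConfig`, `CacheEntry`, `ModuleState`, `isDue`, and `loadModule` are hypothetical names, the intervals are made-up values, and the fetcher is synchronous only to keep the example short (a real one would be async and the cache would live in a database).

```typescript
// Illustrative sketch: names, intervals, and the synchronous fetcher are assumptions.
interface SourceConfig {
  name: string;
  pollIntervalMs: number; // how often the upstream source is re-polled
  maxStaleMs: number; // how old cached data may be before we stop showing it
}

interface CacheEntry<T> {
  data: T;
  fetchedAtMs: number;
}

interface ModuleState<T> {
  status: "fresh" | "cached" | "unavailable";
  data: T | null;
  fetchedAtMs: number | null;
}

const HOUR_MS = 60 * 60 * 1000;

// Hypothetical cadence for a source that updates several times per day.
const TLE_SOURCE: SourceConfig = {
  name: "tle",
  pollIntervalMs: 4 * HOUR_MS,
  maxStaleMs: 48 * HOUR_MS,
};

// A source is due when nothing is cached or the poll interval has elapsed.
function isDue<T>(source: SourceConfig, cached: CacheEntry<T> | null, nowMs: number): boolean {
  return cached === null || nowMs - cached.fetchedAtMs >= source.pollIntervalMs;
}

function loadModule<T>(
  source: SourceConfig,
  fetchUpstream: () => T,
  cached: CacheEntry<T> | null,
  nowMs: number
): ModuleState<T> {
  if (cached !== null && !isDue(source, cached, nowMs)) {
    // Within the poll interval: serve the cache and leave the upstream alone.
    return { status: "fresh", data: cached.data, fetchedAtMs: cached.fetchedAtMs };
  }
  try {
    return { status: "fresh", data: fetchUpstream(), fetchedAtMs: nowMs };
  } catch (err) {
    // An upstream failure is an expected event: fall back instead of propagating.
    if (cached !== null && nowMs - cached.fetchedAtMs <= source.maxStaleMs) {
      return { status: "cached", data: cached.data, fetchedAtMs: cached.fetchedAtMs };
    }
    return { status: "unavailable", data: null, fetchedAtMs: null };
  }
}
```

In this sketch a page would render `"cached"` with its fetch timestamp and map `"unavailable"` to the "data temporarily unavailable" state, so a failing source never throws into the render path.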

Data Quality Challenges

Aggregating data at scale surfaces quality problems that aren't visible when manually browsing source sites:

  • Duplicate launches: A single upcoming launch may appear in SpaceX's manifest, NASA's launch schedule, the range's public calendar, and three different news articles — each with slightly different dates, payload names, or vehicle designations. Deduplication requires entity resolution across inconsistent naming conventions
  • Outdated records: Government databases sometimes retain stale entries. A company may still be listed as active in one database after it has been acquired or dissolved. We run periodic freshness checks and flag any record that has gone longer than its expected update interval without a change
  • Unstructured regulatory filings: FCC satellite license applications are filed as a mix of structured database fields and uploaded PDF exhibits. Extracting technically meaningful information (orbital parameters, frequency coordination) from the documents requires parsing that is partly manual
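
As a rough illustration of the launch deduplication problem, the sketch below groups records whose normalized vehicle and payload names match and whose dates fall within a few days of each other. It is a deliberately naive stand-in for real entity resolution: the `LaunchRecord` shape, the `VEHICLE_ALIASES` table, the 7-day window, and the payload names in the sample data are all assumptions made up for this example.

```typescript
// Illustrative sketch: record shape, alias table, and 7-day window are assumptions.
interface LaunchRecord {
  source: string;
  vehicle: string;
  payload: string;
  netIso: string; // "no earlier than" date as an ISO 8601 UTC timestamp
}

interface LaunchCluster {
  vehicleKey: string;
  payloadKey: string;
  anchorMs: number; // launch time of the first record that formed the cluster
  records: LaunchRecord[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical alias table mapping normalized vehicle spellings to one key.
const VEHICLE_ALIASES: { [key: string]: string } = {
  f9: "falcon9",
  falcon9block5: "falcon9",
};

// Lowercase and drop everything except letters and digits.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

function vehicleKey(vehicle: string): string {
  const key = normalize(vehicle);
  const alias = VEHICLE_ALIASES[key];
  return typeof alias === "string" ? alias : key;
}

// Greedy clustering: a record joins the first cluster with the same keys
// whose anchor date is within the window, otherwise it starts a new cluster.
function dedupeLaunches(records: LaunchRecord[], windowDays = 7): LaunchCluster[] {
  const clusters: LaunchCluster[] = [];
  for (const record of records) {
    const vKey = vehicleKey(record.vehicle);
    const pKey = normalize(record.payload);
    const timeMs = Date.parse(record.netIso);
    let merged = false;
    for (const cluster of clusters) {
      const sameLaunch =
        cluster.vehicleKey === vKey &&
        cluster.payloadKey === pKey &&
        Math.abs(cluster.anchorMs - timeMs) <= windowDays * DAY_MS;
      if (sameLaunch) {
        cluster.records.push(record);
        merged = true;
        break;
      }
    }
    if (!merged) {
      clusters.push({ vehicleKey: vKey, payloadKey: pKey, anchorMs: timeMs, records: [record] });
    }
  }
  return clusters;
}

// Made-up sample: three sightings of one launch plus one different payload.
const launchClusters = dedupeLaunches([
  { source: "operator-manifest", vehicle: "Falcon 9 Block 5", payload: "DemoSat-1", netIso: "2026-05-01T00:00:00Z" },
  { source: "agency-schedule", vehicle: "F9", payload: "Demo Sat 1", netIso: "2026-05-03T00:00:00Z" },
  { source: "news-article", vehicle: "Falcon 9", payload: "DEMOSAT 1", netIso: "2026-05-02T12:00:00Z" },
  { source: "agency-schedule", vehicle: "Falcon 9", payload: "DemoSat-2", netIso: "2026-05-02T00:00:00Z" },
]);
```

With this sample, the three "DemoSat 1" sightings collapse into one cluster and "DemoSat-2" stays separate. Exact-match keys like these miss renamed payloads and rideshare missions, which is where fuzzier matching and manual review come in.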

What We Have Learned

A few hard-won lessons from 18+ months of building:

  • Schema stability matters more than breadth: Early on, we tried to capture every available data field from every source. The result was a schema that changed constantly as source APIs evolved. We have since standardized on a narrower set of canonical fields per entity type, with raw source data preserved separately for future processing
  • Surface data provenance to users: Space professionals are appropriately skeptical of aggregated data. Showing the source, fetch timestamp, and raw data link for every data point builds trust and helps users catch errors we missed
  • Feedback loops are underrated: Some of our most valuable data corrections have come from users who noticed a discrepancy and reported it. We built a simple data feedback mechanism early, and it has paid dividends in data quality
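
The first two lessons can be pictured as a pair of record types: a narrow canonical record that carries its provenance, and a raw snapshot stored untouched next to it. The sketch below is hypothetical throughout. The type names, the `legal_name` and `entity_status` raw fields, and the example.com URL are assumptions for illustration, not our actual schema.

```typescript
// Illustrative sketch: all type names and raw field names here are assumptions.
interface Provenance {
  source: string;
  fetchedAt: string; // ISO 8601 timestamp of the fetch
  rawUrl: string; // link back to the raw upstream record
}

// Raw payloads are kept untouched, separately from the canonical record.
interface RawSnapshot {
  provenance: Provenance;
  payload: unknown;
}

type CompanyStatus = "active" | "acquired" | "dissolved" | "unknown";

// The canonical record carries only a narrow, stable set of fields.
interface CanonicalCompany {
  name: string;
  status: CompanyStatus;
  provenance: Provenance;
}

// Safely read a string field from an untyped upstream payload.
function readString(payload: unknown, key: string): string | null {
  if (typeof payload !== "object" || payload === null) {
    return null;
  }
  const value = (payload as { [k: string]: unknown })[key];
  return typeof value === "string" ? value : null;
}

// Map whatever the source says onto the canonical status vocabulary.
function toStatus(raw: string | null): CompanyStatus {
  if (raw === "active" || raw === "acquired" || raw === "dissolved") {
    return raw;
  }
  return "unknown";
}

function canonicalizeCompany(snapshot: RawSnapshot): CanonicalCompany | null {
  const name = readString(snapshot.payload, "legal_name");
  if (name === null) {
    return null; // not enough data to build a canonical record
  }
  return {
    name: name.trim(),
    status: toStatus(readString(snapshot.payload, "entity_status")),
    provenance: snapshot.provenance,
  };
}

const exampleSnapshot: RawSnapshot = {
  provenance: {
    source: "example-registry",
    fetchedAt: "2026-03-01T00:00:00Z",
    rawUrl: "https://example.com/raw/123",
  },
  payload: { legal_name: " Example Orbital Inc. ", entity_status: "merged", extra_field: "kept only in the raw snapshot" },
};
const exampleCompany = canonicalizeCompany(exampleSnapshot);
```

Because the snapshot is never modified, a field that the canonical schema ignores today (like `extra_field` here) can still be backfilled later, and every value shown to a user can link to its source, fetch time, and raw record.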

We publish periodic engineering updates in this building-in-public series. If you have questions about our data sources or methodology, reach out via the community forum or the feedback widget on any module page.
