28 April 2026

I built pipequery, a pipe-based query language for JavaScript — over a weekend

This was a weekend project. I want to say that up front, because by the time I was done it had grown into something bigger than I planned, and I don't want anyone thinking I sat down with a six-month roadmap. I had two free days at Lake Garda, one nagging itch, and a coffee machine.

The result is pipequery — a small, zero-dependency query language for JavaScript and TypeScript that lets you filter, transform, aggregate, and join data using a left-to-right pipe syntax instead of the back-to-front nesting that SQL forces on you. Repo is here: https://github.com/andreadito/pipequery. Docs (with a playground) are here: https://andreadito.github.io/pipequery/.

This post is the "why" — what I was annoyed by, what I wanted, and what I learned along the way.

The itch

I keep ending up in the same situation. I have an array of objects in memory — orders, log lines, API responses, sensor readings, whatever — and I want to do something analytically obvious to it. "Give me the top ten by price, but only the paid ones, grouped by category, with a running total." In SQL that's three lines. In JavaScript it's a chain of .filter().sort().slice() calls if you're lucky, a reduce() with an accumulator object if you're not, and a Stack Overflow tab if it involves a join.
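
For concreteness, here is roughly what the hand-rolled version of that ask looks like, assuming an orders array whose objects carry status and price fields (and leaving the group-by, and the Stack Overflow tab, out of it):

// paid only, top ten by price, with a running total, all by hand
let running = 0;
const topTen = orders
  .filter((o) => o.status === 'paid')
  .sort((a, b) => b.price - a.price) // safe to sort in place: filter() already returned a copy
  .slice(0, 10)
  .map((o) => ({ ...o, runningTotal: (running += o.price) }));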

The Pandas/dplyr/Polars crowd solved this years ago for the data-science world. The SQL world is finally getting there too — BigQuery shipped pipe syntax, PRQL has a great following, KQL and ES|QL have proven that analysts genuinely prefer reading data flow left-to-right. But on the JavaScript side, if you want a real query language over an array, your options are basically: (a) pull in a giant ORM, (b) hand-write a chain of array methods, or (c) bring an entire SQL engine into the browser.

I wanted option (d): a tiny string-based query language I could drop into any JS/TS project, with no dependencies, that read like a Unix pipeline.

What it looks like

The whole API is one function. You give it an array (or a map of named arrays for joins) and a query string:

import { query } from '@andreadito/pipequery-lang';

query(items, 'where(price > 100) | sort(price desc) | first(10)');

That's it. Each operation receives the output of the previous one. You read the query in the order the data actually flows through it. No subqueries, no CTEs, no working out which clause runs first.

Joins look how you'd hope:

query(
  { orders, customers },
  'orders | join(customers, customerId == id) | select(id, name, total)'
);

Group-and-aggregate is a groupBy piped into a rollup:

query(items, 'groupBy(category) | rollup(sum(price) as total, count() as n) | sort(total desc)');

Once I had the parser working, I kept adding the things I personally reach for — window functions (running_sum, lag, row_number), statistical aggregates (median, stddev, percentile), pivots, transposes, the usual data-wrangling toolkit. Nothing exotic, but having them built in means I don't have to re-derive them every time I open a notebook.

The thing I'm proudest of: live queries

A regular query() call runs once and returns. That's fine for batch work, but the reason I actually wanted this language is dashboards. Live monitors, ops screens, anything that's getting incremental updates over a WebSocket and needs to re-render the top-N-by-something a few times a second.

So liveQuery exists. You give it a query, a key field, and an optional throttle, and you push patches into it as data arrives:

const lq = liveQuery(events, 'where(severity > 2) | sort(timestamp desc) | first(20)', {
  key: 'id', throttle: 100,
});

lq.subscribe((result, stats) => renderTable(result));

ws.onmessage = (m) => lq.patch([JSON.parse(m.data)]);

It re-executes efficiently, gives you per-tick stats (patch time, execution time, row count), supports batching and live query swapping, and keeps re-renders under 100 ms even on busy feeds. This was the part I most wanted to exist, and the part I had the most fun writing.
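
Since patch() takes an array of rows (as in the snippet above), batching from the caller's side is also trivial when messages arrive faster than you care to process them. A minimal sketch, reusing the lq and ws objects from above:

// accumulate fast-arriving messages and flush them as one patch every 50 ms
const pending = [];
ws.onmessage = (m) => pending.push(JSON.parse(m.data));
setInterval(() => {
  if (pending.length) lq.patch(pending.splice(0)); // splice(0) empties the buffer and returns its contents
}, 50);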

Where the weekend got out of hand

Here's where I confess that "a weekend" became "a weekend, and then several evenings, and then another weekend." Once the language and the live engine were working, it became really tempting to put a CLI on it.

So now there's pq. You point it at a pipequery.yaml with some data sources and it serves them as live JSON API endpoints you can spin up with one command:

pq endpoint add /api/top-articles -q "articles | sort(views desc) | first(10)"
# http://localhost:3000/api/top-articles is now live

The data-source list is where the project genuinely got out of hand. There are eleven of them now: REST APIs (polled, with env-var auth interpolation), WebSockets (with auto-reconnect, subscribe payloads, heartbeats), local CSV/JSON files (with watch-mode hot reload), inline static fixtures, Postgres, MySQL, SQLite, Snowflake, ClickHouse, MongoDB, and Kafka.

For the database sources, pipe operations get pushed down: where, sort, first, select, distinct, and rollup all compile straight to a parameterized SQL statement (or a Mongo aggregation pipeline) against the database, no in-memory materialization. If you write something the push-down planner doesn't recognize, it transparently falls back to the in-process engine. Same call surface, no config switch. This was the part that took the longest. It also turned out to be the most satisfying.
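
To make the push-down concrete: a query like the one below, run against a Postgres source, never pulls the table into memory. The generated statement shown here is illustrative rather than the planner's literal output, but it's the shape of what reaches the database:

orders | where(total > 100) | sort(total desc) | first(10) | select(id, total)

-- roughly becomes
SELECT id, total FROM orders WHERE total > $1 ORDER BY total DESC LIMIT 10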

Everything is hot-reloadable. You can pq source add a new database connection, pq endpoint add a new query against it, and have it serving traffic ten seconds later — no restart, no edit-the-yaml-and-redeploy cycle. The CLI is fully decoupled from the server, too: pq remote connect <url> points your local CLI at a remote pq serve instance, so you can run the server in Docker on a remote box and drive the whole thing from your laptop terminal. That fell out naturally from building everything as a thin wrapper over an HTTP control API, and it surprised me how nice it was to use.
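
The remote workflow ends up looking like this (the URL, the source name, and the query are invented; the two commands are the real ones described above):

pq remote connect https://pq.example.internal
pq endpoint add /api/slow-requests -q "requests | where(duration > 500) | sort(duration desc) | first(25)"
# /api/slow-requests is now served by the remote instance; nothing restarted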

There's a terminal dashboard (pq dashboard) built on Ink — tables, bar charts, sparklines, single-value stat panels, heatmaps. Resizable panels, keyboard-driven, real-time SSE updates with polling fallback. Surprisingly fun to write; React-for-the-CLI is genuinely a great DX.

There's an MCP server so Claude Desktop / Cursor / Copilot / any MCP-aware client can run pipequery against your live data — five tools (query, list_sources, describe_source, list_endpoints, call_endpoint), bearer-token auth, stdio or HTTP transport.

The bit that turned out unexpectedly delightful is the Telegram bot. It exposes the same five verbs as Telegram slash commands, but if you give it an Anthropic API key it'll also accept plain-English questions, have Claude Haiku translate them into pipequery, run the query, and post back a Markdown table — with the translated expression shown right above the result so you can see what got run. The translator uses two prompt-cache breakpoints (the always-stable grammar prompt, and the per-tenant schema preamble) so repeated translations stay cheap. Pipequery's grammar is small enough that small/cheap models translate to it well, which I didn't realize was going to be true until I tried it.
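
For anyone curious, the two-breakpoint pattern looks roughly like this with the Anthropic SDK. This is the shape of the trick, not pipequery's actual translator code; the model name, prompts, and function are placeholders:

import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// the grammar prompt is long, stable, and identical for every caller
const GRAMMAR_PROMPT = 'You translate plain-English questions into pipequery. The grammar is: ...';

async function translate(question, schemaPreamble) {
  const msg = await anthropic.messages.create({
    model: 'claude-3-5-haiku-latest',
    max_tokens: 512,
    system: [
      // breakpoint 1: never changes, so it stays cached across every tenant
      { type: 'text', text: GRAMMAR_PROMPT, cache_control: { type: 'ephemeral' } },
      // breakpoint 2: changes per tenant, but is stable within one
      { type: 'text', text: schemaPreamble, cache_control: { type: 'ephemeral' } },
    ],
    messages: [{ role: 'user', content: question }],
  });
  return msg.content[0].text; // the translated pipequery expression
}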

There's a watches system — register a query, an interval, and a fire condition (when_non_empty, when_empty, on_change), and pipequery will post to a Telegram channel when the result transitions. Idempotent across modes, so no flapping on bursty data. Templated messages with field substitution from the result row. Persisted in the same yaml as everything else.

I did not plan any of this on Friday night. Each piece took about an evening, fit on top of the language without changing it, and turned out to be the kind of thing where adding it costs you a hundred lines because the foundation does the work.

The bit I want to flag separately: editor support

Halfway through I realized a query language without good editor support is a much weaker tool than one with it. So pipequery also ships:

  • A CodeMirror 6 extension for syntax highlighting and tokenization in browser editors (the docs playground uses it).
  • A TextMate grammar that gives you the same highlighting in VS Code, IntelliJ, and Sublime Text.
  • A React component, PipeQueryBuilder — a visual, click-to-build pipeline UI for users who don't want to learn the syntax. You hand it the available sources and field names; it produces a query string (a rough usage sketch follows this list).
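
Since nothing above pins down the component's API, treat this as a hypothetical usage sketch: the component name is real, but the package name and prop names (sources, fields, onChange) are my guesses at the shape, not the documented interface:

import { PipeQueryBuilder } from '@andreadito/pipequery-react'; // hypothetical package name

function QueryPanel({ onRun }) {
  return (
    <PipeQueryBuilder
      sources={['orders', 'customers']}              // guessed prop: the named arrays you can query
      fields={{ orders: ['id', 'total', 'status'] }} // guessed prop: field names per source
      onChange={(q) => onRun(q)}                     // guessed prop: fires with the built query string
    />
  );
}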

Errors are position-aware: every parse and runtime error carries line and column, which is what makes red squiggles in a CodeMirror editor actually work. That was one of the small details that took disproportionate care to get right and that I'd never have bothered with if I weren't using it myself.

Why I actually built it

The honest answer is: because I kept reaching for it and it didn't exist. Every dashboard project I've worked on, every analytics-shaped side experiment, every "let me just slice this JSON real quick" — there's a query-language-shaped hole in JavaScript. Pandas owns Python. dplyr owns R. SQL owns the database. JavaScript got Array.prototype.reduce and a shrug.

I also wanted to see if I could keep it genuinely zero-dependency. The core language is — no parser generator, no lodash, nothing in the bundle but the code I wrote. That constraint forced a lot of decisions to be smaller and sharper than they would have been otherwise. I think the code is better for it.

Where it stands

It's MIT-licensed, with a docs site, a playground, and the CLI as a separate package. It's a beta because I want feedback from people who weren't me before I call anything stable. If you try it and something feels wrong, please open an issue — that's exactly the kind of input I can't generate alone.

If you read this far: thanks. Go build the small useful thing you've been putting off. Sometimes a weekend is enough.