An in-memory decision database for agents and applications that need to learn which decisions work, and get better with every outcome.
Four moving parts. Four failure modes. Weeks to build. Months to maintain.
Four API calls: one to define, three to learn.
BanditDB keeps weight matrices in memory. Every outcome you report updates those weights in microseconds, gradually building intuition about which choice wins for which context.
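The mechanics can be pictured as a per-arm LinUCB-style update (BanditDB supports LinUCB campaigns; this class, its names, and the `alpha` parameter are an illustrative sketch, not BanditDB's internals):

```python
import numpy as np

class LinUCBArm:
    """One arm's state: a ridge-regularised Gram matrix and a reward vector."""
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)    # accumulates context outer products
        self.b = np.zeros(dim)  # accumulates reward-weighted contexts
        self.alpha = alpha      # exploration strength

    def ucb(self, x):
        # upper confidence bound: predicted reward plus an exploration bonus
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        # the fast weight update: two in-place matrix operations
        self.A += np.outer(x, x)
        self.b += reward * x

arms = {name: LinUCBArm(dim=5)
        for name in ("decrease_temperature", "decrease_light", "decrease_noise")}
x = np.array([1.0, 0.35, 0.50, 0.60, 0.96])
choice = max(arms, key=lambda name: arms[name].ucb(x))  # serve a prediction
arms[choice].update(x, reward=0.27)                     # learn from the outcome
```

Each reported outcome only touches one arm's matrices, which is why an update stays cheap no matter how many interactions came before it.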
Name the campaign, list the arms, and set the context dimension. BanditDB initialises the weight matrices and is immediately ready to serve predictions.
curl -X POST http://localhost:8080/campaign \
-H "Content-Type: application/json" \
-d '{
"campaign_id": "sleep",
"arms": [
"decrease_temperature",
"decrease_light",
"decrease_noise"
],
"feature_dim": 5
}'
Pass the participant's context vector. Get back the recommended intervention and an interaction ID to track it.
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{
"campaign_id": "sleep",
"context": [1.0, 0.35, 0.50, 0.60, 0.96]
}'
# → {"arm_id": "decrease_temperature",
#    "interaction_id": "a1b2c3..."}
Apply the chosen intervention. BanditDB holds the context in its TTL cache, ready to receive the reward when the outcome is known.
# arm = "decrease_temperature"
apply_intervention(user_id, arm)
# lower bedroom temperature to 17°C
# BanditDB waits for the outcome...
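The pairing of interaction ID to pending context can be sketched as a small TTL cache (the `TTLCache` class and the 0.05 s TTL here are illustrative, not BanditDB's implementation):

```python
import time

class TTLCache:
    """Maps interaction_id -> pending context, forgotten after ttl seconds."""
    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}

    def put(self, interaction_id, payload):
        self._store[interaction_id] = (time.monotonic() + self.ttl, payload)

    def pop(self, interaction_id):
        # returns the pending context if the reward arrived in time, else None
        entry = self._store.pop(interaction_id, None)
        if entry is None:
            return None
        expires_at, payload = entry
        return payload if time.monotonic() < expires_at else None

pending = TTLCache(ttl=0.05)
pending.put("a1b2c3", {"arm": "decrease_temperature",
                       "context": [1.0, 0.35, 0.50, 0.60, 0.96]})
```

A reward that arrives after expiry simply finds nothing to update, so stale outcomes never corrupt the weights.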
Report the outcome the next morning. Matrices update in microseconds. Every subsequent participant gets a smarter recommendation.
curl -X POST http://localhost:8080/reward \
-H "Content-Type: application/json" \
-d '{
"interaction_id": "a1b2c3...",
"reward": 0.27
}'
# → "OK"
Standard LLM agents are stateless: if they make a bad decision, they repeat it tomorrow. BanditDB's MCP server gives your entire fleet shared, persistent decision memory.
Two commands and BanditDB is a native tool in Claude, Cursor, or any MCP-compatible host. Agents can get intuition, record outcomes, and inspect learning state, no config-file editing required.
Every decision made by any agent in the swarm improves the routing for all future agents. The network accumulates judgment that no single agent could build alone.
BanditDB is not a chat memory or a vector store. It is a decision layer that learns which choices work for which context, and gets better with every outcome.
BanditDB logs propensity scores at prediction time. Export to Parquet, run a Causal Forest, and go beyond correlation to genuine treatment effect estimates, with confidence intervals and user-segment breakdowns.
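The Causal Forest runs in external tooling, but logged propensities already support simpler estimators on their own. A sketch of an inverse-propensity-weighted effect estimate over an exported log (the column layout and toy numbers below are assumptions for illustration, not BanditDB's export schema):

```python
import numpy as np

def ipw_ate(rewards, arms, propensities, target_arm):
    """Horvitz-Thompson estimate of target_arm's effect vs. all other arms,
    reweighting each logged outcome by its propensity at prediction time."""
    r = np.asarray(rewards, dtype=float)
    p = np.asarray(propensities, dtype=float)
    treated = np.asarray(arms) == target_arm
    n = len(r)
    treated_mean = np.sum(r[treated] / p[treated]) / n
    control_mean = np.sum(r[~treated] / (1.0 - p[~treated])) / n
    return treated_mean - control_mean

# toy log: the temperature arm was served with propensity 0.5
# and lifted the observed reward by 0.2
rewards = [0.7, 0.5, 0.7, 0.5]
served_arms = ["decrease_temperature", "decrease_light",
               "decrease_temperature", "decrease_noise"]
props = [0.5, 0.5, 0.5, 0.5]
effect = ipw_ate(rewards, served_arms, props, "decrease_temperature")
# → 0.2
```

Reweighting by the logged propensity is what removes the bandit's own selection bias from the estimate; without those scores, arms the model favoured would look artificially good.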
Loading exports/sleep.parquet ... 1,847 interactions · 3 arms
Fitting causal forests (one per arm) ...
───────────────────────────────────────────────
1. AVERAGE TREATMENT EFFECT (90% CI)
───────────────────────────────────────────────
decrease_temperature   +0.1842  [+0.098, +0.270]  ✓
decrease_light         +0.0421  [-0.018, +0.103]  ~
decrease_noise         -0.0297  [-0.104, +0.045]  ~
───────────────────────────────────────────────
2. CAUSAL ARM ASSIGNMENT
───────────────────────────────────────────────
decrease_temperature   61.3%  ██████████████████████████████
decrease_light         24.7%  ████████████
decrease_noise         14.0%  ███████
───────────────────────────────────────────────
3. FEATURE IMPORTANCE
───────────────────────────────────────────────
decrease_temperature:
  activity       0.341  █████████████
  age_norm       0.287  ████████████
  weight_norm    0.198  ████████
  sex            0.104  ████
  bedtime_norm   0.070  ███
───────────────────────────────────────────────
4. WINNING SEGMENTS
───────────────────────────────────────────────
decrease_temperature:  activity (high, Δ=+0.31), age_norm (high, Δ=+0.19)
decrease_light:        sex (female, Δ=+0.24), bedtime_norm (late, Δ=+0.18)
decrease_noise:        age_norm (young, Δ=+0.22), activity (low, Δ=+0.14)
Temperature reduction has a statistically significant causal effect on sleep quality (+0.18, 90% CI entirely above zero). Light and noise show no reliable causal signal despite correlation in the raw data.
61% of users are causally best served by temperature. If the bandit's live distribution (Campaigns tab) matches this ratio, it has converged to the correct causal structure. A large mismatch means the model is still exploring.
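That convergence check can be made concrete as a total variation distance between the two distributions (the `live` numbers below are hypothetical; in practice read them from the Campaigns tab, and note the 0.10 threshold is a judgment call, not a BanditDB rule):

```python
# causal assignment from the analysis report
causal = {"decrease_temperature": 0.613,
          "decrease_light": 0.247,
          "decrease_noise": 0.140}

# hypothetical live serving distribution from the Campaigns tab
live = {"decrease_temperature": 0.55,
        "decrease_light": 0.28,
        "decrease_noise": 0.17}

# total variation distance: 0 = converged, 1 = maximally mismatched
tv = 0.5 * sum(abs(causal[a] - live[a]) for a in causal)
still_exploring = tv > 0.10
```

Here the distance is about 0.063, so under this threshold the bandit would count as converged to the causal structure.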
High-activity adults respond best to temperature. Women with late bedtimes respond better to light reduction. Use these segments to design targeted trials or audit whether the bandit routes each profile correctly.
Requires LinUCB campaigns: propensity scores are only logged for LinUCB. Thompson Sampling does not log propensities.
Six end-to-end examples, ordered from simplest to most advanced.
Temperature, light, or noise: which adjustment works best for each person? A pure curl walkthrough, no SDK needed.
Discount, free shipping, or nothing: learns which checkout offer closes each shopper without giving margin away.
Consult, intake form, refer, or decline: learns which response maximises matter value for each enquiry profile, accounting for capacity and conflict risk.
Hold margin or liquidate? Learns from sell-through rate, holiday proximity, and competitor pricing; the context describes the market, not the user.
Learns which prompt strategy (zero-shot, chain-of-thought, few-shot, structured) produces the best response for each task type. Your evals run in production, not in a spreadsheet.
Routes patients toward the most effective treatment arm in real time as evidence accumulates, with no waiting months for interim analysis.
No sign-up. No cloud account. No configuration required.
Binary: Linux, macOS, Windows
Docker