Shopify Rollouts · A/B Testing · 2026
Shopify's new native A/B testing: what Rollouts does, what it can't, and which tool to use
Shopify just shipped native A/B testing, called Rollouts. Here is exactly what it does, what it cannot test, how it stacks up against Intelligems, Shoplift, Convert and the rest, and how to decide which one you actually need.
For years, the answer to "does Shopify have built-in A/B testing" was no. You paid for an app, or you wired up Google Optimize, the free tool a lot of small stores leaned on. Then Google killed Google Optimize on September 30, 2023 (Google Analytics Help) and never shipped a replacement. The free option vanished, and the answer stayed no.
That changed this June. In the Spring '26 Edition (Shopify's "Everywhere" edition, June 17, 2026, more than 150 updates), Shopify made native A/B testing a real, built-in feature. After years of merchants asking, the splitter now lives in the admin.
Yes, Shopify finally has its own A/B testing. It is called Rollouts, and it is native, server-side, and free of any add-on fee. The honest catch: it splits your traffic and reports the numbers, but it never tells you whether the result is real.
What did Shopify just launch for A/B testing and growth?
TL;DR Two things: Rollouts, Shopify's own server-side A/B testing for themes and checkout, and Campaign Autopilot, AI that runs your Meta ads and email from the admin.
Two separate launches, and people keep blurring them. Rollouts is the test: native, server-side A/B testing built into the admin. Campaign Autopilot is the marketing engine: AI that plans and runs your ads and email, and it lives in the renamed Growth tab. One splits the store you already have; the other goes out and buys traffic for it.
The split-test half has been a slow reveal. Rollouts first surfaced at Winter '26 in January, showed up in Shopify's March 31, 2026 developer changelog, and expanded on June 5, 2026 to cover whole themes plus checkout and customer-account configurations (Shopify changelog). By the Spring '26 Edition on June 17, 2026, the "Everywhere" edition (shopify.com/news/spring-26-edition-merchant), it was generally available, not a beta you had to request.
Around it, the same edition shipped a few headlines worth knowing so you can tell them apart from Rollouts. The Universal Commerce Protocol is now on by default (shopify.com/news/spring-26-edition-merchant). The old Shopify Scripts are being retired on June 30, 2026, which we cover below. And the Horizon theme system, which actually shipped back in Summer '25, is now the default for new stores (Shopify changelog, May 21, 2025). None of those is the A/B feature; Rollouts is. For a step-by-step setup, see the full Rollouts walkthrough.
What is Shopify Rollouts, and how does it actually work?
A rollout is a scheduled set of changes to your theme and, on higher plans, your checkout and account pages. It becomes an A/B test (what Shopify calls an experiment) the moment you send only a slice of traffic to the changed copy and let it run against your live store (Shopify Help Center).
You find it under Markets > Rollouts, the canonical spot, and it is also reachable from the theme area under Online Store > Themes. When you start one, Shopify makes a copy of your published theme so you can keep editing the live store on one side and the variant on the other without the two colliding (Shopify Help Center).
The control that turns a plain rollout into a test is "launch reach," a percentage slider. Leave it at 100% and you are simply scheduling a change to go live. Drop it below 100% and Rollouts treats it as an experiment, which forces a start and end date and begins splitting visitors between the two versions. You can ramp the reach gradually instead of flipping the whole audience over at once, and an end date triggers an automatic rollback, so a test that goes nowhere cleans up after itself. Rollouts also lets you run mutually exclusive experiments at the same time, so a shopper lands in one test, not three at once.
The part that matters for trust is where the split happens. It runs on Shopify's servers, before the page is built, so there is no client-side script swapping content after the page loads. That is the architecture difference behind the "no flicker" claim, and independent testing by DebugBear backs the idea that server-side assignment avoids the flash and the speed penalty that old script-based testers introduced. For why checkout increasingly lives in its own layer, see how the storefront is splitting from checkout.
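Shopify does not document how Rollouts assigns shoppers internally, so treat this as a minimal sketch of the general technique, deterministic hash bucketing, under stated assumptions: a stable visitor ID (for example from a first-party cookie) and the hypothetical helpers `bucket`, `pick_experiment`, and `assign_arm`. Because the assignment is computed on the server from the ID alone, the right version can be chosen before the page is built, a ramp from 10% to 50% reach only adds shoppers to the variant, and a separate "layer" hash keeps concurrent experiments mutually exclusive.

```python
import hashlib

BUCKETS = 10_000  # each bucket is 0.01% of traffic


def bucket(visitor_id: str, salt: str) -> int:
    """Map a visitor ID to a stable bucket in [0, BUCKETS) with a salted hash."""
    digest = hashlib.sha256(f"{salt}:{visitor_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def pick_experiment(visitor_id: str, experiments: list[str]) -> str:
    """Mutually exclusive layer: every visitor lands in exactly one experiment."""
    return experiments[bucket(visitor_id, "layer") % len(experiments)]


def assign_arm(visitor_id: str, experiment: str, reach_pct: float) -> str:
    """Send reach_pct percent of an experiment's traffic to the variant.

    The bucket never changes for a given visitor, so raising reach_pct only
    adds visitors to the variant; nobody who saw the variant is moved back.
    """
    cutoff = round(reach_pct / 100 * BUCKETS)
    return "variant" if bucket(visitor_id, experiment) < cutoff else "control"
```

Usage: `assign_arm("visitor-42", "pdp-layout", 10)` returns the same arm on every request, so a returning shopper never flips versions mid-test. Salting the arm hash with the experiment name keeps one test's split independent of the layer hash and of other tests.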
A splitter shows you what happened. The honest part is waiting to confirm a winner instead of calling it early.
What can Rollouts test, and what can it not?
TL;DR Rollouts tests theme and layout changes (product page, hero, navigation, sections) and, on higher plans, checkout and account config. It cannot test pricing or discount logic.
The unit of a Rollouts test is a whole theme version, or a checkout or account configuration, and that version can carry any change you can make in the theme editor. The honest limit is the flip side of that: it tests the whole version, not one isolated section the way an app like Shoplift can.
On the "yes" side, that covers most of what merchants actually want to try: product-page layout, the hero, navigation, collection grids, product cards, content sections, banners, and localized content per market. On higher plans it extends to checkout and customer-account configurations. If you can build it in the theme editor, you can put it in an arm of the test.
On the "no" side, the big one is money: Rollouts cannot test product pricing or discount logic. Prices and discounts are not theme changes, so they sit outside what a theme-version split can reach (this reflects the feature's scope, corroborated by agency teardowns such as Conspire and Charle, not a Shopify statement). A couple of quieter limits matter too: Liquid template edits and global theme settings like color and type apply to both arms at once, so they are not testable against each other, and there is no audience segmentation beyond market. For why price is its own category, see price tests are their own thing; for the parts of the page that move revenue most, see where the money actually leaks.
Rollouts can test
- Product-page layout
- Hero and homepage sections
- Navigation and menus
- Collection grids and product cards
- Banners and localized content per market
- Checkout and account config (Grow and up)
Rollouts cannot test
- Product pricing
- Discount and offer logic
- A single isolated section in one test
- Audience segments beyond market
- A real significance verdict or winner call
Do you need a paid plan to run a real Rollouts A/B test?
TL;DR Scheduling theme changes works on Basic and up, but the actual traffic-split experiment with analytics needs the Grow plan or higher.
Yes, partly. Scheduling and publishing a rollout works on Basic or higher. The traffic-split experiment, the A/B part, needs the Grow plan or higher, and market-specific targeting needs Advanced or Plus. Shopify's Help Center puts it plainly: experiments are available to stores on the Grow plan or higher.
The confusion comes from a 2026 rename. Shopify's plan ladder is now Basic, then Grow, then Advanced, then Plus, and "Grow" is the new name for the old mid-tier that used to just be called "Shopify." So native split testing unlocks at the second rung, not at Advanced. A lot of older guides and agency posts still say you need Advanced to A/B test with Rollouts. After the rename, that is stale.
"No extra monthly cost" is true in one specific sense: there is no add-on app fee. It is still plan-gated, so a store on Basic gets scheduling and rollback but not the split test. If the upgrade is the sticking point, here is how StorePilot runs on any plan without it, and if your traffic is thin, read the low-traffic playbook before you pay to test at all.
Schedule and publish on Basic and up. The traffic-split experiment with analytics needs the Grow plan or higher (about $105/mo, verify on shopify.com/pricing). Older guides saying "Advanced" are stale.
Plan names and prices change. Verify the current ones on shopify.com/pricing.
What is the Spring '26 "growth tool" everyone's asking about?
That is Campaign Autopilot, and it is not Rollouts. In the same edition, Shopify renamed the admin's Marketing tab to the Growth tab (the changelog reads "The Marketing tab is now the Growth tab") and added Campaign Autopilot, AI that plans and runs marketing for you across Meta (Facebook and Instagram), Shop Campaigns, and Shopify email.
You set the budget and approve what goes live; it optimizes within those guardrails (shopify.com/blog/introducing-campaign-autopilot). It is free on paid plans, so you pay the ad spend itself and nothing on top. At launch it is early access, and which channels you get depends on your region. Shop Campaigns, for example, is US and Canada only, with ChatGPT Ads, Microsoft Advertising, and Snapchat listed as coming.
Here is the line that keeps people from conflating the two. Campaign Autopilot's job is to drive traffic and spend ad budget. Rollouts' job is to split the theme you already built. Neither one watches how shoppers actually behave on the page, builds the fix for you, or runs a test scored honestly on revenue per visitor. Shopify has other AI surfaces around this, like Sidekick, but they are separate from both. For the wider set, see Shopify's other AI growth levers.
Does Rollouts tell you whether your test actually won?
TL;DR No. It shows side-by-side performance metrics, but Shopify documents no significance test, no automatic winner call, and no revenue per visitor, so the "did it win" call is still yours.
No. Rollouts reports the numbers and leaves the verdict to you. Its analytics show conversion rate, average order value (AOV), gross sales, sessions, bounce rate, and reached-checkout and add-to-cart rates (Shopify Help Center, "Rollout analytics"). What it does not document is the part that decides anything: a statistical-significance test, an automatic winner call, or revenue per visitor (RPV).
Four things are missing from Shopify's documentation: a significance test, a winner declaration, RPV, and any projected-dollar range. On confidence intervals the picture is fuzzier: early teardowns disagree on whether Rollouts draws bands around the click metrics, and even on the most generous reading, a band is not a significance verdict, a winner call, or an RPV scoreboard. Shopify's own changelog verb is telling: it says the feature helps you "find the winner," which means you find it. One footnote worth knowing: an Online Store theme experiment collects no analytics if you launch it immediately, only when you schedule it.
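If you want a significance check on the numbers Rollouts does show, a minimal sketch is a classic two-proportion z-test on each arm's sessions and orders. It assumes you read those counts off the Rollouts analytics yourself, that sessions are independent, and that you fixed the sample size before looking. It covers conversion rate only, not AOV or revenue per visitor, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(sessions_a: int, orders_a: int,
                          sessions_b: int, orders_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for variant B's conversion rate vs control A."""
    rate_a = orders_a / sessions_a
    rate_b = orders_b / sessions_b
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (orders_a + orders_b) / (sessions_a + sessions_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

With 10,000 sessions per arm and 200 vs 230 orders (illustrative numbers), the variant shows a 15% relative lift in conversion, yet `two_proportion_z_test(10_000, 200, 10_000, 230)` returns a p-value around 0.14, well short of the usual 0.05 bar. A dashboard showing 2.0% vs 2.3% side by side looks like a win; the test says it could easily be noise.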
Rollouts pours your traffic into two versions and shows you what happened. It does not tell you whether what happened is real.
Why does the missing verdict matter so much? Because a live dashboard you can watch is a dashboard you can fool yourself with. The trap has a name, peeking: checking an in-flight test and stopping the moment it looks good. Evan Miller's classic write-up showed that repeated peeking can push a real false-positive rate from the 5% you think you are running at to about 26%. A tool that shows you a moving scoreboard with no significance gate makes that mistake easy. For the underlying idea, here is what statistical significance really means.
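To see the peeking trap directly, a minimal simulation runs A/A tests, where both arms share the same true conversion rate, so any "significant" result is a false positive. It compares one look at the planned end against checking every 100 visitors and stopping at the first significant look. The parameters (1,000 visitors per arm, a 5% conversion rate, ten looks) and function names are illustrative assumptions. Ten looks inflates the error less than Miller's continuous checking, but the fixed-horizon rate should sit near 5% while the peeking rate lands several times higher.

```python
import random
from math import sqrt


def significant(orders_a: int, n_a: int, orders_b: int, n_b: int,
                z_crit: float = 1.96) -> bool:
    """Two-proportion z-test at two-sided 5%."""
    pooled = (orders_a + orders_b) / (n_a + n_b)
    if pooled in (0.0, 1.0):
        return False  # no variance yet, nothing to test
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return abs(orders_b / n_b - orders_a / n_a) / se > z_crit


def simulate_aa(n_tests: int = 1000, visitors: int = 1000, peek_every: int = 100,
                rate: float = 0.05, seed: int = 7) -> tuple[float, float]:
    """A/A tests: both arms share one true rate, so every 'win' is false."""
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(n_tests):
        a = b = 0
        any_peek_significant = False
        for n in range(1, visitors + 1):
            a += rng.random() < rate  # one visitor per arm per step
            b += rng.random() < rate
            if n % peek_every == 0 and significant(a, n, b, n):
                any_peek_significant = True  # a peeker would stop here
        peek_hits += any_peek_significant
        fixed_hits += significant(a, visitors, b, visitors)  # single planned look
    return fixed_hits / n_tests, peek_hits / n_tests
```

Usage: `fixed_rate, peek_rate = simulate_aa()`. The first number is what a disciplined test risks; the second is what a watch-the-dashboard habit risks on exactly the same data.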
RPV is worth defining because it is the number Rollouts leaves out. It is total revenue divided by sessions, which is the same as conversion rate times AOV. That is why a "winning" variant can still lose money: if it lifts conversion but pulls AOV down, more orders can mean less revenue per visitor. Reading conversion rate alone hides that. For the full argument, see why revenue per visitor beats conversion rate.
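A minimal sketch of the RPV arithmetic, with made-up numbers for one control and one variant arm. The `ArmStats` class is hypothetical, not a Rollouts export format. The variant converts better (2.3% vs 2.0%), but its AOV falls from $100 to $80, so RPV drops from $2.00 to $1.84.

```python
from dataclasses import dataclass


@dataclass
class ArmStats:
    sessions: int
    orders: int
    revenue: float

    @property
    def conversion_rate(self) -> float:
        return self.orders / self.sessions

    @property
    def aov(self) -> float:
        return self.revenue / self.orders

    @property
    def rpv(self) -> float:
        # Revenue per visitor: total revenue / sessions,
        # which equals conversion_rate * aov.
        return self.revenue / self.sessions


control = ArmStats(sessions=10_000, orders=200, revenue=20_000.0)
variant = ArmStats(sessions=10_000, orders=230, revenue=18_400.0)
```

Reading `conversion_rate` alone crowns the variant; reading `rpv` shows it earns about 8% less per visitor.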
Will free A/B testing actually make you money?
TL;DR Free removes the price of testing, not the two real barriers: most tests don't win, and most stores don't have the traffic to know which ones did.
Sometimes, and free genuinely helps. Removing the price of an A/B testing app is a real upgrade. It does not change the harder math underneath: most tests do not win, and most stores cannot gather enough traffic to know which ones did.
Start with the hit rate. Across large samples, only about 1 in 7 tests, roughly 14%, produces a clear winner (VWO, corroborated by Nielsen Norman Group). Mature programs land around 10% to 20%, and fresh stores with obvious problems to fix can see closer to 1 in 3 early on (Kohavi, Harvard Business Review, 2017). Then there is the size of the wins that do land: about 60% of completed tests deliver under 20% lift (Convert, 2026). Real, but rarely the doubling people imagine.
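The hit rate compounds in a simple way. Assuming tests are independent and each has the roughly 14% chance of a clear winner quoted above, a sketch of the expected winners from a run of tests and the chance of landing at least one (the function name is hypothetical):

```python
def winner_odds(tests: int, win_rate: float = 0.14) -> tuple[float, float]:
    """Expected winners and P(at least one winner) over independent tests."""
    expected = tests * win_rate
    at_least_one = 1 - (1 - win_rate) ** tests
    return expected, at_least_one
```

`winner_odds(10)` gives about 1.4 expected winners and roughly a 78% chance of at least one, which also means about a one-in-five chance that ten tests produce nothing.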
The second barrier is traffic. A small lift needs far more visitors per variant than a typical store can supply, so a low-volume store can run a test for months and still not reach a read it can trust. We worked the exact numbers in the low-traffic playbook, and the broader pattern, drawn from what we found auditing 1,000+ stores, is the same: free testing is worth having, but on thin traffic it pays off slowly, if at all.
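The traffic barrier comes from the standard fixed-horizon sample-size formula for comparing two conversion rates. A minimal sketch, assuming a two-sided test at 5% significance and 80% power (conventional choices, not Rollouts settings) and hypothetical function names:

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(baseline_cr: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Fixed-horizon sample size per arm for a two-sided two-proportion test."""
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def months_to_result(n_per_variant: int, monthly_visitors: int, arms: int = 2) -> float:
    """Months of traffic needed to fill every arm."""
    return n_per_variant * arms / monthly_visitors
```

For a 2% baseline conversion rate and a 10% relative lift (2.0% to 2.2%), `visitors_per_variant(0.02, 0.10)` comes to roughly 80,000 visitors per variant. At 3,000 visits a month split across two arms, `months_to_result` puts that at over four years.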
Do you still need a third-party A/B testing app?
TL;DR Yes, if you need price tests, audience segments, a real significance engine, or a path that works on low traffic. Rollouts splits whatever you already built.
It depends on what you test. For straightforward theme and layout tests, on the Grow plan or higher and with enough traffic, Rollouts can replace the testing app you used to pay for. For anything past that, an app or an agent still earns its place.
It helps to be precise about where the paid tools actually win, because it is not where people assume. Apps like Shoplift, Convert, Optimizely, and Intelligems do have real significance engines, and several score on revenue and profit, not just clicks. So the gap they share with Rollouts is not "no statistics." It is the work on either side of the test: the part before, where someone has to spot the friction and build the variant, and the part underneath, where a low-traffic store needs a method that still reads when volume is thin. Keep that in mind reading the table, which lays out every option on the same terms.
Prices and plan gates move. We link the live source for each in the full tool breakdown, and all eight apps sit side by side in the best Shopify A/B testing apps for 2026.
| Tool | Native | Tests best | Builds the fix | Low-traffic path | Honest winner call | Rough price | Fair limitation |
|---|---|---|---|---|---|---|---|
| Shopify Rollouts | Yes (in admin) | Theme version, plus checkout and account config | No (you build it) | No | No documented significance test, winner call, or RPV | $0 add-on; A/B needs Grow plan or higher (~$105/mo) | No pricing or discount tests; segmentation is market-level only; you read the numbers yourself |
| Intelligems | No (app) | Price, offer, shipping, profit | No | No (needs order volume) | Yes (profit and RPV, significance) | Core $79 ($59 annual); Plus $499 unlocks price testing; Blue $999 | Price testing sits on the $499 tier; built for higher-volume stores |
| Shoplift | No (app, theme-native) | Theme, template, page tests, no flicker | Partial (Lift Assist builds pattern sections) | No | Yes (Bayesian significance; RPV not its headline) | Core $99 ($74 annual, up to 50k visitors); Advanced $299; Pro $699 annual | 2 variants per test; no shipping or checkout; visitor-metered cost climbs |
| Visually.io | No (app) | Testing, personalization, merchandising | Partial (AI assists, you still drive) | No | Yes (claims significance; RPV and calibration unconfirmed) | App Store $15 to $80; managed tiers ~$1,200 to $3,600 | Ships manufactured-urgency widgets; two pricing surfaces |
| Convert | No (standalone) | A/B, multivariate, multi-page, personalization | No | No | Yes (mature stats engine) | Growth $399 ($299 annual); Pro $599 ($420 annual) | Client-side, so anti-flicker masking rather than true server-side; you bring the hypothesis |
| Optimizely | No (enterprise) | Experimentation and personalization at scale | No | No | Yes (gold-standard stats engine) | Unpublished; reported ~$36k to $200k+/yr | Cost and complexity overkill for most Shopify stores |
| StorePilot | App (theme-safe, reversible) | Product-page and cart friction it finds for you | Yes (finds friction, builds the variant) | Yes (apply-and-measure, holdbacks, cross-store priors) | Yes (calibrated words, RPV scoreboard, projected-dollar range, no early winners) | Free first 3 months, then founding rate (anchor $129/mo, verify live); any plan | MVP scope is product page and cart; no native price or checkout testing (same platform limit as the rest) |
Prices and plan gates move. We link the live source for each in the full tool breakdown.
Which Shopify A/B testing tool should you pick?
TL;DR Pick by the change in front of you: Rollouts for safe theme splits when you have traffic, Intelligems for price, Shoplift or Convert if you want to drive a stats engine yourself, and StorePilot if you want the fix built and judged for you on any traffic.
Two questions settle it. First, do you already know the exact change and can you build it yourself? Second, do you have enough traffic to split and still reach a result you can trust? Your honest answers point at one tool, not five.
If you know the change, can build it, and have the traffic, Rollouts is the free, native, no-flicker splitter, and you are done. If the thing you want to test is price or shipping, that is Intelligems, because Rollouts cannot touch pricing. If you test often and want a real significance engine and segments you steer yourself, Shoplift or Convert earn their fee. And if you do not yet know what to test, cannot build the variant, or your traffic is too thin to split, that is where an agent like StorePilot fits, finding the friction, building the fix, and judging it on RPV. The tree below walks the same logic node by node.
If you would rather read it as plain rules, the ladder below says each path in one line.
These paths are not all either/or. Rollouts can be the splitter underneath while a smarter layer decides what to run and whether it won: here is how StorePilot fits on top of the splitter.
Where does StorePilot go further than Rollouts?
They are not the same kind of tool. Rollouts is the native, server-side splitter and reporter: it divides your traffic between theme versions and shows you the numbers. StorePilot is an AI conversion rate optimization (CRO) agent that decides what to test, builds the variant, and judges the result.
Four differences matter. First, Rollouts splits what you already built; StorePilot watches real shopper behavior, ranks the friction, and builds the variant for you, with a preview before anything goes live. Second, the Rollouts experiment is plan-gated and needs real volume to read; StorePilot is built for thin traffic, with apply-and-measure, holdbacks, cross-store priors, and a time-to-result estimate at your actual traffic. Third, Rollouts hands you raw metrics to eyeball; StorePilot reports calibrated confidence (exploratory, likely, strong), scores on RPV, gives a projected dollar range, and never calls an early winner. The winner call itself is always deterministic statistics, never the AI, a division of labor we argue for in Never let an AI decide your A/B test. Fourth, Rollouts is neutral plumbing; StorePilot filters every fix against your brand and screens out dark patterns.
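StorePilot's actual scoring is not shown here. As a minimal sketch of the general idea only, one common way to turn a test into words is a Bayesian probability that the variant beats control, mapped to labels behind a minimum-sample gate so nothing is called early. The Beta(1, 1) priors, the 0.80 and 0.95 thresholds, the 2,000-session minimum, and every function name are hypothetical, and the sketch scores conversion rate rather than RPV for brevity.

```python
import random


def prob_variant_beats_control(orders_a: int, sessions_a: int,
                               orders_b: int, sessions_b: int,
                               draws: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo P(rate_B > rate_A) under independent Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed keeps the verdict reproducible
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + orders_a, 1 + sessions_a - orders_a)
        rate_b = rng.betavariate(1 + orders_b, 1 + sessions_b - orders_b)
        wins += rate_b > rate_a
    return wins / draws


def confidence_label(prob: float, sessions_per_arm: int,
                     min_sessions: int = 2_000) -> str:
    """Map a probability to words; thresholds here are hypothetical."""
    if sessions_per_arm < min_sessions:
        return "too early"  # never call a winner before the minimum sample
    if prob >= 0.95:
        return "strong"
    if prob >= 0.80:
        return "likely"
    return "exploratory"
```

The design point is the gate: the label is computed by plain statistics, and below the minimum sample it refuses to say anything, however good the early numbers look.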
Rollouts splits whatever you hand it. StorePilot decides what is worth handing over, builds it, and tells you honestly whether it earned more money.
Where Rollouts genuinely wins is real and worth saying plainly. It is native and server-side, with true zero flicker, no add-on fee, safe scheduled rollouts and automatic rollback, the deepest theme and checkout integration, and no third-party data sharing. None of that is small. So this is not StorePilot instead of Rollouts. StorePilot can run on Rollouts as the splitter, or run its own method when your traffic is too thin to split at all. For the wider picture, see the full CRO playbook, or check what StorePilot costs on pricing.
The table below lays the two side by side, capability by capability.
| Capability | Shopify Rollouts | StorePilot |
|---|---|---|
| Finds what to test | No (you bring the hypothesis) | Yes (watches real behavior, ranks the friction) |
| Builds the variant | No (you build it in the theme editor) | Yes (generates the variant, preview before launch) |
| Low-traffic path | No (needs Grow+ and real volume) | Yes (apply-and-measure, holdbacks, cross-store priors) |
| Statistical significance | No significance engine documented | Yes (min-traffic and significance thresholds, a range) |
| Declares the winner honestly | No (you eyeball the metrics) | Yes (recommended decision, no early winners, revenue winsorized so one whale cannot fake a win) |
| Revenue per visitor (RPV) | No (shows conversion rate, AOV, revenue, not RPV) | Yes (RPV is the primary scoreboard) |
| On-brand / dark-pattern filter | No (neutral plumbing) | Yes (brand profile filters off-brand fixes) |
| Checkout / pricing tests | Checkout config on Grow+ (building it needs Plus); pricing No | Pricing No (same platform limit); product page and cart in scope |
| Cost | $0 add-on, A/B needs Grow plan or higher | Free first 3 months, then founding rate (~$129/mo, verify live), any plan |
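
Winsorizing, the technique named in the table, caps extreme order values so one unusually large order cannot dominate revenue per visitor. A minimal sketch with illustrative numbers, assuming a single cap taken from both arms' pooled orders at a hypothetical 95th percentile (real tools typically use far more data and a higher percentile). It is not StorePilot's implementation, and the function names are hypothetical.

```python
from typing import Optional


def shared_cap(all_order_values: list[float], upper_q: float = 0.95) -> float:
    """Cap at the upper_q quantile of pooled orders (simple nearest-rank)."""
    ordered = sorted(all_order_values)
    return ordered[int(upper_q * (len(ordered) - 1))]


def rpv(order_values: list[float], sessions: int, cap: Optional[float] = None) -> float:
    """Revenue per visitor, optionally winsorizing each order at `cap`."""
    total = sum(v if cap is None else min(v, cap) for v in order_values)
    return total / sessions


control_orders = [50.0] * 22
variant_orders = [50.0] * 20 + [5_000.0]  # one "whale" order
cap = shared_cap(control_orders + variant_orders)
```

With 1,000 sessions per arm, raw RPV says the variant wins $6.00 to $1.10 on the strength of one $5,000 order; after capping at the shared cap, the variant sits at $1.05 against $1.10, a loss.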
Where Rollouts wins outright: it is native and server-side with true zero flicker, no add-on fee, safe scheduled rollouts with automatic rollback, the deepest theme and checkout integration, and no third-party data sharing. StorePilot is happy to use it as the splitter.
What about checkout, and what is the Shopify Scripts shutdown?
These are two separate things merchants keep mixing up. Rollouts decides which theme version a visitor sees. Checkout discount, shipping, and payment logic lives in a different layer (Functions), and Shopify is switching off the old Shopify Scripts on June 30, 2026 (you already lost the ability to edit or publish them on April 15, 2026, per Shopify's developer changelog).
Rollouts can A/B test checkout and customer-account configurations on the Grow plan or higher. But meaningfully building a different checkout runs through Checkout Extensibility, which requires Shopify Plus and has been generally available (GA) since early 2023; it is not a new Spring '26 feature.
The practical takeaway: on standard plans, checkout is a sealed box. That is exactly why the testable wedge is the product page and cart, where you can change the layout, the offer, and the path to add-to-cart on any plan. For where the pricing and checkout boundary sits, see the full tool breakdown; for why the two are drifting apart, see how the storefront is splitting from checkout.
Questions merchants keep asking
Does Shopify have built-in A/B testing now?
Yes. It is called Rollouts, built into the admin under Markets > Rollouts (also reachable from Online Store > Themes). It splits live traffic server-side between theme versions, so there is no app to install, no flicker, and no add-on fee. It became generally available with the Spring '26 Edition (Shopify Help Center).
What is Shopify Rollouts?
Rollouts is Shopify's native tool for scheduling, gradually publishing, and A/B testing changes to your theme (and, on higher plans, checkout and account configurations). It is included in your subscription, not sold as a separate app. A rollout becomes an experiment the moment you send only a slice of traffic to the changed version (Shopify Help Center).
Is Shopify Rollouts free?
Yes, in the sense that there is no add-on fee. Scheduling and publishing theme changes work on Basic and up. The actual traffic-split experiment, with its analytics, is gated to the Grow plan or higher (Shopify Help Center). So it is free of an extra monthly charge, but the A/B part is still plan-gated.
What plan do I need to A/B test with Rollouts?
Basic and up can schedule and publish rollouts. The real traffic-split experiment needs the Grow plan or higher, Shopify's second tier (about $105 a month, verify live). Many older guides and agency posts still say "Advanced," but that is now stale after Shopify's 2026 plan rename (Shopify Help Center).
Where is Rollouts in the Shopify admin?
It lives under Markets > Rollouts, which is the canonical location, and it is also reachable from the theme area under Online Store > Themes. If you went looking only under Themes and could not find the split-test controls, check Markets, where the experiment settings live (Shopify Help Center).
What can Rollouts test, and what can't it?
It tests theme and layout changes: product-page layout, hero, navigation, collection grids, sections, and banners, plus checkout and customer-account configurations on higher plans (Shopify changelog, June 5, 2026). It cannot test product pricing or discount logic, and it has no audience segmentation beyond market. The unit of a test is a whole theme version, not one isolated section.
Can Rollouts test pricing or discounts?
No. Prices and discounts are not theme changes, so they sit outside what Rollouts splits (feature scope, corroborated by agency analysis such as Conspire and Charle). For price, offer, margin, or shipping tests, an app like Intelligems still does what Rollouts cannot, working around Shopify's price-test limits with cart-transform functions.
Does Rollouts show statistical significance?
Shopify's analytics documentation lists performance metrics for a rollout (conversion rate, average order value, gross sales, sessions) but documents no statistical-significance test. So the dashboard shows you the numbers and leaves the "is this gap real or just noise?" judgment to you. That is the gap apps and agents fill (Shopify Help Center).
Does Rollouts pick a winner for me?
No. It shows side-by-side performance and you choose which version to ship. Shopify documents no automatic winner call and no revenue per visitor (RPV), the metric that counts whether people buy and how much they spend. The changelog's own verb is "find the winner," meaning you find it (Shopify Help Center).
Does Rollouts work for low-traffic stores?
Not well. The split-test experiment needs the Grow plan or higher and, more importantly, enough visitors to reach a trustworthy read. Detecting a typical small lift takes tens of thousands of visitors per variant, so a store doing a few thousand visits a month can wait a year or more for one clean result. See the low-traffic playbook.
Rollouts vs Intelligems: which do I need?
Use Rollouts when the test is about your theme and layout: product page, hero, navigation, sections. Use Intelligems when the test is about price, margin, shipping, or an offer, which Rollouts cannot touch. Many stores end up running both, one for the storefront, one for pricing (Shopify Help Center; Intelligems positioning, verify live).
Rollouts vs Shoplift: what's the difference?
Rollouts is free, native, server-side, and basic: it splits whole theme versions and reports raw metrics. Shoplift is a paid app (around $99 a month, verify live) with section-level tests, audience segments, and a Bayesian significance engine. Rollouts gives you the safe splitter; Shoplift adds the statistics and the finer targeting you drive yourself.
Do I still need a third-party A/B testing app?
It depends on what you test. For simple theme-layout tests on Grow or higher with traffic, Rollouts likely replaces an old testing app. For price tests, audience segments, a real significance engine, or a low-traffic method, an app or an agent still earns its place. Rollouts splits whatever you already built; it does not find or build the fix.
What is the Spring '26 growth tool?
In the Spring '26 Edition (June 17, 2026), Shopify renamed the admin's Marketing tab to the Growth tab and added Campaign Autopilot, AI that plans and runs your Meta ads and email. It is free on paid plans beyond the ad spend itself. That is separate from Rollouts, which splits the theme you built (shopify.com/blog/introducing-campaign-autopilot).
Did Shopify retire Shopify Scripts?
Yes. Shopify Scripts stop running on June 30, 2026, and you already lost the ability to edit or publish them on April 15, 2026 (Shopify developer changelog). Checkout discount, shipping, and payment logic now lives in Functions, and meaningfully rebuilding checkout runs through Checkout Extensibility, which requires Shopify Plus and has been generally available (GA) since early 2023.
Does Rollouts cause flicker or slow my store?
No. Because the split happens on Shopify's servers before the page is sent, the assigned version is there at first paint, with no client-side script swapping content after load. That means no flash of the old version and no testing-script overhead dragging on your Core Web Vitals (in one DebugBear measurement, a real site's largest contentful paint improved from 6.0s to 2.7s).