added video transcripts

This commit is contained in:
Shayan Rais
2026-03-19 18:17:32 +05:00
parent 6de173c410
commit 20a4e72145
6 changed files with 1631 additions and 0 deletions
File diff suppressed because one or more lines are too long
@@ -0,0 +1,332 @@
# Building Claude Code with Boris Cherny — The Pragmatic Engineer
Transcript of the interview with Boris Cherny ([@bcherny](https://x.com/bcherny)), creator of Claude Code, on The Pragmatic Engineer podcast, published March 4, 2026.
<table width="100%">
<tr>
<td><a href="../">← Back to Claude Code Best Practice</a></td>
<td align="right"><img src="../!/claude-jumping.svg" alt="Claude" width="60" /></td>
</tr>
</table>
---
## Video Details
- **Guest:** Boris Cherny (Creator of Claude Code)
- **Host:** Gergely Orosz (The Pragmatic Engineer)
- **Published:** March 4, 2026
- **YouTube:** [Watch on YouTube](https://youtu.be/julbw1JuAz0)
---
## Transcript
[`0:01`](https://youtu.be/julbw1JuAz0?t=1) You were the first ever TypeScript book with O'Reilly.
[`0:04`](https://youtu.be/julbw1JuAz0?t=4) Yeah, I found that book translated in Japanese in this little town in Japan. That was just the coolest moment. And then I realized I don't remember TypeScript at all. Now we're at the point where Quad Code writes, I think something like 80% of the code had Enthropic on average. I wrote maybe 10 20 p requests every day. Opus 4.5 and Quad Code wrote 100% of every single one. I didn't edit a single line manually.
[`0:22`](https://youtu.be/julbw1JuAz0?t=22) Andre Carpet posted that he's never felt as much behind as a programmer as he is now.
[`0:26`](https://youtu.be/julbw1JuAz0?t=26) This is something I really struggle with. The model is improving so quickly that the ideas that worked with the old model might not work with the new model. One metaphor I have for this moment in time is the printing press in the 1400s because there was a group of scribes that knew how to write.
[`0:42`](https://youtu.be/julbw1JuAz0?t=42) Some of the kings were illiterate who are employing the scribes.
[`0:45`](https://youtu.be/julbw1JuAz0?t=45) And if you think about what happened to the scribes, they ceased to become scribes, but now there's a category of writers and authors. These people now exist. And the reason they exist is because the market for literature just What happens when you join one of the top AI labs in the world and your first poll request gets rejected? Not because the code was bad, but because you wrote it by hand. This is exactly what happened to Boris Churnney when he joined Antrophic. Boris is the creator and engineering lead behind Claude code. Before joining Androphic, he spent 7 years at Meta where he led code quality across Instagram, Facebook, WhatsApp, and Messenger, and was one of the most prolific code authors and code reviewers at the company. In today's episode, we cover how Cloud Code went from a side project to one of the fastest growing developer tools and the internal debate at Entrophic whether to release it at all. Boris's daily workflow of shipping 20 30 poll requests a day with zero handwritten code and how code review works when AI writes everything. Why Boris believes we're living through a time as transformative as a printing press and which engineering skills matter more now and which ones do not. If you want to understand how one of the people closest to AI coding agents actually builds software today and what that means for the rest of us engineers, this episode is for you. This episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. Check out the show notes to learn more about them and our other season sponsors, Sonar and Work OS. How did you get into tech, software engineering, and and coding in general?
[`2:14`](https://youtu.be/julbw1JuAz0?t=134) It starts a while back. I think there was kind of like two parallel paths that crossed. So, when I was maybe 13 or something like this, I started selling my old Pokemon cards on eBay. And I realized that on on eBay, you can actually like write HTML. And I was looking at other people's Pokemon card listings and I realized like some of them have like big colors and fonts and stuff like this. And then I discovered the blink tag and I named Blink Tag.
[`2:41`](https://youtu.be/julbw1JuAz0?t=161) And if I put the blink tag on it, I could sell my card, you know, for like 99 cents instead of 49 cents or whatever. So I kind of learned about HTML this way. Then I got an HTML book and kind of learned about HTML. And then uh the second thing was this was also I think sometime in middle school. We had these old TI83 uh graphing calculators and we use them for math. And what I realized is I can get a better answer on the math test if I just program the answers to the math test into my calculator. And so I wrote these little programs to just program the answers and then the test got harder. first then I had to program solvers instead of the actual questions cuz I didn't know what what you know the coefficients and stuff would be ahead of time and then the math got more advanced like the next year and so I had to drop down from basic to assembly to just make the program run a little bit faster.
[`3:28`](https://youtu.be/julbw1JuAz0?t=208) Oh wow. So like in high school you dropped down to assembly.
[`3:30`](https://youtu.be/julbw1JuAz0?t=210) I think this is like middle school or high school maybe like 8th or 9th grade or something like this. Then then the thing I realized is uh everyone in my class was starting to realize that I had the solver and they got kind of jealous and so I bought this little serial cable. so I can give it to them too. And then the next math test, everyone on the class just got A's. And the teacher was like, what's going on? And then eventually she realized it. She was like, okay, you get away with it once and and uh knock it off. But for me, it it was very practical. So, you know, in school I studied economics. Um I actually dropped out to to startups and I never thought that coding would be a career at all. It was always very practical to me. Coding is a means to build things and to to make useful things. this startup. Um, the first one was I think it's like my friends and I were trying to get weed and so we started this like weed review startup. We made like a website. We called kind of different uh dispensaries I I think and then we just tried to get kind of like weed samples so we could like review it for them. And it actually kind of blew up. Um, and then I actually got more interested in uh at the time no one was like testing this stuff and so I got into kind of the like chemical testing kind of chemical analysis and then after this I kind of did a bunch of other startups and then I joined YC actually pretty early uh and I was the first hire of uh this YC startup up in up in Palo Alto after.
[`4:54`](https://youtu.be/julbw1JuAz0?t=294) How did you decide to go go to one startup after the other?
[`4:57`](https://youtu.be/julbw1JuAz0?t=297) Kind of vibes vibes I'd say cuz you know you know like you know startups it's it's never a linear path. You always kind of pivot pivot pivot. You have to figure out what the market wants and what users want. And it's never the thing that you think. You you always try a thing, but the the idea is always a hypothesis and then almost always you have to pivot once, twice, three times. You know, at at this uh at this medical software company, this is called Agile Diagnosis. This was kind of an early YC company. This was back in maybe 2011, 2012, something like that. It was medical software for doctors. And the idea was there's these like clinical decision protocols. They vary a lot hospital to hospital. And our idea was there was one hospital in Chicago that had a really great protocol specifically for cardiac symptoms. And so we're like, wouldn't outcomes be great if every hospital in the US would use the same protocol? And so we tried to standardize it. And we made this like decision tree software for doctors to use. And I wrote, you know, some of the software. The team was like it it was it was just a few of us. It was a pretty small team. And I wrote the software. It was in a web browser. And I remember this was back in the like the Internet Explorer 6 days. that's what hospitals were using
[`6:06`](https://youtu.be/julbw1JuAz0?t=366) and I wrote this like SVG renderer uh because it was this visual decision tree and we launched it and then we had a DAU chart and the DUS were flat and couldn't figure it out and we were piloting it with a few hospitals at the time and at the time we were based in PaloAlto we were piloting it with uh you know a few hospitals including UCSF and I rode a motorcycle at the time so I rode my motorcycle up to you know UCSF and I shadowed doctors for a couple days just to see how how do they actually use And I realized that actually doctors don't have time to sit down and use a computer because you're seeing a patient
[`6:42`](https://youtu.be/julbw1JuAz0?t=402) then you have maybe 5 minutes until the next patient and in those 5 minutes you have to walk down the hall you have to go to the computer station you have to open up this totally legacy computer. By the time it boots up that's like 3 minutes. Then you open up Inner Explorer 6 that takes like 30 seconds. Then you have to open up this like app that we built. You have to sign in and your 5 minutes are up. you don't even have time to use it. And so we rewrote everything to run on Android and they still weren't using it. And the thing we realized is doctors are walking around with a bunch of residents behind them. In this kind of situation, it's like a social situation, right? Like the thing that matters is they're seen as an authority. They don't want to be seen on their phones. And then we pivoted again. So at that point, we were like, okay, so maybe the doctor isn't the target user. Actually, we wanted to be used by maybe nurses or X-ray technicians or something like this. At that point, I left because I was like, "This is actually pretty far off from kind of what I wanted to do." This is like the most fun thing for me is finding this this product market fit because it's always surprising. You can't have one big idea because the idea is probably going to be wrong. So, you kind of form hypothesis, you you follow it down and and you see what's right. Also, I find it so interesting how you're telling us this story because I feel behind a lot of startup success stories, we hear the success story. We hear the path of how it went. But first of all, a lot of startups are like this. And second of all, what struck me is you you were hired as a software engineer, right? And this was back before product engineers or anything was a thing which we're now talking about. But you just like you rode your motorbike and you went there and you shadowed the people and you understood how they're using it, why they're not using it. getting getting ideas. I I feel, you know, this this is what makes a great software engineer back then and and even today, right? You you weren't doesn't seem to me that you were focused on a technology. You were focused on the outcome, though.
[`8:31`](https://youtu.be/julbw1JuAz0?t=511) Yeah. I mean, look, there there's different kinds of engineers and there's different ways to do it. And you know, I even even on our team right now, I look at an engineer like Jared Sumar and he's just incredible technical mind. He understands systems better than anyone I've met. And you know you need you need people like this. You need people with this kind of depth. For me engineering has always been a practical thing. Uh and you know for me I've always been a generalist and like it doesn't matter if I'm doing you know like design or you know if I'm doing engineering or user research or whatever. The investment thesis for AI and software engineering is straightforward. As AI writes more code more code needs to be verified. But there's a catch. AI generated code is on average harder to verify than human written code. This is why there's Sonar, the makers of Sonar Cube. As a critical verification layer for the AI enabled world, Sonar ensures that speed and volume with AI does not compromise your codebase. Sonar's competitive position is built on 17 years of specialized expertise that no foundational model can replicate. We're talking about deep analysis engines like symbolic execution and cross- repository data flow tracking that simulate how code actually behaves, not just what it says. To bridge the divide between AI productivity and code quality, Sonar has released the Sonar Cube MCP server. This tool acts as a universal translator between AI applications and the Sonar Cube platform. By using the modal context protocol, it gives AI tools like cloud code, GitHub copilot, and cursor direct access to sonar cubes analysis capabilities. Instead of context switching, your AI agent becomes a full-fledged code review and quality assurance copilot capable of analyzing code snips for issues, filtering bugs by severity, and even checking your project's quality gate status before you ever commit code. Whether you're working with coding assistants or scaling up with full agogentic workflows, Sonar provides the automated verification that 75% of the Fortune 100 rely on. It's about giving your developers the freedom to innovate without the fear of breaking the code base. Head to sonarsource.com/pragmatic to learn more about how Sonar enables the confidence to develop at the speed of AI. With this, let's get back to Boris's career and what he learned working at startups. My first job I ever had, I was like, I think I was 16 and I just wanted to buy an electric guitar. And so what I did was I I started uh I just started freelancing. And so I was like, "Okay, I guess I'll make websites." And I think Fiverr was not a thing back then. So there were some other freelancing websites. So I just started like I put up a website. I started bidding on stuff. And my first paycheck, I just spent the entire thing on an electric guitar. But it but it was very practical, right? Right? Cuz it's like when you're in this kind of setup, you have to you have to do the engineering, you have to do kind of the accounting, you have to do the the design, you have to talk to customers. It's just always been like that for me. After a couple of these startups, you ended up at Facebook now now called Meta. And there you spent seven years there. Can you just talk us through what you've worked there, what you've learned there? You've also had a very remarkable career growth in terms of four promotions over over over seven years. And what did you take away from that that experience?
[`11:39`](https://youtu.be/julbw1JuAz0?t=699) Yeah, so I started on Facebook groups. That was the first time I worked on uh Vlad Klesnikov uh hired me. I think I think he's actually still at Facebook. Um I think he's on some other team now. And it was cool actually. There there's a big group of people that I worked with that were these kind of early JavaScript people too. And you know, like I I did a bunch of JavaScript stuff. And it's funny like I kept crossing paths with these people. And so Vlad, he worked on Bolt.js, JS which was the software it was the framework that powered ads manager which later became ReactJS. I I kept crossing paths with these people and later on for yeah later on there there was a bunch more people like this but anyway so I I was working on Facebook groups um I was really excited about it because the because of this mission of connecting people to their community. This is the thing that drew me in. And at the time I was a big Reddit user. I became a Reddit user back when I was a teenager because I didn't know anyone else that coded. Even in college, I didn't really know anyone that coded.
[`12:37`](https://youtu.be/julbw1JuAz0?t=757) And honestly, I was always kind of embarrassed about it cuz I thought it was this nerdy thing. And I thought it was kind of this this thing that I knew how to do, but I wanted, you know, I wanted to be like a cool kid and, you know, like I I couldn't like tell people that I coded. It was like it was very nerdy. Um, and and at some point I discovered it was some like programming community on Reddit and I was I was just shocked like there's other people that are into this thing. It's like such a weird hobby. It's so niche and it was just so exciting to find like-minded people like this and get this connection and so I just wanted to work on this. I wanted to kind of contribute to this in in some way. So I worked on Facebook groups for a while. Um, and then you know there there's a bunch of different projects have to to kind of get get into details for any of these. Eventually I became the the tech lead for for Facebook groups and kind of grew grew into this and the org grew the work really changed. It changed from kind of building to a lot of like dock writing and coordination and kind of delegating to others. The culture was changing at the time. So you know this early Facebook culture was disappearing. The docs were coming in. The you know alignment meetings were coming in. uh there was a lot of a lot more work around this kind of foundational stuff like privacy, security, things like this that I think honestly early on a lot of corners were cut in order to grow. But at some point you just have to pay that debt and that was the time when that happened. Then I spent a few years at Instagram after um and that was also a funny story. My wife got a got a job offer and she was just really excited about it and she came to me and was like, "Hey, like I got this offer but we're going to have to move. Is that okay?" And I was like, "Yeah, that's fine." You know, like I work in tech. we can work remotely anywhere. Where's the job? And she was like, it's a N. And I was like, where where's that? And uh N is like rural Japan. And this was uh
[`14:17`](https://youtu.be/julbw1JuAz0?t=857) different time zone as well.
[`14:19`](https://youtu.be/julbw1JuAz0?t=859) Different time zone. Yeah. This was
[`14:20`](https://youtu.be/julbw1JuAz0?t=860) 12 hours or something different or something like that.
[`14:22`](https://youtu.be/julbw1JuAz0?t=862) Something like that. Yeah. It was like 2021.
[`14:24`](https://youtu.be/julbw1JuAz0?t=864) Wow.
[`14:25`](https://youtu.be/julbw1JuAz0?t=865) Um and then I I tried to kind of find a team that would sponsor me cuz there was there were these kind of arcane HR rules about like the time zone you have to be in and the team you have to be collocated with and so on. And so uh there was a little kind of naent team uh for Instagram in Tokyo and Will Bailey was running this team. He was also the guy that made Instagram stories and uh so he was my manager for a while and so we decided to grow that team together and I worked remotely from NA and then most of the team was in Tokyo and uh during this time I I started hacking on Instagram and the stack was just insane like Facebook was the single best web serving stack in the world. the the way that HH everything is optimized like from from the hack language to the HHVM runtime to the to GraphQL as the transport layer to like the client libraries like relay and and all the stuff it was just and in React it was just amazing there there's no other devstack in the world that was this good and it's just fully optimized and then I went to Instagram and it's like you know Python where the type checker didn't work and click to definition didn't work and it was this like kind of hack together Django and then like a work of uh you know the Syon runtime and just nothing really worked and so I came to Instagram I joined the labs team uh you know in in Japan and the idea was to find the next big thing for Instagram. We tried some stuff but what I very quickly realized is that I was just not effective at working on the stack because it was such a terrible stack and so I just went and started working on Dev Infra because uh we we needed to fix it and there there's a few projects that we worked on. So one was migrating from Python to the big Facebook monolith. Another one was migrating from Rest to GraphQL. And uh these projects, they're they're actually in progress, you know, like these are things that involve it takes hundreds of engineers many years to do this. It's a big code base. It's a big migration. Um now it's it's much faster.
[`16:16`](https://youtu.be/julbw1JuAz0?t=976) Yeah. With with with these tools that we have, the AI AI tools and migrations are a pretty good use case for them though.
[`16:21`](https://youtu.be/julbw1JuAz0?t=981) Yeah. It's like the it's the perfect use case for it. And then I I just started getting kind of deeper into this. And by the end, by the time I left Instagram, so I was working on this on dev and kind of leading a bunch of these migrations. That's also where I intersected with Fiona Fun who is now the manager for the quad code team. I just worked with her and she was just such an amazing leader, this incredible depth and kind of history in tech. And I just thought like there's no better there's no better manager for this team. And then I I also started working on code quality. And so the the work on Instagram kind of expanded a bit. And um by the time I left, I was leading code quality for all of Meta. And so I was responsible for the quality of the code bases across Instagram, Facebook, Messenger, WhatsApp, Reality Labs, kind of all these code bases. At Meta, it it was this program called Better Engineering. And the idea was I think it's sort of like 2016 or 2018 or something, but Zuck mandated that every engineer at the company 20% of their time has to be spent fixing tech debt.
[`17:17`](https://youtu.be/julbw1JuAz0?t=1037) Oh, interesting.
[`17:19`](https://youtu.be/julbw1JuAz0?t=1039) And we called this better engineering.
[`17:22`](https://youtu.be/julbw1JuAz0?t=1042) Mhm. And the some of this is kind of bottom up where you know a team knows best the tech debt that they have to fix and then some of it is top down where you need to do you know very big migrations you need to migrate to new language features new frameworks things like this and at Facebook scale you know there was tens of thousands of these migrations every year. Um and so I I just started leading all this and I realized very quick that it just needed a little bit more order to it. There was no goals. No one knew kind of like what the outcomes were there. there wasn't any tracking. Um, and so we developed a bunch of stuff. Uh, one of the ideas was a centralized way to prioritize the different kind of code quality efforts. The second thing was figuring out the impact of code quality on engineering productivity which turned out to be significant.
[`18:04`](https://youtu.be/julbw1JuAz0?t=1084) How how did you measure what did you find there?
[`18:06`](https://youtu.be/julbw1JuAz0?t=1086) There was a bunch of stuff. I think some of this has been published. I don't know if all of it has, but essentially you try to do like causal analysis and causal inference. This is the methodology. You try to figure out like what what are the factors that make it so engineers are more productive. Some of it is code quality, some of it is outside of code quality. So for example, meta went back to uh you know return to office instead of work from home. That was partially driven by this because we just found some you know fairly strong correlations that we thought were causal.
[`18:30`](https://youtu.be/julbw1JuAz0?t=1110) Yeah.
[`18:32`](https://youtu.be/julbw1JuAz0?t=1112) Um about this but quality actually contributes like you know double digit percent to to productivity. It turns out even even at the biggest scale. It's it's kind of comforting to hear because I I think it's it's rare to have a place where you actually measure this, but I think we feel it like when you have a clean code base in modular or it can get easier to work with and I I think you know reasoning could it also be easier for LM to to work with it and my hint would be yes it should be right but I I think there's just very little data but that's a feeling that I I would have. Yeah, I think a lot of the big companies have published about this. Like I think Facebook published something. Uh Microsoft publishes a bunch about this, Google does, but yeah, totally. If if if every time that you build a feature, you have to think about do I use framework X or Y or Z. These are all options that you can consider because the codebase is in a partially migrated state where all of these are around the code somewhere. As an engineer, you're going to have a bad time. As a new hire, you're going to have a bad time. As a model, you might just pick the wrong thing and then, you know, like the user has to course correct you. So actually you know the better thing to do is just always have you know a clean code base always make sure that when you when you start a migration you finish the migration and this is great for engineers and nowadays it's it's great for models too and then you joined entropic and I've heard this story which you can confirm or give more color to it that your first poll request was rejected by Adam Wolf.
[`19:55`](https://youtu.be/julbw1JuAz0?t=1195) He was my rampa buddy. So I joined Enthropic. I was trying to figure out kind of like what to do next and you know I I met a bunch of people at all the different labs and anthropic was just the obvious choice for me because of the mission. This is the thing that personally I know that I need the most. Um and also just kind of seeing all this change that's happening. It's important to have some sort of framework to think about this and to think about our role in it. I'm also a really big sci-fi reader. Like that that's definitely my genre. Um I'm I'm a big reader. I have like, you know, giant bookshelf at home and stuff and I just know how bad this thing can go and I just felt like this is a place that has serious thinkers. People are taking this very seriously and thinking about what what what can we do to make this thing go better. So when I joined Anthropic, I did a bunch of ramp up projects uh just you know various stuff that that I was hacking on and I wrote my first pull request by hand because I thought that's how you write code.
[`20:44`](https://youtu.be/julbw1JuAz0?t=1244) That used to be how you write code.
[`20:46`](https://youtu.be/julbw1JuAz0?t=1246) That used to be how you write code. But even at the time at Enthropic, there was this thing called Clyde and it was the it was the predecessor to quad code. It was it was super janky. It was like it was Python, you know, it took like 40 seconds to start up. It was research code. It was not agentic. But if you prompt it very carefully and hold the tool just right, it can write code for you. And so Adam rejected my PR and he was like, "Actually, you should use this Clyde thing for it instead." And I was like, "Okay, cool." It took me like half a day to figure out how to use this tool because you have to like pass in a bunch of flags and like use it correctly. Um, but then it it sped out a working PR. It just one-shotted it.
[`21:23`](https://youtu.be/julbw1JuAz0?t=1283) Oh,
[`21:26`](https://youtu.be/julbw1JuAz0?t=1286) and this was like 2024. This like September 2024, August, something like that. And I think for me, this was my first fuel hi moment at Anthropic cuz I I was just, oh my god, like I didn't know the model could do this. Like I I was used to these like kind of tab completions, line level completions in an IDE. I had no idea that it could just make a working pull request for me. Boris just talked about how he had a true wow moment at work using their AI model. A very different wow moment is when you use a tool at work that makes things so much easier than before. And this leads us nicely to our presenting sponsor, Statsig. Statsig offers engineering teams the tooling for experimentation and feature flagging that used to require years of internal work to build. It's the kind of tool that was so complex to build that only large companies like Meta or Uber had their own custom advanced tooling for it. Here's what satic looked like in practice. You ship a change behind a feature gate and roll it out gradually, say to 1% or 10% of users at first. You watch what happens. Not just did it crash, but what did it do to the metrics you care about? Conversion, retention, error rates, latency. If something looks off, you turn it off quickly. If it's trending the right way, you keep it rolling forward. And the key is that measurement is part of the workflow. You're not switching between three tools and trying to match up segments and dashboards after the fact. Feature flags, experiments, and analytics are all in one place using the same underlying user assignments and data. This is why teams at companies like Notion, Brex, and Atlastian use Statsig. Statsic has a generous free tier to get started, and pro pricricing for teams starts at $150 per month. To learn more and get a 30-day enterprise trial, go to stats.com/pragmatic. And with this, let's get back to Boris and the origin story of Claude Code.
[`23:10`](https://youtu.be/julbw1JuAz0?t=1390) Yeah. And and then when you when you joined Entrophic, we we've covered this in in a deep dive, but we could recap briefly on how Claude Code came to be out of out of what seemed like a side project or just a cool hack. So yeah, I I I started hacking on a bunch of different stuff. Um I was working on some things in product. Um I worked on reinforcement learning for a little bit just to kind of understand the layer under the layer which I was building. This is still advice that I give to a lot of engineers is always understand the layer under. It's really important because that just gives you the depth and you kind of like you have a little bit more levers to to work at the layer that you actually work at. This was the advice 10 years ago. It's still the advice today. Um but the layer under is a little bit different now. You know, before it was like understand, you know, the Java if you're writing JavaScript, understand the JavaScript VM and frameworks and stuff.
[`23:56`](https://youtu.be/julbw1JuAz0?t=1436) Now it's like understand the model. So I was hacking on a bunch of different stuff. Uh something shipped, some things uh didn't ship. And at some point I I just wanted to understand the public anthropic API because I'd never used it before. Um and I didn't want to build a UI. I just wanted to, you know, hack something up quite quickly cuz we didn't have quad code back then. We're still writing code by hand. And I wrote this little batch tool that um all all it did was it hit the anthropic API and it it was essentially like a chatbased application um but just in the terminal because that's what AI used to be. And you know, I I still think about it like engineers are the first adopters. And so when we started to move out of conversational AI to agentic AI, it took a little bit, but engineers understood it pretty quick. And I I think now when you ask non-engineers about like what is AI, they would say it's this conversational AI, it's like a chatbot or something. And that's why I'm actually very excited for, you know, co-work this new product that we launched because it's going to bring the same thing that engineer saw very early to everyone else. But when I think about, you know, co-work, I I think back to this moment that we're talking about like very early on, quad code originally wasn't quad code. It was a chatbot because that's what I thought AI was. Um, but we had to kind of figure out kind of what is the next thing. And so I at at the time I I built this chatbot. It was somewhat useful, but it was just a chatbot. And the next thing that I tried was I I wanted it to use tools because tool use just came out and I didn't know what it was and I was like let's experiment and and I I gave it a single tool which was the bash tool and I didn't know what to do with the bash tool and so I asked it you know like I I actually didn't know if it could even do this but I asked it like what music am I listening to and uh it just wrote a little Apple script program using like said or or whatever to uh open up my music player and then like query it to see what music it's listening to and just one shot at this with sonnet 3.5. This is actually my second a field AI moment very quickly after the first one
[`26:01`](https://youtu.be/julbw1JuAz0?t=1561) and the model just wants to use tools that though that's that's just what I realized like this thing like if you give it a tool it will figure out how to use it to get the thing done and I think at the time when when I think about the way that people were approaching AI and coding everyone essentially had this mental model of you take the model and you put it in a box and you figure out like what is the interface like what how how do want to interact with this model? What do you need it to do? Essentially, it's like if if you have a program, you you stub out some module, stub out some function, and you say, "Okay, this is now AI." But otherwise, the rest of the program is just a program. And so, this is just not the way to think about the model. The way to think about it is the model is its own thing. You give it tools. You give it programs that it can run. You let it run programs. You let it write programs, but you don't make it a component of this larger system in this way. And I think there's just like, you know, this is a version of the bitter lesson. There's the bitter lesson is a very specific framing, but there's many corollaries to it. This is one of the corollaries is just let the model do it do its thing. Don't try to put it in a box. Don't try to force it to behave a particular way.
[`27:06`](https://youtu.be/julbw1JuAz0?t=1626) One of the first ways you saw it was giving it tools, giving it access to the bash and then later to the file system and then to more tools. Right.
[`27:14`](https://youtu.be/julbw1JuAz0?t=1634) That's right. Yeah, we we give it uh we give it bash then uh I say we it it was just me the first three months but then the team grew. So it it was bash, it was uh and and file edit that was the second one.
[`27:24`](https://youtu.be/julbw1JuAz0?t=1644) And one of the interesting thing we talked about uh last time for the deep dive is when you built it and it started to actually write code with with the tool tools that you had. You've had an internal debate inside entrophic should we just keep it to ourselves because it's making suddenly it spread across engineering and it was making all of you a lot more productive right. Yeah, that's right. In the end, the decision was to release so that we can study safety in the wild. Because when you think about safety and you know, I keep talking about the word safety. The reason anthropic exists as a lab is safety. This is the reason it was founded. This is the reason it exists. If you ask anyone at anthropic why they chose it, it's because of safety. And so if you think about model safety, you know, there's different layers at which to think about it. There's kind of alignment and mechanistic interpretability. This is at the model layer. Then there's evals and this is kind of like a it's kind of putting the model in a petri dish and synthetically studying it in this way. Um and then you can study it in the wild and you can see how it actually behaves. You can see how users talk about it. You can you can see like what are the risks in the wild and you actually learn a lot this way. And by doing this we we've been able to make the model much safer. So in in hindsight it was it was totally the right decision. It's amusing to hear about it from your perspective because from the outside what what I saw and what a lot of engineers saw is like oh entropic release cloth code oh wow this you know for the first release with uh I I believe it was with sonet 4 release was was did it come out with sonet 4 originally or sonet 4.5
[`28:53`](https://youtu.be/julbw1JuAz0?t=1733) I think it was it was for that that was the general availability in February but I think it was research preview before that
[`28:59`](https://youtu.be/julbw1JuAz0?t=1739) yeah but when it came out my infiltration was like oh this thing can write code pretty well and over time it became a lot more capable. So from from our perspective it was like this really capable coding tool that we just started to adopt and use and use for all sorts of increasingly product productive parts and it has become I believe one of the fastest growing developer tools and I'm always surprised to hear the story that it actually comes from research and the goal to understand how people use the model because at the other hand like some startups have been trying to build developer tools deliberately to to get adoption and yet this research tool is getting a lot more adoption.
[`29:38`](https://youtu.be/julbw1JuAz0?t=1778) I mean this is a you know anthropic we're we're a research lab we're a safety lab and you know product is this kind of thing tacked on to the side product exists so that we can serve research better and so we can make the model safer and this is kind of how we think about everything there there was this there's also this funny moment early on when uh we we had this launch review and we were deciding whether to launch it. I remember this moment cuz we were in the room. I think it there was like there was Mike Creger, there was Daario, there were some other folks in the room and we were deciding what should we do. We were looking at the internal adoption chart which was just vertical said it was just insane. It was you know like nowadays
[`30:13`](https://youtu.be/julbw1JuAz0?t=1813) vertical is 100% right
[`30:15`](https://youtu.be/julbw1JuAz0?t=1815) just just 100% like nowadays everyone at an every technical employee at anthropic uses quad code every day is pretty much 100%. For nontechnical employees it's also like it's actually getting quite close to 100%. It's it's increasing very quickly like you know like half the sales team uses quad code um and I think that's increasing it's just it's crazy. Dario had this question about like how how did it grow this fast? Are you like forcing people to use it? And I was like no we offer this tool people vote with their feet and you know just like let people use the tool that they prefer.
[`30:45`](https://youtu.be/julbw1JuAz0?t=1845) Yeah they chose it.
[`30:47`](https://youtu.be/julbw1JuAz0?t=1847) You don't seem like the person who's act exactly forcing people to use your tool.
[`30:51`](https://youtu.be/julbw1JuAz0?t=1851) Yeah. Yeah. I mean the the way we did it, we just we launched the thing and then we just like listened to the users and we talked to people, we saw how they use it, we followed up, we made it better and yeah, I mean now now we're at the point where Quad Code writes I think something like 80% of the code in at Enthropic on average and you know it writes all of my code for sure.
[`31:09`](https://youtu.be/julbw1JuAz0?t=1869) Yeah. And this started for you it started the first time you mentioned I think it was in November when it started to write all of your code. When did that switch come and what what happened to made you trust it to to write your code or how much you trusted? How much you review that code for example?
[`31:25`](https://youtu.be/julbw1JuAz0?t=1885) So the switch was instant when we started using Opus 4.5. This was before before it came out, you know, we we were dogfooting it for a little bit and it it was just right away. Um it's such a more capable model. I just found that I didn't have to open my ID anymore. I just uninstalled my ID cuz cuz I just didn't need it at that point. I actually did that like a month later because I I I just didn't even realize that I wasn't using it anymore.
[`31:49`](https://youtu.be/julbw1JuAz0?t=1909) Yeah, a lot of us had similar experiences once Opus 4.5 was out in the public and especially over the winter break. I I had a similar experience. I just realized that this thing it actually writes, if I'm being honest with myself, as good code as I would have written in the stack that I'm very familiar with and my code base, my side projects where I know it and just a lot better than what I could for code base that I'm not as familiar or technologies I'm not as familiar with. Yeah. I'll be honest, he writes better code than I do.
[`32:17`](https://youtu.be/julbw1JuAz0?t=1937) I I I don't want to go there. I I still like to keep my pride, but probably true.
[`32:21`](https://youtu.be/julbw1JuAz0?t=1941) Yeah. Yeah. I I realized this because also in December, I was traveling a little bit. I was like on a I was on a coding vacation. We we're talking about this before, but I I went to Europe. We were just in a different time zone kind of nomading around. And it was so fun cuz I was just coding all day every day, which is my favorite thing to do. And uh I wrote maybe, you know, like 10 20 p requests every day, something like that. Opus 4.5 and quad code wrote 100% of every single one. I didn't edit a single line manually and I realized uh at the end of that month Opus introduced maybe two bugs whereas if I had written that by hand that would have been you know like 20 bucks or or something like that. Can we talk about your development workflow? You have written threads about this which is awesome. It's on it's on social media on threads and on on X. But can you tell us how you use today uh cloud code in terms of you know parallelism and and tips and tricks that you and the team have kind of learned and share across the across the team?
[`33:15`](https://youtu.be/julbw1JuAz0?t=1995) Yeah, I mean look there's no one right way to use quad code. So I I can share some tips and things but I I think the wrong conclusion to draw would be to just copy copy these and and use it. The way we build cloud code is we build it to be hackable because we know every engineer's workflow is different. There's no one way to do things. There's no two engineers that have the same workflow. It's just every every engineer is
[`33:39`](https://youtu.be/julbw1JuAz0?t=2019) same with workstation setup, right? Like keyboards, monitor placement, all that. Everyone has it differently.
[`33:42`](https://youtu.be/julbw1JuAz0?t=2022) Yeah. It's like we're like crafts people, right? Like you choose you choose your tools. Like we care deeply about it. So there's no one right way to do it. So for me, the way that I do it generally is I have five terminal tabs. Each one of them has a checkout of their repository. So it's five parallel checkouts. Um and usually I'll kind of roundroin and start cloud code in each one. Almost every time I start in plane mode. So that's like shift tab twice in the terminal. And uh I also overflow uh as I run out of tabs cuz there's only so many terminal tabs. I used to use web a lot for this. So like quad.ai/code, that's the place that I overflow to. Nowadays I actually use the desktop app. Um it's more convenient. So Quad Code, you know, it's been in our desktop app for, you know, for many months. It's just a code tab in in the Cloud app. Um, and I actually really like it because it has built-in uh work tree support. So that's existed for a while. Um, and that that's quite nice for parallelism. So you have multiple, you don't need multiple checkouts. You just have one and then we automatically set up Git work trees for you. So you get this kind of environment isolation. The reason I do that is I actually just really hate fiddling with git work trees on the command line cuz it it's kind of fiddly. like you need to know the CD get work tree for those of who are not as familiar with it. It's it's when you can check out instead of having a separate local folder, it's almost like checks out separate branch, right? And then you can work on it separately but not have the comp have the complex only at like merge time.
[`35:07`](https://youtu.be/julbw1JuAz0?t=2107) That's right. Imagine that you you have a folder but you have maybe like git makes five copies of that folder in a way that's very cheap um and kind of easy to throw away. So you get this kind of isolation. it can work in parallel and the quads don't interfere.
[`35:20`](https://youtu.be/julbw1JuAz0?t=2120) Yeah. So, you now have support for this which I I think you recently added like native support but like for for your workflow you just stuck with the old one of checking out on separate f folders, right?
[`35:30`](https://youtu.be/julbw1JuAz0?t=2130) Yeah, exactly. I I actually find over time I'm using the desktop app more and more for this.
[`35:34`](https://youtu.be/julbw1JuAz0?t=2134) Um just cuz I don't need these separate checkouts and you know I I just have a bunch of quads running in parallel and I don't have to think about it. The other surprise hit is the iOS app for me. Every day I start like I wake up and I just start a few agents on my phone. Oh, the the native one. Yeah,
[`35:47`](https://youtu.be/julbw1JuAz0?t=2147) the native one. Yeah, it's just like it's the quad app. It's the code tab in the in the quad app and it's the same exact quad code.
[`35:53`](https://youtu.be/julbw1JuAz0?t=2153) Yeah, except it it runs in the cloud, right?
[`35:55`](https://youtu.be/julbw1JuAz0?t=2155) It runs in the cloud. Yeah. So, you have to kind of configure the environment. Luckily, our environment is pretty simple. So, you know, um and it we just use hooks for it. So, you just use the session start hook and configure it. This is kind of one of the benefits of making quad code really hackable is it's very easy to do to do this kind of configuration. And this is something honestly I would never have predicted because you know like I I I code on a computer. If you told me six months ago I'd be writing I don't know a third I haven't pulled the data maybe like a third half something like this of my code on a phone. That's crazy. But that's that's what I'm doing today.
[`36:29`](https://youtu.be/julbw1JuAz0?t=2189) And you're using parallel agents. At what point did you start using them? And how has it changed your work? Cuz one thing that I notice on myself, I don't really use that many parallel agents. I maybe like two at a time, but I'm someone who well I I like to be in charge and especially with Claude. Claude is is is a a tool that you can follow it along. It tells you what it's doing. It you can also have for example learn mode which this was shipped a lot earlier where where you can actually follow along. It gives you tasks. I I feel that like staying in one tab and following along the model is pretty fast as well. I can kind of keep in touch. I'm assuming at some point you must have done this but then what happened when you changed to parallel and are do you feel you're losing any control or it doesn't really matter that much?
[`37:14`](https://youtu.be/julbw1JuAz0?t=2234) Yeah, I I I think there's kind of like two modes to think about or kind of like two two uh two kind of workflows to think about. So when you're new to a codebase, highly re learn mode is awesome. Highly recommend it for people that are onboarding to the quad code team, people that onboard to enthropic. Um the thing that we recommend is so you do for people that haven't tried it you do slashconfig in quad code you pick the output style and you can do learn or explanatory. We usually recommend explanatory cuz that tends to be better for new code bases um that you kind of haven't been in before. For me once you're familiar with the codebase you just want to be productive right like you just want to ship as much as you can and you want to kind of be effective doing that. Um so the role really switches. I don't really go deep into tasks anymore. I start a quad in plan mode. I'll have it kick something off. With Opus 4 4.5, I think it got there. With 4.6, it just really really does it. Once there is a good plan, it just it will oneshot the implementation almost every time.
[`38:10`](https://youtu.be/julbw1JuAz0?t=2290) So, the most important thing is to go back and forth a little bit to get the plan right. So, what I do is I I start one, I enter plan mode, I give it a prompt. As it's chugging along, I'll go to my second tap and I'll start the second quad also in plan mode. Get it chugging along. Then go to the third tab, go to the fourth one. Then maybe I'll go back to the first one when I get notified that it's done. Uh, and then I'll kind of
[`38:30`](https://youtu.be/julbw1JuAz0?t=2310) Do you have notifications on or do you turn them off?
[`38:33`](https://youtu.be/julbw1JuAz0?t=2313) I actually operate in both modes. Um, sometimes I do like, you know, focus mode on the Mac. Um, so I just have it off, but also sometimes I use the system notifications.
[`38:42`](https://youtu.be/julbw1JuAz0?t=2322) And you're very very productive with with PRs. I mean, I I think it was very visible. Even around the holiday breaks uh on social media, you actually were responding to I think someone reported a bug or or a feature request. I'm not sure which one it was. And then an hour or two later it was done cuz cuz you did it. You've also talked about like number of poll requests you've done on a day not to like show up but just as context. What what does a poll request typically involve in terms of complexity? Are these like are some some super trivial or some actually like larger pieces of work as well?
[`39:15`](https://youtu.be/julbw1JuAz0?t=2355) Yeah, pull request each one varies a lot. Um sometimes it's a few lines, sometimes it's a few hundred or a few thousand lines. They're all just very very different. It's changed so much. Like back when I was at Instagram, I think I was one of the uh top two maybe top three most productive engineers at Instagram just by volume of code written. Oh wow. Um so I've always, you know, for me I've I've always just coded a lot. Like this is uh coding is like a way that I can express myself and it's just like it's a way that my brain thinks also. And so now I just get to do it. But I I think with quad code the the the kind of code that you write if you are very productive it it tends to be even it's just the number of PR sort of underelves what what's happening because I I think people that used to be very productive in the old days before AI assistance a lot of the code maybe was like code migrations or something like this so like people that shipped you know 20 30 PRs every day a lot of it was like pretty you know like a oneliner or kind of migrating A to B or whatever. Nowadays I ship you know 20 30 PRs every day but every PR is just completely different. Some of them are thousands of lines, some of them are hundreds, some of them are dozen, some of them are oneliners. It's none of these are kind of code migrations cuz actually Claude just does those and I I don't need to be part of that.
[`40:27`](https://youtu.be/julbw1JuAz0?t=2427) Shipping this much code or this much productive. The obvious question that comes up for any I guess software professional is well the review. What the way teams used to work and I'm not sure if Instagram did this but a lot of other companies did this is you make a pull request you put it up there there's a mandatory human reviewer at Google there's actually two cuz there's one on code quality as as well how has this workflow changed how does the hot code team think about code review and how has it changed over time yeah I'll start by thinking I I'll start by talking about how code review used to work for me so the the way that I used to do it is uh every time I I also used to be one of the most prolific code reviewers.
[`41:04`](https://youtu.be/julbw1JuAz0?t=2464) Oh, okay. So, both.
[`41:05`](https://youtu.be/julbw1JuAz0?t=2465) I I met Yeah. Yeah.
[`41:06`](https://youtu.be/julbw1JuAz0?t=2466) Right. Or is it code reviewers?
[`41:08`](https://youtu.be/julbw1JuAz0?t=2468) That's actually and that's one of the benefits of being in a different time zone. Like I'm not super human. I just didn't have any meetings. And the the way that I approach code review is every time that I would have to comment about something, I would drop it in a spreadsheet and I I would like describe the issue. So, let's say, you know, like someone named a parameter, you know, in a function badly, I would like put that in a spreadsheet. If someone did some bad React pattern or something, I would I would put that in a spreadsheet. And then over time I would just kind of tally up the spreadsheet and anytime that a particular row had more than three or four instances I would write a lint rule for it.
[`41:39`](https://youtu.be/julbw1JuAz0?t=2499) So just automate it with kind of an op. And so that's what it used to look like for me. I've always tried to automate myself away um because there's just so many things to do. Um and this is one of our superpowers as engineers
[`41:50`](https://youtu.be/julbw1JuAz0?t=2510) is we were able to automate all of the tedious work. There's very few other fields where you're able to do this thing. This is a thing uniquely that we're able to do. Um, and this is a thing that I I've just always enjoyed because it gives me more free time and uh I get to do the work I actually enjoy. And so today the way this looks is a little different, but it it mirrors this a little bit. So when cloud code writes code, it generally it will run tests locally. And this is something cloud just often decides to do when it's relevant or it'll write new tests. So you kind of do this this kind of verification. When we make changes to cloud code, cloud will also test itself. So it'll launch itself kind of in a subprocess. It'll verify itself and it'll test itself end to end.
[`42:30`](https://youtu.be/julbw1JuAz0?t=2550) This is for the the your internal cloud code implementation. So you have like this test suite so they can test itself.
[`42:35`](https://youtu.be/julbw1JuAz0?t=2555) Yeah, that's right. That's right. But it'll literally launch itself just in a bash process and kind of just see like hey do I still work.
[`42:42`](https://youtu.be/julbw1JuAz0?t=2562) Wow. Okay. So it'll do this and this is something that we we just didn't code in like it just with Opus 4 4.5 especially it just sort of spontaneously doing this. It just wants to kind of check. So so we do this and then we also run claudep. So this is the quad agent SDK in uh CI. So every pull request at Enthropic is code reviewed by quad code. Uh and that actually catches maybe like 80% of bugs something like this. Um and it's the first round of kind of code review. Cloud will automatically address some of these. Some of them some of them it'll leave to a human cuz it's not sure what to do. There's always an engineer that does the second pass of code review. Um and you know there there always has to be a person in the loop approving the change.
[`43:23`](https://youtu.be/julbw1JuAz0?t=2603) Mhm. So on on on the team before anything goes into production if you will an engineer does look at it. Yes. As you're thinking of code review would you do this for every type of project or this is specifically because you now know that this actually has real world impact people depend on it. You know there's a lot of users let me put it the other way around like can you see places where you would just not have an engineer review uh code. What situations would that be in?
[`43:47`](https://youtu.be/julbw1JuAz0?t=2627) I think it depends how how how it's used. Yeah I'd agree with that. But you know if you're building some personal side project like you can just yolo straight to main you know like
[`43:56`](https://youtu.be/julbw1JuAz0?t=2636) it's even even before AI you would have not reviewed you just trust yourself or you know just ship to production or SSH into production and do some changes that kind of stuff right
[`44:06`](https://youtu.be/julbw1JuAz0?t=2646) exactly exactly um the very first versions of quad code that were internal like you know I committed straight to main but then you know as soon as you have users and you know for enthropic our main customer base is enterprises this is what we care about the most for us for safety reasons security is really important privacy is important. These are these are all related. It's also very important for our customers. And so because this is an enterprise product, it has to be secure. It has to be we have to make sure that it meets a certain bar. So we definitely use a lot of automation, but at least for now, there has to be a human in the loop just to make sure.
[`44:38`](https://youtu.be/julbw1JuAz0?t=2678) One thing that is just known about LM is they're nondeterministic. And by putting the element as a reviewer claude doing a review like it it will give good feedback but how do you deal with the fact that you can be sure if it's always giving the feedback you cannot be sure that even if it's capable of catching an issue that it will necessarily catch that. Are you doing anything in in this loop to do deterministic thing? For example, linting is very deterministic as you will very well know. Like have you thought of marrying some of these ideas or are you using for example are using llinters on the codebase or you found no need to for it? Yeah, absolutely. Absolutely. Yeah, you
[`45:14`](https://youtu.be/julbw1JuAz0?t=2714) this is just a Yeah.
[`45:15`](https://youtu.be/julbw1JuAz0?t=2715) Yeah, we we have type checkers, we have llinters, we run the build. Claude is actually so good at writing lint rolls. So, actually what I do now, I used to tally stuff up in a spreadsheet. Now, what I do is when a coworker puts up a pull request and I'm like, this is lintable. I'll just be at Claude, please write a lint roll for this in that PR on their PR. And we have, you know, you just run like slash I think it's like setup GitHub or or something like this. You can do this in cloud code and it'll install the GitHub app which then makes it so you can tag add Claude on any pull request, any issue. I use this every single day. Um, so very very useful. So you want these deterministic steps. Also though there are there are ways to get cloud to be a little bit more deterministic. So for example, you can do best event. You can have it do multiple passes
[`46:00`](https://youtu.be/julbw1JuAz0?t=2760) and and this is actually quite easy to do. So you know for example the coderview skill that we use internally it's open source um and it's available in the quad code repo and so all we do is you know we launch parallel agents to do stuff and then we launch parallel dduping agents to check for false positives but essentially best of end the way you implement it is is all you say is claude start three agents to do this and that's it. or just talked about building that enterprise infrastructure layer, the O, the permissions, the security that has to all work before you can ship to real customers. This makes it a great time to speak about our season sponsor work OS. If you're building any SAS, especially an AI product one, then authentication, permissions, security, and enterprise identity can quietly turn into a long-term investment. SL edge cases, directory sync, audit logs, and all the things enterprise customers expect. It's a lot of work to build these mission critical parts and then some more to maintain them. But you don't have to. Work provides these building blocks as infrastructure so your team can stay focused on what actually makes your product unique. That's why companies like Antrophic, OpenAI, and Cursor already run on Work OS. Great engineers know what not to build. If identity is one of those things for you, visit work.com. And with this, let's get back to building cloud code with Boris. How does cloud code work in terms of ar architecture? So as as an engineer, how can I imagine it's setup? It's uh we we covered some of this in the the deep dive and I think you told me that you had some pretty complex ideas when you started and you just simplified a lot of it.
[`47:33`](https://youtu.be/julbw1JuAz0?t=2853) Yeah. Yeah. It's very simple like you know there there's not much to it. There's like there's a core query loop. Uh there's a few tools that it use that it uses. We we delete these tools all the time. We add new tools all the time. We're just always experimenting with it. So there's kind of this core kind of agent part of it. Then there's the the 2E part of it. Uh and then there's there's actually a ton of different pieces around security. Um and making sure that everything that QuadCode does is safe and that there's a human in the loop for when it happens.
[`48:06`](https://youtu.be/julbw1JuAz0?t=2886) And by safety, do you mean as as a user when it's doing stuff on my computer or also as entropic monitoring use cases that that could be deemed unsafe? Yeah, there's kind of a couple versions of this. You safety, there's just many, many layers and for things like safety and security, there's no one perfect answer. So, you know, it's always a Swiss cheese model. You just need a bunch of layers and with enough layers, the probability of catching anything goes up. And so, you just have to kind of count the number of nines in that probability and pick the threshold that you want. And so, for something like prompt injection for example, we do this generally at three different layers. So, let's think about something like web fetch. So cloud fetches a URL and uh it reads the contents of of of that web page and then it does something in in quad code. So one of the risks for something like this is prompt injection. Maybe there's an instruction on that website to be like hey quad delete all the folders or something like that.
[`48:55`](https://youtu.be/julbw1JuAz0?t=2935) So we think about this in a number of ways. The the most basic way is it's an alignment problem. And so opus 4.6 is the most aligned model we've ever released because we've taught the model how to be more resistant to prompt injection. And so you can read about this on the model card and I think it was part of the release. The second part is that we have classifiers at runtime where if there is a request that seems to be prompt injected, we block it um and we just make the model try again. And then the third layer is for something like web fetch, we actually summarize the results in using a sub agent and then we return that summary back to the main agent. So again, this kind of reduces the probability of prompt injection. And so you can kind of see how this isn't just one mechanism. It's it's a layer and by by having a bunch of these different layers, it just reduces the probability a lot.
[`49:42`](https://youtu.be/julbw1JuAz0?t=2982) One interesting technical choice that you've also mentioned is is using rag or not rag retrie retrieval augmented generation and you mentioned how in the earlier version of cloud code you use a local vector database to to get some to to speed up search and you layer threw this away. Can you talk about how this one because this was another example where I guess did the model get better?
[`50:04`](https://youtu.be/julbw1JuAz0?t=3004) Yeah, I mean this is one of those things where we try so many different things. We try so many different tools and just statistically most of them we throw away.
[`50:13`](https://youtu.be/julbw1JuAz0?t=3013) Even something like the spinner in quad code I think it's gone through like a hundred iterations
[`50:17`](https://youtu.be/julbw1JuAz0?t=3017) I want to say. Oh
[`50:20`](https://youtu.be/julbw1JuAz0?t=3020) just the spinner and you know out of those we've landed maybe like 10 or 20 in production and like 80 of them I probably just threw away cuz it didn't feel good enough. So just statistically almost all the code we write we throw away because it's just so easy to write this code and try stuff and see what feels good. So for something like rag we tried a bunch of different approaches early on. So the the first one was rag for retrieval cuz I think this I was just like reading up like how people were doing retrieval and it seemed like all the papers were talking about rag. Um and so the way I did it was it was like a local vector database. I think it was like written in Typescript and it just lived on the user machine. Uh and then I was using some like embedding uh model that was in in the cloud to compute the embeddings before storing it. Um and that that worked like pretty good, but there's a lot of issues with rag. Um so for example, I was finding that the code drifted out of sync. Like if I make a local function, it's not yet indexed and so rag isn't going to find it. There's also this question of like how exactly is the index permissioned? So who can access it? I can access it. Um but then how do we like encode that in kind of permission policies? How do we make sure no one else can access it? How do we make sure that like if there's a rogue IT person within the company, they can't access someone else's data? This is really really important that we think about this.
[`51:32`](https://youtu.be/julbw1JuAz0?t=3092) Yeah.
[`51:35`](https://youtu.be/julbw1JuAz0?t=3095) Um and so we just decided like it was sort of working, but it was it also has a lot of downsides. And so we tried a bunch of other stuff. Uh one of them was just using the model to uh kind of index everything recursively. Um that was kind of a cool idea. There was another version where um we just tried glob and gp. We tried a bunch of different stuff. It it turned out that agentic search just outperformed everything
[`51:56`](https://youtu.be/julbw1JuAz0?t=3116) and and when I say agentic search, this is a fancy word for glob and grap. That's all it is.
[`52:02`](https://youtu.be/julbw1JuAz0?t=3122) Nice. So So the model both got good enough and you realize that it can use these tools pretty efficiently.
[`52:07`](https://youtu.be/julbw1JuAz0?t=3127) Yeah. And this was uh it was partially inspired honestly by my experience at Instagram because at at Instagram click to definition didn't work because the the dev stack was just borked like half the time and I think now it's better. And so what engineers weren't to do instead is let's say you're looking for the definition of the function fu instead of click to definition what you would do is you would use the global index which is quite good at meta and then you would search for fu per opening parenthesy and this worked pretty well and it it's funny because like this works for the model pretty well too interesting how one one idea from one area can come to the other one of the more advanced parts of cloud code that we've also previously talked about is the permission system. Can you talk about what was complex about it? And also you recently open source sandboxing, right? Permissioning is really complex. Um there's like everything else that has to do with security. It's a Swiss cheese model. There are a number of classifiers that run to make sure the command is safe. Um and there's also static analysis that we do to make sure the command is safe. As a user, you can also allow list particular patterns that you know to be safe. So, for example, um some standard Unix utilities we preow because we know they're readon because we know they can't expilt your data or anything like this. So, we we just won't prompt you for permission. But actually quite few tools fall into this category because even something like the find command, there's actually a way to execute arbitrary code as part of that command because there's there's like system flags that you can use for this. or even something like the said command. There's ways to use this. So there's just like all this like arcania about these various Unix utilities where it's actually not as safe as you think.
[`53:53`](https://youtu.be/julbw1JuAz0?t=3233) And so we want to be by default fairly conservative about what we allow by default. As a user though you can configure an allow list. So you can say for example like the these patterns are allowed the these patterns are not allowed. Uh and so we we let you define that and we also check this allow list to to make sure that it's safe.
[`54:09`](https://youtu.be/julbw1JuAz0?t=3249) Yeah. And then you you have this like neat permission system where every time you run a command that needs permission, you can decide to run it once or run it for either this session or whatever it makes sense or just globally allowed going forward. Right. That's right. This is a funny artifact. This was actually in the very very first version of quad code. This is the way permissions worked. This is the very first release. This was like September 2024, the first internal release. I remember at the time we weren't sure whether agentic safety could be even be solved. And so there was actually a lot of push back internally from safety teams because they were like okay like you can't just run let the model run bash commands like that's unsafe. So like what do you do like this is not a solvable problem so like we can't launch this. I I brainstormed with Ben man and Ben was he started the labs team. He's one of the founders at Enthropic. Um he's actually he's the the person that hired me to Anthropic. We just came up with permission prompts as the way to do this. You you put the if you're not sure just ask the human and and they can decide.
[`55:07`](https://youtu.be/julbw1JuAz0?t=3307) Yeah. I wanted to ask you about how software engineering is done in general in terms of Antrophic and one of the first questions which is a I guess a more formal one but or from the outside is titles or lack of them. Everyone at Antroic has the same title member of technical staff. Why did this happen and what does this result in this kind of like everyone there basically no titles right except for one? I think it's kind of an acknowledgement that um everyone just is figuring stuff out. And um if if you kind of squint and look at the work people are doing, it's all quite similar and it's it's kind of quite generalist and if you talk to the average software engineer, they might not just be doing coding. They might also be doing a little design. They might also be talking to users. They might be writing their own product requirements. They might be writing software and also uh you know doing research. They might be writing product code and also infrastructure code. At anthropic there's a lot of generalists. This is also you know from my background. This is one of the reasons that I gravitated towards it. And I I I think member of technical staff just kind of encodes this in in the way that people talk to each other even if they don't know each other. Without this title the default would have been I see your name on Slack and under your name it says software engineer. And then I'm like well okay I guess you're like you're the coding person then. So I'm I'm not going to ask you like product questions, but when everyone's title is member of technical staff, by default, you assume everyone does everything. And so it kind of inverts this this relationship between people even if you don't know each other well yet. In in a way, it's kind of this like optimism built into the built into the structure. Um I think it's also a glimpse of the future because I I think this is where software engineering is going. I think this is where every discipline is going is more of this generalist model. It definitely feels like it in in software engineing. And I I heard this funny uh comment by Mark Andre uh how we said that there's this Mexican standoff happening in the tech world where the the designers are are saying that they're actually now doing like PM and engineering work. The engineering are saying we're doing design and and like everyone thinks they're doing the work of the others and they're kind of standing there like I'm doing your work as well. when the reality is everyone's role is expanding most of it thanks to AI because it makes easier for an engineer to do product work or for a product person to engineer work and so on. So just what what you've said
[`57:29`](https://youtu.be/julbw1JuAz0?t=3449) I I remember back in the back in June or July of last year I I walked into the office and the data there's a row of uh data scientists that sit right next to the quad code team at least at least at the time and I walked in and our data scientist for the quad code team had quad code up on on his monitor and um he he was using it and I was like this is interesting cuz you're you're a data scientist did you have like why are you using a terminal like you didn't have NodeJS installed cuz we depended on Node.js JS back then. I I was like, "Are you are you dog fooding it? Like are you just like trying to like figure out how this thing works or something?" He's like, "No, no, I'm like I'm using it to run queries." He was just like using it to run SQL and it had like little like ASKI visualizations uh in the terminal. Uh and then the next week the entire row of data scientists had quad code running on their computers and and this expanded and so if you look at the team today on the quad code team everyone codes the engineers code our engineering manager codes designers code uh data scientists code uh our finance guy codes everyone on the team codes and I think part of it is quad code just makes it so easy so you don't really have to understand the codebase. You can just like dive in and and kind of make small changes quite easily. But I think another thing is people are able to use cloud code to do their jobs more whether it's you know financial forecast or you know data science or whatever and by doing this it's actually quite an easy crossover to just use it to write a little bit of code also. So it's just a way to dip your toe in the water. One other interesting thing about how you work is Cat Woo was talking about she is I guess you the title is the same but people might gravitate for role a bit more. I understand she's a little bit more on a product role but you said that PRDs are just not really written inside entropy and PRD's product requirement document. It's a well-known artifact across big tech and increasingly over larger startups where you write a spec and the idea is that you write down your thoughts, people align, you send it over and now you know what to build. But apparently you're not doing much of this or at all.
[`59:30`](https://youtu.be/julbw1JuAz0?t=3570) Some of this I think is because Anthropic is still, you know, it's still a startup. So you you don't actually have to align with that many people usually. You can just kind of talk about it or do it in Slack or whatever. Um but yeah, also part of it is, you know, like Cat used to be an engineering manager. She's she's extremely technical and I think this is this is the way that you know our product team thinks about it too is you know better send a PR.
[`59:51`](https://youtu.be/julbw1JuAz0?t=3591) You're you're doing a lot of prototyping instead. So like that that's also something where when we talked about how you were building cloud code early on you were showing actually you had a whole thread about the number I think you did like 15 or 20 prototypes for the the to-do list and all of them interactive working and what surprised me compared to my past tech experience and you said that well you did this in like a day and a half all all 20 tried it out got a feeling for it which incomprehensible for me it would have taken a week or two weeks and people would have not done 20 they would have done three. Yeah.
[`1:00:23`](https://youtu.be/julbw1JuAz0?t=3623) So like are are you seeing this? Is there an increase in in prototyping and and building and showing instead of you know writing things?
[`1:00:30`](https://youtu.be/julbw1JuAz0?t=3630) Yeah. Absolutely. I mean on our team the culture is we don't really write stuff. We just we show. It's a little hard to to reflect back on the time before cuz I I think now just prototyping everything is so baked into the way that we build. Just everything is prototype multiple times. Like uh you know we launched agent teams earlier this week. This is our implementation of swarms. It it's very exciting because uh it just lets Claude do more work for longer, more autonomously. You have a bunch of different uh uncorrelated context windows and you have this kind of communication between agents. They can just do more. This is something that uh Daisy and Suzanne and other folks on the team uh and and Karen, they they prototyped this for months and they tried all in all probably hundreds of versions of this before they got a user experience that felt really good. um it was just really really hard to get right. There's just no way we could have shipped this if if we started with, you know, like static mocks in Figma or if we started with a PRD or something like this. It's a thing that you have to build and you have to feel and you have to see how it feels. And to me, one of the big takeaways even from there was like we probably should prototype more and just be more daring or just release your priors of how long it took to build a prototype or who needed to build. Back then it was always an engineer that needed to build, but it's probably not true anymore. Yeah, that's right. I mean, we're in this world right now also where we just we don't know what the right answer is. You know, like I I think back in the old way of building you the cost of building was high and so you had to actually spend a lot of effort to aim very carefully before you take your shot because after you take your shot um it it's very hard to course correct. You can only take so few shots. But now it's changed. The cost of building is very low. Um but also we don't know where we're aiming. So we just have to like we have to try and we have to see what feels good. And it's just very very exploratory. And I think also a big part of it is humility where you know personally I'm wrong like half the time I'd say like most of my ideas are bad. At least half of them are bad. And I don't know which half until I try it.
[`1:02:28`](https://youtu.be/julbw1JuAz0?t=3748) And I get feedback from others as well sometimes.
[`1:02:30`](https://youtu.be/julbw1JuAz0?t=3750) That's right. It's like I I have to try it myself and then I have to see what others think cuz you know my intuition does not always match others. When you were showing these prototypes of just how the the tasks were built, you were telling me that you built the prototypes and then your process was always you first like looked at it, you tried it out, you got a feel for it and then for the ones that you felt were good, you showed it to others and sometimes they give you feedback like nah this doesn't work and then sometimes when it felt good then you shared it even broader. So I feel like you know like it's a mix right where like sometimes you can decide already and then sometimes you get feedback and then eventually some good ideas come out of it. Yeah, and there's a lot of examples of this like uh we we launched this kind of condensed view for file reads and file search just because the the model is just so agentic now like I felt like half the screen is these like file reads and I actually don't care like I you know I read a thing I don't really care what it is and so we condensed this down to make the output a little bit more readable. I really liked it after probably 30 prototypes or something like this. It took it took so much effort to make that feel really good and clean. We rolled it out to employees at Enthropic for about a month and we had everyone dog fooded and I fixed another probably dozen dozen bugs, dozen tweaks based on all this feedback. We launched it externally and you know almost all users liked it but there were a few users that didn't because they want more expanded output. Um and so on the GitHub issue I was just going back and forth with people to be like you know what like what don't you like and people gave a lot of feedback. I shipped another version. Then some people liked it, some people didn't. And so I iterated again and kind of made it good. And it it's actually I think almost there where people can configure it the way that they want, but still the default is really good. But this is just the process. You know, we we get it right some of the time. We have to learn from our users. We want to hear from people so we can get it right.
[`1:04:12`](https://youtu.be/julbw1JuAz0?t=3852) Do you use ticketing systems for your work where you know where where you capture like, all right, here's the work I I want to or do you just pretty much do the work as as it comes in?
[`1:04:21`](https://youtu.be/julbw1JuAz0?t=3861) So at Anthropic, we leave it up to teams on the quad code team. and we leave it up to every person. Uh different people use uh use this differently. For example, I don't use a ticketing system. Some people like to use a sauna or notes or something like this. One of the coolest things that I saw, this was maybe like 3 months ago or something. We launched plugins and the way we launched that is uh Daisy for a weekend, she had a very early version of swarms and she let the swarm run and she told that your job is to build plugins. You have to come up with a spec. Then you have to make a asauna board and split up into tasks. And then all the different agents have to build it. And uh she set up a container and she set up a quad in dangerous mode. And she let it run for the entire weekend. It spawned a couple hundred agents. They made 100 tasks on the sauna board. Uh and then they implemented it. And that's pretty much the version of plugins that we shipped. These kind of coordination systems that used to be for humans, but um I think nowadays it's just as much for models. Let's let's talk about cloud co-work. Uh it's one of the very impressing things about this. It looks great. So I tried it out. It's inside cloud. You have the co-work tab there and and you can I I feel it's a lot more visual way of of running agents interacting with them. One of the surprising thing I heard that it was built in 10 days. Can can you take us through like what it took to build it and what does actually mean? Was it from the idea or like from the decision of of building it? And how big was the team building it?
[`1:05:45`](https://youtu.be/julbw1JuAz0?t=3945) The team was really small. It was just a few people for a long time. We felt that there is some product to be built for non-engineers. The reason we felt this is for a long time people that were using cloud code are non-engineers. Um and so you know in the product world when you see latent demand you see people jumping through hoops to use a product that was not designed for them. That's a really good sign it's time to build another product that is built just for them. There's all these people on Twitter that there's this one guy that was using uh quadco to like monitor his tomato plants. I just I love this. It was like he had like a webcam set up and quad was like, "Oh my god, I'm so happy that our plant is budding." And because it was it had like a webcam and just like every day was like monitoring it and it it was so happy that the tomatoes were growing. There was someone that was using quad code to, you know, recover photos off of a corrupted hard drive and it was like his wedding photos.
[`1:06:36`](https://youtu.be/julbw1JuAz0?t=3996) Wow.
[`1:06:38`](https://youtu.be/julbw1JuAz0?t=3998) Um you know, like I said, our entire finance team at Anthropic uses quad code. Our sales team uses quad code. So there there's just all these people that are non-engineers that were using it. And at that point quad code it's available in a lot of form factors right like we started in a terminal then we expanded and we added support for ideides. So we have extensions for you know every VS code based ID every Jet Brains based IDE there's also iOS and Android apps there's the desktop app uh there's web. So uh then then there's like Slack and GitHub apps. So we kind of expanded to all these places to make cloud code easier for engineers. But ultimately none of these are built still for non-engineers. And so cloud code evolved a lot, but it still felt like there's a there's kind of a gap and there's a product that could make this even easier for people. And so for the last couple months, the team was kind of hacking around and just saying like what is the right product? And at some point someone came up with this idea of like what if we just take quad code, add some guardrails. So for example, co-works with a virtual machine. This is one of the many ways that we make sure it's really safe. Um, especially for nontechnical users that don't want to read like bash commands to figure out what it what it's doing. And they were hacking on this. I think it was something like 10 days end to end or something. It was just fully built with quad code. Uh, and then we shipped it.
[`1:07:55`](https://youtu.be/julbw1JuAz0?t=4075) And can you give us a sense of like the complexity behind an app like this? And if if we can walk through like what parts needed to be built because from the outside it's a little bit hard to tell like is this just a nice UI wrapper that's you know like I don't know like a few hundred lines of code. I'm just being obviously I'm I'm provocative here or behind the scenes it's actually really complex piece of software. And the reason I ask is like Uber is a great example where people look at the app it looks really simple. I've worked there and I know it's it's really really complex because you don't see a lot of the complexity. There's a a lot of regional things. There's a lot of backend things that are all hidden. So from just from looking at it, claude coowork, it's it's hard to tell how much of this is is additional business logic that needed to be carefully thought out versus it's actually just a nice little thin wrapper on top of the the model. In some places, I think there's less complexity than you would think. In some places, there's more complexity. So on the product side, it's quite simple um cuz it's just the quad desktop app. So you know, you download the Quad app. It's it's a single desktop app. It has a tab for co-work, it has a tab for code, it has a tab for chat. So it is just one app and we were able to inherit a lot of that product logic. There's some UI rendering code under the hood. You know it's just the same quad code running. It's the same quad agent SDK that powers quad code. A lot of the complexity actually is about safety because we know like I said we know the user is nontechnical and so we just want to make sure they have a good experience and so for example if someone launches the app and then you know like they delete a bunch of family photos that's really not good and so we wanted to make sure that we protect against this so you can't accidentally do that. And so that's where a lot of the guardrails came from. So there's a bunch of classifiers running on the back end. This is for safety and again extra mitigations for things like prompt injection and you know risks like this around security. On the front end there's an entire virtual machine that we ship. There's a bunch of operating system system level integrations to make sure people don't accidentally delete things. So just around safety there there's a lot there. And then we also had to rethink the permission system because we inherit the permission system from quad code. Um but also for co-work actually a big part of the value is not just running locally but it's using all of your tools the way that quad code uses it. But the thing is for nontechnical users your tools aren't really available as CLIs. Some of them are available over MCP. Many of them are available in a browser. And so co-work is really really good when you pair it with a Chrome extension. And this is the way that I usually use it. So, you know, for example, I use it every week to do uh project management for the team. We have like we have a spreadsheet that tracks kind of at a really high level what everyone's working on. And this is kind of my personal way of project managing. You know, other people, like I said, use ASA, other people use notes or whatever. For my own test, I don't use anything, but kind of for the team overall, I have the spreadsheet and I have co-work kind of check-in and I I just ask co-work every week, hey, can you look at the rows for any status that has not been filled out? Can you just ping the engineer on Slack? And so it'll open one tab in Chrome for the spreadsheet. It'll open another tab with Slack and then it'll just start messaging engineers in Slack and it just oneshots it. There's like one engineer's name for some reason it can't autocomplete. Um but every everything else it just gets. And so this is actually like from a safety point of view, we also thought pretty deeply about this Chrome extension and how this works and how the permissioning model should interact with this local permissioning model. So there's also a bunch of code to kind of make sure that that's that feels smooth. And what's the tech side behind this? I assume a lot of will be similar to the the cloud app, but is it is it electron, typescript, those kind of things or or something else?
[`1:11:23`](https://youtu.be/julbw1JuAz0?t=4283) Yeah. Yeah, just electron and typescript. Actually, some of the people working on it are early electron folks. So, uh Felix who's uh you know the creator of of co-worker on electron. He helped build it.
[`1:11:37`](https://youtu.be/julbw1JuAz0?t=4297) Oh, amazing. And co-work launched Mac OS only. uh what was the reason for both for choosing this platform first and for now only choosing this platform?
[`1:11:47`](https://youtu.be/julbw1JuAz0?t=4307) Yeah, so Windows coming soon. Um I think probably by the time this podcast comes out we will have Windows support. Uh we just wanted to start early and start learning you know like everything we do at Enthropic it's kind of like the way that I told my own story the one of the things I like about anthropic is it just really really matches the way that people here think about it. you know, back to this point where like we don't have high certainty about the things that we build and our intuition is often wrong and so we just have to like learn from users and figure out what people actually want and just spend a lot of time listening to people and understanding the feedback deeply. This is the way that we build product and so we always launch a little bit before it's ready. Um we did this for quad code when we launched quad code initially it didn't even support Windows also it didn't support you know like a lot of different stacks and then over the coming weeks we added support for every stack. Now quad code supports every single stack. Um you know like Windows whatever weird Linux dro use Mac OS we support everything and so for core work also we just wanted to launch early we wanted to start with Mac as that was just the starting point but um yeah it's it's going to support everything. One thing you mentioned is is getting feedback. I'm curious both for cloud code and for cloud co-work. How do you go about things like observability monitoring when you're rolling out? Do you use any feature flags? And I'm I'm more interested in like did you build custom tools for this or did you decide to use certain vendors because es especially for observability I'm sure that this is this is both important but it also sounds like pretty high scale in terms of the the number of users that we can derive or this will not be a small operation. Yeah there's there's some off-the-shelf vendors that we use there's some custom code that we use. So um it's actually it's a mix of both. There's nothing too surprising about it. There's one thing about Enthropic that's kind of interesting is because we're an enterprise company and we care a lot about privacy and security, we can't see people's data. Um, and so, you know, like if someone reports a bug, like I actually can't pull up your logs to kind of see what's going on. A lot of work goes into kind of figuring out how to log events and things like this in a privacy preserving way. Um, this is just very important to the way that we operate
[`1:13:50`](https://youtu.be/julbw1JuAz0?t=4430) for co-work. What kind of learnings have you had so far? It's it's it's been out for I think a few weeks now. Did you see something unexpected? uh are you shaping the product based on feedback that you're getting?
[`1:14:03`](https://youtu.be/julbw1JuAz0?t=4443) Yeah. Uh every day the team is landing so many fixes. The most surprising thing is just how much people are loving it. To be honest, when Quad Code first came out, it actually wasn't an overnight hit. This is something people think it was, but it was sort of a slow take off at the beginning. And I think the first big inflection was in May when we released Opus 4 and Sonnet 4. That's when it really clicked and that's when our growth became exponential. But at the beginning, it was sort of a research preview. people didn't really know how to use it. Some people got it immediately, but most people didn't. It took it took a little while. For co-work, it's a much steeper growth trajectory than quad code was at the beginning. So, it it's just been an instant hit. And that that's actually been very surprising. I I didn't really expect that. One of your new releases, which came out just very recently, it was I think yesterday or the day before when we're recording this podcast, was agent teams. And I as I understand the idea with what agent teams agents forms instead of single agent you can have a lead agent and it can delegate to its different teammates. How did you start experimenting with this and how did you decide to ship it? Now we're always doing experiments right there's uh there's there's all sorts of ways uh to get more mileage out of out of quad code. Um one way you can do it is by extending context. Another way is autoco compacting context. So it's essentially infinite context and that's what we have right now. Another way is using sub agents. So you have multiple agents kind of working together. Um there's just like a lot of different approaches to get a little bit more mileage out of the context window. There's this one idea called uncorrelated context windows. That's what we call it. And the the idea is you have multiple context windows. Um but they essentially start fresh. So they don't know about each other. And so an example of this is like a correlated context window is if you have one if you have the model and it does a task and then you have it just do a second task in that same context window. Um and in this case the the second task knows about the first one cuz it's in the same window. But for something like a sub aent it's uncorrelated because the main agent prompts the sub aent but the sub aents context window is fresh. Besides that prompt it doesn't know what's in the parent context window. And you can see this actually a little bit in uh for example like sub agents versus uh skills because when you run a skill uh you know or slash command it sees the parent context window versus for a sub agent it doesn't. So it's uncorrelated. There's some cases where you want that context. There's some cases when you don't. Um and there's this kind of interesting thing where uncorrelated context windows and just throwing more context at the problem and throwing more tokens at it when the windows are uncorrelated gives you better results. Um, it's actually a form of test time compute to do this. And for something like teams, we've been experimenting with this for a while. I think since maybe like October or September or something like this, and it really just felt like with Opus 4.6, it clicked where the model figured out really how to use this. And sometimes you see these kind of cute exchanges where the agents are talking to each other and they're like discussing something and it's just very cool to see. It's very like humanistic in a way. But there's other times where you just get very good results. And so we had a bunch of internal evaluations for example where we have quad build something very very complex, something more complex than what a single quad would build. And we saw the results just really really improved with Opus 4.6 with teams. And that's why we felt it's the right time to release it. We also wanted to be careful. Um, and the reason you have to opt into it, the reason it's a research preview is it uses a ton of tokens cuz it's just a bunch of quads that are running. Um, not everyone wants this all the time. So just excited to see how people use it and uh you know to to hear the feedback. It's it's something you want for fairly complex tasks. You don't probably want this for every task. The main quad decides the rules for the sub quads. We don't have a kind of a regimented way to do this. It's context specific. I wouldn't say there's one right way to do it. I think actually a lot of the magic of this comes out of this idea of uncorrelated context windows. It's less about the specific configuration of the agents. But you know it's something that people should experiment with. I don't think there's a one-sizefits-all.
[`1:18:03`](https://youtu.be/julbw1JuAz0?t=4683) Have you seen use cases even in even I I know it's it's still research, but have you seen use cases where it could look it looks promising this approach, the swarm approach?
[`1:18:10`](https://youtu.be/julbw1JuAz0?t=4690) Well, you know, like I said before, plugins were fully built with swarms. There there's a bunch of other feature since that were built in this way. So yeah, I I think for anything where you see a single cloud struggling, swarms can help. It's it's an interesting to look at. Talking about change in in general with Andrew Carpathy, you had a really interesting exchange back in December where when he posted that he's never felt as much behind as as a programmer as he is now because of the progress with AI. And then you shared the story about how you started to debug a memory leak the oldfashioned way and then Claude just one shot at it. I think it was a reflection of like how everyone is feeling that things are changing so fast and in the in the holiday break I started to feel that things have have really shifted. How did you I guess come to terms with this or or start to embrace this change? This is something I really struggle with. The model is improving so quickly that the ideas that worked with the old model might not work with a new model. the things that didn't work with the new model might work or with the old model might work with a new model. And it's weird because there's just not a lot a lot of other technologies like this. So I I just don't really have a lot of experience to draw on to figure out how I should approach this. And it's been this new skill that I've had to learn. In a way, it's like you just always have to bring this beginner mindset. Honestly, like I'm using the word humility a lot, but you always just have to bring this kind of intellectual humility because just all these ideas that were bad before are now good and and and the inverse. I I think that's honestly it it's something I I constantly have to remind myself about. And back in the It's funny back in the old world when someone tries an idea again and we've tried it in the past and it didn't work, usually the feedback is like, why are you doing this again?
[`1:20:00`](https://youtu.be/julbw1JuAz0?t=4800) Yeah. Yeah. You should learn. This used I mean we used to call a bit of a gatekeeping but it was somewhat valid where I know with architecture someone came and said like why don't we do microser and someone said we tried it and it didn't work and if you tried it a year or two or 3 years ago it was kind of valid right cuz not much has changed. Yeah, that's right. That's right. And something with Microsoft, it's it's funny because it's like every 10 years it goes in and out of in and out of style. But yeah, now now it's I think the first time ever where it's actually not crazy to just try the same idea every few months because the model improves and it just works. And I I actually see this with engineers on the team. Like new people that are newer to the team, people that are newer to engineering sometimes do things in a better way than than I do. Um and I just have to like look at them and I have to learn and I have to adjust my expectations. you know, like an an example of this is, you know, when when we release features, sometimes I'll like screenshot myself using them on, you know, on X or on threads or whatever just to kind of talk about it. Um, but recently, Tar, our um, you know, our devro guy, he actually codes a lot. Um, he's amazing and he just started automating this. So, he's having like quad code generate its own videos for for its launches and he just started doing this and, you know, this is something like I thought would be, you know, maybe it's possible. It's not something I would have tried because I wouldn't have thought the model was ready, but he just he just did it and it just kind of worked.
[`1:21:18`](https://youtu.be/julbw1JuAz0?t=4878) One thing that I've I felt like just a bit like odd about and I think a lot of developers can relate is I've come to terms with this starting from Opus 4.5 the and and also similar models like I think GPT 5.2 gave me similar vibes as well. the models have been just really good at writing code and I I realize that I don't think I will handr write the code when I'm get I when I want to get stuff done if if I actually want to you know get the pleasure of writing I can still do it but one thing I reflected on is it's just been so much effort to get good at coding I I remember when I when I was learning when I I started from like kind of hacking around to go into university to learning C and C++ and it it was just bloody hard and actually you know going through my my first few jobs where I started to become better at it. I became better at debugging and there's a point where like a lot of my identity was tied to being good at coding. That's how we used to get jobs or higher paying jobs. When I was an engineering manager when we designed the interview loop at Uber, we we had talk with managers of what we need to screen for and we we talk like well what do developers do most of their time? About 50% of the time they code. Therefore, we placed about 50% of the signal was all about coding. So there was a lot of things tied into coding because it it is just hard. I think we all know that it takes grit. It takes some level of intelligence to get good at it. And there's a sense of loss of like well I I think it's great on one end that the model can do it. But it feels that something really quickly got taken away that I don't think I personally thought it would happen this quickly. And I'm I think a lot of other people are feeling like some people move on a bit easier, but there's definitely this sense of of grief. How did you think about it? Because again, you're you're an example of you you wrote so much code at at Facebook also outside of it. I know it was just a tool of doing it, but not many people could do what what you did. And now the models can also work as good as you have or if not better.
[`1:23:16`](https://youtu.be/julbw1JuAz0?t=4996) That's the challenge. Yeah. I think it's it's something that used to be a thing that we do as software engineers. It's becoming a thing that everyone is able to do. There was a moment, you know, like when I started coding, it was a very practical thing and it was a way to get things done. And at some point I just fell in love with the art of coding and like languages and kind of the the the tools themselves. And at some point I I kind of fell down this rabbit hole. I wrote this like I wrote I wrote a book about, you know, a programming language.
[`1:23:44`](https://youtu.be/julbw1JuAz0?t=5024) Typescript. You wrote the first ever TypeScript uh book at with O'Reilly.
[`1:23:50`](https://youtu.be/julbw1JuAz0?t=5030) Yeah. Yeah. Yeah. That's right. Um it it was funny actually. There there was this like there was this amazing moment for me in my little town in Japan. I went to the bookstore and I I found that book translated in Japanese.
[`1:23:58`](https://youtu.be/julbw1JuAz0?t=5038) No.
[`1:24:00`](https://youtu.be/julbw1JuAz0?t=5040) In this tiny town and that was just like the coolest moment. And then I actually realized I I don't remember Typescript at all cuz I was only writing Python for a couple years at that point. Yeah. And like at some point I started the the first the the biggest TypeScript meetup in the world. That was in that was in SF. And I got to meet kind of a lot of my heroes. There was like Chris Cowell who wrote like general theory of reactivity. There was Ryan Doll the guy that made Node. one of the first times that I I went really deep into this this community and um just the language itself and the the tools themselves and for something like TypeScript there's this beauty in the type in the type system cuz Hilesburg is just like he he he's just brilliant like the idea of like conditional types and just like anything can be a literal type and there there's these very deep ideas that even the most hardcore functional languages do not have like even in something like Haskell like it doesn't go this far and H Anders just took it and he pushed it much further than than it had had been pushed and you know like Joe Pamer and a bunch of other folks kind of explored a lot of these ideas and thought of this and I think for them it was also very practical right because they had these large untyped JavaScript code bases how do you gradually migrated to something typed and you have to come up with these very beautiful ideas to to do this for me is Scala was another kind of rabbit hole that I fell into in kind of like this functional programming world And still when I write code and when the model writes code I always think in the types first that that's what matters is what what is the type signature that matters more than the code itself and getting that right. So there is this beauty to it. There's a there's an art to it for sure. But in the end it's a practical thing and in the end this is a thing that we use to to build things and you know it's a means it's a means to an end. It's not an it's not an end to itself. I I think one metaphor I have for kind of the this moment in time that we're in is the the printing press in, you know, like the the 1400s or whatever
[`1:25:57`](https://youtu.be/julbw1JuAz0?t=5157) because at that moment it it was actually quite similar, right? Like there was a group of scribes that you know knew how to write
[`1:26:03`](https://youtu.be/julbw1JuAz0?t=5163) and it it it was as I understand of course we never lived there but as as I imagine it was it was a art process to learn. You needed to learn you needed to get the equipment. You probably needed some sponsorship or being selected practicing because you needed to produce the same thing over and over again and few people could do that and I assume it was either high prestige or highly paid or who knows let's assume it was
[`1:26:24`](https://youtu.be/julbw1JuAz0?t=5184) but then the printed press came along.
[`1:26:27`](https://youtu.be/julbw1JuAz0?t=5187) Yeah. Yeah. And at least in Europe like you had to like a lord or a king or something had to had to employ you and then you had to go through you know years of training and there was this class of scribes that knew how to write. They were employed by someone like this. often the king themselves like or you know the queen was was not literate. So it was this very very niche skill and it was like less than 1% of the population was literate in Europe you know back then and then the printing press came out and what happened so the cost of printed material went down something like 100x over the next I think 30 years 50 years or something the quantity of printed materials went up like 10,000x in the next 50 100 years this was the first effect literacy it took a little while for it to catch up so I think global literacy it went up to something like 70%. But that took like another 200 years, 300 years because learning learning to read is just very hard. Learning to write is hard. It takes a lot of effort. It takes uh education system. It takes you know infrastructure to have paper and ink uh and the free time to do this instead of working on a farm. So it kind of it took early stage of of of industrialization to actually get there. But I but I think this effect of making it so this thing that was locked away in ivory tower and now it's accessible to everyone. This is just, you know, like none of the things around us would exist today without this. Like if if we weren't literate, if the people that built, you know, this microphone weren't weren't literate, it would have just been very hard to have a modern economy. None of these things would exist. And I I just kind of think about back then if people had to predict what would happen when the printing press came out, no one would have predicted that the microphone would become a thing. So, I I just feel like this is uh this is the best the best uh analog for for the moment that we're in right now.
[`1:28:13`](https://youtu.be/julbw1JuAz0?t=5293) Yeah, it's interesting that you say that some of the kings were illiterate who are employing the scribes because if we're being honest with ourselves, we have business owners who know what they want to build and there are employing software engineers because they themselves cannot write code. And I think we we like to mock the CEOs who are coming there coming to the team. They they might even have a drawn prototype or whiteboard and saying this should be easy but of course they don't understand how difficult it is. There seems to be a bit of analogy where where there's a person who wants what they want but until now they needed to hire a software a specialist who can build that and there's always that disconnect between the idea and the person and just like with the printing press like what would happen if they could actually express and like the king could actually read or write their own letters they wouldn't need that middleman and it things become more efficient. But I mean of course for the scribe it's not the best news necessarily but I mean smart scribes can also do so someone needs to like write the books run the press etc. Yeah, exactly. And and if you think about what happened to the scribes, right? Like they cease to become scribes, but now there's a category of writers and and authors like the these people now exist. And uh the reason they exist is because the market for literature just expanded a ton.
[`1:29:30`](https://youtu.be/julbw1JuAz0?t=5370) And I guess also if we think about like back then a scrib's work was read by a few people and with the printing press and author there's a lot more authors and some of them are not really read but some of them have wider reach than than they could imagine. There's new careers that that exist because of that.
[`1:29:44`](https://youtu.be/julbw1JuAz0?t=5384) Yeah,
[`1:29:45`](https://youtu.be/julbw1JuAz0?t=5385) I love the analogy.
[`1:29:47`](https://youtu.be/julbw1JuAz0?t=5387) And the most exciting thing for me is it's just so impossible to say today what will happen after this happens and after this transition happens just you know the the economy as we know it would not have existed without it. So what's next? like what what is the thing that we can't even predict today that will exist because anyone can do this?
[`1:30:13`](https://youtu.be/julbw1JuAz0?t=5413) Well, we cannot predict but I think we can look at what is working right now. If you look around in your environment, may that be the team across entropic who are software engineers or or builders or members of technical staff, however we call them, who to you are stand out. What are they doing? What skills have they built up? And and how have they changed the way they they work? It's hard to name individuals because honestly this is just the strongest the these are the strongest people I've ever worked with in my career. There's all sorts of different archetypes. There's some people that are really amazing prototypers. Um so take something from zero to.5. Just you know figure out like what are some cool ideas? What is the technology on walk? There's other people that are amazing at finding product market fit. So kind of 0.5 to one or maybe 0ero to one. There's other people that span different disciplines and I I'm just seeing more and more of these people like I said like people that span uh product engineering and infrastructure engineering or you know product and design or design and engineering. I I think I'm just seeing a lot more of these of these hybrids.
[`1:31:15`](https://youtu.be/julbw1JuAz0?t=5475) What's a belief that changed from last year to this year? Something that you know like you either believed or or a conviction that you had that you've either revised or completely threw away. I think one thing I wasn't sure about is how big a problem is safety to be totally honest. Um I jo I joined Anthropic because like I said I read a lot of sci-fi and I kind of I know how bad this thing can go if it goes bad. It wasn't something I was sure about. Um but seeing it from the inside and then seeing how the new risks that have arisen in the last year, it just makes me much much more worried about it. Um so I I think it's it was kind of an important thing for me. Now it's just the most important thing for me is how do we make sure this thing goes well.
[`1:31:59`](https://youtu.be/julbw1JuAz0?t=5519) I think it's safe to say you you were a really great software engineer even before all all the AI things started and you seem to be a very productive engineer of course part of a team as well but but also individually. What are some skills of like you know before being a software engineer that are are still as valuable or maybe even more valuable than before and what are ones that are maybe just not as much and and they're best left behind probably. Okay, so the stuff that's left behind is uh best left behind is maybe like very strong opinions about like code style and languages and things like this. Like I I can't wait to get past like these endless language debates and framework debates and all the stuff because the model can just like you know use whatever language and framework and if you don't like it it can just rewrite it for you. So it just doesn't matter anymore. I think something that still matters a lot today is things it's being methodical and hypothesis driven. This matters both in product design in this world where everything is being disrupted and we need to figure out what to build next and this is something everyone is thinking about. Um, but it also matters for engineering day-to-day, you know, like something like debugging. You just have to be very methodical about it. And the model can can do this and it can help a lot. Um, but I think still we're in this transition point where you still need to have the skill. I don't know if you you're you're still going to need to have it in 6 months. Other skills that I think are more valuable are being curious and being open to doing things beyond your swim lane. So, you know, if you're working on engineering, but you really understand the business side, you can just build really awesome products. And I and I think the next, you know, billion dollar product, you know, like after quad code, whatever the next startup is that, you know, becomes the next trillion dollar startup, it might just be like one person that has some cool idea and their brain just is able to think across, you know, engineering and product and business or, you know, like design and finance and something else. It's like it's people are going to become more and more multi-disipline and this will become more and more rewarded. So in in some ways I think this will be the year of the generalist. I think the other skill that's actually been been rewarded of it is uh having a short attention span.
[`1:34:11`](https://youtu.be/julbw1JuAz0?t=5651) I was being rewarded now. Oh yeah. It's uh you know like people you know like teenagers are using you know like like Tik Tok and and all this stuff and I think in some ways it's kind of dangerous for society um because like you want people that can think deeply and can contemplate ideas and uh aren't just moving on to the next idea very quick but in some ways I think this year is kind of the year that is going to reward uh it's like the year of ADHD because the work for me has become jumping between quads. has become managing clouds and so it's not so much about deep work it's about how good am I about context switching and you know jumping across multiple different contexts very quickly
[`1:34:52`](https://youtu.be/julbw1JuAz0?t=5692) could I add that from what I unders what all you said maybe we could add one thing which is adaptability because you're saying of course that ADHD and and you can jump across but of course earlier you are very good at focusing deeply on one thing as well and what strikes me about you and maybe this is true for other people as well you you're just kind of very open to adapt ting your working style and seeing what works well for this stage, especially when things are changing. I think the one certain thing we can be sure is whenever the next model comes out, it'll change again. And you need to be curious and open to adapting how you work, right?
[`1:35:25`](https://youtu.be/julbw1JuAz0?t=5725) Yeah. And as closing, what's a book or books that that you would recommend? I've gone down a rabbit hole. Um, so he's the threebody problem guy, but he actually has like a lot of other really great books. I really love his uh short stories. Um, he has a couple books of short stories. I'm a big fan. For people that are new to sci-fi and you want like a little bit like harder sci-fi, um I really love Accelerondo by St. This is a book I would totally recommend. It's like essentially the product roadmap for the next 50 years. Um it it it starts with takeoff kind of starting to happen and kind of AI singularity and then it ends up with like uh this kind of like group lobster consciousnesses orbiting Jupiter and it's just like amazing. And the thing that I think it really captures is just the pace this like quickening quickening quickening pace of how this feels. It really matches the feeling right now. And then on the technical side, I would strongly recommend functional programming in Scola. Even if language choice just doesn't matter as much anymore, I think there is this art to functional programming that just teaches you how to code better. Um, and it'll just teach you how to think in types. If you read this book, I think what's really important is to do the exercises also. And I've gone through and I've done all of them probably like three times over and it's just amazing. It it really just like knocks this idea of functional types into your head and it's just a thing you can't stop thinking about.
[`1:36:45`](https://youtu.be/julbw1JuAz0?t=5805) Boris, thank you so much. This was awesome.
[`1:36:48`](https://youtu.be/julbw1JuAz0?t=5808) Yeah, thanks Kirk. This was a really interesting conversation and the thing that I keep coming back to is to Boris's prickic press analogy. The idea that medieval scribes were this tiny elite who could write employed by kings who themselves were often illiterate and that we soft rangers might be in a similar position today. We are the scribes. We spent years mastering this craft. And now the printer press is arriving. But what Boris told me is that the scribes did not disappear. They became writers and authors and the entire market for written work expanded beyond anything anyone could have predicted. I do find this hopeful and also appreciate that Boris didn't sugarcoat it. The other thing that struck with me is just how differently the Cloud Code team built software. No PRDS, no mandatory ticketing system, designers and data scientists and finance people all writing code and building dozens or hundreds of prototypes before shipping a feature. And Boris is shipping 20 to 30 pore requests a day without editing a single line by hand. And there are different verification systems in place. Claw code reviewing its code, automated lint rules, best of end passes, and human code review. If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. A special thank you if you also leave a rating on the show. Thanks and
---
## Sources
- [Building Claude Code with Boris Cherny — The Pragmatic Engineer — YouTube](https://youtu.be/julbw1JuAz0)
- [The Pragmatic Engineer](https://newsletter.pragmaticengineer.com/)
File diff suppressed because one or more lines are too long
@@ -0,0 +1,340 @@
# Inside Claude Code With Its Creator Boris Cherny — Y Combinator
Transcript of the interview with Boris Cherny ([@bcherny](https://x.com/bcherny)), creator of Claude Code, on the Y Combinator Light Cone podcast, published February 17, 2026.
<table width="100%">
<tr>
<td><a href="../">← Back to Claude Code Best Practice</a></td>
<td align="right"><img src="../!/claude-jumping.svg" alt="Claude" width="60" /></td>
</tr>
</table>
---
## Video Details
- **Guest:** Boris Cherny (Creator of Claude Code)
- **Host:** Y Combinator (The Light Cone)
- **Published:** February 17, 2026
- **YouTube:** [Watch on YouTube](https://youtu.be/PQU9o_5rHC4)
---
## Transcript
[`0:01`](https://youtu.be/PQU9o_5rHC4?t=1) At Enthropic, the way that we thought about it is we don't build for the model of today. We build for the model six months from now. That's actually like still my advice to to founders that are building on LLM. Just try to think about like what is that frontier where the model is not very good at today cuz it's going to get good at it. All of Quad Code has just been written and rewritten and rewritten and rewritten over and over and over. There is no part of Quad Code that was around 6 months ago. You try a thing, you give it to users, you talk to users, you learn, and then eventually you might end up at a good idea. Sometimes you don't. Are you also in the back of your mind thinking that maybe like in 6 months you won't need to prompt that explicitly? Like the model will just be good enough to figure out on its own?
[`0:36`](https://youtu.be/PQU9o_5rHC4?t=36) Maybe in a month,
[`0:38`](https://youtu.be/PQU9o_5rHC4?t=38) no more need for plan mode in a month. Welcome to another episode of the light cone and today we have an extremely special guest, Boris Churnney, the creator engineer of Claude Code. Boris, thanks for joining us.
[`0:59`](https://youtu.be/PQU9o_5rHC4?t=59) Thanks for having me.
[`1:00`](https://youtu.be/PQU9o_5rHC4?t=60) Thanks for creating a thing that has taken away my sleep for about 3 weeks straight.
[`1:07`](https://youtu.be/PQU9o_5rHC4?t=67) I am very addicted to Cloud Code and uh it feels like rocket boosters. Has it felt like this for people like for you know months at this point. I think it was like end of November is where uh a lot of my friends said like something changed.
[`1:21`](https://youtu.be/PQU9o_5rHC4?t=81) I remember for me I felt this way when I first created Quad Code and I didn't yet know if I was on to something. I kind of felt like I was on to something and then that's when I wasn't sleeping.
[`1:28`](https://youtu.be/PQU9o_5rHC4?t=88) Yeah.
[`1:29`](https://youtu.be/PQU9o_5rHC4?t=89) And that was just like three straight months.
[`1:33`](https://youtu.be/PQU9o_5rHC4?t=93) This was uh September 2024. Yeah. It was like three straight months. I I didn't take a single day vacation. Worked through the weekends. Worked every single night. I was just like, "Oh my god, this is I think this is going to be a thing. I don't know if it's useful yet because it it couldn't actually code yet."
[`1:48`](https://youtu.be/PQU9o_5rHC4?t=108) If you look back on uh those moments to now, like what would be like the most surprising thing about this moment right now?
[`1:54`](https://youtu.be/PQU9o_5rHC4?t=114) It's unbelievable that we're still using a terminal. That was supposed to be the starting point. I didn't think that would be the ending point. And then the second one is that it's even useful cuz uh you know at the beginning it didn't really write code. Even in February when we G it wrote maybe like 10% of my code or something like that. I didn't really use it to write code. it wasn't very good at it. I still wrote most of my code by hand. Uh so the fact that it it actually like our bets paid off and it got good at the thing that we thought it was going to get good at because it wasn't obvious. At Enthropic, the way that we thought about it is we don't build for the model of today. We build for the model 6 months from now. And that's actually like still my advice to to founders that are building on LLM is, you know, just try to think about like what is that frontier where the model is not very good at today. um because it's going to get good at it and you just have to wait.
[`2:39`](https://youtu.be/PQU9o_5rHC4?t=159) Going back though, but when do you remember when you first got the idea? Can you just talk us through that? Like was it some like a spark or what was even the first version of it in your mind?
[`2:47`](https://youtu.be/PQU9o_5rHC4?t=167) You know, it's funny. It was like it was so accidental that it just kind of evolved into this. Um you know as as anthropic I think for Ant the bet has been coding for a long time and the bet has been the path to save to safe AGI is through coding
[`3:03`](https://youtu.be/PQU9o_5rHC4?t=183) and this is this has kind of always been the idea and the way you get there is you you teach the model how to code then you teach it how to use tools then you teach it how to use computers um and you can kind of see that because the the first team that I joined at Enthropic it was called the anthropic labs team uh and it produced three products it was quadcode MCP and in the desktop app. So you can kind of see how these like weave together. The particular product that we built, you know, like no one no one asked me to build a CLI. Um we kind of knew maybe it was time to build some kind of coding product cuz it seemed like the model was ready, but no one had yet really built the product that harnessed this capability. So like still there's this insane feeling of product overhang. But at the time it was just like even crazier cuz like no one had built this yet. And so I I started like hacking around uh and I was like, "Okay, we build a coding product. What do I have to do first? I have to understand how to use the API because I hadn't used anthropic API at that point." Um and so I I just built like a little terminal app to use the API. That's all that I did. And it was a little chat app because you know like you think about the you know AI applications of the time and you know for non-coders today most what what are most people using is just a chat app. So that's what I built. Uh and you know it was in a terminal. I can ask questions. I give answers. Then I think tool use came out. I just wanted to try out tool use because I I don't really understand what this is. I was like to use this is cool. Is this actually useful? Probably not. Let me just try it.
[`4:25`](https://youtu.be/PQU9o_5rHC4?t=265) You built it in terminal just because it was the easiest way to get something up and running.
[`4:29`](https://youtu.be/PQU9o_5rHC4?t=269) Yes. Cuz I didn't have to build a UI.
[`4:30`](https://youtu.be/PQU9o_5rHC4?t=270) Okay.
[`4:31`](https://youtu.be/PQU9o_5rHC4?t=271) It was just me
[`4:32`](https://youtu.be/PQU9o_5rHC4?t=272) at that point. It was like the IDEs, Cursor, Windsurf taking off. Were you sort of under any pressure or getting lots of suggestions of, hey, like we should build this out as a plugin or as a as a fully featured ID itself? There was no pressure because we didn't even know what we wanted to build. Like the the team was just in explore mode, you know, like we we didn't we know vaguely we wanted to do something in coding, but it wasn't obvious what no one was high confidence enough. That was like my job to figure out. And so I g I gave the model uh the batch tool. That was the first tool that that I gave it just cuz I think that was literally the example in our docs. I just like took the example. It was in Python. I just ported it to TypeScript because that that's how I wrote it. You know, I didn't know like what the model could do with bash. So I asked it to like read a file. It could like cat the file. So like that was cool. And then I was like, "Okay, like what can you actually do?" And and I asked her, "What music am I listening to?" He wrote some like Apple script to script my my Mac and look up the music in my music player.
[`5:24`](https://youtu.be/PQU9o_5rHC4?t=324) Oh my god.
[`5:26`](https://youtu.be/PQU9o_5rHC4?t=326) And this was Sauna 3.5.
[`5:28`](https://youtu.be/PQU9o_5rHC4?t=328) And you know, like I I didn't think the model could do that. And that was my first I think ever fuel the AGI moment
[`5:34`](https://youtu.be/PQU9o_5rHC4?t=334) where I was just like, "Oh my god, the model it just wants to use tools. That's all it wants."
[`5:39`](https://youtu.be/PQU9o_5rHC4?t=339) That's kind of fascinating. I mean it's very kind of contrarian that clocker works so well in such an elegant simple form factor. I mean terminals have been around for a really long time and that seemed to be like a good design constraint that allowed a lot of interesting developer experiences like it doesn't feel like working. It just feels fun as a developer. I don't think about files where everything is and that came by accident almost.
[`6:09`](https://youtu.be/PQU9o_5rHC4?t=369) Yeah, it was an accident. I remember so after the terminal started to take off internally. Um and honestly like after building this thing I think like 2 days after the first prototype I started giving it to my team just for dogfooting cuz you know like you know if you come up with an idea and it seems useful the first thing you want to do is you want to give it to people to see how they use it. And then I came in the next day and then Robert who sits across from me who's another engineer he he just like had quad code on his computer and he was like using it to code. I was like I was like what what are you what are you doing? Like this thing isn't ready. It's just a prototype. But yeah, it it was already useful in that form factor. And I remember when we did our launch review to kind of launch quad code externally, this was in December, November, something like that in 2024. Um Dario asked and he was like, "The us chart internally like the the Dow chart is like vertical. Are you like forcing engineers to use it? Like why are you mandating them?"
[`7:00`](https://youtu.be/PQU9o_5rHC4?t=420) And I was just like, "No, no, we didn't. We I just like posted about it and they they' just been like telling each other about it." Honestly, it was it was just accidental. We we started with the CLI because it was the cheapest thing and it just kind of stayed there for a bit.
[`7:13`](https://youtu.be/PQU9o_5rHC4?t=433) So in that 2024 period, what how were the engineers using it? Were they sort of shipping code with it yet or were they using it in a different way?
[`7:19`](https://youtu.be/PQU9o_5rHC4?t=439) The model is not very good at coding yet. I I was using it personally for automating git. Um I think at this point I I probably forgotten most of my git because cloud code has just been doing it for so long. But yeah, like automating uh bash commands that that was a very early use case and like operating like Kubernetes and kind of things like this. People were using it for coding. So there were some early signs of this. I think the first use case was actually writing unit tests because it's a little bit lower risk and the model was still pretty bad at it
[`7:46`](https://youtu.be/PQU9o_5rHC4?t=466) but people were were were kind of figuring it out and and they were figuring out how to use this thing.
[`7:51`](https://youtu.be/PQU9o_5rHC4?t=471) Um and one thing that we saw is people started writing these markdown files for themselves and then having the model read that markdown file. And this is where QuadMD came from. Probably the single for me biggest principle in product is latent demand. Um and the just every bit of this product is built through latent demand after their initial CLI. Uh and so quadmd is an example of that. There's this other general principle that I think is maybe interesting where you can build for the model and then you can build scaffolding around the model in order to improve performance a little bit and depending on the domain you can improve performance maybe 10 20% something like that and then essentially the gain is wiped out with the next model. So either you can build build the scaffolding and then you know get some performance gain and then rebuild it again or you just wait for the next model and then you kind of get it for free. the quantumd and kind of the scaffolding is an example of that and really I think that's why we stayed in the CLI is because we felt there is no UI we could build that would still be relevant in 6 months because the model was improving so quickly
[`8:50`](https://youtu.be/PQU9o_5rHC4?t=530) earlier we were saying like we should compare cloud MDs but you said something very profound which is you know yours is actually very short which is almost like the opposite of what you know people might expect why is that what's in your cloud MD
[`9:03`](https://youtu.be/PQU9o_5rHC4?t=543) okay so I I checked this before we came so my my cloud has two Um, one is, uh, there it's just two lines. So, the first line is whenever you put up a PR, enable automerge. Um, so as soon as someone accepts it, it's merged. That's just so I can like code and I don't have to kind of go back and forth with CR or whatever. And then the second one is whenever I put up a PR, post it in our internal team stamps channel. Uh, just so someone can stamp it and I can get unblocked. Uh, and the idea is every other instruction is in our quadmd that's checked into the codebase and it's something our entire team contributes to multiple times a week. And very often I'll see someone's PR and they make some like mistake that's totally preventable and I'll just literally tag Claude on the PR. I'll just do like add quad, you know, like add this to the quad MD and I'll do this, you know, like many times a week.
[`9:52`](https://youtu.be/PQU9o_5rHC4?t=592) Do you have to like compact the Claude MD? Like I definitely reached a point where I got the message at the top saying your cloud MD is like thousands of tokens now. What do you do when you guys hit that?
[`10:03`](https://youtu.be/PQU9o_5rHC4?t=603) So our quadm is actually pretty short. I think it's like couple thousand tokens maybe something like that. Um if if you hit this my recommendation would be delete your quadmd and just start fresh.
[`10:11`](https://youtu.be/PQU9o_5rHC4?t=611) Interesting.
[`10:12`](https://youtu.be/PQU9o_5rHC4?t=612) I think a lot of people like they try to overengineer this right and and really like the capability changes with every model. And so the thing that you want is do the minimal possible thing in order to get the model on track. And so if you delete your quadd and then you know the model is getting off track, it does the wrong thing. That's when you kind of add back a little bit at a time. And what you're probably going to find is with every model, you have to add less and less. For me, I consider myself a pretty average engineer to be honest. Like I don't use a lot of fancy tools. Like I I don't use like Vim. I use, you know, VS Code because it's simpler. Um I don't really
[`10:44`](https://youtu.be/PQU9o_5rHC4?t=644) Wait, really? I would have assumed that because you built this in the terminal that you were sort of like a dieh hard ter terminal like Vim Vim only person you know screw those VS code people you know
[`10:53`](https://youtu.be/PQU9o_5rHC4?t=653) well we have people like that on the team there's you know like Adam Wolf for example he's he's on the team he's like you will never take Vim for my cold dead hands like yeah so there's definitely a lot of people like that on the team and this is one of the things that I learned early on is every engineer likes to hold their dev tools differently they like to use different tools there's just no one tool that works for everyone but I think also this is one of the things that makes it possible for quad code to be so good because I kind of think about it as what is the product that I would use that makes sense to me and so to use quad code you don't have to understand Vim you don't have to understand TMX you don't have to know how to like SSH you don't have to know all the stuff you just have to open up the tool and it'll guide you it'll it'll do all this stuff
[`11:30`](https://youtu.be/PQU9o_5rHC4?t=690) how do you decide how verbose you want like sort of the terminal to be like sometimes you have to go you know control O and check it out and is it like internal bike shed battles around like longer shorter I mean every user probably has a for an opinion like how do you make those sorts of decisions?
[`11:47`](https://youtu.be/PQU9o_5rHC4?t=707) What What's your opinion? Is it is it too verbose right now?
[`11:50`](https://youtu.be/PQU9o_5rHC4?t=710) Oh, I love the verbosity cuz basically sometimes it just like goes off the deep end and I'm watching and then I can just read very quickly and it's like, "Oh, no, no, it's not that." And then I escape and then just stop it and then it just like stops an entire bug farm like as it's happening. I mean, that's usually when I didn't do plan mode properly.
[`12:07`](https://youtu.be/PQU9o_5rHC4?t=727) This is something that we probably change pretty often. Um, I remember early on, this is maybe six months ago, I tried to get rid of bash output just internally just to like summarize it because I was like these giant long bash commands, I don't actually care. And then I gave it to anthropic employees for a day and everyone just revolted. I want to see my dash because it it actually is quite useful for, you know, like for something like git output, maybe it's not useful, but if you're running, you know, like Kubernetes jobs or something like this, you actually do want to see it. We recently hit the hid the file reads and uh file searches. So you'll notice instead of saying, you know, like read food.md said, you know, like read one file, search searched one pattern. And this is something I think we could not have shipped six months ago because the model just was not ready. It would have, you know, it still read the wrong thing pretty often. As a user, you still had to be there and kind of catch it and debug it. But nowadays, I just noticed it's on the right track almost every time. And because it's using tools so much, it's actually a lot better just to summarize it. Um, but then we shipped it. Uh, we dog fooded it for like a month and then people on GitHub didn't like it. Uh so there was a big issue where people like no like I want to see the details and that was really great feedback. Um and so we added a new verbose mode and so that's just like in slash config you can enable verbose mode and if you want to see all the file outputs you can continue to do that and then I posted on the issue and people still still didn't like it which is again awesome because like my favorite thing in the world is just hearing people's feedback and hearing how they actually want to use it. Um and so we just like iterated more and more and more to get that really good and to make it the thing that people want. I'm amazed like how much I enjoy uh fixing bugs now. And then all you have to do is uh have really good logging and then even just say like hey check out that you know this particular object it messed up in this way and it like searches the log. It figures everything out. It can like go into your you can make a production tunnel and it'll look at your production DB for you. It's like this is insane. Bug fixing is just going to sentry copy markdown. You know pretty soon it's just going to be straight MCP. It's like an autobug fixing like and test making sort of uh what's the new uh term they call it like a making a startup factory. Oh yeah.
[`14:10`](https://youtu.be/PQU9o_5rHC4?t=850) Right. There's like all these concepts now of rather than having to review the code, you know, I'm I'm old school, so I like the verbosity. I like to say, "Oh, well, you're doing this, but I want you to do that." Right? But there's a totally different school of thought now that says like anytime an a real human being has to look at code uh that's bad.
[`14:30`](https://youtu.be/PQU9o_5rHC4?t=870) Yeah. Yeah. Yeah.
[`14:31`](https://youtu.be/PQU9o_5rHC4?t=871) Which is fascinating.
[`14:32`](https://youtu.be/PQU9o_5rHC4?t=872) I think like Dan Chipper talks about this a lot as kind of when whenever you see the model make a mistake try to put in the quadmd try to put it in like skills or something like that so it's reusable. But I I think there's this meta point that I actually struggle with a lot. And I people talk about like agents can do this, agents can do that, but actually what agents can do, it changes with every single model. And so sometimes there's a new person that joins the team and they actually use quad code more than I would have used it.
[`14:58`](https://youtu.be/PQU9o_5rHC4?t=898) And I'm just constantly surprised by this. Like for example, there was a we had like a memory leak and we were trying to debug it. Um and by the way, like Jared Sumar has just been on this crusade killing all the memory leaks and it's just been amazing. But before Jared was on the team, I had to do this and there was this memory leak. I I was trying to debug it. And so I I took a heap dump. I opened it in DevTools. I was looking through the profile. Then I was looking through the code and I I was just trying to figure this out. And then another engineer on the team, Chris, he just like asked Quad Code. He was like, "Hey, I think there's a memory leak. Can you like run this?" And then like try to figure it out. And Quad Code like took the heap dump. It wrote a little tool for itself to like analyze the heap dump. And then it found the leak faster than I did. And this is just something I have to constantly relearn because my brain is still stuck somewhere six months ago at times.
[`15:45`](https://youtu.be/PQU9o_5rHC4?t=945) So what would be some advice for technical founders to really become maximalists at the latest model release? It sounds like people off of fresh off of school or that don't have any assumptions might be better suited than maybe sometimes engineers who have been working at it for a long time. And how do the experts get better? I think for yourself it's kind of beginner mindset and uh I don't know maybe just like humility like I feel like engineers as a discipline we've learned to have very strong opinions and senior engineers are kind of rewarded for this in my old job at a big company when I hired like architects and this kind of a type of engineer you look for people that have a lot of experience and really strong opinions but it actually turns out a lot of this stuff just isn't relevant anymore and a lot of these opinions should change because the model is getting better um so I think actually the biggest skill is people that can think scientifically and can just think from first principles.
[`16:40`](https://youtu.be/PQU9o_5rHC4?t=1000) How do you screen for that when you try to hire someone now for for your team?
[`16:43`](https://youtu.be/PQU9o_5rHC4?t=1003) I sometimes ask about what's an example of when you're wrong. It's a really good one. You know, some of these like classic behavioral questions like not even coding questions I think are quite useful because you can see if people can recognize their mistake in hindsight, if they can claim credit for the mistake and if they learn something from it. And I think a lot of these like very senior people especially there there are some founder types like this but I think founders in particular are actually quite good at it. Um but other people sometimes will never really take uh they'll never take the blame for a mistake. But I don't know like for me personally I'm wrong probably half the time. Like half my ideas are bad and you just have to try stuff and you know you try a thing you give it to users you talk to users you learn and then eventually you might end up at a good idea. Sometimes you don't. And this is the skill that I think in in the past was very important for founders, but now I think it's very important for every engineer.
[`17:34`](https://youtu.be/PQU9o_5rHC4?t=1054) Do you think um you would ever hire someone based on the uh claude code transcript of uh them working with the agent cuz we're actively doing that right now. We just added uh just as a test like you can upload a transcript of you coding a feature with cloud code or codeex or whatever it is. Personally, I think that like it's going to work. I mean, you can figure out uh how someone thinks, like whether they're looking at the logs or not, like can they correct the agent if it goes off off the rails? Like, does do they use plan mode? You know, when they use plan mode, do they make sure that there are tests or you know, all of these different things that,
[`18:11`](https://youtu.be/PQU9o_5rHC4?t=1091) you know, do they think about systems? Do they even understand systems? Like, there's just so much that's sort of embedded in that that I imagine. I just want like a spider uh a spiderweb graph, you know, like in those video games like NBA 2K. It's like, oh, this person's really good at shooting or defense. It's like you could imagine a spiderweb graph of like, you know, someone's claude code skill level.
[`18:31`](https://youtu.be/PQU9o_5rHC4?t=1111) Yeah. What would what would the skills be? What would be those?
[`18:34`](https://youtu.be/PQU9o_5rHC4?t=1114) I mean, I think it's like systems testing must be like user behavior. I mean, there's got to be a design part like product sense maybe also just like automating stuff. Mhm. My favorite thing in CloudMD uh for me is I have a thing that says for every plan decide whether it's overengineered, underengineered, or perfectly engineered and why.
[`18:54`](https://youtu.be/PQU9o_5rHC4?t=1134) I think this is something that we're trying to figure out, too, cuz I I think uh when I look at engineers on the team that I think are the most effective, there's essentially two, it's very biodal. Um there's one side where it's extreme specialists. Um and so like I named Jared before, like he's a really good example of this and kind of the bun team is a really good example. Just hyper specialist. They understand dev tools better than anyone else. They understand JavaScript runtime systems better than anyone else. And then there's the flip side of kind of hyper generalists and that's kind of the rest of the team. And a lot of people they span like product and info or product and design um or you know like product and user research, product and business. I really like to see people that just do weird stuff. I think that's one of these things that was kind of a warning sign in the past because it's like can these people actually build something useful?
[`19:39`](https://youtu.be/PQU9o_5rHC4?t=1179) Um that's the limits test. Yeah, that's what must but but nowadays like for example an engineer on the team Daisy, she was on a different team and then she transferred onto our team and the reason that I wanted her to transfer is she put up a PR for Claude Code like a couple weeks after she joined or something and the PR was to add a new feature to Claude Code and then instead of just adding the feature what she did is first she put up a PR to give Claude code a tool so that it can test an arbitrary tool and verify that that works. And then she put up that PR and then she had Quad write its own tool instead of herself implementing it. And I think it's this kind of out of the box thinking that is is just so interesting because not a lot of people get it yet. You know, like we use the Quad agents SDK to automate pretty much every part of development. It automates code review, security review. Uh it labels all of our issues. It shephards things to production. It does pretty much everything for us. But I think externally I'm seeing a lot of people start to figure this out, but it's actually taken a while to figure out how do you use LMS in this way? How do you use this new kind of automation? So it's kind of a new skill.
[`20:42`](https://youtu.be/PQU9o_5rHC4?t=1242) I guess one of the uh funnier things that I've been having office hours with various founders about is um you have like sort of the visionary founder who has like the idea they've like built this like crystal palace of the product that they want to build. they've totally loaded in their brain, you know, who the user is and what they feel and what they're motivated by and then they're sitting in claude code and they can do like, you know, 50x work and then but they have engineers who work for them who like don't have the, you know, crystal memory palace of like the platonic ideal of the product that the pro founder has and they can only do like 5x work. Are you hearing stories like that? there's usually a person who's like the core like designer of a thing and they're just like, you know, trying to blast it out of their brain. What's the nature of like teams like that? You know, it seems like that's almost a stable configuration. Like you're going to have the visionary who like now is unleashed, but you know, maybe going back to the top of it, like I'm experiencing this right now. I was like, "Oh, well, I'm only a solo person and you know, I need to eat and sleep and I have, you know, a whole job. It's like, how am I going to do this?" You know,
[`21:52`](https://youtu.be/PQU9o_5rHC4?t=1312) you know, like we just launched quad teams and, you know, this is a way to do it, but you can also just build your own way to do it. It's pretty easy.
[`21:59`](https://youtu.be/PQU9o_5rHC4?t=1319) What's the vision for cloud teams?
[`22:01`](https://youtu.be/PQU9o_5rHC4?t=1321) Just collaboration. It's like there's this whole new field of like agent top apologies that people are exploring. Like what are the ways that you can configure agents? There's this one sub idea which is uncorrelated context windows. And the idea is just multiple agents, they have fresh context windows that aren't essentially polluted with each other's context or their own previous context. And if you throw more context at a problem, that's like a form of test time compute. Um, and so you just get more capability that way. And then if you have the right topology on top of it, so the agents can communicate in the right way, they're laid out in the right way, then they can just build bigger stuff. And so Teams is kind of like one idea. There's a few more that are coming pretty soon. Um, and the idea is just maybe it can build a little bit more. I think the first kind of big example where it worked is our plugins feature was entirely built by a swarm over over a weekend. It just ran for like a few days. There wasn't really human intervention. And plugins is pretty much in the form that it was when when it came out.
[`22:54`](https://youtu.be/PQU9o_5rHC4?t=1374) How did you set that up? Like did you spec out sort of the outcome that you were hoping for and then let it sort of figure out the details and then like let it run?
[`23:04`](https://youtu.be/PQU9o_5rHC4?t=1384) Yeah. an engineer on the team just gave uh gave Quad a spec and um told Quad to use a Asauna board and then Quad just put up a bunch of tickets on a sauna and then spawned a bunch of agents and the agent started picking up tasks. The main quad just gave it instructions and they all just figured it out
[`23:21`](https://youtu.be/PQU9o_5rHC4?t=1401) like independent um agents that didn't have the context of the bigger spec. Right.
[`23:25`](https://youtu.be/PQU9o_5rHC4?t=1405) Right. If you if you think about the way that uh you know like how our agents actually started nowadays and you know I haven't pulled the data on this but I would bet the majority of agents are actually prompted by quad today in the form of uh sub agents cuz like a sub agent is just like a recursive quad code that's all it is in the code and it's just prompted by we call her mama quad
[`23:45`](https://youtu.be/PQU9o_5rHC4?t=1425) and that that's all it is and I think probably if you look at most agents they're launched in this way
[`23:49`](https://youtu.be/PQU9o_5rHC4?t=1429) my claude insights just told me to do this more for debugging so that I get like I spend a lot of time on debugging And it would just be better to have like multiple sub agents spin up and like debug something in parallel. And so then I just like added that to my claude MD to just be like, hey, like next time you try and fix a bug like have one agent that like looks in the log, like one that looks in the code path. That just seems sort of inevitable.
[`24:11`](https://youtu.be/PQU9o_5rHC4?t=1451) For weird scary bugs, I try to uh fix bugs in plan mode and then it seems to use the agents to sort of search everything. Whereas like when you're just trying to do it in line, it's like, okay, I'm going to do like this one task instead of search wide. This is something I do all the time too. I I just say if the if the test seems kind of hard, this kind of research test, I'll calibrate the number of sub aents I ask it to use based on the difficulty of the task.
[`24:33`](https://youtu.be/PQU9o_5rHC4?t=1473) So if it's like really hard, I'll say like use three or maybe five or even 10 sub aents, research in parallel and then see what they come up with.
[`24:40`](https://youtu.be/PQU9o_5rHC4?t=1480) I'm curious. So then why don't you put that in your clawed MD file?
[`24:44`](https://youtu.be/PQU9o_5rHC4?t=1484) It's kind of case by case, you know, like quadm like what is it? It's just a it's a shortcut. Like if you find yourself repeating the same thing over and over, you put in the quad MD. But otherwise, you don't have to put everything there. You can just prompt quad.
[`24:56`](https://youtu.be/PQU9o_5rHC4?t=1496) Are you also in the back of your mind thinking that maybe like in six months, you won't need to prompt that explicitly? Like the model will just be good enough to figure out on its own.
[`25:05`](https://youtu.be/PQU9o_5rHC4?t=1505) Maybe in a month.
[`25:07`](https://youtu.be/PQU9o_5rHC4?t=1507) No more need for plan mode in a month.
[`25:07`](https://youtu.be/PQU9o_5rHC4?t=1507) Oh my god.
[`25:09`](https://youtu.be/PQU9o_5rHC4?t=1509) I think plan mode probably has a limited lifespan.
[`25:11`](https://youtu.be/PQU9o_5rHC4?t=1511) Interesting.
[`25:12`](https://youtu.be/PQU9o_5rHC4?t=1512) That's some alpha for everyone here. What would the world look like without plan mode? Do you just describe it at the prompt level and it would just do it? One shot it? Yeah, we've uh we've started experimenting with this because quad code can now enter plan mode by itself. I don't know if you've you guys have seen that.
[`25:26`](https://youtu.be/PQU9o_5rHC4?t=1526) Yeah.
[`25:28`](https://youtu.be/PQU9o_5rHC4?t=1528) So, we're trying to kind of get this experience really good. So, it would enter plan mode the same point where a human would have wanted to enter it. So, I think it's like I think it's something like this, but actually plan mode there's no there's no big secret to it. All it does is it adds one sentence to the prompt that's like please don't code.
[`25:44`](https://youtu.be/PQU9o_5rHC4?t=1544) That's all it is. You can you can actually just say that.
[`25:47`](https://youtu.be/PQU9o_5rHC4?t=1547) Yeah. So it sounds like a lot of the feature development for clock code is very much a what we talk about a YC talk to your users
[`25:54`](https://youtu.be/PQU9o_5rHC4?t=1554) and then you come and implemented it. It wasn't the other way that you had this master plan and then implemented all the features.
[`25:59`](https://youtu.be/PQU9o_5rHC4?t=1559) Yeah. Yeah. I mean that that's all it was like plan mode was we saw users that that were like hey quad come up with an idea plan this out but don't write any code yet. And there was kind of various versions of this. Sometimes it was just talking through an idea. Sometimes it was these very sophisticated specs that that they were asking Claude to write, but the common dimension was do a thing without coding yet. And so literally like this was like Sunday night at 10 p.m. I was I was just like looking at GitHub issues and kind of seeing what people were talking about and looking at our internal Slack feedback channel and I just wrote this thing in like 30 minutes and then uh shipped it that night. It went out Monday morning. That was plan mode. So do you mean that there will be no need for plan mode to in the sense of I'm worried that the model's going to do like it's going to do like the wrong thing or head off in the wrong direction but there will still be a need for that. You need to think through the idea and figure out exactly what it is that you want and you have to do that somewhere.
[`26:49`](https://youtu.be/PQU9o_5rHC4?t=1609) I kind of think about it in terms of like kind of increasing model capabilities. So maybe 6 months ago a plan was insufficient. So you get Claude to make a plan. Let's say even with plan mode you still have to kind of sit there and babysit cuz it can go off track. Nowadays what I do is probably 80% of my sessions I say I say plan mode has a limited lifespan but I I'm a heavy plan mode user. Um I probably 80% of my sessions I start in plan mode and claude will you know it'll start it'll start making a plan. I'll move on to my second terminal tab and then I'll have it make another plan and then when I run out of tabs I open the desktop app and then I go to the code tab and then I just start a bunch of tabs there and they all start in plan mode probably know like 80% of the time. Once the plan is good, and sometimes it takes a little back and forth, they just get clawed to execute. And nowadays, what I find with Opus 4.5, I think it started with 4.6 it got really good. Once the plan is good, it just stays on track and it'll just do the thing exactly right almost every time. And so, you know, before you had to babysit after the plan and before the plan, now it's just before the plan. So, maybe the next thing is you just won't have to babysit. You can just kind of give a prompt and Quad will figure it out.
[`27:53`](https://youtu.be/PQU9o_5rHC4?t=1673) The next step is Claude just speaks to your users directly. Yeah, it just bypasses you entirely.
[`27:59`](https://youtu.be/PQU9o_5rHC4?t=1679) It's funny. This is actually the current stuff for us. Our quads actually like they talk to each other. They talk to our users on Slack, at least internally pretty often. Um, my quad will like tweet once in a while.
[`28:08`](https://youtu.be/PQU9o_5rHC4?t=1688) No way.
[`28:11`](https://youtu.be/PQU9o_5rHC4?t=1691) Um, but I actually like delete it. It's just like it's a little like cheesy. Like I don't love the tone.
[`28:16`](https://youtu.be/PQU9o_5rHC4?t=1696) What does it want to tweet about?
[`28:17`](https://youtu.be/PQU9o_5rHC4?t=1697) Sometimes it'll just like respond to someone cuz I always have like co-work in the background and it's like it's the co-work that really loves to do that because it likes using a browser.
[`28:25`](https://youtu.be/PQU9o_5rHC4?t=1705) That's funny. A a really common pattern is I ask Quad to build something. It'll look in the codebase. Uh it'll see some engineer touch something in the git flame and then it'll message that engineer on Slack. Um just like asking a clarifying question and then once it gets answer back, it'll keep going.
[`28:40`](https://youtu.be/PQU9o_5rHC4?t=1720) What are some tips for founders now on how to build for the future? Sounds like everything is really changing. What are like some principles that will stay on and what will change?
[`28:49`](https://youtu.be/PQU9o_5rHC4?t=1729) So I think some of these are pretty are pretty basic, but I think they're even more important now than they were before. Um, so one example is latent demand. Like I mentioned it a thousand times for me. It's just like the single biggest idea in product. It's a it's a thing that no one understands. It's a thing I certainly did not understand my first few startups. And and the idea is like people will only do a thing that they already do. You can't get people to do a new thing. If people are trying to do a thing and you make it easier, that's a good idea. But if if people are doing a thing and you try to make them do a different thing, they're not going to do that. And so you just have to make the thing that they're trying to do easier. And I think quad is going to get increasingly good at kind of figuring out these kind of product ideas for you just because it can look at feedback, it can look at debug logs, it can kind of figure this out.
[`29:30`](https://youtu.be/PQU9o_5rHC4?t=1770) That's what you mean by plan mode was latent demand that people were already like I don't know had their clawed chat window open in a browser and were like talking to it to figure out like the spec and and what it should do. And now that like pi mode just became that you just do it in claw code.
[`29:45`](https://youtu.be/PQU9o_5rHC4?t=1785) Yeah. Yeah, that's it. Some sometimes what I'll do is I'll just walk around the office on on our floor and I'll just kind of stand behind people like I I'll say like hi so it's not and then um I'll I'll just see kind of like how they're using quad code. Um and this is also just something I saw a lot um but it also came up in GitHub issues like people were talking about it. It seems like so you're surprised how far the terminal has gone and how far it's been pushed like how far do you think it has left to go just given with this world of swore multiple agents like do you think there's going to be a new a need for a different UI on top of it?
[`30:19`](https://youtu.be/PQU9o_5rHC4?t=1819) It's funny if you asked me this a year ago I would have said the terminal has like a threemonth lifespan and then we're going to move on to the next thing. Uh and you can see us experimenting with this right because quad code started in a terminal but now it's in you know it's on web you can like quadcode it's in the desktop app you know we've had that for you know like three months or six months or something just in the code tab um it's in the iOS and Android apps just like in the code tab it's in slack it's in GitHub there's VS Code extensions there's Jet Brains extensions so we're just like we're always experimenting with different form factors for this thing to figure out what's the next thing I've been wrong so far about the of the CLI. So, I'm probably not the person to forecast that.
[`30:58`](https://youtu.be/PQU9o_5rHC4?t=1858) What about like your advice to DevTool founders? Like, someone's building a DevTool company today. Should they just like be building for engineers and humans or should they be thinking more about like what Claude going to think and want and build for sort of like the agent?
[`31:13`](https://youtu.be/PQU9o_5rHC4?t=1873) The way I would frame it is think about the thing that the model wants to do and figure out how do you make that easier. And that's something that we saw, you know, like when I first started hacking on quad code, I I realized like this thing just wants to use tools. It just wants to interact with the world. And how how do you how do you enable that? Well, the way you don't do it is you put it in a box and you're like, here's the API, here's how you interact with me, and here's how you interact with the world. The way you do it is you see what tools it wants to use. You see what it's trying to do, and you enable that the same way that you do for your users. And so, like for if you're building a dev tool startup, I would think about like what is the problem you want to solve for the user? And then when you use when you apply the model to solving this problem, what is the thing the model wants to do?
[`31:54`](https://youtu.be/PQU9o_5rHC4?t=1914) And then what is the technical and product solution that serves the weight and demand of both? YC's next batch is now taking applications. Got a startup in you? Apply at y combinator.com/apply. It's never too early and filling out the app will level up your idea. Okay, back to the video. Back in the day, more than 10 years ago, you were a very heav heavy user and you wrote a book about TypeScript, right? Before Typescript was cool. This is when everyone was a deep in JavaScript. This is back in early 2010s, right?
[`32:27`](https://youtu.be/PQU9o_5rHC4?t=1947) Yeah, something like that.
[`32:29`](https://youtu.be/PQU9o_5rHC4?t=1949) Before Typescript was a thing because back then is a very weird language. It's not supposed to do a lot of things with being typed in JavaScript and now it's the right thing and it feels like clot code in the terminal has a lot of parallels with TypeScript at the beginning.
[`32:47`](https://youtu.be/PQU9o_5rHC4?t=1967) TypeScript makes a lot of really weird language decisions. So if you look at the type system pretty much anything can be a literal type for example and this is like this is super weird cuz like even like like Haskell doesn't even do this. It's just like it's too extreme or it has like conditional types which I don't think any language thought of at all.
[`33:06`](https://youtu.be/PQU9o_5rHC4?t=1986) It was like very strongly typed.
[`33:08`](https://youtu.be/PQU9o_5rHC4?t=1988) Yeah, it was very strongly and and the idea was like when you know like when Joe Pamer and Anders and the early team was like building this thing, the way they built it is we okay, we have these teams with these big untyped JavaScript code bases. We have to get types in there, but we're not going to get engineers to change that the way that they code. You're not going to get JavaScript people to have like, you know, 15 layers of class inheritance like you would a Java programmer, right? They're going to write code the way they're going to write it. They're they're going to use reflection and they're going to use mutation and they're going to use all these features that traditionally are very very difficult to type.
[`33:38`](https://youtu.be/PQU9o_5rHC4?t=2018) They're a very unsafe type to any strong functional programmer.
[`33:41`](https://youtu.be/PQU9o_5rHC4?t=2021) That's right. That's right. That's right. And so the thing that they did instead of getting people to kind of change the way that they code, they they built a type system around this. And it was just it's brilliant because there's all these ideas that no one was thinking about even in academia like no one thought of a bunch of these ideas. It purely came out of the practice of observing people and seeing how JavaScript programmers want to write code. And so you know for for quad code it there there are some ideas that are kind of similar in that you know like you can use it like a Unix utility. You can pipe into it. You can pipe out of it. Um in some ways it is kind of rigorous in this way but in in almost every other way it's just the tool that we wanted. like I I build a tool for myself and then the team builds the tool for themselves and then for anthropic employees and then for users and it just ends up being really useful. It's not it's not this like principled and academic thing which I think the the proof is actually in the results. Now fast forward more than 15 years later not many codebases are in Haskell which is more academic and there's tons of them now on TypeScript because it's way more practical
[`34:42`](https://youtu.be/PQU9o_5rHC4?t=2082) right
[`34:43`](https://youtu.be/PQU9o_5rHC4?t=2083) which is interesting. Yeah, it is interesting, right? It's like TypeScript solves a problem.
[`34:47`](https://youtu.be/PQU9o_5rHC4?t=2087) I guess one thing that's cool, I don't know how many people know, but the terminal is actually one of the most beautiful terminal apps out there and is actually written with React terminal.
[`34:58`](https://youtu.be/PQU9o_5rHC4?t=2098) When I first started building it, you know, like I I did front-end engineering for for a while. So, and I was also like a, you know, I'm I'm sort of like a hybrid, like I do like design and user research and, you know, write code and all this stuff. And we love hiring engineers that are like this. Um, so we just we love generalists. So for me it's like okay, I'm building a thing for the terminal. I'm actually kind of a shitty Vim user. So like how do I build a thing for people like me that um you know are are going to be working in a terminal. And I think just the delight is so important. And I feel like at YC this is something you talk about a lot, right? It's like build a thing that people love. If the product is useful but you don't fall in love with it, that's not great. Um so it kind of has to do both. Designing for the terminal honestly has been hard, right? It's like uh it's like 80 by 100 characters or whatever. you have like 256 colors, you have one font size, you don't have like mouse interactions, there's all this stuff you can't do, and there's all these very hard trade-offs. So, like a little known thing, for example, is you can actually enable mouse interactions in a terminal. So, you can enable like clicking and stuff.
[`35:54`](https://youtu.be/PQU9o_5rHC4?t=2154) Oh, how do you do that in cloud code? I've been trying to figure out how to do this.
[`35:58`](https://youtu.be/PQU9o_5rHC4?t=2158) We don't we don't have it in cloud code because we actually prototyped it a few times and it felt really bad because the trade-off is you have to virtualize scrolling and so there's all these weird trade-offs because like the way terminals work is like there's no DOM, right? It's like there's like anti- escape codes and these kind of weird organically evolved specs since like the 1960s or whatever.
[`36:16`](https://youtu.be/PQU9o_5rHC4?t=2176) Yeah. It feels like BBS's. It's like a BBS door game.
[`36:17`](https://youtu.be/PQU9o_5rHC4?t=2177) Yeah.
[`36:18`](https://youtu.be/PQU9o_5rHC4?t=2178) Oh my god.
[`36:19`](https://youtu.be/PQU9o_5rHC4?t=2179) That's like that's like a great compliment. Yeah. Yeah. Like it should feel like you're discovering
[`36:24`](https://youtu.be/PQU9o_5rHC4?t=2184) Lord of the Red Dragon. It's fantastic. Oh my god.
[`36:26`](https://youtu.be/PQU9o_5rHC4?t=2186) Yeah. But we have we've had to just like discover all these kind of UX principles for building the terminal cuz no one really writes about this stuff. And if you look at the big terminal apps of, you know, like the 80s or 90s or 2000s or whatever, they use like ed curses and they have all these like windows and things like this. And it just looks kind of like janky by modern standards. It just looks too heavy and complicated. And so we had to like reinvent a lot. And you know, for example, something like the terminal spinner, like just like the spinner words, it's gone through probably I want to say like 50 maybe 100 iterations at this point. And probably 80% of those didn't ship. So we tried it, it didn't feel good, move on to the next one. try it, didn't feel good, move on to the next one. Uh, and this was like sort of one of the amazing things about quad code, right? Is like you can write these prototypes and you can just do like 20 prototypes back to back, see which one you like, and then ship that and the whole thing takes maybe a couple hours.
[`37:14`](https://youtu.be/PQU9o_5rHC4?t=2234) Whereas in the past, what you would have had to do is like wen to use origami or framer or something like this. You built like maybe three prototypes, it took like two weeks. It just took much much longer.
[`37:24`](https://youtu.be/PQU9o_5rHC4?t=2244) And so we have this luxury of we have to discover this new thing. We have to build a thing. We don't know what the right endpoint is, but we can iterate there so quickly and that's what makes it really easy and that's what lets us build a product that's like joyous and that people like to use.
[`37:38`](https://youtu.be/PQU9o_5rHC4?t=2258) Boris, you had other advice for for builders and we kept interrupting you because we have so many questions, but
[`37:45`](https://youtu.be/PQU9o_5rHC4?t=2265) I would say um so okay, so maybe two pieces of advice that are kind of weird because it's like about building for the model. So one is uh don't build for the model of today, build for the model of 6 months from now. This is like sort of weird, right? Because like you can't find PMF if the product doesn't work. But actually this is the thing that you should do because otherwise what will happen is you spend a bunch of work you find PMF for the product right now and then you're just going to get leaprogged by someone else um because they're building for the next model and a new model comes out every few months. Use the model, feel out the boundary of what it can do and then build for the model that you think will be the model maybe 6 months from now. I think the second thing is um you know actually in the in the quad code where in the quad code area where we sit we have a framed copy of the bitter lesson on the wall. Um and this is this like rich sutton uh I like everyone should read it if if you haven't uh and the idea is the more general model will always be the more specific model and there's a lot of corlaries to this but essentially what it boils down to is never bet against the model. Uh, and so this is just like a thing to that that we always think about where we could build a feature into cloud code. We could make it better as a product and we call this scaffolding. That's all this code that's not the model itself. But we could also just wait like a couple months and the model can probably just do the thing instead. Um, and there's always this trade-off, right? It's like engineering work now and you can kind of extend the capability a little bit, maybe 10 20% or whatever in whatever domain on this like, you know, like the spider chart of what you're trying to extend. Um, or you can just wait and the next model will do it. So just always always think in terms of this trade-off where where do you actually want to invest and assume that whatever the scaffolding is it's just tech.
[`39:18`](https://youtu.be/PQU9o_5rHC4?t=2358) How often do you rewrite the code ways of uh clock code is every six months with this with this
[`39:24`](https://youtu.be/PQU9o_5rHC4?t=2364) is there scaffolding that you've deleted because you don't need it anymore because the model just improved.
[`39:29`](https://youtu.be/PQU9o_5rHC4?t=2369) Oh so much. Yeah. Like all of quad code has just been written and rewritten and rewritten and rewritten over and over and over. We unhip tools every couple weeks. We add new tools every couple weeks. There's no part of quad code that was around six months ago. It's just constantly rewritten.
[`39:43`](https://youtu.be/PQU9o_5rHC4?t=2383) Would you say most of the code base for current cloud code is only say 80% of it is only less than a couple months old.
[`39:49`](https://youtu.be/PQU9o_5rHC4?t=2389) Yeah, definitely. It might it might even be like less than Yeah, maybe like a couple months. That that feels about right.
[`39:55`](https://youtu.be/PQU9o_5rHC4?t=2395) So it's like the life cycle of code now. That's another alpha is expecting it to be the shelf life to be just couple months.
[`39:59`](https://youtu.be/PQU9o_5rHC4?t=2399) Yeah.
[`40:00`](https://youtu.be/PQU9o_5rHC4?t=2400) For the best founders.
[`40:02`](https://youtu.be/PQU9o_5rHC4?t=2402) Do you see uh Steve Yaggi's uh post about how awesome working at Anthropic is? And I think there's a line in there that says that an anthropic engineer uh currently averages 1,000x more productivity than a Google engineer at Google's peak which is really an insane number honestly like 1,000x like you know we're 3 years ago we were still talking about 10x engineers now we're talking about 1000x on top of a Google engineer in the prime like this is unbelievable honestly. Yeah, I mean internally if you if you look at like technical employees, they all use quad code every day. Um, and even non-technical employees, I think like half the sales team uses quad code. Um, they they've started switching to co-work because it's a little easier to use. It has like a VM, so it's a little bit safer. But yeah, we actually we just pulled a stat and the I think the team doubled in size last year, but productivity per engineer grew something like 70%.
[`40:54`](https://youtu.be/PQU9o_5rHC4?t=2454) It's measured by
[`40:56`](https://youtu.be/PQU9o_5rHC4?t=2456) just like the simplest stupidest measure, pull requests. Um, but we also kind of cross check that against like commits and like uh the lifetime of commits and things like this. And since quad code came out, productivity per engineer at anthropic has grown 150%.
[`41:07`](https://youtu.be/PQU9o_5rHC4?t=2467) Oh my god.
[`41:10`](https://youtu.be/PQU9o_5rHC4?t=2470) Um, and this is crazy because I in my old life I was responsible for code quality at Meta.
[`41:14`](https://youtu.be/PQU9o_5rHC4?t=2474) Um, and I was responsible for the quality of all of our code bases across every product across like you know Facebook, Instagram, WhatsApp, whatever.
[`41:22`](https://youtu.be/PQU9o_5rHC4?t=2482) And one of the things that the team worked on was improving productivity. And back then seeing a gain of something like 2% in productivity that was like a year of work by hundreds of people. And so this like 100% this is just like unheard of just completely unheard of.
[`41:36`](https://youtu.be/PQU9o_5rHC4?t=2496) What drove you to come over to Anthropic? I mean basically as a builder you could go anywhere. What was the moment that made you say like actually this is the set of people or this is the approach. I was living in rural Japan and I was opening up Hacker News every morning and I was reading the news and uh it was all it just started to be like AI stuff at some point and uh I started to use some of these early products and uh I remember like the first couple times that I used it I was just like it just took my breath away. That was like very cheesy to say, but that was actually that was actually the feeling. Like it was just like it was amazing like as a as as a builder, I've just never kind of felt felt this feeling like using these very very early products. That was like in the quad 2 days or you know something like that. And so I I just talking started talking to friends at Labs um just to kind of see what was going on. Um and uh I met Ben man who's one of the founders at uh at Anthropic and uh he just immediately won me over. Um and as soon as I met kind of the rest of the team at an it just won me over and I think I think probably in two ways. So one is it operates as a research lab. Um so the product was teeny teeny tiny. It's really all about building a safe model. That's all that matters. Um and so this idea of just being very close to the model and being very close to development and being not the most important thing because the product isn't anymore. It's just the model is the thing that's the most important. Um that really resonated with me after building product for many years. And then the second thing was just how missiondriven it is. Um like I'm I'm a huge sci-fi reader. My bookshelf is just like filled with sci-fi. And so like I just know how bad this can go.
[`43:11`](https://youtu.be/PQU9o_5rHC4?t=2591) And when I kind of think about what's going to happen this year, it you know it's going to be totally insane. And in the worst case it can go very very bad.
[`43:19`](https://youtu.be/PQU9o_5rHC4?t=2599) Um and so I just wanted to be at a place that really understood that and kind of really internalized that. And at Ant, you know, like if you overhear conversations in the lunchroom or in the hallway, people are talking about AI safety. this is really the thing that everyone cares about more than anything. Um, and so I just wanted to be in a place like that. I I know I know for me personally the mission is just so important.
[`43:40`](https://youtu.be/PQU9o_5rHC4?t=2620) What is gonna happen this year?
[`43:42`](https://youtu.be/PQU9o_5rHC4?t=2622) Okay. So if you think back like six months ago and uh kind of what are the predictions that people are making? So Daario predicted that 90% of the code at Anthropic would be would be written by Quad. This is true. Um for me personally it's been 100% for like since Opus 4.5. Um I just I uninstalled my IDE. I don't edit a single line of code by hand. It's just 100% quad code and Opus. Um and you know I land you know like 20 PR a day every day. If you look at Enthropic overall it ranges between like 70 to 90% uh you know depending on the team. For a lot of teams it's also like 100% for a lot of people it's 100%. And I remember making this prediction back in May when we ged cloud code that you wouldn't need an ID to code anymore. Uh and it was totally crazy to say. I feel like people in the audience gasped
[`44:28`](https://youtu.be/PQU9o_5rHC4?t=2668) because it was such like a silly prediction at the time. But really all it is is like you just like trace the you know the exponential
[`44:34`](https://youtu.be/PQU9o_5rHC4?t=2674) and this is just like so deep in you know the DNA at cuz like you know three of our founders were co-authors of the scaling laws paper they kind of they saw this very early and so this is just like tracing the exponential this is what's going to happen and yes that happened. So continuing to trace the exponential I think what will happen is coding will be generally solved for everyone. Um, and I think today coding is practically solved, you know, for me and I think it'll be the case for everyone. Um, you know, regardless of domain, I think we're going to start to see the title software engineer go away. And I think it's just going to be maybe builder, maybe product manager, maybe we'll keep the title as kind of a vestigial thing, but the work that people do, it's not just going to be coding. It's software engineers are also going to be writing specs. They're going to be talking to users. like this thing that we're starting to see right now in our team where engineers are very much generalists and every single function on our team codes like our PM's code, our designers code, our EM codes, our um like everyone our our finance guy codes like everyone on our team codes. We're going to start to see this everywhere. So this is sort of uh this is kind of like the lower bound if we just continue the trend. The upper bound I think is a lot scarier. Um, and this is something like, you know, we hit ASL4. Um, and this, you know, at anthropic, we talked about these safety levels. ASL3 is where the models are right now. ASL4 is the model is recursively self-improving. Um, and so if this happens, essentially, we have to meet a bunch of criteria before we can release a model. And so the the extreme is that, you know, this happens um or there's some kind of catastrophic misuse like people are using the model to design bioiruses, design zero days, stuff like this. Um, and this is something that we're really really actively working on so that doesn't happen. I think uh it's just been honestly it's just been like so exciting and humbling like seeing how people are using quad code like uh you know I just wanted to build a cool thing and it ended up being really useful uh and that was so surprising and so exciting.
[`46:23`](https://youtu.be/PQU9o_5rHC4?t=2783) My impression from Twitter or just the outside is basically everyone went away over the holidays and then like found out about Claude code and it's just been crazy ever since. Is that how it was for you at like internally? Did you were you having like a nice Christmas break and then came back and like what happened? Well, actually for all of December, I was traveling around. Uh, and I I took a coding vacation. So, we were kind of traveling around and I was just like coding every day. So, that was really nice. Uh, and then I also started to use Twitter at the time cuz like I I worked on Threads back then way back when. So, I've been a Threads user for a while. So, I just like tried to see kind of like other platforms where people are. Yeah. I think for a lot of people they kind of discovered that was the moment where they discovered Opus 4.5. I kind of already knew.
[`47:01`](https://youtu.be/PQU9o_5rHC4?t=2821) Mhm.
[`47:02`](https://youtu.be/PQU9o_5rHC4?t=2822) Uh, and internally quad code's just been on this like exponential tear for many many months now. So that just like it it became even more steep. That's what we saw. And if you look at cloud code now, you know, there was some stat from Mercury that like 70% of startups are you know choosing cloud as their model of choice. There was some other stat from like semi analysis that 4% of all public commits are made by cloud code. um like of all code written everywhere. All the companies, you know, use squad code from like the biggest companies to kind of, you know, smallest startups, you know, like it it wrote it it plotted the course for Perseverance like for like the Mars rover. This is just like this is the coolest thing for me. And we like we even printed posters cuz the team was like, "Wow, this is just like so cool that NASA chooses to use this thing." So, yeah, it's just like it's humbling. Um but it also feels like the very beginning. What's the sort of interaction between uh claude code and then co-work like you know was it a fork of cla code? Was it like you had cla code look at the cloud code and say let's make a new spec for nontechnical people that you know keeps all the lessons and then you know it sort of went off for a couple days and did that. What's the genesis of that and you know where do you think that goes?
[`48:12`](https://youtu.be/PQU9o_5rHC4?t=2892) This is going to be like my fifth time using the word wait and demand. It was just that I mean like we we were looking at Twitter and there was like that one guy that was using quad code to like monitor his tomato plants.
[`48:21`](https://youtu.be/PQU9o_5rHC4?t=2901) Mhm.
[`48:23`](https://youtu.be/PQU9o_5rHC4?t=2903) Uh there was like this other person that was using it to like recover wedding photos off of a corrupted hard drive. There were people that using it for like uh for finance. When we looked internally at anthropic, every designer is using it all the entire finance team at this point is using it. The entire data science team is using it not for coding. People are jumping over hoops to install a thing in the terminal so that they could use this. So we knew for a while that we wanted to build something and so we're experimenting with a bunch of different ideas and the thing that kind of took off was just you know a little cloud code wrapper in a guey in the desktop app and that's all it is. It's just quad code under the hood. It's the same agent.
[`48:55`](https://youtu.be/PQU9o_5rHC4?t=2935) Oh wow.
[`48:58`](https://youtu.be/PQU9o_5rHC4?t=2938) Um and uh Felix and the team and Felix was early Electron contributor. He kind of knows that stack really well and he was hacking on various ideas and uh they they built it in I think something like 10 days. It was it was just like 100% written by quad code. Uh and it just felt ready to release. There was a lot of stuff that we had to build for nontechnical users. So it's a little bit different than a technical audience. Uh it runs in a all the code runs in a virtual machine. Uh there's a lot of delete uh protections for deletion and things like this. There's a lot of permission prompting and kind of other guardrails for users. Um yeah, it was honestly pretty obvious. Boris, thank you so much for making something that uh is taking away all my sleep, but in return, it's making me feel creator mode again, sort of founder mode again. It's been an exhilarating 3 weeks. I like can't believe I waited that long since November to actually get into it. Thank you so much for being with us. Thank you for building what you're building.
[`49:54`](https://youtu.be/PQU9o_5rHC4?t=2994) Yeah, thanks for having me. And uh send bugs.
[`49:59`](https://youtu.be/PQU9o_5rHC4?t=2999) Sounds good.
---
## Sources
- [Inside Claude Code With Its Creator Boris Cherny — Y Combinator — YouTube](https://youtu.be/PQU9o_5rHC4)
- [Y Combinator](https://www.ycombinator.com/)
+462
View File
@@ -0,0 +1,462 @@
# The Secrets of Claude Code From the Engineers Who Built It — Every
Transcript of the interview with Cat & Boris (Claude Code engineers) on the Every podcast, published October 29, 2025.
<table width="100%">
<tr>
<td><a href="../">← Back to Claude Code Best Practice</a></td>
<td align="right"><img src="../!/claude-jumping.svg" alt="Claude" width="60" /></td>
</tr>
</table>
---
## Video Details
- **Guest:** Cat & Boris (Claude Code Engineers, Anthropic)
- **Host:** Every
- **Published:** October 29, 2025
- **YouTube:** [Watch on YouTube](https://youtu.be/IDSAMqip6ms)
---
## Transcript
[`0:02`](https://youtu.be/IDSAMqip6ms?t=2) What made it work really well is that quad code has access to everything that an engineer does at the terminal. Everything you can do, quad code can do. There's nothing in between.
[`0:10`](https://youtu.be/IDSAMqip6ms?t=10) There's actually an increasing number of people internally at anthropic that are using like a lot of credits [music] like spending like over a,000 bucks every month. We see this like power user behavior. This is something that they teach in YC. If you can solve your own problem, it's much more likely you're solving the problem for others. There's this like really old idea in product called latent demand. You build a product in a way that is hackable that is kind of open-ended enough that people can abuse it for other use cases it wasn't really designed for and you build for that cuz you kind of know there's demand for it. Do you think the CLI is the final form [music] factor? Are we going to using cloud code in the CLI primarily in a year or in 3 years or is This podcast is sponsored by Google. Hey folks, I'm Omar, product and design lead at Google DeepMind. We just launched a revamped vibe coding experience in AI [music] Studio that lets you mix and match AI capabilities to turn your ideas into reality faster than ever. Just describe your app and Gemini will automatically wire up the right models and APIs for you. And if you need a spark, hit I'm feeling lucky and we'll help you get started. Head to a.studio/bild. studio/build to create your first app.
[`1:29`](https://youtu.be/IDSAMqip6ms?t=89) Cat Boris, thank you so much for being here.
[`1:30`](https://youtu.be/IDSAMqip6ms?t=90) Thanks for having us.
[`1:32`](https://youtu.be/IDSAMqip6ms?t=92) Yeah. Um, so for people who don't know you, you are the creators of Claude Code. Thank you very much from the bottom of my heart. It's uh I love Cloud Code.
[`1:42`](https://youtu.be/IDSAMqip6ms?t=102) That's amazing to hear. [laughter] That's what we love to hear. Um I Okay, I think the place I want to start is when I first used it. Um, there was like this moment like I think it was around when uh Sonnet 37 came out where I was like I used it and I was like, "Holy this is like a completely new paradigm. It's a a completely new way of thinking about code." And the the big difference was um you went all the way and just eliminated the text editor and you're just like all you do is like talk to the talk to the terminal and and that's that's it. Um, and you know, previous paradigms of AI programming, pre previous harnesses have been like you have a text editor and you have the AI on the side and it's kind of like or it's a tab complete. So, take me through like that decision process that ar that that process of of architecting this new paradigm. How do you how did you think about that?
[`2:36`](https://youtu.be/IDSAMqip6ms?t=156) Yeah, I think the the most important thing is it was not intentional at all. [laughter] Okay.
[`2:42`](https://youtu.be/IDSAMqip6ms?t=162) Uh, we we sort of ended up with it. So at the time when I joined Enthropic um we were still on different teams at the time. Um there was this previous predecessor to quad code. It was called Clyde like CL C cliop. And it was this like research project you know it took like a minute to start up. It was this kind of like really heavy Python thing. It had to like run a bunch of indexing and stuff. And when I joined I wanted to ship my first PR and I hand wrote it like a you know like a noob in a in [laughter] a like I didn't know about any of these tools. U I didn't know any better and then I I put up this PR and um Adam Wolf who was the um manager for our team for a while. He was my ramp up buddy and he just like rejected the PR and he was like you wrote this by hand. What are you what are you [laughter] doing? Use Quide. Um cuz he was also hacking a lot on Quiet at the time. And so I tried Quaid. I gave it the description of the task and it just like one shot at this thing
[`3:39`](https://youtu.be/IDSAMqip6ms?t=219) and this was like you know sonnet 35. So I still had to fix a thing even for this kind of basic task and the harness was super old. So it took like 5 minutes to turn this thing out and just took forever and um but it but it worked and I was just mind-blown that this this was even possible and they just kind of got the gears turning. maybe you don't actually need an IDE.
[`4:02`](https://youtu.be/IDSAMqip6ms?t=242) And then later on I was prototyping using the anthropic API and the easiest way to do that was just building a little app in the terminal cuz that way I didn't have to build a UI or anything. And I started just making a little chat app and then I just started thinking maybe we could do something a little bit like Clyde. So let let me build like a little Clyde and it actually ended up being a lot more useful than that without a lot of work. And I think the biggest revelation for me was when we started to give the model tools. It just started using tools and it was just it was this insane moment. Like the model just wants to use tools. Like we gave it bash and it just started using bash writing apple script to like automate stuff uh in response to questions. And I was like this is just the craziest thing. I've never seen anything like this. Cuz at the time I had only used IDE with like you know like text editing a little like oneline autocomplete, multi-line autocomplete, whatever. Um, so that that's where this came from. It was this kind of convergence of like prototyping but also kind of seeing what's possible in kind of like a very um rough way.
[`5:03`](https://youtu.be/IDSAMqip6ms?t=303) Um, and this thing ended up being surprisingly useful and and I think it was the same for us. I think for me it was like kind of sonnet 4 opus 4. That's where that magic moment was. I was like, "Oh my god, this this thing works."
[`5:17`](https://youtu.be/IDSAMqip6ms?t=317) That's interesting. So like tell me about that that the tool moment because I think that is one of the special things about cloud code is it just writes bash and it's really good at it. And I think a lot of um previous agent architectures or even anyone building agent today, your first instinct might be okay, we're going to give it a find file tool and then we're going to give it a uh open file tool and you you build all these like custom wrappers for you know uh all the different actions you might want the agent to take, but Cloud Code just uses bash and it's like really good at it. So how do you think about um how do you think about what you learned from that? Yeah, I think we're at this point right now where Quad Code actually has a bunch of tools. I think it's like a dozen or something like this. We we actually like add and remove tools most weeks. So, this changes pretty often. Um, but today there actually is a search uh there's a tool for for searching. Um, and we do this for two reasons. One is the UX, so we can show the result a little bit nicer to the user because there's still a human in the loop right now for most tasks. Uh, and the second one is for permissions. So, if you say in your like cloud code like settings.json JSON on this file you cannot read. We we have to kind of enforce this. Uh we enforce it for bash but we can do it a little bit more efficiently for if we have a specific search tool. Um but definitely we want to like unhip tools and kind of keep it simple for the model. Um like last week or two weeks ago we unchipped the OS tool because in the past we needed it but then we actually built a way to enforce this kind of permission system for bash. Um, so in Bash, if we know that you're not allowed to read a particular directory, Quad's not allowed to OS that directory. And because we can enforce that consistently, we don't need this tool anymore. Um, and this is nice because it's a it's a little less choice for Quad. A little less stuff in context.
[`7:01`](https://youtu.be/IDSAMqip6ms?t=421) Got it. And how do you guys split responsibility on the team?
[`7:06`](https://youtu.be/IDSAMqip6ms?t=426) Um, I would say Boris sets the technical direction and has been the product visionary for a lot of the features that we've come out with. I see myself as more of like a supporting role to make sure that um that one that like our pricing and packaging resonates with our users. Um two making sure that we're shephering all our features across the launch process. So from like deciding all right like these are the prototypes that we should definitely ant food to like setting the quality threshold for ant fooding through to communicating that to our end users. And um there's definitely some new initiatives that we're working on that uh I would say historically a lot of quad code has been built bottoms up like Boris and a lot of the core team members have just had these great ideas for to-do list sub agents hooks like all these are bottoms up. As we think about expanding to more services and bring cloud code to our places, I think a lot of those are more like, all right, let's talk to customers. Let's bring engineers into those conversations and prioritize those services and knock them out.
[`8:08`](https://youtu.be/IDSAMqip6ms?t=488) Got it. What is ant fooding?
[`8:10`](https://youtu.be/IDSAMqip6ms?t=490) Oh, ant fooding is
[`8:11`](https://youtu.be/IDSAMqip6ms?t=491) Oh, ant fooding.
[`8:14`](https://youtu.be/IDSAMqip6ms?t=494) Oh, um it it means dog fooding. [laughter] So,
[`8:19`](https://youtu.be/IDSAMqip6ms?t=499) anthropic ant. I [laughter] got it.
[`8:22`](https://youtu.be/IDSAMqip6ms?t=502) Yeah. Our nickname for um internal employees is ant. And so uh ant fooding is our version of dog fooding. Uh internally over I think 70 or 80% of ants uh technical anthropic employees use cloud code every day. And so every time we are thinking about a new feature, we push it out to people internally and we get so much feedback. We have a feedback channel. I think we get a post every five minutes. And so you get really quick signal on whether people like it, whether it's buggy, um or whether uh it's not good and we should unchip it.
[`8:57`](https://youtu.be/IDSAMqip6ms?t=537) You can tell um you can tell that someone that is building stuff is using it all the time to build it. Uh because the the like its ergonomics just makes sense if you're trying to build stuff and that that only happens if you're like ant ant fooding. [laughter] Um I Yeah. Yeah. And I I think that that's a really interesting paradigm for building new stuff like that sort of bottoms up I make something for myself. Um tell me about that.
[`9:23`](https://youtu.be/IDSAMqip6ms?t=563) Yeah. And C cat is also so humble. Um I think cat has a really big role in the product direction also like it comes from everyone on the team and like these specific examples this actually came from everyone on the team like to-do lists and sub aents that was Sid Hooks Dixon shipped that plugins Daisy shipped that.
[`9:38`](https://youtu.be/IDSAMqip6ms?t=578) So like everyone on the team like these ideas come from everyone. Um, and so I think for us like we build this core agent loop and this kind of core experience and then everyone on the team uses the product all the time. Uh, and so everyone outside the team uses the product all the time. And so there's just all these chances to build things that serve these needs. Like for example, like bash mode, you know, like the exclamation mark and you can type in bash commands. This was just like many months ago. I was using quad code and I I was going back and forth between two terminals and just thought it was kind of annoying. Uh, and just on a whim, my asked squad to kind of think of ideas, the thought of this like exclamation mark bash mode. And then I was like, great, make it pink and then ship it. [laughter] It just did it. And like that that's the thing that still kind of persisted. And you know, now you see kind of others also kind of catching on to that.
[`10:24`](https://youtu.be/IDSAMqip6ms?t=624) That's funny. I actually didn't know that. And that's extremely useful because I always have to open up a new tab to like run any bash commands. So you just you just do an exclamation point and then it just like runs it directly instead of filtering it through all all the cloud stuff.
[`10:38`](https://youtu.be/IDSAMqip6ms?t=638) Yeah. And quad code sees the full output too.
[`10:40`](https://youtu.be/IDSAMqip6ms?t=640) Interesting. That's perfect. [laughter]
[`10:42`](https://youtu.be/IDSAMqip6ms?t=642) So anything you see in the cloud code view, cloud code also sees.
[`10:44`](https://youtu.be/IDSAMqip6ms?t=644) Okay, that's really interesting.
[`10:46`](https://youtu.be/IDSAMqip6ms?t=646) And this is kind of a UX thing that we're thinking about. Like in the past tools were built for engineers, but now it's equal parts engineers and model.
[`10:53`](https://youtu.be/IDSAMqip6ms?t=653) And so like as an engineer, you can see the output, but it's actually quite useful for the model also. And this is part of the philosophy also like everything is dual use. Um so for example, the model can also call slash commands. So like you know I have a slash command for slashcomit where I run through kind of a few different steps like diffing and generating a reasonable commit message and and this kind of stuff. I run it manually but also Claude can run this for me. Uh and this is pretty useful because we get to share this logic. We get to kind of define this tool and then we we both get to use it.
[`11:24`](https://youtu.be/IDSAMqip6ms?t=684) Yeah. What are the differences in uh designing tools that are dual use from designing tools that are you know used by one or the other? Surprisingly, it's the same.
[`11:32`](https://youtu.be/IDSAMqip6ms?t=692) Okay.
[`11:34`](https://youtu.be/IDSAMqip6ms?t=694) So far.
[`11:36`](https://youtu.be/IDSAMqip6ms?t=696) Yeah. I I I sort of feel like this kind of elegant design for humans translates really well to the models.
[`11:41`](https://youtu.be/IDSAMqip6ms?t=701) So, you're just thinking about what would make sense to you and the model generally, it makes sense to the model, too, if it makes sense to you.
[`11:49`](https://youtu.be/IDSAMqip6ms?t=709) Yeah. I think one of the really cool things about Cloud Code being um a terminal UI and what made it work really well is that Cloud Code has access to everything that an engineer does at the terminal. And I think when it comes to whether the tool should be dual use or not, I think making them dual use actually makes the tools a lot easier to understand. It just means that okay, everything you can do, cloud code can do. There's nothing in between.
[`12:14`](https://youtu.be/IDSAMqip6ms?t=734) Yeah, that's interesting. Yeah, there there are a couple of those decisions. So, um no no code editor, it's in the terminal, so it has access to your files. Um, and it's it's on your computer versus like in the cloud in a virtual machine. So you get like repeated you you get to use it in a repeated way where you can like you know build up your cloud MD file or you know like all all like build slash commands and all that kind of stuff where it becomes very composable um and extensible [snorts] from a very simple starting point. And I'm curious about how you think about, you know, for for people who are thinking about, okay, I want to build an agent, I want to build probably not cloud code, but like something else, how you get that that simple package that then can extend and be really powerful over time.
[`13:07`](https://youtu.be/IDSAMqip6ms?t=787) For me, I I start by just thinking about it like developing any kind of product where you have to solve the problem for yourself before you can solve it for others. And like this is something that they teach in YC is you have to start with yourself. So like if you if you can solve your own problem, it's much more likely you're solving the problem for others. And I I think for coding starting locally is the reasonable thing and you know now we have cloud code on the web. So you can also use it with a virtual machine and um you know you can use it in a remote setting and this is super useful when you're on the go you want to take that from your phone
[`13:37`](https://youtu.be/IDSAMqip6ms?t=817) and and this is sort of we we started proving this out kind of a step bat a time
[`13:43`](https://youtu.be/IDSAMqip6ms?t=823) where you can do atcloud in GitHub and uh I use this every day like on the way to work I'm like at a red light I probably shouldn't be doing this but I'm like you know on GitHub at a red light and then I'm like at claude you know fix this issue or whatever and so it's it's just real useful to be able to control it from your phone um and this kind proves out this experience. I I don't know if this necessarily makes sense for every kind of use case. For coding, I think starting local is right. Um I don't know if this is true for everything, though.
[`14:07`](https://youtu.be/IDSAMqip6ms?t=847) Got it. What are the slash commands you guys use?
[`14:11`](https://youtu.be/IDSAMqip6ms?t=851) Slashprit. [laughter]
[`14:12`](https://youtu.be/IDSAMqip6ms?t=852) Yeah.
[`14:15`](https://youtu.be/IDSAMqip6ms?t=855) Um yeah, it it's I I think the pritcomand makes it a lot faster for claw to know exactly what bash commands to run in order to make a commit.
[`14:24`](https://youtu.be/IDSAMqip6ms?t=864) And what does the prit slash command do for people who are unfamiliar? Oh, it it just tells it like exactly how to make a commit. Okay.
[`14:33`](https://youtu.be/IDSAMqip6ms?t=873) Um and you can like dynam you can say like, okay, these are the three bash commands that need to be run.
[`14:37`](https://youtu.be/IDSAMqip6ms?t=877) Got it. And and what's pretty cool is also we have um this kind of templating system built into slash commands. So we actually run the bash commands ahead of time. They're like embedded into the slash command. Um and you can also pre-allow certain tool invocations. So for that slash command we say allow um you know get commit get push gh and so you don't get asked for permission after you run the slash command because we have like a permission uh based security system. Um and then also it uses haik coup which is pretty cool. Um so it's kind of a cheaper model and faster. Um yeah and for me I I use like commit uh commit PR uh feature dev we use a lot. So like sid created this one. It's kind of cool. So it kind of like walks you through step by step um building something. So we prompt quad to like first ask me how to what exactly I want like build the specification
[`15:27`](https://youtu.be/IDSAMqip6ms?t=927) and then um you know kind of like build like a detailed plan and then make a to-do list walk through step by step. So it's kind of like more structured feature development
[`15:35`](https://youtu.be/IDSAMqip6ms?t=935) and then I think the last one that we probably use a lot so we use like security review for all of our PRs and then also code review. Um so like quad does all of our code review internally at anthropic. Um, you know, there's still a human approving it, but quad does kind of the first step in code review. That's just a slashcode review command.
[`15:51`](https://youtu.be/IDSAMqip6ms?t=951) Got it. Yeah. What are the things I would love to go deeper into like the how do you make a good plan? So, the sort of the feature dev thing because I think there's a lot of like little tricks that um I'm starting to find or people at every start starting to find that work and I'm curious like what what are things that that we're missing. So for example, one um step in the one unintuitive step of the of the you know plan development process is even if I don't exactly know what the thing that needs to be built is I just have like a little sentence in my mind like I want feature X I have Claude just like implement it just without giving it anything else and I see what it does and that helps me understand like okay here's actually what I mean because it made all these different mistakes or like it it did something that I didn't expect that might be And then I use that like the learning from the sort of throwaway development. I just clear it out. And then that helps me write a better plan spec for the actual feature development, which is something that you would never do before because it'd be too expensive to just like yolo send an engineer on a feature that you hadn't actually speced out. But because you have cloud going through your codebase and doing stuff, you can like learn stuff from it. Um that helps inform the actual plan that you make.
[`17:04`](https://youtu.be/IDSAMqip6ms?t=1024) Yeah. I feel maybe I I can start and I'm curious how you use it too.
[`17:07`](https://youtu.be/IDSAMqip6ms?t=1027) I think there's like a few different modes maybe for me like one one is prototyping mode.
[`17:12`](https://youtu.be/IDSAMqip6ms?t=1032) So like traditional engineering prototyping you want to kind of build the simplest possible thing that touches all the systems just so you can kind of get a vague sense of like what are the systems there's unknowns and just to kind of trace through everything.
[`17:24`](https://youtu.be/IDSAMqip6ms?t=1044) Um and so I I do the exact same thing as you Dan like Claude just does the thing and then I see where it messes up and then I'll ask it to just throw it away and do it again. So just hit escape twice, go back to the old checkpoint and then try again. I think there's also maybe two other kinds of tasks. So one is just things that quad can one-shot and I feel pretty confident it can do it. So I'll just tell it and then I'll just go to a different tab and I'll I'll shift tap to auto accept and then just go do something else or go to another one of my quads and tend to that while it does this.
[`17:54`](https://youtu.be/IDSAMqip6ms?t=1074) Um but also there's this kind of like harder feature development. So these are you know things are maybe in the past it would have taken like a few hours of engineering time and for this usually I would I'll shift tap into plan mode and then align on the plan first before it even writes any code. Um and and I think what's really hard about this is the boundary changes with every model and it in kind of a surprising way where the newer models they're more intelligent so the boundary of what you need plan mode for got pushed out like a little bit
[`18:21`](https://youtu.be/IDSAMqip6ms?t=1101) like before you used to need to plan now now you don't. And I think it's this general trend of like stuff that used to be scaffolding with a more advanced model, it gets pushed into the model itself and the model kind of tends to subsume everything over time. Yeah. How do you think about like building a agent harness that isn't just going to like you're you're not spending a bunch of time um building stuff that is just going to be subsumed into the model in 3 months when the new cloud comes out? like, yeah, how do you how do you know what to build versus what to just say it doesn't work quite yet, but next time it's going to work, so we're not going to spend time on it.
[`18:57`](https://youtu.be/IDSAMqip6ms?t=1137) I think we build most things that we think would improve Cloud Code's capabilities, even if that means we'll have to get rid of it in 3 months. If anything, we hope that we will get rid of it in three months.
[`19:09`](https://youtu.be/IDSAMqip6ms?t=1149) I think for now, we just want to offer the most premium experience possible and so we're not too worried about throwaway work. H
[`19:17`](https://youtu.be/IDSAMqip6ms?t=1157) interesting. Yeah. And an example of this is something like even like plan mode itself. I think we'll probably un ship it at some point when Quad can just figure out from your intent that you probably want to plan first. Um or you know, for example, I just deleted like 2,000 tokens or something from the system prompt yesterday just cuz like Sonnet 45 doesn't need it anymore. Um but Opus Opus 41 did need it. What about um you know in the case where uh the the latest frontier you know model doesn't need it but you know you're trying to figure out how to make it more efficient because you have so many users that you know you're maybe you you're not going to use Opus or Sonnet 45 for everything. Maybe you're going to use Haiku. So there's a trade-off between having a more um elaborate harness for Haiku versus just like not spending time on it using Sonnet eating the cost and working on more Frontier type stuff. In general, we've positioned Quad Code to be a very premium offering. So, our north star is making sure that it works incredibly well with the absolutely most powerful model we have, which is Sonnet 45 right now.
[`20:20`](https://youtu.be/IDSAMqip6ms?t=1220) Um, we are investigating how to make it work really well for like future generations of smaller models, but it's um it's not the top priority for us.
[`20:29`](https://youtu.be/IDSAMqip6ms?t=1229) Okay. What do you think about um you know one thing that I notice is we get models um often and thank you very much for this. We get models a lot before they come out and it's our job to kind of figure out is it any good and over the last six months when I'm testing claude for example in the claude app with a new frontier model it's actually very hard to tell whether it's how whether it's better immediately. Um, but it's really easy to tell in cloud code because the the harness matters a lot for the performance that you get out of the model. And you guys have the benefit of building cla or building cloud code inside of the um inside of enthropic. So there's like a much tighter integration between um the fundamental like model training and the harness that you're building and and they seem to kind of like really impact each other. So how does that how does that work internally and and um what are the benefits you get from having that like tight integration?
[`21:25`](https://youtu.be/IDSAMqip6ms?t=1285) Yeah, I think the biggest thing is like researchers just use this and so you know as they see what's working, what's not, they can they can improve stuff. Um we do like a lot of eval to kind of communicate back and forth and understand where exactly the model's at. Um, but yeah, there's this frontier where you need to give the model a hard enough task to really push the limit of the model. And if you don't do this, then all models are kind of equal. But if you give it a pretty hard task, you can you can tell the difference.
[`21:55`](https://youtu.be/IDSAMqip6ms?t=1315) What sub aents do you use?
[`21:57`](https://youtu.be/IDSAMqip6ms?t=1317) Um, I I have a few. I have like a planner sub agent that I use. I have a code review sub aent. Code review is actually something where sometimes I use a sub agent, sometimes I use a slash command. So usually in CI to slash command, but in synchronous use I use a sub aent for the same thing.
[`22:14`](https://youtu.be/IDSAMqip6ms?t=1334) um it's a good question. Yeah, maybe it's like a matter of taste. Yeah, I don't know. I don't know. Um I think it's maybe when you're running synchronously, it's kind of nice to fork off the the context window a little bit because all the stuff that's going on in the code review, it's not relevant to what I'm doing next. But in CI, it just doesn't matter.
[`22:32`](https://youtu.be/IDSAMqip6ms?t=1352) Are you ever spawning like 10 sub agents at once? And for what?
[`22:36`](https://youtu.be/IDSAMqip6ms?t=1356) For me, I do it mostly for like big migrations. Okay,
[`22:40`](https://youtu.be/IDSAMqip6ms?t=1360) this like the big thing. Um, actually we have so this like coder slash command that we use there's a bunch of sub aents there and so one of the steps is like find all the issues and so there's one sub agent that's like checking for quadmd compliance. There's another sub agent that's looking through git history to see what's going on. Another sub aent that's looking for kind of obvious bugs and then we do this like kind of dduping quality step after. So they find a bunch of stuff. A lot of these are false positives and so then then we spawn like five more sub aents and these are all just like checking for false positives. And in the end, the result is awesome. It finds like all the real issues without the false issues.
[`23:13`](https://youtu.be/IDSAMqip6ms?t=1393) That's great. I actually do that. Um, so one of my non-technical cloud code use cases is um expense filing. So like when I'm I'm in SF right now, so like I have all these expenses. And so I built this little cloud project that uh in in cloud code that um it uses uh one of these, you know, finance APIs to just download all my credit card transactions. And then it uh decides like these are probably the expenses that I'm going to have to like file. And then I have two sub agents, one that represents me and one that represents the company. And they like do battle to like figure out like what's the proper um like actual set of expenses. [laughter] uh it's like an auditor sub agent and like you know pro Dan sub agent. So um yeah that kind of thing the the sort of like opponent processor uh pattern seems to be like an interesting one.
[`24:00`](https://youtu.be/IDSAMqip6ms?t=1440) Yeah. Yeah. It's it's it's it's cool. I I feel like when sub aents were first becoming a thing actually what inspired us there's like a Reddit thread a while back where someone made sub agents for like there was like a front end dev and a backend dev and like a think it was like a designer
[`24:11`](https://youtu.be/IDSAMqip6ms?t=1451) testing dev
[`24:13`](https://youtu.be/IDSAMqip6ms?t=1453) testing dev like there was like a PM sub agent and this is like you know it's cute like it feels like a little maybe too anthropomorphic um maybe maybe there's something to this but I I think like the value is actually like the uncorrelated context windows where you have like these two context windows that don't know about each other and this is kind of interesting um and you tend to get better results this way. What about you? Do you have any interesting sub agents you use?
[`24:35`](https://youtu.be/IDSAMqip6ms?t=1475) So, I've been tinkering with one um that is really good at front-end testing. So, it uses Playright to like see all right, what are like all the errors that are client side and pull them in and try to test more steps of the app. Um, it's not totally there yet, but I'm seeing signs of life and I think it's the kind of thing that we could potentially um, bundle in one of our plugins marketplaces.
[`25:02`](https://youtu.be/IDSAMqip6ms?t=1502) Yeah. Um, definitely. I I' I've used something like that just with Puppeteer and just like watching it build something and then open up the browser and then be like, "Oh, I need to change this." It's like this is like, "Oh my god."
[`25:12`](https://youtu.be/IDSAMqip6ms?t=1512) Yeah. It's really cool.
[`25:13`](https://youtu.be/IDSAMqip6ms?t=1513) It's really cool. I think I think we're starting to see the beginnings of this like massive like multi- massive sub aents. I I don't know what they call this like swarms or something like that. There's a bunch of people there's actually an increasing number of people internally at anthropic that are using like a lot of credits every month like you know like spending like over a thousand bucks every month. Um and this like this percent of people is growing actually pretty fast. And I think the common use case is like code migration. And so what they're doing is like framework A to framework B. uh there's like the main agent, it makes a big to-do list for everything and then just kind of map produce over a bunch of sub agents. So you instruct quad like yeah like start 10 agents and then just go like you know 10 at a time and just migrate all all the stuff over.
[`25:53`](https://youtu.be/IDSAMqip6ms?t=1553) That's interesting. What would be like a concrete example of the kind of migration that you're talking about?
[`25:58`](https://youtu.be/IDSAMqip6ms?t=1558) I think the most classic is like lint rules.
[`26:00`](https://youtu.be/IDSAMqip6ms?t=1560) So there's like you know there's some kind of lint rule you're rolling out. There's no autofixer because it's like you know like as an analysis can't really it's kind of too simplistic for it. Um I think other stuff is like framework migrations like um we just migrated from like one testing framework to a different one. That's a pretty common one where it's super easy to verify the output.
[`26:19`](https://youtu.be/IDSAMqip6ms?t=1579) One of the things I found is and this is both for project projects inside of every and then just open source projects. It's like if you're someone building a product and you want to build a feature that's um been done before. So maybe like an an example that people might need to implement a bunch is like memory. How do you do memory? Um because we have a bunch of different products internally, you can just like spawn cloud sub agents to be like how do these three other products do it? And there's like possibility for just like tacit code sharing where you don't need to like have an API or you don't need to like ask ask anyone. You can just be like how does how do we do this already? And then use the best practices to um uh to uh build your own. And you can also do that with open source because there's like tons of open source projects where people are like you know they've been working on memory for like a year and it's like really really good. You be like what are the patterns that um people have figured out and which ones do I want to implement?
[`27:10`](https://youtu.be/IDSAMqip6ms?t=1630) Totally. You can also connect your version control system. If you've built a similar feature in the past, cloud code can use those APIs like query GitHub directly and find how people implemented a similar feature in the past and read that code and um copy the relevant parts.
[`27:27`](https://youtu.be/IDSAMqip6ms?t=1647) Yeah. Is there um have you found any use for like log files of okay you know here's here's the full history of like how I implemented it and like is that important to give to claude and and and how are you how are you um implementing that or making it useful for it?
[`27:44`](https://youtu.be/IDSAMqip6ms?t=1664) Some people swear by it. Uh there are some people at anthropic where for every task they do, they tell cloud code to write a diary entry in a specific format that just documents like what did it do, what did it try, why didn't it work, and then they even have these agents that like look over the past memory and synthesize it into observations.
[`28:02`](https://youtu.be/IDSAMqip6ms?t=1682) I think this is like the starting budding
[`28:06`](https://youtu.be/IDSAMqip6ms?t=1686) like there's like something interesting here that we could productize.
[`28:10`](https://youtu.be/IDSAMqip6ms?t=1690) Um but it's a new emerging pattern that we're seeing that works well. I think the hard thing about like oneshotting memory from just one transcript is that it's hard to know how relevant a specific instruction is to all future tasks. Like our canonical example is if I say make the button pink, I don't want you to remember to make all buttons pink in the future. And so I think um synthesizing memory from a lot of logs is a is a way to um find these patterns more um consistently. It seems like you probably need like there's some things where you're going to know um you'll be able to summar like synthesize or summarize in this sort of like top down way like this this will be useful later and and you'll you'll know the right level of abstraction at which it might be useful but then there's also a lot of stuff where it's like you actually you know any given like commit log like make the button pink it could be useful for kind of an infinite number of different reasons um that you're not going to know beforehand. So you also need the the model to be able to look up all similar past, you know, commits and surface that at the right time. Is that something that you're also thinking about? Yeah, I think I think there could there could be something like that. And maybe I think one way to see it is this kind of like traditional memory storage work like like mex like kind of stuff where you just want to like put all the information into the system and then it's kind of a retrieval problem problem after that. Um, yeah. I think as the model also gets smarter, it naturally I've seen it start to naturally do this also with Sonnet 45 where if it's stuck on something, it'll just naturally start looking like we talked about before like using bash spontaneously. So just like look through git history and be like, "Oh, okay. Yeah, this is kind of an interesting way to do it."
[`29:56`](https://youtu.be/IDSAMqip6ms?t=1796) Yeah. One of the things that like we were talking before we started recording, one of the um things that we're doing inside of every like I feel like it has really um change the way that we do engineering because everyone is cloud code build like CLI build and um we have this engineering paradigm that we call compounding engineering where in normal engineering every feature you add it makes it harder to add the next feature and in compounding engineering your goal is to make the next feature easier to build um from the feature that you just added. And the the way that we do that is we try to um codify all the learnings from um from everything that we've done to build the feature. So like you know how did we make the plan and and what parts of the plan needed to be changed or like when we started testing it like what what issues did we find? What are the things that we missed? Um and then we codify them back into all the prompts and all the sub agents and all the slash commands so that the next time when someone does something like this uh it catches it and that makes it easier. And that's why for me, for example, I can like hop into one of our code bases and start like being productive even though I'm I don't know anything about how the code works because we have this like builtup memory system of um of all the stuff that we've learned as we've implemented stuff, but we've had to build that ourselves. I'm curious, are you working on that kind of loop so it the cloud code does that automatically?
[`31:15`](https://youtu.be/IDSAMqip6ms?t=1875) Yeah, we're we're starting to think about it. Uh it's funny. We we're just uh we we heard the same thing from Fiona. She just joined the team. And you know, she she's our she's our manager. She hasn't coded in like 10 years, something like that. And she was landing PRs on her first day. And she was like, "Yeah, like not only did I kind of I forgot how to code and quad code kind of made it super easy to just get back into it,
[`31:39`](https://youtu.be/IDSAMqip6ms?t=1899) but also I didn't need to ramp up on any context because I kind of knew all this." And I think a lot of it is about like when people put up poll requests for quad code itself and I think our customers tell us that they do like similar stuff pretty often. Um if you see a mistake I'll just be like add quad add this to quad MD so that the next time it just knows this automatically and you you can kind of like instill this memory in kind of a variety of ways. So you can say like at quad add it to quadmd. You can also say add quad write a test. You know, that's like easy way to make sure this doesn't regress. And I don't feel bad asking anyone to write tests anymore, right? It's just like super easy. And like I think probably close to 100% of our tests are just written by Quad. And if they're bad, we just won't commit it. And then the good ones stay committed. Um, and then also I think lint rules are a big one. So for stuff that's enforced pretty often, we actually have a bunch of internal lint rules. Claude writes 100% of these. Um, and this is mostly just like at Claude in a PR write write this lint rule. And yeah, there's sort of this problem right now about like how how do you do this automatically? And I think generally how like Cat and I think about it is we see this like power user behavior and the first step is how do you enable that by making the product hackable so the best users can figure out how to do this cool new thing
[`32:53`](https://youtu.be/IDSAMqip6ms?t=1973) but then really the hard work starts of like how do you take this and bring it to everyone else. Um, and for me, I I kept myself in the everyone else bucket. Like, you know, I don't really know how to use Vim. Like, I don't have this like crazy like T-box setup. So, I have like a pretty vanilla setup. So, if you can make a feature that I'll use, it's a pretty good indicator that like other kind of average engineers will use it. That is interesting. Like, tell me about that because like that's something I think about all the time is um making something that is extensible and flexible enough that power users can find like novel ways to use it that you would not have even dreamed of. But it's also simple enough that anyone can use it and it's and they can be productive with it and you can you can kind of pull what the power users find back into like the basic experience. Like how do you think about making those design and product decisions so that you enable that?
[`33:41`](https://youtu.be/IDSAMqip6ms?t=2021) In general we think that like every engine environment is a little bit different from the others and so it's really important that every part of our system is extensible. Um so everything from your status line to adding your own slash commands through to hooks which let you um insert a bit of determinism at pretty much any step in quad code. So we think these are the these are like the basic building blocks that we give to every engineer that they can play with. um for plugins. Plugins is actually our um so it was built by Daisy on our team and this is this is our attempt to make it a lot easier for the average user like us um to bring these slashcomands and hooks into our workflows. And so what plugins does is it lets you browse existing MCP servers, existing hooks, existing plugins and just like or sorry existing like sash commands and just let you write one command in quad code to pull pull that in for yourself.
[`34:38`](https://youtu.be/IDSAMqip6ms?t=2078) There's this like really old idea in product called latent demand which I think is probably the main way that I personally think about product and like thinking about what to build next is it's a super simple idea. It's you build a product in a way that is hackable that is kind of open-ended enough that people can abuse it for other use cases it wasn't really designed for. Then you see how people abuse it and then you build for that cuz like you you kind of know there was demand for it,
[`35:00`](https://youtu.be/IDSAMqip6ms?t=2100) right?
[`35:02`](https://youtu.be/IDSAMqip6ms?t=2102) Um and like you know when I when I was at Meta, this is how we built kind of all the big products. I think almost every single big product had this nugget of latent demand in it. um you know like for example something like Facebook dating it came from this idea that when uh we looked at who looks at people's profiles I think 60% of views were between people of opposite gender so kind of like traditional setup that were not friends with each other and so we're like oh man okay maybe there's like maybe if we like launch a dating product we can kind of harness this demand that exists
[`35:32`](https://youtu.be/IDSAMqip6ms?t=2132) that's interesting
[`35:34`](https://youtu.be/IDSAMqip6ms?t=2134) and for you know marketplace it was pretty similar I think it was like 40% of posts in Facebook groups at the time were by sell posts and so I Okay, people are trying to use this product to buy themselves. We just build a product around it that's probably going to work. And so we think about it kind of similarly, but also we have the luxury of building for developers and developers love hacking stuff and they love customizing stuff and it's like as a user of our own product, it makes it so fun to build and and use this thing. Um, and so yeah, like like I said, we just build the right extension points. We see how people use it and that kind of tells us what to build next. Like for example, we got all these user requests where people were like, "Dude, Quad Code is asking me for all these permissions and I'm out here getting coffee. I don't know that it's asking me for permissions. How could I just get it to like ping me on Slack?" And so we built hooks. Uh Dixon built hooks um so that people could get pinged on Slack and you could get pinged on Slack for anything that you want to get pinged on Slack for. Um, and so it was very much like people really wanted the ability to do something. We didn't want to build the integration ourselves. And so we we exposed hooks for people to do that.
[`36:41`](https://youtu.be/IDSAMqip6ms?t=2201) The thing that makes me think of is um you you recently um released you kind of moved or rebranded how you talk about cloud code to be this like more general purpose agent SDK. Is that was that driven by some latent demand where you you sort of saw there's like a more general purpose use case for what you built?
[`37:00`](https://youtu.be/IDSAMqip6ms?t=2220) We realized that similar to how you were talking about using cloud code for things outside of coding, we saw this happen a lot like um we get a ton of stories of people who are using cloud code to like help them write a blog and like manage all the like data inputs and take a first pass in their own tone. Um we find people building like email assistants on this. Um I use it for a lot of just like market research. Um because at the core it's like an agent that can just go on for an infinite amount of time as long as you give it a concrete task and it's able to fetch the right underlying data. So one of the things I was working on was I wanted to look at all the companies in the world and how many engineers they had and to create a ranking. And this is something that quad code can do even though it's not a traditional coding use case. So we realized that like the underlying primitives were really general as long as you give as long as you have like an agent loop that can continue running for a long period of time and you're able to like access the internet and write code and run code pretty much you can if you squint you can kind of build anything on it. Mhm.
[`38:09`](https://youtu.be/IDSAMqip6ms?t=2289) And and I think like by at the point where we like rebranded it so like from the quad code SDK to the quad Asian SDK, there was already like many thousands of companies using this thing and a lot of those use cases were not about coding. So it's like both both internally and externally. We kind of saw that
[`38:26`](https://youtu.be/IDSAMqip6ms?t=2306) like health assistants, like financial analysts, legal assistance. Um it was pretty broad.
[`38:33`](https://youtu.be/IDSAMqip6ms?t=2313) Yeah. What are the coolest ones? I feel like actually you you had Noah Brier on the the podcast recently. I thought like the obsidian like kind of mind mapping notekeeping use case is really cool. It's funny. It's insane how many people use it for this [laughter]
[`38:47`](https://youtu.be/IDSAMqip6ms?t=2327) particular combination. Uh I think some other like some coding or kind of coding adjacent use cases that are kind of cool is um we have this like issue tracker for quad code. The team's just like constantly underwater like trying to keep up with all the issues coming in. There's just so many. And so I quad ddupes the issues and it automatically finds duplicates and it's extremely good at it. It also does first pass resolution. So usually when there's an issue it'll um proactively put up a PR internally and this is a new uh thing that Enigo on the team built. Um so this is pretty cool. Uh there's also like on call and kind of collecting signals from other places like getting like sentry logs and getting like logs from BigQuery and kind of collating all this. um plus just really good at doing this because it's all just bash in the end, right?
[`39:29`](https://youtu.be/IDSAMqip6ms?t=2369) And so these are all kind of these internal use cases that that I saw.
[`39:34`](https://youtu.be/IDSAMqip6ms?t=2374) Is it um so when it's you know collating logs or um you doing issues is that like you have clouds like continually running in the background and is that something that you're building for?
[`39:43`](https://youtu.be/IDSAMqip6ms?t=2383) Um it gets triggered for that particular one. It gets triggered whenever a new issue is filed. So it runs once but it can choose to run for as long as it needs.
[`39:52`](https://youtu.be/IDSAMqip6ms?t=2392) Got it. What about the idea of clouds always running? Oo, proactive quads. I think it's definitely where we want to get to. U I would say right now we're very focused on making quad code incredibly reliable for like individual tasks. And you know, if you think about like if you think about like multi-line autocomplete and then like single turn agents and then now we're working on like quad code that can complete tasks. I feel like if you trace this curve eventually you go to even higher levels of abstraction like even more complicated tasks and then hopefully the next step after that is a lot more productivity. So just understanding what your team's goals are what your goals are being able to say hey I think you probably want to try this feature and here's a first pass at the code and here are the assumptions I made and are these correct?
[`40:41`](https://youtu.be/IDSAMqip6ms?t=2441) I can't wait. Um and I think probably right after that is um Claude is now Um,
[`40:50`](https://youtu.be/IDSAMqip6ms?t=2450) that's not in the plan. [laughter]
[`40:52`](https://youtu.be/IDSAMqip6ms?t=2452) So, everyone on the team was like super excited that uh we were we were talking today and they gave me a bunch of questions and I want to make sure I I hit all the questions. Um, uh, oh, here's a good one. Why did you choose agentic rag over vector search in your architecture? And are like vector embeddings uh still relevant? Um so actually initially we did use vector embeddings. Um they're just a really tricky to maintain because you have to continuously reindex the code and they might get out of date and you have local changes. So those need to make it in. And then as we thought about what does it feel like for an external enterprise to adopt it, we realized that this exposes a lot more surface area and like security risk. Um we also found that actually cloud code is really good and cloud models are really good at agentic search. So um you can get to the same accuracy level with agentic search and it's just a much cleaner deployment story.
[`41:51`](https://youtu.be/IDSAMqip6ms?t=2511) H that's really interesting.
[`41:53`](https://youtu.be/IDSAMqip6ms?t=2513) Um if you do want to bring semantic search to quad code, you can do so via an MCP tool. So if you want to manage your own index and expose an MCP tool that lets Quad Code call that, that that would work. What do you think are the top MCPS to use with cloud code?
[`42:09`](https://youtu.be/IDSAMqip6ms?t=2529) Puppeteer and Playright are pretty high up there.
[`42:11`](https://youtu.be/IDSAMqip6ms?t=2531) Definitely. Yeah.
[`42:13`](https://youtu.be/IDSAMqip6ms?t=2533) Century has a really good one. Asana has a really good one. Hm. Do you think that there are um any any power user tips that you see people inside of anthropic or you know other people who are you know big power you know inside of organizations that are big cloud code power users that people don't know about but they should. Um, one thing that QuadCo doesn't naturally like to do, but that I personally find very useful is, um, QuadCo doesn't naturally like to ask questions, but you know, if you're brainstorming with a thought partner, a collaborator, usually you do ask questions back and forth to each other. And so, this is one of the things that, um, I like to do, especially in plan mode. I'll just tell Cloud Code like, "Hey, we're just brainstorming this thing. Um, please ask me questions if there's anything you're unsure about." um I want you to ask questions and it'll do it. And I think that actually helps you arrive at a better answer
[`43:11`](https://youtu.be/IDSAMqip6ms?t=2591) there. There's like there's also like so many tips that we can share. I think like there there's a few really common mistakes I see people make. One is like like you said like not using plan mode enough. This is this is just super important. And I think this is people that are kind of new to a coding. They kind of assume this thing can do anything and it can't. It's like not that good today and it's going to get better but today it can oneshot some tests. can't one-shot most things. Um, and so you kind of have to understand the limits and you have to understand like where you get in the loop. And so [snorts] like something like plan mode, it can like two 3x success rates pretty easily if you like land on the plan first. Um, other stuff that I've seen power users do really well is companies that have really big deployments of quad code and now um, you know, luckily there's a lot of these companies so we can kind of learn from them. Uh having settings JSON that you check into the codebase is really important because you can use this to pre-allow certain commands so you don't get permission prompted every time and also to block certain commands. Let's say you don't want web fetch or whatever and this way as an engineer I don't get prompted and um I can check this in and share it with the whole team so everyone gets to use it.
[`44:16`](https://youtu.be/IDSAMqip6ms?t=2656) I I get around that by just using dangerous they skip permissions. [laughter]
[`44:21`](https://youtu.be/IDSAMqip6ms?t=2661) Yeah, we kind of we kind of have this here but we don't you know we don't recommend it. It's like it's a model, you know, it can do it can do weird stuff. Um, I think another kind of cool use case that we've seen is people using stop hooks for interesting stuff. So stop hook runs whenever the turn is complete. So like the assistant did some tool calls back and forth with you know whatever and uh it's done and it returns control back to the user then we run the stop hook and so you can define a stop hook that's like um if the tests don't pass return the text keep going and essentially it's like you can just like make the model like keep going until the thing is done and this is just like insane when you combine it with the SDK and this kind of programmatic usage
[`45:00`](https://youtu.be/IDSAMqip6ms?t=2700) you can you know this is a stochcastic thing it's a nondeterministic thing but with scaffolding you can get these determin deterministic outcomes.
[`45:09`](https://youtu.be/IDSAMqip6ms?t=2709) So you guys started this sort of CLI, this CLI paradigm shift. Um, do you think the CLI is the final form factor? Are we are we going to using cloud code in the CLI primarily in a year or in three years, or is there something else that's better?
[`45:23`](https://youtu.be/IDSAMqip6ms?t=2723) I mean, it's not the final form factor, but we are very focused on making sure the CLI is like the most intelligent that we can make it and that's as customizable as possible. you can talk about the next form factors.
[`45:38`](https://youtu.be/IDSAMqip6ms?t=2738) Yeah, I mean [laughter] cat C's asking me to talk about because no one knows like this this stuff's like it's just moving like so fast, right? Like no no one knows what these form factors are. Like right now I think our team is in experimentation mode. So we have uh CLI then we came out with the ID extension. Now we have a new ID extension that's like a guey. It's a little more accessible. Um we have add quad and github so you can just add quad anywhere. Um, now there's at quad, there's quad on web and on mobile, so you can use it on any of these places. Um, and we're just in experimentation mode, so we're trying to figure out what's next. I think like if we kind of zoom out and see where this stuff is headed. I think one of the big trends is longer periods of autonomy. And so with every model, we kind of time how long can the model just keep going and do tasks autonomously and just, you know, in dangerous mode in a container, keep autocompacting until the task is done. And now we're on the order of like double digit hours. I think it's like the last model is like 30 hours, something like this. And I, you know, the next model is going to be days. And as you think about kind of parallelizing models, um there's kind of a bunch of problems that come out of this. So one is what is the container this thing runs in because you don't want to have to like close your laptop. I have that right now because I'm doing a lot of uh disb I don't know I've only heard I've only read it but DSPY or disb prompt optimization and like it's on my laptop and it's like I don't want to close I'm like in the way [laughter] middle like with my laptop open because I'm like I don't want to close it. Yeah.
[`47:03`](https://youtu.be/IDSAMqip6ms?t=2823) Yeah. That's right. Yeah. We've like visited companies before like like customers that everyone's just like walking around with their like quad codes. [laughter]
[`47:11`](https://youtu.be/IDSAMqip6ms?t=2831) Is this running? So, I think like one is kind of getting getting away from this mode and then I also think pretty soon we're going to be in this mode of like quads monitoring quads.
[`47:17`](https://youtu.be/IDSAMqip6ms?t=2837) Yeah.
[`47:19`](https://youtu.be/IDSAMqip6ms?t=2839) Um and kind of I I don't know what the right form factor for this is because as as a human you need to be able to inspect this and kind of see what's going on. Um but also it needs to be quad optimized where um you're optimizing for kind of bandwidth between like the quad to quad communication. Um so my prediction is terminal is not the final form factor. My prediction is there's going to be a few more form factors in the coming months, you know, maybe like year or something like that. And it's going to keep changing very quickly.
[`47:48`](https://youtu.be/IDSAMqip6ms?t=2868) What do you think about, you know, I teach a lot of cloud code to a lot of every subscribers and
[`47:52`](https://youtu.be/IDSAMqip6ms?t=2872) thank you.
[`47:54`](https://youtu.be/IDSAMqip6ms?t=2874) You're welcome. Doing doing your work for you. [laughter] Um uh and I think the like one of the big things is just the terminal is intimidating and uh just like being on a call with subscribers being like here's how you open the terminal and you're allowed to do this even if you're non-technical is like a big deal. [laughter] How do you think about that? Yeah, I um one of the people on our marketing team uh started using cloud code because she was writing some content that touched on cloud code and I was like you should really experience it and she got like 30 popups on her screen where she had to accept various permissions because she'd never used a terminal before. So I completely see eye to eye with you on that. It's definitely um hard for non-engineers and there's even some engineers we've found who aren't fully comfortable with working day-to-day in the terminal. Um, our VS Code GUI extension is our first step in that direction because you don't have to think about the terminal at all. It's like a traditional interface with a bunch of buttons. Um, I think we are working on more um graphical interfaces. Uh, so quad code on the web is a guey. I think that actually might be a good starting point for people who are less technical.
[`49:05`](https://youtu.be/IDSAMqip6ms?t=2945) Yeah. Yeah. There there was this like magic moment maybe like a few months ago where like I walked into the office and the some of the data scientists at Anthropic like sit right next to the quad code team and the data scientist just had like quad code running on their computers and I was like what what is this like how did you figure this out? I think it was like Brandon uh was like the first one to do it and he was like, "Oh yeah, I just like installed it. Like I work on this product so like I should use it." And I was like, "Oh my god." So he like he figured out how to like use a terminal and JS like you know he hasn't really done this kind of workflow before. Obviously like very technical. Um so I think now we're we're starting to see all these kind of like code adjacent uh like functions. people you use quad code and um yeah it's kind of interesting like from a latent demand point of view these are people hacking the product so there's like demand to use it for this and so we want to make it a little bit easier with more accessible interfaces but at the same time for us for quad code we're laser focused on building the best product for the best engineers and so um we're focused on software engineering and we want to make this like really good but we want to make it a thing that other people can can hack
[`50:11`](https://youtu.be/IDSAMqip6ms?t=3011) some sometimes cloud code will write code that's a bit verbose post. Um, but you can just tell it to simplify it and it does a really good job.
[`50:20`](https://youtu.be/IDSAMqip6ms?t=3020) Interesting. And so, and how are how and when are you doing that? So, you're you're using a slash command or you're
[`50:26`](https://youtu.be/IDSAMqip6ms?t=3026) I just say it. I just
[`50:27`](https://youtu.be/IDSAMqip6ms?t=3027) Sometimes you're like, "Hey, this should be a oneline change and I'll write five lines and you're like, simplify it and it understands immediately what you mean and it'll fix it."
[`50:35`](https://youtu.be/IDSAMqip6ms?t=3035) Yeah. I think a lot of people on our team do that, too. Um, that's that's interesting. Why do you like why not then if you're saying that all the time why not then you know push that into like a slash command or the harness or something like that to yeah make it just happen automatically.
[`50:51`](https://youtu.be/IDSAMqip6ms?t=3051) We do have instructions for this in the cloud MD. I think it impacts such a low percentage of conversations that we don't want it to like over rotate in the other direction.
[`51:03`](https://youtu.be/IDSAMqip6ms?t=3063) Um and then the reason why not a slash command is because you actually don't need that much context. I think slash command's really good for situations where you would otherwise need to write two three lines but for simp like even for plan mode you actually can use a few words but sometime but it actually takes two or three lines to capture the entirety of what you want in plan mode. Um for simplify it you can just write simplify it and it gets it.
[`51:28`](https://youtu.be/IDSAMqip6ms?t=3088) Yeah. Yeah, that makes sense. Cool.
[`51:29`](https://youtu.be/IDSAMqip6ms?t=3089) Yeah.
[`51:33`](https://youtu.be/IDSAMqip6ms?t=3093) Um okay, now we're we can [laughter] um that's interesting. Yeah, but but this stuff like you know it still feels just so early.
[`51:39`](https://youtu.be/IDSAMqip6ms?t=3099) Yeah.
[`51:41`](https://youtu.be/IDSAMqip6ms?t=3101) You know, like we we were talking before before the recording about like kind of where are we on the adoption curve and it still
[`51:47`](https://youtu.be/IDSAMqip6ms?t=3107) the hian curve or whatever [laughter] whatever that term was.
[`51:50`](https://youtu.be/IDSAMqip6ms?t=3110) Exactly. And it just feels it just feels like we're you know like first 10% still like the stuff is going to change so fast it's going to keep changing. Even when I talk to researchers outside of enthropic who who abuse quad code um they also get stuck on things like this like not realizing that they can just tell the LLM to simplify it and I think that just goes to show that even for people who are like working in this industry they don't always realize that you can just talk to the model. That's the thing is like I I think that there's this underlying expectation that using AI shouldn't have to be a skill like because it just does whatever you say and you're like well I mean whatever you say is going to matter for what it does. So if you can say things better it's going to do better. [laughter]
[`52:33`](https://youtu.be/IDSAMqip6ms?t=3153) Yeah. I mean it it changes with every model though. That's the that's the hard part. like you know prompt engineer was a job and now famously it's not a job anymore and there's going to be more jobs that are then like not not jobs anymore of these kind of like little micro skills that you have to learn to use this thing and as the model gets better it can just like interpret it better
[`52:50`](https://youtu.be/IDSAMqip6ms?t=3170) but I think that's also like for us this is part of this kind of humility that we have to have building a product like this that we just really don't know what's next and we're just trying to figure it out kind of along with everyone else we're just here for the ride
[`53:00`](https://youtu.be/IDSAMqip6ms?t=3180) and that's why it's cool that you're building it for yourself cuz I think that's the that's the best way to know that is just like you're and this is what we do too is like you're sort of living in the future. You're using it all the time. And uh it's pretty clear what's missing. You're like I just want this thing and you can just do the next thing rather than being like hm let me ask like some enterprise product manager at like some gigantic company like what kind of AI feature do you want? And they're like I don't know like you know put a little chatbot on the side of my you know IDE and you're like okay. [laughter]
[`53:28`](https://youtu.be/IDSAMqip6ms?t=3208) Yeah.
[`53:30`](https://youtu.be/IDSAMqip6ms?t=3210) Yeah. This is like the luxurious thing about building dev tools right you're your own customer. I think it's also really um a unique thing about AI because um it sort of reset the game board for all software. So um you know we have Kora this like email assistant and we have like Sparkle which organizes your files and it's like anything that you do for something that you want to use on your computer if you're if you're building it with AI there's a good chance that hasn't been done before because like the whole whole landscape has been reset. And so it's a it's a uniquely exciting time to build stuff for yourself.
[`54:06`](https://youtu.be/IDSAMqip6ms?t=3246) Totally. I think it totally opens the playing field, too. It's like any individual can now build an app to fill their need and then distribute it to everyone else.
[`54:14`](https://youtu.be/IDSAMqip6ms?t=3254) Yeah,
[`54:15`](https://youtu.be/IDSAMqip6ms?t=3255) it's really cool.
[`54:17`](https://youtu.be/IDSAMqip6ms?t=3257) I've been prototyping all these like random pet projects. Um
[`54:23`](https://youtu.be/IDSAMqip6ms?t=3263) um I just moved into a new apartment and it's empty. And so I've been um I've been building this like shopping advisor assistant on like the Cloud Agent SDK cuz who has time to like read all the reviews and like look at all the options and find their pricing and everything's like really hard to discover. And so it just like asks me a bunch of questions and I tell it what I want and it shows me a bunch of Yeah, exactly. and it shows me a bunch of photos of like different sofas and options and what people say online
[`54:49`](https://youtu.be/IDSAMqip6ms?t=3289) and then I tell it what I don't like and it's literally feels like working with a shopping assistant
[`54:55`](https://youtu.be/IDSAMqip6ms?t=3295) and it it's been really cool.
[`54:56`](https://youtu.be/IDSAMqip6ms?t=3296) That's really cool.
[`54:58`](https://youtu.be/IDSAMqip6ms?t=3298) Um I also have my little email response agent that like drafts responses for me but I don't use email that much so
[`55:05`](https://youtu.be/IDSAMqip6ms?t=3305) Oh, and I knew it wasn't you responding. [laughter] The agent's just take doing a very thorough job. [laughter]
[`55:16`](https://youtu.be/IDSAMqip6ms?t=3316) Yeah,
[`55:18`](https://youtu.be/IDSAMqip6ms?t=3318) agent SDK is cool though.
[`55:20`](https://youtu.be/IDSAMqip6ms?t=3320) Yeah, agent SDK is cool.
[`55:22`](https://youtu.be/IDSAMqip6ms?t=3322) Yeah, it's it always just feels amazing like how much we're able to build with such a small team.
[`55:24`](https://youtu.be/IDSAMqip6ms?t=3324) Yeah.
[`55:25`](https://youtu.be/IDSAMqip6ms?t=3325) So, I feel like
[`55:26`](https://youtu.be/IDSAMqip6ms?t=3326) the other thing that's really cool is that I think people are just shifting their mindset from docs to demos. Like internally, our currency is actually demos. It's like you want people to be excited about your thing. Yeah,
[`55:38`](https://youtu.be/IDSAMqip6ms?t=3338) show us like show us 15 seconds of what it can do.
[`55:42`](https://youtu.be/IDSAMqip6ms?t=3342) And we find that everyone on the team now has this kind of indoctrinated
[`55:45`](https://youtu.be/IDSAMqip6ms?t=3345) demo culture for sure. And I think that's better because
[`55:49`](https://youtu.be/IDSAMqip6ms?t=3349) there's a lot of things that you might have in your head that if you're a great writer, maybe you could figure out how to explain it, but it's just even then it's just really hard to explain. But if someone can see it, they like get it immediately. And I think that's happening for product building, but it's also happening for like all sorts of other types of creative endeavors like making a movie for example. Like you had to pitch it, but now you can just be like I made this Sora video and like you know check like you can kind of see like like the glimmer of the thing you're trying to make for very cheap. And so that means you don't have to spend time convincing people as much. You can just be like here I made it.
[`56:24`](https://youtu.be/IDSAMqip6ms?t=3384) Yeah. And and also as a builder like you can just make it and then like make it again and then make it again [laughter] until you're happy. Like
[`56:31`](https://youtu.be/IDSAMqip6ms?t=3391) I I feel like that like the flip side is like you used to make a dock or you know like whiteboard something or you know like I I would draw stuff in like Sketch or Figma or whatever and now we'll just like build it until until I like how it feels.
[`56:42`](https://youtu.be/IDSAMqip6ms?t=3402) And it's just like so easy to get that feeling out of it now. And I I think it's like you could see it visually before or you could describe it in words but it's like you could never get the vibe. And now like the vibe is really easy.
[`56:53`](https://youtu.be/IDSAMqip6ms?t=3413) Yeah. And you built plan mode like three times.
[`56:55`](https://youtu.be/IDSAMqip6ms?t=3415) Yeah. Yeah.
[`56:56`](https://youtu.be/IDSAMqip6ms?t=3416) Because of this.
[`56:57`](https://youtu.be/IDSAMqip6ms?t=3417) Like you you built it and then you threw it out and rebuilt it and then threw it out and rebuilt it.
[`57:02`](https://youtu.be/IDSAMqip6ms?t=3422) Yeah. Or like Tudos's uh like Sid built the original version like also like three or four he built like three or four prototypes and then I prototype maybe like 20 versions after that like in like a day. Yeah. I think this is like a lot of pretty much everything we released there was at least a few prototypes behind it. How do you like um keep track of and carry forward the things you learn from prototype to prototype? And especially if it's like, you know, some one person is prototyping it and then you're like, I'm going to take it over. I'm going to do 20 more. Like how do you how do you maximize what you get out of that? You know, it's it's like there there's maybe a few elements of it. One is the style guide. So there's like some elements of style that we discover. And I think a lot of this is like building for the terminal or like we're kind of discovering a new design language for for the terminal and kind of building it as we go. Um, and I think some of this you can codify in a style guide. So this is our quad MD, but then there's this other part of it that's like kind of product sense where I don't think the model totally gets it yet. And I think maybe we should be trying to find ways to like teach the model this this kind of product sense about like this works and this doesn't, right? Because in in product, you want to solve the person's problem in the simplest way possible and then delete everything else that's not that and just get everything out of the way. So you kind of you you align the product to the intent as cleanly as possible. And maybe the model doesn't totally get that yet.
[`58:24`](https://youtu.be/IDSAMqip6ms?t=3504) Yeah. It's never it doesn't really feel what it's like to use quad code. Like the model doesn't use quad code.
[`58:31`](https://youtu.be/IDSAMqip6ms?t=3511) Yeah. Yeah. And so I think like when you know like quad code can like test itself and it can kind of use itself. Um and like we we do this when developing and it can see like UI bugs and things like that. I don't know maybe we should just try prompting it though. It could like honestly a lot of the stuff is as simple as that. Like when there's some new idea usually you just prompt it and often it just works. Maybe we should just try that.
[`58:57`](https://youtu.be/IDSAMqip6ms?t=3537) A lot of the prototypes are actually the UX interactions. Um, and so I think once we discover a new UX interaction like shift tab for auto accept, I think uh Boris figured out. Um, then
[`59:11`](https://youtu.be/IDSAMqip6ms?t=3551) that was Eigor actually.
[`59:12`](https://youtu.be/IDSAMqip6ms?t=3552) Oh, Eigor.
[`59:13`](https://youtu.be/IDSAMqip6ms?t=3553) Yeah, we went back and forth can like fit into that.
[`59:16`](https://youtu.be/IDSAMqip6ms?t=3556) We did like dueling prototypes for like a week. [laughter]
[`59:20`](https://youtu.be/IDSAMqip6ms?t=3560) Yeah, shift tab felt really nice. And then one of the the now current plan mode iteration um uses shift tab because it's actually just like another way to tell the model how agentic it should be.
[`59:35`](https://youtu.be/IDSAMqip6ms?t=3575) And so I think as as more features use the same uh interaction, you form like a stronger mental model for what should go where.
[`59:42`](https://youtu.be/IDSAMqip6ms?t=3582) Yeah. Or like thinking I think is another really good one. Like first we were like before we released quad code or maybe it was like the first thinking model was it like 37? I forget what the first one was.
[`59:51`](https://youtu.be/IDSAMqip6ms?t=3591) Yeah.
[`59:54`](https://youtu.be/IDSAMqip6ms?t=3594) But yeah and it it was like it was able to think and we're like brainstorming like how do we like toggle thinking? And then someone was just like what if you just like ask the model to think in natural language and it knows how to think and we're like okay sweet let's do that. [laughter] And so like we we did that for a while and then um we realized that people were accidentally toggling it. So they were like don't think and then the model was like oh I should think. it just started thinking
[`1:00:15`](https://youtu.be/IDSAMqip6ms?t=3615) and so we had to kind of like tune it out so you know don't think didn't trigger it but then it still wasn't obvious but then we made a UX improvement to like highlight the thinking that
[`1:00:23`](https://youtu.be/IDSAMqip6ms?t=3623) and that was like that was so fun and it felt really magical
[`1:00:25`](https://youtu.be/IDSAMqip6ms?t=3625) when you do ultra think it's like rainbow or whatever exactly [laughter] and then with uh with sonet 45 we actually find like a really really big performance improvement when you turn on extended thinking um and so uh we made it really easy to toggle it because sometimes you want it sometimes you because you you kind of for a really simple task, you don't want the model to think for like five minutes. You want it to just do the thing. And so we used tab as the interaction to toggle it. And then we unchipped a bunch of the thinking words. Although I I think we kept ultra think just for like sentimental reasons. [laughter] It was such a cool UX.
[`1:01:02`](https://youtu.be/IDSAMqip6ms?t=3662) Interesting. Do you think there's some there's some new metric that's about what you deleted? And I I think programmers have always felt like, you know, deleting a bunch of code feels really good, but there's something about because you can build stuff so fast, it becomes more important to like also delete stuff. I think my favorite kind of diff to see is a red diff. [laughter] This is the best whenever I'm like, "Yeah, bring it on. Another one. Another one." Um, but it, you know, but it's hard because like anything you ship, people are using it. And so you got to keep people happy. And so I think generally our principle is if we un ship something, we need to ship something even better. um that can kind of um that people can can take advantage of that that kind of matches that intent uh even better. Um and yeah, I think this is kind of back to like how do you measure like quad code and the impact of it and this is something like every company every customer asks us about and I think like in so internally at anthropic I think we like doubled in size since January or something like that but then productivity per engineer has increased like almost 70% in that time. um
[`1:02:01`](https://youtu.be/IDSAMqip6ms?t=3721) measured by
[`1:02:03`](https://youtu.be/IDSAMqip6ms?t=3723) uh I think we actually measured it yeah in a few ways but kind of PRs are the the simplest one and the main one um but like you said like this doesn't capture the full extent of it because a lot of this is like making it easier to prototype making it easier to try new things making it easier to these things that you never would have tried because they're way below the cut line. Um you're launching a feature and there's this kind of like wish list of stuff now you just do all it because it's so easy
[`1:02:25`](https://youtu.be/IDSAMqip6ms?t=3745) and you just wouldn't have done it.
[`1:02:27`](https://youtu.be/IDSAMqip6ms?t=3747) So yeah, it's really hard to talk about it. And then there's this flip side of it where more code is written. So you have to delete more code. You have to code review more carefully and you know automate automate code review as much as you can. There's also like an interesting like new product management challenge because you can ship so much that you end up it it ends up not feeling as cohesive because you could just like add button here and like a tab there and like a little thing here. Like it's just it's much easier to build a product that has all the features you want but doesn't have any sort of organizing principle because you're just shipping lots of stuff all the time. I think we try to be pretty disciplined about this and making sure that all the abstractions are really easy to understand for someone even if they just hear the name of the feature. We have this principle that I believe Boris brought to the team that I really like where we don't want a new user experience. Everything should be so intuitive that you just drop in and it just works. And I think that's that's really set the bar really high for making sure every feature is really intuitive. How do you do that with um a conversational UI? Because um you know when there's not a bunch of but buttons and knobs and it's just a blank text box to start, how do you think about making it intuitive?
[`1:03:37`](https://youtu.be/IDSAMqip6ms?t=3817) Um there's a lot of like little things that we do like um we teach people that they can use the question mark to see tips. Um we show tips as quad code is working. We have like the change log on the side. um we tell you about like oh there's a new model that's out or like we show you at the bottom we have a notification section for thinking. I think there's just like subtle ways in which we tell users about features. I think the other thing that's really important is to just make sure that all the primitives are very clearly defined like hooks have a common meaning um in the developer ecosystem. plugins have a very common meaning in the developer ecosystem and just making sure that what we build matches what like the you know the average developer would immediately think of when they hear that
[`1:04:25`](https://youtu.be/IDSAMqip6ms?t=3865) there there's this also this like progressive disclosure thing like you know to to any anytime in quad code when you run it you can hit control O to see like the full raw transcript the same thing the model sees and we don't like show you this until it's actually relevant so when there's a tool result that's collapsed then we'll say use control O to see it so we kind of we don't want to put too much complexity on you at the start because this thing can do you know anything. Um I think there's this other kind of new principle which we've just started exploring which is like the model teaches you how to use the thing and so you can ask quad code about itself and it it kind of knows to look up its own documentation to tell you about it but we can also go even deeper like for example slash commands are a thing that people can use but also the model can call slash commands and maybe you see the model calling it and then you'll be like oh yeah I guess I can do that too.
[`1:05:13`](https://youtu.be/IDSAMqip6ms?t=3913) Yeah. Yeah. Yeah. Interesting. How has it changed like you know when you first started doing this cloud code was this sort of like singular thing this singular way of thinking about you know using AI through a CLI other people had stuff like this but it it felt like this shift and now there's a whole landscape of everyone is like going CLI CLI CLI like how has that changed how you think about building how it feels to build and how are you dealing with the sort of pressure of the race that you're in? I
[`1:05:39`](https://youtu.be/IDSAMqip6ms?t=3939) think for for me like imitation is the greatest flattery. Mhm. Um, so it's like it, you know, it's it's awesome and it's just like it's cool to see all this other stuff that everyone else is building like inspired by this. And I think this is ultimately the goal is to kind of inspire people to build this next thing for this just incredible technology that's that's coming. And that that's just really exciting. Personally, I don't really use a lot of other tools. So, usually when something new comes out, I'll I'll maybe just try it to get a vibe. Um, but otherwise [snorts] I think we're pretty focused on just solving problems that we have and our customers have and kind of building the next thing.
[`1:06:15`](https://youtu.be/IDSAMqip6ms?t=3975) Cool. Sweet. Um, I I loved this part of the interview, too. [laughter]
[`1:06:22`](https://youtu.be/IDSAMqip6ms?t=3982) Did we answer all of your team's questions?
[`1:06:23`](https://youtu.be/IDSAMqip6ms?t=3983) Questions?
[`1:06:25`](https://youtu.be/IDSAMqip6ms?t=3985) Do Oh, did we get through all my team's questions? Let's see. Uh, I think we did. Um, uh, I'm curious also how you would answer like the unshipping question cuz also like if you're doing this kind of like AIdriven development, you ship a lot. You have a small team, so it's a lot of operational load.
[`1:06:43`](https://youtu.be/IDSAMqip6ms?t=4003) The reason I asked that is because I don't think we do a good job of that. Um, and I have this feeling that some of the products are like a little bit messy because of that. And I think particularly for Kora, um there's just a big product surface area and it can do a lot of different things like it we have an email assistant so you can ask it like you know uh tell me about the trip I'm taking and it'll go through all your emails and you know summarize the the trip. Um or we have this feature that it automatically archives any email that you don't need to respond to immediately. Um, and then twice a day you get a brief that summarizes all the stuff that you probably need to see but you don't need to like actually do anything with and you just scroll through it and you're done. Um, and there's just like all this there's all this complexity that around you know for example how are emails categorized? So now we have a whole view of we have all these categorization rules and you can order them and whatever, but like it's just complicated and hard to communicate and and uh and I want to retain a lot of the like all the power and flexibility, but also you can't look at a screen and be like I have no idea what's going on. This is like way too complicated. So that's I'm just like I'm processing all that stuff. So the the kind of like deletion, you know, un unshipping idea feels like an interesting um cultural principle that we haven't really explored.
[`1:08:14`](https://youtu.be/IDSAMqip6ms?t=4094) Yeah, it's really hard. I think there's like a social cost to it, too, where like you kind of want to be the person who tells your coworker to unship their thing. [laughter]
[`1:08:24`](https://youtu.be/IDSAMqip6ms?t=4104) It's definitely tricky. It's more than just the code. Yeah, I I definitely learned this at Instagram honestly cuz I I think Facebook does a terrible job at unshipping and we had this problem where
[`1:08:34`](https://youtu.be/IDSAMqip6ms?t=4114) every time we I think even like unshipping pokes was like really spicy cuz there's a bunch of these like old-timers. They're like, "No pokes, you're never going to take it away." But like if you look at the data, no one really uses it anymore.
[`1:08:44`](https://youtu.be/IDSAMqip6ms?t=4124) But for sentimental reasons, they were kind of tied to it.
[`1:08:47`](https://youtu.be/IDSAMqip6ms?t=4127) And so like for Facebook, it always maybe nothing ever got unchipped. It always got moved to like a secondary place like a, you know, like an overflow menu somewhere that no one looks at, like a graveyard.
[`1:08:55`](https://youtu.be/IDSAMqip6ms?t=4135) Yeah.
[`1:08:57`](https://youtu.be/IDSAMqip6ms?t=4137) And I think Instagram was just very principled. There was like, you know, very strong in a product and design point of view that was like, if this thing isn't used by like half of people, you know, 50% of WOW or whatever, we're just going to delete it and deal with it and then we'll figure out some next thing that's used by more people.
[`1:09:13`](https://youtu.be/IDSAMqip6ms?t=4153) I love it. Um, well, thank you. This was amazing. I'm really uh glad I got to talk to you and uh keep building.
[`1:09:18`](https://youtu.be/IDSAMqip6ms?t=4158) Thank you for having us.
[`1:09:29`](https://youtu.be/IDSAMqip6ms?t=4169) Oh my gosh, folks. You absolutely positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure unadulterated knowledge bombs about chat GPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat, craving for more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. [snorts] So, do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say, Dan, I'm
---
## Sources
- [The Secrets of Claude Code From the Engineers Who Built It — Every — YouTube](https://youtu.be/IDSAMqip6ms)
- [Every](https://every.to/)