Confidence scoring and skill hardening with Codex

Posted Jun 18, 2026 | Views 88

# Codex

# champions

# AI Techniques


SUMMARY

Confidence checks, human review, and feedback help teams decide what to trust, what to remove, and how to improve a skill over time. Bryant explains how he uses confidence gaps and reviewer feedback to harden skills.


TRANSCRIPT

1 00:00:00.000 --> 00:00:05.000 Kenna Valdez: Great, and what was that confidence score?

2 00:00:05.000 --> 00:00:09.286 Bryant McCombs: That's something that I've just kind of built out on my own. So

3 00:00:09.286 --> 00:00:13.571 over time I've kind of developed my own, just like, confidence score for any output

4 00:00:13.571 --> 00:00:17.857 from either ChatGPT or Codex. But it's basically: if you, as in Codex,

5 00:00:17.857 --> 00:00:22.143 or you, as in ChatGPT, are not one hundred percent confident in the answer, just

6 00:00:22.143 --> 00:00:26.429 tell me why. So give me a better sense of, okay, these are the things that

7 00:00:26.429 --> 00:00:30.714 would give me one hundred percent confidence. If it can just go look those

8 00:00:30.714 --> 00:00:35.000 things up, I'll have a follow-up prompt that says, okay, great, look those

9 00:00:35.000 --> 00:00:39.286 things up to make yourself one hundred percent confident. Or if it's external-facing

10 00:00:39.286 --> 00:00:43.571 communication or internal-facing communication that I need to be one hundred

11 00:00:43.571 --> 00:00:47.857 percent accurate, I'll say, just remove anything where you don't have one hundred

12 00:00:47.857 --> 00:00:52.143 percent confidence until you get to one hundred percent confidence. But yeah, it's

13 00:00:52.143 --> 00:00:56.429 basically, it's not like a super quantitative metric necessarily. It's more

14 00:00:56.429 --> 00:01:00.714 so, okay, did you pull in every necessary resource? And then, if not,

15 00:01:00.714 --> 00:01:05.000 what assumptions are you making, so that I could remove those assumptions?
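
The confidence check described above could be sketched in Python roughly as follows. This is a minimal illustration, not Bryant's actual prompts or tooling: the prompt wording, the `Gap` type, the `follow_up_prompt` function, and the order in which the cases are checked are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical wording: the talk describes the idea, not the exact prompt text.
CONFIDENCE_CHECK = (
    "If you are not 100% confident in this answer, tell me why. "
    "List every resource you did not pull in and every assumption you are making."
)


@dataclass
class Gap:
    """One reason the model gave for being less than fully confident."""
    description: str
    can_look_up: bool  # True if the model could resolve it by checking a source


def follow_up_prompt(gaps: list[Gap], must_be_accurate: bool) -> str:
    """Choose a follow-up prompt for the reported confidence gaps."""
    if not gaps:
        return "No gaps reported; proceed to human review."
    if must_be_accurate:
        # Communication that has to be accurate: cut unverified content.
        return "Remove anything where you do not have 100% confidence."
    lookups = [g.description for g in gaps if g.can_look_up]
    if lookups:
        # Gaps the model can close on its own by checking sources.
        return "Look these up to make yourself 100% confident: " + "; ".join(lookups)
    # Nothing can be looked up, so surface the assumptions for the reviewer.
    return "State the assumptions you are making so I can remove them."
```

In use, `CONFIDENCE_CHECK` would be sent after the first answer, the reply read into `Gap` entries by the reviewer, and `follow_up_prompt` used to pick the next message.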

16 00:01:05.000 --> 00:01:09.429 Kenna Valdez: Yeah, and that actually ties into the last question that I have

17 00:01:09.429 --> 00:01:13.857 in the chat, and then we'll wrap things up. In terms of what exactly you review

18 00:01:13.857 --> 00:01:18.286 and approve, have you ever thought about, and this sounds like, in part, the

19 00:01:18.286 --> 00:01:22.714 confidence score, describing what steps you take to come to a judgment, and then

20 00:01:22.714 --> 00:01:27.143 briefing Codex on that exact approach? So where do you kind of draw

21 00:01:27.143 --> 00:01:31.571 that line between what the confidence score can tell you and where you want to stop

22 00:01:31.571 --> 00:01:36.000 and really have that human review point?

23 00:01:36.000 --> 00:01:40.500 Bryant McCombs: No, I love that. So internally, a lot of folks will call this

24 00:01:40.500 --> 00:01:45.000 skill hardening. So if you have a skill and it's working well, like, eighty-five

25 00:01:45.000 --> 00:01:49.500 percent of the time, right, I think what a lot of folks would expect is that, you

26 00:01:49.500 --> 00:01:54.000 know, just kind of leave it as is, and you get comfortable with that eighty-five

27 00:01:54.000 --> 00:01:58.500 percent, right? But I think, as we all kind of become builders, like as knowledge

28 00:01:58.500 --> 00:02:03.000 workers start to become builders, it's going to become incumbent upon us to have

29 00:02:03.000 --> 00:02:07.500 more of kind of an iterative mindset when it comes to how we build these things,

30 00:02:07.500 --> 00:02:12.000 right? So what I'll typically do is I'll just say, okay, great, this is a great output.

31 00:02:12.000 --> 00:02:16.500 In the future, here are three or four things that I would like you to kind of

32 00:02:16.500 --> 00:02:21.000 consider as a part of this skill going forward. And I'm just doing that over and

33 00:02:21.000 --> 00:02:25.500 over again. And then I found that the fidelity of these use cases, or of these

34 00:02:25.500 --> 00:02:30.000 skills, over time has just significantly increased. So if you're willing to just

35 00:02:30.000 --> 00:02:34.500 kind of put in the work with some of these skills and say, I'll give an example.

36 00:02:34.500 --> 00:02:39.000 I have, like, an email triage skill, right, that responds to all of my emails every

37 00:02:39.000 --> 00:02:43.500 morning, or at least creates drafts for all of my emails every morning. And I'd

38 00:02:43.500 --> 00:02:48.000 say, in the beginning I was editing a lot of these drafts, right? Like I was like,

39 00:02:48.000 --> 00:02:52.500 "that's not how I sound," or like, "that's not how I'd respond to this particular situation,"

40 00:02:52.500 --> 00:02:57.000 et cetera, or it was being, like, overly formal. If it were like, you
