Discussion about this post

User's avatar
Daisy | tinydoozy's avatar

Interesting read! The distinction between models getting better at surface-level policy compliance versus still finding sophisticated ways to override or reinterpret the intent felt particularly sharp. It’s making me reconsider some assumptions as I scope a small hackathon project. Thanks for writing it.

Paul Gibbons's avatar

This is frontier thinking. Thanks!

1 more comment...

No posts

Ready for more?