I just watched the author of this feature and blog post give a talk at the DataCouncil conference in Oakland, and it is obvious what a huge amount of craft, ingenuity, and care went into building it. Congratulations to Hamilton and the MotherDuck team for an awesome launch!
Woohoo! Glad you noticed that. Hamilton is amazing.
I really like DuckDB's notebooks for exploration and this feature makes them even more awesome, but the fact that I can't share, export, or commit them into a git repo feels extremely limiting. It's neat-ish that it dogfoods and stores them in a DuckDB database. It even seems to store historical versions, but I can't really do anything with it.
In DuckDB UI and MotherDuck.
Awesome video of the feature: https://youtu.be/aFDUlyeMBc8
Disclaimer: I’m a co-founder at MotherDuck.
I hope this doesn't work with DELETE queries.
Maybe in the next version they could also implement support for DROP, with autocorrect for the nearest (not yet dropped) table name.
LLM-powered queries that run in agent mode, so it can answer questions about your data before you know what to ask.
That's actually not a bad idea, to have LLM autocomplete when you write queries, especially if you first add a comment at the top saying what you want to achieve:
-- Select all orders for users registered in the last year, and compute average earnings per user
SELECT ...
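To make that concrete, the kind of query an LLM might fill in from that comment could look something like this (table and column names are made up):

  SELECT u.user_id,
         avg(o.amount) AS avg_earnings
  FROM users u
  JOIN orders o ON o.user_id = u.user_id
  WHERE u.registered_at >= current_date - INTERVAL 1 YEAR
  GROUP BY u.user_id;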
DELETED 0 rows. Did you mean `where 1=1`? (click accept to re-run with new where clause)
For clarity: Instant SQL won't automatically run queries that write or delete data or metadata. It only runs queries that read data.
Can't it just run inside a transaction that isn't committed?
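For illustration only (not how Instant SQL is described as working), the idea would be something like wrapping the statement in a transaction that never commits; table and column names are made up:

  BEGIN TRANSACTION;
  DELETE FROM users WHERE last_login < DATE '2020-01-01';
  SELECT count(*) FROM users;  -- inspect the effect
  ROLLBACK;                    -- nothing is persisted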
Young Bobby Tables at it again.
ROFL
ROFL FROM jokes WHERE thats_a_new_one;
CTE inspection is amazing. I spend too much time doing that manually.
Me too (author of the post here). In fact, I was watching a seasoned data engineer at MotherDuck show me how they would attempt to debug a regex in a CTE. As a longtime SQL user, I felt the pain immediately; haven't we all been there before? Instant SQL followed from that.
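A toy example of the kind of query being described, where you would want to see what the regex inside the CTE produces before the outer query runs (names made up):

  WITH parsed AS (
    SELECT raw_line,
           regexp_extract(raw_line, '(\d{4})-\d{2}-\d{2}', 1) AS year
    FROM logs
  )
  SELECT year, count(*) FROM parsed GROUP BY year;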
First time seeing the FROM at the top of the query, and I'm not sure how I feel about it. It seems useful, but I'm so used to select...from.
I'm assuming it's more of a user preference, like putting commas in front of the field instead of after it?
You can use any valid DuckDB syntax you want! I prefer to put FROM first just because I think it's better, but Instant SQL works with traditional select __ from __ queries.
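For anyone who hasn't seen it, DuckDB accepts a FROM-first form alongside the traditional one; for example (table name made up):

  -- traditional
  SELECT city, fare FROM trips;

  -- FROM-first, also accepted by DuckDB
  FROM trips SELECT city, fare;

  -- SELECT can even be omitted entirely (implies SELECT *)
  FROM trips;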
Yes, it comes from a desire to impose intuition from other contexts onto something instead of building intuition with that thing.
SQL is a declarative language. The ordering of the statements was carefully thought through.
I will say it's harmless, though: the clauses don't depend on each other for meaning, so it's fine to allow them to be reordered without changing the meaning of the query. But that's true of lots and lots of things in programming, and just having a convention is usually better than allowing anything.
For example, you could totally allow this to be legal:
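  def
  for x in whatever:
      print(x)
  print_whatever(whatever):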
There's nothing ambiguous about it, but why? If you are used to seeing it one way it just makes it more confusing to read, and if you aren't used to seeing it the normal way, you should at least somewhat master something before you try to improve it through cosmetic tweaks.
I think you see this all the time: people try to impose their own comfort onto things for no actual improvement.
A fun function in DuckDB (which I think they're using here) is `json_serialize_sql`. It returns a JSON AST of the SQL.
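A quick sketch of what that looks like (the json extension needs to be available):

  SELECT json_serialize_sql('SELECT a FROM tbl WHERE b > 1');
  -- returns a JSON document describing the statement's parse tree;
  -- it can be turned back into SQL text with json_deserialize_sql(...)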
Indeed, we are! We worked with DuckDB Labs to add the query_location information, which we're also enriching with the tokenizer to draw a path through the AST to the cursor location. I've been wanting to do this since forever, and now that we have it, there's actually a long tail of inspection / debugging / enrichment features we can add to our SQL editor.
This is a very cool feature. I don't know how useful it is or how I'd use it right now but I think I am going to get into some benchmarking and performance tweaking soon and this could be handy.
It's neat, but the CTE selection bit errors out more often than not and erroneously selects more than the current CTE.
Can you say more? Where does it error out? Sounds like a bug; if you could post an example query, I bet we can fix that.
Will this be available in duckdb -ui?
Are MotherDuck's editor features available on-prem? My understanding is that MotherDuck is a data warehouse SaaS.
It is already available in the local DuckDB UI! Let us know what you think!
-Customer software engineer at MotherDuck
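For anyone looking for it, the local UI can be launched roughly like this in recent DuckDB versions with the ui extension (check the docs for your version):

  INSTALL ui;
  LOAD ui;
  CALL start_ui();
  -- or straight from the shell: duckdb -ui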
Does local DuckDB UI work without an internet connection?
Delete From dbo.users w...
(129304 rows affected)
The blog specifically says that they're getting the SQL AST so presumably they would not execute something like a DELETE.
Correct. We only enable fast previews for SELECT statements, which is the actual hard problem. That said, at some point we're likely to also add support for previewing a CTAS before you actually run it.
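For anyone unfamiliar, a CTAS is a CREATE TABLE AS SELECT statement like the one below (names made up); previewing it would presumably mean running just the inner SELECT without creating the table:

  CREATE TABLE daily_totals AS
  SELECT order_date, sum(amount) AS total
  FROM orders
  GROUP BY order_date;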
Cool. Now, there's this thing called a joke...
This is such a bizarre feature.
On first glance possibly, on second glance not at all.
First, repeated data-analyst queries are a big usage driver in SQL databases: think iterating on the code and executing it again.
Another huge factor in the same vein is running dev pipelines on limited data to validate that a change works when modelling complex data.
This is currently a front-end feature, but underneath lies effective caching.
The underlying tech drives down usage cost, which is a big thing for data practitioners.
What about it is bizarre?
Please finally add q language with proper integration to your tables so that our precious q-SQL is available there. Stop reinventing the wheel, let's at least catch up to the previous generation (in terms of convenience). Make the final step.
What is q-SQL?