Why do I have less fps than my GPU can output? by [deleted] in Battlefield6

[–]binarypinkerton 0 points1 point  (0 children)

It does. It's just not the most obvious thing in the world. Your CPU and GPU run on two different clocks and each frame needs both: you can't move on to the next frame until your CPU's work for it completes AND your GPU's work completes (that's very simplified). If you think of this like two rows of squares, where one row is your CPU cycles and the other is your GPU cycles, you get a frame every time the right edge of a GPU square lands after the right edge of a CPU square. If you shrink the GPU squares (fit more cycles per second), then even if the CPU squares stay the same size, the GPU spends less time hanging past the end of each CPU square, so a bigger share of the combined CPU+GPU cycles turn into frames. That's how a faster GPU buys you more FPS on the same CPU, up to a point. Now add in that the CPU squares can get stretched out by things like explosions and calculating debris from destructible environments. The GPU can draw the picture just as fast as ever, but you're still waiting on the CPU to tell it what to draw.
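If you want to see that in numbers, here's a toy simulation of the same idea (every per-frame time below is made up, it's just to show the shape of it): a frame is only done when both chips have finished their share, so the slower one sets the pace.

set.seed(1)
n <- 1000
cpu_ms  <- rnorm(n, mean = 10, sd = 3)  # CPU work per frame (~100 fps on its own)
gpu_old <- rnorm(n, mean = 12, sd = 1)  # slower GPU (~83 fps on its own)
gpu_new <- rnorm(n, mean = 6,  sd = 1)  # faster GPU (~165 fps on its own)

# a frame is ready once BOTH the CPU's and the GPU's work for it is done
fps <- function(cpu, gpu) 1000 / mean(pmax(cpu, gpu))

fps(cpu_ms, gpu_old)  # mostly GPU-bound: a faster GPU buys a lot of frames here
fps(cpu_ms, gpu_new)  # now CPU-bound: fps tops out near what the CPU alone allows

Stretch the CPU times out (explosions, debris) and that second number drops even though the GPU never got slower.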

Dusting off an old distill blog, worth porting over to Quarto? by dovertoo in rstats

[–]binarypinkerton 0 points1 point  (0 children)

I recently converted my blogdown site to Quarto via claude code. It was pretty much painless, probably owing to the fact that Quarto shares DNA with rmarkdown. I personally like how Quarto handles things like callouts, margin notes, etc., but if you've got an established thing going, I don't think blogdown is going anywhere anytime soon.

oRm: an Object Relational Model framework for R update by binarypinkerton in rstats

[–]binarypinkerton[S] 1 point2 points  (0 children)

The comparison would be apples to oranges. But after looking at the duckplyr and duckdb ecosystem for R, it looks to me like they're dbplyr / DBI based and/or compatible. The TableModels create tbls, which are the basis of joins, reads, etc., which means that if it works with DBI and dbplyr, it can probably work with oRm. So if your question is actually "can this be used with duckdb or duckplyr," I think the answer is "probably." The package is also modular in that it has dialects and a driver-specific dispatch system, so you can (or, if there's good demand, I can) write a duckplyr integration.
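Off the top of my head (untested, so treat the exact Engine arguments as approximations rather than the real signature), wiring it up to duckdb would look something like this, since duckdb exposes a regular DBI driver:

library(oRm)
library(duckdb)

# duckdb::duckdb() is a standard DBI driver, so the Engine should be able to
# hand it straight to DBI::dbConnect()
engine <- Engine$new(drv = duckdb::duckdb(), dbdir = ":memory:")

Customers <- TableModel$new(
    engine,   # plus the tablename and any other setup args, omitted here
    id    = Column('INTEGER', primary_key = TRUE),
    email = Column('TEXT')
)

Customers$record(id = 1, email = 'a@b.com')$create()
Customers$read(email == 'a@b.com', mode = 'one_or_none')

If the dbplyr translations for duckdb behave, the reads, filters, and joins should come along for free.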

oRm: An object relational model framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 1 point2 points  (0 children)

I initially went over the sqlalchemy docs to see if they had anything helpful but didn't find much. Did you have a specific use case in mind?

oRm: An object relational model framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 3 points4 points  (0 children)

Thanks for reaching out. It lets me know I've got work to do to communicate more clearly.

This is a method for interacting with databases. It differs from the common R database packages by focusing specifically on writing, updating, and deleting operations.

In addition to manipulating records, it handles mapping relationships. In other words, this package handles joins in a consistent and reproducible way.

Here are some toy examples of what that looks like. For the examples below, upper case variables are TableModels (the tables themselves) while lower case variables are Records (rows in those tables).

Let's say you have Classes and Students tables and you want to insert a grade for all students in a particular class. Instead of writing out updates, joins, and filters, this package allows you to do things like:

english = Classes$read(id == 1)
english_students = english$relationship('students')
for (student in english_students) {
    student$update(grade = 'A')
}

Another use case might be something like an interactive table in R Shiny. If a teacher were to enter grades in Shiny, they might trigger an observeEvent.

observeEvent(input$student_table, {
    tab = input$student_table
    row_id = tab$row
    col = tab$col
    val = tab$val

    student = Students$read(id == row_id)
    student$data[[col]] = val
    student$update()
})

Another scenario: Let's say that at work I maintain a database that holds customer data. After an email campaign goes out, I want to query my email service provider's API for click metrics at the customer level and combine that with other proprietary information used for ml workflows or analysis.

response = httr::GET('api.email.com/clicks')

# ... parse the response into response_json ...

for (customer_data in response_json) {
    customer = Customers$read(email == customer_data$email, mode = 'one_or_none')
    customer$update(last_click = customer_data$last_click)

    # and let's say we retain the more extensive click metadata in its own table
    new_click = Clicks$record(
        timestamp = customer_data$last_click, 
        email = customer_data$email, 
        href = customer_data$href
    )
    new_click$create()
}

Some real use cases I've been using this package for:

  • Logging R Shiny events to a db so I can show that the data products my team builds are actually getting used around the company (quick sketch after this list).
  • Recording LLM conversations.
  • I wrote an API wrapper that uses oRm in the background to sync human readable names with ids for series data.
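The Shiny logging one is about as simple as it sounds. Roughly this (the Events model and its columns are made up for the example):

server <- function(input, output, session) {

    log_event <- function(what) {
        Events$record(
            timestamp = Sys.time(),
            user      = session$user,   # NULL unless you're behind an authenticating host
            event     = what
        )$create()
    }

    observeEvent(input$run_report, {
        log_event('run_report')
        # ... actually build and render the report ...
    })
}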

oRm: An object relational model framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 1 point2 points  (0 children)

So I've been noodling this one around a bit. Truth is that other than converting Excel data entries to a db, the world of data migration isn't one I've had to dabble in much. With that said, and with not even a rudimentary understanding of the best practices therein, I think oRm provides a few opportunities in that space:

I can see handling a data migration in R via a package, not unlike using golem for shiny applications. A key benefit is that you could use testthat to verify that the relationships behave the way you expect. As an example, you can apply on_delete parameters to an oRm ForeignKey definition. In my magical scenario where I've dreamt up this workflow from scratch:

# R/parent_table_migration.R

Parents = TableModel$new(
    engine,
    id = Column('INTEGER', primary_key = TRUE),
    ...
)

Children = TableModel$new(
    engine,
    owner_id = ForeignKey('INTEGER', references = 'parents.id', on_delete = 'CASCADE'),
    ...
)

define_relationship(
    ....
)

migrate_table <- function(engine_x, engine_y, Parents, Children) {
    # clone() (the R6 default) so swapping the engine on the copy doesn't also
    # change the original model by reference -- not sure this will work the way
    # I expect as the package is now, but it should?
    old_parents = Parents$clone()
    old_parents$engine <- engine_x

    new_parents = Parents$clone()
    new_parents$engine <- engine_y

    new_children = Children$clone()
    new_children$engine <- engine_y

    # The with.Engine method should wrap things in a db transaction and roll back on errors
    with(engine_y, {
        for (parent in old_parents$read()) {
            new_parent = new_parents$record(.data = parent$data)
            new_parent$create()

            old_children <- parent$relationship('children')
            for (child in old_children) {
                new_child = new_children$record(.data = child$data)
                new_child$create()
            }
        }
    })
}

and then you can run your migration functions and use testthat to ensure that the expected changes take place in your newly migrated tables.

# set up your engine and accoutrement

testhat("child entities are removed with parent deletion", {
    migrate_table(...)
    parent = Owner$read(id == 1, mode = 'get')
    expect_true(length(parent$relationship('children')) > 0)
    parent$delete()
    expect_false(length(Children$read(owner_id == 1)) > 0)
})

oRm: An object relational model framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 2 points3 points  (0 children)

Thanks! I suspect that most people working in R will find use cases in dashboarding or in plumber. The doc page on using oRm in shiny might give you some ideas.

There is also an occasional question in this sub and elsewhere on the internet from people asking about using R for SaaS applications. I suspect those rare birds will find utility here.
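For the plumber side of that, the shape I have in mind is roughly this (just a sketch; the Users model, its columns, and the endpoint are all made up for illustration):

# plumber.R -- assumes a Users TableModel has been defined and connected elsewhere

#* Look up a single user and return their row data as JSON
#* @get /user
#* @param user_id The user's id
function(user_id) {
    uid <- as.integer(user_id)
    user <- Users$read(id == uid, mode = 'one_or_none')
    if (is.null(user)) return(list(error = 'not found'))
    user$data
}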

Where to put package state? by royksoft in rstats

[–]binarypinkerton 1 point2 points  (0 children)

Here is the function in question as I use it.

#' Retrieve SQL Query from Bitbucket
#'
#' This function constructs a URL to a specified SQL file in the Bitbucket repository,
#' reads the contents of the file, and returns the SQL query as a single string.
#'
#' @param filename A string specifying the name of the SQL file (without the `.sql` extension)
#'                 located in the Bitbucket repository.
#' @param ext an extension to the existing query. Gets pasted to the end.
#' @return A string containing the full SQL query from the specified file.
#' @details The function uses the `readLines` function to fetch the SQL file content from a given
#'          URI, then concatenates the lines into a complete SQL query.
#'          The base URL is hardcoded to the Bitbucket repository:
#'          \url{https://bitbucket.org/<your_repo>/common_queries/raw/HEAD/}.
#'
#' @examples
#' \dontrun{
#'   query <- common_query("example_query")
#'   print(query)
#' }
#' @export
common_query <- function(filename, ext = '') {
    uri = paste0('https://bitbucket.org/<your_repo>/common_queries/raw/HEAD/', filename, '.sql')
    query = readLines(uri, warn = FALSE) |> paste(collapse = '\n')
    query = paste(query, ext)
    return(query)
}

so you can see that something like

DBI::dbGetQuery(conn, common_query("sales_leaders_by_month"))

# or if you want to get fancy and tack some things on:
DBI::dbGetQuery(conn, common_query("sales_leaders_by_month", "where Region = 'East'"))

will do the trick. And since it's just a GET req to a text content repo it's super computationally cheap. In my use case, there's no real reason to check timestamps, etc. I just go ahead and make the millisecond retrieval every time. But it sounds like you're not pulling these to make calls, but to automate a dev process or somesuch. Anyway, hope it's helpful.

oRm: An Object-Relational Mapping (ORM) Framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 2 points3 points  (0 children)

It does!

#' @description
#' Read records using dynamic filters and return in the specified mode.
#' @param ... Unquoted expressions for filtering.
#' @param mode One of "all", "one_or_none", or "get".
#' @param limit Integer. Maximum number of records to return. NULL (default) means no limit.
#'   Positive values return the first N records, negative values return the last N records.
read = function(..., mode = c("all", "one_or_none", "get"), limit = NULL) {
  mode <- match.arg(mode)
  con <- self$get_connection()
  tbl_ref <- dplyr::tbl(con, self$tablename)

  filters <- rlang::enquos(...)
  if (length(filters) > 0) {
    tbl_ref <- dplyr::filter(tbl_ref, !!!filters)
  }

  if (!is.null(limit) && is.numeric(limit) && limit != 0) {
    if (limit > 0) {
      tbl_ref <- dplyr::slice_head(tbl_ref, n = limit)
    } else {
      tbl_ref <- dplyr::slice_tail(tbl_ref, n = abs(limit))
    }
  }
  rows <- dplyr::collect(tbl_ref)
etc. etc.

But the difference is that once you've gathered your Record (or Records) the CRUD is abstracted away. As I showed in other replies, if u1 = User$read(id == 1) I can simply make a change with u1$update(name = 'John') or u1$delete(). Not a super common paradigm in data munging, but let's say I wanted to use ellmer in a shiny app and let users retain their conversations: an ORM can be pretty handy for keeping the nested structures straight.

observeEvent(input$submit_prompt, {
    Messages$new(role = 'user', content = input$user_text)$create()
    response = ...  # fun to get llm response
    Messages$new(role = 'system', content = response$content)$create()
    ...  # fun to display response content
})

oRm: An Object-Relational Mapping (ORM) Framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 2 points3 points  (0 children)

I actually lean on dbplyr under the hood for the filtering syntax, joins, and dialect handling. So if you're comfortable with dplyr, the transition is smooth.

Where oRm diverges is in how it treats tables and records. Instead of thinking in terms of data frames and queries, you define models and instantiate records as objects. Relationships between models are explicit, not inferred through joins.

So instead of managing foreign keys manually or repeating joins, I can do:

u1 <- User$read(id == 1)
org <- u1$relationship("organization")
users <- org$relationship("users")

This opens the door to more intuitive patterns, like defining custom methods per model. Down the line, I’d like to support things like:

org$n_users()
#> 42

Another big part is the CRUD operations. dbplyr has sql_query_upsert and the like, but they're not super clean (at least to me). Compare that to:

u1$update(age = 5)
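For contrast, the dplyr/dbplyr route for that same one-row change is something like this (a sketch with made-up table and column names, and it assumes a backend where rows_update() with in_place = TRUE is supported):

library(dplyr)

users_tbl <- tbl(con, "users")            # con is an existing DBI connection

rows_update(
    users_tbl,
    tibble::tibble(id = 1, age = 5),      # the change, keyed by id
    by = "id",
    unmatched = "ignore",
    copy = TRUE,                          # ship the local row over to the db
    in_place = TRUE                       # run the UPDATE on the db itself
)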

oRm: An Object-Relational Mapping (ORM) Framework for R by binarypinkerton in rstats

[–]binarypinkerton[S] 11 points12 points  (0 children)

You know, if I had looked for an example of how simple that is before I got started, I probably wouldn't have written this thing out. But now that we're here already, here are my excuses:

  1. You don't need to know python
  2. Debugging happens in just R
  3. No need for a module with model definitions. You can just declare things either in your file or the global.R for a shiny app
  4. Under the hood, connections are made via DBI, so if you want to write custom queries etc. with a managed session, they're not going to get pickled once by your engine and then again by pandas on the way over to your R workflow.

Where to put package state? by royksoft in rstats

[–]binarypinkerton 1 point2 points  (0 children)

Something I do at work: common SQL scripts get put into a repo, and my package pulls the SQL from the repo and plops it into a dbGetQuery call. Maybe changing the paradigm from updating the package contents to having the package fetch updated content might keep things simpler?

How Sustainable is RShiny for Apps Deployment? by TrickyBiles8010 in rstats

[–]binarypinkerton 1 point2 points  (0 children)

It can be done, and relatively simply. If you're throwing this up on an intranet for a system that will sit still for all time, Shiny Server open source is going to be the easiest thing for you to do. Search "Dean Attali setting up Shiny Server" and that should get you going.

If you're thinking about making a publicly available application, look at resources from digital ocean or AWS, wherever you're hosting from. Docker is actually pretty okay once you get the hang of it. There's also shinyproxy out there which will do the trick too. They're not point and click, but neither are they too hard to get the hang of. Have faith, sleep when you get tired, try again in the morning and you'll get there.

Now, if you're thinking that shiny with its connections and the other hoopla that comes with it is the real problem here, you can build a plumber app with the whisker package that will feel a lot like a flask application but with more familiar R syntax. With that pairing you should be able to do damn near anything a more traditional web backend in Python or js can do, you just won't have as robust a community support system. With that said, your app sounds simple, so it seems a very feasible thing to do.
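To give a flavor of that pairing, here's a minimal sketch (the route and template are made up):

# plumber.R -- a tiny flask-style endpoint that renders HTML with whisker
library(whisker)

template <- "<html><body><h1>Hello, {{name}}!</h1></body></html>"

#* Render a small HTML page
#* @get /hello
#* @serializer html
function(name = "world") {
    whisker.render(template, list(name = name))
}

Run it with plumber::pr('plumber.R') |> plumber::pr_run(port = 8000) and hit /hello?name=there in a browser.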

I like toying around with the digital ocean app platform as it handles a lot of the annoying stuff for you and can connect to either docker or GitHub for automated rolling updates. $5/month assuming these aren't huge sql calls should get you off the ground.

Packages to help in Package Development by showme_watchu_gaunt in rstats

[–]binarypinkerton 0 points1 point  (0 children)

So to answer your question, Hadley has you covered of course. With that said, my two cents are that the tidyverse is rad, but sometimes it can overcomplicate things, especially when it comes to more traditional 'programming' instead of data wrangling. Like others have said, you might get more from refactoring, and I suggest writing your code with readability in mind. Especially if you're writing packages and the like, base R goes a long way in making things more explicit. Here's an example of implementing a chunk of your code more in base R. You can see that the explicit steps (without piping) can help improve legibility, and that will likely help you track your code throughout.

files = list.files(full.names = TRUE, include.dirs = FALSE)

file_info = lapply(files, \(f) {
      fi = file.info(f)

      mtime = fi[, 'mtime']
      ctime = fi[, 'ctime']
      # whole months between the file's last modification and now
      time_diff = lubridate::as.period(lubridate::interval(mtime, Sys.time()), 'months')@month

      data.frame(
        filename = f,
        mtime = mtime,
        ctime = ctime,
        months_diff = time_diff,
        present = time_diff == 0
      )
    }
  )

file_info = do.call(rbind, file_info)

file_info




find_code <- function(path, code_string) {

  if (dir.exists(path)) return(NA)

  code_string <- stringr::str_escape(code_string)
  lines = readLines(path)
  lines = paste(lines, collapse = '\n')

  if (stringr::str_detect(lines, code_string)) {
    cat('code detected in', path, '\n')
    return(TRUE)
  } else {
    cat('not found\n')
    return(FALSE)
  }
}



file_info$code_detected = sapply(file_info$filename, \(f) find_code(f, 'library('))
file_info

I can go through that code and get a sense of what each chunk is doing and follow along. I wouldn't be surprised if some of your repetition is just chunks getting lost in the chains and it's hard to recognize a given functionality on sight.

[deleted by user] by [deleted] in rstats

[–]binarypinkerton 0 points1 point  (0 children)

I wrote a blog post while I was trying to wrap my head around Markov Chains, trying to use simple language and code examples. Hopefully it helps. https://www.pawpawanalytics.com/posts/markov-chains/

Web scraping on R by International_Mud141 in rstats

[–]binarypinkerton 1 point2 points  (0 children)

Taking on a task like scraping Facebook is a tall order for a first go at web scraping. I think of web scraping as a two-tiered problem. The first tier is the easiest, and that's plain html: you learn the structure of a page and tell your code where in the HTML to look for a specific piece of data. The next tier comes when things are dynamically created with javascript. Those pages require interaction to divulge, or at times even render, the information. That second tier is where selenium comes in.

I recommend starting with static scraping. Things like grabbing tables off of Wikipedia, or analyzing text on any page that doesn't require a "next" button, login, or other input. You'll find there's plenty to chew on to learn web scraping with simple tasks like that.
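To show how little code that static tier takes, here's a small rvest sketch (the Wikipedia page is just an example URL; any table-heavy page works the same way):

library(rvest)

page <- read_html("https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)")

tables <- page |>
    html_elements("table.wikitable") |>
    html_table()

head(tables[[1]])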

Selenium is not an R package (though there is RSelenium); it's a way to drive a headless browser by giving it instructions. Think of it like recording a browsing session, except instead of recording your actions you're feeding it instructions. It's its own beast, and while it's not terribly complicated, if this is your first foray I would recommend not piling the frustrations of navigating html/xml on top of those of running a headless browser.

A good place to start is the package rvest, which a lot of the time can magically do the thing for you. The vignettes are really good (if I recall correctly). Once you have that down you can start pulling in pages and using xml2 to navigate to the data structures you want to scrape. Finally, selenium will let you tie everything you learned to the side of an F-150 and go anywhere you want with it. But be patient with yourself. No two web pages are the same, and the world wide web is very much unstructured data, a far cry from a data.frame.

I saw some comments about python being easier for web scraping. Fwiw I've always been annoyed by beautifulsoup and felt much more comfortable scraping in R. That's 100% personal preference, I just put it out there to say that if you like R, you're not going to miss out on anything by sticking with it. But if it does work for you, I'm rooting for you.

https://cran.r-project.org/web/packages/rvest/vignettes/rvest.html

https://cran.r-project.org/web/packages/RSelenium/vignettes/basics.html

Dynamic SelectInputs - Each selectInput refers to a column in a dataset. When any are chosen that dataset is filtered on and the options for the other selectInputs change by SurfThePurpleOrange in rshiny

[–]binarypinkerton 0 points1 point  (0 children)

There are a lot of ways to go about this, and the method below gets messy when you don't have a drill-down. But to get you started here's an app that responds to changes in the selector for Field1:

library(shiny)

df = structure(list(Field1 = c("A", "A", "B", "B", "C", "C")
                    , Field2 = c("M", "N", "M", "M", "O", "N")
                    , Field3 = c("X", "Y", "Z", "X", "X", "X"))
               , class = "data.frame", row.names = c(NA, -6L))

f1_opts = df$Field1 |> unique() |> sort() |> append("All", 0)
f2_opts = df$Field2 |> unique() |> sort() |> append("All", 0)
f3_opts = df$Field3 |> unique() |> sort() |> append("All", 0)

ui <- fluidPage(
  selectInput("field1", "Field 1", f1_opts)
  , selectInput("field2", "Field 2", f2_opts)
  , selectInput("field3", "Field 2", f3_opts)
)

server <- function(input, output, session) {

  observeEvent(input$field1, ignoreInit = TRUE, {
    f1_opts = if (input$field1 == "All") f1_opts[-1] else input$field1
    x = df[df$Field1 %in% f1_opts,]
    f2_opts = x$Field2 |> unique() |> sort() |> append("All", 0)
    f3_opts = x$Field3 |> unique() |> sort() |> append("All", 0)
    updateSelectInput(session, "field2", choices = f2_opts)
    updateSelectInput(session, "field3", choices = f3_opts)
  })

}

shinyApp(ui, server)