Show HN: Generate coherent, synthetic data at scale

github.com

2 points by darshanime 7 hours ago

At our org, we run an ecosystem of over 200 microservices, each implementing part of the business logic. To test changes, we provide developers with on-demand sandboxed environments. One problem we had to solve for those environments was creating synthetic data across services that respects the business rules and stays coherent.

Today, we are happy to introduce datagen, a tool we developed internally to solve this problem. It generates coherent, synthetic data and can model complex relationships. datagen is a new DSL (domain-specific language) in which the user specifies the shape of the entity they wish to generate, along with generator functions describing the logic for generating each field. The entity can be a table in a relational DBMS, a JSON document in a document store, a CSV file to upload to S3, etc.

The user writes models in .dg files, which are transpiled to Go code that can then be used to generate the data.

Here is a simple example:

  // users.dg
  model users {
    fields {
      name() string
      age() int
    }
    gens {
      func name() {
        return "Arthur Dent" // hardcoded value
      }
      func age() {
        return IntBetween(18, 65)
      }
    }
  }
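
For readers who think in Go, here is a rough hand-written sketch of what the model above describes. It is not datagen's actual transpiled output; the User struct, intBetween, genUser, and genUsers are hypothetical names made up for illustration, and the inclusive bounds on intBetween are an assumption about how IntBetween behaves.

```go
package main

import (
	"fmt"
	"math/rand"
)

// User mirrors the two fields declared in the users model.
type User struct {
	Name string
	Age  int
}

// intBetween is a hypothetical stand-in for the DSL's IntBetween helper.
// Assumption: both bounds are inclusive; datagen's own semantics may differ.
func intBetween(r *rand.Rand, lo, hi int) int {
	return lo + r.Intn(hi-lo+1)
}

// genUser builds one record by calling one generator per field.
func genUser(r *rand.Rand) User {
	return User{
		Name: "Arthur Dent", // hardcoded value, as in users.dg
		Age:  intBetween(r, 18, 65),
	}
}

// genUsers returns n records from a seeded source, so runs are repeatable.
func genUsers(seed int64, n int) []User {
	r := rand.New(rand.NewSource(seed))
	users := make([]User, 0, n)
	for i := 0; i < n; i++ {
		users = append(users, genUser(r))
	}
	return users
}

func main() {
	for _, u := range genUsers(42, 3) {
		fmt.Printf("%+v\n", u)
	}
}
```

The idea is simply one generator function per field, with the model tying them together into a record.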

Check out the website for more information: https://ds-horizon.github.io/datagen/

edit: demo video - https://www.youtube.com/watch?v=ly0DfzTup28