Generative model for text mock-data
pip install textgen :rocket:
A generative model is basically an overfitted model that tries to describe the underlying data and can generate new samples from it, either through an encoder and a decoder or through a distribution to sample from.
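To make the "distribution to sample from" idea concrete, here is a minimal sketch that uses a character-level Markov chain instead of a neural network. This is not how textgen works internally. `fit_char_model`, `sample` and the sentinel characters are hypothetical names chosen for illustration. The model counts which character follows each context of `order` characters, then generates text by repeatedly sampling the next character from those counts.

```python
import random
from collections import defaultdict

# Hypothetical sentinel characters marking the start and end of a string;
# they are assumed not to occur in the training texts.
START, END = "\x02", "\x03"


def fit_char_model(texts, order=3):
    """Count how often each character follows every `order`-character context."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in texts:
        padded = START * order + text + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts


def sample(counts, order=3, max_len=50, rng=random):
    """Generate one string by sampling the next character until END is drawn.

    `order` must match the value used in fit_char_model.
    """
    context, out = START * order, []
    while len(out) < max_len:
        chars, weights = zip(*counts[context].items())
        nxt = rng.choices(chars, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        # Slide the context window one character to the right.
        context = context[1:] + nxt
    return "".join(out)


if __name__ == "__main__":
    names = ["alice", "alicia", "alina", "bob", "bobby", "robert"]
    model = fit_char_model(names, order=2)
    rng = random.Random(0)
    print([sample(model, order=2, rng=rng) for _ in range(5)])
```

A large `order` makes the model memorize the training strings (the overfitting mentioned above), while a small `order` produces more novel but noisier output.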
This can be used to generate mock data for any type of text-based entity or column (bear in mind that this data should not be crucial to your organization, of course). A set of realistic text-based features, for presenting in demos or at the start of any project, can be produced by a generative model that has learned the distribution of the real data and then serves as the dummy-data creator.
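As a sketch of the mock-column use case, assuming the real values are short, whitespace-separated strings (for example company names), the simplest learned distribution is one categorical distribution per token position. The hypothetical `fit_token_model` and `generate_mock_column` functions below are a deliberately crude stand-in for the neural model textgen trains; they only recombine tokens that already appear in the data.

```python
import random
from collections import Counter


def fit_token_model(values):
    """Learn one categorical distribution per token position.

    Values are grouped by their token count, so a two-token value such as
    "Acme Corp" is only recombined with other two-token values.
    """
    length_counts = Counter()
    slots_by_length = {}
    for value in values:
        tokens = value.split()
        length_counts[len(tokens)] += 1
        slots = slots_by_length.setdefault(
            len(tokens), [Counter() for _ in tokens]
        )
        for slot, token in zip(slots, tokens):
            slot[token] += 1
    return length_counts, slots_by_length


def generate_mock_column(model, n, rng=random):
    """Sample `n` mock values: first a token count, then each token independently."""
    length_counts, slots_by_length = model
    lengths, weights = zip(*length_counts.items())
    rows = []
    for _ in range(n):
        length = rng.choices(lengths, weights=weights)[0]
        tokens = [
            rng.choices(list(slot), weights=list(slot.values()))[0]
            for slot in slots_by_length[length]
        ]
        rows.append(" ".join(tokens))
    return rows


if __name__ == "__main__":
    # Hypothetical "company_name" column used only for illustration.
    companies = ["Acme Corp", "Globex Inc", "Initech LLC", "Umbrella Corp", "Hooli"]
    model = fit_token_model(companies)
    print(generate_mock_column(model, 5, rng=random.Random(42)))
```

Because whole tokens are recombined, rows such as "Globex Corp" can appear even though they are not in the input; you may still want to filter out rows that match real values exactly.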
As with most things in Python, there is usually a package for that; here it is minimaxir/textgen.
However, it took some time to get everything set up, so I prepared some handy little snippets and a Dockerfile for myself. I had to generate random data for a multitude of sources, and decided to share a simple snippet for interacting with this wonderful package.
If you have not used Keras or deep learning before but want to get set up and use it, the repo eleijonmarck/data-generator has documentation on how to get started.
- Get the data into `sample-datasets`, with a `column_name` and the text data you want to generate.
- Train the generator using the