Draw Data: Creating Synthetic Datasets with Ease

Bryan Paget
Nov 29, 2023

Ever wished you could effortlessly generate a dataset by visually sketching points on a Cartesian plane? Meet Draw Data, a handy Python app designed for Jupyter notebooks. This tool allows you to craft toy or synthetic datasets by simply drawing your ideas directly onto the chart. It proves particularly valuable when teaching machine learning algorithms.

Installation:

%%capture
! pip install -U drawdata
# On Linux, pandas clipboard access may need xsel or xclip. Install them with:
# sudo apt-get install xsel xclip -y

Getting Started:

To draw a dataset, execute the following cell. You can sketch up to four classes of points. Afterward, click “Copy CSV” and your data points are copied to the clipboard as comma-separated values with the columns x, y, and z, where x and y are the coordinates and z identifies the class:

from drawdata import draw_scatter

draw_scatter()

Viewing the Data Table:

Once you’ve completed your drawing and clicked “Copy CSV,” use Pandas to read the clipboard into a DataFrame. Ending the cell with df displays the resulting table:

import pandas as pd

# Reading the clipboard into a DataFrame
df = pd.read_clipboard(sep=",")
df
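If the clipboard is not available (for example on a remote notebook server), the copied text can be pasted into a string and parsed the same way. The snippet below is a minimal sketch under that assumption: the x, y, z column names follow the description above, while the sample values and the class labels "a" and "b" are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample of what "Copy CSV" might produce.
# Column names x, y, z follow the article; the values and the
# labels "a" and "b" are invented for this example.
csv_text = """x,y,z
120.5,310.2,a
135.0,298.7,a
402.3,95.1,b
"""

# Parse the pasted text as comma-separated values
df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Quick sanity checks: column names and number of points per class
print(df.columns.tolist())
print(df["z"].value_counts())
```

Checking the per-class counts right after loading is a quick way to confirm that every class you drew actually made it into the DataFrame.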

Plotting the Drawn Data:

Visualizing the drawn points becomes a breeze with Plotly, which provides an interactive chart. The following code snippet accomplishes this:

import plotly.express as px
import plotly

# Load plotly.js so figures render inside the notebook
plotly.offline.init_notebook_mode(connected=True)

# Creating an interactive scatter plot
fig = px.scatter(df, x='x', y='y', color='z')
fig.update_layout(
    autosize=False,
    width=800,
    height=800,
)
fig.show()
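Because the drawn points end up in an ordinary DataFrame, they can feed a small model directly. As one possible teaching example, here is a rough sketch of a nearest-centroid classifier. It is not part of Draw Data: the `predict` helper and the sample DataFrame are hypothetical, and only the x, y, z column names come from the steps above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a drawn dataset (columns x, y, z as above)
df = pd.DataFrame({
    "x": [1.0, 2.0, 1.5, 8.0, 9.0, 8.5],
    "y": [1.0, 1.5, 2.0, 8.0, 8.5, 9.0],
    "z": ["a", "a", "a", "b", "b", "b"],
})

# One centroid per class: the mean x and y of that class's points
centroids = df.groupby("z")[["x", "y"]].mean()


def predict(x, y):
    """Return the label of the class centroid closest to (x, y)."""
    distances = np.hypot(centroids["x"] - x, centroids["y"] - y)
    return distances.idxmin()


print(predict(2.0, 2.0))  # a point near the first cluster
print(predict(7.0, 9.0))  # a point near the second cluster
```

Redrawing the classes so that they overlap or form non-convex shapes, then rerunning the cell, shows quickly where such a simple model starts to fail.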

With these few cells you can draw, inspect, and visualize a synthetic dataset, which makes teaching machine learning concepts more intuitive and engaging.
