Understand how your data flows

The hidden gold of metadata

Different technologies, different languages

A common abstraction: graph theory

  • a database? It’s a node
  • a table? It’s a node + an edge to the database it belongs to
  • a column? It’s a node + an edge to the table it belongs to
  • a user? It’s a node + an edge to the database it belongs to + an edge to every node it can query/view/edit
  • an Apache Kafka source connector? It’s a node + an edge for every source of data + an edge for every destination topic
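To make the mapping above concrete, here is a minimal sketch of the same idea in plain Python. It is not the metadata parser's own code: the `MetadataGraph` class, the node ids (`db:shop`, `table:orders`, and so on) and the edge labels are all hypothetical names chosen for illustration.

```python
class MetadataGraph:
    """Tiny directed graph: every asset is a node, every relation an edge."""

    def __init__(self):
        self.nodes = {}   # node id -> node type (hypothetical labels)
        self.edges = []   # list of (source id, target id, label)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, target, label):
        # Refuse edges that point at assets we have not registered yet.
        for end in (source, target):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.edges.append((source, target, label))

    def targets(self, source):
        """Return the ids of all nodes that `source` has an edge to."""
        return [t for s, t, _ in self.edges if s == source]


def build_example():
    g = MetadataGraph()
    # a database? It's a node
    g.add_node("db:shop", "database")
    # a table? A node + an edge to the database it belongs to
    g.add_node("table:orders", "table")
    g.add_edge("table:orders", "db:shop", "belongs_to")
    # a column? A node + an edge to the table it belongs to
    g.add_node("column:orders.email", "column")
    g.add_edge("column:orders.email", "table:orders", "belongs_to")
    # a user? A node + an edge to its database + an edge to what it can query
    g.add_node("user:analyst", "user")
    g.add_edge("user:analyst", "db:shop", "belongs_to")
    g.add_edge("user:analyst", "table:orders", "can_query")
    # a source connector? A node + an edge per source + an edge per topic
    g.add_node("topic:orders-events", "topic")
    g.add_node("connector:orders-source", "kafka_source_connector")
    g.add_edge("connector:orders-source", "table:orders", "reads_from")
    g.add_edge("connector:orders-source", "topic:orders-events", "writes_to")
    return g


graph = build_example()
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

Once every technology is translated into nodes and edges like this, the question "what does this user touch?" becomes a lookup such as `graph.targets("user:analyst")`, whatever system the metadata originally came from.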

Sounds difficult. How do you do it? Welcome to the metadata parser

Get the metadata-parser to work

  • Python 3.7+
  • a valid Aiven account
  • the name of the project that you want to parse
  1. Clone the metadata parser repository and navigate to the metadata-parser folder.
git clone https://github.com/aiven/metadata-parser.git
cd metadata-parser
pip install -r requirements.txt
python main.py
  • A file nx.html containing the complete interactive graph.
  • A file graph_data.dot containing the information in DOT format.
  • A file graph_data.gml containing the information in GML format.
python app.py

Wow, can I use it?

  • Data lineage — Where is this column coming from?
  • GDPR assessments — Who can see this piece of data? How is my data manipulated?
  • Security audits — What user can edit this dataset?
  • Impact assessments — What happens if I remove X?
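As an illustration of the last question, an impact assessment can be treated as a reachability problem on the graph. The sketch below assumes a hypothetical edge list in which each edge points from an asset to the thing it depends on. It does not read the parser's real output, and the ids and the `impacted_by` helper are invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical edges: (dependent, dependency). Each asset points at the
# thing it belongs to or reads from.
EDGES = [
    ("table:orders", "db:shop"),
    ("table:customers", "db:shop"),
    ("column:orders.id", "table:orders"),
    ("column:orders.email", "table:orders"),
    ("connector:orders-source", "table:orders"),
    ("topic:orders-events", "connector:orders-source"),
]


def impacted_by(edges, removed):
    """Return every node that directly or indirectly depends on `removed`."""
    # Reverse index: dependency -> list of assets that depend on it.
    dependents = defaultdict(list)
    for dependent, dependency in edges:
        dependents[dependency].append(dependent)

    # Breadth-first search over the reversed edges.
    seen = set()
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    seen.discard(removed)  # only relevant if the graph contains a cycle
    return sorted(seen)


for node in impacted_by(EDGES, "table:orders"):
    print("would be affected:", node)
```

Data lineage is the same walk in the opposite direction: follow the edges from a column towards its dependencies instead of reversing them.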
