Metrics and telemetry are fundamental in any engineering activity to evaluate, learn and improve. They are also needed to consolidate a culture in which opinion and experience are continuously challenged, in which experimentation and evidence becomes the norm and not the exception, in which transparency rules so co-workers are empowered, in which data analysis leads to conversations so evaluations are shared.
Open Source projects has been traditionally reluctant to promote telemetry, based on privacy concerns. Some factor that comes to my mind are helping to change this perception:
- As FLOSS projects grow and mature the need for information grows.
- It is easier now to process big amounts of data while keeping high levels of anonymity.
- The proliferation of company driven and consortium driven FLOSS projects, specially those related with SaaS/cloud technologies and products, showing how useful telemetry is. In general, corporations are less concerned about personal data privacy than many Open Source projects though.
- The DevOps movement is spreading like a pandemic and telemetry is an essential action for practitioners.
So the last few years data analytics is becoming more popular among Open Source projects.
Finding the right metrics is frequently tough. Most of the times projects, teams or departments get drowned in data and graphs before they realize what actually matters, what does it have real business value. When you find the right metrics, somehow it means that the right questions are being asked which I find the hardest part. To identify those questions, I recommend organizations or projects to invest in exploring and learning before moving into automating the data collection, processing, plotting and irradiate the results to be analysed.
So when BuildStream is getting into its third year of life, I thought it could be interesting to invest some effort in digging into some numbers, trying to find a couple of good questions that provide value to the project and the stakeholders involved.
The outcome of this exploratory effort was published and spread across the BuildStream / BuildGrid community. The steps taken to publish the report has been:
- Select a question to drive this exploratory effort, in my case: are we growing?
- Select data sources: in my case, information from the ticketing system and the git repositories.
- Collect the data: in this case, the data sets from the BuildStream ticketing system were exported from gitlab.com and the data sets from git obtained through a script developed by Valentin David.
- Clean the data set (data integrity, duplications, etc.) in this case the data was imported into GSheets and worked there.
- Data processing: the data was processed and metrics were defined using GSheet since the calculations in this phase were simple enough and the amount of data and processing power did not represent a challenge for the tool.
- Plot the data: since the graphs were also simple enough, GSheet was also used for this purpose.
- Initial analysis: the goal here was to identify trends, singularities, exceptions, etc and point them to the BuildStream community looking for debate and answers.
- Report: provided in .pdf and .odt, it has been publishing in the BuildStream Group in gitlab.com and sent to the community mailing list. The report include several recommendations.
The data set could lead us to a deeper analysis but:
- It would have also take me more time.
- I wanted to involve the contributors and stakeholders early in the analysis phase.
- Some metrics which collection, processing and plotting can be automated has been identified already so to me it is better to consolidate them to bring value to the project on regular basis than to keep exploring.
I understand that my approach is arguable but it has worked for me in the past. The debate of just half way cooked analysis increases the buy-in in the same way that developers love to put their hands in half-broken tools. Feel free to suggest a better approach that I can try in the coming future. I would appreciate it.
Link to the report on gitlab.com. Download it.
I am looking forward to have a fruitful debate about the report within the BuildStream community and beyond. From there, my recommendation is to look for an external provider (it is all about providing value as fast as possible) that, working with Open Source tools, can consolidate what we’ve learnt from this process and can help us to find more and better questions… and hopefully answers.
What is BuildStream
I have been putting effort on BuildStream since May 2018. Check the project out.
One thought on “BuildStream metrics: exploration”