Big data engineering has ranked high among emerging jobs on LinkedIn, in many ways surpassing data science. If you want to make a dent in real-world projects and accelerate your career, it pays to build your data engineering skills.
Interviewee: Dave Bunten, Two-Time Omdena Collaborator & Technical Mentor.
Can you describe your journey into data engineering?
Before my first data-specific job, I had a technical background in systems engineering (Linux, Windows, etc.) and application administration. Work in those areas often involved creating code or artifacts which could help complete processes more efficiently. A data-orientated pattern emerged in my work and my personal side projects – I noticed I was always excited by data movement, transformation, and applying those things to scale positive impacts.
Recognizing there were gaps in my understanding, I studied Data Science and Big Data topics through online courses (Coursera) and began attending related meetups.
Doing these things helped me discover areas within my job that could benefit from data engineering. I applied my skills towards creating automated dashboards that were otherwise unavailable. This eventually enabled me to transition into a data-orientated role in the same organization where I’ve continued to learn with guidance from a great team. Around the same time, I heard about Omdena and was very excited to participate in projects that had a global impact. Those initiatives have allowed me to continue this journey with data science experts around the world.
What is the role of data engineering in AI projects?
Data engineering is about reproducing understandable data science results at scale for AI projects. Engineering, whether it’s for roadways, plumbing, or data, is often an application of many scientific findings in non-experimental scenarios. Experiments are conducted to help us find answers to questions with the scientific method. When we need to apply those answers at scale we use engineering (which in and of itself may involve further experimentation). Think of it this way: driving a car at 50 mph (80 km/h) could have drastically different results depending on whether the road is considered “experimental” or “production-ready”.
In this way, data engineering helps implement data science results in real-world deployments. Data science provides a framework of conceptual or mathematical truths that data engineering then leverages at scale for these deployments. Data engineering will often use tools that are purpose-built for scale and resilience. This isn’t to say the fields are mutually exclusive, however. In fact, they are usually best thought of as involving one another.
How can data engineers, data scientists, and ML engineers work more effectively in a team?
Work in data-orientated teams often benefits from good communication. Data is an abstraction – we can’t always “see” or “hold” it in the real world. This abstraction creates many opportunities for communication gaps that can cause issues in data teams. For any team, I’d recommend conversing regularly about what you’re doing and why to help stay on track. Team goal-setting and assigning those goals to specific members of the team can help accomplish your work without overlaps. Be sure to set reasonable or sometimes “stretch” deadlines for those goals to avoid the implications of Parkinson’s Law (work expands to fill the time available for it). Provide the smallest comfortable unit of time (for example, 15 minutes) for goals and meetings, and give yourselves permission for more time as needed.
In my own experience, each project and team have had unique components. I’d recommend being flexible and embracing the fact that each role has overlaps with the others. For example, a data scientist may need to engage data engineering principles to demonstrate that a value proposition just isn’t feasible. ML engineers may need to employ data science skills to reach an answer. Avoid entrenchment with roles and responsibilities by leaning in on results. Make sure your results are visible and reproducible by others (be that code in a repository, documents in a shared location, etc). Ask yourselves: What worked? What didn’t? How do you know what worked and what didn’t (measures, reactions, etc)?
More practically, it’s ideal to find technologies and workflows that make sense for those on the team. Technology in the data sector changes rapidly but there are often common themes for how different work can fit together. Consider the following, replacing the specific examples as needed (they’ll likely change based on the team and with time). Each part helps inform the others, and I feel they are best treated as continuous improvement patterns (instead of linear steps). Each part will also have a natural fit with each team member based on their background and the problem space. Communicate with one another to find out where you stand, what’s needed, and how you can deliver results.
- Problem Illumination: Conversations, quick diagrams (PlantUML, BPMN, etc), shared docs (Google Docs, O365, etc), Jupyter Notebooks.
- Data Leverage: Screen-shared demos, Jupyter Notebooks, code (language of choice), trained and packaged models.
- Orchestration and Pipelining: Screen-shared demos, infrastructure as code (Ansible, Terraform, etc), workflow (Prefect, Airflow, etc).
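To make the "Orchestration and Pipelining" idea a little more concrete, here is a minimal sketch in plain Python. It is not taken from any Omdena project; the function names (`extract`, `transform`, `load`, `run_pipeline`) and the sample records are hypothetical, chosen only for illustration. A team would typically hand steps like these to a workflow tool such as Prefect or Airflow, which this sketch deliberately does not use.

```python
from typing import Iterable

# Hypothetical record shape, used only for this illustration.
Record = dict[str, float]


def extract() -> list[Record]:
    """Stand-in for reading from a real source (file, API, database)."""
    return [{"speed_mph": 50.0}, {"speed_mph": 31.0}]


def transform(records: Iterable[Record]) -> list[Record]:
    """Add a km/h field to each record (1 mile = 1.609344 km)."""
    return [
        {**r, "speed_kmh": round(r["speed_mph"] * 1.609344, 1)}
        for r in records
    ]


def load(records: Iterable[Record]) -> int:
    """Stand-in for writing to a destination; returns the row count."""
    rows = list(records)
    for row in rows:
        print(row)
    return len(rows)


def run_pipeline() -> int:
    """Chain the steps in order; an orchestrator would schedule and retry them."""
    return load(transform(extract()))


if __name__ == "__main__":
    run_pipeline()
```

Keeping each step as a small function with clear inputs and outputs makes the work easier to show, review, and later move into whichever workflow tool the team settles on.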
For our fellow data engineering enthusiasts, what resources and additional tips can you give to build data engineering skills?
A few things which have helped me towards data engineering and many other areas:
- Pomodoro Technique: Make the most of your time by staying focused, goal-orientated, and limiting the time you provide to what you need to get done.
- Enterprise Architecture: Data engineering benefits greatly from sound architectural considerations. Consider the impacts and importance of your work in the context of other parts of the project or business (technical or otherwise).
- Lifelong or Continuous Learning: Commit yourself to lifelong or continuous learning and remain humble about your understanding of any area. Stay hungry for knowledge and seek ways to implement or share it!
How did the Omdena experience help you grow your skills? How can someone make the best out of this experience?
My experiences with Omdena have helped grow my data science, engineering, and collaboration skills. The projects provided a space where I was able to contextualize my experiences with others around the world towards a common objective. This created a humbling and productive environment in which I could better understand my own gaps (and work towards addressing them). Everyone brings to the projects their own unique blend of skills, tools, and personality, which builds a synergy you really can see and feel.
It’s very empowering to feel the lift of working towards resolving UN sustainability goals. Seeing these goals, one can recognize there’s so much in the world that needs fixing. Omdena projects empower data craftspeople to achieve progress towards these goals in very real ways. This has been a beacon of applied practical good for me amidst the backdrop of 2020’s challenges.
Others in the Omdena community can make the best out of their experience through the following:
- Ride the Wave: Attend meetings and embrace the speed of the projects. Things move fast; find your rhythm and balance with the flow of things by being present and communicating.
- Show Your Work: Lean towards showing your work with Git commits or Google Drive additions. Showing others what you’ve made is often more impactful than telling them. Ultimately, objective results are what will enable the group to reach a solution for the project.
- Agile and Git: It’s helpful to understand the basics of Agile (methodology) and Git (source control) before or during an early phase of the project. Take time to read about these and practice them if you can.
Can you share a point in your career where things got a bit difficult? And how did you overcome roadblocks?
There have been points in my career where I felt excitement and simultaneous immobility towards achieving my goals. These moments are crucial because they represent a choice; there’s always an option to give up, to remain as-is. There are tangible ways to overcome this feeling when it happens:
- Connect with Others in Real-time: Talking with others in real-time about your and their experiences is a fantastic way to find positive motivation. Real-time conversations help avoid misinterpretations and can feel less isolating. Take time to listen to others; we all have something to learn from one another. Not only does it help you contextualize but it also serves to remind others you’re interested in specific work or projects (they or those they know may have a job for you in the future). The people you connect with may be those you can lean on in times of distress and those you can celebrate with when you succeed.
- Practice Gratitude: Shifting your mind towards gratitude can work wonders at helping you remember what you have already accomplished and what’s ahead. This type of joy can unlock solutions that otherwise may have seemed unavailable or unobvious. More practically, try writing down or reminding yourself of at least one thing you’re grateful for each day, even if that simply means genuinely thanking someone in a chat or email message. (reference: Gratitude Journal)
- Escape Learned Helplessness: Understand that there are psychological reasons for feeling helpless and that you can train yourself towards patterns of self-empowerment. Consider: there’s always some way you can improve your situation, even if it’s small. Find incremental improvements and make a routine of accomplishing them. (references: Learned Helplessness, Learned Optimism)
If you were to start all over again in your career, what are 1-3 things you would do differently?
I wish I had learned how to engage with history and open-source earlier in my career. There’s a certain novelty and excitement that comes along with learning how to code; it can make anything seem possible. This creativity is important but not always practical. Leveraging open-source and keeping yourself open to the historical context of a problem space is crucial. It’s fairly likely that someone else has attempted solutions for the same challenge you face and that you can learn from their experiences (whether directly with code or indirectly by hearing their story). To me, this is one of humanity’s superpowers; we have the ability to contextualize our stories and successes together in order to make a better world.
Any closing words?
Many thanks go to Omdena for providing such fantastic opportunities for myself, other volunteers, and organizations making an impact around the world. I’m so grateful for a chance to participate and look forward to future experiences!