Just another study guide for “Databricks Certified Associate Developer for Apache Spark 3.0”

David Suarez
Jun 17, 2021

I’ve recently cleared the exam for the “Databricks Certified Associate Developer for Apache Spark 3.0” certification.

To give something back to the community, I’ve decided to write this post sharing my own study guide. It’s not the first one you’ll find and it won’t be the last; it’s just another study guide :)

In my case, I went for the Python version of the exam. I suspect the content is not very different for Scala, since the Spark API syntax is practically the same for the kind of topics this exam covers. Therefore, I think the content here applies to both.

The first step to prepare for this exam is, obviously, checking the official description. But if you are reading this, you have probably done that already. After that, you can also go through this little “course” that shows what to expect from the actual exam and how to register for it.

The tips and opinions you will find here are mainly aimed at people who already have some knowledge of and experience with Spark. If you are a total beginner, you had better start with some online courses to build some experience before you start preparing for this exam.

Preparing for the exam

What book should I read?

According to the official exam description, Sections I, II, and IV of Spark: The Definitive Guide and Chapters 1–7 of Learning Spark should also be helpful in preparation.

If you are wondering which book to get: I personally bought and read both. Is that necessary? Definitely not. Let’s compare them so you can make your own decision:

  • Spark: The Definitive Guide: From Sections I and II, chapters 1–10 (also 11 for Scala Datasets), which focus mainly on Spark architecture and the DataFrame and SQL APIs. From Section III, chapter 14, which covers accumulators and broadcast variables. From Section IV, chapter 15 should be enough, since chapters 16–19 go quite deep and I don’t consider them necessary, but… better safe than sorry! This book targets Spark 2.x, so it misses all the Spark 3.0 specifics, which you will need to complement online.
  • Learning Spark (2nd Edition): Chapters 1–5 and 7 (also 6 for Scala Datasets), which cover essentially the same content as Sections I and II of the previous book, but in less depth. Most importantly, chapter 12 covers Spark 3.0 specifics such as Adaptive Query Execution and Dynamic Partition Pruning. With this book, you will mainly miss information about accumulators, broadcast variables, and garbage collection. So again, you’ll need to complement it online.

In my opinion, Spark: The Definitive Guide is the better choice. It contains far more detail and in-depth information that can also be useful in the future. Its downside is the lack of Spark 3.0 specifics, but these are easy to find online.

Databricks Academy Courses

I completed the full Data Engineering Pathway myself. Is it needed? Absolutely not.

If you don’t know which courses I’m talking about, please have a look at this.

If you are new to Spark, if your fundamentals are not very solid, or if you mainly use the Spark SQL API in your code, these courses are definitely a good starting point for this certification.

As I mentioned, they are not at all necessary for the exam, but they can still help you during the learning process. So, if you want some exam-oriented recommendations, here they are:

  • “Apache Spark Programming with Databricks” is a very nice way to put your knowledge into practice with some hands-on exercises.
  • From the “Optimizing Apache Spark on Databricks” course, the modules “Optimizing with Adaptive Query Execution and Dynamic Partition Pruning” and “Designing Clusters for High Performance” are very useful for learning about the Spark 3.0 specifics and how to configure clusters for specific scenarios (surprisingly, I got two questions about this on my exam).

Other complementary resources I used

In this section I share some links about concepts I felt less prepared on after reading the books, which are always handy to review quickly right before the exam:

Practicing for the exam

Spark documentation must become your best friend

Besides the theory of Spark fundamentals, this is by far the most important thing you need to do to succeed.

As you may know, you will be given a PDF copy of the Spark documentation during the exam. This is not a coincidence: I’d say it’s almost impossible to complete this exam without looking at it. Many questions are about the DataFrame API, and in most cases the answer options are so similar to each other that they will certainly make you hesitate.

In the online documentation, all hyperlinks work and CTRL+F is there to help you out. In the exam, none of this is available, since you get just a few plain PDFs. Be prepared to scroll as if there were no tomorrow.

Unfortunately, after a long search I was not able to find a replica PDF to practice with. So I used the following trick: go to the most useful part of the documentation, the one that covers the DataFrame API, right-click, print the page, and save it as a PDF document. There you go! Your own PDF version of the documentation, and do you know what? It’s practically the same as what you get in the exam!

Once you have it, it’s time to get familiar with the PDF document itself. Memorize where along the scrollbar you can find the sql.functions module, the DataFrameReader, the DataFrameWriter, and so forth. This way, it will be much easier to find what you need during the exam, although it will still take time.

Practice Tests

Something that was super useful to me was taking practice tests. In my case, I went for these ones I found on Udemy.

They are not exam dumps (which are illegal btw), but the questions are very similar to what you will get on the real exam, and very handy to assess your knowledge and discover your weak points.

The set I shared contains only two tests, and they are somewhat similar to each other. If you go for this option, I would personally save the second one to simulate the real exam with the documentation next to you (once you feel ready). This way you can learn how to manage your time during the exam and design your own strategies.

Furthermore, section 4 of the little “course” contains a few sample questions that follow the style of the real exam questions.

The exam

The exam is quite easy to book. Once again, follow the instructions in the little “course” and everything will be fine. You can book an exam at basically any time you like: morning, afternoon, evening… It doesn’t matter. You will have to install some software on your machine so that you can be proctored during the exam. It also requires disabling your firewall, pop-up blockers, etc., so you had better use your personal computer.

If you expect to see someone during the exam, forget about it; they stay in the shadows. Fun fact: after a few minutes, they asked me to show my glasses to the camera.

The screen layout

The exam layout consists of a big section on the left, where you read and answer the questions, and another one on the right, with the documentation at the top and a notepad at the bottom (I personally didn’t use the notepad at all).

You can mark any question for review. Later, you can go to the review tab, see an overview of all the questions, and select the one you want to revisit.

The size of your screen matters. A second screen is not allowed, so I decided to take the exam on my MacBook Pro using its trackpad. I think that was a mistake. If your screen is small, you’ll need to resize the documentation section all the time, and that wastes time. You’ll also need to scroll a lot, so a mouse would make your life easier thanks to its better precision.

Strategies

I read on the internet that some people recommend answering all the questions without the documentation first, and then using the remaining time to check everything with it. I personally didn’t do that. As I mentioned before, I think it’s quite hard to pick the correct answer to syntax-related questions without checking the documentation (and sometimes even with it).

In my case, I went through the questions one by one. I checked the documentation when needed, marked a question for review when I was not sure, and left the ones that were taking longer for later. In the end, I had around 30 minutes left to review the questions, and I finished at the very last minute. Considering that I was struggling with the size of my screen and the trackpad, I’m sure you will be faster than I was.

So basically, pick or design the strategy that works best for you.

Other study guides

Since this is not the first (and won’t be the last) study guide for this exam, here are the guides I used to prepare myself:

The more you read, the more prepared you’ll be.

Conclusions

If you follow this guide, get familiar with the Spark documentation, and score more than 90% on the practice tests, I think you will be more than ready to pass the real exam.

I hope this detailed guide is as useful to you as other guides were to me when I was preparing for the exam. If you still have any questions, please feel free to contact me!

Good luck with your exam!
