I recently wrote a children’s book called D is for Data: The ABCs of Data Analytics. This is the second in a series of behind-the-scenes companion articles that will dive a little deeper into each term. We’ll explore the illustration used to define the term, how the word is used in the data world, and other interesting (to me) trivia.

One of my strategies for understanding words is to break them down into their component parts. In this case, those parts are broad and cast. To cast is to toss or scatter. You can cast a fishing line into water, you can cast pearls before swine, or you can cast seed in your field. Broad, in this case, means widely, so to broadcast is to cast evenly in every direction. I grew up on a farm, and we had a broadcast spreader that we would pull behind the tractor. It would toss fertilizer all over everything, so don’t get too close! Now, I could have used a TV or radio broadcast as an illustration, but those things are invisible. I liked the seed illustration because you can see the seeds going in every direction.
Broadcasting has a couple of uses in the data world. One usage is a broadcast join. A broadcast join is used when you join a small list to a big list. For example, if you have a small list of stores and their addresses along with a big list of every item those stores sold in the last two years, you would use a broadcast join. The system would break up the sales into multiple subsets and give each subset to a different computer. Then, it would broadcast a full copy of the store addresses to each computer. Each computer would join the store addresses with its own share of the sales, so the big list never has to be moved around just to do the join. The results would then be sent back to one computer, which would compile them. You might do this to find the total sales in each state or region for a particular item or category of items.
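To make the store example concrete, here is a minimal sketch of the idea in plain Python. It only simulates the "computers" as a loop over subsets of the sales list, and the sample data, function names, and number of workers are all made up for illustration. A real system such as a distributed query engine would handle the splitting and broadcasting for you.

```python
from collections import Counter

# Small list: store id -> state (hypothetical sample data).
stores = {1: "OH", 2: "OH", 3: "TX"}

# Big list: (store_id, item, amount) rows (hypothetical sample data).
sales = [
    (1, "shirt", 20.0),
    (2, "shirt", 15.0),
    (3, "shirt", 30.0),
    (1, "hat", 10.0),
    (3, "shirt", 5.0),
    (2, "hat", 8.0),
]


def split_into_partitions(rows, n):
    """Break the big list into n subsets, one per simulated computer."""
    return [rows[i::n] for i in range(n)]


def worker_join_and_total(partition, broadcast_stores, item):
    """Join one subset of sales to the broadcast store list, totaling by state."""
    totals = Counter()
    for store_id, sold_item, amount in partition:
        if sold_item != item:
            continue
        # The join itself: look up the store in the local copy of the small list.
        state = broadcast_stores[store_id]
        totals[state] += amount
    return totals


def broadcast_join_total(rows, small_table, item, n_workers=3):
    """Simulate a broadcast join followed by combining the partial totals."""
    partitions = split_into_partitions(rows, n_workers)
    # Every worker receives the same full copy of the small list.
    partials = [worker_join_and_total(p, small_table, item) for p in partitions]
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # adds each worker's totals together
    return dict(combined)


shirt_totals = broadcast_join_total(sales, stores, "shirt")
print(shirt_totals)
```

With this sample data, shirt sales come to 35.0 for each of the two states. Notice that the sales rows never move between workers; only the small store list is copied around, which is the whole point of broadcasting it.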
The other usage of broadcast is much more like radio and TV. Many IoT technologies, like RFID tags and Bluetooth LE beacons, broadcast a signal that nearby readers can pick up. This is used to track inventory and to quickly check that shelves are stocked properly and that a store hasn’t run out of a particular size or color of your favorite shirt. One of the side effects of the way that these devices broadcast their signal is that the signal can get picked up by multiple readers at once. This can lead to confusion about where the item really is. So, you may have to do some data analysis to check the signal strength at each reader and compare it to some reference points to triangulate the location. (Strictly speaking, working from distances rather than angles is called trilateration, but triangulate is the word most people use.) When you triangulate a location, you’re doing data analysis!
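Here is a rough sketch of what that analysis could look like, assuming three readers at known positions on a flat floor. The conversion from signal strength to distance uses a common log-distance rule of thumb. The calibration numbers (-59 dBm at one meter and a path-loss exponent of 2.0) and the function names are assumptions for illustration, not values from any particular device. Real signals are noisy, so a real system would need calibration and smoothing.

```python
import math


def rssi_to_distance(rssi_dbm, rssi_at_1m=-59.0, path_loss_exponent=2.0):
    """Estimate distance in meters from signal strength (log-distance model).

    rssi_at_1m and path_loss_exponent are assumed calibration values.
    """
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10 * path_loss_exponent))


def trilaterate(readers, distances):
    """Estimate (x, y) from three reader positions and three distances."""
    (x1, y1), (x2, y2), (x3, y3) = readers
    d1, d2, d3 = distances
    # Subtracting the first circle equation from the other two leaves
    # two straight-line equations in x and y.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("readers must not all lie on one straight line")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


def simulated_rssi(tag, reader, rssi_at_1m=-59.0, path_loss_exponent=2.0):
    """Made-up, noise-free reading so the example can check itself."""
    distance = math.dist(tag, reader)
    return rssi_at_1m - 10 * path_loss_exponent * math.log10(distance)


# Hypothetical reader positions (meters) and a tag we pretend not to know.
readers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_tag = (3.0, 4.0)

readings = [simulated_rssi(true_tag, r) for r in readers]
distances = [rssi_to_distance(rssi) for rssi in readings]
estimate = trilaterate(readers, distances)
print(estimate)
```

Because the simulated readings here have no noise, the estimate lands essentially on the true tag position. With real readings the three distances rarely agree perfectly, which is why more readers and some averaging are usually part of the analysis.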





