People counting is widely used in everyday life, including on public transportation such as trains and airplanes. Service operators usually count passengers manually with a hand counter. In an era when most activities are digital, this method costs considerable time and energy. This research therefore proposes replacing manual counting with an image processing approach based on the You Only Look Once (YOLO) method, so that people counting is performed by computer vision instead of by hand. This Final Project uses YOLOv4, the latest version of the method at the time of this work, which detects up to 80 object classes. Transfer learning is then applied to reduce the number of classes to a single class. The system was implemented in the Python programming language on various platforms. The research used three training data scenarios and two testing data scenarios. The measured parameters are accuracy, precision, recall, F1 score, Intersection over Union (IoU), and mean Average Precision (mAP). The best configuration uses a learning rate of 0.001, a random value of 0, and 32 subdivisions. The best accuracy of this system is 69% on the previously trained datasets. The pre-trained weights reach 72.68% accuracy, 77% precision, and 62.88% average IoU. The results show that the system performs adequately for detecting and counting people on public transportation.
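To make the reported metrics concrete, the following is a minimal sketch of how IoU, precision, recall, and F1 score are commonly computed for a detector. It is illustrative only and is not taken from this project's code: the function names `iou` and `detection_scores`, the `(x1, y1, x2, y2)` box format, and the example counts are all assumptions. Accuracy and mAP are left out for brevity.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format (assumed)."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted boxes for one person.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed people.
    print(detection_scores(tp=8, fp=2, fn=2))
```

In the first example the two 2x2 boxes overlap in a 1x1 square, so the IoU is 1/7. A detection is usually counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold; the threshold used in this research is not stated here.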