We cordially invite junior Data Scientists to take part in the First International Data Science Competition, held from March 15 to June 1, 2019. The competition is organized by the DataAI@SG and Global Vietnamese Data Scientists Facebook Groups, faculty members from universities in the USA and Asia, industry Data Scientists, and The International Society of Data Scientists/Scriptedin Corporation. It gives young Data Scientists a playground in which to build the skills, knowledge, experience, and collaborations they need for their careers.
Competition Committee Composition

Dr. Van Vu, Percey F. Smith Professor of Mathematics, Yale University, CT, USA & VinGroup Big Data Institute Director
Dr. Christopher Do, President of the International Society of Data Scientists, CT, USA
Dr. Nguyet Nguyen, Assistant Professor, Youngstown State University, OH, USA
Dr. Linh Nguyen, Associate Professor, University of Idaho, ID, USA
Dr. Nam Nguyen, Data Analytics Scientist, Schlumberger, TX, USA
Mr. Tinh H. Nguyen, Faculty of Information Technology, Industrial University of Ho Chi Minh City, Vietnam
Dr. Hung Ta, Head of Mathematics and Computer Science, Hanoi University, Vietnam
Mr. Cory Wang, Chief Operating Officer, ScriptedIn Inc., USA
Dr. Vinh Nguyen, Senior Manager of Operations Research, Marriott International
Dr. Tu-Anh Vu-Thanh, Dean of Fulbright School of Public Policy and Management, Fulbright University Vietnam, Vietnam
Dr. Hien Nguyen, Assistant Professor, University of Houston, TX, USA
Dataset Committee

Dr. Vinh Dang, Data Scientist, Vietnam (Dataset Team)
Dr. Phu Vu, Data Scientist, Vietnam (Dataset Team)
Mr. Yuan Xie, Data Scientist, DHL, China (Dataset Team)
Mr. Vy Bui, Data Scientist, Vietnam (Dataset Team)
Mr. Khanh Tran, Data Scientist, FPT Telecom, Vietnam (Dataset Team)
University Representatives

Dr. Bao The Pham, Associate Professor, Saigon University (SGU)
Dr. Bay Dinh Vo, Associate Professor, HUTECH University of Technology
Dr. Loan T. T. Nguyen, International University, VNU-HCM
Mr. Hien D. Nguyen, University of Information Technology, VNU-HCM
Industry Committee

Mr. Thai Nguyen, Project Director at FPT USA Corp, WA, USA (*)
Prizes and Sponsorship

Prizes are awarded for First Place, Second Place, and Third Place. Winning teams receive winner certificates bearing the names of all the committee members and organizers.

The following sponsors have offered sponsorships:
Vingroup VinID sponsors internship positions for winners (Ms. Linh Pham, Lead Technical Recruiter, 0972702047)
Fulbright School of Public Policy and Management sponsors internships and RA positions in AI and Big Data projects (Dr. Tu-Anh Vu-Thanh, Dean of Fulbright School of Public Policy and Management)
FPT sponsors internships (Mr. Thai Nguyen, FPT USA)

(*) The committee members, organizers, and ISODS/Scriptedin are not responsible for any award promised by any sponsor.
Problem & Datasets

The data was collected from the Forex market by Yuan Xie, a Data Scientist at DHL. Your task is to predict the future. The target is the most liquid (most traded) currency pair in the world, EUR/USD, and you only need to predict whether the pair moves up or down. We provide data covering 2008-01-01 to 2018-03-19, taken from the www.dukascopy.com Historical Data Feed. Given historical currency performance, a large number of pricing features, and basic knowledge of market hours, can you predict the up or down movement of that day without being deceived by the noise?

The Forex market differs from the stock market because it is a single global market. The price may sit still for several minutes or even hours without a single trade, then move dramatically once people start trading more frequently.

The dataset contains the 5-minute Bid price of EUR/USD from 2008-01-01 to 2018-03-19, and each 5-minute price comes with over 200 features covering pricing, volatility, and volume information of different kinds. To help you understand the dataset and run quick experiments, we also provide a much smaller subset of the data.

The list of fields is below:
Gmt time: timestamp, marked as the starting time of that 5-minute period;
Evaluation

Once your model can score a test dataset, submit the result set to the project/contest by selecting the Add a Submission button and uploading the result file. Submissions are scored and ranked by accuracy on the test set, that is, the number of correctly classified instances divided by the total number of instances. Participants should predict labels using the public test dataset. Results can be submitted and scored an unlimited number of times. Click the Leaderboard button to see the current ranking of submitters, and the View Submissions button to see previous submission results.

The result CSV must have 2 columns and the same number of instances as the test dataset. The first row is always the column headers (such as num and label). The first column is the row number (1, 2, 3, etc.). The second column is the predicted label (1 or 0). See the example below:

num, label
1, 0
2, 0
3, 1
4, 1
5, 0
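The submission format and the accuracy rule described above can be sketched in a few lines of Python. This is only an illustration, not the platform's own scoring code: the function names `accuracy` and `write_submission` are hypothetical, the header names `num` and `label` follow the example above, and whether the platform expects a space after each comma is not stated, so the sketch writes plain comma-separated values.

```python
import csv
import io


def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot score an empty test set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def write_submission(labels, out):
    """Write a 2-column CSV: a header row, then row number and predicted label.

    `out` is any writable text file object. The header names are an
    assumption taken from the example in the text ("num" and "label").
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["num", "label"])
    # Row numbers start at 1, as in the example submission.
    for num, label in enumerate(labels, start=1):
        writer.writerow([num, label])


# Small usage example with made-up labels.
buffer = io.StringIO()
write_submission([0, 0, 1, 1, 0], buffer)
print(buffer.getvalue())
```

To write a real file, pass a file opened with `open(path, "w", newline="")` instead of the in-memory buffer. Since true test labels are not available to participants, `accuracy` is mainly useful for checking predictions against a held-out portion of the training data.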
Rules

The competition runs from March 15 to June 1, 2019.

Each team may have up to 3 members. Multiple teams may come from the same university. Each team member must either be a current university student at the time of joining the competition or have graduated within the past 5 years. Each team may have at most 1 member from a different university. Compliance with the rules is checked only if a team wins.

If you are the team lead, create a team using the Create Team button and invite the other members using the Invite Team button. Update your university information using the Update Info button. Each team may choose to invite a coach, who may be a data scientist, a university professor, etc. Each university may have a representative. However, a team can participate without a university representative.

By the deadline, each team must (1) submit a CSV file of predicted labels and get scored and ranked, (2) share a report/article with details (code included) using the Write Article button, and (3) optionally share a notebook using the Share Notebook button. Currently only Jupyter Python notebooks are supported on the platform.

The committee determines the winners by running the final version of the code submitted by each team before the deadline on a private test set outside the platform. Results will be announced once the test is completed.

Winning teams are encouraged to submit a demo, which should include proof of compliance with the competition rules. The demo will be recorded as a video and posted on YouTube. Winning teams should record their videos within 1 week after the competition finishes and post them on the ISODS Q&A forum. The demo videos will be included in a summary of the competition.

Solutions must be written in Python or R. Participants use data up to time t-1 to predict the up/down movement at time t. Code will be included and checked when the competition finishes.
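The rule that only data up to time t-1 may be used to predict the movement at time t can be illustrated with a small Python sketch. It rests on assumptions that the text does not confirm: the function name `make_lagged_examples` is hypothetical, the input is a plain list of prices rather than the competition's actual fields, and the label is taken to be 1 when the price at t is strictly higher than at t-1 and 0 otherwise.

```python
def make_lagged_examples(prices, n_lags=3):
    """Build (features, label) pairs that respect the t-1 cutoff.

    For each time t, the features are the n_lags prices strictly before t,
    so nothing observed at time t or later leaks into the inputs.
    Assumed label definition (not given in the rules): 1 if the price at t
    is higher than the price at t-1, else 0.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    examples = []
    for t in range(n_lags, len(prices)):
        features = prices[t - n_lags:t]  # prices at t-n_lags .. t-1 only
        label = 1 if prices[t] > prices[t - 1] else 0
        examples.append((features, label))
    return examples


# Made-up prices, for illustration only.
sample_prices = [1.10, 1.11, 1.09, 1.12, 1.12]
for features, label in make_lagged_examples(sample_prices, n_lags=2):
    print(features, label)
```

The same slicing discipline applies to any derived feature: a rolling statistic used as an input for time t should be computed over a window that ends at t-1.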
Please log in to the competition platform to download the datasets: https://www.scriptedin.com/contests/view/25
Disclaimers

All competition participants are required to read the disclaimers.
Questions and Answers

Please join the Q&A to post questions and get them answered, or to help answer others' questions. Social activities take place via the DataAI@SG and Global Vietnamese Data Scientists Facebook Groups and the International Data Science Competition Facebook Page.
Getting Involved

How can you get involved? Please join the Get Involved section in the Questions and Answers menu. This section is for people interested in serving as team advisors and for companies that may contribute insights, advisors, and sponsorships.

(*) All the committee positions are voluntary.