1. Title of Database: Machine Learning based ZZAlpha Stock Recommendations

2. Sources:
   (a) Original owners of data: ZZAlpha Ltd., 4729 E. Sunrise #109, Tucson AZ 85718 USA, info '@'
   (b) Donor of database: Kevin Pratt, Chief Scientist, ZZAlpha Ltd. on behalf of ZZAlpha Ltd.
   (c) Date received: 6-Jun-2015

3. Past Usage:
   (a) Pratt, Kevin. Proof Protocol for a Machine Learning Technique Making Longitudinal Predictions in Dynamic Contexts. (ACM KDD 2015)
   (b) Attribute predicted: for each stock in each portfolio, opening price change over 5 trading days
   (c) Indication of study's results: Significant predictions in multiple portfolios. See publications.

4. Relevant Information Paragraph:

The files were zipped (on a Windows machine) by year.

The data here are the ZZAlpha machine learning recommendations made for various US traded stock portfolios the morning of each day during the 3 year period Jan 1, 2012 - Dec 31, 2014. They are deposited here in .txt form for easy accessibility (and as a convenience to users, have had results included for each recommendation).

A .pdf version of the original recommendations file was certified by Digistamp (.p7s) at time of creation each day for stringent auditability. Please contact info '@' if you desire to purchase the set of the certified .pdf, .p7s file pairs. The certified set contains only the recommendations, not the results included in the deposited set here.

For convenience, the data deposited includes calculated returns (outcomes) for each recommended transaction. The returns obviously were not part of the original recommendations when made, but were appended later. Returns were calculated several days after sale day. The date inside the file reflects the date for which morning trading recommendation was made.

The evaluation of the recommendations, as described in the KDD 2015 article mentioned above, involved comparison of the opening price of the day of recommendation to the opening price five market days later.
As mentioned in the article, evaluation must be adjusted by trading costs and constraints.

A recommendation portfolio consists of a segment, a size (1, 2, 5, 10, 20), and a side (Long or Short), and the ticker symbols of the companies recommended for price increase (or decrease in the case of Short) from the opening price to the opening price 5 trading days later.

To be included, stocks must first pass a general screen of a minimum $3 recent price and 80,000 recent daily average share volume. Thus penny stocks and micro-caps are not present, and even some large-cap but very low-priced stocks are omitted. All stocks must be traded on NYSE, NASDAQ or AMEX at the time of recommendation.

The daily file contains all recommendations for all portfolios for the day. Both long and short recommendations are included. Long entries have duplicates.

These are the portfolios:

(Note: other portfolios limited to ETFs (exchange traded funds) may be listed each day. Due to data issues those are incomplete across the time period.)

5. Number of Instances: 755 market days, 41 portfolios, 5 sizes of portfolios, Long and Short

6. Number of Attributes:

The data set submitted does not include attributes used for prediction. This data is provided as a benchmark of machine learning results in a longitudinal 3 yr period.

Here is a sample content of a line in the files:

Jan 04 2005_006 Big_100_5_LONG_SHORT_F.pdf, L, AA 0.959 =25.97/27.09, AMAT 0.950 =14.70/15.46, EBAY 0.930 =53.33/57.31, PFE 0.995 =19.84/19.95, UPS 0.980 =71.72/73.16, Avg of 5 = 0.963

The above indicates recommendations were made before market open on Jan 4, 2005. This portfolio was limited to the biggest 100 cap stocks and was of size 5. It was for 'L' or long recommendations. Each of the five recommended stocks is shown as ticker, result, and the price at sale divided by the price at purchase (for example, AA 0.959 =25.97/27.09 means a sale price of 25.97 and a purchase price of 27.09). The average result for the five is shown at the end of the line.

Note: The user of this data set must implement its own parser of these files. The contributor does NOT provide one.
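Since no parser is supplied, the following is a minimal Python sketch of one way to read a line shaped like the sample above. It is not the contributor's code. The names Pick, Recommendation and parse_line are hypothetical, and the layout is inferred from the single sample line only: comma-separated fields, a file-name header first, the side second, then "TICKER result =sale/purchase" entries and a closing "Avg of N = x" field. Fields that do not fit this layout (for instance results tagged 'missing', whose exact form is not documented here) are kept verbatim in an unparsed list instead of being guessed at.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Pick:
    ticker: str
    ratio: float      # reported result (sale price / purchase price)
    sale: float       # opening price at sale, as printed in the file
    purchase: float   # opening price at purchase, as printed in the file

@dataclass
class Recommendation:
    header: str                       # date and portfolio file name, kept as-is
    side: str                         # 'L' in the sample line
    picks: List[Pick] = field(default_factory=list)
    average: Optional[float] = None   # value of the "Avg of N" field, if present
    unparsed: List[str] = field(default_factory=list)

# Assumed field layouts, inferred from the one documented sample line.
PICK_RE = re.compile(r"^(\S+)\s+([\d.]+)\s*=\s*([\d.]+)\s*/\s*([\d.]+)$")
AVG_RE = re.compile(r"^Avg of\s+\d+\s*=\s*([\d.]+)$")

def parse_line(line: str) -> Recommendation:
    """Parse one recommendation line into a Recommendation record."""
    fields = [f.strip() for f in line.strip().split(",")]
    if len(fields) < 2:
        raise ValueError("expected at least a header and a side field")
    rec = Recommendation(header=fields[0], side=fields[1])
    for item in fields[2:]:
        if not item:
            continue
        avg = AVG_RE.match(item)
        if avg:
            rec.average = float(avg.group(1))
            continue
        pick = PICK_RE.match(item)
        if pick:
            rec.picks.append(Pick(pick.group(1), float(pick.group(2)),
                                  float(pick.group(3)), float(pick.group(4))))
        else:
            # Unknown layout (e.g. a 'missing' tag): keep the raw text.
            rec.unparsed.append(item)
    return rec

SAMPLE = ("Jan 04 2005_006 Big_100_5_LONG_SHORT_F.pdf, L, AA 0.959 =25.97/27.09, "
          "AMAT 0.950 =14.70/15.46, EBAY 0.930 =53.33/57.31, PFE 0.995 =19.84/19.95, "
          "UPS 0.980 =71.72/73.16, Avg of 5 = 0.963")

rec = parse_line(SAMPLE)
print(rec.side, [p.ticker for p in rec.picks], rec.average)
```

The header is left as a single string because the meaning of every part of the file name (such as the "_006" suffix) is not described in this document. Real files may contain line shapes that the sample does not show, so inspecting the unparsed list on actual data is advisable before relying on this sketch.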
Note: The prices used are adjusted prices based on the data available when results were calculated. Back-calculation of the adjusted prices using newer data may give different prices, but the ratios will remain the same (+/- rounding errors).

7. For Each Attribute: Not applicable

8. Missing Attribute Values: On some days for a few portfolios, results may be missing. These are tagged as 'missing'.

9. Class Distribution: The distribution of positive and negative outcomes varies by portfolio, size, and day.
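The note in section 6 on adjusted prices and rounding can be illustrated with the sample line from that section. The sketch below is an assumption-labeled check, not part of the deposited data: it recomputes each ratio from the two printed prices, compares it with the reported result, and shows that rescaling both prices by a common factor leaves the ratio unchanged. The factor of 0.5 is hypothetical, standing in for a later price adjustment such as a 2-for-1 split.

```python
# (reported result, sale price, purchase price) copied from the sample line in section 6
sample = {
    "AA":   (0.959, 25.97, 27.09),
    "AMAT": (0.950, 14.70, 15.46),
    "EBAY": (0.930, 53.33, 57.31),
    "PFE":  (0.995, 19.84, 19.95),
    "UPS":  (0.980, 71.72, 73.16),
}

# Difference between each reported result and the ratio of the printed prices.
# The printed prices are themselves rounded, so small gaps are expected.
gaps = {t: abs(rep - sale / buy) for t, (rep, sale, buy) in sample.items()}
max_gap = max(gaps.values())

# Average of the reported results, rounded to three decimals like the file.
average = round(sum(rep for rep, _, _ in sample.values()) / len(sample), 3)

# Hypothetical re-adjustment: scaling both prices by the same factor
# changes the prices but not their ratio (up to floating-point error).
factor = 0.5
drift = max(abs((sale * factor) / (buy * factor) - sale / buy)
            for _, sale, buy in sample.values())

print(max_gap, average, drift)
```

On this one sample line the recomputed ratios stay within 0.001 of the reported results, and the rounded average of the reported results matches the "Avg of 5" field. Whether the same tolerance holds across the full data set is not stated in this document.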
