Ubah data saya agar bisa di cari algoritma dan rum...
Created: May 31, 2026
Answered using GPT-5.5 Thinking by Chat01
Transform my data so that an algorithm and formula for the next result can be found, and write a Google Colab script to convert my data.
```python
from pathlib import Path

p = Path('/mnt/data/PRIZE.txt')
text = p.read_text(encoding='utf-8', errors='replace')
len(text), text[:500], text[-500:]
```

```text
(91799,
 '=== HASIL EKSTRAKSI DATA PRIZE ===\n\nTanggal: 10 Februari 2026 | Waktu: 21:30\n6112 0, 3, 4, 5, 7, 8, 9\n0236 1, 4, 5, 7, 8, 9\n7341 0, 2, 5, 6, 8, 9\n------------------------------\nTanggal: 11 Februari 2026 | Waktu: 03:30\n9876 0, 1, 2, 3, 4, 5\n2992 0, 1, 3, 4, 5, 6, 7, 8\n1685 0, 2, 3, 4, 7, 9\n------------------------------\nTanggal: 11 Februari 2026 | Waktu: 06:30\n2279 0, 1, 3, 4, 5, 6, 8\n9443 0, 1, 2, 5, 6, 7, 8\n1208 3, 4, 5, 6, 7, 9\n------------------------------\nTanggal: 11 Februari 2026 ',
 '| Waktu: 17:30\n9673 0, 1, 2, 4, 5, 8\n2613 0, 4, 5, 7, 8, 9\n8471 0, 2, 3, 5, 6, 9\n------------------------------\nTanggal: 30 Mei 2026 | Waktu: 21:30\n6334 0, 1, 2, 5, 7, 8, 9\n1059 2, 3, 4, 6, 7, 8\n7596 0, 1, 2, 3, 4, 8\n------------------------------\nTanggal: 31 Mei 2026 | Waktu: 03:30\n0991 2, 3, 4, 5, 6, 7, 8\n4420 1, 3, 5, 6, 7, 8, 9\n8025 1, 3, 4, 6, 7, 9\n------------------------------\nTanggal: 31 Mei 2026 | Waktu: 06:30\n2483 0, 1, 5, 6, 7, 9\n0259 1, 3, 4, 6, 7, 8\n5102 3, 4, 6, 7, 8, 9')
```
```python
import re, json, math, statistics
from pathlib import Path
import pandas as pd
from datetime import datetime

text = Path('/mnt/data/PRIZE.txt').read_text(encoding='utf-8', errors='replace')

bulan_map = {
    "Januari": 1, "Februari": 2, "Maret": 3, "April": 4, "Mei": 5, "Juni": 6,
    "Juli": 7, "Agustus": 8, "September": 9, "Oktober": 10, "November": 11, "Desember": 12
}

records = []
current_date_str = None
current_time_str = None
current_dt = None
slot = 0
draw_id = -1

date_re = re.compile(r"Tanggal:\s*(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\s*\|\s*Waktu:\s*(\d{2}:\d{2})")
row_re = re.compile(r"^\s*(\d{4})\s+((?:\d\s*,\s*)*\d)\s*$")

for line in text.splitlines():
    line = line.strip()
    dm = date_re.search(line)
    if dm:
        # Header line: start a new draw and reset the slot counter
        day, month_name, year, time_str = dm.groups()
        month = bulan_map.get(month_name)
        if month is None:
            raise ValueError(f"Unknown month {month_name}")
        current_dt = datetime(int(year), month, int(day), int(time_str[:2]), int(time_str[3:]))
        current_date_str = f"{int(day):02d}-{month:02d}-{year}"
        current_time_str = time_str
        slot = 0
        draw_id += 1
        continue
    rm = row_re.match(line)
    if rm and current_dt is not None:
        # Result line: 4-digit result followed by the listed missing digits
        result, missing_text = rm.groups()
        slot += 1
        digits = [int(ch) for ch in result]
        present_set = set(digits)
        missing_list = [int(x.strip()) for x in missing_text.split(',') if x.strip() != '']
        computed_missing = [d for d in range(10) if d not in present_set]
        records.append({
            "draw_id": draw_id,
            "datetime": current_dt.isoformat(sep=' '),
            "tanggal": current_date_str,
            "waktu": current_time_str,
            "slot_dalam_jam": slot,
            "result_4d": result,
            "d1": digits[0],
            "d2": digits[1],
            "d3": digits[2],
            "d4": digits[3],
            "missing_digits_sumber": ",".join(map(str, missing_list)),
            "missing_digits_hitung": ",".join(map(str, computed_missing)),
            "valid_missing_digits": missing_list == computed_missing,
            "sum_digits": sum(digits),
            "digital_root": 0 if sum(digits) == 0 else 1 + (sum(digits) - 1) % 9,
            "even_count": sum(d % 2 == 0 for d in digits),
            "odd_count": sum(d % 2 == 1 for d in digits),
            "low_0_4_count": sum(d <= 4 for d in digits),
            "high_5_9_count": sum(d >= 5 for d in digits),
            "unique_digit_count": len(present_set),
            "repeat_count": 4 - len(present_set),
            "is_double": any(digits.count(d) == 2 for d in present_set),
            "is_triple": any(digits.count(d) == 3 for d in present_set),
            "is_quad": any(digits.count(d) == 4 for d in present_set),
        })

df = pd.DataFrame(records)
len(df), df.head(), df.tail(), df['valid_missing_digits'].value_counts(dropna=False).to_dict()
```

```text
(1953,
    draw_id             datetime     tanggal  ...  is_double  is_triple  is_quad
 0        0  2026-02-10 21:30:00  10-02-2026  ...       True      False    False
 1        0  2026-02-10 21:30:00  10-02-2026  ...      False      False    False
 2        0  2026-02-10 21:30:00  10-02-2026  ...      False      False    False
 3        1  2026-02-11 03:30:00  11-02-2026  ...      False      False    False
 4        1  2026-02-11 03:30:00  11-02-2026  ...       True      False    False

 [5 rows x 24 columns],
       draw_id             datetime     tanggal  ...  is_double  is_triple  is_quad
 1948      649  2026-05-31 03:30:00  31-05-2026  ...       True      False    False
 1949      649  2026-05-31 03:30:00  31-05-2026  ...      False      False    False
 1950      650  2026-05-31 06:30:00  31-05-2026  ...      False      False    False
 1951      650  2026-05-31 06:30:00  31-05-2026  ...      False      False    False
 1952      650  2026-05-31 06:30:00  31-05-2026  ...      False      False    False

 [5 rows x 24 columns],
 {True: 1951, False: 2})
```
```python
invalid = df[~df.valid_missing_digits]
invalid[['datetime', 'slot_dalam_jam', 'result_4d', 'missing_digits_sumber', 'missing_digits_hitung']].to_string(index=False)
```

```text
' datetime slot_dalam_jam result_4d missing_digits_sumber missing_digits_hitung\n2026-05-16 06:30:00 2 5993 0,1,2,4,5,6,7,8 0,1,2,4,6,7,8\n2026-05-18 11:30:00 3 7089 1,2,3,4,5,6,7 1,2,3,4,5,6'
```
```python
import re, json, math, os
from pathlib import Path
import pandas as pd
from datetime import datetime

INPUT_PATH = Path('/mnt/data/PRIZE.txt')
OUTPUT_LONG = Path('/mnt/data/PRIZE_clean_long.csv')
OUTPUT_WIDE = Path('/mnt/data/PRIZE_draw_wide.csv')
OUTPUT_SCRIPT = Path('/mnt/data/convert_prize_colab.py')
OUTPUT_NB = Path('/mnt/data/convert_prize_colab.ipynb')
OUTPUT_VALIDATION = Path('/mnt/data/PRIZE_validation_issues.csv')

text = INPUT_PATH.read_text(encoding='utf-8', errors='replace')

bulan_map = {
    "Januari": 1, "Februari": 2, "Maret": 3, "April": 4, "Mei": 5, "Juni": 6,
    "Juli": 7, "Agustus": 8, "September": 9, "Oktober": 10, "November": 11, "Desember": 12
}

def parse_prize_text(text: str) -> pd.DataFrame:
    date_re = re.compile(r"Tanggal:\s*(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\s*\|\s*Waktu:\s*(\d{2}:\d{2})")
    row_re = re.compile(r"^\s*(\d{4})\s+((?:\d\s*,\s*)*\d)\s*$")
    current_dt = None
    slot = 0
    draw_id = -1
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        dm = date_re.search(line)
        if dm:
            day, month_name, year, time_str = dm.groups()
            month = bulan_map[month_name]
            current_dt = datetime(int(year), month, int(day), int(time_str[:2]), int(time_str[3:]))
            slot = 0
            draw_id += 1
            continue
        rm = row_re.match(line)
        if rm and current_dt is not None:
            result, missing_text = rm.groups()
            slot += 1
            digits = [int(ch) for ch in result]
            present = sorted(set(digits))
            missing_source = [int(x.strip()) for x in missing_text.split(',') if x.strip()]
            missing_calc = [d for d in range(10) if d not in present]
            row = {
                "source_line": line_no,
                "draw_id": draw_id,
                "datetime": current_dt.strftime("%Y-%m-%d %H:%M:%S"),
                "tanggal": current_dt.strftime("%Y-%m-%d"),
                "waktu": current_dt.strftime("%H:%M"),
                "slot_dalam_jam": slot,
                "result_4d": result,
                "d1": digits[0],
                "d2": digits[1],
                "d3": digits[2],
                "d4": digits[3],
                "present_digits": ",".join(map(str, present)),
                "missing_digits_sumber": ",".join(map(str, missing_source)),
                "missing_digits_hitung": ",".join(map(str, missing_calc)),
                "valid_missing_digits": missing_source == missing_calc,
                "sum_digits": sum(digits),
                "digital_root": 0 if sum(digits) == 0 else 1 + (sum(digits) - 1) % 9,
                "even_count": sum(d % 2 == 0 for d in digits),
                "odd_count": sum(d % 2 == 1 for d in digits),
                "low_0_4_count": sum(d <= 4 for d in digits),
                "high_5_9_count": sum(d >= 5 for d in digits),
                "unique_digit_count": len(present),
                "repeat_count": 4 - len(present),
                "is_double": any(digits.count(d) == 2 for d in present),
                "is_triple": any(digits.count(d) == 3 for d in present),
                "is_quad": any(digits.count(d) == 4 for d in present),
            }
            for d in range(10):
                row[f"present_{d}"] = int(d in present)
                row[f"missing_{d}"] = int(d in missing_calc)
            rows.append(row)

    df = pd.DataFrame(rows)
    if not df.empty:
        df["datetime_dt"] = pd.to_datetime(df["datetime"])
        df = df.sort_values(["datetime_dt", "slot_dalam_jam"]).reset_index(drop=True)
        # "Next result" targets, globally and per slot
        df["next_result_global"] = df["result_4d"].shift(-1)
        df["next_datetime_global"] = df["datetime"].shift(-1)
        for col in ["result_4d", "d1", "d2", "d3", "d4", "missing_digits_hitung"]:
            df[f"next_same_slot_{col}"] = df.groupby("slot_dalam_jam")[col].shift(-1)
        df = df.drop(columns=["datetime_dt"])
    return df

df = parse_prize_text(text)
df.to_csv(OUTPUT_LONG, index=False, encoding='utf-8-sig')

# Validation issues
issues = df[~df["valid_missing_digits"]].copy()
issues.to_csv(OUTPUT_VALIDATION, index=False, encoding='utf-8-sig')

# Wide per draw
wide = df.pivot(index=["draw_id", "datetime", "tanggal", "waktu"],
                columns="slot_dalam_jam", values="result_4d").reset_index()
wide.columns = [f"result_slot_{c}" if isinstance(c, int) else c for c in wide.columns]
for slot in [1, 2, 3]:
    col = f"result_slot_{slot}"
    if col in wide.columns:
        wide[f"next_{col}"] = wide[col].shift(-1)
wide.to_csv(OUTPUT_WIDE, index=False, encoding='utf-8-sig')

len(df), df.head().to_dict('records')[:1], issues[['datetime', 'slot_dalam_jam', 'result_4d', 'missing_digits_sumber', 'missing_digits_hitung']].to_dict('records'), wide.head()
```

```text
(1953,
 [{'source_line': 4, 'draw_id': 0, 'datetime': '2026-02-10 21:30:00', 'tanggal': '2026-02-10', 'waktu': '21:30',
   'slot_dalam_jam': 1, 'result_4d': '6112', 'd1': 6, 'd2': 1, 'd3': 1, 'd4': 2, 'present_digits': '1,2,6',
   'missing_digits_sumber': '0,3,4,5,7,8,9', 'missing_digits_hitung': '0,3,4,5,7,8,9', 'valid_missing_digits': True,
   'sum_digits': 10, 'digital_root': 1, 'even_count': 2, 'odd_count': 2, 'low_0_4_count': 3, 'high_5_9_count': 1,
   'unique_digit_count': 3, 'repeat_count': 1, 'is_double': True, 'is_triple': False, 'is_quad': False,
   'present_0': 0, 'missing_0': 1, 'present_1': 1, 'missing_1': 0, 'present_2': 1, 'missing_2': 0,
   'present_3': 0, 'missing_3': 1, 'present_4': 0, 'missing_4': 1, 'present_5': 0, 'missing_5': 1,
   'present_6': 1, 'missing_6': 0, 'present_7': 0, 'missing_7': 1, 'present_8': 0, 'missing_8': 1,
   'present_9': 0, 'missing_9': 1, 'next_result_global': '0236', 'next_datetime_global': '2026-02-10 21:30:00',
   'next_same_slot_result_4d': '9876', 'next_same_slot_d1': 9.0, 'next_same_slot_d2': 8.0,
   'next_same_slot_d3': 7.0, 'next_same_slot_d4': 6.0, 'next_same_slot_missing_digits_hitung': '0,1,2,3,4,5'}],
 [{'datetime': '2026-05-16 06:30:00', 'slot_dalam_jam': 2, 'result_4d': '5993',
   'missing_digits_sumber': '0,1,2,4,5,6,7,8', 'missing_digits_hitung': '0,1,2,4,6,7,8'},
  {'datetime': '2026-05-18 11:30:00', 'slot_dalam_jam': 3, 'result_4d': '7089',
   'missing_digits_sumber': '1,2,3,4,5,6,7', 'missing_digits_hitung': '1,2,3,4,5,6'}],
    draw_id             datetime  ...  next_result_slot_2  next_result_slot_3
 0        0  2026-02-10 21:30:00  ...                2992                1685
 1        1  2026-02-11 03:30:00  ...                9443                1208
 2        2  2026-02-11 06:30:00  ...                8459                9350
 3        3  2026-02-11 11:30:00  ...                6407                7690
 4        4  2026-02-11 13:30:00  ...                3041                1575

 [5 rows x 10 columns])
```
```python
colab_code = r'''# ============================================================
# GOOGLE COLAB SCRIPT - Convert PRIZE.txt into an analysis dataset
# ============================================================
# How to use:
# 1) Run this cell in Google Colab
# 2) Upload the PRIZE.txt file when prompted
# 3) Output: PRIZE_clean_long.csv, PRIZE_draw_wide.csv, PRIZE_validation_issues.csv
#
# Notes:
# - This script prepares the data for algorithmic/statistical/ML analysis.
# - Lottery/random numbers have no exact formula for guessing the next result.
#   The model section below is only a probability baseline, not a guaranteed prediction.
# ============================================================

import re
from pathlib import Path
from datetime import datetime
from itertools import product

import pandas as pd
import numpy as np

# display() is built into Colab/Jupyter; fall back to print() in a plain Python run
try:
    from IPython.display import display
except ImportError:
    display = print

# ------------------------------------------------------------
# 1. Upload the TXT file
# ------------------------------------------------------------
try:
    from google.colab import files
    uploaded = files.upload()
    INPUT_FILE = next(iter(uploaded.keys()))
except Exception:
    # For running locally / in plain Jupyter
    INPUT_FILE = "PRIZE.txt"

# ------------------------------------------------------------
# 2. Raw data parser
# ------------------------------------------------------------
BULAN_MAP = {
    "Januari": 1, "Februari": 2, "Maret": 3, "April": 4,
    "Mei": 5, "Juni": 6, "Juli": 7, "Agustus": 8,
    "September": 9, "Oktober": 10, "November": 11, "Desember": 12
}

def parse_prize_text(text: str) -> pd.DataFrame:
    """
    Converts the format:
        Tanggal: 10 Februari 2026 | Waktu: 21:30
        6112 0, 3, 4, 5, 7, 8, 9

    Into a table:
        datetime, result_4d, d1-d4, present_digits, missing_digits,
        statistical features, next targets.
    """
    date_re = re.compile(r"Tanggal:\s*(\d{1,2})\s+([A-Za-z]+)\s+(\d{4})\s*\|\s*Waktu:\s*(\d{2}:\d{2})")
    row_re = re.compile(r"^\s*(\d{4})\s+((?:\d\s*,\s*)*\d)\s*$")

    rows = []
    current_dt = None
    draw_id = -1
    slot = 0

    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()

        date_match = date_re.search(line)
        if date_match:
            day, month_name, year, time_str = date_match.groups()
            month = BULAN_MAP[month_name]
            current_dt = datetime(
                int(year), month, int(day),
                int(time_str[:2]), int(time_str[3:])
            )
            draw_id += 1
            slot = 0
            continue

        row_match = row_re.match(line)
        if row_match and current_dt is not None:
            result_4d, missing_text = row_match.groups()
            slot += 1

            digits = [int(ch) for ch in result_4d]
            present = sorted(set(digits))
            missing_source = [int(x.strip()) for x in missing_text.split(",") if x.strip() != ""]
            missing_calc = [d for d in range(10) if d not in present]

            row = {
                "source_line": line_no,
                "draw_id": draw_id,
                "datetime": current_dt.strftime("%Y-%m-%d %H:%M:%S"),
                "tanggal": current_dt.strftime("%Y-%m-%d"),
                "waktu": current_dt.strftime("%H:%M"),
                "slot_dalam_jam": slot,

                # Main result
                "result_4d": result_4d,
                "d1": digits[0],
                "d2": digits[1],
                "d3": digits[2],
                "d4": digits[3],

                # Basic formula:
                # present_digits = unique digits that appear in result_4d
                # missing_digits = {0,1,2,3,4,5,6,7,8,9} - present_digits
                "present_digits": ",".join(map(str, present)),
                "missing_digits_sumber": ",".join(map(str, missing_source)),
                "missing_digits_hitung": ",".join(map(str, missing_calc)),
                "valid_missing_digits": missing_source == missing_calc,

                # Simple statistical features
                "sum_digits": sum(digits),
                "digital_root": 0 if sum(digits) == 0 else 1 + (sum(digits) - 1) % 9,
                "even_count": sum(d % 2 == 0 for d in digits),
                "odd_count": sum(d % 2 == 1 for d in digits),
                "low_0_4_count": sum(d <= 4 for d in digits),
                "high_5_9_count": sum(d >= 5 for d in digits),
                "unique_digit_count": len(present),
                "repeat_count": 4 - len(present),
                "is_double": any(digits.count(d) == 2 for d in present),
                "is_triple": any(digits.count(d) == 3 for d in present),
                "is_quad": any(digits.count(d) == 4 for d in present),
            }

            # One-hot flags: digit present / digit missing
            for d in range(10):
                row[f"present_{d}"] = int(d in present)
                row[f"missing_{d}"] = int(d in missing_calc)

            rows.append(row)

    df = pd.DataFrame(rows)
    if df.empty:
        raise ValueError("No data could be read. Make sure the file format matches PRIZE.txt.")

    df["datetime_dt"] = pd.to_datetime(df["datetime"])
    df = df.sort_values(["datetime_dt", "slot_dalam_jam"]).reset_index(drop=True)

    # Targets for analysing the next result
    # next_result_global = the next row in time order
    df["next_result_global"] = df["result_4d"].shift(-1)
    df["next_datetime_global"] = df["datetime"].shift(-1)

    # next_same_slot_* = the next result for the same slot
    # Example: slot 1 of the current draw time versus slot 1 of the next draw time
    for col in ["result_4d", "d1", "d2", "d3", "d4", "missing_digits_hitung"]:
        df[f"next_same_slot_{col}"] = df.groupby("slot_dalam_jam")[col].shift(-1)

    df = df.drop(columns=["datetime_dt"])
    return df

raw_text = Path(INPUT_FILE).read_text(encoding="utf-8", errors="replace")
df = parse_prize_text(raw_text)

print("Number of result rows:", len(df))
print("Number of draw times:", df["draw_id"].nunique())
print("Number of valid missing-digit rows:", int(df["valid_missing_digits"].sum()))
print("Number of invalid missing-digit rows:", int((~df["valid_missing_digits"]).sum()))
display(df.head())

# ------------------------------------------------------------
# 3. Save the long and wide datasets
# ------------------------------------------------------------
OUT_LONG = "PRIZE_clean_long.csv"
OUT_WIDE = "PRIZE_draw_wide.csv"
OUT_ISSUES = "PRIZE_validation_issues.csv"

df.to_csv(OUT_LONG, index=False, encoding="utf-8-sig")

issues = df[~df["valid_missing_digits"]].copy()
issues.to_csv(OUT_ISSUES, index=False, encoding="utf-8-sig")

wide = df.pivot(
    index=["draw_id", "datetime", "tanggal", "waktu"],
    columns="slot_dalam_jam",
    values="result_4d"
).reset_index()

wide.columns = [
    f"result_slot_{c}" if isinstance(c, int) else c
    for c in wide.columns
]

for slot_no in [1, 2, 3]:
    col = f"result_slot_{slot_no}"
    if col in wide.columns:
        wide[f"next_{col}"] = wide[col].shift(-1)

wide.to_csv(OUT_WIDE, index=False, encoding="utf-8-sig")

print("Files saved:")
print("-", OUT_LONG)
print("-", OUT_WIDE)
print("-", OUT_ISSUES)

# ------------------------------------------------------------
# 4. Baseline formula/algorithm for analysing the next result
# ------------------------------------------------------------
# Position probability:
#   P(digit = x at position p) = count of x at position p / number of rows
#
# Position transition:
#   P(next_digit = y | current_digit = x)
#   = count of transitions x -> y / count of all transitions from x
#
# Simple score:
#   score(y) = alpha * P_posisi(y) + beta * P_transisi(y | digit_terakhir)
#   Defaults: alpha=0.60 and beta=0.40

def score_next_digits_by_slot(data: pd.DataFrame, slot: int = 1, alpha: float = 0.60, beta: float = 0.40):
    """
    Produces a score for digits 0-9 at each position d1-d4 for a given slot.
    The output is not a definite prediction; it is only a simple ranking
    by historical probability.
    """
    hist = data[data["slot_dalam_jam"] == slot].copy()
    hist = hist.sort_values(["datetime", "slot_dalam_jam"]).reset_index(drop=True)

    if hist.empty:
        raise ValueError(f"No data for slot {slot}")

    result = {}

    for pos in ["d1", "d2", "d3", "d4"]:
        freq = hist[pos].value_counts(normalize=True).reindex(range(10), fill_value=0.0)

        # transition: current digit -> next digit within the same slot
        prev_digit = hist[pos].shift(1)
        curr_digit = hist[pos]
        trans_counts = pd.crosstab(prev_digit, curr_digit)

        last_digit = int(hist.iloc[-1][pos])
        if last_digit in trans_counts.index:
            trans = trans_counts.loc[last_digit]
            trans = (trans / trans.sum()).reindex(range(10), fill_value=0.0)
        else:
            trans = pd.Series(0.0, index=range(10))

        score = alpha * freq + beta * trans
        result[pos] = score.sort_values(ascending=False)

    return result

def make_candidate_numbers(scored_digits: dict, top_k_per_pos: int = 3, max_candidates: int = 20) -> pd.DataFrame:
    """
    Builds candidate combinations from the top digits per position, ranked by score.
    """
    top_digits = {
        pos: list(scores.head(top_k_per_pos).index)
        for pos, scores in scored_digits.items()
    }

    rows = []
    for combo in product(top_digits["d1"], top_digits["d2"], top_digits["d3"], top_digits["d4"]):
        number = "".join(map(str, combo))
        score = sum(scored_digits[pos].loc[d] for pos, d in zip(["d1", "d2", "d3", "d4"], combo))
        rows.append({"candidate_4d": number, "score": float(score)})

    return pd.DataFrame(rows).sort_values("score", ascending=False).head(max_candidates).reset_index(drop=True)

# Example usage: candidate ranking for slots 1, 2, 3
for slot_no in sorted(df["slot_dalam_jam"].unique()):
    scored = score_next_digits_by_slot(df, slot=slot_no)
    candidates = make_candidate_numbers(scored, top_k_per_pos=3, max_candidates=10)
    print(f"\nTop baseline candidates for slot {slot_no}:")
    display(candidates)

# ------------------------------------------------------------
# 5. Automatic download in Colab
# ------------------------------------------------------------
try:
    from google.colab import files
    files.download(OUT_LONG)
    files.download(OUT_WIDE)
    files.download(OUT_ISSUES)
except Exception:
    pass
'''

OUTPUT_SCRIPT.write_text(colab_code, encoding='utf-8')

# Create notebook manually
import json, textwrap

notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Convert PRIZE.txt into an analysis dataset\n",
                "\n",
                "This notebook converts the `PRIZE.txt` file into long/wide CSVs, builds digit features, validates the missing digits, adds next-result targets, and computes a simple baseline probability score.\n",
                "\n",
                "> Note: random/lottery numbers have no exact formula for guessing the next result. The baseline in this notebook is only for analysing historical patterns."
            ],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": colab_code.splitlines(True),
        },
    ],
    "metadata": {
        "colab": {"provenance": []},
        "kernelspec": {"display_name": "Python 3", "name": "python3"},
        "language_info": {"name": "python", "version": "3.x"},
    },
    "nbformat": 4,
    "nbformat_minor": 5,
}

OUTPUT_NB.write_text(json.dumps(notebook, ensure_ascii=False, indent=2), encoding='utf-8')

# verify files exist and size
for path in [OUTPUT_LONG, OUTPUT_WIDE, OUTPUT_VALIDATION, OUTPUT_SCRIPT, OUTPUT_NB]:
    print(path, path.stat().st_size)
print(df.shape, wide.shape, issues.shape)
```
```python
import zipfile

ZIP = Path('/mnt/data/prize_converted_package.zip')
with zipfile.ZipFile(ZIP, 'w', zipfile.ZIP_DEFLATED) as z:
    for path in [OUTPUT_LONG, OUTPUT_WIDE, OUTPUT_VALIDATION, OUTPUT_SCRIPT, OUTPUT_NB]:
        z.write(path, arcname=path.name)
ZIP.stat().st_size
```

```text
97825
```
I have converted the PRIZE.txt data into a tabular format that is ready for pattern searching, formula fitting, statistics, or baseline modelling in Google Colab. The original file contains a date/time header, three 4-digit numbers per draw time, and, for each number, the list of digits that do not appear in it. (PRIZE.txt)
Conversion output:
PRIZE_clean_long.csv
Main dataset: 1,953 rows × 54 columns. It contains the date, time, result_4d, d1-d4, the digits present, the digits missing, statistical features, and the next-result targets.
PRIZE_draw_wide.csv
One row per draw time: 651 rows, with result_slot_1, result_slot_2, result_slot_3, and the next result for each slot.
PRIZE_validation_issues.csv
Contains the 2 rows whose right-hand digit list does not match the automatically computed missing digits.
convert_prize_colab.py
Complete Python script for Google Colab.
convert_prize_colab.ipynb
Google Colab notebook, ready to upload and run.
prize_converted_package.zip
All of the files above in a single ZIP package.
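To make the relationship between the long file and the wide file concrete, here is a minimal sketch of the reshaping step. It assumes pandas is installed and uses only the first two draw times from PRIZE.txt; the variable name `long_df` and the reduced column set are illustrative, not the full 54-column dataset.

```python
import pandas as pd

# Six long-format rows: two draw times, three slots each (values from PRIZE.txt)
long_df = pd.DataFrame({
    "draw_id": [0, 0, 0, 1, 1, 1],
    "slot_dalam_jam": [1, 2, 3, 1, 2, 3],
    "result_4d": ["6112", "0236", "7341", "9876", "2992", "1685"],
})

# One row per draw, one column per slot
wide = long_df.pivot(index="draw_id", columns="slot_dalam_jam", values="result_4d")
wide.columns = [f"result_slot_{c}" for c in wide.columns]
wide = wide.reset_index()

# Target columns: the same slot's result at the following draw time
for col in ["result_slot_1", "result_slot_2", "result_slot_3"]:
    wide[f"next_{col}"] = wide[col].shift(-1)

print(wide)
```

The last draw has no following draw, so its `next_*` columns are empty (NaN); those rows should be dropped before fitting any model on the targets.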
The basic formula I used:

```text
present_digits = unique digits that appear in result_4d
missing_digits = {0,1,2,3,4,5,6,7,8,9} - present_digits
```
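As a quick check of the set-difference formula, the sketch below recomputes the missing digits for two results taken from the data. The helper name `missing_digits` is hypothetical and is not part of the generated script.

```python
ALL_DIGITS = set(range(10))

def missing_digits(result_4d: str) -> list[int]:
    """Return the digits 0-9 that do not appear in a 4-digit result string."""
    present = {int(ch) for ch in result_4d}
    return sorted(ALL_DIGITS - present)

print(missing_digits("6112"))  # [0, 3, 4, 5, 7, 8, 9]
print(missing_digits("5993"))  # [0, 1, 2, 4, 6, 7, 8]

# The source file lists 0,1,2,4,5,6,7,8 for 5993, which wrongly includes 5;
# this is one of the two rows flagged in PRIZE_validation_issues.csv.
source_list = [0, 1, 2, 4, 5, 6, 7, 8]
print(source_list == missing_digits("5993"))  # False
```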
As a baseline algorithm for the next result, the Colab script also sets up simple probability formulas:

```text
P(digit = x at position p) = count of x at position p / total rows
P(next_digit = y | current_digit = x) = count of transitions x → y / count of all transitions from x
score(y) = 0.60 × P_posisi(y) + 0.40 × P_transisi(y | digit_terakhir)
```
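The scoring formula can be traced by hand on a tiny example. The sketch below is a pure-Python restatement for a single position, under the assumption that the history is given oldest first; the function name `score_next_digit` and the toy history are hypothetical, and the Colab script itself computes the same quantities with pandas.

```python
from collections import Counter

def score_next_digit(history, alpha=0.60, beta=0.40):
    """Score digits 0-9 for one position from a history of digits (oldest first)."""
    n = len(history)
    freq = Counter(history)
    last = history[-1]
    # Count transitions last -> y over all consecutive pairs in the history
    trans = Counter(b for a, b in zip(history, history[1:]) if a == last)
    total_trans = sum(trans.values())
    scores = {}
    for y in range(10):
        p_pos = freq[y] / n
        p_trans = trans[y] / total_trans if total_trans else 0.0
        scores[y] = alpha * p_pos + beta * p_trans
    return scores

toy_history = [6, 9, 2, 9, 6, 9]  # hypothetical digits, not taken from PRIZE.txt
scores = score_next_digit(toy_history)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # 6 0.4
```

Here digit 9 is the most frequent (3 of 6), but the last digit is 9 and the two observed transitions out of 9 go to 2 and 6, so 6 scores 0.60 × 2/6 + 0.40 × 1/2 = 0.4 and ranks first. The ranking only summarises past frequencies; it carries no guarantee about the next draw.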
Important note: if this data comes from a random draw or lottery, no formula can guarantee the next result. The script cleans your data and makes it ready for analysis, then provides a historical-probability baseline for pattern research.