RTICs are great for the emerging economy. For the foundational economy, where there is more likely to be an appropriate SIC code, an issue remains: a company can choose the wrong SIC code, or the code it selected may not match its activities. We fixed this issue. Real-Time Standard Industrial Classifications (RSICs) use machine learning and a company’s website text to better classify that company’s activities.

For example, Veolia is a large waste management and recycling company. Because it is a large company, it reports its activities as activities of head offices. RSICs (and RTICs) give a better description of what it does: collection of non-hazardous waste, and treatment and disposal of non-hazardous waste.

RSICs help to:

- Fill gaps where SIC codes are missing
- Correct inaccuracies in existing SIC codes
- Add granularity where SIC codes are vague
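
The three roles above can be illustrated by comparing a company's registered SIC codes with its RSICs. This is a minimal sketch, not The Data City's implementation: the function name `rsic_role`, the `VAGUE_SIC_CODES` set, and the rules used to tell the cases apart are all assumptions made for the example. The codes are UK SIC 2007 codes.

```python
# Hypothetical illustration only; not The Data City's actual logic.
# Assumption: these registered codes are treated as too vague to be useful.
VAGUE_SIC_CODES = {
    "70100",  # Activities of head offices
    "82990",  # Other business support service activities n.e.c.
}


def rsic_role(registered_sics, rsics):
    """Describe what the RSICs add relative to the registered SIC codes."""
    registered, assigned = set(registered_sics), set(rsics)
    if not assigned:
        return "no RSIC assigned"
    if not registered:
        return "fills a gap"
    if registered <= VAGUE_SIC_CODES:
        return "adds granularity"
    if registered.isdisjoint(assigned):
        return "corrects an inaccuracy"
    return "confirms the registered code"


# Veolia-style case: registered as a head office, classified by actual activity.
print(rsic_role(["70100"], ["38110", "38210"]))  # adds granularity
```

Treating any non-overlapping pair of code sets as a correction is a deliberate simplification; a real comparison would probably also consider how closely related the codes are.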
Data Quality
To ensure quality we focus on methodological integrity:
Trust in the methodology
Primarily, we’ve built trust in our RSICs within the RSIC methodology itself. We do this through three distinct layers:

- Evidence, not prediction: We treat classification as an evidence problem, not a prediction problem. RSICs are not arbitrary predictions. Instead, we evaluate the empirical likelihood of a classification based on the company’s website text, and what we uniquely understand about companies in each sector. If the data doesn’t support the code, we don’t assign it.
- Coherence Filtering: This validation layer rejects codes that lack alignment with the company’s specific niche. It lets us distinguish between a company mentioning a topic and a company actually doing it, and classify accordingly.
- Specificity: We also penalise generic classifications. Broad, “catch-all” codes are rarely useful for decision-making, so we deprioritise them in favour of precise definitions.
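
One way to picture how the three layers fit together is as two filters followed by a ranking step. The sketch below is an assumption-laden illustration, not the RSIC implementation: the `Candidate` fields, the `select_codes` function, and every threshold and penalty value are hypothetical, and the scores are invented for the example.

```python
from dataclasses import dataclass

# All thresholds and weights below are made-up values for illustration.
MIN_EVIDENCE = 0.6     # layer 1: minimum support from the website text
MIN_COHERENCE = 0.5    # layer 2: minimum alignment with the company's niche
GENERIC_PENALTY = 0.3  # layer 3: score deducted from broad, catch-all codes


@dataclass
class Candidate:
    code: str
    evidence: float   # how strongly the website text supports the code (0 to 1)
    coherence: float  # how well the code fits the company's niche (0 to 1)
    generic: bool     # True for broad, catch-all codes


def select_codes(candidates):
    """Apply the three layers in order and return the surviving codes, best first."""
    scored = []
    for c in candidates:
        if c.evidence < MIN_EVIDENCE:
            continue  # evidence, not prediction: unsupported codes are dropped
        if c.coherence < MIN_COHERENCE:
            continue  # coherence filtering: the topic is mentioned, not practised
        score = c.evidence - (GENERIC_PENALTY if c.generic else 0.0)
        scored.append((score, c.code))
    return [code for score, code in sorted(scored, reverse=True)]


candidates = [
    Candidate("38110", evidence=0.9, coherence=0.9, generic=False),
    Candidate("38210", evidence=0.8, coherence=0.8, generic=False),
    Candidate("70100", evidence=0.7, coherence=0.6, generic=True),   # catch-all code
    Candidate("62020", evidence=0.7, coherence=0.2, generic=False),  # topic only mentioned
    Candidate("41100", evidence=0.1, coherence=0.1, generic=False),  # no support in the text
]
print(select_codes(candidates))  # ['38110', '38210', '70100']
```

In this sketch the generic code is ranked lower rather than removed, which mirrors the description of catch-all codes being deprioritised rather than rejected outright.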